x-algo-ml
npx skills add https://github.com/cloudai-x/x-algo-skills --skill x-algo-ml
Skill Documentation
X Algorithm ML Architecture
The X recommendation system uses Phoenix, a transformer-based ML system for predicting user engagement. It operates in two stages: retrieval and ranking.
Two-Stage Pipeline
                          RECOMMENDATION PIPELINE

  ┌───────────┐     ┌──────────────────┐     ┌──────────────────┐
  │   User    │ ──► │     STAGE 1:     │ ──► │     STAGE 2:     │ ──► Feed
  │  Request  │     │    RETRIEVAL     │     │     RANKING      │
  └───────────┘     │   (Two-Tower)    │     │  (Transformer)   │
                    │ Millions → 1000s │     │  1000s → Ranked  │
                    └──────────────────┘     └──────────────────┘
Stage 1: Retrieval (Two-Tower Model)
Efficiently narrows millions of candidates to thousands using approximate nearest neighbor search.
Architecture
- User Tower: Encodes user features + engagement history → normalized embedding [B, D]
- Candidate Tower: Pre-computed embeddings for all posts in corpus → [N, D]
- Similarity: Dot product between user embedding and candidate embeddings
User Tower                Candidate Tower
    │                            │
    ▼                            ▼
[B, D] user emb           [N, D] all posts
    │                            │
    └─────── dot product ────────┘
                  │
                  ▼
        Top-K by similarity
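As a rough sketch of this scoring step (function and variable names here are illustrative, not from the Phoenix source), retrieval reduces to a batched dot product followed by top-k selection; production systems swap the exhaustive scan for an approximate nearest neighbor index:

import jax
import jax.numpy as jnp

def retrieve_top_k(user_emb: jnp.ndarray, post_embs: jnp.ndarray, k: int):
    # user_emb:  [B, D] normalized user-tower embeddings
    # post_embs: [N, D] pre-computed candidate-tower embeddings
    scores = user_emb @ post_embs.T  # [B, N] dot-product similarity
    # Exact top-k shown for clarity; at corpus scale this would be an
    # approximate nearest neighbor lookup rather than a full scan.
    return jax.lax.top_k(scores, k)  # ([B, k] scores, [B, k] indices)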
Stage 2: Ranking (Transformer with Candidate Isolation)
Scores the retrieved candidates using a transformer that predicts multiple engagement actions.
Model Configuration
# phoenix/recsys_model.py
@dataclass
class PhoenixModelConfig:
    model: TransformerConfig                # Grok-1 based transformer
    hash_config: HashConfig                 # Hash embedding config
    emb_size: int                           # Embedding dimension D
    num_actions: int                        # 18 action types
    history_seq_len: int = 128              # User history length
    candidate_seq_len: int = 32             # Candidates per batch
    product_surface_vocab_size: int = 16    # Where post was seen
Input Structure
class RecsysBatch(NamedTuple):
    # User identification
    user_hashes: ArrayLike                # [B, num_user_hashes]

    # User engagement history
    history_post_hashes: ArrayLike        # [B, S, num_item_hashes]
    history_author_hashes: ArrayLike      # [B, S, num_author_hashes]
    history_actions: ArrayLike            # [B, S, num_actions]
    history_product_surface: ArrayLike    # [B, S]

    # Candidates to score
    candidate_post_hashes: ArrayLike      # [B, C, num_item_hashes]
    candidate_author_hashes: ArrayLike    # [B, C, num_author_hashes]
    candidate_product_surface: ArrayLike  # [B, C]
Hash-Based Embeddings
Multiple hash functions map IDs to embedding tables:
@dataclass
class HashConfig:
    num_user_hashes: int = 2    # Hash user ID 2 ways
    num_item_hashes: int = 2    # Hash post ID 2 ways
    num_author_hashes: int = 2  # Hash author ID 2 ways
Why hashes?
- Fixed memory: No need for individual embeddings per user/post
- Handles new entities: Any ID maps to some embedding
- Collision averaging: Multiple hashes reduce collision impact
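A minimal sketch of the idea, assuming a single shared table and an illustrative mixing hash (the production hash functions and table layout are not shown in this skill):

import jax.numpy as jnp

TABLE_SIZE, D = 1_000_000, 128  # hypothetical table size and embedding dim

def hash_id(entity_id: int, seed: int) -> int:
    # Illustrative integer hash, not the production hash function
    return (entity_id * 2654435761 + seed) % TABLE_SIZE

def user_hash_embeddings(user_id: int, table: jnp.ndarray) -> jnp.ndarray:
    # num_user_hashes = 2: look the same ID up under two hash functions,
    # so a collision under one hash is averaged out by the other
    rows = jnp.array([hash_id(user_id, 0), hash_id(user_id, 1)])
    return table[rows]  # [num_user_hashes, D], reduced to [1, D] later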
Embedding Combination
Each entity type has a “reduce” function that combines hash embeddings:
# User: Concatenate hash embeddings → project to D
def block_user_reduce(...):
    # [B, num_user_hashes, D] → [B, 1, num_user_hashes * D] → [B, 1, D]
    user_embedding = user_embeddings.reshape((B, 1, num_user_hashes * D))
    user_embedding = jnp.dot(user_embedding, proj_mat_1)  # Project down
    return user_embedding, user_padding_mask
# History: Combine post + author + actions + product_surface
def block_history_reduce(...):
    # Concatenate all features, project to D
    post_author_embedding = jnp.concatenate([
        history_post_embeddings_reshaped,
        history_author_embeddings_reshaped,
        history_actions_embeddings,
        history_product_surface_embeddings,
    ], axis=-1)
    history_embedding = jnp.dot(post_author_embedding, proj_mat_3)
    return history_embedding, history_padding_mask
Transformer Input
The final input is the concatenation of:

[User (1)]  +  [History (S)]  +  [Candidates (C)]
     │               │                  │
     ▼               ▼                  ▼
 [B, 1, D]       [B, S, D]          [B, C, D]
      ╲              │               ╱
        ╲            │             ╱
            [B, 1+S+C, D]
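In JAX terms this is a single concatenation along the sequence axis (array names and sizes below are illustrative):

import jax.numpy as jnp

B, S, C, D = 2, 128, 32, 64           # illustrative sizes
user_emb = jnp.zeros((B, 1, D))       # from block_user_reduce
history_emb = jnp.zeros((B, S, D))    # from block_history_reduce
candidate_emb = jnp.zeros((B, C, D))  # from the candidate reduce

sequence = jnp.concatenate(
    [user_emb, history_emb, candidate_emb], axis=1)  # [B, 1+S+C, D]
candidate_start = 1 + S  # offset used later to slice out candidate outputs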
Attention Masking: Candidate Isolation
Critical design: Candidates cannot attend to each other, only to user + history.
                          ATTENTION MASK
                     Keys (what we attend TO)
              ─────────────────────────────────────────────────►
             │  User  │  History (S)  │ Candidates (C) │
      ┌──────┼────────┼───────────────┼────────────────┤
  Q   │ User │   ✓    │       ✓       │       ✗        │
  u   ├──────┼────────┼───────────────┼────────────────┤
  e   │ Hist │   ✓    │       ✓       │       ✗        │
  r   ├──────┼────────┼───────────────┼────────────────┤
  i   │ Cand │   ✓    │       ✓       │ diagonal only  │
  e   └──────┴────────┴───────────────┴────────────────┘
  s

✓ = Can attend    ✗ = Cannot attend (diagonal only for candidates)
Why candidate isolation?
- Score for post A shouldn’t depend on whether post B is in the batch
- Ensures consistent scoring regardless of batch composition
- Enables parallel scoring of candidates
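A sketch of how such a mask could be constructed (the actual implementation lives inside the transformer; candidate_isolation_mask is a hypothetical helper):

import jax.numpy as jnp

def candidate_isolation_mask(seq_len: int, candidate_start: int) -> jnp.ndarray:
    # Boolean [seq_len, seq_len] mask: True means query may attend to key
    pos = jnp.arange(seq_len)
    is_candidate_key = pos[None, :] >= candidate_start
    diagonal = pos[:, None] == pos[None, :]
    # User + history keys are visible to everyone; candidate keys are
    # visible only to themselves (the diagonal), isolating candidates.
    return jnp.where(is_candidate_key, diagonal, True)

mask = candidate_isolation_mask(seq_len=1 + 128 + 32, candidate_start=1 + 128)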
Transformer Forward Pass
def __call__(self, batch, recsys_embeddings) -> RecsysModelOutput:
    # 1. Build combined embeddings
    embeddings, padding_mask, candidate_start = self.build_inputs(batch, recsys_embeddings)

    # 2. Pass through transformer (with candidate isolation mask)
    model_output = self.model(
        embeddings,
        padding_mask,
        candidate_start_offset=candidate_start,  # For attention masking
    )

    # 3. Extract candidate outputs
    out_embeddings = layer_norm(model_output.embeddings)
    candidate_embeddings = out_embeddings[:, candidate_start:, :]

    # 4. Project to action logits
    logits = jnp.dot(candidate_embeddings, unembeddings)
    # Shape: [B, num_candidates, num_actions]
    return RecsysModelOutput(logits=logits)
Output: Multi-Action Prediction
Output Shape: [B, num_candidates, num_actions]
                       │
                       ▼
    ┌──────┬───────┬─────────┬───────┬──────────┐
    │ Like │ Reply │ Retweet │ Quote │ ... (18) │
    └──────┴───────┴─────────┴───────┴──────────┘
Each output is a log-probability. Convert to probability:
probability = exp(log_prob)
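For example, assuming the outputs are per-action log-probabilities as described:

import jax.numpy as jnp

def to_probabilities(log_probs: jnp.ndarray) -> jnp.ndarray:
    # [B, num_candidates, num_actions] log-probabilities → probabilities
    return jnp.exp(log_probs)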
Action Embeddings
History actions are encoded as signed vectors:
def _get_action_embeddings(self, actions):
    # actions: [B, S, num_actions] multi-hot vector
    actions_signed = 2 * actions - 1  # 0 → -1, 1 → +1
    action_emb = jnp.dot(actions_signed, action_projection)
    return action_emb
This encodes “did action” (+1) vs “didn’t do action” (-1) for each action type.
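A concrete instance with a three-action slice (the action order here is illustrative):

import jax.numpy as jnp

actions = jnp.array([1.0, 0.0, 1.0])  # e.g. liked, did not reply, retweeted
signed = 2 * actions - 1              # → [ 1., -1.,  1.]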
Product Surface Embeddings
Where the user engaged (home feed, search, notifications, etc.):
def _single_hot_to_embeddings(self, input, vocab_size, emb_size, name):
    # Standard embedding lookup table
    embedding_table = hk.get_parameter(name, [vocab_size, emb_size])
    input_one_hot = jax.nn.one_hot(input, vocab_size)
    return jnp.dot(input_one_hot, embedding_table)
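The one-hot matrix product is equivalent to indexing the table directly; a quick illustration with assumed shapes:

import jax
import jax.numpy as jnp

table = jnp.ones((16, 64))            # [vocab_size, emb_size]
surface_ids = jnp.array([[0, 3, 7]])  # [B, C] product-surface IDs

via_one_hot = jnp.dot(jax.nn.one_hot(surface_ids, 16), table)  # [1, 3, 64]
via_lookup = table[surface_ids]       # same result, same shape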
Model Heritage
The sample transformer implementation is ported from xAI's open-source Grok-1 release; the core architecture is adapted for recommendation use with custom input embeddings and the candidate-isolation attention mask.
Related Skills
- /x-algo-engagement – The 18 action types the model predicts
- /x-algo-scoring – How predictions become weighted scores
- /x-algo-pipeline – Where ML fits in the full system