ai

📁 hyperb1iss/hyperskills 📅 Jan 27, 2026
Total installs: 4
Weekly installs: 4
Site rank: #49727
Install command:
npx skills add https://github.com/hyperb1iss/hyperskills --skill ai

Installs by agent

codex 4
claude-code 4
mcpjam 3
kiro-cli 3
windsurf 3
zencoder 3

Skill documentation

AI/ML Engineering

Build production AI systems with modern patterns and tools.

Quick Reference

The 2026 AI Stack

Layer                Tool               Purpose
Prompting            DSPy               Programmatic prompt optimization
Orchestration        LangGraph          Stateful multi-agent workflows
RAG                  LlamaIndex         Document ingestion and retrieval
Vectors              Qdrant / Pinecone  Embedding storage and search
Evaluation           RAGAS              RAG quality metrics
Experiment Tracking  MLflow / W&B       Logging, versioning, comparison
Serving              BentoML / vLLM     Model deployment
Protocol             MCP                Tool and context integration

DSPy: Programmatic Prompting

Hand-tuned prompt strings are brittle and hard to maintain. DSPy treats prompts as optimizable code:

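import dspy

# Assumption: any LM supported by DSPy works; the model name here is only an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class QA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="1-5 words")

# Create a module that calls the LM according to the signature
qa = dspy.Predict(QA)

# Use it
result = qa(question="What is the capital of France?")
print(result.answer)  # e.g. "Paris"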

Optimize with real data:

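from dspy.teleprompt import BootstrapFewShot

# DSPy calls the metric as metric(example, prediction, trace=None)
def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# train_data: a list of dspy.Example(question=..., answer=...).with_inputs("question")
optimizer = BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=train_data)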
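
What the optimizer does, conceptually: run the program over the training set and keep the examples whose output passes the metric as few-shot demonstrations. A minimal pure-Python sketch of that idea follows. It is not DSPy's implementation; `Example`, `bootstrap_demos`, and `toy_program` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str


def exact_match(example, predicted):
    # Case- and whitespace-insensitive comparison against the gold answer.
    return example.answer.strip().lower() == predicted.strip().lower()


def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Keep (question, prediction) pairs that pass the metric, up to max_demos."""
    demos = []
    for ex in trainset:
        predicted = program(ex.question)
        if metric(ex, predicted):
            demos.append((ex.question, predicted))
        if len(demos) >= max_demos:
            break
    return demos


def toy_program(question):
    # Stand-in for an LM call: one right answer, one wrong answer.
    lookup = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "5",
    }
    return lookup.get(question, "unknown")


trainset = [
    Example("What is the capital of France?", "paris"),
    Example("What is 2 + 2?", "4"),
]
demos = bootstrap_demos(toy_program, trainset, exact_match)
```

Only the first example survives, because the toy program answers the second one incorrectly. The surviving pairs are what would be prepended to the prompt as demonstrations.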

RAG Architecture (Production)

Query → Rewrite → Hybrid Retrieval → Rerank → Generate → Cite
         │              │                │
         v              v                v
    Query expansion  Dense + BM25   Cross-encoder
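
The hybrid retrieval step has to merge a dense ranking and a BM25 ranking into one list. The source does not say which fusion method is intended; reciprocal rank fusion (RRF) is one common choice, sketched here under that assumption. The document IDs and the constant `k=60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by doc ID for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_ranking = ["d2", "d1", "d3"]  # e.g. from a vector search
bm25_ranking = ["d1", "d4", "d2"]   # e.g. from a keyword index
fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
```

Documents that appear near the top of both lists rise to the top of the fused list; the reranker then only needs to score this shortlist.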

LlamaIndex + LangGraph Pattern:

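from typing import TypedDict

from llama_index.core import VectorStoreIndex
from langgraph.graph import StateGraph, START, END

# Shared state passed between nodes
class State(TypedDict, total=False):
    question: str
    context: str
    sources: list
    answer: str

# Data layer (LlamaIndex); docs is a list of Documents loaded elsewhere
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine()

# Control layer (LangGraph)
# Note: query() retrieves and synthesizes; use index.as_retriever() for raw chunks
def retrieve(state: State) -> dict:
    response = query_engine.query(state["question"])
    return {"context": response.response, "sources": response.source_nodes}

# generate_answer(state) is a placeholder node that returns {"answer": ...}
graph = StateGraph(State)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate_answer)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
app = graph.compile()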
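
The control-layer idea can be shown without either library: each node reads the shared state and returns a partial update, which is merged back in before the next node runs. This is a simplified sketch of the pattern, not LangGraph itself; `run_graph` and the toy nodes are hypothetical.

```python
def retrieve(state):
    # Toy keyword lookup standing in for a real retriever.
    corpus = {"paris": "Paris is the capital of France."}
    question = state["question"].lower()
    hits = [text for key, text in corpus.items() if key in question]
    return {"context": " ".join(hits)}


def generate(state):
    # Stand-in for an LLM call that answers from the retrieved context.
    return {"answer": f"Based on the context: {state['context']}"}


def run_graph(nodes, state):
    """Run nodes in order, merging each partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


final_state = run_graph([retrieve, generate], {"question": "Tell me about Paris"})
```

Keeping nodes as pure functions of state makes each step testable on its own, which is the main reason to separate the control layer from the data layer.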

MCP Integration

Model Context Protocol (MCP) is an open standard for exposing tools and context to LLM clients:

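from mcp.server.fastmcp import FastMCP

# FastMCP is the high-level server class in the official Python SDK
mcp = FastMCP("my-tools")

@mcp.tool()
async def search_docs(query: str) -> str:
    """Search the knowledge base."""
    # vector_store and format_results are placeholders defined elsewhere
    results = await vector_store.search(query)
    return format_results(results)

if __name__ == "__main__":
    mcp.run()  # stdio transport by default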
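
On the wire, MCP is JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below builds such a request by hand to show its shape; in practice an MCP client library does this. `make_tool_call` is a hypothetical helper, and the tool name and query are illustrative.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = make_tool_call(1, "search_docs", {"query": "vector databases"})
wire_message = json.dumps(request)
```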

Embeddings (2026)

Model                   Dimensions  Best For
text-embedding-3-large  3072        General purpose
BGE-M3                  1024        Multilingual RAG
Qwen3-Embedding         Flexible    Custom domains
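
Whichever model produces the vectors, retrieval usually comes down to ranking stored embeddings by cosine similarity to the query embedding. A minimal sketch with toy 3-dimensional vectors (real embeddings have the dimensions listed above); `top_k` and the document IDs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


doc_vecs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.7, 0.7, 0.0],
    "c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.1, 0.0], doc_vecs)
```

A vector database does the same ranking with an approximate index so it does not have to scan every vector.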

Fine-Tuning with LoRA/QLoRA

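from peft import LoraConfig, get_peft_model

# base_model: a Hugging Face causal LM loaded elsewhere
# (for QLoRA, load it in 4-bit, e.g. via bitsandbytes, before wrapping it)
config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # update is scaled by lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # attention projections; names vary by model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, config)
model.print_trainable_parameters()
# With QLoRA, training can fit in ~24GB VRAM (e.g. RTX 4090), depending on model size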
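
Why this is cheap: LoRA freezes a weight matrix of shape d_out x d_in and trains two small matrices of shapes d_out x r and r x d_in instead, so the trainable count per adapted matrix is r * (d_in + d_out). A quick arithmetic sketch with the config values above; the hidden size of 4096 is an assumption for illustration, not taken from the source.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


hidden_size = 4096  # assumed model dimension, for illustration only
rank = 16
lora_alpha = 32

full_params = hidden_size * hidden_size
lora_params = lora_param_count(hidden_size, hidden_size, rank)
trainable_fraction = lora_params / full_params
scaling = lora_alpha / rank  # factor applied to the low-rank update
```

Under these assumptions each adapted projection trains 131,072 parameters instead of 16,777,216, which is under 1% of the original matrix.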

MLOps Pipeline

import mlflow

# MLflow tracking
mlflow.set_experiment("rag-v2")

with mlflow.start_run():
    mlflow.log_params({"chunk_size": 512, "model": "gpt-4"})
    # Example values; in practice these come from an evaluation run
    mlflow.log_metrics({"faithfulness": 0.92, "relevance": 0.88})
    mlflow.log_artifact("prompts/qa.txt")  # the file must exist locally

Evaluation with RAGAS

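from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# dataset: an evaluation dataset of questions, answers, retrieved contexts, and references
# (the exact column names depend on the RAGAS version)
results = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(results)  # e.g. {'faithfulness': 0.92, 'answer_relevancy': 0.88, ...}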
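
To make one of these numbers concrete: context precision rewards retrievers that place relevant chunks near the top. The sketch below computes it from hand-labeled relevance flags, as the mean of precision@k over the positions that hold a relevant chunk. RAGAS itself uses an LLM to judge relevance, so this is an illustration of the arithmetic, not the library's code.

```python
def context_precision(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevance: list of 0/1 flags for the retrieved chunks, in ranked order.
    """
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k at this relevant position
    return weighted_sum / hits if hits else 0.0


relevant_first = context_precision([1, 0, 1])  # relevant chunks at ranks 1 and 3
relevant_last = context_precision([0, 0, 1])   # only relevant chunk at rank 3
```

The same single relevant chunk scores much lower when it is ranked last, which is the signal a reranker is meant to improve.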

Vector Database Selection

DB        Best For                Pricing
Qdrant    Self-hosted, filtering  1GB free forever
Pinecone  Managed, zero-ops       Free tier available
Weaviate  Knowledge graphs        14-day trial
Milvus    Billion-scale           Self-hosted

Agents

  • ai-engineer – LLM integration, RAG, MCP, production AI
  • mlops-engineer – Model deployment, monitoring, pipelines
  • data-scientist – Analysis, modeling, experimentation
  • ml-researcher – Cutting-edge architectures, paper implementation
  • cv-engineer – Computer vision, VLMs, image processing
