unofficial-cohere-best-practices

Install command:
npx skills add https://github.com/rshvr/unofficial-cohere-best-practices --skill unofficial-cohere-best-practices


Unofficial Cohere Best Practices

This skill provides patterns and code for building with Cohere’s AI models in Python, TypeScript, Java, and Go, plus LangChain/LangGraph integrations.

Prerequisites

Cohere API Key Required – Get your key at https://dashboard.cohere.com

Add to ~/.claude/settings.json:

{
  "env": {
    "CO_API_KEY": "your-api-key",
    "COHERE_API_KEY": "your-api-key"
  }
}

Restart Claude Code after adding your API key.

Quick Reference

Current Models (2025)

| Category | Model | Context | Notes |
| --- | --- | --- | --- |
| Chat | command-a-03-2025 | 256K | Best overall, 111B params |
| Chat | command-r7b-12-2024 | 128K | Small/fast, 7B params |
| Chat | command-r-plus-08-2024 | 128K | Strong reasoning |
| Chat | command-r-08-2024 | 128K | Balanced |
| Reasoning | command-a-reasoning-08-2025 | 256K | Extended thinking with budget_tokens |
| Vision | command-a-vision-07-2025 | 128K | Images + text (charts, OCR, docs) |
| Translate | command-a-translate-08-2025 | 8K | 23 languages |
| Embed | embed-v4.0 | 128K | Multimodal, Matryoshka dims (256-1536) |
| Embed | embed-english-v3.0 | 512 | English, 1024 dims |
| Embed | embed-multilingual-v3.0 | 512 | 100+ languages, 1024 dims |
| Rerank | rerank-v4.0-pro | 32K | Best quality |
| Rerank | rerank-v4.0-fast | 32K | Speed-optimized |
| Rerank | rerank-v3.5 | 4K | Balanced |
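The vision model expects images as base64 data URLs. A small helper can build one; the commented message shape below is an assumption based on Cohere's v2 multimodal content-block format, so verify it against the current API docs:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as the base64 data URL the vision model expects."""
    return f"data:{mime};base64,{base64.b64encode(image_bytes).decode()}"

# Assumed v2 multimodal message shape (verify against the Cohere docs):
# response = co.chat(
#     model="command-a-vision-07-2025",
#     messages=[{
#         "role": "user",
#         "content": [
#             {"type": "text", "text": "Summarize this chart."},
#             {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
#         ],
#     }],
# )
```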

Reasoning Model

The command-a-reasoning-08-2025 model supports controllable “thinking” for complex tasks:

import cohere

co = cohere.ClientV2()  # reads CO_API_KEY from the environment
response = co.chat(
    model="command-a-reasoning-08-2025",
    messages=[{"role": "user", "content": "Complex problem..."}],
    thinking={"type": "enabled", "budget_tokens": 8000}
)
  • Set budget_tokens to control depth (min 1024, higher = more thorough)
  • Use "type": "disabled" for simple queries (lower latency)

Installation

Python:

pip install cohere                    # Native SDK (v5+)
pip install langchain-cohere          # LangChain integration (v0.5+)
pip install langgraph                 # For agents

TypeScript/JavaScript:

npm install cohere-ai

Java (Maven):

<dependency>
  <groupId>com.cohere</groupId>
  <artifactId>cohere-java</artifactId>
  <version>1.x.x</version>
</dependency>

Go:

go get github.com/cohere-ai/cohere-go/v2

Environment Setup

export CO_API_KEY="your-api-key"      # SDK auto-reads this
export COHERE_API_KEY="your-api-key"  # LangChain uses this

Integration Selection Guide

| Use Case | Recommended Approach |
| --- | --- |
| Simple chat/completion | Native SDK ClientV2 |
| Embeddings for a vector DB | Native SDK or CohereEmbeddings |
| RAG pipeline | LangChain CohereRagRetriever |
| Tool use / function calling | Native SDK (more control) or LangChain |
| Multi-step agents | LangGraph with create_cohere_react_agent |
| Simple tool workflows | Generic create_react_agent (provider-agnostic) |
| Reranking search results | Native SDK or CohereRerank |
| Structured JSON output | Native SDK with response_format |
| Image understanding | Native SDK with the vision model |

Agent Choice: Cohere-Specific vs Generic

  • create_cohere_react_agent: Use for complex multi-step tasks, Cohere-specific features (citations, connectors), better token efficiency on long reasoning chains
  • Generic create_react_agent: Fine for simple tool-calling, consistent cross-provider behavior, when you’re already getting good results

LangChain Model Compatibility

Note: Command A Reasoning and Command A Vision are not yet supported in LangChain. Use the native SDK for these models.

Detailed Documentation

Based on your use case, read the appropriate reference file:

By Language

By Feature

Python Frameworks

Examples

Common Patterns

Native SDK: Basic Chat

import cohere
co = cohere.ClientV2()  # Reads CO_API_KEY from env

response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.message.content[0].text)

Native SDK: Streaming

for event in co.chat_stream(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Write a poem"}]
):
    if event.type == "content-delta":
        print(event.delta.message.content.text, end="")

Native SDK: Structured JSON Output

response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Extract: John is 30 years old"}],
    response_format={
        "type": "json_object",
        "json_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"}
            },
            "required": ["name", "age"]
        }
    }
)

LangChain: ChatCohere

from langchain_cohere import ChatCohere
from langchain_core.messages import HumanMessage

llm = ChatCohere(model="command-a-03-2025")
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)

LangChain: Cohere ReAct Agent

from langchain_cohere import ChatCohere, create_cohere_react_agent
from langchain.agents import AgentExecutor
from langchain_core.prompts import ChatPromptTemplate

llm = ChatCohere()
tools = [your_tools_here]
prompt = ChatPromptTemplate.from_template("{input}")

agent = create_cohere_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
result = executor.invoke({"input": "Your query"})

API v1 vs v2 Notes

Cohere has migrated to API v2. Key differences:

  • Use ClientV2() instead of Client()
  • Tool calls now include IDs (tool_call_id)
  • RAG documents use documents parameter with structured format
  • Streaming events have new types (content-delta, tool-plan-delta, etc.)
  • Tool results must use document format:
messages.append({
    "role": "tool",
    "tool_call_id": tc.id,
    "content": [{"type": "document", "document": {"data": json.dumps(result)}}]
})
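That wrapper shape is easy to get wrong by hand. A small helper (hypothetical; it just reproduces the document format shown above) keeps it in one place:

```python
import json

def tool_result_message(tool_call_id: str, result) -> dict:
    """Build a v2 tool-result message, wrapping the result in the document format."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": [{"type": "document", "document": {"data": json.dumps(result)}}],
    }

# Usage after a tool call comes back:
# messages.append(tool_result_message(tc.id, {"temperature_c": 21}))
```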

Critical Gotchas

1. Embedding Input Types (MUST GET RIGHT)

Cohere uses asymmetric embeddings; using the wrong input_type leads to poor search results:

# For STORING documents in vector DB
embedding = co.embed(texts=docs, input_type="search_document", ...)

# For QUERYING against stored documents
embedding = co.embed(texts=[query], input_type="search_query", ...)

2. Batch Limits

  • Embeddings: 96 items per request (API hard limit)
  • Rerank: 1,000 documents recommended (10K max but slower)
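To stay under the 96-item embed limit, split large corpora into batches client-side. A sketch, assuming `co` is a cohere.ClientV2 instance passed in by the caller (the model choice is illustrative):

```python
def batches(items: list, size: int = 96) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_all(co, texts: list[str], batch_size: int = 96) -> list[list[float]]:
    """Embed any number of texts, respecting the 96-item per-request limit."""
    embeddings: list[list[float]] = []
    for batch in batches(texts, batch_size):
        resp = co.embed(
            model="embed-v4.0",
            texts=batch,
            input_type="search_document",
            embedding_types=["float"],
        )
        embeddings.extend(resp.embeddings.float_)
    return embeddings
```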

3. Tool Calling Differences

Cohere’s tool calling format differs from Claude/OpenAI. LangChain/LangGraph handle this, but:

  • Lower temperature (0.3) recommended for predictable tool calls
  • System prompts may need Cohere-specific adjustments
  • Watch for edge cases in complex multi-tool scenarios

4. Two-Stage Retrieval Pattern (Recommended)

# Stage 1: fast retrieval with embeddings (cast a wide net)
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: precise reranking (narrow down)
reranked = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=10,
)

5. Text Truncation

Long texts get auto-truncated. Consider chunking at ~8000 chars for embeddings.
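A simple character-based chunker along these lines (the 8,000-char threshold and small overlap are the heuristics from above, not API limits):

```python
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split long text into overlapping chunks below the truncation threshold."""
    if len(text) <= max_chars:
        return [text]
    step = max_chars - overlap  # overlap preserves context across boundaries
    return [text[i:i + max_chars] for i in range(0, len(text), step)]
```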

6. Embed v4 Dimensions

Use output_dimension parameter with Embed v4 for flexible sizing:

co.embed(model="embed-v4.0", texts=texts, input_type="search_document", output_dimension=512)

Troubleshooting

| Issue | Solution |
| --- | --- |
| ModuleNotFoundError: cohere | pip install cohere |
| API key not found | Set the CO_API_KEY env var or pass the api_key param |
| LangChain import errors | Use from langchain_cohere import ... (not langchain_community) |
| Rate limits | Check the dashboard for limits; implement exponential backoff |
| Tool use errors | Ensure tool results are JSON strings or a list of document objects |
| Poor search results | Verify input_type matches the use case (search_document vs search_query) |
| Agent not using tools correctly | Lower temperature to 0.3; check the system prompt |
| Embed failed | Check API key, rate limits, batch size <= 96 |
| Reasoning model slow | Reduce budget_tokens or use "type": "disabled" |
| Vision model errors | Ensure the image is in base64 data URL format |
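For the rate-limit row above, the usual fix is a retry wrapper with exponential backoff. A sketch; it treats every exception as retryable for brevity, so in real code narrow the except clause to the SDK's rate-limit (429) error:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` with exponential backoff and jitter.

    This sketch retries on any exception; narrow the `except` to the
    SDK's rate-limit error in production.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Usage:
# response = with_backoff(lambda: co.chat(model="command-a-03-2025", messages=msgs))
```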

Quick Test Snippets

# Test embedding
import cohere
co = cohere.ClientV2()
resp = co.embed(model="embed-v4.0", texts=["test"], input_type="search_query", embedding_types=["float"])
print(f"Dim: {len(resp.embeddings.float_[0])}")  # 1536

# Test LLM via LangChain
from langchain_cohere import ChatCohere
llm = ChatCohere(model="command-a-03-2025", temperature=0.3)
print(llm.invoke("Say hello").content)