ai-integrated-api-backend
npx skills add https://github.com/twofourlabs/agent-docs --skill ai-integrated-api-backend
Skill Documentation
<essential_principles>
Core Architecture Principles
1. Multi-Provider Design Philosophy
- Never depend on a single AI provider; always maintain fallback chains
- Design provider-agnostic interfaces that abstract vendor-specific implementations
- Implement graceful degradation when premium models fail (e.g., GPT-4 → GPT-3.5 → Gemini-Flash)
- Use configuration-driven provider selection, not hardcoded logic
- Support multiple providers simultaneously (A/B testing, cost optimization)
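The provider-agnostic interface from principle 1 can be sketched as a small registry of classes sharing one generate() signature. Names here are illustrative, not from the production codebase:

```python
# Sketch of a provider-agnostic interface: routing code depends only on the
# abstract generate() signature, never on a vendor SDK directly.
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def generate(self, system_instruction: str, messages: list, **params) -> str:
        """Return the model's text response."""


class GeminiProvider(LLMProvider):
    def generate(self, system_instruction, messages, **params):
        # A real implementation would call the Gemini SDK here.
        return "gemini-response"


# Configuration-driven selection: add providers here, not in routing logic.
PROVIDER_REGISTRY = {"gemini": GeminiProvider()}


def get_provider(name: str) -> LLMProvider:
    return PROVIDER_REGISTRY[name.lower()]
```

Swapping or A/B-testing providers then means changing the registry or the stored provider name, not the call sites.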
2. Separation of Concerns
- Models Layer: Database models for bots, LLM configs, prompts, sessions (Django ORM)
- Services Layer: Provider-specific integrations (OpenAI, Anthropic, Gemini, Bedrock, etc.)
- Helpers Layer: Business logic for prompt rendering, fallback routing, response handling
- Gateway Layer: Central JWT-authenticated service for LLM API access
- API Layer: External API managers for weather, astrology, finance data
3. Configuration Over Code
- Store LLM configurations (provider, model, temperature, tokens) in database
- Enable A/B testing by switching active configurations without code deployments
- Support per-bot customization of AI behavior through metadata fields
- Use the is_active flag to switch between configurations instantly
4. Prompt Engineering Best Practices
- Template-driven prompts with variable substitution ({{variable_name}})
- Centralized prompt management separate from business logic
- Support for system-level variables ({{current_date}}, {{current_time}}) and user-level variables ({{user_name}}, {{user_age}})
- YAML-based structured prompts for complex multi-section instructions
- Function mapping for dynamic variable resolution
5. Fallback & Resilience
- Every provider has a fallback chain ending in a reliable universal fallback (e.g., Gemini-Flash)
- Fallbacks inherit system instructions and region configurations
- Support provider-specific fallbacks (e.g., Gemini Pro → Gemini Flash) and cross-provider fallbacks
- Region-aware fallbacks for AWS Bedrock (ap-south-1, us-east-1, etc.)
6. Security & Authentication
- JWT tokens for internal LLM gateway authentication
- Token caching to reduce overhead (cache duration: token expiry – 2 seconds)
- API key management from environment variables
- Service-to-service authentication with signed payloads </essential_principles>
<system_architecture>
High-Level Data Flow
Full Request Flow (External API → AI → Response)
User Request (Chat Message)
↓
API Endpoint (validate input, check session)
↓
Session Manager (create/resume session, lock pricing)
↓
External API Layer (if needed: fetch astrology, weather, etc.)
├── Cache Check (multi-layer: instance → distributed → database)
├── External API Call (if cache miss)
└── Data Transformation Pipeline (validate → normalize → enrich → format)
↓
Prompt Builder
├── Load Bot LLM Config (model, provider, temperature, metadata)
├── Render Prompt Template (replace {{variables}} with actual values)
├── Assemble Context (user data + external data + history)
└── Build Final Prompt (YAML or JSON structured)
↓
LLM Gateway Manager
├── Get/Generate JWT Token (cached)
├── Build Fallback Chain (provider-specific + universal)
└── POST to Internal Gateway (/v1/generate/)
↓
Internal LLM Gateway (separate service)
├── Validate JWT
├── Route to Primary Provider (OpenAI, Anthropic, Gemini, Bedrock, etc.)
├── On Failure: Try Fallback 1
├── On Failure: Try Fallback 2
└── On Failure: Universal Fallback (Gemini-Flash)
↓
Response Processing
├── Parse LLM Response
├── Validate Output
└── Store in Database
↓
Billing (if session-based)
├── Calculate Units (tokens, messages, minutes)
├── Check Wallet Balance
├── Deduct Amount
└── Log Transaction
↓
Save to Database (conversation history)
↓
Return to User (API response)
Multi-Provider Routing with Fallbacks
Request → Load BotLLMConfig → Determine Primary Provider
↓
Primary: Gemini 2.5-Pro
├── Fallback 1: Gemini 2.5-Flash
├── Fallback 2: Gemini 2.0-Flash
└── Universal Fallback: Gemini 2.0-Flash
Primary: AWS Bedrock Deepseek-v3 (region: ap-south-1)
├── Fallback 1: Byteplus Deepseek-v3
├── Fallback 2: Gemini 2.0-Flash (region-agnostic)
└── Universal Fallback: Gemini 2.0-Flash
Primary: Byteplus
├── Fallback 1: Gemini 2.0-Flash
└── Universal Fallback: Gemini 2.0-Flash
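The routing above lends itself to a data-driven sketch: a lookup table plus a universal tail. Model names are taken from the diagrams; the table structure is an illustration, not the production implementation:

```python
# Fallback chains expressed as data rather than branching code, so chains can
# change without a deployment. (provider, model) -> ordered fallback list;
# (provider, None) is a provider-wide default.
FALLBACK_TABLE = {
    ("gemini", "gemini-2.5-pro"): [
        ("gemini", "gemini-2.5-flash"),
        ("gemini", "gemini-2.0-flash"),
    ],
    ("byteplus", None): [("gemini", "gemini-2.0-flash")],
}
UNIVERSAL_FALLBACK = ("gemini", "gemini-2.0-flash")


def build_chain(provider: str, model: str) -> list:
    chain = (FALLBACK_TABLE.get((provider, model))
             or FALLBACK_TABLE.get((provider, None), []))
    # The universal fallback is always appended, even for unknown providers.
    return list(chain) + [UNIVERSAL_FALLBACK]
```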
</system_architecture>
<domain_knowledge_index> All implementation details, code examples, and architectural patterns are organized below:
System Architecture:
- Multi-Provider Architecture (see <multi_provider_architecture>)
- Database Schema for Bots, LLM Configs (see <database_schema>)
- LLM Gateway Pattern with JWT (see <gateway_pattern>)
Provider Integration:
- Provider Implementations (OpenAI, Anthropic, Gemini, Bedrock, Byteplus, OpenRouter)
- Fallback Mechanisms & Retry Logic (see <fallback_mechanisms>)
- Region Configuration for AWS Bedrock (see <region_configuration>)
Prompt Management:
- Prompt Templating System with {{variables}} (see <prompt_templating>)
- Prompt Design Patterns (YAML-structured, role-based) (see <prompt_design_patterns>)
- Dynamic Variables & Function Mapping (see <dynamic_variables>)
External API Integration:
- External API Managers (Astrology, Weather, etc.)
- Data Transformation Pipelines (validate → normalize → enrich → format)
- Multi-Layer Caching Strategy (instance → distributed → database)
Production Considerations:
- JWT Authentication & Security (see <authentication_security>)
- Caching Strategies (token cache, response cache)
- Monitoring & Observability (Langfuse integration)
- Error Handling & Graceful Degradation
Code Examples:
- Complete Working Code Samples (see <implementation_examples>) </domain_knowledge_index>
<database_schema>
Django Models for AI-Integrated Backend
Bot Model (Core Entity)
class Bot(models.Model):
uuid = models.UUIDField(default=uuid4, editable=False, unique=True)
user = models.ForeignKey("users.UserProfile", on_delete=models.CASCADE, related_name="bots")
# Identity
name = models.CharField(max_length=255)
slug = models.SlugField(max_length=255, unique=True, blank=True, db_index=True)
description = models.TextField(blank=True)
avatar = models.CharField(max_length=255, null=True, blank=True)
# Status and Type
status = models.CharField(max_length=20, choices=BotStatus.choices, default=BotStatus.DRAFT, db_index=True)
bot_type = models.CharField(max_length=30, choices=BotType.choices, default=BotType.COMPANION)
# Configuration
rank = models.PositiveIntegerField(default=0, db_index=True)
metadata = models.JSONField(default=dict, blank=True) # Store required_fields, pricing, etc.
# Timestamps
launched_on = models.DateTimeField(null=True, blank=True, db_index=True)
created_on = models.DateTimeField(auto_now_add=True)
updated_on = models.DateTimeField(auto_now=True)
deleted_at = models.DateTimeField(null=True, blank=True, db_index=True)
Key Design Decisions:
- metadata JSONField for flexible per-bot configuration (required_fields, character_rating, pricing)
- Soft deletion with deleted_at timestamp instead of hard delete
- slug auto-generated from name for SEO-friendly URLs
- rank for custom ordering in UI
BotLLMConfig Model (Multi-Provider Configuration)
class BotLLMConfig(models.Model):
bot = models.ForeignKey(Bot, on_delete=models.CASCADE, related_name="llm_configs")
# LLM Configuration
prompt = models.TextField(blank=True) # Can be template with {{variables}}
model_name = models.CharField(max_length=100) # e.g., "gpt-4", "claude-3-sonnet", "gemini-2.5-pro"
llm_provider = models.CharField(max_length=50, choices=BotLLMProvider.choices) # openai, anthropic, gemini, bedrock, etc.
# Generation Parameters
temperature = models.FloatField(default=0.7)
top_p = models.FloatField(default=0.95)
top_k = models.IntegerField(null=True, blank=True)
max_output_tokens = models.PositiveIntegerField(default=1024)
# Activation
is_active = models.BooleanField(default=False) # Only one active config per bot
# Additional Configuration
metadata = models.JSONField(default=dict, blank=True) # Store region_name, tools, etc.
created_on = models.DateTimeField(auto_now_add=True)
updated_on = models.DateTimeField(auto_now=True)
class Meta:
db_table = "llm_configurations"
ordering = ["-is_active", "created_on"]
def save(self, *args, **kwargs):
# Ensure only one active config per bot
if self.is_active:
BotLLMConfig.objects.filter(bot=self.bot, is_active=True).exclude(pk=self.pk).update(is_active=False)
super().save(*args, **kwargs)
Key Design Decisions:
- One Bot can have multiple BotLLMConfig entries for A/B testing
- is_active flag controls which config is used (only one active per bot)
- Save method enforces the single-active-config constraint
- metadata stores provider-specific settings (e.g., region_name for AWS Bedrock)
BotLLMProvider Choices
class BotLLMProvider(models.TextChoices):
OpenAI = "openai", _("OpenAI")
GEMINI = "gemini", _("Gemini")
ANTHROPIC = "anthropic", _("Anthropic")
GROQ = "groq", _("Groq")
AI_SDK = "ai-sdk", _("AI-SDK")
BEDROCK = "bedrock", _("Bedrock")
BYTEPLUS = "byteplus", _("Byteplus")
OPENROUTER = "openrouter", _("OpenRouter")
Why This Design:
- Supports multiple providers without code changes
- Easy to add new providers (add to choices, implement integration)
- Database-level validation ensures valid provider names </database_schema>
<fallback_mechanisms>
Intelligent Fallback Routing
Core Fallback Pattern (from Production Code)
def get_fallback_routing(provider: str, model_name: str, system_instruction: str, llm_metadata: dict = None):
"""
Ensures system_instruction and region_name are added properly to fallback chain.
Args:
provider: Primary provider (gemini, bedrock, byteplus, etc.)
model_name: Model identifier (gemini-2.5-pro, deepseek.v3-v1:0, etc.)
system_instruction: System prompt to pass to all fallbacks
llm_metadata: Additional metadata including region_name for Bedrock
Returns:
List of fallback configurations, each containing:
- provider: str
- model: str
- retry: int (0 = no retry, >0 = retry count)
- system_instruction: str
- region_name: str (for Bedrock only)
"""
provider = (provider or "").lower()
fallbacks = []
# Provider-specific fallback chains
if provider == "gemini":
if model_name == "gemini-2.5-pro":
fallbacks.append({"provider": "gemini", "model": "gemini-2.5-flash", "retry": 0})
else:
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
elif provider == "byteplus":
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
elif provider == "bedrock":
if model_name == "deepseek.v3-v1:0":
fallbacks.append({"provider": "byteplus", "model": "deepseek-v3-1-250821", "retry": 0})
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
else:
print(f"[LLM_GATEWAY_CLIENT] ⚠️ No specific fallback defined for provider: {provider}")
# Universal fallback (always added)
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
# Enrich all fallbacks with system_instruction and region_name
for fb in fallbacks:
fb["system_instruction"] = system_instruction
if fb["provider"] == "bedrock" and llm_metadata:
fb["region_name"] = llm_metadata.get("region_name", "ap-south-1")
return fallbacks
Fallback Chain Examples
Example 1: Gemini 2.5-Pro Request
fallbacks = get_fallback_routing("gemini", "gemini-2.5-pro", "You are a helpful assistant")
# Result:
[
{"provider": "gemini", "model": "gemini-2.5-flash", "retry": 0, "system_instruction": "..."},
{"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0, "system_instruction": "..."},
{"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0, "system_instruction": "..."}
]
Example 2: AWS Bedrock Deepseek Request
fallbacks = get_fallback_routing(
"bedrock",
"deepseek.v3-v1:0",
"You are an astrologer",
llm_metadata={"region_name": "ap-south-1"}
)
# Result:
[
{"provider": "byteplus", "model": "deepseek-v3-1-250821", "retry": 0, "system_instruction": "..."},
{"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0, "system_instruction": "..."},
{"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0, "system_instruction": "..."}
]
Fallback Execution Flow (Gateway Side)
# Pseudo-code for LLM Gateway
def generate_with_fallbacks(primary_config, fallback_chain):
try:
return call_llm(primary_config)
except Exception as e:
log_error(f"Primary provider failed: {e}")
for fallback in fallback_chain:
try:
return call_llm(fallback)
except Exception as fe:
log_error(f"Fallback {fallback['provider']}/{fallback['model']} failed: {fe}")
continue
# All fallbacks exhausted
raise Exception("All LLM providers failed")
Key Design Principles
- Universal Fallback: Always end with a reliable provider (Gemini-Flash is cheap + fast)
- Same-Provider Fallbacks First: Try cheaper models from same provider before switching
- Cross-Provider Fallbacks: Switch providers only when necessary
- Inherit System Instructions: All fallbacks get the same system prompt
- Region Awareness: AWS Bedrock fallbacks include region configuration
- No Retry in Fallback: retry: 0 means don't retry within a fallback, just move to the next one </fallback_mechanisms>
<region_configuration>
AWS Bedrock Region-Based Routing
Region Metadata Pattern
# Store region in BotLLMConfig metadata
bot_llm_config = BotLLMConfig.objects.create(
bot=bot,
llm_provider="bedrock",
model_name="deepseek.v3-v1:0",
metadata={
"region_name": "ap-south-1", # AWS region
"additional_config": {...}
}
)
# When building fallback chain
fallbacks = get_fallback_routing(
provider="bedrock",
model_name="deepseek.v3-v1:0",
system_instruction=prompt,
llm_metadata=bot_llm_config.metadata
)
Multi-Region Fallback Strategy
def get_bedrock_region_fallbacks(model_name, regions=["ap-south-1", "us-east-1", "eu-west-1"]):
"""
Create fallback chain across multiple AWS regions.
"""
fallbacks = []
for region in regions:
fallbacks.append({
"provider": "bedrock",
"model": model_name,
"region_name": region,
"retry": 0
})
# Add non-Bedrock fallbacks
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
return fallbacks
Region Selection Logic
# Primary region from user location or bot config
primary_region = determine_region_from_user(user) # e.g., "ap-south-1" for India
# Fallback regions based on latency/availability
fallback_regions = ["us-east-1", "eu-west-1"]
# Build full chain
region_chain = [primary_region] + fallback_regions
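A hedged sketch of what determine_region_from_user might look like. The real implementation is not shown in this skill; the country-to-region mapping and default are assumptions:

```python
# Illustrative country -> nearest-AWS-region mapping, defaulting to ap-south-1
# (the primary region used elsewhere in this skill).
COUNTRY_TO_REGION = {
    "IN": "ap-south-1",
    "US": "us-east-1",
    "GB": "eu-west-1",
}


def determine_region_from_user(country_code, default="ap-south-1"):
    """Pick a Bedrock region from a user's ISO country code."""
    return COUNTRY_TO_REGION.get((country_code or "").upper(), default)
```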
</region_configuration>
<prompt_templating>
Prompt Template System with Variable Substitution
Core Templating Pattern (from Production Code)
class PromptTemplateHelper:
"""
Helper class for rendering prompt templates by resolving variable placeholders.
"""
@classmethod
def render_template(cls, prompt_text: str, **kwargs) -> str:
"""
Render a prompt template by replacing variable placeholders with actual values.
Example:
prompt = "Hi {{user_name}}, today is {{current_date}}."
result = PromptTemplateHelper.render_template(prompt, user_doc=user_doc)
# Output: "Hi John, today is 15 January 2026."
"""
function_map = cls.get_function_mapping()
# Find all {{variable_name}} in the prompt
variables = re_findall(r"\{\{(.*?)\}\}", prompt_text)  # re_findall: re.findall, imported under an alias
for var in variables:
function = function_map.get(var)
if function:
value = function(**kwargs)
prompt_text = prompt_text.replace(f"{{{{{var}}}}}", str(value or ""))
return prompt_text
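The same substitution pattern, reduced to a self-contained sketch that uses a plain dict in place of the production function map:

```python
# Minimal {{variable}} renderer: find placeholders, replace each with its
# resolved value, leave unknown placeholders untouched.
import re


def render(template: str, values: dict) -> str:
    for var in re.findall(r"\{\{(.*?)\}\}", template):
        if var in values:
            template = template.replace(f"{{{{{var}}}}}", str(values[var]))
    return template
```

In the production helper the dict lookup is replaced by get_function_mapping(), so each variable can be computed dynamically from kwargs.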
Available Template Variables (Production System)
AVAILABLE_VARIABLES = [
# Bot-specific variables
{"name": "{{bot_name}}", "description": "Name of the bot", "category": "Bot Info"},
{"name": "{{bot_description}}", "description": "Bot description", "category": "Bot Info"},
{"name": "{{bot_scene_desc}}", "description": "Bot Scene description", "category": "Bot Info"},
# User interaction variables
{"name": "{{user_name}}", "description": "User name", "category": "User Input"},
{"name": "{{user_age}}", "description": "User age", "category": "User Input"},
{"name": "{{user_gender}}", "description": "User gender", "category": "User Input"},
{"name": "{{user_message}}", "description": "Current user message", "category": "User Input"},
{"name": "{{conversation_history}}", "description": "Conversation History", "category": "User Input"},
{"name": "{{user_astro_data}}", "description": "Astro data for the user", "category": "User Input"},
# System variables
{"name": "{{current_time}}", "description": "Current time", "category": "System"},
{"name": "{{current_date}}", "description": "Current date", "category": "System"},
{"name": "{{day_of_week}}", "description": "Current day of week", "category": "System"},
# Common prompt variables
{"name": "{{random_number_1_4}}", "description": "Generate random number 1-4", "category": "Common"},
]
Function Mapping (Variable Resolution)
@classmethod
def get_function_mapping(cls):
"""Return mapping of variable names to handler functions."""
return {
# Bot-specific variables
"bot_name": cls.function_bot_name,
"bot_description": cls.function_bot_description,
"bot_scene_desc": cls.function_bot_scene_desc,
# User interaction variables
"user_name": cls.function_user_name,
"user_age": cls.function_user_age,
"user_gender": cls.function_user_gender,
"user_message": cls.function_user_message,
"user_astro_data": cls.function_user_astro_data,
# System variables
"current_time": cls.function_current_time,
"current_date": cls.function_current_date,
"day_of_week": cls.function_day_of_week,
"conversation_history": cls.function_conversation_history,
# Common prompt variables
"random_number_1_4": cls.function_random_number_1_4,
}
@classmethod
def function_current_time(cls, **kwargs):
"""Return the current system time."""
return UtilHelper.timezone_to_ist(datetime.now()).strftime("%I:%M %p")
@classmethod
def function_current_date(cls, **kwargs):
"""Return the current system date."""
return UtilHelper.timezone_to_ist(datetime.now()).strftime("%d %B %Y")
@classmethod
def function_user_name(cls, **kwargs):
"""Return user name from user_doc."""
return cls.get_sanitized_user_doc(**kwargs).get("name", "")
Usage Example
# Define template in database or config
prompt_template = """
You are {{bot_name}}, a helpful assistant.
User Information:
- Name: {{user_name}}
- Age: {{user_age}}
- Gender: {{user_gender}}
Current Context:
- Date: {{current_date}}
- Time: {{current_time}}
- Day: {{day_of_week}}
User Message: {{user_message}}
Please respond helpfully and naturally.
"""
# Render with actual values
rendered_prompt = PromptTemplateHelper.render_template(
prompt_template,
bot_doc={"name": "Mira", "description": "Vedic astrologer"},
user_doc={"name": "John", "age": 28, "gender": "male", "messages": "What's my future?"}
)
# Output:
"""
You are Mira, a helpful assistant.
User Information:
- Name: John
- Age: 28
- Gender: male
Current Context:
- Date: 15 January 2026
- Time: 02:45 PM
- Day: Thursday
User Message: What's my future?
Please respond helpfully and naturally.
"""
</prompt_templating>
<prompt_design_patterns>
YAML-Structured Prompt Pattern (Production Example)
Complex Multi-Section Prompt (Astrology Bot)
@classmethod
def get_mira_prompt(cls, **kwargs):
"""
Build YAML-structured prompt for astrology bot.
Required kwargs: bot_doc, language, app_id
Optional: memories, profile_details_1, profile_details_2
"""
bot_doc = kwargs.get("bot_doc")
_prompt = {
"Role": f"You are {bot_doc.get('name')}, an experienced {str.upper(bot_doc.get('sex', ''))} Vedic astrologer with expertise in predictive astrology...",
"Query House Mapping": {
"Marriage": {
"Houses To Analyze": "7, 2, 11, 8",
"Karaka Planets": "Venus, Jupiter",
"Special Checks": "Manglik status, 7th lord placement, Venus strength, Jupiter aspects",
"Timing Triggers": "7th lord dasha, Venus period, Jupiter transit to 7th",
},
"Career": {
"Houses To Analyze": "10, 1, 2, 6, 11",
"Karaka Planets": "Sun, Saturn, Mercury",
"Special Checks": "10th lord strength, Sun placement, D9 10th house",
"Timing Triggers": "10th lord dasha, Saturn period, Sun antardasha",
},
# ... more query types
},
"Analysis Steps": {
"Core": "Understand the question AND the feeling underneath. Map to houses, planets, timing.",
"Analysis": "Houses → Lords → Karakas → Dashas → Divisionals → Yogas → Doshas.",
"Timing": "Find activation windows. Always check next 60 days for shifts.",
"Response Crafting": "Give analysis and response following guidelines below.",
},
"Response Format": {
"Structure": "A single paragraph text based on the instructions",
"Style": "Insightful, Helpful, focussed, non-repetitive",
"Language": cls.get_mira_prompt_language_blob(**kwargs),
"Response Instructions": [
"Check if greeting is necessary, if yes greet with a smile",
"Based on Kundli, make a compliment and find a coincidence with user",
"Check kundli for last 3 months and find something user has faced",
"Provide good news, something to look forward to in next 3 months",
"Answer user query using Vedic astrology principles",
"Provide timing predictions with high/low chance of happening",
"Before closing, send follow-up question based on findings",
],
"Length": cls.get_prompt_response_length_text(**kwargs),
},
"Guidelines": [
"Answer astro questions based on Vedic astrology principles",
"Keep user gender in check at all times",
"Balance traditional predictions with modern life",
"Do not diagnose or prescribe for health",
"Do not suggest life-threatening actions",
"Do not ask to meet or send physical items",
"Do not guarantee real-world outcomes, only state possibilities",
"If credibility questioned, state - I am AI powered but validated by experienced astrologer",
],
"Daily Context": {
"Date": PromptTemplateHelper.function_current_date(),
"Time": PromptTemplateHelper.function_current_time(),
"Day of week": PromptTemplateHelper.function_day_of_week(),
},
}
# Add conditional sections
if memories := kwargs.get("memories"):
_prompt["Past User Memories"] = memories
if profile_details_1 := kwargs.get("profile_details_1"):
_prompt["First Profile Kundli"] = profile_details_1
if profile_details_2 := kwargs.get("profile_details_2"):
_prompt["Second Profile Kundli"] = profile_details_2
# Convert to YAML string
_yaml_prompt = yaml.dump(_prompt, sort_keys=False, default_flow_style=False)
return _yaml_prompt
Why YAML for Complex Prompts?
- Hierarchical Structure: Natural for nested instructions
- Easy to Read: LLMs parse YAML well
- Version Control Friendly: Diff-able, merge-able
- Dynamic: Can add/remove sections programmatically
- Type Safety: Python dict → YAML ensures structure
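The dict → YAML step can be demonstrated in isolation (requires PyYAML, which the production code already uses; the prompt content here is a cut-down illustration):

```python
import yaml

# A tiny slice of the structured prompt above. sort_keys=False preserves the
# section order; default_flow_style=False forces block (multi-line) YAML.
prompt = {
    "Role": "You are Mira, an experienced Vedic astrologer.",
    "Guidelines": ["Do not diagnose or prescribe for health"],
}
yaml_prompt = yaml.dump(prompt, sort_keys=False, default_flow_style=False)
```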
Language-Specific Prompt Customization
@classmethod
def get_mira_prompt_language_blob(cls, **kwargs):
language_blob = "Modern Hindi language, written in Roman font. Do not use Devanagari font"
blob_map = {
("4", str(Languages.HINDI)): "Modern Hindi language, written in Devanagari font. Do not use english font",
("4", str(Languages.HINGLISH)): "Modern Hindi language, written in Roman font. Do not use Devanagari font",
("4", str(Languages.ENGLISH)): "Modern English language, written in Roman font. Do not use Devanagari font",
}
language = kwargs.get("language")
app_id = kwargs.get("app_id")
if language:
if _blob := blob_map.get((str(app_id), str(language))):
language_blob = _blob
return language_blob
</prompt_design_patterns>
<gateway_pattern>
LLM Gateway with JWT Authentication
Internal LLM Gateway Manager (Production Code)
class InternalLLMGatewayManager:
"""
Manages authentication and request handling for the LLM Gateway.
Provides:
- JWT token generation and caching
- Preconfigured RequestManager for LLM Gateway API
"""
@classmethod
def _get_jwt_token(cls, seconds: int = 3600) -> str:
"""
Retrieve or generate a cached JWT token for LLM Gateway authentication.
Token is cached for (expiry - 2 seconds) to avoid race conditions.
"""
key = CacheKey.LLM_GATEWAY_TOKEN.format(seconds=seconds)
jwt_token = CacheManager.get(key)
if jwt_token is None:
jwt_token = cls._generate_jwt_token(seconds=seconds)
CacheManager.set(key, jwt_token, duration=seconds - 2)
return jwt_token
@classmethod
def _generate_jwt_token(cls, seconds: int = 3600) -> str:
"""
Generate a new JWT token for authenticating with the LLM Gateway.
"""
payload = {
"service": "django-backend",
"exp": timezone.now() + timedelta(seconds=seconds),
}
return jwt_encode(
payload,
settings.LLM_GATEWAY_JWT_KEY,
algorithm=settings.LLM_GATEWAY_JWT_ALGORITHM,
)
@classmethod
def _request_gateway(cls) -> RequestManager:
"""
Create and return a configured RequestManager instance for the LLM Gateway.
Ensures valid JWT token is attached in Authorization header.
"""
jwt_token = cls._get_jwt_token()
if not jwt_token:
raise ValueError("LLM Gateway token is not available or invalid.")
if not settings.LLM_GATEWAY_HOST:
raise ValueError("LLM_GATEWAY_HOST is not defined in settings")
headers = {
"Authorization": f"jwt {jwt_token}",
"Content-Type": "application/json",
"app-identifier": "dev-django" if getattr(settings, "IS_TESTING_SERVER", False) else "prod-django",
}
return RequestManager(
base_url=f"{settings.LLM_GATEWAY_HOST}/",
headers=headers,
)
@classmethod
@observe(as_type="generation") # Langfuse observability
def generate_response(cls, payload):
"""
Send generation request to LLM Gateway.
Payload structure:
{
"provider": "gemini",
"model": "gemini-2.5-pro",
"system_instruction": "You are...",
"messages": [...],
"temperature": 0.7,
"max_tokens": 1024,
"fallbacks": [...],
"metadata": {"region_name": "ap-south-1"}
}
"""
return cls._request_gateway().post("v1/generate/", json=payload)
Gateway Configuration (Django Settings)
# twofourlabs/settings.py
LLM_GATEWAY_JWT_KEY = COMMON_SECRETS.get("llm_gateway_jwt_key", "dummy_key")
LLM_GATEWAY_JWT_ALGORITHM = COMMON_SECRETS.get("llm_gateway_jwt_algorithm", "HS256")
LLM_GATEWAY_HOST = INTERNAL_HOSTS.get("llm_gateway") # e.g., "https://llm-gateway.internal.example.com"
Usage Pattern
# Build payload
payload = {
"provider": "gemini",
"model": "gemini-2.5-pro",
"system_instruction": rendered_prompt,
"messages": conversation_history,
"temperature": 0.7,
"max_tokens": 1024,
"fallbacks": get_fallback_routing("gemini", "gemini-2.5-pro", rendered_prompt)
}
# Send to gateway (JWT handled automatically)
response = InternalLLMGatewayManager.generate_response(payload)
# Parse response
ai_message = response.json()["choices"][0]["message"]["content"]
Why Gateway Pattern?
- Centralized Authentication: Single JWT management for all AI services
- Provider Abstraction: Backend doesn’t need provider-specific SDKs
- Fallback Handling: Gateway orchestrates fallback chain
- Rate Limiting: Apply at gateway level
- Monitoring: Single point for observability (Langfuse)
- Cost Optimization: Gateway can implement caching, deduplication
- Security: API keys never exposed to backend, only to gateway </gateway_pattern>
<implementation_examples>
Complete Working Code Samples
Example 1: Full Chat Flow with Multi-Provider AI
# Step 1: Load Bot Configuration
bot = Bot.objects.get(slug="astrologer-mira")
llm_config = bot.llm_configs.filter(is_active=True).first()
# Step 2: Fetch External Data (Astrology API)
astro_data = fetch_kundli_data(user_birth_details) # External API call
# Step 3: Build Prompt with Template Variables
prompt_text = PromptHelper.get_prompt(
app_identifier=AppIdentifier.iOS_MIRA,
bot_doc={
"name": bot.name,
"description": bot.description,
"sex": "female"
},
user_doc={
"name": user.name,
"age": user.age,
"gender": user.gender,
},
language=Languages.ENGLISH,
app_id="4",
memories=user_memories,
profile_details_1=astro_data
)
# Step 4: Build Fallback Chain
fallbacks = PromptHelper.get_fallback_routing(
provider=llm_config.llm_provider,
model_name=llm_config.model_name,
system_instruction=prompt_text,
llm_metadata=llm_config.metadata
)
# Step 5: Prepare LLM Request Payload
payload = {
"provider": llm_config.llm_provider,
"model": llm_config.model_name,
"system_instruction": prompt_text,
"messages": conversation_history,
"temperature": llm_config.temperature,
"max_tokens": llm_config.max_output_tokens,
"top_p": llm_config.top_p,
"fallbacks": fallbacks,
"metadata": llm_config.metadata
}
# Step 6: Call LLM Gateway
response = InternalLLMGatewayManager.generate_response(payload)
# Step 7: Parse and Save Response
ai_message = response.json()["choices"][0]["message"]["content"]
ChatHistory.objects.create(
session=session,
sender_type="bot",
message=ai_message
)
# Step 8: Billing (if applicable)
if session.is_paid:
deduct_from_wallet(user, session.per_message_price)
# Step 9: Return to User
return {"message": ai_message, "session_id": session.id}
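A hedged sketch of the wallet deduction in Step 8. deduct_from_wallet is referenced but not shown in this skill; the dict shape, field names, and exception are illustrative:

```python
# Billing step from the flow: check balance -> deduct -> log transaction.
class InsufficientBalance(Exception):
    pass


def deduct_from_wallet(wallet: dict, amount: float, transactions: list) -> float:
    """Deduct amount from wallet, appending an audit record; raise if short."""
    if wallet["balance"] < amount:
        raise InsufficientBalance(f"balance {wallet['balance']} < {amount}")
    wallet["balance"] -= amount
    transactions.append({"amount": amount, "type": "debit"})
    return wallet["balance"]
```

In production this would run inside a database transaction with a row lock on the wallet, so concurrent messages cannot double-spend.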
Example 2: Adding a New AI Provider
# Step 1: Add to BotLLMProvider choices
class BotLLMProvider(models.TextChoices):
# ... existing providers
MISTRAL = "mistral", _("Mistral AI")
# Step 2: Create BotLLMConfig for new provider
BotLLMConfig.objects.create(
bot=bot,
llm_provider="mistral",
model_name="mistral-large-latest",
temperature=0.7,
top_p=0.95,
max_output_tokens=1024,
is_active=False, # A/B test alongside existing config
metadata={"api_endpoint": "https://api.mistral.ai"}
)
# Step 3: Add fallback logic
def get_fallback_routing(provider, model_name, system_instruction, llm_metadata=None):
# ... existing code
elif provider == "mistral":
if model_name == "mistral-large-latest":
fallbacks.append({"provider": "mistral", "model": "mistral-medium-latest", "retry": 0})
fallbacks.append({"provider": "gemini", "model": "gemini-2.0-flash", "retry": 0})
# ... rest of code
Example 3: A/B Testing Different Models
# Create two configs for same bot
config_a = BotLLMConfig.objects.create(
bot=bot,
llm_provider="gemini",
model_name="gemini-2.5-pro",
temperature=0.7,
is_active=True, # Currently active
metadata={"experiment": "model_a"}
)
config_b = BotLLMConfig.objects.create(
bot=bot,
llm_provider="anthropic",
model_name="claude-3-sonnet-20240229",
temperature=0.7,
is_active=False, # Ready to activate
metadata={"experiment": "model_b"}
)
# Switch to config B (no code deployment needed!)
config_b.is_active = True
config_b.save() # This automatically sets config_a.is_active = False
# All new requests now use Claude instead of Gemini
</implementation_examples>
<external_api_integration>
External API Integration Pattern
Data Flow: External API → Transformation → AI Prompt
User Request
↓
Fetch External Data (Astrology, Weather, Finance API)
├── Cache Check (multi-layer)
├── API Call (if cache miss)
└── Save to Database
↓
Data Transformation Pipeline
├── Validate (ensure required fields present)
├── Normalize (standardize format)
├── Enrich (add computed fields)
└── Format for AI (XML, JSON, Markdown)
↓
Inject into Prompt Template
↓
LLM Call with Multi-Provider Fallbacks
↓
Response to User
Multi-Layer Caching Strategy
class DataFetcher:
def __init__(self):
self._instance_cache = {} # Request-scope cache
def get_kundli_data(self, user_id, birth_details):
# Layer 1: Instance cache (fastest, request scope)
cache_key = f"kundli_{user_id}"
if cache_key in self._instance_cache:
return self._instance_cache[cache_key]
# Layer 2: Distributed cache (Redis, 10 min TTL)
cached = CacheManager.get(cache_key)
if cached:
self._instance_cache[cache_key] = cached
return cached
# Layer 3: Database (persistent)
db_record = KundliData.objects.filter(user_id=user_id).first()
if db_record and db_record.is_fresh:
data = db_record.data
CacheManager.set(cache_key, data, duration=600)
self._instance_cache[cache_key] = data
return data
# Layer 4: External API call
data = self._call_astrology_api(birth_details)
# Write back to all layers
KundliData.objects.update_or_create(user_id=user_id, defaults={"data": data})
CacheManager.set(cache_key, data, duration=600)
self._instance_cache[cache_key] = data
return data
Data Transformation Pipeline
```python
from datetime import datetime

def transform_astro_data(raw_api_response):
    # Step 1: Validate
    if not validate_astro_response(raw_api_response):
        raise ValueError("Invalid astrology API response")

    # Step 2: Normalize
    normalized = {
        "ascendant": raw_api_response.get("Ascendant", "").capitalize(),
        "planets": normalize_planet_positions(raw_api_response["planet_chart_data"]),
        "dashas": normalize_dashas(raw_api_response["current_vdasha"]),
    }

    # Step 3: Enrich
    enriched = {
        **normalized,
        "processed_at": datetime.now().isoformat(),
        "summary": generate_summary(normalized["planets"]),
    }

    # Step 4: Format for AI (Markdown)
    formatted = f"""
### Kundli Data
**Ascendant**: {enriched["ascendant"]}
**Planetary Positions**:
{format_planets_markdown(enriched["planets"])}
**Current Dasha**: {enriched["dashas"]["major"]} / {enriched["dashas"]["minor"]}
**Summary**: {enriched["summary"]}
"""
    return formatted
```
</external_api_integration>
<authentication_security>
JWT Authentication & Security
JWT Token Lifecycle
```
Request LLM Service
    ↓
Check Cache for JWT Token
    ↓
If Not Cached:
    Generate JWT Token
    ├── Payload: {"service": "django-backend", "exp": now + 3600s}
    ├── Sign with Secret Key (HS256)
    └── Cache for (3600 - 2) seconds
    ↓
Attach to Request Header: "Authorization: jwt {token}"
    ↓
Send to LLM Gateway
    ↓
Gateway Validates JWT
    ├── Verify Signature
    ├── Check Expiry
    └── Allow/Deny Request
```
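The lifecycle above can be sketched with the standard library alone. This is a minimal illustration of HS256 JWT construction and the cache-until-(expiry − 2 s) rule; a production service would more likely use PyJWT, and the function and cache names here are assumptions for the example:

```python
import base64
import hashlib
import hmac
import json
import time

_token_cache = {}  # in-process cache: {service: (token, expires_at)}

def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_service_jwt(secret: str, service: str = "django-backend", ttl: int = 3600) -> str:
    now = int(time.time())
    cached = _token_cache.get(service)
    # Reuse the cached token until (expiry - 2) seconds, per the lifecycle above
    if cached and now < cached[1] - 2:
        return cached[0]
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"service": service, "exp": now + ttl}).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    token = f"{header}.{payload}.{sig}"
    _token_cache[service] = (token, now + ttl)
    return token

# Attach to the gateway request header
headers = {"Authorization": f"jwt {make_service_jwt('dev-secret')}"}
```

The gateway side would verify the signature with the same secret and reject expired tokens before allowing the request through.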
Security Best Practices
- Never Expose API Keys: Only gateway has provider API keys
- Short Token Lifetimes: 1 hour max; cache for (expiry - 2) seconds
- Service Identification: Include a "service" claim in the JWT payload
- Environment-Based Secrets: Load JWT_KEY from environment, never hardcode
- HTTPS Only: All gateway communication over TLS
- Rate Limiting: Implement at gateway level (per service, per user)
- Audit Logging: Log all gateway requests with service identifier </authentication_security>
<monitoring_observability>
Monitoring & Observability with Langfuse
Langfuse Integration
```python
from langfuse import observe

class InternalLLMGatewayManager:
    @classmethod
    @observe(as_type="generation")  # Automatically tracks LLM calls
    def generate_response(cls, payload):
        return cls._request_gateway().post("v1/generate/", json=payload)
```
What Langfuse Tracks
- Request Metadata: Provider, model, temperature, tokens
- Latency: Time to first token, total generation time
- Cost: Token usage, estimated cost per provider
- Fallbacks: Which fallbacks were triggered
- Errors: Provider failures, retry counts
- User Context: Session ID, user ID, bot ID
Monitoring Dashboard Metrics
- Success Rate: % of successful LLM calls
- Latency P50/P95/P99: Response time percentiles
- Fallback Rate: How often primary provider fails
- Provider Distribution: % of requests per provider
- Cost per Request: Average cost by provider/model
- Error Types: Network, rate limit, timeout, validation </monitoring_observability>
<quick_start>
Getting Started
When implementing a new AI feature:
- Design Django Models: Create `Bot` and `BotLLMConfig` following <database_schema>
- Set Up Providers: Add the provider to `BotLLMProvider.choices`, implement it in the gateway
- Configure Fallbacks: Define the fallback chain in `get_fallback_routing()`
- Create Prompt Templates: Use `{{variable}}` syntax, define in `PromptHelper`
- Integrate External APIs (if needed): Fetch data, transform, cache
- Build LLM Gateway Client: Use `InternalLLMGatewayManager.generate_response()`
- Test Fallbacks: Kill the primary provider, verify the fallback works
- Add Monitoring: Ensure the Langfuse `@observe` decorator is present
- Deploy: Update BotLLMConfig via Django admin (no code deployment!)
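The fallback-configuration step can be sketched as follows. The exact shape of `get_fallback_routing()` and the model identifiers are assumptions for illustration; the key properties from this skill are that every chain ends in a universal fallback (Gemini Flash) and that providers are walked in order:

```python
def get_fallback_routing():
    """Hypothetical provider -> ordered fallback chain mapping."""
    return {
        "openai/gpt-4o": ["openai/gpt-4o-mini", "gemini/gemini-flash"],
        "anthropic/claude-sonnet": ["gemini/gemini-pro", "gemini/gemini-flash"],
        "gemini/gemini-pro": ["gemini/gemini-flash"],
    }

def generate_with_fallbacks(primary, call_model):
    """Try the primary model, then walk its fallback chain in order."""
    last_error = None
    for model in [primary] + get_fallback_routing().get(primary, []):
        try:
            return call_model(model)
        except Exception as exc:  # provider failure, rate limit, timeout...
            last_error = exc
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage: "kill the primary provider, verify the fallback works"
def flaky(model):
    if model == "openai/gpt-4o":
        raise TimeoutError("primary down")
    return f"response from {model}"

result = generate_with_fallbacks("openai/gpt-4o", flaky)
```

In the real system each fallback would also inherit the system instruction and region configuration of the primary, as described in the fallback principles.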
Common Use Cases:
- Building a chatbot: database-schema → prompt-templating → gateway-pattern
- Adding a new AI provider: Update BotLLMProvider → Add fallback logic → Test
- Implementing prompt templates: prompt-templating → dynamic-variables
- Setting up region failover: region-configuration → fallback-mechanisms
- Integrating external API: external-api-integration → multi-layer caching </quick_start>
<when_to_use> Use this skill when:
- Building chatbot or AI assistant backends (Django/Python)
- Implementing multi-provider LLM support (OpenAI, Anthropic, Gemini, Bedrock, Byteplus, OpenRouter)
- Designing fallback and retry mechanisms for AI services
- Creating prompt management systems with {{variable}} templates
- Setting up region-based AI routing (AWS Bedrock multi-region)
- Implementing JWT-based authentication for internal AI gateways
- Integrating external APIs (astrology, weather, finance) with AI
- Building session-based billing for AI services
- Migrating from single-provider to multi-provider AI architecture
- Optimizing AI costs through intelligent provider selection and fallbacks
- Implementing A/B testing for LLM configurations
Do NOT use this skill for:
- Frontend AI integration (use frontend-specific frameworks)
- Real-time streaming implementations (different patterns)
- Fine-tuning or training models (this is about inference)
- Non-Django Python frameworks (patterns may need adaptation) </when_to_use>
<success_criteria> You have successfully implemented an AI-integrated backend when:
- ✅ Multiple AI providers are supported with seamless database-driven switching
- ✅ Fallback chains prevent single points of failure (verified by testing provider outages)
- ✅ Prompts use `{{variable}}` templates decoupled from code
- ✅ LLM configurations are stored in the database (BotLLMConfig), not hardcoded
- ✅ JWT authentication secures internal LLM gateway calls with token caching
- ✅ Region-based routing works for AWS Bedrock (tested across ap-south-1, us-east-1, etc.)
- ✅ Observability is in place (Langfuse `@observe` decorator on LLM calls)
- ✅ External API data is cached (multi-layer: instance → Redis → database)
- ✅ Graceful error handling provides user-friendly messages (no raw API errors)
- ✅ The system can A/B test different models via the `is_active` flag (no code changes)
- ✅ Fallback system_instruction inheritance works (all fallbacks get the same prompt)
- ✅ Session-based billing deducts from the wallet correctly (if applicable) </success_criteria>