flash
```bash
npx skills add https://github.com/runpod/skills --skill flash
```
Skill Documentation
Runpod Flash
runpod-flash (v1.0.0) is a Python SDK for distributed execution of AI workloads on RunPod’s serverless infrastructure. Write Python functions locally, decorate with @remote, and Flash handles GPU/CPU provisioning, dependency management, and data transfer.
- Package: `pip install runpod-flash`
- Import: `from runpod_flash import remote, LiveServerless, GpuGroup, ...`
- CLI: `flash`
- Python: >=3.10, <3.15
Getting Started
1. Install Flash
```bash
pip install runpod-flash
```
2. Set your RunPod API key
Get a key from RunPod account settings, then either export it:
```bash
export RUNPOD_API_KEY=your_api_key_here
```
Or save in a .env file in your project directory (Flash auto-loads via python-dotenv):
```bash
echo "RUNPOD_API_KEY=your_api_key_here" > .env
```
3. Write and run a remote function
```python
import asyncio

from runpod_flash import remote, LiveServerless

gpu_config = LiveServerless(name="my-first-worker")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def gpu_task(data):
    import torch  # import inside the function (see scoping rules below)
    tensor = torch.tensor(data, device="cuda")
    return {"sum": tensor.sum().item(), "gpu": torch.cuda.get_device_name(0)}

async def main():
    result = await gpu_task([1, 2, 3, 4, 5])
    print(result)

if __name__ == "__main__":
    asyncio.run(main())
```
First run takes ~1 minute (endpoint provisioning). Subsequent runs take ~1 second.
4. Or create a Flash API project
```bash
flash init my_project
cd my_project
pip install -r requirements.txt
# Edit .env and add your RUNPOD_API_KEY
flash run                    # Start local FastAPI server at localhost:8888
flash run --auto-provision   # Pre-deploy all endpoints (faster testing)
```
API explorer available at http://localhost:8888/docs.
5. Build and deploy to production
```bash
flash build                               # Scan @remote functions, package artifact
flash build --exclude torch,torchvision   # Exclude packages in base image (500MB limit)
flash deploy new production               # Create deployment environment
flash deploy send production              # Upload and deploy
flash deploy list                         # List environments
flash deploy info production              # Show details
flash deploy delete production            # Tear down
```
Core Concept: The @remote Decorator
The @remote decorator marks functions for remote execution on RunPod infrastructure. Code inside runs remotely; code outside runs locally.
```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="my-worker")

@remote(resource_config=config, dependencies=["torch", "numpy"])
async def gpu_compute(data):
    import torch  # MUST import inside the function
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

result = await gpu_compute([1, 2, 3])  # await from within an async context
```
@remote Signature
```python
def remote(
    resource_config: ServerlessResource,    # Required: GPU/CPU config
    dependencies: list[str] = None,         # pip packages
    system_dependencies: list[str] = None,  # apt-get packages
    accelerate_downloads: bool = True,      # CDN acceleration
    local: bool = False,                    # Execute locally (testing)
    method: str = None,                     # HTTP method (LoadBalancer only)
    path: str = None,                       # HTTP path (LoadBalancer only)
)
```
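As an illustration of `dependencies` vs `system_dependencies`, here is a hedged sketch of a worker that needs both pip and apt packages (the worker name and package choices are illustrative, not from the SDK docs):

```python
from runpod_flash import remote, LiveServerless

config = LiveServerless(name="vision-worker")  # hypothetical worker name

# opencv-python (pip) needs libGL at runtime, which apt-get provides via libgl1
@remote(
    resource_config=config,
    dependencies=["opencv-python", "numpy"],  # installed with pip
    system_dependencies=["libgl1"],           # installed with apt-get
)
async def image_shape(image_bytes: bytes):
    import cv2
    import numpy as np
    img = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_COLOR)
    return {"shape": list(img.shape)}
```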
CRITICAL: Cloudpickle Scoping Rules
Functions decorated with @remote are serialized with cloudpickle. They can ONLY access:
- Function parameters
- Local variables defined inside the function
- Imports done inside the function
- Built-in Python functions
They CANNOT access: module-level imports, global variables, external functions/classes.
```python
# WRONG - external references
import torch

@remote(resource_config=config)
async def bad(data):
    return torch.tensor(data)  # torch not accessible remotely

# CORRECT - everything inside
@remote(resource_config=config, dependencies=["torch"])
async def good(data):
    import torch
    return torch.tensor(data)
```
Return Behavior
- The decorated function is always awaitable (`await my_func(...)`)
- Queue-based resources return a `JobOutput` with `.output`, `.error`, `.status`
- Load-balanced resources return your dict directly
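For example, using `gpu_task` from the getting-started snippet (queue-based, so the call yields a `JobOutput`):

```python
job = await gpu_task([1, 2, 3])  # queue-based: returns JobOutput, not the raw dict
if job.error is None:
    print(job.output)            # {"sum": 6.0, "gpu": ...}
```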
Resource Configuration Classes
Choose based on execution model and environment:
| Class | Queue | HTTP | Environment | Use Case |
|---|---|---|---|---|
| `LiveServerless` | Yes | No | Dev | GPU with retries, remote code exec |
| `CpuLiveServerless` | Yes | No | Dev | CPU with retries, remote code exec |
| `ServerlessEndpoint` | Yes | No | Prod | GPU, custom Docker images |
| `CpuServerlessEndpoint` | Yes | No | Prod | CPU, custom Docker images |
| `LiveLoadBalancer` | No | Yes | Dev | GPU low-latency HTTP APIs |
| `CpuLiveLoadBalancer` | No | Yes | Dev | CPU low-latency HTTP APIs |
| `LoadBalancerSlsResource` | No | Yes | Prod | GPU production HTTP |
| `CpuLoadBalancerSlsResource` | No | Yes | Prod | CPU production HTTP |
Queue-based: best for batch jobs and long-running tasks, with automatic retries. Load-balanced: best for real-time, low-latency APIs with direct HTTP routing.
`Live*` classes: fixed, optimized Docker image with full remote code execution. Non-Live classes: custom Docker images, dictionary payloads only.
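As a sketch, a dev config and its production counterpart might be declared side by side; note that passing `gpus` to `ServerlessEndpoint` is an assumption here (its constructor parameters are presumed to mirror `LiveServerless`):

```python
from runpod_flash import LiveServerless, ServerlessEndpoint, GpuGroup

# Dev: fixed optimized image, remote code execution, automatic retries
dev_config = LiveServerless(name="dev-worker", gpus=[GpuGroup.AMPERE_24])

# Prod: custom Docker image, dictionary payloads only
# (constructor parameters assumed to mirror LiveServerless)
prod_config = ServerlessEndpoint(name="prod-worker", gpus=[GpuGroup.AMPERE_24])
```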
Common Parameters
```python
LiveServerless(
    name="worker-name",            # Required, unique
    gpus=[GpuGroup.AMPERE_80],     # GPU type(s)
    workersMin=0,                  # Min workers
    workersMax=3,                  # Max workers
    idleTimeout=300,               # Seconds before scale-down
    networkVolumeId="vol_abc123",  # Persistent storage
    env={"KEY": "value"},          # Environment variables
    template=PodTemplate(containerDiskInGb=100),
)
```
GPU Groups (GpuGroup enum)
- `GpuGroup.ANY` – Any available (not for production)
- `GpuGroup.AMPERE_16` – RTX A4000, 16GB
- `GpuGroup.AMPERE_24` – RTX A5000, 24GB
- `GpuGroup.AMPERE_48` – A40/RTX A6000, 48GB
- `GpuGroup.AMPERE_80` – A100, 80GB
- `GpuGroup.ADA_24` – RTX 4090, 24GB
- `GpuGroup.ADA_32_PRO` – RTX 5090, 32GB
- `GpuGroup.ADA_48_PRO` – RTX 6000 Ada, 48GB
- `GpuGroup.ADA_80_PRO` – H100, 80GB
- `GpuGroup.HOPPER_141` – H200, 141GB
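Since `gpus` accepts a list, a config can name several acceptable groups; presumably provisioning falls back across them, though the ordering semantics are an assumption here:

```python
from runpod_flash import LiveServerless, GpuGroup

config = LiveServerless(
    name="flexible-worker",
    gpus=[GpuGroup.ADA_80_PRO, GpuGroup.AMPERE_80],  # H100 or A100 (fallback order assumed)
)
```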
CPU Instance Types (CpuInstanceType enum)
Format: `CPU{generation}{type}_{vcpu}_{memory_gb}`

| Instance Type | Gen | Type | vCPU | RAM |
|---|---|---|---|---|
| `CPU3G_1_4` | 3rd | General | 1 | 4GB |
| `CPU3G_2_8` | 3rd | General | 2 | 8GB |
| `CPU3G_4_16` | 3rd | General | 4 | 16GB |
| `CPU3G_8_32` | 3rd | General | 8 | 32GB |
| `CPU3C_1_2` | 3rd | Compute | 1 | 2GB |
| `CPU3C_2_4` | 3rd | Compute | 2 | 4GB |
| `CPU3C_4_8` | 3rd | Compute | 4 | 8GB |
| `CPU3C_8_16` | 3rd | Compute | 8 | 16GB |
| `CPU5C_1_2` | 5th | Compute | 1 | 2GB |
| `CPU5C_2_4` | 5th | Compute | 2 | 4GB |
| `CPU5C_4_8` | 5th | Compute | 4 | 8GB |
| `CPU5C_8_16` | 5th | Compute | 8 | 16GB |
Use with the `instanceIds` parameter:

```python
from runpod_flash import LiveServerless, CpuInstanceType

config = LiveServerless(
    name="cpu-worker",
    instanceIds=[CpuInstanceType.CPU5C_4_8],
    workersMax=5,
)
```
Or use explicit CPU classes:
```python
from runpod_flash import CpuLiveServerless

config = CpuLiveServerless(name="cpu-worker", workersMax=5)
```
PodTemplate
Override pod-level settings:
```python
from runpod_flash import PodTemplate

template = PodTemplate(
    containerDiskInGb=100,
    env=[{"key": "PYTHONPATH", "value": "/workspace"}],
)
config = LiveServerless(name="worker", template=template)
```
NetworkVolume
```python
from runpod_flash import NetworkVolume, DataCenter

volume = NetworkVolume(
    name="model-storage",
    size=100,  # GB
    dataCenterId=DataCenter.EU_RO_1,
)
```
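To attach the volume to a worker, pass its ID via `networkVolumeId`; the `volume.id` attribute below is a guess at how the created volume exposes its ID, so verify against the SDK:

```python
# Hypothetical wiring: assumes the NetworkVolume object exposes `id`
config = LiveServerless(
    name="model-worker",
    networkVolumeId=volume.id,  # persistent storage mounted into workers
)
```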
LoadBalancer Resources
When using `LoadBalancerSlsResource` or `LiveLoadBalancer`:

- `method` and `path` are required on `@remote`
- `path` must start with `/`
- `method` must be one of: GET, POST, PUT, DELETE, PATCH
```python
from runpod_flash import remote, LiveLoadBalancer

api = LiveLoadBalancer(name="api-service")

@remote(api, method="POST", path="/api/process")
async def process(x: int, y: int):
    return {"result": x + y}

@remote(api, method="GET", path="/api/health")
def health():
    return {"status": "ok"}
```
Key differences from queue-based:
- Direct HTTP routing (no queue), lower latency
- Returns dict directly (no JobOutput wrapper)
- No automatic retries
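Calling a load-balanced function looks the same as a queue-based one; only the return value differs:

```python
result = await process(2, 3)  # direct HTTP call, no queue, no JobOutput wrapper
print(result["result"])       # -> 5
```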
Error Handling
Queue-Based Resources
```python
job_output = await my_function(data)
if job_output.error:
    print(f"Failed: {job_output.error}")
else:
    result = job_output.output
```
`JobOutput` fields: `id`, `status`, `output`, `error`, `started_at`, `ended_at`
Load-Balanced Resources
```python
try:
    result = await my_function(data)  # Returns dict directly
except Exception as e:
    print(f"Error: {e}")
```
Runtime Exceptions
```
FlashRuntimeError                 # base
RemoteExecutionError              # Remote function failed
SerializationError                # cloudpickle serialization failed
GraphQLError                      # GraphQL base error
GraphQLMutationError              # Mutation failed
GraphQLQueryError                 # Query failed
ManifestError                     # Invalid/missing manifest
ManifestServiceUnavailableError   # State Manager unreachable
```
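A hedged sketch of catching these exceptions; the import path is an assumption (adjust to wherever the SDK actually exports them):

```python
from runpod_flash import RemoteExecutionError, SerializationError  # import path assumed

try:
    job = await gpu_task([1, 2, 3])
except SerializationError:
    print("arguments could not be serialized with cloudpickle")
except RemoteExecutionError as exc:
    print(f"remote function failed: {exc}")
```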
Common Patterns
Hybrid GPU/CPU Pipeline
```python
from runpod_flash import remote, LiveServerless, CpuInstanceType, GpuGroup

cpu_config = LiveServerless(name="preprocessor", instanceIds=[CpuInstanceType.CPU5C_4_8])
gpu_config = LiveServerless(name="inference", gpus=[GpuGroup.AMPERE_80])

@remote(resource_config=cpu_config, dependencies=["pandas"])
async def preprocess(data):
    import pandas as pd
    return pd.DataFrame(data).to_dict("records")

@remote(resource_config=gpu_config, dependencies=["torch"])
async def inference(data):
    import torch
    tensor = torch.tensor(data, device="cuda")
    return {"result": tensor.sum().item()}

async def pipeline(raw_data):
    clean = await preprocess(raw_data)
    return await inference(clean)
```
Parallel Execution
```python
# process_item is any @remote-decorated function
results = await asyncio.gather(
    process_item(item1),
    process_item(item2),
    process_item(item3),
)
```
Local Testing
```python
@remote(resource_config=config, local=True)
async def my_function(data):
    return {"status": "ok"}  # Runs locally, skips remote execution
```
Cost Optimization
- Use `workersMin=0` to scale from zero
- Use `idleTimeout=600` to reduce churn
- Use smaller GPUs if they fit your model
- Use `Live*` classes for spot pricing in dev
- Pass URLs/paths instead of large data objects
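Putting those settings together in one config (values illustrative):

```python
from runpod_flash import LiveServerless, GpuGroup

config = LiveServerless(
    name="budget-worker",
    gpus=[GpuGroup.AMPERE_16],  # smallest GPU that fits the model
    workersMin=0,               # scale to zero when idle
    workersMax=3,
    idleTimeout=600,            # stay warm 10 minutes to reduce churn
)
```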
CLI Commands
flash init
```bash
flash init [project_name]
```
Creates a project template:
```
project_name/
├── main.py                # FastAPI entry point
├── workers/
│   ├── gpu/
│   │   ├── __init__.py    # GPU router
│   │   └── endpoint.py    # GPU @remote function
│   └── cpu/
│       ├── __init__.py    # CPU router
│       └── endpoint.py    # CPU @remote function
├── .env                   # API key template
├── .gitignore
├── .flashignore           # Deployment ignore patterns
├── requirements.txt
└── README.md
```
flash run
```bash
flash run [--auto-provision] [--host HOST] [--port PORT]
```
| Option | Default | Description |
|---|---|---|
| `--auto-provision` | off | Pre-deploy all endpoints before serving |
| `--host` | `localhost` | Server host (or `FLASH_HOST` env) |
| `--port` | `8888` | Server port (or `FLASH_PORT` env) |
flash build
```bash
flash build [--exclude PACKAGES] [--keep-build] [--preview]
```
| Option | Description |
|---|---|
| `--exclude pkg1,pkg2` | Skip packages already in the base Docker image |
| `--keep-build` | Don't delete `.flash/.build/` after packaging |
| `--preview` | Build, then run in local Docker containers |
Build steps: scan `@remote` decorators, group by resource config, create `flash_manifest.json`, install dependencies for Linux x86_64, package into `.flash/artifact.tar.gz`.
500MB deployment limit – use `--exclude` for packages already in the base image:

```bash
flash build --exclude torch,torchvision,torchaudio
```
`--preview` mode: creates Docker containers per resource config, starts the mothership on `localhost:8000`, and enables end-to-end local testing.
flash deploy
```bash
flash deploy new <env_name> [--app-name NAME]      # Create environment
flash deploy send <env_name> [--app-name NAME]     # Deploy archive
flash deploy list [--app-name NAME]                # List environments
flash deploy info <env_name> [--app-name NAME]     # Show details
flash deploy delete <env_name> [--app-name NAME]   # Delete (double confirmation)
```
`flash deploy send` requires `flash build` to have been run first.
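A typical first-deployment sequence using the commands above:

```bash
flash build --exclude torch,torchvision   # package the artifact (skip base-image packages)
flash deploy new production               # create the environment once
flash deploy send production              # upload the artifact and deploy
flash deploy info production              # verify the deployment
```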
flash undeploy
```bash
flash undeploy list     # List all deployed resources
flash undeploy <name>   # Undeploy specific resource
```
flash env / flash app
```bash
flash env list|create|info|delete <name>   # Environment management
flash app list|get <name>                  # App management
```
Architecture Overview
Deployment Architecture
Mothership Pattern: Coordinator endpoint + distributed child endpoints.
- `flash build` scans code, creates manifest + archive
- `flash deploy send` uploads archive, provisions resources
- Mothership boots, reconciles desired vs current state
- Child endpoints query State Manager GraphQL for service discovery (peer-to-peer)
- Functions route locally or remotely based on manifest
Cross-Endpoint Routing
Functions on different endpoints can call each other transparently:
- `ProductionWrapper` intercepts calls
- `ServiceRegistry` looks up the function in the manifest
- Local function? Execute directly
- Remote function? Serialize args (cloudpickle), POST to the remote endpoint
Serialization: cloudpickle + base64, max 10MB payload.
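To see why large arguments are costly, here is the transport encoding in isolation; this is plain cloudpickle + base64 as described above, not Flash's internal code:

```python
import base64

import cloudpickle

args = {"data": list(range(1000))}
payload = base64.b64encode(cloudpickle.dumps(args))  # what crosses the wire
print(len(payload))                                  # must stay under the 10MB limit
assert cloudpickle.loads(base64.b64decode(payload)) == args
```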
Common Gotchas
- External scope in `@remote` functions – the most common error. Everything must be inside the function.
- Forgetting `await` – all remote functions must be awaited.
- Undeclared dependencies – packages must be listed in the `dependencies=[]` parameter.
- Queue vs LB confusion – queue-based returns `JobOutput`; load-balanced returns your dict directly.
- Large serialization – pass URLs/paths, not large data objects.
- Imports at module level – import inside `@remote` functions, not at the top of the file.
- LoadBalancer requires method+path – `@remote(config, method="POST", path="/api/x")`
- Bundle too large (>500MB) – use `--exclude` for packages in the base Docker image.
- Endpoints accumulate – clean up with `flash undeploy list` / `flash undeploy <name>`.