rust-ml

📁 peixotorms/odinlayer-skills 📅 4 days ago
Install command
npx skills add https://github.com/peixotorms/odinlayer-skills --skill rust-ml

Skill documentation

Machine Learning Development

Domain Constraints

| Domain Rule         | Design Constraint       | Rust Implication          |
|---------------------|-------------------------|---------------------------|
| Large data          | Efficient memory        | Zero-copy, streaming      |
| GPU acceleration    | CUDA/Metal support      | candle, tch-rs            |
| Model portability   | Standard formats        | ONNX                      |
| Batch processing    | Throughput over latency | Batched inference         |
| Numerical precision | Float handling          | ndarray, careful f32/f64  |
| Reproducibility     | Deterministic           | Seeded random, versioning |
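
The numerical precision and reproducibility rows can be illustrated with a small std-only sketch. It assumes nothing about any ML crate: `sum_f32`, `sum_f32_wide`, and `SeededRng` are hypothetical names introduced here, and the generator is a plain SplitMix64 used only to show that a fixed seed yields a fixed sequence. A real project would more likely rely on a dedicated random-number crate.

```rust
/// Sums f32 values with an f32 accumulator; rounding error grows with length.
pub fn sum_f32(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Sums the same values with an f64 accumulator and narrows only at the end.
pub fn sum_f32_wide(values: &[f32]) -> f32 {
    values.iter().map(|&v| f64::from(v)).sum::<f64>() as f32
}

/// Hypothetical seeded generator (SplitMix64) for reproducible runs.
pub struct SeededRng {
    state: u64,
}

impl SeededRng {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1) built from the top 24 bits of the next value.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}
```

Accumulating in f64 costs little on a CPU and keeps long reductions (losses, means, norms) close to the true value even when the stored data is f32. Recording the seed next to the model version makes a run repeatable.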

Critical Rules

  • Avoid copying large tensors. Memory bandwidth is often the bottleneck, so prefer references, views, and in-place operations.
  • Batch operations for GPU efficiency. Every kernel launch carries a fixed overhead, and batching amortizes it.
  • Use standard model formats. A common workflow is to train in Python, export to ONNX, and deploy in Rust.
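
The first rule can be sketched with plain slices, which behave much like the views a tensor crate such as ndarray provides. This is a minimal std-only illustration, not code from any of the crates listed here; `row_view`, `scale_in_place`, and `row_sums` are hypothetical helper names, and the data is assumed to be a row-major matrix stored in one flat buffer.

```rust
/// Borrows row `row` of a row-major matrix as a slice; no data is copied.
/// Panics if the row is out of bounds.
pub fn row_view(data: &[f32], cols: usize, row: usize) -> &[f32] {
    &data[row * cols..(row + 1) * cols]
}

/// Scales every element in place instead of allocating a new buffer.
pub fn scale_in_place(data: &mut [f32], factor: f32) {
    for x in data.iter_mut() {
        *x *= factor;
    }
}

/// Computes per-row sums by iterating over borrowed row views.
/// Panics if `cols` is 0, because `chunks_exact` requires a non-zero size.
pub fn row_sums(data: &[f32], cols: usize) -> Vec<f32> {
    data.chunks_exact(cols)
        .map(|row| row.iter().sum::<f32>())
        .collect()
}
```

The only allocation here is the small result vector in `row_sums`; the matrix itself is never cloned.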

Use Case to Framework

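| Use Case             | Recommended  | Why                   |
|----------------------|--------------|-----------------------|
| Inference only       | tract (ONNX) | Lightweight, portable |
| Training + inference | candle, burn | Pure Rust, GPU        |
| PyTorch models       | tch-rs       | Direct bindings       |
| Data pipelines       | polars       | Fast, lazy eval       |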

Key Crates

| Purpose          | Crate        |
|------------------|--------------|
| Tensors          | ndarray      |
| ONNX inference   | tract        |
| ML framework     | candle, burn |
| PyTorch bindings | tch-rs       |
| Data processing  | polars       |
| Embeddings       | fastembed    |

Inference Server Pattern

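use std::sync::OnceLock;
use tract_onnx::prelude::*;

// Plan type produced by `into_runnable()`; the exact type may differ between tract versions.
type Model = SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>;

static MODEL: OnceLock<Model> = OnceLock::new();

// Loads and optimizes the model on first use; later calls reuse the same plan.
fn get_model() -> &'static Model {
    MODEL.get_or_init(|| {
        tract_onnx::onnx()
            .model_for_path("model.onnx")
            .expect("failed to load model.onnx")
            .into_optimized()
            .expect("failed to optimize model")
            .into_runnable()
            .expect("failed to build runnable plan")
    })
}

// tract runs synchronously on the calling thread; in an async server, consider
// moving this call onto a blocking thread pool.
async fn predict(input: Vec<f32>) -> anyhow::Result<Vec<f32>> {
    let model = get_model();
    // `from_shape_vec` takes ownership of the Vec instead of copying it.
    let len = input.len();
    let tensor: Tensor = tract_ndarray::Array2::from_shape_vec((1, len), input)?.into();
    let result = model.run(tvec!(tensor.into()))?;
    Ok(result[0].to_array_view::<f32>()?.iter().copied().collect())
}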
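
The load-once behavior of this pattern can be checked in isolation with a std-only sketch. The `Model` struct below is a hypothetical stand-in for a real model (one weight vector and a dot product), and `LOAD_COUNT` exists only to make the number of loader runs observable; neither is part of tract.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a loaded model: a single weight vector.
pub struct Model {
    weights: Vec<f32>,
}

impl Model {
    /// Dot product of the input with the weights.
    pub fn predict(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

static MODEL: OnceLock<Model> = OnceLock::new();

/// Counts how often the (expensive) loader actually runs.
pub static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Returns the shared model, loading it on the first call only.
pub fn shared_model() -> &'static Model {
    MODEL.get_or_init(|| {
        LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
        // A real loader would read and optimize a model file here.
        Model {
            weights: vec![0.5, -1.0, 2.0],
        }
    })
}
```

`OnceLock::get_or_init` guarantees that only one initializer result is ever stored, so concurrent first requests do not each pay the loading cost.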

Batched Inference Pattern

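// Assumes `get_model()` from the inference server pattern and a model exported
// with a dynamic batch dimension. tract's `run` is synchronous, so this is a plain fn.
fn batch_predict(inputs: &[Vec<f32>], batch_size: usize) -> anyhow::Result<Vec<Vec<f32>>> {
    let model = get_model();
    let mut results = Vec::with_capacity(inputs.len());

    // `chunks` panics on a chunk size of 0, so clamp it to at least 1.
    for batch in inputs.chunks(batch_size.max(1)) {
        // Stack the rows into one (batch, features) tensor; rows must have equal length.
        let features = batch[0].len();
        let flat: Vec<f32> = batch.iter().flatten().copied().collect();
        let tensor: Tensor =
            tract_ndarray::Array2::from_shape_vec((batch.len(), features), flat)?.into();
        let output = model.run(tvec!(tensor.into()))?;
        // Split the (batch, outputs) result back into one Vec per input row.
        let view = output[0].to_array_view::<f32>()?;
        results.extend(view.outer_iter().map(|row| row.iter().copied().collect::<Vec<f32>>()));
    }

    Ok(results)
}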
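
The stacking and unstacking steps in this pattern can be sketched without any tensor crate. This is one possible shape for the `stack_inputs` and `unstack_outputs` helpers under stated assumptions: rows are flattened into a row-major buffer, and the model call is replaced by a caller-supplied closure, `run_batch`, which is hypothetical and stands in for whatever framework executes the batch.

```rust
/// Flattens equal-length rows into one row-major buffer plus its (rows, cols) shape.
/// Returns None if the batch is empty or the rows differ in length.
pub fn stack_inputs(batch: &[Vec<f32>]) -> Option<(Vec<f32>, (usize, usize))> {
    let cols = batch.first()?.len();
    if batch.iter().any(|row| row.len() != cols) {
        return None;
    }
    let mut flat = Vec::with_capacity(batch.len() * cols);
    for row in batch {
        flat.extend_from_slice(row);
    }
    Some((flat, (batch.len(), cols)))
}

/// Splits a row-major output buffer back into one Vec per row.
pub fn unstack_outputs(flat: &[f32], cols: usize) -> Vec<Vec<f32>> {
    if cols == 0 {
        return Vec::new();
    }
    flat.chunks(cols).map(|row| row.to_vec()).collect()
}

/// Runs `run_batch` (a hypothetical model call) once per chunk of `batch_size` rows.
pub fn batch_predict<F>(
    inputs: &[Vec<f32>],
    batch_size: usize,
    mut run_batch: F,
) -> Option<Vec<Vec<f32>>>
where
    F: FnMut(&[f32], (usize, usize)) -> Vec<f32>,
{
    let mut results = Vec::with_capacity(inputs.len());
    // `chunks` panics on a size of 0, so clamp it to at least 1.
    for batch in inputs.chunks(batch_size.max(1)) {
        let (flat, shape) = stack_inputs(batch)?;
        let out = run_batch(&flat, shape);
        // Assumes the model returns the same number of rows it was given.
        let out_cols = out.len() / shape.0;
        results.extend(unstack_outputs(&out, out_cols));
    }
    Some(results)
}
```

With five inputs and a batch size of two, the closure is called three times instead of five; the saving grows with the per-call overhead of the backend.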

Common Mistakes

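| Mistake                | Domain Violation  | Fix               |
|------------------------|-------------------|-------------------|
| Clone tensors          | Memory waste      | Use views         |
| Single inference       | GPU underutilized | Batch processing  |
| Load model per request | Slow              | Singleton pattern |
| Sync data loading      | GPU idle          | Async pipeline    |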
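
The last row can be sketched with a background loader thread and a bounded channel. Note that this swaps in OS threads from the standard library rather than an async runtime; the overlap of loading and compute is the same idea. `run_pipeline`, `load`, and `compute` are hypothetical names, with `load` standing in for disk or network I/O and `compute` for model execution.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Loads batches on a background thread while the caller consumes them.
pub fn run_pipeline<L, C>(
    num_batches: usize,
    prefetch: usize,
    load: L,
    mut compute: C,
) -> Vec<f32>
where
    L: Fn(usize) -> Vec<f32> + Send + 'static,
    C: FnMut(&[f32]) -> f32,
{
    // Bounded channel: the loader can queue at most `prefetch` batches,
    // which caps memory use while keeping the consumer fed.
    let (tx, rx) = sync_channel::<Vec<f32>>(prefetch);

    let loader = thread::spawn(move || {
        for i in 0..num_batches {
            // Stop early if the consumer has gone away.
            if tx.send(load(i)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the consumer's iterator.
    });

    // Batches arrive in the order they were sent.
    let results: Vec<f32> = rx.iter().map(|batch| compute(&batch)).collect();
    loader.join().expect("loader thread panicked");
    results
}
```

A small `prefetch` value (one or two batches) is usually enough to hide loading latency; larger values mostly increase memory use.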