cairo-optimization
npx skills add https://github.com/keep-starknet-strange/starknet-agentic --skill cairo-optimization
Coding Cairo
Rules and patterns for writing efficient Cairo code. Sourced from audit findings and production profiling.
Attribution: Originally authored by feltroidprime (cairo-skills). Integrated with permission into starknet-agentic.
When to Use
- Implementing arithmetic (modular, parity checks, quotient/remainder)
- Optimizing loops (slow iteration, repeated `.len()` calls, index-based access)
- Splitting or assembling integer limbs (u256 → u128, u32s → u128, felt252 → u96)
- Packing struct fields into storage slots
- Using `BoundedInt` for zero-overhead arithmetic with compile-time bounds
- Choosing integer types (u128 vs u256, BoundedInt vs native types)
Not for: Profiling/benchmarking (use benchmarking-cairo if available)
Quick Reference: All Rules
| # | Rule | Instead of | Use |
|---|---|---|---|
| 1 | Combined quotient+remainder | `x / m` + `x % m` | `DivRem::div_rem(x, m)` |
| 2 | Cheap loop conditions | `while i < n` | `while i != n` |
| 3 | Constant powers of 2 | `2_u32.pow(k)` | match-based lookup table |
| 4 | Pointer-based iteration | `*data.at(i)` in index loop | `pop_front` / `for` / `multi_pop_front` |
| 5 | Cache array length | `.len()` in loop condition | `let n = data.len();` before loop |
| 6 | Pointer-based slicing | Manual loop extraction | `span.slice(start, length)` |
| 7 | Cheap parity/halving | `index & 1`, `index / 2` | `DivRem::div_rem(index, 2)` |
| 8 | Smallest integer type | `u256` when range < 2^128 | `u128` (type encodes constraint) |
| 9 | Storage slot packing | One slot per field | `StorePacking` trait |
| 10 | BoundedInt for limbs | Bitwise ops / raw u128 math | `bounded_int::{div_rem, mul, add}` |
| 11 | Fast 2-input Poseidon | `poseidon_hash_span([x,y])` | `hades_permutation(x, y, 2)` |
Always / Never Rules
1. Always use DivRem::div_rem; never use separate % and /
Cairo computes quotient and remainder in a single operation. Using both % and / on the same value doubles the cost.
// BAD
let q = x / m;
let r = x % m;
// GOOD
let (q, r) = DivRem::div_rem(x, m);
2. Never use < or > in while loop conditions; use !=
Equality checks are cheaper than comparisons in Cairo.
// BAD
while i < n { ... i += 1; }
// GOOD
while i != n { ... i += 1; }
3. Never compute 2^k with pow(); use a lookup table
u32::pow() is expensive. Use a match lookup for known ranges.
// BAD
let p = 2_u32.pow(depth.into());
// GOOD: match-based lookup
fn pow2(n: u32) -> u32 {
match n {
0 => 1, 1 => 2, 2 => 4, 3 => 8, 4 => 16, 5 => 32,
6 => 64, 7 => 128, 8 => 256, 9 => 512, 10 => 1024,
// extend as needed
_ => core::panic_with_felt252('pow2 out of range'),
}
}
4. Always iterate arrays with pop_front / for / multi_pop_front; never index-loop
Index-based access (array.at(i)) is more expensive than pointer-based iteration.
// BAD
let mut i = 0;
while i != data.len() {
let val = *data.at(i);
i += 1;
}
// GOOD: pop_front
while let Option::Some(val) = data.pop_front() { ... }
// GOOD: for loop (equivalent)
for val in data { ... }
// GOOD: batch iteration
while let Option::Some(chunk) = data.multi_pop_front::<4>() { ... }
5. Never call .len() inside a loop condition; cache it
.len() recomputes every iteration. Store it once.
// BAD
while i != data.len() { ... i += 1; }
// GOOD
let n = data.len();
while i != n { ... i += 1; }
6. Always use span.slice() instead of manual loop extraction
slice() manipulates pointers directly, with no element-by-element copying.
// BAD
let mut result: Array<felt252> = array![]; // element type assumed to match data
let mut i = 0;
while i != length {
result.append(*data.at(start + i));
i += 1;
}
// GOOD
let result = data.slice(start, length);
7. Always use DivRem for parity checks; never use bitwise ops
Bitwise AND is more expensive than div_rem in Cairo. Use DivRem::div_rem(x, 2) to get both the halved value and parity in one operation.
// BAD
let is_odd = (index & 1) == 1;
index = index / 2;
// GOOD
let (q, r) = DivRem::div_rem(index, 2);
if r == 1 { /* odd branch */ }
index = q;
8. Always use the smallest integer type that fits the value range
u128 instead of u256 when the range is known. Adds clarity, prevents intermediate overflow.
// BAD: u256 for a value known to be < 2^128
fn deposit(value: u256) { assert(value < MAX_U128, '...'); ... }
// GOOD: type encodes the constraint
fn deposit(value: u128) { ... }
9. Always use StorePacking to pack small fields into one storage slot
Multiple small fields (basis points, flags, bounded amounts) can share a single felt252 slot.
use starknet::storage_access::StorePacking;
// Example struct: both fields fit in one felt252 (amount in the low 128 bits, fee_bps above)
#[derive(Drop, Serde)]
struct MyStruct {
    amount: u128,
    fee_bps: u16,
}
const POW_2_128: felt252 = 0x100000000000000000000000000000000;
impl MyStorePacking of StorePacking<MyStruct, felt252> {
    fn pack(value: MyStruct) -> felt252 {
        value.amount.into() + value.fee_bps.into() * POW_2_128
    }
    fn unpack(value: felt252) -> MyStruct {
        let u256 { low, high } = value.into();
        MyStruct { amount: low, fee_bps: high.try_into().unwrap() }
    }
}
10. Always use BoundedInt for byte cutting, limb assembly, and type conversions
Never use bitwise ops (&, |, shifts) or raw u128/u256 arithmetic for splitting or combining integer limbs. Use bounded_int::div_rem to extract parts and bounded_int::mul + bounded_int::add to assemble them. BoundedInt tracks bounds at compile time, eliminating overflow checks.
Assembling limbs (e.g., 4 × u32 → u128):
// BAD: direct u128 arithmetic (28,340 gas)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
d0.into() + d1.into() * POW_2_32 + d2.into() * POW_2_64 + d3.into() * POW_2_96
}
// GOOD: BoundedInt (13,840 gas, 2x faster)
// u32_bi / u128_bi are BoundedInt type aliases and POW_*_UI are UnitInt
// power-of-two constants, defined as in garaga (see reference below)
fn u32s_to_u128(d0: u32, d1: u32, d2: u32, d3: u32) -> u128 {
let d0_bi: u32_bi = upcast(d0);
let d1_bi: u32_bi = upcast(d1);
let d2_bi: u32_bi = upcast(d2);
let d3_bi: u32_bi = upcast(d3);
let r: u128_bi = add(add(add(d0_bi, mul(d1_bi, POW_32_UI)), mul(d2_bi, POW_64_UI)), mul(d3_bi, POW_96_UI));
upcast(r)
}
Splitting values (e.g., felt252 → two u96 limbs):
// GOOD: div_rem to split, mul+add to reassemble
fn felt252_to_two_u96(value: felt252) -> (u96, u96) {
match u128s_from_felt252(value) {
U128sFromFelt252Result::Narrow(low) => {
let (hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
(lo96, upcast(hi32))
},
U128sFromFelt252Result::Wide((high, low)) => {
let (lo_hi32, lo96) = bounded_int::div_rem(low, NZ_POW96_TYPED);
let hi64: BoundedInt<0, { POW64 - 1 }> = downcast(high).unwrap();
(lo96, bounded_int::add(bounded_int::mul(hi64, POW32_TYPED), lo_hi32))
},
}
}
Extracting bits (e.g., building a 4-bit selector):
// GOOD: div_rem by 2 extracts LSB, quotient is right-shifted value
let (qu1, bit0) = bounded_int::div_rem(u1, TWO_NZ); // bit0 in {0,1}
let (qu2, bit1) = bounded_int::div_rem(u2, TWO_NZ);
let selector = add(bit0, mul(bit1, TWO_UI)); // selector in {0..3}
See garaga/selectors.cairo and cairo-perfs-snippets for production examples.
11. Always use hades_permutation for 2-input Poseidon hashes
poseidon_hash_span has overhead for span construction. For exactly two inputs, use the permutation directly.
// BAD
let hash = poseidon_hash_span(array![x, y].span());
// GOOD
let (hash, _, _) = hades_permutation(x, y, 2);
Code Quality
- DRY: Extract repeated validation into helper functions. If two functions validate-then-write the same struct, extract a shared `_set_config()` (see the sketch after this list).
- `scarb fmt`: Run before every commit.
- `.tool-versions`: Pin Scarb (2.15.1) and Starknet Foundry (0.56.0) versions with ASDF for reproducible builds.
- Keep dependencies updated: Newer Scarb/Foundry versions include gas optimizations and compiler improvements.
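A minimal sketch of the DRY rule, assuming a `#[starknet::contract]` module; the `Config` struct, its fields, the `config` storage member, and the 10_000 bound are all hypothetical:
// Hypothetical config struct: the shared validate-then-write helper is the point
#[derive(Drop, Serde, starknet::Store)]
struct Config {
    fee_bps: u16,
    cap: u128,
}
#[generate_trait]
impl InternalImpl of InternalTrait {
    // Single place that validates and writes; both setters below reuse it
    fn _set_config(ref self: ContractState, config: Config) {
        assert(config.fee_bps <= 10_000, 'fee_bps > 100%');
        self.config.write(config);
    }
}
// Both entry points delegate instead of duplicating the checks
fn initialize(ref self: ContractState, config: Config) {
    self._set_config(config);
}
fn update_config(ref self: ContractState, config: Config) {
    self._set_config(config);
}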
BoundedInt Optimization
`BoundedInt<MIN, MAX>` encodes value constraints in the type system, eliminating runtime overflow checks. Use the CLI tool (`bounded_int_calc.py` in this skill directory) to compute bounds; do NOT calculate them manually.
Critical Architecture Decision: Avoid Downcast
The #1 optimization pitfall: Converting between u16/u32/u64 and BoundedInt at function boundaries.
The Problem
If your functions take u16 and return u16, you must:
- `downcast` input to `BoundedInt` (expensive: requires range check)
- Do bounded arithmetic (cheap)
- `upcast` result back to `u16` (cheap but wasteful)
The downcast operation adds a range check that dominates the savings from bounded arithmetic. In profiling:
- `downcast`: 161,280 steps (18.86%)
- `bounded_int_div_rem`: 204,288 steps (23.89%)
- Total bounded approach: worse than the original!
The Solution: BoundedInt Throughout
Use BoundedInt types as function inputs AND outputs. This eliminates downcast entirely.
// BAD: Converts at every call (downcast overhead kills performance)
pub fn add_mod(a: u16, b: u16) -> u16 {
let a: Zq = downcast(a).expect('overflow'); // EXPENSIVE
let b: Zq = downcast(b).expect('overflow'); // EXPENSIVE
let sum: ZqSum = add(a, b);
let (_q, rem) = bounded_int_div_rem(sum, nz_q);
upcast(rem)
}
// GOOD: BoundedInt in, BoundedInt out (no downcast)
pub fn add_mod(a: Zq, b: Zq) -> Zq {
let sum: ZqSum = add(a, b);
let (_q, rem) = bounded_int_div_rem(sum, nz_q);
rem
}
Refactoring Strategy
When optimizing existing code:
- Identify the hot path: profile to find which functions use modular arithmetic heavily
- Change signatures: update function inputs/outputs to use `BoundedInt` types
- Propagate types outward: callers must also use `BoundedInt`
- Downcast only at boundaries: convert from u16/u32 only at system entry points (e.g., deserialization), as in the sketch after this list
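A minimal sketch of the last three steps, reusing the `Zq` and `add_mod` definitions from the example above; `deserialize_coeff` and its error message are hypothetical:
// Boundary: the only downcast, where raw u16 data enters the system
fn deserialize_coeff(raw: u16) -> Zq {
    downcast(raw).expect('coeff out of range')
}
// Hot path: BoundedInt in, BoundedInt out; callers keep Zq throughout
fn sum3_mod(a: Zq, b: Zq, c: Zq) -> Zq {
    add_mod(add_mod(a, b), c)
}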
Type Conversion Rules
| From | To | Operation | Cost |
|---|---|---|---|
| `u16` | `BoundedInt<0, 65535>` | `upcast` | Free (superset) |
| `u16` | `BoundedInt<0, 12288>` | `downcast` | Expensive (range check) |
| `BoundedInt<0, 12288>` | `u16` | `upcast` | Free (subset) |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊆ [C,D] | `upcast` | Free |
| `BoundedInt<A, B>` | `BoundedInt<C, D>` where [A,B] ⊄ [C,D] | `downcast` | Expensive |
Key insight: upcast only works when target range is a superset of source range. You cannot upcast u32 to BoundedInt<0, 150994944> because u32 max (4294967295) > 150994944.
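A minimal sketch of both generic rows, assuming `x_u16: u16` and `x_u32: u32` locals; 150994944 is the product bound from the CLI examples below:
// Free: u16's range [0, 65535] is a subset of the target range
let wide: BoundedInt<0, 150994944> = upcast(x_u16);
// Not an upcast: u32's range [0, 4294967295] is not a subset of
// [0, 150994944], so a downcast (runtime range check) is required
let narrow: BoundedInt<0, 150994944> = downcast(x_u32).expect('out of range');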
Prerequisites
# Scarb.toml
[dependencies]
corelib_imports = "0.1.2"
// CORRECT imports: copy exactly
use corelib_imports::bounded_int::{
BoundedInt, upcast, downcast, bounded_int_div_rem,
AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};
Copy-Paste Template
Working example for modular arithmetic mod 100:
use corelib_imports::bounded_int::{
BoundedInt, upcast, downcast, bounded_int_div_rem,
AddHelper, MulHelper, DivRemHelper, UnitInt,
};
use corelib_imports::bounded_int::bounded_int::{SubHelper, add, sub, mul};
type Val = BoundedInt<0, 99>; // [0, 99]
type ValSum = BoundedInt<0, 198>; // [0, 198]
type ValConst = UnitInt<100>; // singleton {100}
impl AddValImpl of AddHelper<Val, Val> {
type Result = ValSum;
}
impl DivRemValImpl of DivRemHelper<ValSum, ValConst> {
type DivT = BoundedInt<0, 1>;
type RemT = Val;
}
fn add_mod_100(a: Val, b: Val) -> Val {
let sum: ValSum = add(a, b);
let nz_100: NonZero<ValConst> = 100;
let (_q, rem) = bounded_int_div_rem(sum, nz_100);
rem
}
CLI Tool
Use bounded_int_calc.py in this skill directory. Always use the CLI; never calculate manually.
# Addition: [a_lo, a_hi] + [b_lo, b_hi]
python3 bounded_int_calc.py add 0 12288 0 12288
# -> BoundedInt<0, 24576>
# Subtraction: [a_lo, a_hi] - [b_lo, b_hi]
python3 bounded_int_calc.py sub 0 12288 0 12288
# -> BoundedInt<-12288, 12288>
# Multiplication
python3 bounded_int_calc.py mul 0 12288 0 12288
# -> BoundedInt<0, 150994944>
# Division: quotient and remainder bounds
python3 bounded_int_calc.py div 0 24576 12289 12289
# -> DivT: BoundedInt<0, 1>, RemT: BoundedInt<0, 12288>
# Custom impl name
python3 bounded_int_calc.py mul 0 12288 0 12288 --name MulZqImpl
BoundedInt Bounds Quick Reference
| Operation | Formula |
|---|---|
| Add | [a_lo + b_lo, a_hi + b_hi] |
| Sub | [a_lo - b_hi, a_hi - b_lo] |
| Mul (unsigned) | [a_lo * b_lo, a_hi * b_hi] |
| Div quotient | [a_lo / b_hi, a_hi / b_lo] |
| Div remainder | [0, b_hi - 1] |
Negative Dividends: SHIFT Pattern
bounded_int_div_rem doesn’t support negative lower bounds. When a subtraction produces a negative-bounded result that needs reduction, add a multiple of the modulus first:
// sub_mod: (a - b) mod Q via SHIFT
pub fn sub_mod(a: Zq, b: Zq) -> Zq {
let a_plus_q: BoundedInt<12289, 24577> = add(a, Q_CONST); // shift by +Q
let diff: BoundedInt<1, 24577> = sub(a_plus_q, b); // now non-negative
let (_q, rem) = bounded_int_div_rem(diff, nz_q());
rem
}
// fused_sub_mul_mod: a - (b*c) mod Q via large SHIFT
// OFFSET = 12288 * Q = 151007232 (smallest multiple of Q >= max product)
pub fn fused_sub_mul_mod(a: Zq, b: Zq, c: Zq) -> Zq {
let prod: ZqProd = mul(b, c);
let a_offset: BoundedInt<151007232, 151019520> = add(a, OFFSET_CONST);
let diff: BoundedInt<12288, 151019520> = sub(a_offset, prod);
let (_q, rem) = bounded_int_div_rem(diff, nz_q());
rem
}
Rule: SHIFT = ceil(|min_possible_value| / modulus) * modulus. Adding SHIFT preserves the result mod Q (since SHIFT ≡ 0 mod Q) while making all values non-negative.
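Worked instance of the rule for `fused_sub_mul_mod` above (Q = 12289, Zq = [0, 12288]):
min_possible_value = 0 - 12288 * 12288 = -150994944
SHIFT = ceil(150994944 / 12289) * 12289 = 12288 * 12289 = 151007232
Check: 151007232 >= 150994944 and 151007232 % 12289 == 0, so OFFSET_CONST = 151007232.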
Common BoundedInt Mistakes
- Downcast at every function call: the biggest performance killer. Use `BoundedInt` types throughout, not just inside arithmetic functions.
- Trying to upcast to a narrower type: `upcast` from u32 to `BoundedInt<0, 150994944>` fails because u32 max > 150994944.
- Wrong imports: use the exact imports from the Prerequisites section above.
- Wrong subtraction bounds: it's `[a_lo - b_hi, a_hi - b_lo]`, NOT `[a_lo - b_lo, a_hi - b_hi]` (see the sketch after this list).
- Negative dividend in `bounded_int_div_rem`: div_rem doesn't support negative lower bounds. Add a SHIFT (a multiple of the modulus) before reducing. See the SHIFT pattern above.
- Missing intermediate types: always annotate, e.g. `let sum: ZqSum = add(a, b);`
- Division quotient off-by-one: integer division floors: `24576 / 12289 = 1`, not 2.
- Using `UnitInt` vs `BoundedInt` for constants: use `UnitInt<N>` for singleton constants like divisors.
- Using `div_rem` vs `bounded_int_div_rem`: the function is `bounded_int_div_rem`, not `div_rem`.
- Bounds exceeding u128::max: BoundedInt bounds are hard-capped at 2^128. Larger values crash the Sierra specializer: 'Provided generic argument is unsupported.'
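For the subtraction-bounds mistake, a minimal sketch of a correct helper impl, reusing `Zq` from the SHIFT section (the impl name is hypothetical):
impl SubZqImpl of SubHelper<Zq, Zq> {
    // [a_lo - b_hi, a_hi - b_lo] = [0 - 12288, 12288 - 0]
    type Result = BoundedInt<-12288, 12288>;
}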