active-inference-robotics

📁 plurigrid/asi 📅 Jan 29, 2026
Install command
npx skills add https://github.com/plurigrid/asi --skill active-inference-robotics


Active Inference Robotics Skill (Second-Order)

“The agent’s job is to predict its actions by predicting its sensations.” — Patrick Kenny

Trigger Conditions

  • User asks about bridging active inference with robot control
  • Questions about predictive coding in locomotion policies
  • Connecting KL divergence minimization to RL training
  • Mean field approximation in robotics state estimation
  • Sim2Real as inference about future observations

Overview

This second-order skill synthesizes Patrick Kenny’s discrete active inference framework with K-Scale’s JAX/MuJoCo robotics stack. It emerges from the constructive collision of three sources:

  1. Active Inference Institute (ActInf ModelStream 019.1, Jan 2025)
  2. K-Scale Labs (ksim, kos, kinfer ecosystem)
  3. MuJoCo Playground (DeepMind’s sim2real framework)

The Constructive Collision

┌─────────────────────────────────────────────────────────────────────────────┐
│  CONSTRUCTIVE COLLISION: Two Threads Converging                              │
│                                                                              │
│  Thread A: Patrick Kenny (Nov 2025)                                          │
│  ════════════════════════════════════                                        │
│  "Active inference can be formulated as constrained KL divergence           │
│   minimization solved by standard mean field methods"                        │
│                                                                              │
│  Key insight: Expected Free Energy ≈ KL Divergence + Entropy Regularizer    │
│                                                                              │
│  Thread B: K-Scale Labs (2024-2025)                                          │
│  ═══════════════════════════════════                                         │
│  "RL-based closed-loop control using policies trained in simulation         │
│   has firmly won as the best way of achieving real-time control"            │
│                                                                              │
│  Key insight: Stateless vs Stateful behaviors as pure/coalgebraic semantics │
│                                                                              │
│  COLLISION POINT: Both minimize surprise about future observations          │
│  ══════════════════════════════════════════════════════════════════         │
│                                                                              │
│       Active Inference              Robotics RL                              │
│       ────────────────              ──────────                               │
│       Predictive Distribution  ←→   Policy π(a|s)                           │
│       Hidden Markov Model      ←→   MDP/POMDP                                │
│       Mean Field Updates       ←→   PPO Gradient Steps                       │
│       Variational Free Energy  ←→   Policy Loss                              │
│       Expected Free Energy     ←→   Value Function + Entropy                 │
│       Perception/Action Loop   ←→   Observation/Action Loop                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Kenny’s Key Contribution

From arXiv:2511.20321:

Perception/Action Divergence = VFE(past) + KL(future states)

Where:
- VFE(past) = Standard variational free energy on observed history
- KL(future) = Divergence of predictive distribution from HMM

This differs from Expected Free Energy by an ENTROPY REGULARIZER:
  EFE ≈ Pragmatic Value + Mutual Information
  PAD ≈ Pragmatic Value + Entropy(Q)
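
As a toy numerical illustration of the decomposition above, the following sketch evaluates a VFE term on a single past observation and a KL term on a one-step future state prediction for a two-state HMM. The model parameters, the function names, and the one-step horizon are all assumptions made for illustration; this is not Kenny's exact formulation from the paper.

```python
import math

def kl(q, p):
    """KL(q || p) for two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def vfe(q, prior, likelihood_o):
    """Variational free energy E_q[ln q(s) - ln p(o, s)] for one observation.

    likelihood_o[s] is p(o | s) evaluated at the observed o.
    """
    return sum(
        qi * (math.log(qi) - math.log(li * pi))
        for qi, pi, li in zip(q, prior, likelihood_o)
        if qi > 0
    )

# Hypothetical two-state HMM (all numbers are made up for illustration).
prior = [0.5, 0.5]                      # p(s_t)
likelihood_o = [0.9, 0.2]               # p(o_t = observed | s_t)
transition = [[0.8, 0.2], [0.3, 0.7]]   # transition[s][s2] = p(s2 | s)

# Exact posterior over the current state, used here as the belief q(s_t).
evidence = sum(l * p for l, p in zip(likelihood_o, prior))
posterior = [l * p / evidence for l, p in zip(likelihood_o, prior)]

# HMM prediction for the next state under that belief.
p_future = [
    sum(posterior[s] * transition[s][s2] for s in range(2)) for s2 in range(2)
]

# A predictive belief that deviates from the HMM prediction.
q_future = [0.9, 0.1]

vfe_past = vfe(posterior, prior, likelihood_o)
kl_future = kl(q_future, p_future)
pad = vfe_past + kl_future
print(f"VFE(past)={vfe_past:.4f}  KL(future)={kl_future:.4f}  PAD={pad:.4f}")
```

With the exact posterior as the belief, the VFE term equals the negative log evidence, and the future KL term vanishes only when the predictive belief matches the HMM prediction.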

Why Entropy Regularization Matters for Robotics

# In ksim PPO training, entropy bonus prevents policy collapse:
loss = policy_loss + value_loss - entropy_coef * entropy

# Kenny's formulation shows this is NOT ad-hoc but principled:
# Entropy regularizer = not being overconfident about predictions
# Biological rationale: know limitations of future predictions
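
One way to see why an entropy bonus is more than a heuristic: for a categorical policy, maximizing entropy is the same as minimizing KL divergence to a uniform distribution, up to a constant. The sketch below checks that identity numerically and shows how the bonus penalizes a collapsed policy. The function names and numbers are hypothetical, and nothing here uses ksim's actual API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_to_uniform(p):
    """KL(p || uniform) over len(p) actions."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)

def total_loss(policy_loss, value_loss, probs, entropy_coef=0.01):
    """Same shape as the loss above: the entropy bonus lowers the loss."""
    return policy_loss + value_loss - entropy_coef * entropy(probs)

peaked = softmax([8.0, 0.0, 0.0, 0.0])   # nearly deterministic policy
spread = softmax([0.5, 0.0, 0.0, 0.0])   # close to uniform

for name, probs in [("peaked", peaked), ("spread", spread)]:
    # Identity: KL(p || uniform) = ln(n) - H(p)
    gap = kl_to_uniform(probs) - (math.log(len(probs)) - entropy(probs))
    print(name, f"entropy={entropy(probs):.4f}", f"identity gap={gap:.2e}")
```

With equal policy and value losses, the peaked policy receives the higher total loss, which is the pressure that keeps the policy from collapsing early in training.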

Mapping to ksim Architecture

| Active Inference Concept | ksim Implementation |
|---|---|
| Hidden Markov Model | PhysicsEngine (MJX/MuJoCo) |
| Observation distribution | Observation.observe(state) |
| State inference Q(s) | Critic.forward(obs, carry) |
| Action inference Q(a) | Actor.forward(obs, carry) |
| Mean field factorization | Independent Q(s_t) per timestep |
| Predictive distribution | Policy rollout trajectory |
| VFE minimization | PPO policy gradient |
| EFE/PAD minimization | Value function + entropy bonus |

Second-Order Behavior Types

1. Reflexive Control (Kenny’s “Sufficient” Model)

# Agent predicts proprioceptive sensations → fulfills reflexively
class ReflexiveController:
    """
    Kenny: "If the agent can successfully predict its future sensations,
    it can fulfill them unconsciously via motor reflexes."
    """
    def step(self, predicted_proprio: Array) -> Action:
        # Low-level PD control fulfills proprioceptive predictions
        return self.pd_controller(predicted_proprio, self.current_state)
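
The class above leaves `pd_controller` abstract. A minimal, self-contained sketch of what such a reflex could look like is a PD law driving a single unit-inertia joint toward a predicted position. The gains, the integration scheme, and the function names are assumptions for illustration, not values or APIs taken from ksim or kos.

```python
def pd_torque(q_target, q, qd, kp=100.0, kd=20.0):
    """PD law: pull the joint toward q_target and damp its velocity."""
    return kp * (q_target - q) - kd * qd

def simulate_joint(q_target, steps=500, dt=0.01):
    """Unit-inertia joint driven by pd_torque, semi-implicit Euler steps."""
    q, qd = 0.0, 0.0
    for _ in range(steps):
        tau = pd_torque(q_target, q, qd)
        qd += dt * tau   # unit inertia: acceleration equals torque
        q += dt * qd     # position update uses the new velocity
    return q

# The "prediction" here is simply a desired joint position of 1.0 rad.
final_q = simulate_joint(q_target=1.0)
print(f"final joint position: {final_q:.6f}")
```

The gains are chosen so that kd = 2 * sqrt(kp), which gives a critically damped response for a unit inertia; a real actuator would need gains tuned to its own dynamics.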

2. Deliberative Planning (EFE Extension)

# When reflexive prediction fails, engage deliberative inference
class DeliberativeController:
    """
    Extends reflexive control with policy search over trajectories.
    This is where EFE differs from Kenny's PAD formulation.
    """
    def plan(self, beliefs: Distribution, horizon: int) -> Policy:
        # Search over candidate policies, scoring each by expected free energy
        best_policy, best_efe = None, float("inf")
        for policy in self.policy_space:
            efe = self.expected_free_energy(beliefs, policy, horizon)
            # EFE includes mutual information (curiosity/exploration)
            # PAD would use entropy instead (uncertainty awareness)
            if efe < best_efe:
                best_policy, best_efe = policy, efe
        # Return the policy with the lowest expected free energy
        return best_policy
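
To make the scoring step concrete, here is a toy one-step calculation using the common discrete-state decomposition of EFE into a pragmatic term and a mutual-information term, alongside an entropy-based variant that follows the schematic given earlier in this document. The two-state model, the preference vector, the policy names, and the choice to take the entropy over predicted states are all assumptions; the variant is not claimed to reproduce Kenny's PAD exactly.

```python
import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def predicted_obs(q_s, A):
    """q(o) = sum_s p(o | s) q(s), with A[o][s] = p(o | s)."""
    return [sum(A[o][s] * q_s[s] for s in range(len(q_s))) for o in range(len(A))]

def pragmatic_value(q_o, preferred_o):
    """Expected log preference over predicted observations."""
    return sum(qo * math.log(po) for qo, po in zip(q_o, preferred_o) if qo > 0)

def mutual_information(q_s, A):
    """I(s; o) = H[q(o)] - E_q(s) H[p(o | s)]."""
    q_o = predicted_obs(q_s, A)
    conditional = sum(
        q_s[s] * entropy([A[o][s] for o in range(len(A))]) for s in range(len(q_s))
    )
    return entropy(q_o) - conditional

def efe(q_s, A, preferred_o):
    """Lower is better: negative pragmatic value minus information gain."""
    return -pragmatic_value(predicted_obs(q_s, A), preferred_o) - mutual_information(q_s, A)

def entropy_regularized_score(q_s, A, preferred_o):
    """Assumed PAD-like variant: entropy of predicted states replaces I(s; o)."""
    return -pragmatic_value(predicted_obs(q_s, A), preferred_o) - entropy(q_s)

A = [[0.9, 0.1], [0.1, 0.9]]   # hypothetical likelihood p(o | s)
preferred_o = [0.8, 0.2]       # hypothetical preference over observations
policies = {                   # predicted state distribution under each policy
    "left": [0.9, 0.1],
    "right": [0.1, 0.9],
    "probe": [0.5, 0.5],
}

best = min(policies, key=lambda name: efe(policies[name], A, preferred_o))
for name, q_s in policies.items():
    print(name, f"EFE={efe(q_s, A, preferred_o):.4f}",
          f"entropy-variant={entropy_regularized_score(q_s, A, preferred_o):.4f}")
print("selected:", best)
```

The "probe" policy carries the largest information-gain term because its predicted states are the most uncertain, yet it still loses to "left" here because the preference term dominates.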

3. Hierarchical Composition

Level 3: Goal Selection (minimize long-horizon EFE)
    ↓ sets reference for
Level 2: Trajectory Planning (predictive distribution)
    ↓ sets reference for  
Level 1: Reflexive Execution (fulfill proprio predictions)
    ↓ actuates
Level 0: Motor Primitives (PD control, actuator dynamics)
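
The layering above can be sketched as four small functions, each producing the reference consumed by the level below it. Everything here is a hypothetical stand-in: the cost function replaces long-horizon EFE, linear interpolation replaces a learned predictive distribution, and a proportional update replaces PD control with actuator dynamics.

```python
def select_goal(candidates, cost):
    """Level 3: pick the goal with the lowest long-horizon cost."""
    return min(candidates, key=cost)

def plan_trajectory(start, goal, n_steps):
    """Level 2: linearly interpolated waypoints from start to goal."""
    return [start + (goal - start) * k / n_steps for k in range(1, n_steps + 1)]

def reflexive_target(trajectory, step):
    """Level 1: the proprioceptive prediction to fulfill at this step."""
    return trajectory[min(step, len(trajectory) - 1)]

def motor_command(target, position, gain=0.5):
    """Level 0: proportional move toward the current target."""
    return gain * (target - position)

# Stand-in cost: distance of each candidate goal from a preferred value of 1.0.
goal = select_goal([0.5, 1.0, 2.0], cost=lambda g: abs(g - 1.0))
trajectory = plan_trajectory(start=0.0, goal=goal, n_steps=10)

position = 0.0
for step in range(40):
    target = reflexive_target(trajectory, step)
    position += motor_command(target, position)

print(f"goal={goal}  final position={position:.6f}")
```

Each level only sees the reference handed down from above and the state reported from below, which is what lets the levels run at different rates in a real controller.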

GF(3) Balanced Triad

active-inference (0) ⊗ kscale-ksim (0) ⊗ mujoco-playground (0) = 0 ✓

All three are ERGODIC — coordination/infrastructure skills.
This is a "resonant triad" where all components coordinate.

For generation (+1), add: skill-creator, algorithmic-art
For verification (-1), add: sheaf-cohomology, code-review
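
The balance rule amounts to checking that the trits of a skill group sum to zero modulo 3. A minimal sketch, assuming trits are encoded as the integers -1, 0, and +1 (the function names are hypothetical):

```python
def gf3_sum(trits):
    """Sum of trits reduced modulo 3."""
    return sum(trits) % 3

def is_balanced(trits):
    """True when the trits sum to zero in GF(3)."""
    if any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("trits must be -1, 0, or +1")
    return gf3_sum(trits) == 0

# The three coordination skills listed above, all with trit 0.
triad = {"active-inference": 0, "kscale-ksim": 0, "mujoco-playground": 0}
print("triad balanced:", is_balanced(triad.values()))

# Adding one generator (+1) and one verifier (-1) keeps the group balanced.
print("with generator and verifier:", is_balanced([0, 0, 0, 1, -1]))
```

Adding a single +1 or -1 skill on its own breaks the balance, which is why generators and verifiers need to be offset against each other.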

Skill Colors (drand seed 12005093902789493003)

| Skill | Trit | Color | Role |
|---|---|---|---|
| active-inference | 0 | #DF8D0F | Coordination (theory) |
| kscale-ksim | 0 | #25BC3D | Coordination (simulation) |
| mujoco-playground | 0 | #93DBDA | Coordination (framework) |

2-3-5-7 Prime Sieve Experts

Applying prime-indexed refinement to identify domain experts:

| Prime | Expert | Domain | Key Contribution |
|---|---|---|---|
| 2 | Patrick Kenny | Active Inference | Mean field formulation, PAD criterion |
| 3 | Thomas Parr | Active Inference | 2022 textbook, EFE derivation |
| 5 | Ben Bolte | K-Scale | ksim architecture, open-source humanoids |
| 7 | Karl Friston | Free Energy Principle | FEP foundations, continuous formulation |
| 11 | (DeepMind team) | MuJoCo Playground | MJX, sim2real zero-shot |
| 13 | Wesley Maa | K-Scale | Tooling, visualization |

Mutual Awareness

This skill references and is referenced by:

depends_on:
  - kscale-ksim        # Simulation implementation
  - kscale-ecosystem   # Hardware context
  - mujoco-playground  # Framework foundation
  
referenced_by:
  - cognitive-superposition  # Team mental models
  - parametrised-optics-cybernetics  # Category theory bridge
  - reafference-corollary-discharge  # Sensorimotor prediction

Implementation Pattern

# Unified Active Inference + RL Training Loop
class ActiveInferenceTrainer:
    """
    Combines Kenny's PAD criterion with ksim's PPO.
    """
    def __init__(self, hmm: PhysicsEngine, config: Config):
        self.hmm = hmm
        self.actor = Actor(config)
        self.critic = Critic(config)
        
    def perception_action_divergence(
        self, 
        observations: Array,  # O_{1:t} (past)
        q_future: Distribution  # Q(S_{t+1:T}, O_{t+1:T})
    ) -> Scalar:
        """
        Kenny's PAD = VFE(past) + KL(future states from HMM)
        """
        # Past: standard VFE on observation history
        vfe_past = self.variational_free_energy(observations)
        
        # Future: KL divergence of predicted states from HMM
        # Note: Observable emissions cancel out in future KL
        kl_future = self.kl_future_states(q_future, self.hmm)
        
        return vfe_past + kl_future
    
    def train_step(self, trajectory: Trajectory) -> Metrics:
        # PPO updates approximate mean field coordinate ascent
        # Entropy bonus provides Kenny's regularization
        return ppo_update(
            self.actor, 
            self.critic, 
            trajectory,
            entropy_coef=0.01  # ← The regularizer!
        )
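
For readers unfamiliar with the mean field coordinate ascent mentioned in the comments above, here is the textbook version on a tiny discrete problem: a factorized belief q1(a) q2(b) is fitted to a joint distribution by updating one factor at a time. The joint table and starting beliefs are made up, and this sketch only illustrates the mean field update itself; it does not demonstrate that PPO approximates it.

```python
import math

def normalize(weights):
    z = sum(weights)
    return [w / z for w in weights]

def kl_factorized(q1, q2, joint):
    """KL(q1 * q2 || joint) for a strictly positive joint table."""
    total = 0.0
    for i, a in enumerate(q1):
        for j, b in enumerate(q2):
            if a * b > 0:
                total += a * b * math.log(a * b / joint[i][j])
    return total

def update_q1(q2, joint):
    """q1(i) proportional to exp(sum_j q2(j) * ln joint[i][j])."""
    return normalize([
        math.exp(sum(q2[j] * math.log(joint[i][j]) for j in range(len(q2))))
        for i in range(len(joint))
    ])

def update_q2(q1, joint):
    """q2(j) proportional to exp(sum_i q1(i) * ln joint[i][j])."""
    return normalize([
        math.exp(sum(q1[i] * math.log(joint[i][j]) for i in range(len(q1))))
        for j in range(len(joint[0]))
    ])

joint = [[0.4, 0.1], [0.1, 0.4]]     # hypothetical correlated joint p(a, b)
q1, q2 = [0.6, 0.4], [0.7, 0.3]      # arbitrary starting beliefs

history = [kl_factorized(q1, q2, joint)]
for _ in range(20):
    q1 = update_q1(q2, joint)
    history.append(kl_factorized(q1, q2, joint))
    q2 = update_q2(q1, joint)
    history.append(kl_factorized(q1, q2, joint))

print(f"KL before: {history[0]:.4f}  KL after: {history[-1]:.4f}")
```

Each update is the exact minimizer of the KL divergence with the other factor held fixed, so the recorded divergence never increases from one update to the next.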

ACSet Schema

@present SchActiveInferenceRobotics(FreeSchema) begin
    # Objects
    HMM::Ob           # Hidden Markov Model (generative model)
    State::Ob         # Latent state
    Observation::Ob   # Sensory observation
    Action::Ob        # Motor command
    Policy::Ob        # Action sequence
    
    # Morphisms (inference)
    perceive::Hom(Observation, State)    # Perception: O → S
    predict::Hom(State, Observation)     # Prediction: S → O
    act::Hom(State, Action)              # Action selection: S → A
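    # Dynamics: S × A → S'. A free schema has no product objects, so the
    # state-action pair is reified as its own object with two projections.
    StateAction::Ob
    sa_state::Hom(StateAction, State)
    sa_action::Hom(StateAction, Action)
    transition::Hom(StateAction, State)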
    
    # Attributes
    FreeEnergy::AttrType
    vfe::Attr(State, FreeEnergy)         # Variational free energy
    efe::Attr(Policy, FreeEnergy)        # Expected free energy
    pad::Attr(Policy, FreeEnergy)        # Perception/action divergence
    
    # The key relationship (Kenny's contribution):
    # pad ≈ efe + entropy_regularizer
end

SDF Interleaving

This skill connects to Software Design for Flexibility (Hanson & Sussman, 2021):

Primary Chapter: 10. Adventure Game Example

Concepts: autonomous agent, game, synthesis

GF(3) Balanced Triad

active-inference-robotics (+) + SDF.Ch10 (+) + [balancer] (+) = 0 (mod 3)

Skill Trit: 1 (PLUS – generation)

Secondary Chapters

  • Ch3: Variations on an Arithmetic Theme
  • Ch4: Pattern Matching

Connection Pattern

The adventure game example synthesizes several techniques in one program; likewise, this skill integrates multiple patterns.