react-native-executorch

📁 software-mansion-labs/react-native-skills 📅 Feb 10, 2026


Install command

npx skills add https://github.com/software-mansion-labs/react-native-skills --skill react-native-executorch


Skill documentation

When to Use This Skill

Use this skill when you need to:

Build AI features directly into mobile apps without cloud infrastructure
Deploy LLMs locally for text generation, chat, or function calling
Add computer vision (image classification, object detection, OCR)
Process audio (speech-to-text, text-to-speech, voice activity detection)
Implement semantic search with text embeddings
Ensure privacy by keeping all AI processing on-device
Reduce latency by eliminating cloud API calls
Work offline once models are downloaded

Overview

React Native ExecuTorch is a library developed by Software Mansion that brings ExecuTorch into the React Native world, enabling on-device AI model execution in React Native applications. It provides APIs for running LLMs, computer vision, OCR, audio processing, and embedding models directly on mobile devices, with no cloud infrastructure and no internet connection required after the initial model download. It also ships a variety of pre-exported models for common use cases.

Key Use Cases

Use Case 1: Mobile Chatbot/Assistant

Trigger: User asks to build a chat interface, create a conversational AI, or add an AI assistant to their app

Steps:

Choose an appropriate LLM based on device memory constraints
Load the model with the useLLM hook or LLMModule
Implement message handling and conversation history
Optionally add system prompts, tool calling, or structured output

Result: Functional chat interface with on-device AI responding without cloud dependency

Reference: ./references/reference-llms.md
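The conversation-history step above usually means keeping the message list within the model's context window. A minimal sketch, assuming messages shaped as `{ role, content }`; `trimHistory` is a hypothetical helper, not part of the library, and the character budget is a crude stand-in for a real token count.

```typescript
// Hypothetical message shape; check the library's own message type before reusing.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

// Keep every system message, then as many of the most recent
// non-system messages as fit into `maxChars` (a crude stand-in for a token budget).
function trimHistory(history: ChatMessage[], maxChars: number): ChatMessage[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + m.content.length, 0);
  const kept: ChatMessage[] = [];
  // Walk backwards so the newest turns are kept first; stop at the first one that does not fit.
  for (let i = rest.length - 1; i >= 0; i--) {
    const len = rest[i].content.length;
    if (used + len > maxChars) break;
    kept.unshift(rest[i]);
    used += len;
  }
  return [...system, ...kept];
}

// Usage example
const history: ChatMessage[] = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hi!" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "Summarize our chat." },
];
const trimmed = trimHistory(history, 60);
```

Dropping the oldest turns first preserves the most recent context; stopping at the first message that does not fit keeps the retained turns contiguous.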

Use Case 2: Computer Vision

Trigger: User needs to classify images, detect objects, or recognize content in photos

Steps:

Select vision model (classification, detection, or segmentation)
Load model for image processing task
Pass image and process results
Display detections or classifications in app UI

Result: App that understands image content without sending data to servers

Reference: ./references/reference-cv.md
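Before displaying detections in the UI, low-confidence boxes are typically dropped. A minimal sketch, assuming each detection has a `bbox` with `x1, y1, x2, y2` corners, a `label`, and a `score` in [0, 1]; verify the field names against the detection type in ./references/reference-cv.md. The `filterDetections` and `toOverlayRect` helpers and the 0.5 threshold are illustrative.

```typescript
// Assumed detection shape (verify against the library's own type).
interface BBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}
interface Detection {
  bbox: BBox;
  label: string;
  score: number;
}

// Keep detections at or above `minScore`, sorted from most to least confident.
function filterDetections(detections: Detection[], minScore = 0.5): Detection[] {
  return detections
    .filter((d) => d.score >= minScore)
    .sort((a, b) => b.score - a.score);
}

// Convert corner coordinates into the left/top/width/height form overlay views usually expect.
function toOverlayRect(b: BBox) {
  return { left: b.x1, top: b.y1, width: b.x2 - b.x1, height: b.y2 - b.y1 };
}

const raw: Detection[] = [
  { bbox: { x1: 10, y1: 20, x2: 110, y2: 220 }, label: "PERSON", score: 0.91 },
  { bbox: { x1: 0, y1: 0, x2: 5, y2: 5 }, label: "CUP", score: 0.12 },
  { bbox: { x1: 50, y1: 60, x2: 90, y2: 100 }, label: "DOG", score: 0.67 },
];
const shown = filterDetections(raw).map((d) => ({ label: d.label, rect: toOverlayRect(d.bbox) }));
```

If the image is displayed at a different size than it was processed, the rectangles also need to be scaled by the display ratio.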

Use Case 3: Document/Receipt Scanning

Trigger: User wants to extract text from photos (receipts, documents, business cards)

Steps:

Choose OCR model matching target language
Load appropriate recognizer for alphabet/language
Capture or load image
Extract text regions with bounding boxes
Post-process results for application

Result: OCR-enabled app that reads text from photos taken with the device camera or loaded from storage

Reference: ./references/reference-ocr.md
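The post-processing step often means putting detected regions back into reading order. A minimal sketch, assuming each OCR result has a `text` string and a `bbox` given as an array of corner points `{ x, y }`; check the actual result type in ./references/reference-ocr.md. Grouping by a fixed `lineTolerance` in pixels is a simple heuristic of this sketch, not the library's behavior, and `toReadingOrder` is a hypothetical helper.

```typescript
interface Point {
  x: number;
  y: number;
}
interface OcrRegion {
  text: string;
  bbox: Point[]; // assumed: corner points of the detected text box
  score: number;
}

// Join horizontal OCR regions into lines of text in reading order:
// regions whose top edges are within `lineTolerance` px share a line,
// lines go top to bottom and words left to right.
function toReadingOrder(regions: OcrRegion[], lineTolerance = 10): string[] {
  const boxes = regions.map((r) => ({
    text: r.text,
    top: Math.min(...r.bbox.map((p) => p.y)),
    left: Math.min(...r.bbox.map((p) => p.x)),
  }));
  boxes.sort((a, b) => a.top - b.top);

  const lines: { top: number; items: typeof boxes }[] = [];
  for (const box of boxes) {
    const line = lines.find((l) => Math.abs(l.top - box.top) <= lineTolerance);
    if (line) line.items.push(box);
    else lines.push({ top: box.top, items: [box] });
  }
  return lines.map((l) =>
    l.items
      .sort((a, b) => a.left - b.left)
      .map((b) => b.text)
      .join(" ")
  );
}

// Usage: build axis-aligned boxes for a tiny two-line receipt.
const rect = (text: string, x: number, y: number, w: number, h: number): OcrRegion => ({
  text,
  score: 0.9,
  bbox: [{ x, y }, { x: x + w, y }, { x: x + w, y: y + h }, { x, y: y + h }],
});
const receiptLines = toReadingOrder([
  rect("TOTAL", 10, 100, 50, 20),
  rect("12.50", 200, 103, 40, 20),
  rect("COFFEE", 10, 50, 60, 20),
  rect("3.00", 200, 52, 40, 20),
]);
```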

Use Case 4: Voice Interface

Trigger: User wants to add voice commands, transcription, or voice output to app

Steps:

For voice input: Capture audio at the correct sample rate (16 kHz) → transcribe with an STT model
For voice output: Generate speech from text → play it through an audio context
Handle audio format/sample rate conversion

Result: App with hands-free voice interaction

Reference: ./references/reference-audio.md
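Audio format conversion often starts with recorded 16-bit PCM, while speech models commonly consume floating-point samples in [-1, 1]. A minimal sketch, assuming the recorder hands you an `Int16Array` and the model expects a mono `Float32Array`; check ./references/reference-audio.md for the exact input type each hook accepts. `int16ToFloat32` and `stereoToMono` are hypothetical helpers.

```typescript
// Convert signed 16-bit PCM samples to floats in [-1, 1).
function int16ToFloat32(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 32768; // -32768 maps to -1, 32767 to just below 1
  }
  return out;
}

// Average interleaved stereo frames [L0, R0, L1, R1, ...] into mono.
function stereoToMono(interleaved: Float32Array): Float32Array {
  const frames = Math.floor(interleaved.length / 2);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}

const samples = int16ToFloat32(new Int16Array([0, 16384, -32768, 32767]));
const mono = stereoToMono(new Float32Array([0.5, -0.5, 1, 0.5]));
```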

Use Case 5: Semantic Search

Trigger: User needs intelligent search, similarity matching, or content recommendations

Steps:

Load text or image embeddings model
Generate embeddings for searchable content
Compute similarity scores between queries and content
Rank and return results

Result: Smart search that understands meaning, not just keywords

Reference: ./references/reference-nlp.md
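The scoring and ranking steps come down to comparing vectors. A minimal sketch using cosine similarity, assuming the embeddings have already been produced by the embeddings model as number arrays (or `Float32Array`s) of equal length; `rankBySimilarity` and the toy 3-dimensional vectors are illustrative only.

```typescript
type Vector = ArrayLike<number>;

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Doc {
  id: string;
  embedding: Vector;
}

// Score every document against the query and return the top `k`, best first.
function rankBySimilarity(query: Vector, docs: Doc[], k = 3) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const results = rankBySimilarity(
  [1, 0, 0],
  [
    { id: "a", embedding: [0, 1, 0] },
    { id: "b", embedding: [2, 0, 0] },
    { id: "c", embedding: [1, 1, 0] },
  ],
  2
);
```

A linear scan like this is fine for a few thousand items on-device; larger collections usually call for an approximate nearest-neighbor index.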

Core Capabilities by Category

Large Language Models (LLMs)

Run text generation, chat, function calling, and structured output generation locally on-device.

Supported features:

Text generation and chat completions
Function/tool calling
Structured output with JSON schema validation
Streaming responses
Multiple model families (Llama 3.2, Qwen 3, Hammer 2.1, SmolLM2, Phi 4)

Reference: See ./references/reference-llms.md

Computer Vision

Perform image understanding and manipulation tasks entirely on-device.

Supported tasks:

Image Classification – Categorize images into predefined classes
Object Detection – Locate and identify objects with bounding boxes
Image Segmentation – Pixel-level classification
Style Transfer – Apply artistic styles to images
Text-to-Image – Generate images from text descriptions
Image Embeddings – Convert images to numerical vectors for similarity/search

Reference: See ./references/reference-cv.md and ./references/reference-cv-2.md

Optical Character Recognition (OCR)

Extract and recognize text from images with support for multiple languages and text orientations.

Supported features:

Text detection in images
Text recognition across different alphabets
Horizontal text (standard documents, receipts)
Vertical text support (experimental, for CJK languages)
Multi-language support with language-specific recognizers

Reference: See ./references/reference-ocr.md

Audio Processing

Convert between speech and text, and detect speech activity in audio.

Supported tasks:

Speech-to-Text – Transcribe audio to text (supports multiple languages including English)
Text-to-Speech – Generate natural-sounding speech from text
Voice Activity Detection – Detect speech segments in audio

Reference: See ./references/reference-audio.md

Natural Language Processing

Convert text to numerical representations for semantic understanding and search.

Supported tasks:

Text Embeddings – Convert text to vectors for similarity/search
Tokenization – Convert text to tokens and vice versa

Reference: See ./references/reference-nlp.md

Getting Started by Use Case

I want to build a chatbot or AI assistant

Use the useLLM hook or LLMModule with one of the available language models.

What to do:

Choose a model from available LLM options (consider device memory constraints)
Use the useLLM hook or LLMModule to load the model
Send messages and receive responses
Optionally configure system prompts, generation parameters, and tools

Reference: ./references/reference-llms.md

Model options: ./references/reference-models.md – LLMs section

I want to enable function/tool calling in my LLM

Use the useLLM hook or LLMModule with tool definitions so that the model can call predefined functions.

What to do:

Define tools with name, description, and parameter schema
Configure the LLM with tool definitions
Implement callbacks to execute tools when the model requests them
Parse tool results and pass them back to the model

Reference: ./references/reference-llms.md – Tool Calling section
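Executing tools and passing results back amounts to routing each tool call the model emits to your own code and returning a result string. A minimal sketch, assuming a tool call arrives as `{ toolName, arguments }` and that the callback returns a string, or `null` when no tool matches; compare this with the tool-calling configuration in ./references/reference-llms.md before wiring it in. The `get_weather` tool, its stub handler, and `executeToolCall` are hypothetical.

```typescript
// Assumed shape of a tool call produced by the model.
interface ToolCall {
  toolName: string;
  arguments: Record<string, unknown>;
}

// A tool definition: name, description, and a JSON-Schema-style parameter spec.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required?: string[];
  };
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

// Hypothetical example tool.
const tools: ToolDefinition[] = [
  {
    name: "get_weather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string", description: "City name" } },
      required: ["city"],
    },
  },
];

const handlers: Record<string, ToolHandler> = {
  get_weather: async (args) => `Sunny in ${String(args.city)}`, // stub implementation
};

// Execute a tool call; the returned string is what gets passed back to the model.
async function executeToolCall(call: ToolCall): Promise<string | null> {
  const handler = handlers[call.toolName];
  if (!handler) return null; // unknown tool: report no result
  const spec = tools.find((t) => t.name === call.toolName);
  const missing = (spec?.parameters.required ?? []).filter((p) => !(p in call.arguments));
  if (missing.length > 0) return `Error: missing arguments ${missing.join(", ")}`;
  return handler(call.arguments);
}
```

Returning an error string for missing arguments, instead of throwing, gives the model a chance to correct its call on the next turn.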

I want structured data extraction from text

Use the useLLM hook or LLMModule with structured output generation and JSON schema validation.

What to do:

Define a schema (JSON Schema or Zod) for desired output format
Configure the LLM with the schema
Generate responses and validate against the schema
Use the validated structured data in your app

Reference: ./references/reference-llms.md – Structured Output section
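Validating a response against the schema can be sketched without any extra dependency. A minimal sketch, assuming the model's reply contains one JSON object, possibly surrounded by extra text; the `Receipt` shape and `parseReceipt` helper are hypothetical, and in a real app a Zod or JSON Schema validator would replace the hand-written checks.

```typescript
// Hypothetical target shape for extracted data.
interface Receipt {
  merchant: string;
  total: number;
  currency: string;
}

// Take the span from the first "{" to the last "}", parse it, and check field types.
// Returns null when the output does not match the expected shape.
function parseReceipt(reply: string): Receipt | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(reply.slice(start, end + 1));
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { merchant, total, currency } = data as Record<string, unknown>;
  if (
    typeof merchant !== "string" ||
    typeof total !== "number" ||
    !Number.isFinite(total) ||
    typeof currency !== "string"
  ) {
    return null;
  }
  return { merchant, total, currency };
}

const parsed = parseReceipt('Here you go: {"merchant": "Cafe", "total": 12.5, "currency": "PLN"}');
```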

I want to classify or recognize objects in images

Use the useClassification hook or ClassificationModule for simple categorization, or the useObjectDetection hook or ObjectDetectionModule to locate specific objects.

What to do:

Choose appropriate computer vision model based on task
Load the model with the appropriate hook or module
Pass image (local, remote, or base64)
Process results (classifications, detections with bounding boxes)

Reference: ./references/reference-cv.md

Model options: ./references/reference-models.md – Classification and Object Detection sections

I want to extract text from images

Use the useOCR hook or OCRModule for horizontal text, or the useVerticalOCR hook or VerticalOCRModule for vertical text (experimental).

What to do:

Choose appropriate OCR model and recognizer matching your target language
Load the model with appropriate hook or module
Pass image
Extract detected text regions with bounding boxes and confidence scores
Process results based on your application needs

Reference: ./references/reference-ocr.md

Model options: ./references/reference-models.md – OCR section

I want to convert speech to text or text to speech

Use the useSpeechToText hook or SpeechToTextModule for transcription, or the useTextToSpeech hook or TextToSpeechModule for voice synthesis.

What to do:

For Speech-to-Text: Capture or load audio, ensure 16kHz sample rate, transcribe
For Text-to-Speech: Prepare text, specify voice parameters, generate audio waveform, play using audio context

Reference: ./references/reference-audio.md

Model options: ./references/reference-models.md – Speech to Text and Text to Speech sections

I want to find similar images or texts

Use the useImageEmbeddings hook or ImageEmbeddingsModule for images, or the useTextEmbeddings hook or TextEmbeddingsModule for text.

What to do:

Load appropriate embeddings model
Generate embeddings for your content
Compute similarity metrics (cosine similarity, dot product)
Use similarity scores for search, clustering, or deduplication

Reference:

Text: ./references/reference-nlp.md
Images: ./references/reference-cv-2.md
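When embeddings are L2-normalized, the dot product equals cosine similarity, so normalizing once makes the dot product the cheaper of the two metrics listed above. A minimal sketch of near-duplicate detection, assuming embeddings arrive as number arrays; `findNearDuplicates` and the 0.95 threshold are illustrative, and whether a given model already returns normalized vectors should be checked in the references above.

```typescript
// Scale a vector to unit length (L2 norm = 1); zero vectors are returned unchanged.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Return index pairs [i, j] (i < j) whose normalized dot product is >= threshold.
// Pairwise comparison is O(n^2), which is fine for small on-device collections.
function findNearDuplicates(embeddings: number[][], threshold = 0.95): [number, number][] {
  const unit = embeddings.map(normalize);
  const pairs: [number, number][] = [];
  for (let i = 0; i < unit.length; i++) {
    for (let j = i + 1; j < unit.length; j++) {
      if (dot(unit[i], unit[j]) >= threshold) pairs.push([i, j]);
    }
  }
  return pairs;
}

const dupes = findNearDuplicates([
  [1, 0],
  [10, 0.1], // almost the same direction as the first vector
  [0, 1],
]);
```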

I want to apply artistic filters to photos

Use the useStyleTransfer hook or StyleTransferModule to apply predefined artistic styles to images.

What to do:

Choose from available artistic styles (Candy, Mosaic, Udnie, Rain Princess)
Load the style transfer model
Pass image
Retrieve and use the stylized image

Reference: ./references/reference-cv-2.md

Model options: ./references/reference-models.md – Style Transfer section

I want to generate images from text

Use the useTextToImage hook or TextToImageModule to create images from text descriptions.

What to do:

Load the text-to-image model
Provide text description (prompt)
Optionally specify image size and number of generation steps
Receive generated image (may take 20-60 seconds depending on device)

Reference: ./references/reference-cv-2.md

Model options: ./references/reference-models.md – Text to Image section

Understanding Model Loading

Before using any AI model, you need to load it. Models can be loaded from three sources:

1. Bundled with app (assets folder)

Best for small models (< 512MB)
Available immediately without download
Increases app installation size

2. Remote URL (downloaded on first use)

Best for large models (> 512MB)
Downloaded once and cached locally
Keeps app size small
Requires internet on first use

3. Local file system

Maximum flexibility for user-managed models
Requires custom download/file management UI

Model selection strategy:

Small models (< 512MB) → Bundle with app or download from URL
Large models (> 512MB) → Download from URL on first use with progress tracking
Quantized models → Preferred for lower-end devices to save memory

Reference: ./references/reference-models.md – Loading Models section
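The selection strategy above can be written as a small decision function. A minimal sketch; `chooseLoadingStrategy`, the `ModelInfo` fields, and the returned source labels are hypothetical and only encode the 512 MB rule of thumb and the "prefer quantized on lower-end devices" advice from this section. Picking the largest variant on high-end devices is an illustrative choice, not a recommendation from the library.

```typescript
type DeviceTier = "low" | "high";

interface ModelInfo {
  name: string;
  sizeMB: number;
  quantized: boolean;
}

interface LoadingPlan {
  model: ModelInfo;
  source: "bundle-or-url" | "url-with-progress";
}

const LARGE_MODEL_MB = 512; // threshold used throughout this guide

// Prefer quantized variants on low-end devices, and download large models with progress UI.
function chooseLoadingStrategy(variants: ModelInfo[], tier: DeviceTier): LoadingPlan {
  if (variants.length === 0) throw new Error("No model variants provided");
  const quantized = variants.filter((v) => v.quantized);
  const pool = tier === "low" && quantized.length > 0 ? quantized : variants;
  // Smallest candidate on low-end devices, largest otherwise (illustrative assumption).
  const sorted = [...pool].sort((a, b) => a.sizeMB - b.sizeMB);
  const model = tier === "low" ? sorted[0] : sorted[sorted.length - 1];
  const source: LoadingPlan["source"] =
    model.sizeMB > LARGE_MODEL_MB ? "url-with-progress" : "bundle-or-url";
  return { model, source };
}

// Hypothetical variants of one model family.
const variants: ModelInfo[] = [
  { name: "model-fp", sizeMB: 2400, quantized: false },
  { name: "model-quant", sizeMB: 1100, quantized: true },
];
const lowEndPlan = chooseLoadingStrategy(variants, "low");
const highEndPlan = chooseLoadingStrategy(variants, "high");
```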

Device Constraints and Model Selection

Not all models work on all devices. Consider these constraints:

Memory limitations:

Low-end devices: Use smaller models (135M-1.7B parameters) and quantized variants
High-end devices: Can run larger models (3B-4B parameters)

Processing power:

Lower-end devices: Expect longer inference times
Audio processing requires specific sample rates (16kHz for STT and VAD, 24kHz for TTS output)

Storage:

Large models require significant disk space
Implement cleanup mechanisms to remove unused models
Monitor total downloaded model size

Guidance:

Always check model memory requirements before recommending models
Prefer quantized model variants on lower-end devices
Show download progress for models > 512MB
Test on target devices before release

Reference: ./references/reference-models.md
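To make "check model memory requirements" concrete: the weights alone take roughly parameters × bits per weight ÷ 8 bytes. A rough sketch, assuming 16-bit weights for unquantized models and fewer bits (for example 4) for quantized ones; it ignores the KV cache, activations, and runtime overhead, so real usage is higher. `estimateWeightsGB`, `fitsInMemory`, and the `overheadFactor` of 1.5 are hypothetical.

```typescript
// Rough size of model weights in gigabytes (1 GB = 1e9 bytes).
function estimateWeightsGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

// Very rough fit check: weights times a safety factor for KV cache,
// activations and runtime overhead must stay under the available RAM.
function fitsInMemory(
  params: number,
  bitsPerWeight: number,
  availableGB: number,
  overheadFactor = 1.5 // hypothetical safety margin, tune on real devices
): boolean {
  return estimateWeightsGB(params, bitsPerWeight) * overheadFactor <= availableGB;
}

const oneBillion16bit = estimateWeightsGB(1e9, 16); // 2 GB
const oneBillion4bit = estimateWeightsGB(1e9, 4); // 0.5 GB
const threeBillion16bit = estimateWeightsGB(3e9, 16); // 6 GB
```

The arithmetic shows why quantization matters on lower-end devices: going from 16-bit to 4-bit weights cuts the weight footprint by a factor of four.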

Important Technical Requirements

Audio Processing

Audio must be in correct sample rate for processing:

Speech-to-Text or VAD input: 16kHz sample rate
Text-to-Speech output: 24kHz sample rate
Always decode/resample audio to correct rate before processing

Reference: ./references/reference-audio.md
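If a recording uses another rate (44.1 kHz or 48 kHz is common on phones), it has to be resampled to 16 kHz before speech-to-text or VAD. A minimal sketch using linear interpolation over a mono `Float32Array`; `resampleLinear` is a hypothetical helper, and because it applies no anti-aliasing low-pass filter, a dedicated audio library's resampler is preferable when quality matters.

```typescript
// Resample mono audio from `fromRate` to `toRate` by linear interpolation.
// Note: no low-pass filtering, so downsampling can alias high frequencies.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const ratio = fromRate / toRate;
  // Integer arithmetic first, so e.g. 44100 samples at 44.1 kHz give exactly 16000 at 16 kHz.
  const outLength = Math.floor((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio; // fractional position in the input signal
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// One second of 48 kHz audio becomes 16,000 samples at 16 kHz.
const oneSecond48k = new Float32Array(48000);
const resampled = resampleLinear(oneSecond48k, 48000, 16000);
```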

Image Processing

Images can be provided as:

Remote URLs (http/https) – automatically cached
Local file URIs (file://)
Base64-encoded strings

Image preprocessing (resizing, normalization) is handled automatically by most hooks.

Reference: ./references/reference-cv.md and ./references/reference-cv-2.md
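Before passing an image to a hook, it can help to know which of the three forms you are holding, for example to avoid treating base64 data as a cacheable URL. A minimal sketch; `classifyImageSource` is a hypothetical helper, and its checks (URL scheme prefixes, a `data:image/...;base64,` URI, or a bare base64 payload) are heuristics, not the library's own parsing.

```typescript
type ImageSourceKind = "remote" | "local" | "base64" | "unknown";

// Heuristically classify an image source string.
function classifyImageSource(source: string): ImageSourceKind {
  const s = source.trim();
  if (/^https?:\/\//i.test(s)) return "remote";
  if (s.startsWith("file://")) return "local";
  if (s.startsWith("data:image/") && s.includes(";base64,")) return "base64";
  // Bare base64 payload: only base64 alphabet characters, length a multiple of 4.
  if (s.length > 0 && s.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(s)) return "base64";
  return "unknown";
}
```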

Text Tokens

Text embeddings and LLMs have maximum token limits. Text exceeding these limits will be truncated. Use useTokenizer to count tokens before processing.

Reference: ./references/reference-nlp.md
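When a text is longer than an embeddings model's token limit, an alternative to truncation is to split its token IDs into overlapping windows and embed each window separately. A minimal sketch, assuming you already have the token IDs as a number array (for example from the tokenizer described in ./references/reference-nlp.md); `chunkTokens`, `maxTokens`, and `overlap` are hypothetical names, and any special tokens a model adds are not accounted for.

```typescript
// Split token IDs into windows of at most `maxTokens`; consecutive windows
// share `overlap` tokens so text at a boundary appears in both.
function chunkTokens(tokens: number[], maxTokens: number, overlap = 0): number[][] {
  if (maxTokens <= 0) throw new Error("maxTokens must be positive");
  if (overlap < 0 || overlap >= maxTokens) throw new Error("overlap must be in [0, maxTokens)");
  if (tokens.length <= maxTokens) return [tokens.slice()];

  const step = maxTokens - overlap;
  const chunks: number[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

const ids = Array.from({ length: 10 }, (_, i) => i); // [0, 1, ..., 9]
const windows = chunkTokens(ids, 4, 1);
```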

Core Utilities and Error Handling

The library provides core utilities for managing models and handling errors:

ResourceFetcher: Manage model downloads with pause/resume capabilities, storage cleanup, and progress tracking.

Error Handling: Use RnExecutorchError and error codes for robust error handling and user feedback.

useExecutorchModule: Low-level API for custom models not covered by dedicated hooks.

Reference: ./references/core-utilities.md

Common Troubleshooting

Model not loading: Check that the model source URL or path is valid and that the device has enough free storage

Out of memory errors: Switch to a smaller model or a quantized variant

Poor LLM quality: Adjust the temperature/top-p parameters or improve the system prompt

Audio issues: Verify correct sample rate (16kHz for STT and VAD, 24kHz output for TTS)

Download failures: Implement retry logic and check network connectivity

Reference: ./references/core-utilities.md for error handling details, or specific reference file for your use case
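For download failures, retry logic commonly means retrying with exponential backoff. A minimal sketch; `withRetry` is a hypothetical helper that wraps any promise-returning download call, and the attempt count and delays are illustrative. The library's ResourceFetcher may already cover parts of this, so check ./references/core-utilities.md first.

```typescript
// Retry an async operation with exponential backoff:
// wait baseDelayMs, then 2x, 4x, ... between attempts, and rethrow the last error.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: a hypothetical flaky download that fails twice, then succeeds.
let calls = 0;
const flakyDownload = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("network error");
  return "model.pte";
};
const downloaded = withRetry(flakyDownload, 3, 10);
```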

Quick Reference by Hook

| Hook | Purpose | Reference |
| --- | --- | --- |
| `useLLM` | Text generation, chat, function calling | reference-llms.md |
| `useClassification` | Image categorization | reference-cv.md |
| `useObjectDetection` | Object localization | reference-cv.md |
| `useImageSegmentation` | Pixel-level classification | reference-cv.md |
| `useStyleTransfer` | Artistic image filters | reference-cv-2.md |
| `useTextToImage` | Image generation | reference-cv-2.md |
| `useImageEmbeddings` | Image similarity/search | reference-cv-2.md |
| `useOCR` | Text recognition (horizontal) | reference-ocr.md |
| `useVerticalOCR` | Text recognition (vertical, experimental) | reference-ocr.md |
| `useSpeechToText` | Audio transcription | reference-audio.md |
| `useTextToSpeech` | Voice synthesis | reference-audio.md |
| `useVAD` | Voice activity detection | reference-audio.md |
| `useTextEmbeddings` | Text similarity/search | reference-nlp.md |
| `useTokenizer` | Text to tokens conversion | reference-nlp.md |
| `useExecutorchModule` | Custom model inference (advanced) | core-utilities.md |