
Run AI Models On-Device — Zero Config, Five Minutes

CLI, Rust, Flutter, Swift, Kotlin, Unity — run 25+ ML models on-device with one command. No tensor shapes, no preprocessing scripts.

Glenn Sonna · June 2, 2026 · 3 min read

on-device-ai · run-ml-locally · rust-ml · edge-inference · open-source-ai

You already know why on-device AI matters. Privacy, latency, cost. You’ve read the guides.

Now you want to actually do it. Here’s what that looks like with Xybrid — no tensor shapes, no preprocessing scripts, no ML expertise.

Install

cargo install xybrid-cli

Text-to-Speech

xybrid run --model kokoro-82m --input "Hello from the edge" --output hello.wav

That’s it. Xybrid resolved the model from the registry, downloaded it, ran inference, and saved a WAV file. You configured nothing.

Kokoro is an 82M-parameter TTS model with 24 voices. The first run downloads ~80MB and caches it locally; subsequent runs load the model from the cache and skip the download.
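The download-once, reuse-from-cache behavior follows a common pattern. Here is a minimal Rust sketch of it, assuming a hypothetical cache layout (`<cache_dir>/<model_id>/model.bin`) and a caller-supplied `download` closure. It is not Xybrid's implementation; `resolve_model` and the file names are illustrative.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the local path for `model_id`, calling `download` only on a cache miss.
/// Hypothetical layout: `<cache_dir>/<model_id>/model.bin`.
fn resolve_model<F>(cache_dir: &Path, model_id: &str, download: F) -> io::Result<PathBuf>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    let model_dir = cache_dir.join(model_id);
    let model_path = model_dir.join("model.bin");
    if model_path.exists() {
        // Cache hit: no network access needed.
        return Ok(model_path);
    }
    // Cache miss: fetch the bytes, write them to a temporary file, then rename
    // so a partially written file is never mistaken for a complete model.
    let bytes = download()?;
    fs::create_dir_all(&model_dir)?;
    let tmp_path = model_dir.join("model.bin.part");
    fs::write(&tmp_path, &bytes)?;
    fs::rename(&tmp_path, &model_path)?;
    Ok(model_path)
}

fn main() -> io::Result<()> {
    let cache_dir = std::env::temp_dir().join("model-cache-sketch");
    let _ = fs::remove_dir_all(&cache_dir); // start from an empty cache
    let mut downloads = 0;

    for _ in 0..3 {
        let path = resolve_model(&cache_dir, "kokoro-82m", || {
            downloads += 1;
            Ok(vec![0u8; 1024]) // stand-in for the real model weights
        })?;
        println!("using {}", path.display());
    }
    // Only the first iteration triggers a download.
    println!("downloads: {}", downloads);
    Ok(())
}
```

Writing to a `.part` file and renaming it means an interrupted download leaves no file at the final path, so the next run simply retries instead of loading a truncated model.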

Speech Recognition

xybrid run --model whisper-tiny --input recording.wav

Whisper Tiny transcribes audio in real time on any modern laptop and outputs plain text.
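"Real time" is usually measured as a real-time factor (RTF): processing time divided by audio duration, where RTF below 1.0 means transcription keeps up with the audio. A minimal Rust sketch of that arithmetic, assuming 16 kHz mono 16-bit PCM (the sample rate Whisper models work at) and a hypothetical 6-second processing time; the numbers are illustrative, not a Xybrid benchmark.

```rust
use std::time::Duration;

/// Duration of raw PCM audio, computed from the WAV format fields.
fn pcm_duration(data_bytes: u64, sample_rate: u32, channels: u16, bits_per_sample: u16) -> Duration {
    let bytes_per_second = sample_rate as u64 * channels as u64 * (bits_per_sample as u64 / 8);
    Duration::from_secs_f64(data_bytes as f64 / bytes_per_second as f64)
}

/// Real-time factor: processing time divided by audio duration.
/// RTF < 1.0 means the transcriber runs faster than the audio plays.
fn real_time_factor(processing: Duration, audio: Duration) -> f64 {
    processing.as_secs_f64() / audio.as_secs_f64()
}

fn main() {
    // 960,000 bytes of 16 kHz, mono, 16-bit PCM = 32,000 bytes/s * 30 s.
    let audio = pcm_duration(960_000, 16_000, 1, 16);
    // Hypothetical measurement: transcription took 6 seconds.
    let rtf = real_time_factor(Duration::from_secs(6), audio);
    println!("audio = {:.1} s, RTF = {:.2}", audio.as_secs_f64(), rtf); // audio = 30.0 s, RTF = 0.20
}
```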

Text Generation

xybrid run --model qwen3.5-0.8b --input "Explain quantum computing in one sentence"

Qwen 3.5 0.8B runs locally via llama.cpp. It supports 201 languages and fits in 500MB when quantized.
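The 500MB figure follows from simple arithmetic: weight storage is roughly parameters × bits per weight ÷ 8 bytes. A minimal Rust sketch, assuming about 4.5 bits per weight for 4-bit quantization (llama.cpp's Q4_0 format, for example, stores 32 four-bit weights plus one 16-bit scale per block). It ignores file metadata, tensors kept at higher precision, and runtime memory such as the KV cache, so treat it as a lower bound.

```rust
/// Approximate weight-storage size in megabytes (10^6 bytes).
fn quantized_size_mb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e6
}

fn main() {
    let params = 0.8e9; // Qwen 3.5 0.8B
    // 16-bit floats, 8-bit quantization, and ~4.5-bit block quantization.
    for bits in [16.0, 8.0, 4.5] {
        println!("{:>4} bits/weight -> ~{:.0} MB", bits, quantized_size_mb(params, bits));
    }
}
```

At 16 bits the weights alone need about 1.6GB; at roughly 4.5 bits they drop to about 450MB, which is why the quantized model fits under 500MB.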

Browse the Registry

xybrid models list

The registry lists 25+ models, all hosted on Hugging Face, downloaded on demand, and cached locally. A selection:

Model	Task	Parameters	Notes
kokoro-82m	Text-to-Speech	82M	24 voices, high quality
kitten-tts-nano-0.8	Text-to-Speech	15M	Ultra-lightweight
qwen3-tts-0.6b	Text-to-Speech	600M	Multilingual
whisper-tiny	Speech Recognition	39M	Real-time, multilingual
wav2vec2-base-960h	Speech Recognition	95M	CTC-based
lfm2.5-350m	Text Generation	354M	9 languages, edge-optimized
smollm2-360m	Text Generation	360M	Best tiny LLM
qwen3.5-0.8b	Text Generation	800M	201 languages
gemma-4-e2b	Text Generation	5.1B	Multimodal
mistral-7b	Text Generation	7B	Function calling
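When choosing a model for a device, it can help to treat the table as data and filter by task and parameter budget. A hypothetical Rust sketch: `ModelEntry`, `Task`, and `candidates` are illustrative names, not part of Xybrid's SDK, and parameter count is only a rough proxy for memory use.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Task {
    TextToSpeech,
    SpeechRecognition,
    TextGeneration,
}

/// One row of the registry table; `params_m` is the parameter count in millions.
struct ModelEntry {
    id: &'static str,
    task: Task,
    params_m: u32,
}

// Hypothetical in-memory copy of the table above.
const REGISTRY: &[ModelEntry] = &[
    ModelEntry { id: "kokoro-82m", task: Task::TextToSpeech, params_m: 82 },
    ModelEntry { id: "kitten-tts-nano-0.8", task: Task::TextToSpeech, params_m: 15 },
    ModelEntry { id: "qwen3-tts-0.6b", task: Task::TextToSpeech, params_m: 600 },
    ModelEntry { id: "whisper-tiny", task: Task::SpeechRecognition, params_m: 39 },
    ModelEntry { id: "wav2vec2-base-960h", task: Task::SpeechRecognition, params_m: 95 },
    ModelEntry { id: "lfm2.5-350m", task: Task::TextGeneration, params_m: 354 },
    ModelEntry { id: "smollm2-360m", task: Task::TextGeneration, params_m: 360 },
    ModelEntry { id: "qwen3.5-0.8b", task: Task::TextGeneration, params_m: 800 },
    ModelEntry { id: "gemma-4-e2b", task: Task::TextGeneration, params_m: 5_100 },
    ModelEntry { id: "mistral-7b", task: Task::TextGeneration, params_m: 7_000 },
];

/// Models for `task` with at most `max_params_m` million parameters, smallest first.
fn candidates(task: Task, max_params_m: u32) -> Vec<&'static str> {
    let mut found: Vec<&'static ModelEntry> = REGISTRY
        .iter()
        .filter(|m| m.task == task && m.params_m <= max_params_m)
        .collect();
    found.sort_by_key(|m| m.params_m);
    found.into_iter().map(|m| m.id).collect()
}

fn main() {
    // Text generation under ~1B parameters.
    println!("{:?}", candidates(Task::TextGeneration, 1_000)); // ["lfm2.5-350m", "smollm2-360m", "qwen3.5-0.8b"]
    // Text-to-speech under 100M parameters.
    println!("{:?}", candidates(Task::TextToSpeech, 100)); // ["kitten-tts-nano-0.8", "kokoro-82m"]
}
```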

Beyond the CLI

The CLI is the fastest way to evaluate models. When you're ready to integrate them into an app, Xybrid has SDKs for Flutter, Swift, Kotlin, Unity, and Rust: same models, same behavior, every platform.

Xybrid is in beta (v0.1.0-beta9) and open source under the Apache 2.0 license.

GitHub: github.com/xybrid-ai/xybrid

On-Device AI: The Complete Guide — hardware, privacy, cost, and how to get started.
Edge AI vs Cloud AI: When to Run Models On-Device — the decision framework.
Add Text-to-Speech to Your Flutter App in 15 Minutes — hands-on Flutter tutorial.
