Registry

28 models. Ready to run.

On-device, cross-platform. Browse the catalog, copy the integration snippet, ship.

Vision-Language gguf

Bonsai 27B

PrismML Bonsai 27B 1-bit vision-language model for local reasoning.

PrismML · 27.3B params · 3.5 GB

Text Generation gguf

Gemma 3 1B

Gemma 3 1B - Google's mobile-optimized instruction-tuned LLM with 32K context

Google · 1B params · 749.4 MB

Text Generation gguf

Gemma 4 E2B

Gemma 4 E2B - Google's compact multimodal LLM with 2.3B effective params, 128K context, and audio/image/video understanding

Google · 5.1B params · —

Text Generation gguf

Gemma 4 E4B

Gemma 4 E4B - Google's mid-range multimodal LLM with 4.5B effective params, 128K context, and audio/image/video understanding

Google · 8B params · —

Text Generation gguf

Gemma3NPC 1B

Gemma3NPC 1B - Mobile-friendly NPC roleplay fine-tune of Gemma 3 1B for in-character game dialogue

Google · 1B params · 806.0 MB

Text Generation gguf

Gemma3NPC-it

Gemma3NPC-it - NPC roleplay fine-tune of Gemma3n-E4B for in-character game dialogue

Google · 7B params · 3.8 GB

Text to Speech onnx

KittenTTS Micro 0.8

KittenTTS Micro 0.8 - Compact StyleTTS 2 (40M params, 8 named voices, OpenPhonemizer)

KittenML · 40M params · 90.2 MB

Text to Speech onnx

KittenTTS Mini 0.8

KittenTTS Mini 0.8 - High-quality StyleTTS 2 (80M params, 8 named voices, OpenPhonemizer)

KittenML · 80M params · 111.1 MB

Text to Speech onnx

KittenTTS Nano 0.2

KittenTTS Nano 0.2 - Ultra-lightweight TTS (15M params, <25MB)

KittenML · 15M params · 18.3 MB

Text to Speech onnx

KittenTTS Nano 0.8

KittenTTS Nano 0.8 - Ultra-lightweight StyleTTS 2 (15M params, 8 named voices, OpenPhonemizer)

KittenML · 15M params · 78.3 MB

Text to Speech onnx

Kokoro 82M

Kokoro 82M - High-quality TTS with 24 voices, Misaki dictionary

Hexgrad · 82M params · 174.9 MB

Vision-Language gguf

LFM2-VL 450M

Liquid LFM2-VL 450M compact vision-language model for edge inference.

Liquid AI · 450M params · 209.1 MB

Text Generation gguf

LFM2.5 1.2B Instruct

Liquid LFM2.5 1.2B Instruct - hybrid conv+attention LLM optimized for edge deployment (agentic tasks, data extraction, RAG, tool calling)

Liquid AI · 1.2B params · 697.0 MB

Text Generation gguf

LFM2.5 1.2B Thinking

Liquid LFM2.5 1.2B Thinking - hybrid conv+attention reasoning LLM optimized for edge deployment (fast CPU inference, low memory)

Liquid AI · 1.2B params · 697.0 MB

Text Generation gguf

LFM2.5 230M

Liquid LFM2.5 230M - smallest hybrid conv+attention LLM optimized for edge deployment (9 languages, tool calling, fast CPU inference)

Liquid AI · 230M params · 146.3 MB

Text Generation gguf

LFM2.5 350M

Liquid LFM2.5 350M - hybrid conv+attention LLM optimized for edge deployment (9 languages, tool calling)

Liquid AI · 354M params · —

Text Generation gguf

Llama 3.2 1B

Llama 3.2 1B - Meta's lightweight mobile-optimized instruction-tuned LLM with 128K context

Meta · 1B params · 754.3 MB

Text Generation gguf

Ministral 3 3B

Ministral 3 3B Instruct - Mistral AI's edge-optimized instruction-tuned LLM with 256K context

Mistral · 3.4B params · 2.0 GB

Text Generation gguf

Mistral 7B

Mistral 7B Instruct v0.3 - High-quality desktop LLM with function calling support

Mistral · 7B params · 4.0 GB

Text to Speech gguf

NeuTTS Air Q4

NeuTTS Air (~500M) - Codec TTS with voice cloning. Q4 GGUF backbone + NeuCodec decoder. Higher quality than Nano.

Neuphonic · 500M params · 755.0 MB

Text to Speech gguf

NeuTTS Nano Q4

NeuTTS Nano (120M) - Codec TTS with voice cloning. Q4 GGUF backbone + NeuCodec ONNX decoder.

Neuphonic · 120M params · 469.8 MB

Text Generation gguf

Phi-4 Mini

Phi-4 Mini 3.8B - Microsoft's compact reasoning LLM with 128K context

Microsoft · 3.8B params · 8.1 GB

Text Generation gguf

Qwen2.5 0.5B Instruct

Qwen 2.5 0.5B Instruct - Small but capable instruction-tuned LLM from Alibaba Cloud

Qwen · 500M params · 455.2 MB

Text Generation gguf

Qwen3.5 0.8B

Qwen 3.5 0.8B - Alibaba Cloud's lightweight multimodal LLM (text-only mode, 201 languages)

Qwen · 800M params · 495.7 MB

Text Generation gguf

Qwen3.5 2B

Qwen 3.5 2B - Alibaba Cloud's compact multimodal LLM (text-only mode, 201 languages)

Qwen · 2B params · 1.2 GB

Text Generation gguf

SmolLM2 360M

SmolLM2 360M - Hugging Face's best tiny LLM, excellent quality/size ratio

Hugging Face · 360M params · 254.2 MB

Speech Recognition onnx

Wav2Vec2 Base 960h

Wav2Vec2 Base 960h - English ASR with CTC decoding

Meta · 95M params · 220.3 MB

Speech Recognition safetensors

Whisper Tiny

Whisper Tiny - Fast multilingual ASR (Candle/SafeTensors runtime)

OpenAI · 39M params · 84.7 MB