March 20, 2026

Kitten TTS: Voice AI Under 25MB

Text-to-speech has historically been a heavy technology. Good models required GPU inference, gigabytes of storage, and complex setup. That made TTS impractical for many indie projects — adding voice features meant either paying for expensive APIs or dealing with significant infrastructure overhead.

Kitten TTS changes that equation entirely. It is a new open-source text-to-speech library that delivers high-quality voice synthesis in models as small as 25MB. No GPU required. Just pip install and generate speech on any CPU.

What Makes Kitten TTS Special

The project, developed by KittenML and released under the Apache 2.0 license, offers three model sizes:

Model	Parameters	Size
kitten-tts-mini	80M	80 MB
kitten-tts-micro	40M	41 MB
kitten-tts-nano	15M	56 MB
kitten-tts-nano (int8)	15M	25 MB

The int8 quantized nano model weighs in at just 25MB. That is small enough to bundle inside a mobile app or a browser extension, or to run on a Raspberry Pi.
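A quick way to sanity-check the table is to divide each file size by its parameter count. The sketch below does only that arithmetic, using the figures listed above. The article does not state the numeric precision of each model, so treat the result as a rough estimate rather than a description of how the weights are stored.

```python
# Figures copied from the table above: (parameters in millions, size in MB).
MODELS = {
    "kitten-tts-mini": (80, 80),
    "kitten-tts-micro": (40, 41),
    "kitten-tts-nano": (15, 56),
    "kitten-tts-nano (int8)": (15, 25),
}


def bytes_per_parameter(params_millions, size_mb):
    """Implied storage per parameter, treating 1 MB as 10**6 bytes."""
    return (size_mb * 1_000_000) / (params_millions * 1_000_000)


for name, (params, size) in MODELS.items():
    print(f"{name:24s} {bytes_per_parameter(params, size):.2f} bytes/param")
```

The unquantized nano row comes out a little under 4 bytes per parameter, which is what 32-bit floats would give, while the int8 variant lands below 2. The mini and micro rows sit near 1 byte per parameter.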

Eight built-in voices come with the library: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo. Each voice has a distinct character, giving you variety without needing to train custom models.

Technical Details

Kitten TTS is built on ONNX Runtime, which runs inference efficiently on CPU. Because ONNX is a hardware-agnostic model format, the same model works across Windows, macOS, Linux, and potentially mobile platforms.
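If you are curious what CPU-only ONNX inference looks like underneath, here is a minimal sketch that opens an ONNX file with ONNX Runtime's CPU provider and lists its declared inputs. It assumes the onnxruntime package is installed and that you have some .onnx file on disk. The path is a placeholder, and nothing here reflects how Kitten TTS itself loads its models.

```python
def format_input(name, shape, dtype):
    """Render one model input as a readable line."""
    return f"{name}: {dtype} {list(shape)}"


def describe_onnx_inputs(model_path):
    """Open an ONNX file on the CPU provider and list its declared inputs."""
    # Imported lazily so format_input stays usable without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [format_input(i.name, i.shape, i.type) for i in session.get_inputs()]


# Usage, assuming an ONNX file exists at this placeholder path:
# for line in describe_onnx_inputs("path/to/model.onnx"):
#     print(line)
```

Passing only CPUExecutionProvider makes the "no GPU" choice explicit instead of relying on whatever providers happen to be available.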

Key features:

24 kHz output — High-quality audio at a standard sample rate
Adjustable speech speed — Control playback rate via the speed parameter
Text preprocessing — Built-in pipeline handles numbers, currencies, and units
Simple API — Generate speech in just a few lines of Python
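To make the 24 kHz figure concrete, the sketch below writes mono samples to a 16-bit WAV file using only the Python standard library and computes the clip length from the sample count. It assumes the audio is a sequence of floats in the range -1.0 to 1.0. That is an assumption about the output format, not something the article confirms, so a generated sine tone stands in for real model output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # 24 kHz, per the feature list above


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Length of a mono clip in seconds."""
    return num_samples / sample_rate


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Stand-in audio: half a second of a 440 Hz tone, not real model output.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE // 2)]
tone_path = os.path.join(tempfile.gettempdir(), "tone.wav")
write_wav(tone_path, tone)
print(duration_seconds(len(tone)))  # 0.5
```

At this sample rate, one minute of 16-bit mono audio is 24,000 x 2 x 60 bytes, or about 2.9 MB uncompressed.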

Getting Started

Installation is straightforward. Install the wheel with pip, directly from the GitHub release:

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl

Then generate speech with minimal code:

from kittentts import KittenTTS

model = KittenTTS("KittenML/kitten-tts-mini-0.8")
audio = model.generate("This runs without a GPU.", voice="Jasper")

# Save to file
model.generate_to_file("Hello world", "output.wav", voice="Luna")

The models download automatically from Hugging Face Hub on first use. Subsequent calls use the cached model.
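A natural next step is to hear all eight voices side by side. The helper below is a small sketch that writes one WAV per voice, using the same generate_to_file call shape as the snippet above. The helper name and the output folder are my own, and since the API is in preview the call may need adjusting for other releases.

```python
from pathlib import Path

# Voice names as listed earlier in the article.
VOICES = ["Bella", "Jasper", "Luna", "Bruno", "Rosie", "Hugo", "Kiki", "Leo"]


def render_voice_samples(model, text, out_dir, voices=VOICES):
    """Write one WAV per voice and return the paths that were written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for voice in voices:
        path = out_dir / f"{voice.lower()}.wav"
        # Same call shape as the snippet above: text, output path, voice.
        model.generate_to_file(text, str(path), voice=voice)
        paths.append(path)
    return paths


# Usage with the real library (not run here):
# from kittentts import KittenTTS
# model = KittenTTS("KittenML/kitten-tts-mini-0.8")
# render_voice_samples(model, "This runs without a GPU.", "samples")
```

Because the helper takes the model as an argument, you can load it once and reuse it for every voice.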

Use Cases for Indie Builders

For indie hackers and small teams, Kitten TTS opens up possibilities that were previously out of reach:

Voice assistants — Build voice-enabled apps without API costs
Audiobook generators — Convert content to audio at no per-minute fee
Accessibility features — Add screen reader capabilities to your apps
Game voiceovers — Generate dynamic NPC dialogue
Content creation tools — Add TTS to your video or podcast workflow
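Several of these use cases, game dialogue in particular, repeat the same lines many times. One possible design is to cache generated audio on disk, keyed by a hash of the voice and text, so each line is synthesized only once. This is a sketch under assumptions: cached_speech and the tts_cache folder are hypothetical names, and it relies on the generate_to_file call shown earlier.

```python
import hashlib
from pathlib import Path


def cached_speech(model, text, voice, cache_dir="tts_cache"):
    """Return the WAV path for (text, voice), synthesizing only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash voice and text together so the same line in two voices gets two files.
    key = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"{key}.wav"
    if not path.exists():
        model.generate_to_file(text, str(path), voice=voice)
    return path


# Usage with the real library (not run here):
# from kittentts import KittenTTS
# model = KittenTTS("KittenML/kitten-tts-mini-0.8")
# wav_path = cached_speech(model, "Welcome back, traveler.", "Hugo")
```

The first request for a line pays the synthesis cost, and later requests are just a file lookup.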

The zero marginal cost per generation is the killer feature. Unlike cloud TTS services that charge by the character or minute, Kitten TTS lets you generate unlimited speech on your own hardware.
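To see why that matters, it helps to write out the per-character arithmetic. The sketch below uses made-up inputs: the book length and the rate are placeholders, not a quote from any provider, so plug in real numbers from whichever service you are comparing against.

```python
def cloud_cost(characters, price_per_million_chars):
    """Cost of synthesizing `characters` at a flat per-character rate."""
    return characters / 1_000_000 * price_per_million_chars


# Hypothetical inputs, chosen only to illustrate the scaling.
BOOK_CHARS = 500_000   # placeholder length of one book
EXAMPLE_RATE = 15.00   # placeholder price per million characters

for books in (1, 10, 100):
    print(books, "book(s):", cloud_cost(books * BOOK_CHARS, EXAMPLE_RATE))
```

The cloud bill grows linearly with volume. Running locally has no per-character term at all. What remains is CPU time and electricity on hardware you already run.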

Current Limitations

Kitten TTS is still in developer preview. The APIs may change between releases, and the project notes that some users have reported issues with the int8 quantized model.
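Since the API may shift between releases, one defensive option is to check at startup that the installed version is the one you tested against. The sketch below uses the standard library's importlib.metadata. The distribution name kittentts is inferred from the wheel filename, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def require_version(dist_name, expected):
    """Raise if the installed version differs from the one the app was tested with."""
    found = installed_version(dist_name)
    if found != expected:
        raise RuntimeError(f"{dist_name}: expected {expected}, found {found}")


# Usage at app startup (distribution name inferred from the wheel filename):
# require_version("kittentts", "0.8.1")
```

Failing fast on a version mismatch is easier to debug than a changed function signature surfacing halfway through a request.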

Additionally, the current models are English-only. Multilingual support is on the roadmap, along with a mobile SDK and higher-quality models.

The Bottom Line

Kitten TTS is a significant milestone for voice AI. It shows that high-quality text-to-speech does not require massive models or expensive infrastructure: the models run on an ordinary CPU, with no per-use fees.

For indie hackers building products with voice features, this removes one of the biggest barriers to entry. No API keys to manage. No per-use fees to worry about. Just download a model and start generating.

Try the live demo on Hugging Face Spaces to hear the quality for yourself.
