March 20, 2026

Kitten TTS: Voice AI Under 25MB

Text-to-speech has historically been a heavy technology. Good models required GPU inference, gigabytes of storage, and complex setup. That made TTS impractical for many indie projects: adding voice features meant either paying for expensive APIs or taking on significant infrastructure overhead.

Kitten TTS changes that equation. It is a new open-source text-to-speech library that delivers high-quality voice synthesis in models as small as 25MB. No GPU is required: install it with pip and generate speech on a CPU.

What Makes Kitten TTS Special

The project, developed by KittenML and released under the Apache 2.0 license, offers three model sizes, plus an int8 quantized variant of the smallest:

Model                    Parameters   Size
kitten-tts-mini          80M          80 MB
kitten-tts-micro         40M          41 MB
kitten-tts-nano          15M          56 MB
kitten-tts-nano (int8)   15M          25 MB

The int8 quantized nano model weighs in at just 25MB. That is small enough to bundle inside a mobile app or a browser extension, or to run on a Raspberry Pi.
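As a rough sanity check on the table, dividing each file size by its parameter count gives an approximate bytes-per-parameter figure. This is only a sketch: it assumes 1 MB means 1,000,000 bytes and that the parameter counts are rounded, and the helper names are made up for illustration.

```python
# Parameter counts and sizes copied from the table above.
# Assumptions: 1 MB = 1_000_000 bytes; parameter counts are rounded.
MODELS = {
    "kitten-tts-mini": (80_000_000, 80),
    "kitten-tts-micro": (40_000_000, 41),
    "kitten-tts-nano": (15_000_000, 56),
    "kitten-tts-nano (int8)": (15_000_000, 25),
}


def bytes_per_parameter(params, size_mb):
    """Approximate on-disk bytes per model parameter."""
    return size_mb * 1_000_000 / params


def summarize(models=MODELS):
    """Return {model name: bytes per parameter, rounded to 2 places}."""
    return {
        name: round(bytes_per_parameter(params, size_mb), 2)
        for name, (params, size_mb) in models.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize().items():
        print(f"{name}: ~{ratio} bytes per parameter")
```

By this estimate the int8 variant stores the nano model in less than half the bytes per parameter of the regular nano file. The source does not break down what else each file contains, so the ratios are indicative only.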

Eight built-in voices come with the library: Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo. Each voice has a distinct character, giving you variety without needing to train custom models.

Technical Details

Kitten TTS is built on ONNX Runtime, a cross-platform inference engine with an optimized CPU backend, which is why it runs efficiently without a GPU. Because ONNX is a hardware-agnostic model format, the same model works across Windows, macOS, Linux, and potentially mobile platforms.

Key features:

Getting Started

Installation is straightforward. Install via pip from the GitHub release:

pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl

Then generate speech with minimal code:

from kittentts import KittenTTS

model = KittenTTS("KittenML/kitten-tts-mini-0.8")
audio = model.generate("This runs without a GPU.", voice="Jasper")

# Save to file
model.generate_to_file("Hello world", "output.wav", voice="Luna")

The models download automatically from Hugging Face Hub on first use. Subsequent calls use the cached model.
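If you want to post-process the array returned by generate() before saving it, one option is to write the WAV file yourself. The sketch below uses only the Python standard library and rests on two assumptions that the post does not confirm: that the audio is a mono sequence of floats in the range -1.0 to 1.0, and that the sample rate is 24 kHz. The write_wav and sine helpers are hypothetical names, and the sine tone stands in for real model output.

```python
import math
import struct
import wave

# Assumed output rate; check the library documentation for the real value.
SAMPLE_RATE = 24_000


def write_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM. Returns the sample count."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(samples)


def sine(duration_s=0.5, freq_hz=440.0, sample_rate=SAMPLE_RATE):
    """Placeholder signal standing in for model output."""
    count = int(duration_s * sample_rate)
    return [0.3 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]


if __name__ == "__main__":
    write_wav(sine(), "tone.wav")
```

With real output you would pass the generated array in place of sine(). For the simple case, generate_to_file already covers saving.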

Use Cases for Indie Builders

For indie hackers and small teams, Kitten TTS opens up possibilities that were previously out of reach:

The zero marginal cost per generation is the killer feature. Unlike cloud TTS services that charge by the character or minute, Kitten TTS lets you generate unlimited speech on your own hardware.
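To put rough numbers on that comparison, here is a small cost sketch. The per-character price is a placeholder, not a quote from any real provider, and the one-time local cost is whatever you choose to count (hardware, setup time). Both function names are hypothetical.

```python
def cloud_cost_usd(characters, price_per_million_chars):
    """Cost of synthesizing `characters` at a per-million-character rate."""
    return characters / 1_000_000 * price_per_million_chars


def break_even_characters(fixed_local_cost_usd, price_per_million_chars):
    """Characters at which a one-time local cost equals the cloud bill."""
    return fixed_local_cost_usd / price_per_million_chars * 1_000_000


if __name__ == "__main__":
    PRICE = 15.0  # hypothetical USD per 1M characters; substitute a real rate
    print(cloud_cost_usd(2_000_000, PRICE))   # cloud bill for 2M characters
    print(break_even_characters(60.0, PRICE)) # volume that pays off a $60 device
```

This ignores electricity and your own time, so treat the break-even figure as a rough lower bound.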

Current Limitations

Kitten TTS is still in developer preview. The APIs may change between releases, and the project notes that some users have reported issues with the int8 quantized model.
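Because the API may shift between releases, it can help to check at startup that the installed version is the one you tested against. A minimal sketch using the standard library follows. It assumes the installed distribution is named kittentts (inferred from the wheel filename above), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name, expected):
    """True only when the installed version equals the version you tested with."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # "kittentts" and "0.8.1" come from the wheel filename in the install step.
    if not matches_pin("kittentts", "0.8.1"):
        print("Warning: kittentts version differs from the one this code was tested with.")
```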

Additionally, the current models are English-only. Multilingual support is on the roadmap, along with a mobile SDK and higher-quality models.

The Bottom Line

Kitten TTS represents a significant milestone for voice AI. It shows that high-quality text-to-speech does not require massive models or expensive infrastructure. You can run it on ordinary CPU hardware, with no usage fees.

For indie hackers building products with voice features, this removes one of the biggest barriers to entry. No API keys to manage. No per-use fees to worry about. Just download a model and start generating.

Try the live demo on Hugging Face Spaces to hear the quality for yourself.

