M in MLX is for Magic
For a long time, the desktop app ran our text-to-speech stack on a familiar path: Transformers.js in the JavaScript runtime, loading models and tensors with Hugging Face’s web-oriented tooling. It worked well across platforms and kept the stack portable.
Starting in desktop v0.1.72, macOS can use a different engine: MLX, Apple’s array framework for machine learning on Apple Silicon, wired through a native path so inference stays on the GPU/ANE side of the machine where it belongs.
Transformers.js is still there. On Windows, and whenever MLX is not the right fit, the app continues to use the JS runtime you already know. On Mac, MLX is now an option—and for many voices, the default you will want.
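The fallback described above can be sketched as a small selection function. This is an illustrative sketch only; the names `Engine`, `VoiceInfo`, and `pickEngine` are hypothetical, not the app's actual API:

```typescript
type Engine = "mlx" | "transformers-js";

interface VoiceInfo {
  id: string;
  supportsMlx: boolean; // e.g. voices with an MLX-backed model on macOS
}

// MLX is only wired up on macOS (Apple Silicon); every other
// platform/voice combination falls back to the portable
// Transformers.js path.
function pickEngine(platform: string, voice: VoiceInfo): Engine {
  if (platform === "darwin" && voice.supportsMlx) {
    return "mlx";
  }
  return "transformers-js";
}

console.log(pickEngine("darwin", { id: "marvis", supportsMlx: true })); // "mlx"
console.log(pickEngine("win32", { id: "marvis", supportsMlx: true })); // "transformers-js"
```

The point of the shape is that the voice, not just the OS, decides: a voice with no MLX backend still runs through Transformers.js even on a Mac.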
Standing on great shoulders
This direction was heavily inspired by mlx-audio-swift, a modular Swift SDK for running audio models on MLX. Seeing how that project structured TTS, codecs, and model loading on Apple Silicon helped us push our own stack further without reinventing the wheel.
Why this matters
Speed. Kokoro was already fast, but going through MLX directly unlocks a roughly 3× to 4× speedup for that path versus the old JS stack. For everyday reading and synchronized text highlighting, that difference is easy to miss. You were never waiting on audio the way you wait on a long export. For anyone turning whole books or long documents into audio files, the gain is not subtle. Shorter wall-clock time per chapter means practical audiobook-sized projects in reach without babysitting the machine all weekend.
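To make the export math concrete, here is a back-of-the-envelope sketch. The chapter count and per-chapter synthesis time are made-up numbers, not benchmarks; only the 3-4× range comes from the text above:

```typescript
// Illustrative: suppose a chapter took 10 minutes of synthesis on the
// old JS path, and MLX lands at 3.5x (midpoint of the rough 3-4x range).
const chapters = 30;
const minutesPerChapterJs = 10;
const speedup = 3.5;

const jsTotalHours = (chapters * minutesPerChapterJs) / 60; // 5 hours
const mlxTotalHours = jsTotalHours / speedup; // ~1.43 hours

console.log(jsTotalHours, mlxTotalHours.toFixed(2));
```

An all-weekend batch job shrinking to an afternoon is the difference between "export a chapter occasionally" and "export the whole book".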
Models. A faster runtime is not just about the voices you already had. It unblocks heavier models that would have been painful to run end-to-end in the old loop. In v0.1.72 we added Marvis TTS: a larger, more capable model that sounds more natural and more human than lightweight options, with the tradeoff that it is more expensive to run. On MLX, that tradeoff is manageable. At 1.0× or even 2× playback speed (depending on your device), you can still get real-time or near-real-time generation, so you can actually listen as you go, not only run batch exports. Note that Marvis is only available on macOS.
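"Real-time at a given playback speed" has a precise reading: the real-time factor (RTF), i.e. seconds of compute per second of audio, must stay under the playback budget. A minimal sketch with illustrative numbers (not Marvis measurements):

```typescript
// RTF = synthSeconds / audioSeconds. RTF <= 1 means generation keeps up
// with 1x playback; playing at 2x halves the budget to RTF <= 0.5.
function keepsUp(
  synthSeconds: number,
  audioSeconds: number,
  playbackRate: number
): boolean {
  const rtf = synthSeconds / audioSeconds;
  return rtf <= 1 / playbackRate;
}

console.log(keepsUp(6, 10, 1)); // true:  RTF 0.6 within the 1.0 budget
console.log(keepsUp(6, 10, 2)); // false: RTF 0.6 exceeds the 0.5 budget
```

This is why the same model can feel instant at 1.0× on one device and fall behind at 2× on another: the hardware sets the RTF, and the playback speed sets the budget.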
If you started with us on Kokoro TTS, you already know what a small, Apache-licensed model can do. MLX lets Kokoro stretch even further, and it gives headroom to add voices like Marvis without making “premium quality” feel like “walk away and come back tomorrow.”
Hear this post
Watch the WithAudio Web Companion extension read this article on YouTube, in the Ethan voice, which is backed by Marvis TTS in the WithAudio desktop app. It is a straight line from the new MLX-backed engine to what you are hearing in the tab.
Try it
To get started, update to v0.1.72 or later on macOS with Apple Silicon and choose a voice that uses the MLX engine. You’ll notice faster exports and smoother Marvis read-aloud sessions on supported devices. For platform details and tips on using different engines or languages, see the language setup guide. On Windows, the experience remains the same with Transformers.js for now.