Models are constantly being trained and fine-tuned based on user and customer feedback. Please check back often, as we push changes frequently. Rime currently has five models in production: `coda`, `arcanav3`, `arcanav2`, `mistv3`, and `mistv2`. All are available via the cloud API and on-premises.
Which model should I use?
| Pick this | When |
|---|---|
| `coda` | Default for most new apps. Rime’s flagship: top-rated voice quality in human evaluations, sub-100ms model latency, and the recommended replacement for all Arcana traffic. |
| `arcana` | You need multilingual code-switching across languages that Coda doesn’t yet support. Otherwise, prefer Coda. |
| `mistv3` | You need the fastest TTFA (~37 ms P50). |
| `mistv2` | You need custom pronunciation control for brand names and uncommon words (not yet on Mist v3). For other use cases, prefer Mist v3. |
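
The table above can be read as a small decision procedure. The sketch below is one hedged way to encode it: the `Requirements` class, its field names, and the order in which the checks run are illustrative assumptions, not part of any Rime SDK. Only the returned `modelId` strings come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical requirement flags; these names are not a Rime API."""
    needs_custom_pronunciation: bool = False
    needs_lowest_ttfa: bool = False
    needs_code_switching_beyond_coda: bool = False


def pick_model(req: Requirements) -> str:
    """Return a modelId following the rows of the selection table.

    The priority order of the checks is an assumption: the table does
    not say which requirement wins when several apply at once.
    """
    if req.needs_custom_pronunciation:
        return "mistv2"   # custom pronunciation is not yet on Mist v3
    if req.needs_lowest_ttfa:
        return "mistv3"   # fastest time to first audio
    if req.needs_code_switching_beyond_coda:
        return "arcana"   # languages Coda does not yet support
    return "coda"         # default for most new apps


print(pick_model(Requirements()))
print(pick_model(Requirements(needs_lowest_ttfa=True)))
```

With no special requirements the function falls through to `coda`, matching the table's default recommendation.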
Feature matrix
| Attribute | Coda | Arcana | Mist |
|---|---|---|---|
| Number of voices | 184 | 94 | 94 |
| Multilingual | ✅ | ✅ | ❌ |
| Text normalization | ✅ | ✅ | ✅ |
| `spell()` function | ✅ | ✅ | ✅ |
| Speed adjustment | ✅ | ✅ | ✅ |
| Custom pronunciation (Speech QA) | ❌ | ❌ | ✅ |
| Custom pauses | ❌ | ❌ | ✅ |
Coda
Coda, released May 2026, is Rime’s new flagship TTS model and the successor to Arcana. It pairs a sophisticated LLM backbone with a dedicated speech inference engine, trained on the conversational full-duplex data preferred by production voice AI deployments.

- In human-led voice-quality evaluations, Coda surpasses both prior Rime models and competitor TTS offerings, including on naturalness, prosody, and artifact-free output
- Sub-100ms model latency on the GPU engine when self-hosted or on-prem.
- Cloud API users add roughly 25–50ms network round-trip from most of the continental US when routed to the closest regional endpoint.
- Multilingual support for English, Spanish, French, Portuguese, German, and Japanese using a shared voice lineup
- Word-level timestamps for text-audio alignment and interruption handling
- Supports the `spell()` function for spelling out sequences letter by letter or number by number
- Available via `modelId: coda` through Rime’s API endpoints
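
The latency figures above combine by simple addition for cloud users: model latency plus network round-trip. The following is a rough sketch of that arithmetic. The function name is hypothetical, and the sketch assumes the two components are the only contributors (it ignores client-side buffering and audio playback start-up, which the text does not quantify).

```python
from __future__ import annotations


def estimate_cloud_latency_ms(
    model_latency_ms: float,
    network_rtt_ms: tuple[float, float] = (25.0, 50.0),
) -> tuple[float, float]:
    """Return a (low, high) estimate of cloud latency in milliseconds.

    model_latency_ms: model latency on the GPU engine (sub-100ms for Coda).
    network_rtt_ms:   the 25-50ms round-trip range quoted for most of the
                      continental US; override it for other regions.
    """
    low_rtt, high_rtt = network_rtt_ms
    return (model_latency_ms + low_rtt, model_latency_ms + high_rtt)


# Using 100 ms as an upper bound for "sub-100ms" model latency.
print(estimate_cloud_latency_ms(100.0))  # (125.0, 150.0)
```

Treat the result as an upper-bound budget rather than a measurement; actual numbers depend on the regional endpoint you are routed to.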
Arcana
Arcana, released April 2025, is Rime’s previous flagship TTS model, known for naturalness and emotional depth in synthesized speech.

- Highly expressive, natural-sounding speech with emotional nuance
- Fine-grained control over prosody, pacing, and tone
- Supports a wide range of vocal demographics, including different ages, accents, and cultural backgrounds
- Enhanced realism for dynamic, conversational, and character-driven use cases
- Supports the `spell()` function for spelling out sequences letter by letter or number by number
- Available via `modelId: arcana` through Rime’s API endpoints
Mist v3
Mist v3, released March 2026, is a major update to the engine powering our classic Mist model.

- Typical TTFB is now well below 100ms, a significant performance improvement over previous versions, achieved without sacrificing the quality and predictability of Mist
- `modelId: mistv3`
- Our most popular Mist speakers are all available; see the full voice list
- `speedAlpha` behavior is reversed compared to Mist and Mist v2, bringing it to parity with Arcana: higher values produce faster speech
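
Because the direction of `speedAlpha` differs between model generations, it can help to make the direction explicit when migrating. The sketch below only encodes the direction stated above. It assumes `1.0` is the neutral value for every model, which this page does not state, and the helper name is hypothetical. It deliberately does not convert values between models, since the exact mapping is not documented here.

```python
# True  -> higher speedAlpha means faster speech (Mist v3, Arcana).
# False -> higher speedAlpha means slower speech (Mist, Mist v2).
HIGHER_IS_FASTER = {
    "mistv3": True,
    "arcana": True,
    "mistv2": False,
    "mist": False,
}

NEUTRAL_SPEED_ALPHA = 1.0  # assumption: 1.0 leaves the speed unchanged


def speed_effect(model_id: str, speed_alpha: float) -> str:
    """Describe whether a speedAlpha value speeds up or slows down speech."""
    if model_id not in HIGHER_IS_FASTER:
        raise ValueError(f"speedAlpha direction not known for {model_id!r}")
    if speed_alpha == NEUTRAL_SPEED_ALPHA:
        return "neutral"
    above_neutral = speed_alpha > NEUTRAL_SPEED_ALPHA
    return "faster" if above_neutral == HIGHER_IS_FASTER[model_id] else "slower"


print(speed_effect("mistv2", 1.2))  # slower
print(speed_effect("mistv3", 1.2))  # faster
```

A check like this can flag requests whose `speedAlpha` was tuned for Mist v2 and would have the opposite effect on Mist v3.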
Mist v2
Mist v2, released February 2025, has the following features:

- Multilingual support for English and Spanish, with more languages coming soon
- More realistic speech with natural and contextual nuances
- Advanced pronunciation control
- Ultra-fast on-prem latency of ~70ms, perfect for real-time applications
- More accents, demographics, and speaking styles
Mist (legacy)
Mist is Rime’s next-generation TTS engine, released April 2023, capable of synthesizing conversational speech. Setting the `modelId` parameter on Rime’s TTS endpoints to `mistv2` or `mist` synthesizes speech using this newer family of models. As of February 2025, the default value for `modelId` when unspecified is `mist`.
Model v1 was released in April 2022 and has been deprecated.
Additional controls (Arcana only)
The controls in this section apply to Arcana only. Coda, Mist v3, and Mist v2 do not expose `temperature`, `top_p`, or `repetition_penalty`.

- `temperature`: Controls the randomness of the generated speech.
  - Low (0): Produces more predictable and focused speech.
  - High (1+): Introduces variability in prosody and expression, potentially leading to more dynamic speech patterns.
- `repetition_penalty`: Discourages the model from repeating the same sounds.
  - Low (`<1`): May result in repetitive speech patterns.
  - High (`>1`): Encourages variation, leading to more natural-sounding speech and realistic laughter.
- `top_p`: Determines the diversity of choices by limiting the selection to a subset of probable sounds.
  - Low (0): Restricts the model to the most probable sounds, resulting in more monotonic speech.
  - High (1): Allows for a broader range of sound choices, enhancing the naturalness and variability of speech.
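
Since these controls are Arcana-only, a client may want to reject them early for other models. The following is a minimal sketch of such a guard. `modelId`, `temperature`, `top_p`, and `repetition_penalty` are the names used on this page; the `text` and `speaker` fields, the `example_voice` value, the `build_payload` helper, and the prefix check on the model ID are assumptions made for illustration and should be checked against the API reference.

```python
from typing import Any, Dict

# Control names taken from this section; they apply to Arcana only.
ARCANA_ONLY_CONTROLS = ("temperature", "top_p", "repetition_penalty")


def build_payload(text: str, speaker: str, model_id: str,
                  **controls: float) -> Dict[str, Any]:
    """Build a request body, allowing sampling controls only for Arcana.

    The "text" and "speaker" field names are assumptions; only "modelId"
    and the three control names appear on this page.
    """
    unknown = set(controls) - set(ARCANA_ONLY_CONTROLS)
    if unknown:
        raise ValueError(f"unknown controls: {sorted(unknown)}")
    # Assumption: every Arcana variant has a modelId starting with "arcana".
    if controls and not model_id.startswith("arcana"):
        raise ValueError(
            f"{sorted(controls)} apply to Arcana only, not {model_id!r}"
        )
    payload: Dict[str, Any] = {
        "text": text,
        "speaker": speaker,
        "modelId": model_id,
    }
    payload.update(controls)
    return payload


payload = build_payload(
    "Hello there.", "example_voice", "arcana",
    temperature=0.5, top_p=1.0, repetition_penalty=1.2,
)
print(payload)
```

Passing any of the three controls with `coda`, `mistv3`, or `mistv2` raises a `ValueError` instead of sending a request the model would not honor.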

