Models - Rime Docs

Models are constantly being trained and fine-tuned based on user and customer feedback. Please check back often, as we push changes frequently.

Rime’s API is available in multiple regions. Use the regional endpoint closest to your deployment for the lowest latency.

Rime currently has five models in production: coda, arcanav3, arcanav2, mistv3, and mistv2. All are available via the cloud API and on-premises, and all stream audio in real time over HTTP and WebSockets.

Which model should I use?

Pick this	When
`coda`	Default for most new apps. Rime’s flagship — top-rated voice quality in human evaluations, sub-100ms model latency, and the recommended replacement for all Arcana traffic.
`arcana`	You need multilingual code-switching across languages that Coda doesn’t yet support. Otherwise, prefer Coda.
`mistv3`	You need the fastest TTFA (~37 ms P50).
`mistv2`	You need custom pronunciation control for brand names and uncommon words (not yet on Mist v3). For other use cases, prefer Mist v3.
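
The table above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not part of Rime's API: the function name, the flag names, and the order in which the flags are checked are all assumptions made for illustration. Only the returned model identifiers come from the table.

```python
def pick_model(
    needs_custom_pronunciation: bool = False,
    needs_lowest_ttfa: bool = False,
    needs_language_beyond_coda: bool = False,
) -> str:
    """Return a modelId following the selection table (illustrative only).

    The flags and their priority order are hypothetical; adjust them to
    match what matters most in your own application.
    """
    if needs_custom_pronunciation:
        # Pronunciation control is listed as not yet available on Mist v3.
        return "mistv2"
    if needs_lowest_ttfa:
        return "mistv3"
    if needs_language_beyond_coda:
        return "arcana"
    # Default for most new apps.
    return "coda"


print(pick_model())                        # no special requirements
print(pick_model(needs_lowest_ttfa=True))  # latency-sensitive case
```

Checking the most restrictive requirement first is one possible design choice: if an app needs both custom pronunciation and low latency, this sketch favors the model that has the feature over the model that is fastest.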

For benchmarked latency and throughput numbers, see Latency.

Feature matrix

Attribute	Coda	Arcana	Mist
Number of voices	184	94	94
Multilingual	✅	✅	❌
Text normalization	✅	✅	✅
`spell()` function	✅	✅	✅
Speed adjustment	✅	✅	✅
Pronunciation control	❌	❌	✅
Custom pauses	❌	❌	✅

Coda

Coda, released May 2026, is Rime’s new flagship TTS model and the successor to Arcana. It pairs a sophisticated LLM backbone with a dedicated speech inference engine, trained on the conversational full-duplex data preferred by production voice AI deployments.

In human-led voice-quality evaluations, Coda surpasses both prior Rime models and competitor TTS offerings on naturalness, prosody, and artifact-free output.
Sub-100ms model latency on the GPU engine when self-hosted or on-prem.
- Cloud API users add roughly 25–50ms network round-trip from most of the continental US when routed to the closest regional endpoint.
Multilingual support for English, Spanish, French, Portuguese, German, and Japanese using a shared voice lineup
Word-level timestamps for text-audio alignment and interruption handling
Supports the spell() function for spelling out sequences letter by letter or number by number
Available via modelId: coda through Rime’s API endpoints
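
As a rough illustration of selecting Coda, the sketch below builds a JSON request body in Python. Only `modelId: coda` is taken from this page. The `text` and `speaker` field names, the helper name, and the voice name `example_voice` are assumptions for illustration, so check the API reference for the exact request shape, endpoint, and authentication before sending anything.

```python
import json


def build_coda_request(text: str, speaker: str) -> str:
    """Serialize a request body that selects the Coda model.

    `text` and `speaker` are assumed field names; `modelId` is the
    parameter named on this page.
    """
    payload = {
        "modelId": "coda",
        "speaker": speaker,  # hypothetical voice identifier
        "text": text,
    }
    return json.dumps(payload)


body = build_coda_request("Hello from Coda.", "example_voice")
print(body)
```

The sketch stops at building the body so that it stays independent of any particular HTTP or WebSocket client.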

Arcana

Coda is meaningfully better than Arcana across naturalness, prosody, and artifact-free output. We recommend migrating all existing Arcana traffic to Coda — just swap modelId: arcana for modelId: coda in your requests.
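
The swap described above can be sketched as a small Python helper that rewrites an existing request payload. The helper name and the dictionary-based payload are assumptions for illustration. Dropping `temperature`, `top_p`, and `repetition_penalty` during migration is also an assumption, made because this page states that those controls apply to Arcana only; whether Coda rejects or ignores them is not stated here.

```python
# Controls this page lists as Arcana-only.
ARCANA_ONLY_CONTROLS = ("temperature", "top_p", "repetition_penalty")


def migrate_to_coda(payload: dict) -> dict:
    """Return a copy of `payload` that targets Coda instead of Arcana.

    Payloads for other models are returned unchanged (as a copy).
    """
    if payload.get("modelId") != "arcana":
        return dict(payload)
    # Drop Arcana-only controls, then swap the model identifier.
    migrated = {k: v for k, v in payload.items() if k not in ARCANA_ONLY_CONTROLS}
    migrated["modelId"] = "coda"
    return migrated


old = {"modelId": "arcana", "text": "Hello.", "temperature": 0.5}
print(migrate_to_coda(old))
```

The helper returns a new dictionary rather than mutating its input, so the original request can still be logged or replayed against Arcana for comparison.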

Arcana, released April 2025, is Rime’s previous flagship TTS model — known for naturalness and emotional depth in synthesized speech.

Highly expressive, natural-sounding speech with emotional nuance
Fine-grained control over prosody, pacing, and tone
Supports a wide range of vocal demographics, including different ages, accents, and cultural backgrounds
Enhanced realism for dynamic, conversational, and character-driven use cases
Supports the spell() function for spelling out sequences letter by letter or number by number
Available via modelId: arcana through Rime’s API endpoints

Mist v3

Mist v3, released March 2026, is a major update to the engine powering our classic Mist model.

Typical TTFB is now well below 100ms — a significant performance improvement over previous versions, achieved without sacrificing the quality and predictability of Mist
modelId: mistv3
Our most popular Mist speakers are all available — see the full voice list
speedAlpha behavior is reversed compared to Mist and Mist v2, bringing it to parity with Arcana: higher values produce faster speech

Mist v2

Mist v2, released February 2025, has the following features:

Multilingual support for English and Spanish, with more languages coming soon
More realistic speech with natural and contextual nuances
Advanced pronunciation control
Ultra-fast on-prem latency of ~70ms, perfect for real-time applications
More accents, demographics, and speaking styles

Mist (legacy)

Mist, released April 2023, was introduced as Rime’s next-generation TTS engine, capable of synthesizing conversational speech. To synthesize speech with this family of models, set the modelId parameter on Rime’s TTS endpoints to mistv2 or mist. As of February 2025, modelId defaults to mist when unspecified. Model v1, released in April 2022, has been deprecated.

Additional controls (Arcana only)

The controls in this section apply to Arcana only. Coda, Mist v3, and Mist v2 do not expose temperature, top_p, or repetition_penalty.

Arcana also supports several additional controls due to its LLM backbone. We recommend leaving these on the default values.

temperature: Controls the randomness of the generated speech.
- Low (0): Produces more predictable and focused speech.
- High (1+): Introduces variability in prosody and expression, potentially leading to more dynamic speech patterns.
repetition_penalty: Discourages the model from repeating the same sounds.
- Low (<1): May result in repetitive speech patterns.
- High (>1): Encourages variation, leading to more natural-sounding speech and realistic laughter.
top_p: Determines the diversity of choices by limiting the selection to a subset of probable sounds.
- Low (0): Restricts the model to the most probable sounds, resulting in more monotonic speech.
- High (1): Allows for a broader range of sound choices, enhancing the naturalness and variability of speech.
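
One way to follow the recommendation to keep the defaults is to send a control only when it is explicitly overridden. The Python sketch below does that. The helper name is hypothetical, and the range checks (`temperature` at least 0, `top_p` between 0 and 1, `repetition_penalty` above 0) are assumptions inferred from the low and high values described above, not documented limits.

```python
from typing import Optional


def arcana_controls(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
) -> dict:
    """Collect only the Arcana controls that were explicitly set.

    Anything left as None is omitted, so the service default applies.
    The bounds below are assumed, not taken from the API reference.
    """
    controls = {}
    if temperature is not None:
        if temperature < 0:
            raise ValueError("temperature must be >= 0")
        controls["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        controls["top_p"] = top_p
    if repetition_penalty is not None:
        if repetition_penalty <= 0:
            raise ValueError("repetition_penalty must be > 0")
        controls["repetition_penalty"] = repetition_penalty
    return controls


# Merge overrides into an Arcana request; an empty dict keeps all defaults.
request = {"modelId": "arcana", "text": "Hello.", **arcana_controls(temperature=0.7)}
print(request)
```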

See the Arcana API reference pages for more details.
