Models are constantly being trained and fine-tuned based on user and customer feedback. Please check back often, as we push changes frequently.
Rime’s API is available in multiple regions. Use the regional endpoint closest to your deployment for the lowest latency.
Rime currently has five models in production: coda, arcanav3, arcanav2, mistv3, and mistv2. All are available via the cloud API and on-premises.

Which model should I use?

Pick this | When
coda | Default for most new apps. Rime’s flagship: top-rated voice quality in human evaluations, sub-100ms model latency, and the recommended replacement for all Arcana traffic.
arcana | You need multilingual code-switching across languages that Coda doesn’t yet support. Otherwise, prefer Coda.
mistv3 | You need the fastest TTFA (~37 ms P50).
mistv2 | You need custom pronunciation control for brand names and uncommon words (not yet on Mist v3). For other use cases, prefer Mist v3.
For benchmarked latency and throughput numbers, see Latency.
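The selection table above can be sketched as a small helper. This is an illustrative sketch only: the function name, its flags, and the order in which the flags are checked are assumptions made for this example, not part of Rime's API.

```python
def pick_model(needs_custom_pronunciation: bool = False,
               needs_fastest_ttfa: bool = False,
               needs_language_coda_lacks: bool = False) -> str:
    """Return a modelId following the selection table.

    The precedence of the checks is an assumption; the table does not
    say what to do when several requirements apply at once.
    """
    if needs_custom_pronunciation:
        return "mistv2"   # custom pronunciation is not yet on Mist v3
    if needs_fastest_ttfa:
        return "mistv3"   # fastest TTFA
    if needs_language_coda_lacks:
        return "arcana"   # code-switching in languages Coda lacks
    return "coda"         # default for most new apps


print(pick_model())                          # coda
print(pick_model(needs_fastest_ttfa=True))   # mistv3
```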

Feature matrix

Attribute | Coda | Arcana | Mist
Number of voices1849494
Multilingual
Text normalization
spell() function
Speed adjustment
Custom pronunciation (Speech QA)
Custom pauses

Coda

Coda, released May 2026, is Rime’s new flagship TTS model and the successor to Arcana. It pairs a sophisticated LLM backbone with a dedicated speech inference engine, trained on the conversational full-duplex data preferred by production voice AI deployments.
  • In human-led voice-quality evaluations, Coda surpasses both prior Rime models and competitor TTS offerings on naturalness, prosody, and artifact-free output
  • Sub-100ms model latency on the GPU engine when self-hosted or on-prem.
    • Cloud API users add roughly 25–50ms network round-trip from most of the continental US when routed to the closest regional endpoint.
  • Multilingual support for English, Spanish, French, Portuguese, German, and Japanese using a shared voice lineup
  • Word-level timestamps for text-audio alignment and interruption handling
  • Supports the spell() function for spelling out sequences letter by letter or number by number
  • Available via modelId: coda through Rime’s API endpoints
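As a rough worked example of the latency figures in the list above, the sketch below adds the quoted network round-trip range to the model latency. It treats "sub-100ms" as a 100 ms ceiling and assumes the two components simply add; the 80 ms figure in the second call is a hypothetical measurement, not a benchmark.

```python
MODEL_LATENCY_CEILING_MS = 100   # "sub-100ms" model latency, used here as an upper bound
NETWORK_RTT_RANGE_MS = (25, 50)  # approximate cloud round-trip quoted above


def cloud_latency_bounds_ms(model_latency_ms=MODEL_LATENCY_CEILING_MS):
    """Return (low, high) estimates for cloud latency in milliseconds."""
    low_rtt, high_rtt = NETWORK_RTT_RANGE_MS
    return (model_latency_ms + low_rtt, model_latency_ms + high_rtt)


print(cloud_latency_bounds_ms())     # (125, 150)
print(cloud_latency_bounds_ms(80))   # (105, 130), for a hypothetical 80 ms model latency
```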

Arcana

Coda is meaningfully better than Arcana across naturalness, prosody, and artifact-free output. We recommend migrating all existing Arcana traffic to Coda — just swap modelId: arcana for modelId: coda in your requests.
Arcana, released April 2025, is Rime’s previous flagship TTS model — known for naturalness and emotional depth in synthesized speech.
  • Highly expressive, natural-sounding speech with emotional nuance
  • Fine-grained control over prosody, pacing, and tone
  • Supports a wide range of vocal demographics, including different ages, accents, and cultural backgrounds
  • Enhanced realism for dynamic, conversational, and character-driven use cases
  • Supports the spell() function for spelling out sequences letter by letter or number by number
  • Available via modelId: arcana through Rime’s API endpoints
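The migration recommended at the top of this section amounts to changing one field. A minimal sketch follows, assuming the request body is a plain dictionary; the "text" and "speaker" keys and the voice name are placeholders chosen for illustration. Dropping the Arcana-only controls is also an assumption, based on the note elsewhere on this page that Coda does not expose them.

```python
ARCANA_ONLY_CONTROLS = ("temperature", "top_p", "repetition_penalty")


def migrate_to_coda(request_body: dict) -> dict:
    """Return a copy of the request body with modelId arcana replaced by coda."""
    if request_body.get("modelId") != "arcana":
        return dict(request_body)  # leave non-Arcana requests unchanged
    # Assumption: controls that Coda does not expose are removed rather than sent.
    migrated = {k: v for k, v in request_body.items()
                if k not in ARCANA_ONLY_CONTROLS}
    migrated["modelId"] = "coda"
    return migrated


old_request = {
    "modelId": "arcana",
    "text": "Hello there.",        # placeholder field name
    "speaker": "example_voice",    # placeholder field name and voice
    "temperature": 0.7,
}
new_request = migrate_to_coda(old_request)
print(new_request["modelId"])  # coda
```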

Mist v3

Mist v3, released March 2026, is a major update to the engine powering our classic Mist model.
  • Typical TTFB is now well below 100ms — a significant performance improvement over previous versions, achieved without sacrificing the quality and predictability of Mist
  • modelId: mistv3
  • Our most popular Mist speakers are all available — see the full voice list
  • speedAlpha behavior is reversed compared to Mist and Mist v2, bringing it to parity with Arcana: higher values produce faster speech
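If you are carrying speedAlpha values over from Mist or Mist v2, the reversed direction means existing values need converting. The sketch below assumes speedAlpha behaves as a multiplicative factor centered on 1.0, so that a reciprocal maps one convention to the other. That mapping is an assumption and is not confirmed by this page, so verify the result by listening before relying on it.

```python
def convert_speed_alpha(legacy_speed_alpha: float) -> float:
    """Map a Mist / Mist v2 speedAlpha to the Mist v3 convention.

    Assumption: speedAlpha is a multiplicative factor around 1.0, so
    taking the reciprocal flips "lower is faster" into "higher is faster".
    """
    if legacy_speed_alpha <= 0:
        raise ValueError("speedAlpha must be positive")
    return 1.0 / legacy_speed_alpha


print(convert_speed_alpha(1.0))  # 1.0 (normal speed is unchanged)
print(convert_speed_alpha(0.5))  # 2.0
```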

Mist v2

Mist v2, released February 2025, has the following features:
  • Multilingual support for English and Spanish, with more languages coming soon
  • More realistic speech with natural and contextual nuances
  • Advanced pronunciation control
  • Ultra-fast on-prem latency of ~70ms, perfect for real-time applications
  • More accents, demographics, and speaking styles

Mist (legacy)

Mist, released April 2023, was introduced as Rime’s next-generation TTS engine, capable of synthesizing conversational speech. To use this family of models, set the modelId parameter on Rime’s TTS endpoints to mistv2 or mist. As of February 2025, modelId defaults to mist when unspecified. Model v1, released in April 2022, has been deprecated.
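Because an unspecified modelId falls back to mist, a request that omits the field uses the legacy model. The sketch below shows that fallback for a request body held as a dictionary; the helper name and the "text" key are hypothetical.

```python
DEFAULT_MODEL_ID = "mist"  # default stated above, as of February 2025


def effective_model_id(request_body: dict) -> str:
    """Return the model a request would use, applying the documented default."""
    return request_body.get("modelId", DEFAULT_MODEL_ID)


print(effective_model_id({"text": "Hi"}))                     # mist
print(effective_model_id({"text": "Hi", "modelId": "coda"}))  # coda
```

Setting modelId explicitly in every request avoids landing on the legacy model by accident.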

Additional controls (Arcana only)

The controls in this section apply to Arcana only. Coda, Mist v3, and Mist v2 do not expose temperature, top_p, or repetition_penalty.
Arcana also supports several additional controls due to its LLM backbone. We recommend leaving these on the default values.
  • temperature: Controls the randomness of the generated speech.
    • Low (0): Produces more predictable and focused speech.
    • High (1+): Introduces variability in prosody and expression, potentially leading to more dynamic speech patterns.
  • repetition_penalty: Discourages the model from repeating the same sounds.
    • Low (<1): May result in repetitive speech patterns.
    • High (>1): Encourages variation, leading to more natural-sounding speech and realistic laughter.
  • top_p: Determines the diversity of choices by limiting the selection to a subset of probable sounds.
    • Low (0): Restricts the model to the most probable sounds, resulting in more monotonic speech.
    • High (1): Allows for a broader range of sound choices, enhancing the naturalness and variability of speech.
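To build intuition for temperature and top_p, here is a generic illustration of how these two controls reshape a toy probability distribution in LLM-style sampling. It is not Arcana's implementation; the function names and the toy scores are made up for this example, and this version of temperature scaling requires a value above 0.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature (temperature must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Indices of the smallest high-probability set whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return sorted(kept)


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate sounds
sharp = apply_temperature(toy_logits, 0.1)   # low temperature: one option dominates
flat = apply_temperature(toy_logits, 10.0)   # high temperature: closer to uniform
probs = apply_temperature(toy_logits, 1.0)
print(top_p_filter(probs, 0.5))  # [0]
print(top_p_filter(probs, 0.9))  # [0, 1]
```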
See the Arcana API reference pages for more details.
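Since these controls apply to Arcana only, a client-side guard can catch them being attached to other models. The sketch below is a hypothetical helper, not part of any Rime SDK; only the three parameter names and the arcana modelId come from this page, and the "text" key is a placeholder.

```python
ARCANA_ONLY_CONTROLS = {"temperature", "top_p", "repetition_penalty"}


def with_sampling_controls(request_body: dict, **controls) -> dict:
    """Return a copy of the request body with Arcana sampling controls added."""
    unknown = set(controls) - ARCANA_ONLY_CONTROLS
    if unknown:
        raise ValueError(f"unsupported controls: {sorted(unknown)}")
    if controls and request_body.get("modelId") != "arcana":
        raise ValueError("sampling controls apply to modelId 'arcana' only")
    return {**request_body, **controls}


arcana_request = with_sampling_controls(
    {"modelId": "arcana", "text": "Hello."},  # "text" is a placeholder field
    temperature=0.5,
)
print(arcana_request["temperature"])  # 0.5
```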