When you send text to Rime’s TTS models, a normalization layer runs first. It expands numbers, dates, currency, phone numbers, measurements, and other non-standard words into their spoken form before the model synthesizes audio.
Rime handles text normalization automatically. Most common formats (currency with symbols, dates with years, clock times, phone numbers, and standard measurements) expand correctly without any preprocessing. Just write naturally. If something sounds wrong, debug it with `/textnorm` before adding a pre-processing layer to your application.
What’s handled, at a glance
Each category has a reference page with the full input/output list.

| Category | Examples | Reference |
|---|---|---|
| Numbers, currency, ranges, measurements | `$1,045.96`, `5kg`, `98°F`, `13-50`, `1/2`, `1e6`, `(213) 555-9274` | Numbers, currency, and measurements |
| Dates and times | `10/12/2024`, `2021-03-15`, `April 2, 2024`, `3:45pm`, `15:45`, `noon` | Dates and times |
| Addresses, URLs, emails | `529 Main St., Boston, MA 02129`, `https://app.rime.ai`, `name@example.com` | Addresses, URLs, and emails |
| Abbreviations, acronyms, initialisms | `Dr. Smith`, `e.g.`, `NASA`, `d. n. a.` | Abbreviations, acronyms, and initialisms |
| Symbols and percentages | `&`, `$`, `%`, `100%` | Symbols and percentages |
| Punctuation and prosody | `,`, `.`, `?`, `...` | Punctuation |

Forced letter-by-letter reading
For account numbers, confirmation codes, SKUs, and acronyms the normalizer doesn’t recognize, wrap the string in `spell(...)` to force letter-by-letter pronunciation. Works on both Arcana and Mist.
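
As a rough illustration, the wrapping can be done just before the text is sent for synthesis. The sketch below is a minimal example, not Rime's own tooling: the `CODE_PATTERN` regex and the `wrap_codes_in_spell` helper are hypothetical, and the pattern (6 to 10 uppercase letters and digits, with at least one of each) is only a stand-in for whatever your confirmation codes actually look like.

```python
import re

# Hypothetical code format: 6 to 10 uppercase letters/digits, with at least
# one letter and one digit. Adjust this to match your own identifiers.
CODE_PATTERN = re.compile(
    r"(?<!spell\()\b(?=[A-Z0-9]*\d)(?=[A-Z0-9]*[A-Z])[A-Z0-9]{6,10}\b"
)


def wrap_codes_in_spell(text: str) -> str:
    """Wrap every match of CODE_PATTERN in spell(...).

    The negative lookbehind skips codes that are already wrapped, so
    calling this twice on the same text does not double-wrap.
    """
    return CODE_PATTERN.sub(lambda m: f"spell({m.group(0)})", text)


if __name__ == "__main__":
    print(wrap_codes_in_spell("Your confirmation code is X7K92QF."))
```

Keeping the pattern narrow matters here: a broad pattern would also wrap ordinary numbers that the normalizer already reads correctly.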
Brand names, product names, and uncommon words
Rime’s models may not nail uncommon brand or product names on the first try. Two options:
- Submit the word to Rime to add to the dictionary (around 24 hours). The fastest way is the Speech QA dashboard, which surfaces out-of-vocabulary words from your traffic and lets you approve or correct them in one place. You can also reach the team directly at support@rime.ai or via your shared Slack channel.
- Use custom pronunciations inline with Rime’s phonetic alphabet and `phonemizeBetweenBrackets: true`. See Custom pronunciation for the full reference.
Feature availability: Arcana vs. Mist
The features below apply to the Mist family (v1, v2, v3) unless otherwise noted.

| Feature | Arcana | Mist |
|---|---|---|
| Native text normalization (numbers, currency, dates, etc.) | ✅ | ✅ |
| `spell()` for forced letter-by-letter | ✅ | ✅ |
| Punctuation-driven prosody | ✅ | ✅ |
| `pauseBetweenBrackets` (custom pause tags like `<750>`) | ❌ | ✅ |
| `phonemizeBetweenBrackets` (inline phonetic strings) | ❌ | ✅ |
| Deterministic per-term pronunciation config | ❌ | ✅ |
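
If your application switches between model families, a small guard can catch Mist-only options before a request goes out. This is a sketch under assumptions: the option names come from the table above, but the family labels `"arcana"` and `"mist"` and the `unsupported_options` helper are hypothetical and not part of Rime's API.

```python
# Options the table above marks as Mist-only.
MIST_ONLY_OPTIONS = {"pauseBetweenBrackets", "phonemizeBetweenBrackets"}

# Assumed family labels used by this sketch only.
KNOWN_FAMILIES = {"arcana", "mist"}


def unsupported_options(model_family: str, options: dict) -> list[str]:
    """Return the enabled options that the given model family does not support."""
    family = model_family.lower()
    if family not in KNOWN_FAMILIES:
        raise ValueError(f"unknown model family: {model_family!r}")
    if family == "mist":
        return []
    # For Arcana, report every Mist-only option that is switched on.
    return sorted(
        name for name, value in options.items()
        if name in MIST_ONLY_OPTIONS and value
    )


if __name__ == "__main__":
    print(unsupported_options("arcana", {"phonemizeBetweenBrackets": True}))
```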
Debugging with the textnorm endpoint
Rime exposes a `/textnorm` endpoint that returns the normalized form of an input string, which is exactly what the TTS model receives before synthesis. It’s the fastest way to separate normalization issues from synthesis issues.
Triage workflow
When something sounds off:
- Capture the exact input text that produced the bad output.
- POST it to `/textnorm` and look at the normalized output.
- Compare the normalized output to what you expected the model to say.
- If normalization is wrong, you have a reproducible signal. Flag it to Rime with the input, expected normalization, and actual normalization. Fixes ship on Rime’s side.
- If normalization looks correct but speech still sounds off, the issue is in synthesis, not normalization. Try a different voice, model version, or sampling settings.
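
The workflow above can be sketched as a small helper that records the three pieces of evidence and points at the likely culprit. Everything here is illustrative: `triage` is a hypothetical function, and the `normalize` argument stands in for whatever code you write to call `/textnorm` (its request and response shape are not shown on this page, so the sketch does not assume them). The case-insensitive comparison is also an assumption.

```python
from typing import Callable


def triage(input_text: str, expected: str, normalize: Callable[[str], str]) -> dict:
    """Compare expected and actual normalization for one input.

    `normalize` is supplied by the caller and should return the normalized
    string for `input_text` (for example, by calling /textnorm).
    """
    actual = normalize(input_text)
    # Assumption: ignore case and surrounding whitespace when comparing.
    matches = actual.strip().lower() == expected.strip().lower()
    return {
        "input": input_text,
        "expected_normalization": expected,
        "actual_normalization": actual,
        # A mismatch points at normalization; a match points at synthesis.
        "verdict": "synthesis" if matches else "normalization",
    }


if __name__ == "__main__":
    # Stub normalizer used only to demonstrate the report shape.
    report = triage("5kg", "five kilograms", lambda text: "five k g")
    print(report)
```

A `"normalization"` verdict gives you the input, expected, and actual strings to send to Rime; a `"synthesis"` verdict means the next step is trying a different voice, model version, or sampling settings.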
Testing checklist
Before going to production, test the voice against realistic versions of:
- Every date format your backend can produce (MM/DD, MM/DD/YYYY, ISO, relative like “tomorrow”).
- Every currency you’ll quote, including round numbers and fractional cents.
- The longest realistic phone number, account number, and confirmation code.
- Your top 20 most-spoken product or brand names.
- At least one utterance from each of: quote, greeting, confirmation, error, payment, scheduling.
- The same utterance regenerated 5 to 10 times. Checking consistency across regenerations is what catches sampling variance.
If anything sounds wrong, run the input through `/textnorm` to see what the model actually received, then flag it to Rime so the fix lands for everyone.
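
Part of this checklist can be generated rather than typed by hand. The sketch below covers only the date item and the regeneration count. The helper names `date_variants` and `build_test_inputs` and the `{date}` template placeholder are hypothetical, and relative forms such as “tomorrow” would need to be added as literal strings.

```python
from datetime import date


def date_variants(d: date) -> dict[str, str]:
    """Render one date in the absolute formats named in the checklist."""
    return {
        "MM/DD": d.strftime("%m/%d"),
        "MM/DD/YYYY": d.strftime("%m/%d/%Y"),
        "ISO": d.isoformat(),
    }


def build_test_inputs(template: str, d: date, regenerations: int = 5) -> list[tuple[str, int]]:
    """Return (utterance, times_to_regenerate) pairs, one per date format.

    `template` must contain a {date} placeholder.
    """
    if regenerations < 1:
        raise ValueError("regenerations must be at least 1")
    return [
        (template.format(date=rendered), regenerations)
        for rendered in date_variants(d).values()
    ]


if __name__ == "__main__":
    for utterance, times in build_test_inputs(
        "Your appointment is on {date}.", date(2024, 4, 2)
    ):
        print(times, utterance)
```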
Pre-normalizing in your application
For most applications, pre-normalizing is unnecessary and adds latency and engineering complexity. Rime’s normalizer handles common patterns natively, and the fastest way to fix a pronunciation issue is to verify with `/textnorm` and flag any miss to Rime.
If you have a specific pattern that genuinely needs to be pre-expanded — for example, an alphanumeric ID format unique to your domain, or a flow where regenerated text must read identically every time — see Pre-normalizing text for guidance and a drop-in prompt template.
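
For the narrow case of a domain-specific ID, pre-expansion can be a single deterministic substitution. The sketch below is an illustration only and is not taken from the Pre-normalizing text page: the `ORD-12345` format, the `expand_order_ids` helper, and the chosen spoken form (spaced letters followed by digit words) are all assumptions, so check the result with `/textnorm` and by ear before relying on it.

```python
import re

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# Hypothetical ID format: the literal prefix "ORD-" followed by five digits.
ORDER_ID = re.compile(r"\bORD-(\d{5})\b")


def expand_order_ids(text: str) -> str:
    """Replace each order ID with a fixed spoken form.

    The same input always yields the same output, which is the point of
    pre-expanding text that must read identically on every regeneration.
    """
    def spoken(match: re.Match) -> str:
        digits = " ".join(DIGIT_WORDS[ch] for ch in match.group(1))
        return f"O R D {digits}"

    return ORDER_ID.sub(spoken, text)


if __name__ == "__main__":
    print(expand_order_ids("Order ORD-48213 has shipped."))
```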
