Text normalization

When you send text to Rime’s TTS models, a normalization layer runs first. It expands numbers, dates, currency, phone numbers, measurements, and other non-standard words into their spoken form before the model synthesizes audio.

Rime handles text normalization automatically. Most common formats (currency with symbols, dates with years, clock times, phone numbers, and standard measurements) expand correctly without any preprocessing, so you can write naturally. If something sounds wrong, debug it with /textnorm before adding a preprocessing layer to your application.

Handled at a glance

Click any category for the full input/output reference.

Category	Examples	Reference
Numbers, currency, ranges, measurements	`$1,045.96`, `5kg`, `98°F`, `13-50`, `1/2`, `1e6`, `(213) 555-9274`	Numbers, currency, and measurements
Dates and times	`10/12/2024`, `2021-03-15`, `April 2, 2024`, `3:45pm`, `15:45`, `noon`	Dates and times
Addresses, URLs, emails	`529 Main St., Boston, MA 02129`, `https://app.rime.ai`, `name@example.com`	Addresses, URLs, and emails
Abbreviations, acronyms, initialisms	`Dr. Smith`, `e.g.`, `NASA`, `d. n. a.`	Abbreviations, acronyms, and initialisms
Symbols and percentages	`&`, `$`, `%`, `100%`	Symbols and percentages
Punctuation and prosody	`,`, `.`, `?`, `...`	Punctuation

Forced letter-by-letter reading

For account numbers, confirmation codes, SKUs, and acronyms the normalizer doesn’t recognize, wrap the string in spell(...) to force letter-by-letter pronunciation. This works on Coda, Arcana, and Mist.

Input:  Your confirmation code is spell(PRM423GDDML2354).
Output: Your confirmation code is P R M, 4 2 3, G D D, M L, 2 3, 5 4.

For full reference, see Spell function.
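
If codes are interpolated into a template at runtime, a small helper can apply the wrapper consistently. This is a minimal sketch: `spell_wrap` is a hypothetical name, not part of any Rime SDK, and rejecting parentheses inside the code is an assumption made here so the wrapper stays unambiguous.

```python
def spell_wrap(code: str) -> str:
    """Wrap a code in spell(...) so it is read letter by letter."""
    code = code.strip()
    if not code:
        raise ValueError("nothing to spell")
    # Assumption: parentheses inside the argument would make the wrapper
    # ambiguous, so this sketch rejects them instead of guessing.
    if "(" in code or ")" in code:
        raise ValueError("code must not contain parentheses")
    return f"spell({code})"


text = f"Your confirmation code is {spell_wrap('PRM423GDDML2354')}."
```

Here `text` is the same input string as in the example above, ready to send to the TTS model.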

Brand names, product names, and uncommon words

Rime’s models may not nail uncommon brand or product names on the first try. Two options:

Submit the word to Rime to add to the dictionary (typically about a week). Reach out to your account manager via Slack or email, or contact sales@rime.ai — mention if you need faster turnaround or have an SLA.
Use custom pronunciations inline with the Rime phonetic alphabet and phonemizeBetweenBrackets: true. See Custom pronunciation for the full reference.

phonemizeBetweenBrackets works on Mist v1 and v2 only. It is not yet supported on Mist v3, Coda, or Arcana. For brand or product name pronunciations on those models, submit the word to Rime to add to the dictionary, respell phonetically in plain English (accepting that this is approximate), or use Mist v1/v2 for flows where pronunciation control matters.

For full reference, see Custom pronunciation. To check whether a word is already in Rime’s dictionary, use the Coverage API.

Feature availability across models

Feature	Coda	Arcana	Mist
Native text normalization (numbers, currency, dates, etc.)	✅	✅	✅
`spell()` for forced letter-by-letter	✅	✅	✅
Punctuation-driven prosody	✅	✅	✅
`pauseBetweenBrackets` (custom pause tags like `<750>`)	❌	❌	✅
`phonemizeBetweenBrackets` (inline phonetic strings)	❌	❌	v1, v2 only
Deterministic per-term pronunciation config	❌	❌	✅

Coda and Arcana both have parity with Mist for numbers, currency, and abbreviation expansion. For flows that need precise pause durations (legal disclaimers, regulated read-backs) or guaranteed pronunciation of brand and product names, Mist is the safer choice.
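
If an application switches models at runtime, the table above can be encoded as a lookup so unsupported options are caught before a request is sent. This is only a sketch: the feature keys and the `supports` function are hypothetical labels for illustration, not identifiers from the Rime API, and the values are transcribed from the table.

```python
from typing import Optional

# Transcribed from the availability table. A set of ints means
# "supported on these Mist versions only".
FEATURES = {
    "text_normalization": {"coda": True, "arcana": True, "mist": True},
    "spell": {"coda": True, "arcana": True, "mist": True},
    "punctuation_prosody": {"coda": True, "arcana": True, "mist": True},
    "pauseBetweenBrackets": {"coda": False, "arcana": False, "mist": True},
    "phonemizeBetweenBrackets": {"coda": False, "arcana": False, "mist": {1, 2}},
    "per_term_pronunciation": {"coda": False, "arcana": False, "mist": True},
}


def supports(feature: str, family: str, version: Optional[int] = None) -> bool:
    """Return True if the table lists the feature for this model family."""
    entry = FEATURES[feature][family.lower()]
    if isinstance(entry, bool):
        return entry
    # Version-restricted feature: an unknown version counts as unsupported.
    return version in entry
```

For example, `supports("phonemizeBetweenBrackets", "mist", 3)` is false, which signals that a dictionary submission or a plain-English respelling is needed instead.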

Debugging with the textnorm endpoint

To spot-check normalization without writing code, use the Generate page in the Rime web app: paste any input string and it shows the normalized version Rime will synthesize from. For programmatic debugging, Rime also exposes a /textnorm endpoint that returns the normalized form of an input string, which is exactly what the TTS model receives before synthesis. In scripts and pipelines, it is the quickest way to separate normalization issues from synthesis issues.

curl -X POST https://optimize.rime.ai/textnorm \
    -H "Authorization: Bearer $(rime key)" \
    -H "Content-Type: application/json" \
    -d '{"text":"1234 1,2,3,4 1-800-444-4141 "}'

{"normalized":"one two three four, one , two , three , four, one, eight hundred, four four four, four one four one"}

This endpoint covers Rime’s English text normalization. Output is the same regardless of which model you’ll synthesize with. For the full request and response reference, see the Text Normalization API.
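
The same call can be made from a script. The sketch below mirrors the curl request using only the Python standard library; the URL, headers, request body, and the `normalized` response key come from the example above, while the function names and the timeout are assumptions for illustration.

```python
import json
import urllib.request

TEXTNORM_URL = "https://optimize.rime.ai/textnorm"


def build_textnorm_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TEXTNORM_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def textnorm(text: str, api_key: str, timeout: float = 10.0) -> str:
    """Send the request and return the normalized string."""
    req = build_textnorm_request(text, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["normalized"]


# Usage (performs a network call, so it is left commented out):
# print(textnorm("1234 1,2,3,4 1-800-444-4141", api_key="YOUR_API_KEY"))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without touching the network.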

Triage workflow

When something sounds off:

Capture the exact input text that produced the bad output.
POST it to /textnorm and look at the normalized output.
Compare the normalized output to what you expected the model to say.
If normalization is wrong, you have a reproducible signal — flag it to Rime with the input, expected normalization, and actual normalization. Fixes ship on Rime’s side.
If normalization looks correct but speech still sounds off, the issue is in synthesis, not normalization — try a different voice, model version, or sampling settings.
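
The comparison step can be automated once you have the /textnorm output in hand. A minimal sketch, assuming you supply the expected spoken form yourself; the `triage` function and its return labels are hypothetical, and the whitespace handling is a guess at what counts as an insignificant difference.

```python
import re


def _squash(s: str) -> str:
    """Lowercase, drop spaces before punctuation, collapse whitespace."""
    s = re.sub(r"\s+([,.?!])", r"\1", s.lower())
    return " ".join(s.split())


def triage(expected: str, normalized: str) -> str:
    """Classify a bad-sounding utterance following the workflow above."""
    if _squash(expected) == _squash(normalized):
        # Normalization matches expectations, so look at voice, model
        # version, or sampling settings instead.
        return "synthesis"
    # Mismatch: report input, expected, and actual normalization to Rime.
    return "normalization"
```

A result of `"normalization"` means the input, the expected string, and the actual string together form the reproducible report described above.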

Testing checklist

Before going to production, test the voice against realistic versions of:

Every date format your backend can produce (MM/DD, MM/DD/YYYY, ISO, relative like “tomorrow”).
Every currency you’ll quote, including round numbers and fractional cents.
The longest realistic phone number, account number, and confirmation code.
Your top 20 most-spoken product or brand names.
At least one utterance from each of: quote, greeting, confirmation, error, payment, scheduling.
The same utterance regenerated 5 to 10 times. Comparing regenerations is what exposes sampling variance.

For anything that sounds wrong, POST the exact input to /textnorm to see what the model actually received, then flag it to Rime so the fix lands for everyone.
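
For the date item in the checklist, test inputs can be generated instead of written by hand. A small sketch, assuming your backend emits the formats named above; the `date_variants` helper is hypothetical, and the month name relies on Python's default (English) locale.

```python
from datetime import date


def date_variants(d: date) -> dict[str, str]:
    """Render one date in each format the checklist mentions."""
    return {
        "MM/DD": d.strftime("%m/%d"),
        "MM/DD/YYYY": d.strftime("%m/%d/%Y"),
        "ISO": d.isoformat(),
        # Long form, assuming the default locale yields English month names.
        "long": f"{d.strftime('%B')} {d.day}, {d.year}",
    }


variants = date_variants(date(2024, 4, 2))
```

Each value can then be dropped into a realistic sentence and checked with /textnorm or by ear. Relative dates such as “tomorrow” still need hand-written cases.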

Pre-normalizing in your application

For most applications, pre-normalizing is unnecessary and adds latency and engineering complexity. Rime’s normalizer handles common patterns natively, and the fastest way to fix a pronunciation issue is to verify with /textnorm and flag any miss to Rime. If you have a specific pattern that genuinely needs to be pre-expanded — for example, an alphanumeric ID format unique to your domain, or a flow where regenerated text must read identically every time — see Pre-normalizing text for guidance and a drop-in prompt template.
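
As an illustration of the narrow case where pre-expansion may be justified, the sketch below rewrites a made-up ID format into an explicit spoken form so it reads identically on every generation. The ID pattern (two uppercase letters, a hyphen, four digits) and the `expand_ids` helper are hypothetical; adapt the pattern to your own format and confirm the result with /textnorm.

```python
import re

# Hypothetical ID format for illustration: two uppercase letters,
# a hyphen, then four digits, e.g. "AB-1234".
ID_PATTERN = re.compile(r"\b([A-Z]{2})-(\d{4})\b")

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def expand_ids(text: str) -> str:
    """Replace each matching ID with spaced letters and digit words."""
    def repl(m: re.Match) -> str:
        letters = " ".join(m.group(1))
        digits = " ".join(DIGIT_WORDS[int(ch)] for ch in m.group(2))
        return f"{letters}, {digits}"

    return ID_PATTERN.sub(repl, text)


expanded = expand_ids("Order AB-1234 shipped.")
```

Text outside the matched IDs is passed through untouched, so Rime's normalizer still handles everything else.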
