This guide focuses on Coda, but the same patterns apply to the Arcana and Mist model families, which share Coda’s normalizer and spell() function.
Sound like a person
The goal here isn’t to fool anyone into thinking they’re talking to a human. It’s to make callers feel comfortable enough to talk naturally, which leads to better outcomes than a stiff, overly formal agent. Real speech meanders: fillers, restarts, soft pauses, the occasional “yeah, no.” Coda doesn’t accept SSML: no <break>, no <emotion>, no inline tags except spell(). That’s by design. Rime’s philosophy is to keep things as simple as possible.
Coda reads the semantic content of what you send and shapes its emotional delivery accordingly, so the only levers you need are word choice and punctuation. See Punctuation for the specific cues: exclamation marks and interrobangs for excitement, commas and ellipses for pacing.
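Because Coda takes no SSML, it can help to strip any tags an LLM emits out of habit before the text reaches the voice model. The following is a minimal sketch, not part of Rime’s tooling: `strip_markup` is a hypothetical helper name, and the regular expression is an assumption about what such stray tags look like. It leaves spell() calls untouched because they are not angle-bracket tags.

```python
import re

# Anything shaped like an XML/SSML tag, e.g. <break time="300ms"/> or </emotion>.
# Requiring a letter right after "<" avoids touching text such as "a < b".
_TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def strip_markup(text: str) -> str:
    """Remove SSML-style tags and tidy the whitespace they leave behind."""
    without_tags = _TAG.sub(" ", text)
    collapsed = re.sub(r"\s+", " ", without_tags).strip()
    # Drop a space stranded before a single closing punctuation mark.
    return re.sub(r" (?=[,.!?;:](?:\s|$))", "", collapsed)

print(strip_markup('Okay <break time="300ms"/> one sec.'))
```

Pacing that the removed tag was meant to express can then be carried by punctuation and word choice instead.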
Show, don’t tell
“Be conversational” doesn’t work as an instruction. Give the model concrete examples to pattern-match against in your system prompt. These pairs aren’t universal good/bad; the right register depends on call type, persona, and caller. But LLM defaults rarely sound natural, and best-practice patterns produce more realistic audio in most cases. Even on a formal call, people stumble and reach for filler words occasionally. A little of that texture goes a long way.

| Typical LLM output | Best practice example |
|---|---|
| "I can certainly assist you with that inquiry." | "Yeah, I can help with that. One sec." |
| "Unfortunately, I am required to inform you that your request cannot be processed at this time." | "So… I’m not going to be able to do that today. Here’s what I can do instead." |
| "I will now transfer you to the appropriate department for further assistance." | "Okay, one moment. I’m going to grab someone who can take this from here." |
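One way to get pairs like these into a system prompt is to keep them as data and render them into a short section of text. The sketch below assumes that approach; `EXAMPLE_PAIRS`, `render_examples`, and the wording of the rendered header are all hypothetical, and the layout is only one of many that could work.

```python
# Each tuple is (typical LLM output, preferred spoken phrasing), taken from the table above.
EXAMPLE_PAIRS = [
    ("I can certainly assist you with that inquiry.",
     "Yeah, I can help with that. One sec."),
    ("Unfortunately, I am required to inform you that your request cannot be processed at this time.",
     "So… I’m not going to be able to do that today. Here’s what I can do instead."),
    ("I will now transfer you to the appropriate department for further assistance.",
     "Okay, one moment. I’m going to grab someone who can take this from here."),
]

def render_examples(pairs):
    """Render phrasing pairs as a plain-text block for a system prompt."""
    lines = ["Prefer the second phrasing in each pair when it fits the call:"]
    for stiff, natural in pairs:
        lines.append(f'- Instead of: "{stiff}"')
        lines.append(f'  Say: "{natural}"')
    return "\n".join(lines)

print(render_examples(EXAMPLE_PAIRS))
```

Keeping the pairs in a list makes it easy to swap in examples that match a different call type or persona.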
Disfluencies in the text itself
Have the model use “um,” “uh,” “so,” “yeah,” and “well” where a person would actually hesitate. Don’t reach for tags; the disfluency is the prompt. Sprinkle, don’t stack: two “um”s in a row reads as a bug.
Punctuation is your only prosody tool
- Comma. Short internal pause with a slight rise.
- Period. Sentence end, falling pitch.
- Question mark. Rising intonation.
- Ellipsis. Hesitant or trailing pause. Use sparingly.
- Semicolon. Somewhere between a comma and a period.
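The “sprinkle, don’t stack” and “use sparingly” guidance can be checked mechanically before text is sent for synthesis. The sketch below is one possible lint, not a Rime feature: `lint_utterance` is a hypothetical name, the filler list covers only “um” and “uh”, and the default limit of one ellipsis per utterance is an assumed threshold you would tune.

```python
import re

FILLERS = ("um", "uh")  # hesitation sounds to watch for; extend as needed
_ALT = "|".join(FILLERS)

# Two fillers with nothing but spaces or punctuation between them, e.g. "um, um".
_STACKED = re.compile(rf"\b(?:{_ALT})\b[\s,.…]*\b(?:{_ALT})\b", re.IGNORECASE)
# Either three literal dots or the single ellipsis character.
_ELLIPSIS = re.compile(r"\.{3}|…")

def lint_utterance(text: str, max_ellipses: int = 1) -> list[str]:
    """Return warnings for stacked fillers and overused ellipses."""
    warnings = []
    if _STACKED.search(text):
        warnings.append("stacked fillers")
    if len(_ELLIPSIS.findall(text)) > max_ellipses:
        warnings.append("too many ellipses")
    return warnings

print(lint_utterance("Um, um, let me check."))
```

A warning is a prompt to regenerate or edit the line; the check does not rewrite anything itself.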
Personality as audible behavior
Replace adjectives like “friendly” or “warm” with observable speech patterns the model can imitate. “Friendly” is interpretation; “starts sentences with ‘yeah’” is instruction. Maintain a calm, even baseline. Save exclamation marks for moments that actually warrant them. A real support agent isn’t excited about every line.
Normalize cleanly
Rime’s normalizer handles most common formats natively: currency with symbols, full dates, clock times with minutes, phone numbers, percentages, and standard measurements. Pass those through as-is and pre-expand only the formats the normalizer does not cover. See Text normalization and Pre-normalization for the full reference. If you are unsure about a format, run it through the /textnorm endpoint before shipping to confirm how Rime will read it.
Use spell() for IDs
When something needs to be read letter-by-letter (confirmation codes, account numbers, SKUs, vanity phone letters), wrap it in spell(). The function groups characters into chunks of three (or two) and handles symbols like @ and -.
Don’t use spell() for standard phone numbers (digit grouping is more natural without it) or for real words that happen to be uppercase. Avoid dashes inside numeric IDs; they cause awkward pauses. Use spaces or spell() instead.
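The wrapping rules above can be applied in code at the point where an ID is inserted into the agent’s reply. This is a minimal sketch under two assumptions: that spell() takes its argument inline and unquoted, as in `spell(XK7-42Q)`, and that removing the dashes from a purely numeric ID before wrapping is an acceptable reading of “use spell() instead”. `speakable_id` is a hypothetical helper name.

```python
import re

def speakable_id(value: str) -> str:
    """Wrap an identifier in spell() so it is read character by character."""
    value = value.strip()
    if re.fullmatch(r"[\d\-]+", value):
        # Purely numeric ID: drop the dashes, which can cause awkward pauses.
        value = value.replace("-", "")
    # Mixed IDs keep their symbols; the text above says spell() handles "@" and "-".
    return f"spell({value})"

print(speakable_id("4821-9930"))
print(speakable_id(" XK7-42Q "))
```

Call a helper like this only for confirmation codes, account numbers, and SKUs; standard phone numbers and ordinary uppercase words should go through unwrapped.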
Drop-in system prompt
A complete system prompt that bakes in all of the above. Paste it into your LLM’s system message and adapt it to your agent’s persona.
voice-system-prompt.md

