As convenient as having an AI agent read your documents aloud seems, the actual experience of text-to-speech (TTS) is often marred by the stilted cadence of an obviously generated voice. This guide demonstrates how to configure an OpenClaw assistant to read documents aloud in a voice that doesn't suck to listen to. By adding Rime TTS to OpenClaw, you can convert any text to natural-sounding speech via instant messaging. Simply open a chat with your AI assistant, attach the text as a document or paste it into a message, and the bot returns a voice note in your desired mode of delivery: a verbatim reading, a summary, or a podcast discussion. Compare how the OpenClaw assistant delivers a podcast-style reading when it uses our custom Rime.ai reading skill versus its built-in TTS:
| Rime TTS | Default TTS |
|---|---|
| (audio sample) | (audio sample) |
## Prerequisites
To follow this guide, you need:

- OpenClaw
- FFmpeg
- Python 3
- A Rime API key
- A Telegram account
## Step 1: Create a Telegram bot and connect it to OpenClaw
This guide uses Telegram as the primary interface with OpenClaw, but you could easily adapt it to use your preferred messaging service. First, create a new bot using Telegram's BotFather:

- Open Telegram and search for @BotFather.
- Send /newbot and follow the prompts to choose a name and username.
- BotFather replies with your bot token, which looks like 123456789:ABCdefGHIjklMNOpqrsTUVwxyz.
Add the bot token to your ~/.openclaw/.env file:
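For example (the variable name TELEGRAM_BOT_TOKEN is an assumption; use whatever name your OpenClaw channel config expects):

```
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrsTUVwxyz
```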
Then enable the Telegram channel in your ~/.openclaw/openclaw.json file:
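A sketch of the channel entry (the key names here are assumptions; match them to your OpenClaw version's config schema):

```json
{
  "channels": {
    "telegram": {
      "enabled": true,
      "botToken": "123456789:ABCdefGHIjklMNOpqrsTUVwxyz"
    }
  }
}
```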
When you first message the bot, it replies with an access not configured message containing an access code. Copy the access code and run the following command in your terminal to pair the bot with OpenClaw:
## Step 2: Add your Rime API key
OpenClaw reads environment variables from the ~/.openclaw/.env file. Add your Rime API key to it:
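For example (RIME_API_KEY is an assumed variable name; use whichever name the skill actually reads):

```
RIME_API_KEY=your-rime-api-key
```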
## Step 3: Disable OpenClaw's built-in TTS
OpenClaw has a built-in TTS system that the assistant uses by default. We need to disable the built-in TTS so that OpenClaw instead uses the new Rime skill we are adding. Update your openclaw.json file as follows:
- Turns off auto-TTS so the built-in pipeline doesn't generate audio automatically
- Disables Edge TTS so it can't be used as a fallback
- Denies the built-in tts tool so the LLM can't call it directly
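In JSON, those three changes might look roughly like the following (the key names are assumptions; check your OpenClaw version's config reference for the exact schema):

```json
{
  "tts": {
    "auto": false,
    "providers": {
      "edge": { "enabled": false }
    }
  },
  "tools": {
    "deny": ["tts"]
  }
}
```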
## Step 4: Install the rime-reader skill
The rime-reader skill reads documents aloud in three modes:
- In verbatim mode, it reads the document aloud, word for word, in your chosen voice.
- In summary mode, it summarizes the document's content in your chosen voice.
- In podcast mode, two AI hosts, each with a different voice, summarize and discuss the content.
Install the rime-reader skill by cloning it from the following repository into your ~/.openclaw/skills/ directory:
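The clone command takes the shape below; substitute the skill's actual repository URL for the placeholder:

```shell
git clone <repository-url> ~/.openclaw/skills/rime-reader
```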
The skill contains a Python script (rime.py) that handles all three modes and a SKILL.md that teaches the LLM how to use it.
### How rime.py works
The script has three modes, driven by the following command-line arguments:

- A file path for document reading
- --text for a single utterance
- --segments for podcast
#### Chunking
In verbatim and summary mode, rime.py uses chunking to break long text into sentence-aligned chunks of roughly 400 characters each. This ensures that no single API call is too large.
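The script's internals aren't reproduced here; the following is a minimal sketch of sentence-aligned chunking under a 400-character budget (function and argument names are illustrative, not the script's actual API):

```python
import re

def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text into sentence-aligned chunks of roughly max_chars each."""
    # Naive sentence split: break after terminal punctuation + whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if appending this sentence would exceed the budget.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Because this splits only at sentence boundaries, a single sentence longer than max_chars would still become its own oversized chunk; a production version needs a fallback split for that case.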
#### Synthesis
The script then sends the chunks to the Rime API, which synthesizes them and returns raw audio bytes.

#### Stitching
The script concatenates the bytes from each chunk, generating silences between them, and stitches them all into a single bytearray.
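A sketch of that stitch step, assuming the raw bytes are signed 16-bit mono PCM (where zeroed bytes decode as silence); the default sample rate and gap length are illustrative:

```python
def stitch(chunks: list[bytes], sample_rate: int = 24000,
           gap_seconds: float = 0.3) -> bytearray:
    """Concatenate raw PCM chunks with a short silence between each pair."""
    # For signed 16-bit mono PCM, silence is just zeroed bytes:
    # 2 bytes per sample * number of samples in the gap.
    gap = bytes(int(sample_rate * gap_seconds) * 2)
    audio = bytearray()
    for i, chunk in enumerate(chunks):
        if i > 0:
            audio.extend(gap)
        audio.extend(chunk)
    return audio
```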
You can specify a voice for each segment in podcast mode:
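The exact payload format for --segments isn't shown in this excerpt; as an illustrative sketch, a per-segment voice assignment might look like this (the field names are assumptions):

```json
[
  { "voice": "lyra", "text": "Welcome back! Today we're walking through the document." },
  { "voice": "truss", "text": "Right, and there's a lot to cover, so let's dive in." }
]
```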
#### Encoding
Then, rime.py encodes the bytearray by making an ffmpeg call that converts the raw audio buffer to OGG Opus, the format that Telegram expects:
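A sketch of that conversion, again assuming signed 16-bit mono PCM input; the sample rate and the exact flags the real script passes are assumptions:

```python
import subprocess

def build_ffmpeg_cmd(out_path: str, sample_rate: int = 24000) -> list[str]:
    """Build an ffmpeg command that reads raw PCM on stdin and writes OGG Opus."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-f", "s16le",            # input format: signed 16-bit little-endian PCM
        "-ar", str(sample_rate),  # input sample rate
        "-ac", "1",               # one channel (mono)
        "-i", "-",                # read the raw buffer from stdin
        "-c:a", "libopus",        # encode the audio stream with Opus
        out_path,                 # .ogg output path
    ]

def encode_to_ogg(raw_audio: bytes, out_path: str) -> str:
    """Pipe the stitched bytearray through ffmpeg and return the output path."""
    subprocess.run(build_ffmpeg_cmd(out_path), input=raw_audio, check=True)
    return out_path
```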
Finally, the script prints the .ogg path to stdout. The LLM reads this and uses it in a MEDIA: directive with [[audio_as_voice]] to deliver it as a Telegram voice note bubble.
## Step 5: Register the skill and configure the personality
Enable the skill in ~/.openclaw/openclaw.json:
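One plausible shape for the skill entry (the key names are assumptions; match them to your OpenClaw config schema):

```json
{
  "skills": {
    "rime-reader": { "enabled": true }
  }
}
```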
### Personality (SOUL.md)
The ~/.openclaw/workspace/SOUL.md file configures OpenClaw's agent personality. The LLM reads the file at the start of every session.
Add the following Document Reading section to your SOUL.md file. Without it, the bot skips the rime-reader skill and generates audio using whichever TTS model it finds first. Since we've disabled the default TTS model, it would fail to generate any audio and fall back to replying in text.
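The exact wording is yours to write; a minimal sketch of such a section, mirroring the flow tested in Step 6, could be:

```markdown
## Document Reading

When the user sends a document or pastes text to be read aloud:

1. Ask which delivery mode they want: verbatim, summary, or podcast.
2. Ask which voice to use.
3. Run the rime-reader skill with the chosen mode and voice.
4. Reply with a MEDIA: directive pointing at the generated .ogg file,
   tagged [[audio_as_voice]], so it arrives as a Telegram voice note.
```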
## Step 6: Test the flow
Restart the gateway so that you can test the document reading flow, then start a fresh session by sending /new to your bot.
Then send a document or paste text in the chat and ask the bot to read it.
The bot should ask you to choose from the three delivery modes: verbatim, summary, or podcast.
Choose the verbatim mode.
Next, it should prompt you to pick a voice.
Once you've selected a voice, you should receive a voice note of your text.
## Tuning
The bot's behavior is driven by SOUL.md, which means you can reshape it. Just edit the file or tell the bot, directly in your Telegram chat, to update it for you.
Consider how you can tweak various aspects of your OpenClaw assistant:
### Voice
You can select a default voice for your OpenClaw assistant by editing SOUL.md or sending a Telegram message telling the bot to "Use Transom next time."
You can use any of the available Arcana voices: atrium, lyra, transom, parapet, fern, thalassa, truss, sirius, eliphas, lintel, or one of the many others listed on Rime's Voices page.
### Podcast personality
The LLM writes the podcast script before synthesizing it, so you can steer the tone. Try adding a line such as the following to the Document Reading section of your SOUL.md:
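For example (the wording is illustrative):

```markdown
In podcast mode, keep the hosts playful: have them briefly banter and
disagree before summarizing each section.
```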
Alternatively, you can skip editing SOUL.md entirely and just tell the bot to "Make the podcast hosts argue like an old married couple." The LLM will adapt the script on the fly.
### Skip the prompts
If you always want the same voice and delivery mode, you can hardcode them in SOUL.md to skip the bot prompts.
For example, you could replace the first two steps with the following instruction:
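A sketch of such an instruction (the wording is illustrative; the lintel voice is one of the Arcana voices listed above):

```markdown
Always read documents in verbatim mode using the lintel voice.
Do not ask the user to choose a mode or voice.
```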
Because the LLM reads SOUL.md afresh every session, your changes take effect immediately after you send /new in Telegram.
