This guide demonstrates how to build a real-time voice agent using Pipecat, Daily's open-source framework for building voice agents, with Rime providing natural-sounding speech synthesis. You can mix and match different services for each component of your Pipecat pipeline. This tutorial uses:
silero for voice activity detection (VAD)
gpt-4o-transcribe for speech-to-text (STT)
gpt-4o-mini for generating responses
rime for text-to-speech (TTS)
The result is a working voice agent that runs locally and opens in your browser. The guide uses the following Pipecat terminology:
A pipeline is a sequence of frame processors. Audio frames flow in, are transcribed, processed by the LLM, and synthesized into speech, then flow back out.
A transport handles real-time audio input and output (I/O). Pipecat supports multiple transports, including WebRTC (browser), WebSocket, and local audio devices.
Frame processors are the building blocks. Each service (STT, LLM, and TTS) is a processor that transforms frames as they flow through the pipeline.
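To build intuition for the frame-processor model, here is a framework-free toy sketch (illustrative only, not actual Pipecat code): each processor takes a frame, transforms it, and hands it to the next processor in the chain.

```python
# Toy illustration of a frame-processor pipeline (not actual Pipecat code).
# Each function stands in for a real service and transforms one frame.

def stt(frame):
    # Pretend transcription: audio frame -> text frame
    return {"type": "text", "text": frame["audio"].upper()}

def llm(frame):
    # Pretend response generation: text frame -> response frame
    return {"type": "response", "text": f"You said: {frame['text']}"}

def tts(frame):
    # Pretend synthesis: response frame -> audio frame
    return {"type": "audio", "audio": frame["text"].lower()}

def run_pipeline(processors, frame):
    # Push one frame through every processor in order.
    for processor in processors:
        frame = processor(frame)
    return frame

result = run_pipeline([stt, llm, tts], {"type": "audio", "audio": "hello"})
```

In real Pipecat, processors are asynchronous and stream many frames concurrently, but the ordering idea is the same: audio in, text out of STT, a reply out of the LLM, audio out of TTS.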
If you’d like to experiment directly with Rime’s TTS API before building a full voice agent, check out: TTS in five minutes.
Pipecat uses a plugin system where each service integration is a separate package. In this code, the extras in brackets ([openai,rime,silero,webrtc,runner]) install the following plugins:
openai adds STT and LLM services for transcription and generating responses.
rime adds a TTS service for synthesizing speech.
silero adds VAD for detecting when the user starts and stops speaking.
webrtc provides the transport for browser-based audio via WebRTC.
runner adds a development runner that handles server setup and WebRTC connections.
The pipecat-ai-small-webrtc-prebuilt package provides a ready-to-use browser client that connects to your agent. Then, install the dependencies by running this command:
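Based on the extras and packages named above, the install command is likely of this shape (verify the exact package spec against the Pipecat docs for your version):

```shell
pip install "pipecat-ai[openai,rime,silero,webrtc,runner]" pipecat-ai-small-webrtc-prebuilt
```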
Create an agent.py file to contain all the code that gets your agent talking. If you’re in a rush and just want to run it, skip to Step 3.5: Full agent code. Otherwise, continue reading to code the agent step-by-step.
Add the following configuration below the imports:
```python
SYSTEM_PROMPT = """You are a helpful voice assistant.
Keep your responses short and conversational - no more than 2-3 sentences.
Be friendly and natural."""
```
This system prompt defines your agent’s personality. It can be as simple or complex as you like. Later in the guide, you’ll see an example of a detailed system prompt that fully customizes the agent’s behavior.
The Pipecat runner automatically discovers any function named bot in your module. When a user connects via WebRTC, the runner calls this function and passes connection details through runner_args. Inside the bot function, add the WebRTC transport configuration:
```python
transport = SmallWebRTCTransport(
    runner_args.webrtc_connection,
    TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```
This creates the WebRTC transport and enables audio I/O as well as Silero VAD for detecting when the user starts and stops speaking. Next, add the AI services for transcription, response generation, and speech synthesis:
Transcription: Converting speech to text via the STT provider
User context: Aggregating the user’s message into the conversation history
LLM response: Generating a reply based on the conversation so far
Speech synthesis: Converting the LLM’s text response to audio via Rime TTS
Audio out: Streaming the synthesized speech back to the user
Assistant context: Recording the assistant’s response in the conversation history
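The stages above might be wired up as follows. This is a sketch, not a definitive implementation: it continues from the transport created earlier, and the exact module paths and constructor parameters vary across Pipecat versions, so check them against your installed release.

```python
# Sketch of the service setup and pipeline assembly.
# Module paths are assumptions based on recent Pipecat releases.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeTTSService

stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-transcribe")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
# For the arcana model, use RimeNonJsonTTSService instead (see Troubleshooting).
tts = RimeTTSService(api_key=os.getenv("RIME_API_KEY"))

# Seed the conversation history with the system prompt; the aggregator
# records both sides of the conversation as frames flow through.
context = OpenAILLMContext([{"role": "system", "content": SYSTEM_PROMPT}])
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # audio in from WebRTC
    stt,                             # transcription: speech -> text
    context_aggregator.user(),       # user context: record user message
    llm,                             # LLM response
    tts,                             # speech synthesis via Rime
    transport.output(),              # audio out to the user
    context_aggregator.assistant(),  # assistant context: record reply
])
```

Note how the pipeline order mirrors the list above, with the context aggregator appearing once on each side of the LLM.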
The context aggregator appears twice to capture both sides of the conversation. Finally, add the task runner and an event handler for greeting the user:
The on_client_connected event fires when a user connects to the agent. It appends a system message prompting the LLM to greet the user and triggers an immediate response with run_llm=True.
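A sketch of what the task and greeting handler might look like, assuming the pipeline built earlier and recent Pipecat frame names (adjust to your version):

```python
# Sketch: task setup plus a greeting on connect. Frame and class
# names are assumptions based on recent Pipecat releases.
from pipecat.frames.frames import LLMMessagesAppendFrame
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

task = PipelineTask(pipeline)

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Append a system message and trigger an immediate LLM turn
    # (run_llm=True) so the agent speaks first.
    await task.queue_frames([
        LLMMessagesAppendFrame(
            [{"role": "system", "content": "Greet the user and offer to help."}],
            run_llm=True,
        )
    ])

runner = PipelineRunner()
await runner.run(task)
```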
Pipecat’s main helper from pipecat.runner.run automatically:
Discovers the bot function in your module
Starts a FastAPI server with WebRTC endpoints
Serves a prebuilt browser client at /client
Sets up the WebRTC connection and passes the connection to your bot function
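Given that helper, the entry point of agent.py reduces to a few lines (a sketch assuming the development runner described above):

```python
# Hand control to Pipecat's development runner, which discovers
# the bot() function and serves the browser client at /client.
if __name__ == "__main__":
    from pipecat.runner.run import main
    main()
```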
When you run the agent, Pipecat starts a local HTTP server. Open the browser client to connect via WebRTC. The server runs locally, but the agent makes API calls to OpenAI and Rime.
Open a browser and navigate to http://localhost:7860/client. Allow microphone access when prompted. You can now talk to your agent using your microphone.
Create a new file called personality.py with the following content:
Example personality file
```python
SYSTEM_PROMPT = """CHARACTER:
You are Detective Marlowe, a world-weary noir detective from the 1940s who
somehow ended up as an AI assistant. You treat every question like it's a
case to be cracked and speak in dramatic, hard-boiled metaphors.

PERSONALITY:
- Cynical but secretly caring underneath the tough exterior
- Treats mundane tasks like high-stakes mysteries
- References your "years on the force" and "cases that still haunt you"
- Suspicious of technology but grudgingly impressed by it
- Has strong opinions about coffee and rain

SPEECH STYLE:
- Keep responses to 2-3 sentences maximum
- Use noir metaphors like "this code is messier than a speakeasy on a Saturday night"
- Dramatic pauses with "..." for effect
- Call the user "kid" or "pal" occasionally
- End with ominous or philosophical observations

RESTRICTIONS:
- Never break character
- Don't use emojis or special characters
- Stay family-friendly despite the noir tone"""

INTRO_MESSAGE = "The name's Marlowe... I've seen things that would make your code freeze, pal. So what case are you bringing to my desk tonight?"
```
Update your agent.py to import and use this prompt:
```python
from personality import SYSTEM_PROMPT, INTRO_MESSAGE
```
Then update the on_client_connected handler to use your custom intro message:
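One simple approach is sketched below. TTSSpeakFrame speaks a fixed line verbatim rather than asking the LLM to improvise a greeting; the frame name is an assumption based on recent Pipecat releases, so verify it against your version.

```python
# Sketch: speak the scripted intro line on connect instead of
# an LLM-generated greeting. TTSSpeakFrame bypasses the LLM and
# sends the text straight to TTS.
from pipecat.frames.frames import TTSSpeakFrame

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    await task.queue_frames([TTSSpeakFrame(INTRO_MESSAGE)])
```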
Storing your system prompt in a separate file keeps your personality configuration separate from your agent logic, making it easy to experiment with different characters.
Pipecat’s modular design makes it easy to swap components. Experiment with your agent by:
Replacing OpenAI with another STT provider, such as Deepgram or AssemblyAI
Using a different LLM, such as Anthropic, Gemini, or a local model
Switching transports to use WebSocket for server-to-server or Daily’s hosted rooms for production deployments
To learn more about the Pipecat framework, including its transport options, deployment patterns, and advanced features, browse the Pipecat documentation. View Rime's Pipecat demo agents for a ready-to-use multilingual agent example that switches languages dynamically.
Check your TTS service class: The arcana model requires RimeNonJsonTTSService. If you see WebSocket HTTP 400 errors in the logs, you may be using RimeTTSService (which is only compatible with models like mistv2).
Verify your Rime API key: Ensure the key is valid and has TTS permissions.