This guide demonstrates how to build a real-time voice agent using Pipecat, Daily's open-source framework for building voice agents, with Rime providing natural-sounding speech synthesis. You can mix and match different services for each component of your Pipecat pipeline. This tutorial uses:
silero for voice activity detection (VAD)
gpt-4o-transcribe for speech-to-text (STT)
gpt-4o-mini for generating responses
rime for text-to-speech (TTS)
The result is a working voice agent that runs locally and opens in your browser. The guide uses the following Pipecat terminology:
A pipeline is a sequence of frame processors. Audio frames flow in, are transcribed, processed by the LLM, and synthesized into speech, then flow back out.
A transport handles real-time audio input and output (I/O). Pipecat supports multiple transports, including WebRTC (browser), WebSocket, and local audio devices.
Frame processors are the building blocks. Each service (STT, LLM, and TTS) is a processor that transforms frames as they flow through the pipeline.
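To build intuition for the frame-processor model, here is a framework-free toy sketch (illustrative only, not actual Pipecat code): each processor takes a frame, transforms it, and hands it to the next processor in the chain.

```python
# Toy illustration of a frame-processor pipeline (not actual Pipecat code).
# Each function stands in for a real service and transforms one frame.

def stt(frame):
    # Pretend transcription: audio frame -> text frame
    return {"type": "text", "text": frame["audio"].upper()}

def llm(frame):
    # Pretend response generation: text frame -> response frame
    return {"type": "response", "text": f"You said: {frame['text']}"}

def tts(frame):
    # Pretend synthesis: response frame -> audio frame
    return {"type": "audio", "audio": frame["text"].lower()}

def run_pipeline(processors, frame):
    # Push one frame through every processor in order.
    for processor in processors:
        frame = processor(frame)
    return frame

result = run_pipeline([stt, llm, tts], {"type": "audio", "audio": "hello"})
```

In real Pipecat, processors are asynchronous and stream many frames concurrently, but the ordering idea is the same: audio in, text out of STT, a reply out of the LLM, audio out of TTS.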
If you’d like to experiment directly with Rime’s TTS API before building a full voice agent, check out: TTS in five minutes.
Pipecat uses a plugin system where each service integration is a separate package. In this code, the extras in brackets ([openai,rime,silero,webrtc,runner]) install the following plugins:
openai adds STT and LLM services for transcription and generating responses.
rime adds a TTS service for synthesizing speech.
silero adds VAD for detecting when the user starts and stops speaking.
webrtc provides the transport for browser-based audio via WebRTC.
runner adds a development runner that handles server setup and WebRTC connections.
The pipecat-ai-small-webrtc-prebuilt package provides a ready-to-use browser client that connects to your agent. Then, install the dependencies by running this command:
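Based on the extras and packages named above, the install command is likely of this shape (verify the exact package spec against the Pipecat docs for your version):

```shell
pip install "pipecat-ai[openai,rime,silero,webrtc,runner]" pipecat-ai-small-webrtc-prebuilt
```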
Create an agent.py file to contain all the code that gets your agent talking. If you’re in a rush and just want to run it, skip to Step 3.5: Full agent code. Otherwise, continue reading to code the agent step-by-step.
Add the following configuration below the imports:
```python
SYSTEM_PROMPT = """You are a helpful voice assistant.
Keep your responses short and conversational - no more than 2-3 sentences.
Be friendly and natural."""
```
This system prompt defines your agent’s personality. It can be as simple or complex as you like. Later in the guide, you’ll see an example of a detailed system prompt that fully customizes the agent’s behavior.
The Pipecat runner automatically discovers any function named bot in your module. When a user connects via WebRTC, the runner calls this function and passes connection details through runner_args. Inside the bot function, add the WebRTC transport configuration:
```python
transport = SmallWebRTCTransport(
    runner_args.webrtc_connection,
    TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```
This creates the WebRTC transport and enables audio I/O as well as Silero VAD for detecting when the user starts and stops speaking. Next, add the AI services for transcription, response generation, and speech synthesis:
Transcription: Converting speech to text via the STT provider
User context: Aggregating the user’s message into the conversation history
LLM response: Generating a reply based on the conversation so far
Speech synthesis: Converting the LLM’s text response to audio via Rime TTS
Audio out: Streaming the synthesized speech back to the user
Assistant context: Recording the assistant’s response in the conversation history
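The stages above might be wired up as follows. This is a sketch, not a definitive implementation: it continues from the transport created earlier, and the exact module paths and constructor parameters vary across Pipecat versions, so check them against your installed release.

```python
# Sketch of the service setup and pipeline assembly.
# Module paths are assumptions based on recent Pipecat releases.
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.services.rime.tts import RimeTTSService

stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-transcribe")
llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o-mini")
# For the arcana model, use RimeNonJsonTTSService instead (see Troubleshooting).
tts = RimeTTSService(api_key=os.getenv("RIME_API_KEY"))

# Seed the conversation history with the system prompt; the aggregator
# records both sides of the conversation as frames flow through.
context = OpenAILLMContext([{"role": "system", "content": SYSTEM_PROMPT}])
context_aggregator = llm.create_context_aggregator(context)

pipeline = Pipeline([
    transport.input(),               # audio in from WebRTC
    stt,                             # transcription: speech -> text
    context_aggregator.user(),       # user context: record user message
    llm,                             # LLM response
    tts,                             # speech synthesis via Rime
    transport.output(),              # audio out to the user
    context_aggregator.assistant(),  # assistant context: record reply
])
```

Note how the pipeline order mirrors the list above, with the context aggregator appearing once on each side of the LLM.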
The context aggregator appears twice to capture both sides of the conversation. Finally, add the task runner and an event handler for greeting the user:
The on_client_connected event fires when a user connects to the agent. It appends a system message prompting the LLM to greet the user and triggers an immediate response with run_llm=True.
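A sketch of what the task and greeting handler might look like, assuming the pipeline built earlier and recent Pipecat frame names (adjust to your version):

```python
# Sketch: task setup plus a greeting on connect. Frame and class
# names are assumptions based on recent Pipecat releases.
from pipecat.frames.frames import LLMMessagesAppendFrame
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask

task = PipelineTask(pipeline)

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    # Append a system message and trigger an immediate LLM turn
    # (run_llm=True) so the agent speaks first.
    await task.queue_frames([
        LLMMessagesAppendFrame(
            [{"role": "system", "content": "Greet the user and offer to help."}],
            run_llm=True,
        )
    ])

runner = PipelineRunner()
await runner.run(task)
```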
Pipecat’s main helper from pipecat.runner.run automatically:
Discovers the bot function in your module
Starts a FastAPI server with WebRTC endpoints
Serves a prebuilt browser client at /client
Sets up the WebRTC connection and passes the connection to your bot function
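Given that helper, the entry point of agent.py reduces to a few lines (a sketch assuming the development runner described above):

```python
# Hand control to Pipecat's development runner, which discovers
# the bot() function and serves the browser client at /client.
if __name__ == "__main__":
    from pipecat.runner.run import main
    main()
```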
When you run the agent, Pipecat starts a local HTTP server. Open the browser client to connect via WebRTC. The server runs locally, but the agent makes API calls to OpenAI and Rime.
Open a browser and navigate to http://localhost:7860/client. Allow microphone access when prompted. You can now talk to your agent using your microphone.
Create a new file called personality.py with the following content:
Example personality file
```python
SYSTEM_PROMPT = """CHARACTER:
You are Detective Marlowe, a world-weary noir detective from the 1940s who
somehow ended up as an AI assistant. You treat every question like it's a
case to be cracked and speak in dramatic, hard-boiled metaphors.

PERSONALITY:
- Cynical but secretly caring underneath the tough exterior
- Treats mundane tasks like high-stakes mysteries
- References your "years on the force" and "cases that still haunt you"
- Suspicious of technology but grudgingly impressed by it
- Has strong opinions about coffee and rain

SPEECH STYLE:
- Keep responses to 2-3 sentences maximum
- Use noir metaphors like "this code is messier than a speakeasy on a Saturday night"
- Dramatic pauses with "..." for effect
- Call the user "kid" or "pal" occasionally
- End with ominous or philosophical observations

RESTRICTIONS:
- Never break character
- Don't use emojis or special characters
- Stay family-friendly despite the noir tone"""

INTRO_MESSAGE = "The name's Marlowe... I've seen things that would make your code freeze, pal. So what case are you bringing to my desk tonight?"
```
Update your agent.py to import and use this prompt:
```python
from personality import SYSTEM_PROMPT, INTRO_MESSAGE
```
Then update the on_client_connected handler to use your custom intro message:
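One simple approach is sketched below. TTSSpeakFrame speaks a fixed line verbatim rather than asking the LLM to improvise a greeting; the frame name is an assumption based on recent Pipecat releases, so verify it against your version.

```python
# Sketch: speak the scripted intro line on connect instead of
# an LLM-generated greeting. TTSSpeakFrame bypasses the LLM and
# sends the text straight to TTS.
from pipecat.frames.frames import TTSSpeakFrame

@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    await task.queue_frames([TTSSpeakFrame(INTRO_MESSAGE)])
```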
Storing your system prompt in a separate file keeps your personality configuration separate from your agent logic, making it easy to experiment with different characters.
Pipecat’s modular design makes it easy to swap components. Experiment with your agent by:
Replacing OpenAI with another STT provider, such as Deepgram or AssemblyAI
Using a different LLM, such as Anthropic, Gemini, or a local model
Switching transports to use WebSocket for server-to-server or Daily’s hosted rooms for production deployments
To learn more about the Pipecat framework, including its transport options, deployment patterns, and advanced features, browse the Pipecat documentation. View Rime's Pipecat demo agents for a ready-to-use multilingual agent example that switches languages dynamically.
Check your TTS service class: The arcana model requires RimeNonJsonTTSService. If you see WebSocket HTTP 400 errors in the logs, you may be using RimeTTSService (which is only compatible with models like mistv2).
Verify your Rime API key: Ensure the key is valid and has TTS permissions.