Rime On-Premises Deployment Quickstart

On-prem is in public beta. For access to Docker images and pricing information, reach out to help@rime.ai.

Introduction

Why On-Premises?

Deploying on-premises offers several advantages over using cloud APIs over a public network. One of the main benefits is speed; by hosting the services locally, you can significantly reduce network latency, resulting in faster system responses and data processing.

Security

With an on-premises deployment, all sensitive data remains within your corporate network, ensuring enhanced security as it is not transmitted over the Internet. This setup helps in complying with strict data privacy and protection regulations.

Performance

Latency

  • Mist: Our tests show a median latency of around 80 ms with randomly generated sentences of 40 to 50 characters.
  • Arcana: Expect a time-to-first-frame latency of around 400 ms on an H100, and a real-time factor (RTF) below 1.
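A real-time factor below 1 means the model produces audio faster than the audio takes to play back. The sketch below just illustrates the definition; the numbers in it are made up, not measured:

```python
# Real-time factor (RTF) = time spent synthesizing / duration of audio produced.
# RTF < 1 means synthesis runs faster than real time.

def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    return synthesis_seconds / audio_seconds

# Illustrative numbers only: 1.2 s of compute to synthesize 3.0 s of audio.
rtf = real_time_factor(1.2, 3.0)
print(round(rtf, 2))  # 0.4
```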

Components

On-Premise Components

Prerequisites

Hardware Requirements

  • GPU
    • For Mist
      • NVIDIA T4, L4, A10G, or any equivalent or higher-performing GPU
    • For Arcana
      • NVIDIA H100, or any equivalent or higher-performing GPU
  • Storage
    • 50 GB storage
  • CPU
    • 8 vCPUs
  • Memory
    • 32 GiB

Software Requirements

  • Supported Linux Distributions
    • Debian 12 (bookworm), x86_64
    • Ubuntu Server 24.04 (noble), x86_64
  • NVIDIA drivers
    • 570 or higher (for CUDA 12.8)
  • Docker
  • NVIDIA Container Toolkit

Installations

NVIDIA Drivers
Follow https://www.nvidia.com/en-us/drivers to install the latest NVIDIA drivers, or use the following instructions on Debian-based systems:
NVIDIA Driver Installation (Debian-based)
# Update packages
sudo apt-get update

# Install basic toolchain and kernel headers
sudo apt-get install -y gcc make wget linux-headers-$(uname -r)

# Download and install the NVIDIA driver.
NVIDIA_DRIVER_VERSION=570.103.01
NVIDIA_DRIVER_PATH=/opt/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run
sudo rm -f "${NVIDIA_DRIVER_PATH}"
sudo wget "https://us.download.nvidia.com/tesla/${NVIDIA_DRIVER_VERSION}/NVIDIA-Linux-x86_64-${NVIDIA_DRIVER_VERSION}.run" -O "${NVIDIA_DRIVER_PATH}"
sudo chmod +x "${NVIDIA_DRIVER_PATH}"
sudo "${NVIDIA_DRIVER_PATH}" --silent --no-questions
Docker
Follow https://docs.docker.com/engine/install to install Docker on your system.
NVIDIA Container Toolkit
Follow https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html to install the NVIDIA Container Toolkit.
Verification
To verify that you have all the prerequisites installed, run the following command:
Verify Prerequisites
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubi9 nvidia-smi
You should see your GPU listed in the output, alongside the driver version and CUDA version.

Firewall Requirements

The Rime API instance listens on port 8000 for HTTP traffic and on port 8001 for WebSocket traffic. You will also need to allow the following outbound traffic in your firewall rules:
  • http://optimize.rime.ai/usage: registers on-prem usage with our servers.
  • http://optimize.rime.ai/license: verifies that your on-prem license is active.
  • us-docker.pkg.dev on port 443: container image registry.

Self-Service Licensing & Credentials

API Key Generation

Refer to our user interface dashboard to generate the necessary keys and credentials for authenticating and authorizing the deployment and use of our services.

Deployment

The deployment consists of two services, each powered by a container image:
  • API service: responsible for handling the HTTP and WebSocket requests, and for verifying the license. It serves as a proxy to the TTS service.
  • TTS service: responsible for model inference.
There is a 1:1 relationship between the API service and the TTS service: for each TTS model, you will need a corresponding API service. Multiple pairs of API and TTS services can be deployed on the same machine.

Artifact Registry Login

A key file will be provided by Rime.
Login to Artifact Registry
cat KEY-FILE | docker login -u _json_key --password-stdin https://us-docker.pkg.dev

Container Images

TTS Service

Currently the latest image versions are:
  • us-docker.pkg.dev/rime-labs/arcana/v2/de:20250823
  • us-docker.pkg.dev/rime-labs/arcana/v2/en:20250823
  • us-docker.pkg.dev/rime-labs/arcana/v2/es:20250823
  • us-docker.pkg.dev/rime-labs/arcana/v2/fr:20250823
  • us-docker.pkg.dev/rime-labs/mist/v2/en:20250814

API Service

The latest image version is:
  • us-docker.pkg.dev/rime-labs/api/service:20250731

Docker Compose Configuration

A simple way of deploying on a machine is to use Docker Compose. Create a docker-compose.yml file with your editor of choice to define the services and their configurations:
docker-compose.yml
version: '3.8'
services:
  api:
    image: <image_id>
    depends_on:
      - model
    ports:
      - "8000:8000"  # HTTP API
      - "8001:8001"  # WebSocket API
    restart: unless-stopped
    environment:
      - MODEL_URL=http://model:8080/invocations

  model:
    image: <image_id>
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    ports:
      - "8080:8080"
    restart: unless-stopped
When running on Kubernetes, ensure that MODEL_URL points to http://0.0.0.0:8080/invocations instead of the Docker Compose service name.
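Since each TTS model needs its own API service, running a second model on the same machine means adding another API/TTS pair under the existing services: key, mapped to different host ports. A sketch of such a second pair; the service names and the 8002/8003 host ports are illustrative choices, not prescribed values:

```yaml
  # Illustrative second API/TTS pair; names and host ports are arbitrary.
  api-arcana:
    image: <image_id>        # API service image
    depends_on:
      - model-arcana
    ports:
      - "8002:8000"          # HTTP API for the second pair
      - "8003:8001"          # WebSocket API for the second pair
    restart: unless-stopped
    environment:
      - MODEL_URL=http://model-arcana:8080/invocations

  model-arcana:
    image: <image_id>        # TTS model image
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all
    restart: unless-stopped
```

Note that only the host side of each port mapping changes; inside the containers the services still listen on 8000, 8001, and 8080.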

Start Docker Compose

Start Docker Compose
docker compose up -d

Deployment Steps

  1. Environment Setup: Provision a machine that meets the hardware and software requirements above.
  2. Service Deployment: Using Docker, deploy the images on your server.
  3. Networking Setup: Configure the network settings, including gateway and port rules, to ensure connectivity and security.
  4. Licensing and Authentication: Generate and apply the necessary API key via our dashboard to start using the services.
Note: Once the containers are started, allow about 5 minutes of warm-up before sending the first TTS requests.

Additional Information

  • Troubleshooting Guide: A troubleshooting guide will be provided to help resolve common issues during deployment.
  • Available voices/models: all voices are currently available.

Requests and Response Formats

HTTP Requests

Request:
Health Check
curl http://localhost:8000/health
which should return
{
  "apiStatus": "ok",
  "timestamp": <timestamp>,
  "licenseStatus": "valid" | "expired-or-not-set",
  "modelReachable": true | false
}
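A readiness check can key off those fields. A minimal sketch in Python; the sample payload below is fabricated to match the format above, not a real server response:

```python
import json

def is_ready(health_body: str) -> bool:
    """Return True when the API reports a valid license and a reachable model."""
    health = json.loads(health_body)
    return (
        health.get("apiStatus") == "ok"
        and health.get("licenseStatus") == "valid"
        and health.get("modelReachable") is True
    )

# Fabricated example payload in the shape returned by /health.
sample = '{"apiStatus": "ok", "timestamp": 1735689600, "licenseStatus": "valid", "modelReachable": true}'
print(is_ready(sample))  # True
```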
Request Example
curl -X POST "http://localhost:8000" -H "Authorization: Bearer <API KEY>" -H "Content-Type: application/json" -d '{
  "text": "I would love to have a conversation with you. The new model is out.",
  "speaker": "joy",
  "modelId": "mist"
}' -o result_mist.txt
Response:
Response Format
{"audioContent":{"model_output":"<base64>"}}
Sample response file: result.txt
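The base64 payload under audioContent.model_output can be decoded back into raw audio bytes. A minimal sketch, assuming only the JSON shape shown above; the payload in this example is a stand-in, not actual model output:

```python
import base64
import json

def extract_audio(response_body: str) -> bytes:
    """Decode the base64 audio payload from the JSON response shown above."""
    body = json.loads(response_body)
    return base64.b64decode(body["audioContent"]["model_output"])

# Stand-in payload: base64 of arbitrary bytes, not real audio.
fake = json.dumps({"audioContent": {"model_output": base64.b64encode(b"RIFF....").decode()}})
audio = extract_audio(fake)
print(len(audio))  # 8
```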

Receiving a response in mp3 format

Request:
Request Example
curl -X POST "http://localhost:8000" -H "Authorization: Bearer <API KEY>" -H "Content-Type: application/json" -H "Accept: audio/mp3" -d '{
  "text": "I would love to have a conversation with you.",
  "speaker": "joy",
  "modelId": "mist"
}' -o result.mp3
Response: Sample response file: result.mp3

Receiving a response in pcm (raw) format

Request:
Request Example
curl -X POST "http://localhost:8000" -H "Authorization: Bearer <API KEY>" -H "Content-Type: application/json" -H "Accept: audio/pcm" -d '{
  "text": "I would love to have a conversation with you.",
  "speaker": "joy",
  "modelId": "mist"
}' -o result.pcm
Response: Sample response file: result.pcm
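Raw PCM carries no header, so a player needs the format parameters supplied separately. The sketch below wraps PCM bytes in a WAV container using Python's wave module; the 16 kHz, mono, 16-bit parameters are assumptions for illustration only, so match them to whatever your deployment actually emits:

```python
import wave

def pcm_to_wav(pcm_bytes: bytes, out_path: str,
               sample_rate: int = 16000, channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM samples in a WAV container.

    The default format parameters are assumptions; set them to match
    the actual output of your TTS deployment.
    """
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)

# Example with zeros (silence) standing in for the bytes of result.pcm.
pcm_to_wav(b"\x00" * 3200, "result.wav")  # 0.1 s of 16 kHz mono silence
```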

Websockets Endpoints

The JSON WebSocket endpoint is served on port 8001 (for example, ws://localhost:8001) and is equivalent to our cloud websockets-json API.