On-Prem
Customer Documentation for Self-Hosting Rime On-Premise
1. Introduction
Why On-Premises?
Deploying on-premises offers several advantages over calling cloud APIs across the public Internet. One of the main benefits is speed: by hosting the services locally, you can significantly reduce network latency, resulting in faster system responses and data processing.
Performance
Our tests have shown a median latency of around 80 ms with randomly generated sentences between 40 and 50 characters.
Security
With an on-premises deployment, all data remains within your corporate network, ensuring enhanced security as it is not transmitted over the Internet. This setup helps in complying with strict data privacy and protection regulations.
Components
Prerequisites
NVIDIA Drivers: Ensure that NVIDIA drivers are installed and properly configured on your systems to support necessary computations and operations.
Docker: Install Docker on your system to manage the containerized application.
NVIDIA Container Toolkit: Install the NVIDIA Container Toolkit to enable GPU support within Docker containers.
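Assuming Ubuntu Server 22.04, the prerequisites above can be set up roughly as follows. This is a sketch based on Docker's and NVIDIA's official install paths; verify the exact steps and package repositories against their current documentation:

```shell
# Verify the NVIDIA driver is installed and the GPU is visible
nvidia-smi

# Install Docker (convenience script; see Docker's docs for a repo-based install)
curl -fsSL https://get.docker.com | sh

# Install the NVIDIA Container Toolkit (assumes NVIDIA's apt repository is already configured)
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Register the toolkit as a Docker runtime and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Confirm GPU access from inside a container
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```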
Other Prerequisites:
- g5g.xlarge or larger EC2 machine with 50 GB of storage for the on-prem rime-tts GPU container.
- In general, the rime-tts and API instance containers can be run on the same machine. For more complex setups, where each container needs to run on a separate machine, we recommend a t2.micro or larger EC2 machine with 10 GB of storage for the on-prem API instance.
- Recommended Linux Distribution: Ubuntu Server 22.04. Before shipping Docker images, we run our tests on Ubuntu Server 22.04; if a limitation on your side prevents you from using this distribution, please let us know. For reference, NVIDIA also supports a number of other Linux distributions.
2. Deployment Environments
This documentation will cover specific instructions and considerations for deploying the services within an AWS environment, ensuring optimal configuration and performance.
3. Self-Service Licensing & Credentials
API Key Generation: Refer to our user interface dashboard to generate the necessary keys and credentials for authenticating and authorizing the deployment and use of our services.
4. Deploy Our Services
Pull Images from DockerHub
Text-to-Speech (TTS) Image: Pull the latest TTS service image from DockerHub using the provided Docker command.
API Image: Similarly, pull the latest API service image from DockerHub to be used in conjunction with the TTS service.
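As a sketch, pulling both images looks like the following. The repository names rimeai/rime-tts and rimeai/rime-api are placeholders; use the exact repositories and tags shared during your onboarding:

```shell
# Placeholder image names -- substitute the repositories provided to you
docker pull rimeai/rime-tts:latest
docker pull rimeai/rime-api:latest
```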
Docker Compose File: Create a docker-compose.yml file with your editor of choice to define the services and their configurations.
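A minimal sketch of such a file is shown below. The service names, image names, and environment variable are assumptions for illustration; substitute the values provided with your licensing agreement. The port mapping reflects the API instance listening on port 8000:

```yaml
# Hypothetical service and image names -- replace with the values from onboarding
services:
  rime-tts:
    image: rimeai/rime-tts:latest      # GPU-backed TTS model server (placeholder name)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  rime-api:
    image: rimeai/rime-api:latest      # HTTP API front end (placeholder name)
    environment:
      - RIME_API_KEY=${RIME_API_KEY}   # key generated from the dashboard
    ports:
      - "8000:8000"                    # the API instance listens on port 8000
    depends_on:
      - rime-tts
```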
Start docker compose:
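With the compose file in place, the stack can be started in the background:

```shell
# Start the containers defined in docker-compose.yml in detached mode
docker compose up -d

# Optionally follow the logs while the services warm up
docker compose logs -f
```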
Networking
The Rime API instance will listen on port 8000.
You’ll need to permit outbound network traffic to http://optimize-api-55fd43d0f53d69ee.elb.us-west-1.amazonaws.com/usage and http://optimize-api-55fd43d0f53d69ee.elb.us-west-1.amazonaws.com/license so that we can verify with our servers that you have an active on-prem licensing agreement and register usage. Additionally, you’ll need access to DockerHub, a container image repository platform, so allow outbound traffic to its servers on port 443.
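To confirm the licensing endpoints are reachable from the host before deploying, a quick check (assuming curl is installed):

```shell
# Each request should print an HTTP status code rather than hang or time out
curl -s -o /dev/null -w "%{http_code}\n" \
  http://optimize-api-55fd43d0f53d69ee.elb.us-west-1.amazonaws.com/license
curl -s -o /dev/null -w "%{http_code}\n" \
  http://optimize-api-55fd43d0f53d69ee.elb.us-west-1.amazonaws.com/usage
```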
Deployment Steps
- Environment Setup: Prepare your AWS environment according to the specifications required for optimal deployment.
- Service Deployment: Using Docker, deploy the images on your server.
- Networking Setup: Configure the network settings, including the Internet Gateway and port settings, to ensure connectivity and security.
- Licensing and Authentication: Generate and apply the necessary API key via our dashboard to start using the services.
Note: Once the containers are started, allow roughly 5 minutes of warm-up before sending the first TTS requests.
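One way to wait out the warm-up is to poll the API instance until it answers. This is a sketch; polling the root path on port 8000 is an assumption, so substitute whatever health endpoint your deployment exposes:

```shell
# Poll the on-prem API (port 8000) until it responds, then proceed
until curl -sf http://localhost:8000/ > /dev/null; do
  echo "waiting for the API instance to warm up..."
  sleep 10
done
echo "API instance is up"
```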
Additional Information
- Troubleshooting Guide: A troubleshooting guide will be provided to help resolve common issues during deployment.
- Available voices/models: all voices are currently available.
5. Requests and Response Formats
Sending a simple curl request
Request:
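A sketch of a minimal request against the on-prem API instance. The /v1/rime-tts path and the JSON field names are assumptions for illustration; match them to your API reference, and substitute a real speaker name:

```shell
# Hypothetical endpoint path and payload fields -- check your API reference
curl -X POST http://localhost:8000/v1/rime-tts \
  -H "Authorization: Bearer $RIME_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"speaker": "example_speaker", "text": "Hello from on-prem Rime!"}' \
  -o result.txt
```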
Response:
Sample response file: result.txt
Receiving a response in mp3 format
Request:
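The same sketch, assuming the response format is selected via the Accept header (the endpoint path, field names, and audio/mp3 media type are assumptions; match them to your API reference):

```shell
# Hypothetical endpoint path and payload fields -- check your API reference
curl -X POST http://localhost:8000/v1/rime-tts \
  -H "Authorization: Bearer $RIME_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/mp3" \
  -d '{"speaker": "example_speaker", "text": "Hello from on-prem Rime!"}' \
  -o result.mp3
```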
Response:
Sample response file: result.mp3
Receiving a response in pcm (raw) format
Request:
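And for raw PCM output, again assuming the Accept header selects the format (the audio/pcm media type, endpoint path, and field names are assumptions; match them to your API reference):

```shell
# Hypothetical endpoint path and payload fields -- check your API reference
curl -X POST http://localhost:8000/v1/rime-tts \
  -H "Authorization: Bearer $RIME_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/pcm" \
  -d '{"speaker": "example_speaker", "text": "Hello from on-prem Rime!"}' \
  -o result.pcm
```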
Response:
Sample response file: result.pcm