Get Started

Introduction

Welcome to the documentation for Dubbix, a powerful solution designed to revolutionize the way you approach audio localization and content creation.

What is inside our API v2?

Our API streamlines content translation. Just upload your video and get the translated result effortlessly.

API v1 deprecation notice

Support for API v1 was officially discontinued following the release of API v2.

As of May 2025, API v1 has been fully deprecated and is no longer accessible.

All integrations should now use API v2.

Limitations

The list of API v2 limitations

We are constantly working to improve our API. In the meantime, these are its current limitations:

1. Currently we support the following ways to upload your video:

– upload via a link to a video on YouTube, Google Drive, S3 or Vimeo, or via a direct access link;

– local upload from your device.

2. Our documentation playground does not support uploading files (e.g. for the local file upload endpoint or the SRT upload endpoint). Use Postman, curl or any programming language for testing and execution instead.

3. It is not yet possible to generate video with subtitles via the API. This feature will be added in the future. In the meantime, you can use our web platform to get a translated video with subtitles in the target language burned into it.

API Documentation

Text-to-Speech (TTS)

The Text-to-Speech (TTS) API provides translation and speech synthesis capabilities. It accepts text in a source language, translates it to a target language, and returns the synthesized speech as either raw audio bytes or a WAV file.

**Base URL:** `https://api.example.com`

Endpoints

WebSocket /tts_stream/bytes

Real-time text-to-speech streaming endpoint that accepts text input and streams audio chunks back as they are generated.

**URL:** `ws://api.example.com/tts_stream/bytes` or `wss://api.example.com/tts_stream/bytes`

**Protocol:** WebSocket

#### Connection

Establish a WebSocket connection to the endpoint. Each connection is assigned a unique user ID automatically.
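
A connection along these lines can be sketched in Python. This is a minimal sketch, not an official client: it assumes the third-party `websockets` package, assumes the server closes the connection once all audio has been sent, and takes the JSON message as a caller-supplied dictionary rather than guessing its fields. The helper names `tts_stream_url`, `collect_audio` and `stream_tts` are illustrative only.

```python
import json


def tts_stream_url(host: str = "api.example.com", secure: bool = True) -> str:
    """Build the endpoint URL; wss:// is the TLS-protected variant."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}/tts_stream/bytes"


def collect_audio(frames) -> bytes:
    """Join binary frames into one byte string, ignoring any text frames."""
    return b"".join(f for f in frames if isinstance(f, (bytes, bytearray)))


async def stream_tts(message: dict, host: str = "api.example.com") -> bytes:
    """Send one JSON message and gather the audio chunks streamed back."""
    # Third-party dependency, imported here so the helpers above work without it.
    import websockets

    frames = []
    async with websockets.connect(tts_stream_url(host)) as ws:
        await ws.send(json.dumps(message))
        # Assumption: iteration ends when the server closes the connection.
        async for frame in ws:
            frames.append(frame)
    return collect_audio(frames)
```

With a `message` dictionary that follows the documented message format, `asyncio.run(stream_tts(message))` would return the concatenated audio bytes.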

#### Message Format

Send JSON messages with the following structure:

Voice Conversion

This endpoint performs speech‑to‑speech voice conversion.
You provide:

– A reference audio file **(voice you want to mimic)**
– An input audio file **(content you want spoken in that voice)**


The API returns a converted WAV file with the input content spoken in the reference voice.

## Endpoint

POST /voice_convert/vc

## Request

### Headers

- `accept: application/json`
- `Content-Type: multipart/form-data`

### Form Data Parameters

- `reference_file` (file, required) - Reference audio file (.wav or .mp3) containing the target voice
- `input_file` (file, required) - Input audio file (.wav or .mp3) containing the content to convert
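
Putting the endpoint, headers and form fields together, a request could look like the following Python sketch. It assumes the third-party `requests` package and the placeholder host `https://api.example.com` used elsewhere in this documentation. Authentication is not described in this section and is left out. `voice_convert_url` and `convert_voice` are illustrative names, not part of the API.

```python
from pathlib import Path

# Assumption: placeholder host borrowed from the other sections of this documentation.
BASE_URL = "https://api.example.com"


def voice_convert_url(base_url: str = BASE_URL) -> str:
    """Return the full URL of the voice conversion endpoint."""
    return base_url.rstrip("/") + "/voice_convert/vc"


def convert_voice(reference_path, input_path, output_path, base_url: str = BASE_URL) -> Path:
    """Upload both audio files and save the converted WAV to output_path."""
    # Third-party dependency, imported here so voice_convert_url works without it.
    import requests

    with open(reference_path, "rb") as ref, open(input_path, "rb") as src:
        # requests builds the multipart/form-data body and sets the
        # Content-Type header (including the boundary) on its own.
        files = {"reference_file": ref, "input_file": src}
        with requests.post(
            voice_convert_url(base_url),
            headers={"accept": "application/json"},
            files=files,
            stream=True,   # read the returned file in chunks
            timeout=300,   # arbitrary client-side timeout
        ) as response:
            response.raise_for_status()
            with open(output_path, "wb") as out:
                for chunk in response.iter_content(chunk_size=8192):
                    out.write(chunk)
    return Path(output_path)
```

A call such as `convert_voice("reference.wav", "input.wav", "converted.wav")` would write the binary response to `converted.wav`.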

## Example Requests

### cURL


### Python (requests)

```python
import requests
```

## Limitations:

- The reference audio file should be between 4 and 20 seconds long; the recommended range is 12-15 seconds.

## Notes

- Both files must be small enough for processing (under 30MB each is recommended).
- Output is always returned as WAV.
- The endpoint streams the file back, so remember to save the binary output.
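
The duration and size guidance above can be checked locally before uploading. The sketch below uses only the Python standard library and handles `.wav` files only, since the `wave` module cannot read `.mp3`. It treats the 4 and 20 second bounds as inclusive and reads 30MB as 30 × 1024 × 1024 bytes; both readings are assumptions. The function names are illustrative.

```python
import os
import wave

MIN_REF_SECONDS = 4.0
MAX_REF_SECONDS = 20.0
RECOMMENDED_REF_SECONDS = (12.0, 15.0)
MAX_RECOMMENDED_BYTES = 30 * 1024 * 1024  # assumption: "30MB" read as 30 MiB


def wav_duration_seconds(path) -> float:
    """Duration of a WAV file: frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_reference(seconds: float) -> str:
    """Return 'rejected', 'accepted' or 'recommended' for a reference clip length."""
    if not MIN_REF_SECONDS <= seconds <= MAX_REF_SECONDS:
        return "rejected"
    low, high = RECOMMENDED_REF_SECONDS
    return "recommended" if low <= seconds <= high else "accepted"


def under_recommended_size(path) -> bool:
    """True if the file is below the recommended upload size."""
    return os.path.getsize(path) < MAX_RECOMMENDED_BYTES
```

For a reference clip, `classify_reference(wav_duration_seconds("reference.wav"))` indicates whether the clip falls in the allowed or the recommended range.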

Audio & Video Summarizer

This API allows you to **upload audio or video files** and receive:

– **Transcript** (text of spoken content)
– **Timestamps** (aligned with transcript segments)
– **Summary** (shortened version of the content)

1. Summarize Audio/Video File

**Endpoint**

accept: application/json
Content-Type: multipart/form-data

**Description**

Upload an audio or video file (e.g., `.wav`, `.mp3`, `.mp4`) and get back a transcript, timestamps, and a language‑specific summary.

## Recommendations for Use:

+ Always include a unique `request_id` for better tracking.
+ Keep file sizes reasonable (<100MB recommended).
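
Both recommendations are easy to apply on the client side. The sketch below generates a UUID for each upload and checks the file size first. How `request_id` is passed to the API (form field, query parameter or header) is not stated in this section, so the sketch stops short of building the request. It reads 100MB as 100 × 1024 × 1024 bytes, which is an assumption, and the function names are illustrative.

```python
import uuid
from pathlib import Path

MAX_RECOMMENDED_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB


def new_request_id() -> str:
    """Generate a random (version 4) UUID to use as a unique request_id."""
    return str(uuid.uuid4())


def within_recommended_size(path) -> bool:
    """True if the file is below the recommended upload size."""
    return Path(path).stat().st_size < MAX_RECOMMENDED_BYTES


def prepare_upload(path) -> dict:
    """Bundle the values worth logging before a file is sent."""
    return {
        "request_id": new_request_id(),
        "file": str(path),
        "size_ok": within_recommended_size(path),
    }
```

Logging the dictionary returned by `prepare_upload` before each call makes it possible to match a response or a support query to the file that produced it.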

Dubbing/Translate

This endpoint performs speech-to-speech translation (dubbing) of an uploaded audio or video file from a source language to a target language. Optionally, it can apply lip-syncing if the input is a video file.

Speech-to-Speech

The Speech-to-Speech (S2S) API provides real-time audio translation capabilities. It accepts audio files in various formats, transcribes the speech, translates it to the target language, and returns synthesized speech in the target language. The API also supports speaker voice cloning to preserve the original speaker’s voice characteristics.

**Base URL:** `https://api.example.com`

## Endpoints

### POST /speech/translate

Translates speech from one language to another, returning the translated audio as a WAV file.

**URL:** `POST /speech/translate`

**Content-Type:** `multipart/form-data`
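
Since the endpoint returns a WAV file, a client can confirm that the body really is WAV audio before saving it. The following is a minimal sketch assuming the third-party `requests` package. The form field names for this endpoint are not listed in this section, so the caller supplies them through `files` and `fields` instead of the sketch guessing them. `looks_like_wav` and `translate_speech` are illustrative names.

```python
BASE_URL = "https://api.example.com"


def looks_like_wav(data: bytes) -> bool:
    """Check the RIFF/WAVE magic bytes found at the start of every WAV file."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def translate_speech(files: dict, fields: dict, base_url: str = BASE_URL) -> bytes:
    """POST a multipart/form-data request and return the translated WAV bytes.

    `files` maps form field names to open binary files and `fields` holds the
    remaining form values; both must follow the endpoint's parameter list.
    """
    # Third-party dependency, imported here so looks_like_wav works without it.
    import requests

    response = requests.post(
        base_url.rstrip("/") + "/speech/translate",
        files=files,
        data=fields,
        timeout=300,  # arbitrary client-side timeout
    )
    response.raise_for_status()
    if not looks_like_wav(response.content):
        raise ValueError("response body is not a WAV file")
    return response.content
```

The header check guards against saving a JSON error body under a `.wav` file name.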
