- Documentation
Introduction
Welcome to the documentation for Dubbix, a powerful solution designed to revolutionize the way you approach audio localization and content creation.
What is inside our API v2?
Our API streamlines content translation. Just upload your video and get the translated result effortlessly.
API v1 deprecation notice
Support for API v1 was officially discontinued following the release of API v2.
As of May 2025, API v1 has been fully deprecated and is no longer accessible.
All integrations should now use API v2.
Limitations
The list of API v2 limitations
We are constantly working to improve our API. The current limitations are:
1. Currently we support the following ways to upload your video:
   - upload via link, using a link to a video on YouTube, Google Drive, S3, or Vimeo, or a direct access link;
   - local upload from your device.
3. It is not yet possible to generate video with subtitles via the API. This feature will be added in the future. In the meantime, you can use our web platform to get a version of the translated video with subtitles in the target language burned into it.
Text-to-Speech (TTS)
**Base URL:** `https://api.example.com`
Endpoints
WebSocket /tts_stream/bytes
Real-time text-to-speech streaming endpoint that accepts text input and streams audio chunks back as they are generated.
**URL:** `ws://api.example.com/tts_stream/bytes` or `wss://api.example.com/tts_stream/bytes`
**Protocol:** WebSocket
#### Connection
Establish a WebSocket connection to the endpoint. Each connection is assigned a unique user ID automatically.
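The connection step can be sketched in Python as follows. This is a minimal sketch rather than an official client: it assumes the third-party `websockets` package, assumes the server sends audio as binary frames and closes the connection once synthesis is finished, and leaves the content of the JSON message to the caller. The helper names `join_audio_frames` and `stream_tts` are hypothetical.

```python
import json

# URL taken from this documentation; prefer wss:// (TLS) outside local testing.
TTS_URL = "wss://api.example.com/tts_stream/bytes"


def join_audio_frames(frames):
    """Concatenate binary frames in arrival order, ignoring text frames.

    Assumption: audio chunks arrive as binary (bytes) frames, while any
    status or control messages arrive as text (str) frames.
    """
    return b"".join(f for f in frames if isinstance(f, (bytes, bytearray)))


async def stream_tts(message, url=TTS_URL):
    """Send one JSON message and collect the audio streamed back."""
    # Third-party dependency (pip install websockets), imported here so that
    # join_audio_frames stays usable without it.
    import websockets

    frames = []
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps(message))
        # Assumption: the server closes the connection after the last chunk,
        # which ends this loop.
        async for frame in ws:
            frames.append(frame)
    return join_audio_frames(frames)
```

From synchronous code, the coroutine can be run with `asyncio.run(stream_tts(message))`, where `message` is a dict matching the message format the endpoint expects. For long texts it may be preferable to write each frame to a file or audio player as it arrives instead of collecting everything in memory.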
#### Message Format
Send JSON messages with the following structure:
Voice Conversion
This endpoint performs speech‑to‑speech voice conversion.
You provide:
– A reference audio file **(voice you want to mimic)**
– An input audio file **(content you want spoken in that voice)**
The API returns a converted WAV file with the input content spoken in the reference voice.
## Endpoint
POST /voice_convert/vc
## Request
### Headers
accept: application/json
Content-Type: multipart/form-data
### Form Data Parameters
- `reference_file` (file, required) - Reference audio file (.wav or .mp3) containing the target voice
- `input_file` (file, required) - Input audio file (.wav or .mp3) containing the content to convert
## Example Requests
### cURL
### Python (requests)
```python
import requests
```
## Limitations:
- The reference audio file should be between 4 and 20 seconds long; the recommended range is 12-15 seconds.
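
A duration check along these lines can catch an unsuitable reference clip before it is uploaded. This is a minimal sketch using only the standard library `wave` module, so it handles `.wav` files only (an `.mp3` reference would need a third-party decoder). The function names and the three labels are illustrative and not part of the API, and the sketch assumes the 4 and 20 second bounds are inclusive.

```python
import wave

# Bounds stated in the limitation above (assumed inclusive).
MIN_SECONDS, MAX_SECONDS = 4.0, 20.0
RECOMMENDED_MIN, RECOMMENDED_MAX = 12.0, 15.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_reference(duration):
    """Label a reference clip duration as rejected, accepted, or recommended."""
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        return "rejected"
    if RECOMMENDED_MIN <= duration <= RECOMMENDED_MAX:
        return "recommended"
    return "accepted"
```

A caller could run `classify_reference(wav_duration_seconds("reference.wav"))` and trim or re-record the clip when the result is `"rejected"`.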
## Notes
- Both files must be small enough for processing (under 30 MB each is recommended).
- Output is always returned as WAV.
- The endpoint streams the file back, so remember to save the binary output.
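
Putting the request and the notes above together, a call to this endpoint might look like the following. This is a sketch under stated assumptions, not a reference client: it uses the `requests` library, takes the base URL `https://api.example.com` from this documentation, reads "30 MB" as 30 * 1024 * 1024 bytes, and shows no authentication because none is described here. The helper names are hypothetical.

```python
from pathlib import Path

BASE_URL = "https://api.example.com"    # base URL as given in this documentation
MAX_UPLOAD_BYTES = 30 * 1024 * 1024     # assumption: "30 MB" means 30 MiB


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise if it reaches the limit."""
    size = Path(path).stat().st_size
    if size >= limit:
        raise ValueError(f"{path} is {size} bytes; keep uploads under {limit}")
    return size


def save_chunks(chunks, out_path):
    """Write an iterable of byte chunks to out_path; return bytes written."""
    written = 0
    with open(out_path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                written += out.write(chunk)
    return written


def convert_voice(reference_path, input_path, out_path="converted.wav"):
    """POST both files to /voice_convert/vc and save the streamed WAV."""
    # Third-party dependency, imported here so the helpers above work without it.
    import requests

    for path in (reference_path, input_path):
        check_upload_size(path)
    with open(reference_path, "rb") as ref, open(input_path, "rb") as inp:
        # requests sets the multipart Content-Type (with boundary) itself,
        # so only the accept header is passed explicitly.
        response = requests.post(
            f"{BASE_URL}/voice_convert/vc",
            headers={"accept": "application/json"},
            files={"reference_file": ref, "input_file": inp},
            stream=True,
            timeout=300,
        )
    with response:
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size=8192), out_path)
```

Streaming the response to disk in chunks avoids holding the whole converted file in memory.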
Audio & Video Summarizer
This API allows you to **upload audio or video files** and receive:
– **Transcript** (text of spoken content)
– **Timestamps** (aligned with transcript segments)
– **Summary** (shortened version of the content)
1. Summarize Audio/Video File
**Endpoint**
accept: application/json
Content-Type: multipart/form-data
**Description**
Upload an audio or video file (e.g., `.wav`, `.mp3`, `.mp4`) and get back a transcript, timestamps, and a language‑specific summary.
## Recommendations for Use:
+ Always include a unique `request_id` for better tracking.
+ Keep file sizes reasonable (<100MB recommended).
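
These two recommendations can be applied before the upload with a small helper like the one below. It is a sketch only: the field name `request_id` comes from the recommendation above, but how it is transmitted (form field, query parameter, or header) is not shown in this section, using a UUID as the identifier is an assumption, and "100 MB" is read as 100 * 1024 * 1024 bytes. The function name and the returned keys are hypothetical.

```python
import uuid
from pathlib import Path

RECOMMENDED_MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" means 100 MiB


def prepare_summary_upload(path):
    """Collect a fresh request_id and a size check for one file."""
    file_path = Path(path)
    size = file_path.stat().st_size
    return {
        # A random UUID is one simple way to keep request IDs unique.
        "request_id": str(uuid.uuid4()),
        "filename": file_path.name,
        "size_bytes": size,
        "within_recommended_size": size < RECOMMENDED_MAX_BYTES,
    }
```

Logging the returned `request_id` alongside the file name makes it easier to match a response, or a support request, to the upload that produced it.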
Dubbing/Translate
Speech-to-Speech
**Base URL:** `https://api.example.com`
## Endpoints
### POST /speech/translate
Translates speech from one language to another, returning the translated audio as a WAV file.
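
Because the endpoint returns a WAV file, a client can sanity-check the response body before saving it; an error response would typically not start with a WAV header. The check below is a minimal sketch based on the standard RIFF/WAVE container layout (bytes 0-3 are `RIFF`, bytes 8-11 are `WAVE`). The function name is hypothetical, and the sketch does not validate the audio data itself.

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if data starts with a RIFF header of type WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

For a streamed response, it is enough to run this check on the first chunk of at least 12 bytes.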