SOTA VOX Kit ASR
Speech recognition

Transcribe audio and video recordings with maximum accuracy and at high speed.

Audio is processed up to 50 times faster than real time (the duration of the original recording).
Punctuation and numbers
Punctuation marks are inserted automatically. Numbers are written as digits, not as words.
Recognition accuracy
Recognition accuracy in Russian is up to 95%.
On-the-fly processing
Streaming mode is supported over the gRPC and MRCP protocols.
Dialogue markup
Client and employee utterances are arranged in chronological order.
Expandable dictionary
Quickly add new vocabulary from any topic or subject area to your dictionary.
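
The dialogue markup feature above can be pictured as merging two per-speaker transcripts into a single timeline. The sketch below is only an illustration of that idea: the `Segment` type, its fields, and the function names are hypothetical and are not taken from the product's API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical structure, for illustration only)."""
    speaker: str   # e.g. "client" or "employee"
    start: float   # seconds from the beginning of the recording
    end: float
    text: str


def merge_dialogue(*channels):
    """Merge per-speaker segment lists into one list ordered by start time."""
    merged = [seg for channel in channels for seg in channel]
    # sorted() is stable, so segments with equal start times keep channel order.
    return sorted(merged, key=lambda seg: seg.start)


def format_dialogue(segments):
    """Render the merged dialogue as one line per utterance."""
    return "\n".join(
        f"[{seg.start:.2f}s] {seg.speaker}: {seg.text}" for seg in segments
    )


client = [
    Segment("client", 0.0, 1.5, "Hello"),
    Segment("client", 4.0, 6.0, "I have a question about my order"),
]
employee = [Segment("employee", 1.8, 3.7, "Good afternoon, how can I help?")]

print(format_dialogue(merge_dialogue(client, employee)))
```

Sorting by start time is the simplest possible policy; overlapping speech would need extra handling that this sketch does not attempt.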

Available speech models

We can adapt the existing language and acoustic models to any subject area to improve recognition quality.

Television and radio broadcasting
The model is optimized for processing TV broadcasts, news stories, radio broadcasts, podcasts and films.
Telephony
This model is designed to process recordings of telephone conversations on arbitrary topics.
Microphone
The model is optimized for processing audio recordings made with an external microphone, such as interviews.
Knowledge extraction
A text analytics engine (NLP/NLU) that understands meaning and extracts relevant data in context.

Technical features

- Russian language (Cloud/On-Premise)
- English language (Cloud/On-Premise)
- Kazakh language (Cloud/On-Premise)
- Uzbek language (Cloud/On-Premise)

Russian language:
Telephony up to 95%
Media up to 98%

English language:
Telephony up to 85%
Media up to 87%

Kazakh language:
Telephony up to 95%
Media up to 97%

Uzbek language:
Telephony up to 95%
Media up to 98%
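
The page does not say how these accuracy percentages are measured. A common convention in speech recognition is word accuracy, computed as 1 minus the word error rate (WER). The sketch below assumes that convention; the function name and the simple lowercase/whitespace tokenization are illustrative choices, not the vendor's method.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the call was recorded today", "the call is recorded")
print(f"WER: {wer:.2f}, word accuracy: {1 - wer:.2f}")
```

Under this convention, "up to 95%" would correspond to about one word error per twenty reference words.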

Telephony (phone): WAV PCM, 8 kHz / 16-bit
Media (broadcast): WAV PCM, 16 kHz / 16-bit
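
Before sending a recording, it can be useful to check whether a WAV file already matches one of the two formats listed above. The following is a minimal sketch using Python's standard `wave` module; the `check_wav` helper and the model keys `"telephony"` and `"media"` are names assumed here for illustration.

```python
import wave

# Sample rates taken from the formats listed above.
EXPECTED_RATE_HZ = {"telephony": 8000, "media": 16000}
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def check_wav(path, model):
    """Return a list of mismatches between a WAV file and the model's format."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except wave.Error as exc:
        # The wave module raises wave.Error for files it cannot parse as PCM WAV.
        return [f"not a readable PCM WAV file: {exc}"]

    if rate != EXPECTED_RATE_HZ[model]:
        problems.append(
            f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ[model]} Hz"
        )
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16 bit")
    return problems
```

An empty list means the file matches; for example, `check_wav("call.wav", "telephony")` would report a mismatch for a 16 kHz recording.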

All major formats and codecs are supported: WAV, AAC, OGG, MP3, WMA, MuLaw, ALaw, Linear16 and RawOpus. Input audio is automatically converted to WAV.
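
To give a sense of what conversion to WAV involves for one of the listed codecs, the sketch below decodes raw G.711 mu-law bytes into 16-bit linear PCM and wraps them in a WAV container. It uses only the Python standard library and the standard G.711 expansion formula. It is not the product's own converter; the function names and the 8 kHz mono default are assumptions.

```python
import io
import struct
import wave

BIAS = 0x84  # bias constant used by G.711 mu-law


def ulaw_byte_to_pcm16(u):
    """Expand one mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored inverted
    exponent = (u & 0x70) >> 4
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


# Precompute all 256 possible values once.
ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_wav_bytes(ulaw_data, sample_rate=8000):
    """Decode raw mu-law bytes and return a mono 16-bit PCM WAV file as bytes."""
    samples = [ULAW_TABLE[b] for b in ulaw_data]
    pcm = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Compressed formats such as MP3, AAC, or Opus need a full decoder and are usually handled by a dedicated tool or library rather than by hand-written code like this.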