
Speech to Text Technology

Vocapia Research develops speech processing technologies such as core multilingual large vocabulary speech recognizers for automatic speech transcription, for audio indexing, and for speech-text alignment.

The VoxSigma software suite provides large vocabulary speech recognition capabilities in multiple languages, as well as audio segmentation and partitioning, speaker identification and language recognition. The speech-to-text software suite has been designed for professional users who need to transcribe large quantities of audio and video documents such as broadcast data, either in batch mode or in real time. Dedicated versions target the transcription of conversational telephone speech and call-center data.

VoxSigma™ Software Suite

The Vocapia Research VoxSigma software suite for Linux offers state-of-the-art performance for broadcast data and conversational data in many languages. The VoxSigma API includes Unix/Linux commands, C and C++ libraries, a REST API, and import and export in XML format. The VoxSigma software is available both via licensing and via our web service.


Speech Transcription

Substantial advances in speech recognition technology have been achieved over the last decade. This core technology, available in multiple languages, serves as the basis for a range of applications such as voice-interactive database access, as well as more demanding tasks such as the transcription of broadcast data. Vocapia Research has speech-to-text systems with vocabulary sizes up to 300K words for many languages including Arabic, Cantonese, Czech, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian, Latvian, Lithuanian, Mandarin, Pashto, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish and Urdu.

Audio Indexing

Large vocabulary continuous speech recognition is a key technology that can be used to enable content-based information access in audio and video documents. Most of the linguistic information is encoded in the audio channel of audiovisual data, which once transcribed can be accessed using text-based tools. Via language identification, speech recognition, and speaker recognition, spoken document retrieval can support random access to relevant portions of audio documents based on specific criteria, reducing the time needed to identify recordings in large multimedia databases. Applications include data mining, news-on-demand, and media monitoring.
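
As an illustration of how time-coded transcripts can support random access into audio, here is a minimal Python sketch of an inverted index that maps each word to the documents and time offsets where it was spoken. It is not VoxSigma code: the `TimedWord` record, the `build_index` and `lookup` functions, and the sample data are hypothetical, and a real system would add normalization, ranking, and speaker or language filters.

```python
from collections import defaultdict
from typing import NamedTuple


class TimedWord(NamedTuple):
    """One recognized word with its position in a recording (hypothetical format)."""
    doc_id: str
    word: str
    start: float  # offset from the beginning of the recording, in seconds


def build_index(words):
    """Map each lowercased word to the (doc_id, start) pairs where it occurs."""
    index = defaultdict(list)
    for w in words:
        index[w.word.lower()].append((w.doc_id, w.start))
    return index


def lookup(index, term):
    """Return all (doc_id, start) hits for a term, sorted by document then time."""
    return sorted(index.get(term.lower(), []))


# Invented sample transcripts, for illustration only.
SAMPLE_WORDS = [
    TimedWord("news_01", "Election", 12.4),
    TimedWord("news_01", "results", 12.9),
    TimedWord("news_02", "election", 3.1),
    TimedWord("news_01", "election", 95.0),
]

if __name__ == "__main__":
    index = build_index(SAMPLE_WORDS)
    for doc_id, start in lookup(index, "election"):
        print(f"{doc_id} at {start:.1f}s")
```

Each hit gives a player enough information to seek directly to the relevant portion of a recording instead of scanning it from the start.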

Speech-Text Alignment

Speech-Text Alignment is the process of synchronizing a speech signal with a speech transcript or closely related text, providing time codes for words and sentences. The alignment process assigns a time code to each word and each punctuation mark in the audio transcript and provides confidence scores to identify areas where the alignment may not be perfect, in particular when the provided transcript differs from what was actually said. There are many uses of this technology, including audio books, language learning, and video subtitling.
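
To make the role of confidence scores concrete, the following Python sketch groups consecutive low-confidence words into time spans that a reviewer could check against the audio. It assumes a hypothetical alignment output of (text, start, end, confidence) records; the `AlignedWord` type, the `low_confidence_spans` function, the 0.5 threshold, and the sample values are illustrative assumptions, not the VoxSigma output format.

```python
from typing import NamedTuple


class AlignedWord(NamedTuple):
    """One aligned token (hypothetical format): text, time codes, confidence."""
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # assumed to lie in [0, 1]


def low_confidence_spans(words, threshold=0.5):
    """Merge runs of consecutive words below `threshold` into (start, end, text) spans."""
    spans = []
    run = []
    for w in words:
        if w.confidence < threshold:
            run.append(w)
        elif run:
            spans.append((run[0].start, run[-1].end, " ".join(x.text for x in run)))
            run = []
    if run:  # close a run that reaches the end of the transcript
        spans.append((run[0].start, run[-1].end, " ".join(x.text for x in run)))
    return spans


# Invented sample alignment, for illustration only.
SAMPLE_ALIGNMENT = [
    AlignedWord("the", 0.00, 0.12, 0.98),
    AlignedWord("quick", 0.12, 0.40, 0.35),
    AlignedWord("brown", 0.40, 0.71, 0.42),
    AlignedWord("fox", 0.71, 1.05, 0.91),
]

if __name__ == "__main__":
    for start, end, text in low_confidence_spans(SAMPLE_ALIGNMENT):
        print(f"check {start:.2f}s to {end:.2f}s: {text!r}")
```

Spans flagged this way are candidates for places where the supplied transcript and the spoken audio diverge; the right threshold depends on how the scores are calibrated.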

Spoken language identification

Spoken language identification is the process of recognizing the language spoken in an audio document (broadcast audio, podcast, telephone). The standard VoxSigma language identification component recognizes the spoken language from a set of 40 languages.

 
Saturday December 21, 2024

© Vocapia Research SAS,
2006-2024. All rights reserved.
