Vocapia Research develops speech processing technologies, including core
multilingual large vocabulary speech recognizers for automatic speech
transcription, audio indexing, and speech-text alignment.
The VoxSigma software suite provides large vocabulary speech recognition
capabilities in multiple languages, as well as audio segmentation and
partitioning, speaker identification and language recognition. The
speech-to-text software suite has been designed for professional users needing
to transcribe large quantities of audio and video documents such as broadcast
data, either in batch mode or in real time. Dedicated versions target the
transcription of conversational telephone speech and call-center data.
VoxSigma™ Software Suite
The Vocapia Research VoxSigma software suite for Linux offers
state-of-the-art performance for broadcast data and conversational data in
many languages. The VoxSigma API includes Unix/Linux commands, C and C++
libraries, a REST API, and import and export in XML format. The VoxSigma
software is available both via licensing and via our web service.
Speech Transcription
Substantial advances in speech recognition technology have been
achieved over the last decade. This core technology, available in
multiple languages, serves as the basis for a range of applications
such as voice-interactive database access, as well as more demanding
tasks such as the transcription of broadcast data. Vocapia Research
has speech-to-text systems with vocabulary sizes of up to 300K words for
many languages, including Arabic, Cantonese, Czech, Dutch, English,
Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Italian,
Latvian, Lithuanian, Mandarin, Pashto, Persian, Polish, Portuguese,
Romanian, Russian, Spanish, Swahili, Swedish, Turkish and Urdu.
Audio Indexing
Large vocabulary continuous speech recognition is a key technology
that can be used to enable content-based information access in audio
and video documents. Most of the linguistic information is encoded in
the audio channel of audiovisual data, which once transcribed can be
accessed using text-based tools. Via language identification, speech
recognition, and speaker recognition, spoken document retrieval can
support random access to relevant portions of audio documents based on
specific criteria, reducing the time needed to identify recordings in
large multimedia databases. Applications include data mining,
news-on-demand, and media monitoring.
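As a rough illustration of how a time-stamped transcript enables random access to audio, the sketch below builds a simple inverted index from words to (document, start time) pairs. The input layout and the names build_index and find are assumptions made for this example; they are not part of the VoxSigma API or its XML output format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A posting is (document id, start time in seconds). This layout is
# hypothetical and chosen only for illustration.
Posting = Tuple[str, float]


def build_index(
    transcripts: Dict[str, List[Tuple[str, float]]]
) -> Dict[str, List[Posting]]:
    """Map each lowercased word to every place it was spoken."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((doc_id, start))
    return dict(index)


def find(index: Dict[str, List[Posting]], term: str) -> List[Posting]:
    """Return the positions where the term occurs, or an empty list."""
    return index.get(term.lower(), [])
```

Given such an index, a player could seek directly to the returned start time in the matching recording instead of scanning the whole file.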
Speech-Text Alignment
Speech-text alignment is the process of synchronizing a speech signal with a
speech transcript or closely related text, providing time codes for words and
sentences. The alignment process assigns a time code to each word and each
punctuation mark in the transcript and provides confidence scores that
identify areas where the alignment may be imperfect, in particular when the
provided transcript differs from what was actually said. This technology has
many uses, including audio books, language learning, and video subtitling.
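To make the role of the confidence scores concrete, the sketch below groups consecutive low-confidence words into time spans that a human could review. The AlignedWord structure, the 0-to-1 confidence scale, and the 0.5 threshold are assumptions for illustration only; they do not describe the actual VoxSigma output.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    # Hypothetical per-word alignment record (not the VoxSigma format).
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed to lie between 0.0 and 1.0


def low_confidence_spans(
    words: List[AlignedWord], threshold: float = 0.5
) -> List[Tuple[float, float, str]]:
    """Merge runs of consecutive words below the threshold into spans."""
    spans: List[Tuple[float, float, str]] = []
    run: List[AlignedWord] = []
    for word in words:
        if word.confidence < threshold:
            run.append(word)
        elif run:
            spans.append((run[0].start, run[-1].end,
                          " ".join(w.text for w in run)))
            run = []
    if run:  # flush a run that reaches the end of the transcript
        spans.append((run[0].start, run[-1].end,
                      " ".join(w.text for w in run)))
    return spans
```

Each returned tuple gives the start time, end time, and text of a region where the transcript and the audio may disagree.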
Spoken Language Identification
Spoken language identification is the process of recognizing the language
spoken in an audio document (broadcast audio, podcast, telephone). The
standard VoxSigma language identification component can recognize one
of 40 languages.