The VoxSigma software suite is available as a Web service via a simple REST
API over HTTPS, allowing customers to quickly reap the benefits of regular
improvements to the technology and to take advantage of additional features
offered by the online environment, such as daily updates of language
models. The VoxSigma SaaS is available 24/7/365 with failover servers and
geographic redundancy.
The service offers four main processing methods: the segmentation and
partitioning of the audio, the identification of the language, the conversion
of recorded speech input to text, and the synchronization of a transcription
with the speech signal (also called speech-text alignment).
REST API features
- Protocol:
  - REST API over HTTPS;
  - POST, GET and PUT HTTP methods are accepted;
  - Both URI-encoded requests and MIME multipart requests are supported;
  - Two submission modes: file and streaming (audio and XML).
- Availability: service available 24/7/365 with failover
servers and geographic redundancy
- Supported functions: speech-to-text transcription,
language identification, speech-text synchronization
- Supported languages: Arabic, Cantonese, Czech, Dutch,
English, Finnish, French, German, Greek,
Hebrew, Hindi, Hungarian, Italian, Latvian, Lithuanian, Mandarin, Pashto, Persian, Polish,
Portuguese, Romanian, Russian, Spanish, Swahili, Swedish, Turkish, Ukrainian and Urdu (more
under development)
- Special features: on-the-fly language model adaptation,
daily updates of language models for broadcast data
- Audio input: AAC, AIFF, ASF, FLAC, MS-Wave, MPEG,
Ogg/Vorbis, NIST SPHERE, Sun AU
- Output: XML data with speaker diarization, language
identification tags, word transcription, punctuation, confidence measures, numerical entities
and other specific entities
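As an illustration of the file submission mode with a MIME multipart request, the following is a minimal sketch using only the Python standard library. The endpoint URL, the field names (`method`, `model`, `audiofile`) and their values are placeholders chosen for this example, not the documented VoxSigma parameters; the actual names and the authentication scheme should be taken from the REST API documentation. The request is only built here, not sent.

```python
import uuid
import urllib.request

# Everything below is an illustrative assumption, not the documented API:
# the endpoint and the field names/values are placeholders.
ENDPOINT = "https://voxsigma.example.invalid/api"  # placeholder host


def build_multipart_request(url, fields, file_field, filename, audio_bytes):
    """Return a urllib Request carrying a multipart/form-data POST body."""
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # One part for the audio file itself.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    body = b"".join(parts)
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_multipart_request(
    ENDPOINT,
    {"method": "transcribe", "model": "eng"},  # hypothetical parameters
    "audiofile",                               # hypothetical field name
    "sample.wav",
    b"RIFF....WAVE",                           # stand-in for real audio bytes
)
# Sending is deliberately left out; with a real endpoint and credentials
# one would pass `request` to urllib.request.urlopen and read the XML reply.
```

Building the request separately from sending it makes it easy to inspect the body and headers before any audio leaves the machine.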
Special needs
- Batch processing is offered as an online or offline service to process
archives [request form]
- Model customization is offered on demand to ensure you
get the best possible results for your needs [contact form]
SaaS Status
Pricing
- We offer various usage plans: daily plan, monthly plan, batch plan, etc.
- For our generic systems and large quantities, the price is on the order of 0.01 euro (or $0.01)
per minute.
- Note that our pricing is based on speech duration, i.e. silences are not counted and there
is no minimum cost per submission.
- We offer free trials upon request.
- We no longer offer a pay-as-you-go usage plan. If your data processing
needs are relatively low or irregular, or if you need to process video
data or want to manually adapt or correct the automatic transcriptions, please
check out our partner's service. This pay-as-you-go service also offers many export
formats such as XML, CSV, SRT, SBV, RTF, VTT, PDF, DOC, DOCX.
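To make the duration-based pricing concrete, here is a rough estimate written as a short Python sketch. It assumes the indicative rate of 0.01 euro per minute quoted above for generic systems and large quantities; the 100-hour archive and its 80 hours of actual speech are made-up figures, and real prices depend on the chosen plan.

```python
def estimate_cost(speech_seconds, rate_per_minute=0.01):
    """Estimate the cost from billable speech duration (silences excluded)."""
    if speech_seconds < 0:
        raise ValueError("speech_seconds must be non-negative")
    return speech_seconds / 60.0 * rate_per_minute


# Hypothetical archive: 100 hours of audio, of which 80 hours are speech.
# Only the speech portion is billed; the 20 hours of silence are not.
speech_hours = 80
cost = estimate_cost(speech_hours * 3600)
print(f"{cost:.2f}")  # prints 48.00
```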
GUI Scribe3
The Scribe3 service is a graphical user interface for the REST API service. It
makes it easy to upload any audio or video file (or a set of files), to submit
these files for processing, and then to analyse and edit the processing result.
In addition to the default XML output provided by VoxSigma, the processing result
can be converted to the following formats: RTF, SRT, SBV, VTT, DOC, and DOCX.
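For readers who want to perform such a conversion themselves, the sketch below shows how timed segments can be written out in the SRT subtitle format. It is not the Scribe3 implementation: the input is a hypothetical list of `(start, end, text)` tuples, assumed to have been extracted from the XML output beforehand, and the XML parsing step is left out.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn (start_seconds, end_seconds, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments standing in for a parsed transcription.
segments = [
    (0.0, 2.5, "Hello and welcome."),
    (2.5, 5.04, "Today's topic is speech recognition."),
]
srt_text = segments_to_srt(segments)
print(srt_text)
```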
Document based adaptation
Automatic on-the-fly adaptation allows the user to provide texts related to the
audio document being processed, which can be considered topic/domain
adaptation. These accompanying texts serve to increase the lexical coverage of
the speech-to-text system and to adapt the language model to the specific domain
of the audio document with the aim of improving the transcription accuracy.
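The notion of lexical coverage can be illustrated with a toy Python sketch. It only shows how words taken from a related text reduce the out-of-vocabulary rate of a recognition vocabulary; it is not the adaptation procedure used by VoxSigma, which also adapts the language model. The vocabulary and sentences are invented for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def oov_rate(words, vocabulary):
    """Fraction of words that are not covered by the vocabulary."""
    if not words:
        return 0.0
    missing = sum(1 for word in words if word not in vocabulary)
    return missing / len(words)


# Toy data: a tiny base vocabulary, a related text supplied by the user,
# and the words actually spoken in the audio document.
base_vocabulary = {"the", "patient", "was", "given", "a", "dose", "of"}
related_text = "Amoxicillin is a common antibiotic."
spoken = tokenize("The patient was given a dose of amoxicillin")

before = oov_rate(spoken, base_vocabulary)
adapted_vocabulary = base_vocabulary | set(tokenize(related_text))
after = oov_rate(spoken, adapted_vocabulary)
print(before, after)  # prints 0.125 0.0
```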
On-demand batch processing
Batch processing is offered as an offline or online service to process audio
and audiovisual archives, in particular when specific needs and models are
required [request form].
If you are interested in a particular language or technology, please
use our contact form or our VoxSigma request form, or send a note
directly to contact@vocapia.com.
Support
We provide hotline support (via email and phone) for our products and services to
help users and system integrators solve problems in the shortest
possible timeframe [support form].