Airbus, in collaboration with IRIT and Safety Data-CFH, held a challenge to
assess the current state of the art in automatic speech recognition and call
sign detection in English Air Traffic Control (ATC) communications. The
challenge provided participants with annotated training data and a leaderboard
to assess the performance of their systems on a held-out set of development
data. ATC communications are challenging for today's technology: the audio
is contaminated with various types of noise, and the speech comes from a
wide range of speakers with both native and non-native accents. The
speech is generally in English, but may be in the language of the
country (in the case of this challenge, French) and may contain code-switching
with English. ATC communications are generally spoken at a fast rate
and make use of domain-specific grammar and vocabulary. There are many
potential uses of speech technology in the domain of air traffic
communications to improve safety and training.
The joint submission of Vocapia Research and the Spoken Language Processing
Group at LIMSI CNRS to the Airbus ATC challenge 2018 ranked first in both the
speech recognition and call sign detection tasks. The speech-to-text
transcription technology used for the challenge is based on technology
developed over the last 20 years, including deep neural networks for both the
acoustic and language models. Compared with more general transcription tasks,
Air Traffic Control communications are at the same time simpler and more
complicated: the language is nominally more constrained, with a more or less
controlled vocabulary and syntax, but the environmental conditions can be quite
challenging, with various noises and transmission dropouts. The call sign
detection task requires locating a flight identifier in the automatic
transcription. The call sign may be complete, adhering to the full structure
(airline code, followed by 3-5 numbers and optionally 1 or 2 letters) or
partial. The exchanges between the pilot and the controller occur in a known
context, which makes it easier for humans to understand partial call signs.
However, this contextual information was not available to the automatic
systems, making the call sign detection task more difficult.
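To make the nominal structure concrete, below is a minimal, hypothetical Python sketch that matches full call signs in a normalized transcript with a regular expression. The airline designators, the assumption that spoken digits and letters have already been converted to their written forms, and the function name are all illustrative choices, not part of the challenge systems; partial or read-back call signs and recognition errors would need additional handling.

```python
import re

# Hypothetical airline designators; a real system would use the task's inventory.
AIRLINE_CODES = r"(?:AFR|BAW|DLH|RYR)"

# Full call sign structure as described above: airline code, 3-5 digits,
# optionally followed by 1 or 2 letters. Assumes spoken digits and the ICAO
# alphabet have been normalized (e.g. "three four" -> "34", "alpha bravo" -> "AB").
FULL_CALLSIGN = re.compile(rf"\b{AIRLINE_CODES}\s?\d{{3,5}}(?:\s?[A-Z]{{1,2}})?\b")

def find_full_callsigns(transcript: str) -> list[str]:
    """Return the spans of the transcript matching the full call sign pattern."""
    return FULL_CALLSIGN.findall(transcript.upper())

print(find_full_callsigns("contact ground afr 1234ab good day"))
# -> ['AFR 1234AB']
```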
About Vocapia Research
Vocapia Research is a French R&D company and software publisher with over
20 years of experience in providing leading-edge speech technologies for many
languages, including most major European languages as well as Arabic,
Mandarin, and Russian. The Vocapia Research VoxSigma® software
suite uses advanced language technologies such as language identification,
speech recognition, and speaker identification to transform raw audio and
audiovisual data into structured and searchable XML documents. This technology
relies on decades of research at LISN, with which Vocapia Research has a
privileged partnership. Joint systems developed with LISN have achieved top rankings in
national and international challenges on speech-to-text transcription. Located
at the heart of the science innovation cluster of Paris Saclay, France,
Vocapia Research is a leader in developing and adapting AI-based solutions for
both civil and defence applications. These applications include audio and
audiovisual data mining (broadcast and web data, telephone speech), production
of subtitles, OSINT and COMINT, and the analysis of aeronautical
communications (air traffic control, voice command). Readers who would like
more information about Vocapia Research are invited to visit the Vocapia
Research website or to use the contact page at http://www.vocapia.com/contact.