Airbus, in collaboration with IRIT and Safety Data-CFH, held a challenge to
assess the current state of the art in automatic speech recognition and call
sign detection in English Air Traffic Control (ATC) communications. The
challenge provided participants with annotated training data and a leaderboard
to assess the performance of their systems on a held-out set of development
data. ATC communications are challenging for today's technology: the audio
is contaminated with various types of noise, and the speech comes from a
wide range of speakers with both native and non-native accents. The
speech is generally in English, but may be in the language of the
country (in the case of this challenge, French) and may contain code-switching
with English. ATC communications are generally spoken at a fast rate
and make use of domain-specific grammar and vocabulary. There are many
potential uses of speech technology in the domain of air traffic
communications to improve safety and training.
The joint submission of Vocapia Research and the Spoken Language Processing
Group at LIMSI CNRS to the Airbus ATC challenge 2018 ranked first in both the
speech recognition and call sign detection tasks. The speech-to-text
transcription technology used for the challenge is based on technology
developed over the last 20 years, including deep neural networks for both the
acoustic and language models. Compared with more general transcription tasks,
Air Traffic Control communications are at the same time simpler and more
complicated: the language is nominally more constrained, with a more or less
controlled vocabulary and syntax, but the environmental conditions can be quite
challenging, with various noises and transmission dropouts. The call sign
detection task requires locating a flight identifier in the automatic
transcription. The call sign may be complete, adhering to the full structure
(airline code, followed by 3-5 numbers and optionally 1 or 2 letters) or
partial. The exchanges between the pilot and the controller occur in a known
context, which makes it easier for humans to understand partial call signs.
However, this contextual information was not available to the automatic
systems, making the call sign detection task more difficult.
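To make the nominal structure concrete, below is a minimal, hypothetical Python sketch that matches full call signs in a normalized transcript with a regular expression. The airline designators, the assumption that spoken digits and letters have already been converted to their written forms, and the function name are all illustrative choices, not part of the challenge systems; partial or read-back call signs and recognition errors would need additional handling.

```python
import re

# Hypothetical airline designators; a real system would use the task's inventory.
AIRLINE_CODES = r"(?:AFR|BAW|DLH|RYR)"

# Full call sign structure as described above: airline code, 3-5 digits,
# optionally followed by 1 or 2 letters. Assumes spoken digits and the ICAO
# alphabet have been normalized (e.g. "three four" -> "34", "alpha bravo" -> "AB").
FULL_CALLSIGN = re.compile(rf"\b{AIRLINE_CODES}\s?\d{{3,5}}(?:\s?[A-Z]{{1,2}})?\b")

def find_full_callsigns(transcript: str) -> list[str]:
    """Return the spans of the transcript matching the full call sign pattern."""
    return FULL_CALLSIGN.findall(transcript.upper())

print(find_full_callsigns("contact ground afr 1234ab good day"))
# -> ['AFR 1234AB']
```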
About Vocapia Research
Vocapia Research is a French R&D company and software publisher with over
20 years of experience in providing leading-edge speech technologies for many
languages, including most major European languages as well as Arabic,
Mandarin, and Russian. The Vocapia Research VoxSigma® software
suite uses advanced language technologies such as language identification,
speech recognition, and speaker identification to transform raw audio and
audiovisual data into structured and searchable XML documents. This technology
relies on decades of research at LISN, with which Vocapia Research has a
privileged partnership. Joint systems developed with LISN have achieved top rankings in
national and international challenges on speech-to-text transcription. Located
at the heart of the science innovation cluster of Paris Saclay, France,
Vocapia Research is a leader in developing and adapting AI-based solutions for
both civil and defence applications. These applications include audio and
audiovisual data mining (broadcast and web data, telephone speech), production
of subtitles, OSINT and COMINT, and the analysis of aeronautical
communications (air traffic control, voice command). Readers who would like
more information about Vocapia Research are invited to visit the Vocapia
Research website or to use the contact page at http://www.vocapia.com/contact.