Speaker Diarization

Application: enriching and adding structure to audiovisual data

Speaker diarization, also called speaker segmentation and clustering, is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. Structuring the audio stream into speaker turns also improves the readability of automatic transcripts.
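
To make the idea of "homogeneous segments" concrete, here is a minimal sketch in Python. It assumes a diarization system has already assigned a speaker label to each short audio frame and only shows how such labels could be merged into speaker turns. The function name, the frame step, and the label values are hypothetical illustrations, not part of any particular system.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_step=0.01):
    """Merge runs of identical frame labels into (start, end, speaker) turns.

    frame_labels: one speaker label per frame, in time order (assumed input).
    frame_step:   frame duration in seconds (assumed value, 10 ms by default).
    """
    segments = []
    index = 0  # index of the first frame of the current run
    for speaker, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames in this run
        start = round(index * frame_step, 3)
        end = round((index + length) * frame_step, 3)
        segments.append((start, end, speaker))
        index += length
    return segments


# Hypothetical labels for six half-second frames.
turns = frames_to_segments(["A", "A", "B", "B", "B", "A"], frame_step=0.5)
for start, end, speaker in turns:
    print(f"{start:.1f}s to {end:.1f}s: speaker {speaker}")
```

Each resulting tuple corresponds to one speaker turn, which is the unit used to structure a transcript.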

A recent use of speaker diarization is as a 'Who's Who' for audio documents, providing a means of knowing 'who spoke when'. In the context of the Quaero program, this technology was applied as an aid to human operators in determining the speaking time of political speakers during the last presidential election period in France.
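
As an illustration of the speaking-time task, the following sketch sums the duration of each speaker's turns. It assumes diarization output is available as (start, end, speaker) tuples in seconds. That layout, the function name, and the speaker labels are assumptions made for this example, not the format used in the Quaero work.

```python
from collections import defaultdict


def speaking_time(segments):
    """Return total speaking time in seconds per speaker.

    segments: iterable of (start, end, speaker) tuples (assumed layout).
    """
    totals = defaultdict(float)
    for start, end, speaker in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, speaker)}")
        totals[speaker] += end - start
    return dict(totals)


# Hypothetical diarization output for a 30-second excerpt.
example = [
    (0.0, 12.5, "speaker_1"),
    (12.5, 20.0, "speaker_2"),
    (20.0, 30.0, "speaker_1"),
]
for speaker, seconds in sorted(speaking_time(example).items()):
    print(f"{speaker}: {seconds:.1f} s")
```

In practice the anonymous labels produced by diarization would still need to be linked to named speakers, for instance by a human operator, before speaking time can be attributed to a person.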

Other applications: Multilingual Audio Indexing, Transcription of Speeches, Teleconference Transcription, Subtitling, Telephone Speech Analytics.