Speech recognition is a complex task that draws on many sources of knowledge (acoustics, phonetics, linguistics, semantics) and on a range of techniques (signal processing, acoustic-phonetic models, neural networks, statistical language models).

Speech recognition accuracy depends heavily on the type of data being processed. It is typically measured by the word error rate (WER), which can be as low as a few percent on some tasks and as high as 40% on very challenging ones.
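To make the metric concrete, the sketch below computes WER under the common definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a minimum-cost alignment of the recognizer's output against a reference transcript of N words. It is a minimal illustration, assuming whitespace tokenization and no text normalization (case, punctuation, numbers); the function name `word_error_rate` is hypothetical, and practical evaluations usually normalize the text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level edit distance.

    Assumes simple whitespace tokenization with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # Total edits divided by the number of reference words
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER is not capped at 100%: a one-word reference recognized as three words, one of them correct, scores 2 / 1 = 2.0.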

Because speech production and perception are so complex, progress in speech recognition has always been slow. Nevertheless, word error rates have fallen steadily for about 25 years, and progress is expected to continue for many years to come. Progress can also be measured against human speech recognition performance, and by this measure the gap between humans and machines has been narrowing gradually, year after year.