Human hearing provides a powerful inspiration for what we might be able to do with machine algorithms to extract various kinds of meaning and information from sound signals. By modeling the auditory periphery (the cochlea), we can construct a robust representation of what the ear sends to the brain. By modeling the auditory brainstem and midbrain, we make hypothetical representations of the "images" that may project to sheets of auditory cortex, much as retinal images project to visual cortex. These feature-engineered stages provide a solid foundation for the analysis and interpretation of speech, music, sound events, and complicated mixtures and environments. Simple abstractions of these auditory representations have also been shown to improve on conventional sound-processing front ends. We are also excited to find that visual representations of such front-end features show promise as sound-access aids for users who are deaf or hard of hearing.
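As one concrete illustration of such a cochlea-inspired front end, the sketch below builds an ERB-spaced gammatone filterbank, a common simple abstraction of the auditory periphery. This is a minimal sketch under stated assumptions, not the specific model discussed here: the function names, channel count, and frequency range are illustrative choices, and the filters are plain FIR gammatones rather than a full cascade model of the cochlea.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore formula), in Hz."""
    return 24.7 * (4.37e-3 * f + 1.0)

def erb_space(fmin, fmax, n):
    """n center frequencies equally spaced on the ERB-rate scale."""
    lo = 21.4 * np.log10(4.37e-3 * fmin + 1.0)
    hi = 21.4 * np.log10(4.37e-3 * fmax + 1.0)
    return (10.0 ** (np.linspace(lo, hi, n) / 21.4) - 1.0) / 4.37e-3

def gammatone_filterbank(signal, fs, fmin=80.0, fmax=6000.0, n_channels=16):
    """Filter `signal` through 4th-order FIR gammatone filters.

    Returns an array of shape (n_channels, len(signal)): one bandpass
    output per cochlear channel, low to high center frequency.
    """
    t = np.arange(int(0.025 * fs)) / fs  # 25 ms impulse responses
    out = np.empty((n_channels, len(signal)))
    for i, fc in enumerate(erb_space(fmin, fmax, n_channels)):
        b = 1.019 * erb(fc)  # standard gammatone bandwidth scaling
        ir = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        ir /= np.sqrt(np.sum(ir ** 2))  # unit-energy normalization
        out[i] = np.convolve(signal, ir, mode="full")[: len(signal)]
    return out
```

Half-wave rectifying and smoothing each channel of the output yields a crude "cochleagram," the kind of representation that later stages (brainstem and midbrain models) would build on.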
Richard F. "Dick" Lyon is a Principal Research Scientist at Google, where he leads the Sound Understanding team in applying ideas from human hearing to problems in machine interpretation of sounds of all sorts. He is a Fellow of the IEEE "for contributions to VLSI signal processing, models of hearing, handwriting recognition, and electronic color photography", and a Fellow of the ACM "for contributions to machine perception and for the invention of the optical mouse". In 2005 he received the Progress Medal from the Royal Photographic Society "for the development of the Foveon X3 sensor". He has over 70 issued U.S. patents. In 2017 his book "Machine Hearing: Extracting Meaning from Sound" was published by Cambridge University Press.