Department of Psychology
Rutgers University New Brunswick
Language processing in the face of variability: Distributional learning all the way down?
On the one hand, talker variability is one of the fundamental challenges for speech recognition: each talker has their own mapping from linguistic units to sounds, which means that an effective listener must use a different recognition function for each talker. On the other hand, talker variability means that speech is a source of rich information about who the talker is. This dual nature of talker variability means that speech and talker recognition are inextricably linked: knowing something about who is talking makes it easier to understand what they are saying, and knowing something about how someone talks unlocks the rich social meaning of speech.
I argue that the concept of a talker’s generative model, that is, the probability distributions of sounds associated with each phonetic or linguistic category, is a useful general-purpose conceptual tool for understanding the link between talker variability, speech recognition, and social identity. With such phonetic cue distributions in hand, we can use information-theoretic tools to quantify both the extent and the structure of talker variability across different phonetic systems, and to establish in-principle consequences of talker variability for both speech recognition and socio-indexical inferences from speech. These tools provide a unifying perspective on the diverse range of strategies that listeners use to deal with talker variability. They also raise new questions and challenges for future research: to take advantage of structure in talker variability, listeners must first learn that structure. This substantially expands the challenge of language acquisition and suggests an important role for inductive biases in guiding this learning.
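As a minimal sketch of what a talker's generative model and an information-theoretic measure over it might look like, the Python below assumes that each category's distribution over a single cue, voice onset time (VOT) for a /b/-/p/ contrast, is a univariate Gaussian. All parameter values, and the names `POPULATION`, `TALKER_A`, `TALKER_B`, `kl_gaussian`, `posterior_p`, and `talker_divergence`, are hypothetical and chosen for illustration only; they are not measurements or the author's implementation. KL divergence is used here as one example of an information-theoretic measure of how far a talker's distributions depart from the population's. Bayesian categorization with equal priors shows how the same cue value can be recognized as different categories depending on who is talking.

```python
import math

# Hypothetical (mean, sd) of VOT in ms for each category; illustrative values only.
POPULATION = {"b": (0.0, 10.0), "p": (60.0, 15.0)}
TALKER_A = {"b": (-5.0, 8.0), "p": (45.0, 12.0)}    # hypothetical short-lag talker
TALKER_B = {"b": (10.0, 10.0), "p": (80.0, 15.0)}   # hypothetical long-lag talker


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """KL(P || Q) in bits for univariate Gaussians P = N(mu_p, sd_p^2), Q = N(mu_q, sd_q^2).

    Interpretable as the expected extra bits per observation when cues really
    come from P but are encoded with a code optimized for Q.
    """
    nats = (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)
    return nats / math.log(2)


def normal_pdf(x, mu, sd):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_p(vot, categories):
    """P(/p/ | vot) under a talker's generative model, assuming equal priors."""
    lik_b = normal_pdf(vot, *categories["b"])
    lik_p = normal_pdf(vot, *categories["p"])
    return lik_p / (lik_b + lik_p)


def talker_divergence(talker, population):
    """Mean KL (bits) from the population model to a talker's model, over categories."""
    return sum(kl_gaussian(*talker[c], *population[c]) for c in talker) / len(talker)


if __name__ == "__main__":
    for name, talker in [("A", TALKER_A), ("B", TALKER_B)]:
        print(f"talker {name}: divergence = {talker_divergence(talker, POPULATION):.2f} bits, "
              f"P(/p/ | VOT=30 ms) = {posterior_p(30.0, talker):.3f}")
```

With these made-up parameters, a 30 ms VOT is categorized as /p/ under talker A's model but as /b/ under talker B's, which is one concrete way of seeing why a listener would need a different recognition function for each talker. Gaussian categories and KL divergence are simplifying choices here; other distributional assumptions or information-theoretic measures could be substituted.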