For the entirety of the week, it’s been hard to avoid the question:
Is it Yanny or is it Laurel?
More importantly – why do people hear this sound clip differently?
Our ears help us hear; our brains help us discriminate, process and understand.
The true answer is actually multifold.
Following the thoughts and theories of many sound experts during the week, a rough consensus emerged: the word “Laurel” is dominated by lower-frequency (lower-pitched) sounds, while the word “Yanny” is dominated by higher-frequency (higher-pitched) sounds. Under this theory, people with high-frequency reductions in their hearing can’t easily attend to the higher-pitched signal in “Yanny.” Others have stated that the energy in the two words is similar; it just matters what the brain attends to. That theory surmises that the brain attends to whatever it is most comfortable with, and the factors that determine this are current hearing levels, processing skills and sound experience.
But this is only a part of the answer.
Let’s start by understanding the acoustic structure of the sounds that make up both of these words:
/l/ + /au/ + /r/ + /l/ = Laurel
/j/ (‘y’ sound) + /a/ + /n/ + /i/ (“ee” sound) = Yanny
Every speech sound is built on a fundamental pitch – or frequency – plus resonance peaks called “formants.” The fundamental pitch is called “F0,” the first formant is “F1” and the second formant is “F2.” Incidentally, F0 is produced at the larynx, F1 is related to the size/shape of the pharynx and F2 is related to the size/shape of the mouth.
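To make the idea concrete, here is a minimal sketch of that structure: a vowel-like tone built from harmonics of a fundamental, with the harmonics that sit near the resonance peaks boosted. The specific numbers (F0 of 120 Hz, resonances at 700 Hz and 1200 Hz) are illustrative assumptions, not measurements from the actual clip.

```python
import numpy as np

def synth_vowel(f0=120.0, formants=(700.0, 1200.0), dur=0.5, sr=16000):
    """Build a vowel-like tone: harmonics of f0, with the harmonics
    nearest each assumed formant frequency resonating more strongly."""
    t = np.arange(int(dur * sr)) / sr
    signal = np.zeros_like(t)
    for k in range(1, int((sr / 2) // f0)):
        freq = k * f0
        # Base amplitude falls off with harmonic number...
        amp = 1.0 / k
        # ...but harmonics near a formant peak are boosted (Gaussian bump).
        for f in formants:
            amp *= 1.0 + 4.0 * np.exp(-((freq - f) ** 2) / (2 * 100.0 ** 2))
        signal += amp * np.sin(2 * np.pi * freq * t)
    return signal / np.max(np.abs(signal))

tone = synth_vowel()
```

Moving the resonance peaks while leaving F0 alone is, roughly, what your mouth and pharynx do when they shape one pitch into different vowels.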
The /j/ (‘y’) sound in the word “Yanny” is considered a glide. Glides take on many characteristics of vowels and tend to be easier to hear in general, because their slow formant transitions give the brain time to process them. Glides are made up of predominantly higher-pitched energy: the upper formants (F2/F3) range from about 2200 Hz to 3000 Hz, which is on the high side, although not at the highest end of the frequency range for speech sounds. Still, some of that energy may not be accessible to someone with a high-frequency hearing loss, which can impact their ability to understand clearly.
The /n/ sound in the word “Yanny” is a nasal sound. Nasal sounds are made up of mostly low-frequency (low-pitched) energy – predominantly under 500 Hz. This makes the sound fairly easy to understand regardless of hearing capacity. In fact, even people with severe hearing losses tend to pick up on nasal sounds because of all that low-pitched energy.
The /l/ sound and the /r/ sound in the word “Laurel” are considered liquid consonants. The fundamental pitch of each is about the same, while the formants are on the high side (roughly 1300 Hz and 2700 Hz for /l/, somewhat lower for /r/). Although this is lower-pitched energy than in glides, it is by no means at the lowest end of the frequency scale – in fact, it’s fairly mid-range for speech information.
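The band claims above can be checked numerically by asking what fraction of a signal’s energy falls in a given frequency range. A small sketch, with synthetic tones standing in for real speech: a “nasal-like” 300 Hz tone and a “glide-like” 2600 Hz tone, mixed equally (the tone frequencies and band edges are illustrative assumptions).

```python
import numpy as np

def band_energy_fraction(clip, sr, lo, hi):
    """Fraction of the clip's spectral energy between lo and hi Hz."""
    spectrum = np.abs(np.fft.rfft(clip)) ** 2
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return spectrum[band].sum() / spectrum.sum()

sr = 16000
t = np.arange(sr) / sr
# Stand-ins, not real recordings: one low tone, one high tone, equal level.
mix = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 2600 * t)

low_share = band_energy_fraction(mix, sr, 0, 500)       # the "nasal" band
high_share = band_energy_fraction(mix, sr, 2200, 3000)  # the "glide" band
```

Run on a real recording of the clip, the same function would show how much of the energy sits in the under-500 Hz nasal region versus the 2200–3000 Hz glide region – which is exactly the split the theory above hinges on.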
This is a lot of information about speech acoustics, but it gives us evidence that actual hearing capacity may determine part of which word we “hear,” because it governs which formants are available to us. However, it is not the only factor.
In sound engineering, we are trained to adjust sound so that an individual can perceive it better. For example, in a concert hall we modify the sound of the band so that the audience has a better listening experience – increasing the bass, perhaps increasing the treble, adding some reverberation. All of these modifications can change the listener’s perception of the sound. In other words, our brain processes sound based on all of the nuances involved. This helps us determine what we are hearing – and also contributes to our emotional response to it.
If you take the initial “Laurel” sound file and do nothing more than decrease the volume on your audio player, you change the perceptual balance between its low and high frequencies – our ears weight frequencies differently at different loudness levels – and that alone may change which word you perceive. If you have the capacity to adjust the different frequencies of the clip – if you possess an equalizer that allows more advanced adjustments – you can increase the treble, decrease the bass and even adjust the mid frequencies and hear the clip change. You may be able to add effects – increased compression, reverberation, sustain – as though it were a music signal, and those effects will also adjust the brain’s perception of the clip. Yet it’s the same sound clip – the same word. How the sound is manipulated changes your perception of that word.
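A crude one-band “equalizer” of the kind described is easy to sketch: keep only the energy above or below a cutoff. The 1 kHz cutoff and the two-tone demo signal are assumptions for illustration; on the real clip, tilting the balance toward the highs is the sort of manipulation that pushed many listeners toward “Yanny.”

```python
import numpy as np

def eq_tilt(clip, sr, cutoff=1000.0, favor="treble"):
    """Brick-wall tilt via FFT: zero out the band we don't want.
    favor='treble' keeps energy above the cutoff; 'bass' keeps it below."""
    spectrum = np.fft.rfft(clip)
    freqs = np.fft.rfftfreq(len(clip), d=1.0 / sr)
    mask = freqs >= cutoff if favor == "treble" else freqs <= cutoff
    return np.fft.irfft(spectrum * mask, n=len(clip))

# Demo on a synthetic two-tone stand-in: 200 Hz "bass" + 2500 Hz "treble".
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 2500 * t)

trebly = eq_tilt(clip, sr, favor="treble")  # 200 Hz component removed
bassy = eq_tilt(clip, sr, favor="bass")     # 2500 Hz component removed
```

A real equalizer uses gentler filter slopes rather than a brick wall, but the principle is the same: the underlying clip never changes, only which parts of its spectrum reach your ears.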
In the same way, the device you use to listen will change your perception – no two speakers sound the same, and every headphone has a different sound signature.
Confusing? Maybe. But it all boils down to one thing: sound is not black or white. It’s a whole variety of grays, colored as much by the structure of the sound itself as by our individual perception of it. To make matters even more complex, the factors that shape our individual perceptions are based both in science (hearing capacity, auditory processing skills, etc.) and in art (emotion, experience, interpretation).
There are so many variables contributing to sound discrimination that it is easy to build theories around any of them. The one real truth is that the reasoning is complex because sound is so open to interpretation. Nobody is truly wrong about what they hear.
The ears hear. The brain understands, discriminates and processes.
But – it was really “Laurel”…..