It’s been a while since I’ve strayed into the Mystic side of Linguistic Mystic. This evening, while Wikipedia-Surfing, I stumbled upon an interesting reported phenomenon: Electronic Voice Phenomena (or EVP). Since I spend a great deal of my free time looking at voices and how speech works, I was interested to see what a bit of phonetic analysis would do to some of the examples that its proponents have given.


EVP is not a well-studied phenomenon and there is little (if any) scientific evidence in favor of its existence. This post should not be construed as an endorsement of this phenomenon or an assertion of its reality. I try to keep an open mind on such things, but I’m doing this analysis for my own interest (if nothing else, “paraphonetics” is a cool sounding field name), not for any legitimate, scholarly purpose. Take this post (and, if you’d like, the phenomenon itself) with a grain of scientific salt.

What is EVP?

In the words of the American Association of Electronic Voice Phenomena’s FAQ page on EVP:

Electronic Voice Phenomena (EVP) is the term traditionally used to describe unexpected sounds or voices sometimes found on recording media. EVP initially involved audio tape recorders, but in later years, virtually any recording medium became a vehicle for phenomena. The term Instrumental TransCommunication (ITC) came into being to describe these expanded modes of audio- and video-format communication. Other acronyms used in the literature include Electronic Disturbance Phenomena (EDP) and Trans-Dimensional Communication (TDC).

For a more two-sided (and skeptical) discussion and other resources, I encourage you to visit the Wikipedia page on EVP.

Long story short, EVP are anomalous voices that show up in recordings, often claimed to come from the dead. These voices are reported to be phrases, words, or even dialogues with a living speaker.

Praat and the Paranormal

When I first read about this, I decided to try and find some samples of this phenomenon and run them through Praat, a Phonetic Analysis program. Luckily, the AAEVP provides a number of examples on their site. The one I’ll be analyzing today comes from Vicki Talbott’s Examples, and purports to feature a discussion between her and her son who had recently died, discussing the proper pronunciation of the word “evidentiary”. I encourage you to read her explanation (the last example on her page) and listen to the file a few times before I proceed.

As you can hear, her voice is quite clear (albeit recorded), but the other voice is nearly incomprehensible if you’re not sure what you’re looking for. However, I was curious just how much of the data I’d expect to find in speech would be there, and how much is my brain filling in the blanks. Let’s look a little more closely at the acoustics of the voices.

What’s in a voice?

We hear patterns of sound based on the emphasis and damping of certain parts of the sound spectrum. The vibration of our vocal folds is fairly constant (excepting the occasional pitch or voicing change), but we’re almost constantly moving our mouths and tongues. Just as your voice changes when you put your mouth to a flexible tube and talk while bending the tube, the sound of your vocal folds vibrating is changed by the position of your tongue, lips, and velum in your mouth and throat. Different vowel sounds are created by modifying the shape of the mouth, which in turn modifies the sound escaping your mouth to be heard by others. This is called the Source-Filter Model of Speech Production.

So, when we hear another person make a sound, say, the vowel ‘i’ (as in feet), we’re analyzing which parts of the sound from their vocal folds are being damped (suppressed) and which parts resonate (are stronger). For example, in the vowel /i/, there are strong bands of resonating sound (called ‘formants’) around (roughly) 250 Hz, 2500 Hz, and 3000 Hz. We hear these particular parts of the spectrum being emphasized, and interpret them as somebody making an /i/.
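If you'd like to see the source-filter idea in action, here's a minimal Python sketch. It is only an illustration under my own assumptions, not how Praat or a real vocal tract works: the "vocal folds" are a bare impulse train at 125 Hz, each formant is a simple two-pole resonator, the sample rate is 16 kHz, and the bandwidths are made up. The formant frequencies are the rough /i/ values mentioned above. The script then measures how much energy ends up at each formant versus a spot in the gap between them (1500 Hz).

```python
import math

SAMPLE_RATE = 16000  # Hz; an assumption for this sketch


def glottal_source(f0, n_periods, sample_rate=SAMPLE_RATE):
    """Impulse train standing in for vocal-fold vibration at pitch f0."""
    period = round(sample_rate / f0)
    return ([1.0] + [0.0] * (period - 1)) * n_periods


def resonator(signal, freq, bandwidth, sample_rate=SAMPLE_RATE):
    """Two-pole filter that emphasizes energy near `freq` (one formant)."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    a1 = 2 * r * math.cos(2 * math.pi * freq / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, f0=125, n_periods=32):
    """Same source, shaped by one resonator per formant (summed in parallel)."""
    source = glottal_source(f0, n_periods)
    bands = [resonator(source, freq, bw) for freq, bw in formants]
    return [sum(samples) for samples in zip(*bands)]


def band_energy(signal, freq, sample_rate=SAMPLE_RATE):
    """Goertzel algorithm: signal power at a single frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s1 = s2 = 0.0
    for x in signal:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


# (frequency, bandwidth) in Hz; frequencies from the text, bandwidths assumed
I_FORMANTS = [(250, 60), (2500, 100), (3000, 120)]

vowel = synthesize_vowel(I_FORMANTS)
for freq in (250, 1500, 2500, 3000):
    print(freq, "Hz:", round(band_energy(vowel, freq)))
```

The source has the same strength at every harmonic, so any difference between the numbers comes from the filter alone. Swapping in a different set of formant frequencies changes which parts of the spectrum stand out, which is (very roughly) what changing vowels does.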

These formants (along with the gaps between them and some other sounds) are what we’re listening for in speech. In clear speech, the formants are well defined and strong, but in distorted or mumbled speech, they’re very tough to pick out, both by computer and with our ears.

Evidentiary evidence

So, for comparison, I’ve recorded a file of myself saying “evidentiary”. Give it a listen, if you’d like.

When I open this file in Praat, it shows me a part of the spectrum (0-5000 Hz). On that spectrogram, there are darker parts and lighter parts. The darker parts show the formants (the resonating parts of the spectrum), and the lighter parts show the damped portions. I’ve also had Praat draw red dots on the formants, to make them a bit more distinct. Here’s a screenshot of the spectrogram for my “evidentiary”, labeled with English on top, and IPA on the bottom:


As you can see, the heights and separation between the formants (black parts with red dotted lines) are distinctly different for the initial “e” and the “ia” in the middle. If they weren’t, the vowels would just sound the same. Similarly, there are other trademark signs of speech sounds. The ‘sh’ sound (ti in English) shows up with a burst of noise around 3000-7000 Hz (as one would expect), and the ‘n’ makes everything a bit damped and quieter (as do all nasal sounds). All the formants are well defined, and Praat doesn’t have much trouble finding them and sticking to them.

Now, let’s look at a spectrogram of Talbott’s recording, annotated the same way, with red dot formants, and using her transcriptions from the diagram at the bottom of her site:


Of course, the spacing is different, and based on the white streaks around 200 Hz and 3500 Hz, it looks like she’s done some filtering to isolate these sounds. The interesting part about this is that there aren’t any well defined formants. Praat is great at finding formants in good files, but it’s also quite adept at finding them in background noise if there’s not any good speech in a given file. As you can see, there are three pretty constant bands of red dots going across the entire spectrogram, with the same amount of variation in the silence as in the “spoken” portions. Although Praat thinks they’re formants, when compared to the relatively sharp black lines in my version, it looks like it’s just finding whatever pattern it can in the noise.
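I only eyeballed that "same amount of variation" claim, but one rough way to put a number on it would be to compare how much a formant track wobbles inside the "spoken" stretches versus the silent ones. Here's a small Python sketch of that idea. To be clear, the formant values and the speech/silence frame ranges below are invented for illustration and were not measured from Talbott's file or mine; in practice you'd export the track from Praat and label the regions yourself.

```python
from statistics import pstdev


def spread(track, intervals):
    """Population standard deviation (Hz) of track values in the given frame ranges."""
    values = [hz for start, end in intervals for hz in track[start:end]]
    return pstdev(values)


# Invented F1 values (Hz), one per analysis frame; NOT from any real recording
f1_track = [510, 495, 505, 500, 490, 515, 500, 505, 495, 510, 500, 490]

# Hypothetical hand labels, as (start_frame, end_frame) with end exclusive
silence = [(0, 4), (8, 12)]
speech = [(4, 8)]

ratio = spread(f1_track, speech) / spread(f1_track, silence)
print(f"speech/silence spread ratio: {ratio:.2f}")
```

A ratio close to 1 would be consistent with the tracker doing the same thing everywhere (that is, following noise), while real vowels moving around should push the speech spread well above the silence spread. It's a crude measure, though, and a single number like this wouldn't settle anything on its own.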

It doesn’t seem like there’s much of anything in the way of clear formants or expected voice patterns. The noise for the ‘sh’ is missing from ‘ti’, the ‘n’ doesn’t seem to affect much, and the formant patterns over the two different /i/’s don’t really match (as they did in mine). Overall, there’s not a lot here to latch on to, and, as you likely noticed when listening to it, it’s by no means obvious what’s being said. Most of the auditory cues we use to pick out meaningful speech are absent acoustically, yet, with a few repetitions, we can usually convince ourselves that we’re hearing speech here.

What does it all mean?

Based on what I see here (in this one example), it seems like many of the fundamental characteristics of human speech are missing in the second, purportedly paranormal voice. I suspect that this is what makes it nearly incomprehensible without coaching.

What does that mean for EVP? Well, nothing, really, because my study here isn’t particularly scientific. Just because a phledgling phonetician doesn’t see speech through one method of analysis doesn’t mean it’s not there. Also, I can’t be sure what sorts of filters were used that might have changed the sound quality. I’m not sure what results a different file would yield.

However, even if this were a perfect analysis, all that I’d be proving here is that the voice isn’t actually similar to normal human speech. The EVP people will still defend their assertions, and the skeptics will still have their objections to their claims (and methodology, and other such things).

The difficulty with Paraphonetics

The other relevant question is whether such study really matters at all. To the people who believe in EVP, the clarity (or closeness to normal human speech) may not be particularly relevant.

Phonetics is a very exact sort of science, but anything to do with the paranormal is extremely subjective. We can scientifically measure things all day long, but in the end, these sorts of phenomena depend on the interpretation of the listener. Perhaps Vicki Talbott heard “evidentiary” in that noise because of her previous question (using context to make sense of inaudible portions of a “conversation”). Perhaps the noise just coincidentally sounds enough like “evidentiary” to trip the human brain’s speech analysis functions. However, as is the case with all paranormal claims, one can never prove the negative (we can’t prove completely that nothing paranormal occurred in this tape). You’re welcome to believe whatever you’d like on the subject.

Regardless, next time you go out ghost-hunting, you might want to grab a copy of Praat. It can never hurt, and at the very least, Praat can help you find some phantom formants in the background noise. It might not sound scary to you, but in the middle of a research project, they can be downright terrifying.

