Lawrence Abu Hamdan
Forensic Listening and the Reorganisation of the Speaking-subject*
Lawrence Abu Hamdan chronicles how the State utilises the pseudo-science of forensic listening to preserve juridical and territorial power.
In “Mengele’s Skull,” Thomas Keenan and Eyal Weizman suggest that, unlike the seminal 1961 trial of Adolf Eichmann in Jerusalem, which was archetypal of an era defined by eyewitness testimony, in the mid-eighties international justice became a stage for a different type of narrative: “a second narrative, not the story of the witness but that of the thing in the context of war crimes investigation and human rights.”1 The authors claim that what catalysed this new era into existence was the exhumed remains of the German SS officer and Nazi physician Josef Mengele.
One year before the forensic examination of Mengele’s remains, a piece of legislation was passed in British criminal law which unknowingly also marked a crucial and forensic shift in the conventions of testimony. The 1984 Police and Criminal Evidence Act (PACE) ordered all police interview rooms to be equipped with audio recording machines, so that all interrogations from then on would be audio-recorded instead of transcribed into text. The passing of this law unintentionally catalysed the birth of a radical form of listening that would over the next twenty-eight years transform the speaking-subject in the process of law. This legislation fundamentally stretched the role of the juridical ear from simply hearing words spoken aloud to actively listening to the process of speaking, as a new form of forensic evidence. This essay is dedicated to understanding the type of listening that this moment in 1984 inaugurated; I seek to amplify both its origins and its role in the contemporary juridical and political forums, in which we see the fragile balance of fundamental human and civil rights predicated on listening and the voice tipping into an uncertain future which calls into question the very means through which we can negotiate politics and the law.
Code E of PACE was seen as a solution to claims that the police were falsifying confessions and altering statements made during interviews, as prior to this point all statements were simply written down “verbatim” by the police officers and then signed off by the suspect.
Were it not for a handful of linguists practicing a rare strand of forensic phonetic analysis, PACE would have remained a simple and transparent article of legal reform. Instead, the act exponentially increased the use of speaker profiling, voice identification, and voice prints in order to, among other things, determine regional and ethnic identity as well as to facilitate so-called voice line-ups.
Prior to PACE, if it was suspected that someone’s voice was on an incriminating recording—for example a bugged telephone conversation, or a CCTV surveillance tape—that person was asked to come to the police station and give a voluntary voice sample. After PACE, doing so was no longer voluntary, and all such recordings were added to a growing audio archive of cassette tapes. This archive was soon being accessed by practitioners of the little-known scientific field of forensic linguistics, and this unexpected convergence added the voice as a new medium through which to conduct legal investigations. Soon the forensic listener was required not only to identify voices, but to investigate background sounds in order to determine where, with what machine, and at what time of day a recording had been made—thus enabling a wide range of sonic frequencies to testify.
Legislation similar to PACE was adopted by many other countries in the mid-1980s, resulting in the permanent installation of audio recording machines in police interview rooms around the world. As in Britain, these policies resulted in the establishment of independent forensic audio labs, and today there are even postgraduate university programs devoted to the field.
The advent of PACE is representative of an epistemic and technological shift which gave rise to new forms of testimony based on the analysis of objects rather than witness accounts. In the case of forensic listening there is no clean shift from witness account to the expert analysis of objects because the witness account and the object under investigation become the same thing. The voice is at once the means of testimony and the object of forensic analysis.
JP French Associates, the UK’s most prominent independent forensic audio laboratory, has worked on over 5,000 cases since 1984. Its founder, Peter French, told me in reference to PACE that “whereas up to that point […] I had a trickle of work coming in, all of a sudden it was as though there had been a thunderstorm and it started raining cassette tapes2.”3 However, this overnight transformation of the voice into a legal object of investigation must be seen in the greater context of the role of the voice in law at large. Would this thunderstorm have happened if the voice were not already such a complex article of evidence central to the formation, mediation, and practice of the law? The PACE legislation formalized a regime of listening that was always present within law: the introduction of audio recording machines in police interview rooms drew upon, brought to the surface, and professionalized a way of listening to the voice specific to political and legal forums.
For the law to acquire its performative might, it must be delegated to the voice. For the law to come into effect it must be announced and it must be heard. As a site where speech acts, the trial allows us to understand how the voice serves to activate certain forms of governance and control, and the ways in which we are heard.
In the United States Supreme Court there is a vocal tradition that I find quite revealing: when the clerk enters the courtroom at the beginning of the day they inaugurate the proceedings by striking the gavel onto the woodblock then waiting for silence, before announcing, “the Honourable, the Chief Justice, and the Associate Justices of the Supreme Court of the United States”—and then, for four seconds, they interrupt their own speech and sing out “OYEZ OYEZ OYEZ”—before returning to the declaration that the court is now sitting and that God is now blessing the honourable court. Then with a second strike of the gavel the clerk sits down.
These announcements, in combination with other oaths and speech acts, function as a juridical amplifier, the switch that makes legally inaudible speech audible. These acts operate through the voice in order to transform words from the normal conditions of communication to the extraordinary conditions of testimony. And yet something more than the speaking of words is found in the clerk’s call. In those four seconds when their annunciation shifts from a prescribed set of spoken words to the ineffability of non-verbal sounds—“OYEZ OYEZ OYEZ”—we see that it is not simply language that legislates but also the extra-linguistic elements of the voice itself.
The legal action habeas corpus offers us some insight into the use of the voice as both a verbal and a non-verbal instrument. This ancient writ, which translates to “may you have the body,” stipulates that a person under arrest must be physically brought before a judge. The judge must see and hear the suspect live. The voice is a corporeal product that contains its own excess, with this corporeal excess announcing to the court the absolute presence of the witness. This bodily excess of the voice resides not in its linguistic functions, but in its non-verbal affects, such as its pitch, accent, glottal stops, intonations, inflections, and impediments. As byproducts of the event of language, these affects reveal other kinds of evidence, evidence that may evade the written documentation of legal proceedings but does not escape the ears of the judge and of those listening to a trial in the space of the courtroom.
These paralinguistic elements of testimony produce a division of the voice, which in turn establishes two witnesses within one voice: one witness speaks on behalf of language and the other on behalf of the body. Often the testimony provided by each of these two witnesses is corroborated by the other, but they can also betray one another—an internal betrayal between language and body, between subject and object, fiction and fact, truth and lie. This betrayal exists in a single human utterance in which the self gives itself away. This splitting of the voice into two selves, or into two witnesses, can also be seen as an extension of the well-established legal principle of “testis unus, testis nullus”, which translates to “one witness, no witness,” and which means that testimony provided by any one person in court is to be disregarded unless corroborated by the testimony of at least one other. The law, it seems, requires a certain doubling of testimony, and this doubling even extends to the single witness. In the eyes of the law, the testimony of the single witness, whether the suspect or the survivor, has to be split into language and its bodily conduit for it to be considered testimony at all.
This doubling of testimony marks the terrain which became occupied by forensic linguists and acousticians within the field of law after 1984. In the cases of forensic listening these professional listeners became the expert witnesses speaking on behalf of the paralinguistic attributes of a person’s testimony. After 1984 these were the people called in to corroborate and resolve the inherent division of the legal voice, formalizing an acoustic practice inherent to jurisprudence.
The audio cassette recorders at the centre of the PACE policy show how technology is also inextricably linked to what I claim is an historical audio event. The invention of the stethoscope by René Laennec in 1816 formally inaugurated the practice of auscultation (listening to the inner sounds of the body).4 The stethoscope communicates medicine as a terrain of care and a space where the concerns of the patient can be heard. It symbolizes the human communication between doctor and patient. Yet its material legacy is quite different. What the stethoscope actually did was to allow the doctor to bypass the subjective testimony of patients and instead communicate directly with their bodies. Understanding how to interpret sounds from hearts, stomachs, and lungs meant that the doctor could communicate with the objective truth of the body, as this emerging acoustic lexicon was thought of as a collection of voices which, unlike the speech of the patient, didn’t lie. The stethoscope shifted the medical ear from listening to the patient’s self-diagnosis to listening to the sounds of the body.
Like forensic listening, the stethoscope pits the subject against itself as simultaneous testimonies can be emitted from the body and from the speaking voice. In auscultation there exists a very literal example of this doubling of the voice. While the doctor listens to the lungs with a stethoscope, the patient is asked to say the letter “e”. If the lungs are clear, the doctor will hear the spoken “e” as an “ee”. Conversely, if the lungs contain fluid or a tumour, the patient’s spoken “e” will sound like a phonetic “a” (“ay”). The “e” sound gets transmuted to an “a” sound through the body. This “e” to “a” transmutation shows us the ways in which the voice becomes doubled in the medical ear and how one voice can produce multiple accounts of itself. The example becomes increasingly literal if we examine the name for this auditory event, egophony.5 Literally, ego means “the self” and phone means “speech sound.” Yet this self-identifying speech-sound (ego-phony) could also be understood as ego-phony, “the fraudulent self.” And when we combine all these definitions we arrive at a name for a form of listening that almost perfectly describes the intentions of auscultation, i.e. detecting a fraudulent (phony) speech-sound (phone) which betrays the self (ego).
The paradox of the stethoscope is that it simultaneously produces an objective distance from the patient and a deeper proximity to their body. As a non-electronic device it simply connects a material path through which vibrations can be channelled from the inner body of the patient directly to the eardrums of the doctor. This distanced yet deep material form of human contact is also characteristic of forensic listening, whereby one listens not to the semantics of language but to the molecular constitution of individual phonemes. This shared practice of listening which re-orientates subject into object reveals a direct lineage from auscultation to forensic phonetics. Auscultation offers the law, as it offered medical practice, the promise of amplifying the objective aspects of an otherwise deeply subjective account of an event. Yet in such cases one can adequately listen to only one aspect of the voice at a time; the qualities of the voice as object mute the subjective and semantic enunciations or vice versa. The shift from one form of listening to another can happen insidiously and invisibly, and yet its political impact and effect on the listened-to populace can be radical.
During my 2010 interview with the forensic linguist Peter French he told me: “Last week, a colleague and I spent three working days listening to one word from a police interview tape.”6 This exemplified French’s radical approach to both listening and the theoretical paradigms that surround sound production. Unlike many sound theorists who focus on sound’s ephemeral and immaterial qualities, French’s approach is markedly material. The contemporary dominant school of audio culture is heavily influenced by Don Ihde’s 1976 text Listening and Voice: A Phenomenology of Sound, which puts forward the impossibility of fundamentally grasping sound.7 French’s formulation, however, renders sound dissectible, replicable, physical and corporeal in its qualities as object. What allows French’s radical approach to sound is the forensic intensity with which he listens, which allows the audio object to reveal a large amount of information as to its production and its form: the space in which it was recorded, the machine that recorded it, the geographical origin of the accent, as well as details of the age, health, and ethnicity of a voice.
Yet as with all cases of legal, social, and ethnic profiling, French walks a thin ethical line. Ironically, what allows French to maintain his credibility in a time in which law enforcement increasingly reaches out to forensic linguistics in odious forms of surveillance and profiling that target huge swathes of the population, is his ability to listen better. French understands the limits of what can be detected through the voice and therefore avoids exploiting the law’s generally increasing demand for the empty promises of forensic science and its ignorance regarding their practical capacity.
Right now forensic listening is being applied more than ever before. Its application is primarily on two fronts: speaker profiling of asylum seekers and developing voice-activated algorithms for the security industry. Today it is applied on such a scale that law enforcement agencies and security services often cannot afford the expert listening of people like Dr. French. Hence, frighteningly, we are entering a time in which there is both an over-capacity demand for the governance of the voice, and an inadequacy of authentic means of producing such a governance. In other words, we have now entered a sorry phase where bad listening (and therefore bad evidence) is flooding the forum.
It is not simply governance of the voice that has been made more pervasive but also the employment of these modes of listening in the control of territory and the production of space. Their use as agents of spatial control is made clear if we take a closer look at legal terminology and practice, in order to see how forensic listening becomes a technically instantiated and formalized process of fundamental legal concepts. If we divide the term “jurisdiction,” which connotes a territorial range over which a legal authority extends, we see that “juris” refers to a legal authority or right and “diction” refers to speech. “Diction” in linguistics is also defined as the manner of enunciating and uttering sounds and words, indicating not simply speech but the process of enunciation and amplification of words. By understanding the etymology of the term jurisdiction, we see that the law itself operates as a speech-space in which those within its range of audibility are subject to its authority. As a fundamental principle of legal governance jurisdiction reveals to us the power of sound in the construction of the space and time of the law. Much like the radio in the workplace, the audio medium affords the law a means of controlling space and interpellating its subjects while remaining predominantly out of sight.
By 2003, the United States and the United Kingdom were entrenched on two fronts in the war on terror. These wars forced mass migrations that became the catalyst for immigration authorities around the world to turn to forensic speech analysis to assess whether the accents of asylum seekers correlated with their claimed national origins, and thereby to judge the legitimacy of their asylum claims. On a scale similar to the 1984 PACE Act, this produced a huge proliferation of forensic listening, this time employed to help determine the validity of asylum claims made by thousands of people without identity documents, particularly in Australia, Belgium, Germany, the Netherlands, New Zealand, Sweden, Switzerland, and the United Kingdom.
In most of the countries listed above the protocol is as follows: a telephone interview is organized between the asylum seeker and Sprakab, a private company based in Sweden and run by forensic phoneticians. Using anonymised analysts (who, many claim, are actually former refugees with no linguistic training), the claimant’s voice is elicited, recorded, and analysed, and subsequently a report is produced and given to the immigration authorities. The confidence in, and the rapidly increasing predominance of, this kind of investigation within immigration law is troubling, given that its accuracy has been called into question by many forensic linguists, phoneticians, and other practitioners around the world8. One of their criticisms is that citizenship is a bureaucratic distinction and that the voice is a socially and culturally produced artifact that cannot be tidily assimilated into the nation-state.
In undertaking extensive research into this politically potent form of listening I heard many shocking accounts of vocal discrimination and wrongful deportations—none more so than that of Mohamed, a Palestinian asylum seeker who, after having the immigration authorities lose his Palestinian identity card, was forced to undergo an accent analysis to prove his origins. Subsequently he was told he was lying about his identity because of the way he pronounced the word for tomato. Instead of “bandora” he said “banadora.” This tiny “a” syllable is the sound that provides the UK Border Agency with the apparent certainty of Mohamed’s origin in Syria, a country only 22 kilometres away from his hometown of Jenin in Palestine. Therefore, in designating this syllable as a marker of Syrian nationality, the Border Agency implies that this vowel, used in the word tomato, is coterminous with Syria’s borders. The fact that this syllable designates citizenship above an identity card that contradicts it forces us to rethink how borders are being made perceptible and how configurations of vowels and consonants are made legally accountable.
Locating this Syrian vowel in the speech of a Palestinian surely proves nothing more than the displacement of the Palestinians themselves. In other words, the instability of an accent, its borrowed and hybridized phonetic form, is testament not to someone’s origins but only to an unstable and migratory lifestyle, which is of course common among those fleeing from conflict and seeking asylum, often spending years getting to the target country and living in diversely populated camps along the way. Moreover, it should be remembered that in such camps one may want to conceal the origin of one’s voice because of the continual fear of persecution.
When calling for ways in which to implement better practice in cases of language analysis for the determination of origin (LADO) of undocumented and illegal migrants, forensic linguist Helen Fraser says that we “need to clearly separate linguistic data from potentially biasing background on the applicant’s ‘story’.”9 Clearly in this expression of objectivity we see how linguists want to auscultate the accent and go beyond the potentially traumatic and pathetic “story” of a person’s flight, preferring to find in their speech another type of testimony. However, for adept forensic listeners this accent object (linguistic data) should also be heard as a “story” in itself, one that could reveal an account just as traumatic. For listeners who are not content with drawing a border around a single phonetic article, the accent should be understood as a biography of migration, as an irregular and itinerant concoction of contagiously accumulated voices, rather than an immediately distinguishable sound that avows its unshakable roots neatly within the confines of a nation state. In the clear distinction between biographical data and linguistic data, we see how this policy is used as a practice which does not seek to excavate the life of an accent, only the virtual impossibility of locating its place of birth.
Like all practices of auscultation, the forensic analysts can be understood as operating in the excess of the speaker. In the case of Mohamed, his rejected status is owed to an interviewer who Mohamed claims was an Iraqi Kurd and whose Arabic dialect was so different to his that he had to shift his way of speaking simply to be understood and to understand. Listening is never simply a passive, objective and receptive process, but rather an act that plays a fundamental role in the construction and facilitation of the speech of the interlocutor (whether subject or object). Therefore what becomes amplified in such investigations is not the true identity of the sonic object under investigation but the political potency of the listening itself and the agency of the listener. The results of this forensic listening tell us little about Mohamed’s accent but a great deal about the contemporary political context in which this audio investigation participates.
In the form of listening that is presented in the case of Mohamed the forensic listening paradox is perfectly performed: in an attempt to hear objectively, the listener’s own subjectivity emerges and is made distinctly audible. This then allows one to ask the question: as an inter-subjective process can listening ever be objective? Will listening always be tainted by the subjectivity of that which listens? In attempting to answer these questions we quickly reach the fundamental paradox and the empty promise of forensic listening. Perhaps the only way to detach oneself from any given situation is to listen, as Dr. French does, to a single syllable for three days; until the sound becomes completely abstracted from humanity and the culturally pre-programmed prejudice of the ear.
The Right to Silence
In attempting to establish a correlation between voice and citizenship we encounter another vocal legal paradox. In criminal charges against a citizen of the United Kingdom, the accused is afforded the right to protection from self-incrimination, commonly known as the right to silence.10 This is a fundamental legal right not to speak if you feel that your speech would in some way incriminate you. With speech profiling becoming a more and more widespread form of investigation, it is not only our words that can incriminate us but the phonological content of our voices as well. Just as our speech is being mutated by the legal system, we must fight to rephrase the legal diction so that the ways in which our voices are placed under custody and investigated remain transparent.
My proposal for altering the way the law speaks to us entails changes from the moment of one’s arrest onwards, and therefore entails amending the right to silence. In the United Kingdom, the revised version might read:
You do not have to say anything. But it may harm your defence if you do not mention when questioned something which you later rely on in court. Anything you do say, [including the way you say it] may be given in evidence against you.
This fundamental legal right is only afforded to the citizen; the asylum seeker, for example, has no recourse to silence, as the burden of proof lies not with the prosecutor in such cases but with the claimant themselves: in other words, if they don’t speak they will be deported. Without the right to silence, the asylum seeker is forced to speak to the law; they must make themselves audible to the system and yet they remain without control over the conditions of how they are being heard. What they do retain, however, is the human right to freedom of expression and it is my argument that this policy of listening contravenes this fundamental right.
These forensic speech analyses force us to redefine our right to freedom of speech, a concept that must now be extended to encompass not only the words we speak, but also the sonic quality of our speech itself. The voice has long been understood as the very means by which one can secure and advocate one’s political and legal interests, but these recent shifts in the way the law listens affirm that the stakes and conditions of speech have altered in a non-transparent way. This seemingly minute shift can have a dramatic impact on people’s lives. The more radical the practices of listening at the core of legal investigations become, the more they herald the advent of a moment to redefine and reshape the political conventions of speech and sound in society. It seems that the battle for free speech is no longer about fighting to speak freely, but fighting the control over the very conditions under which we are being heard.
The Whole Truth
The latest development in forensic linguistics is the product of the combined labour of mathematicians and speech-scientists to produce computer algorithms that allow users to automatically profile voices for a variety of different applications. The most prominent of these applications is “voice stress analysis,” the premise of which is that, through a frequency analysis, the physiological conditions of stress are made audible by the non-verbal elements of a voice. This technology is said to be able to determine all sorts of psychological verdicts based on jittering frequencies, glottal tension and vocal intensity, all regardless of language.
At Delft University in Holland a team of linguists and computer scientists are developing a kind of “trauma-ometer” application for emergency calls whereby the algorithmic listening software would determine the priority of a call depending on the level of stress detected in the caller’s voice. The idea behind this is that the tension of the vocal cords produces “jitter,” which in linguistics relates to fluctuations in pitch, and that the level of stress a person is undergoing can be observed in the intensity at which these minute fluctuations occur. Therefore the scale of the emergency is legible as affect on the body that witnessed it. Regardless of what is being said, the first response to the event will then be a response to the body of its witness. In building a hierarchy of trauma this machine also produces a chain of command that situates the paralinguistic aspects of the voice as an authority over the words that the caller wishes to relay. The stress the body undergoes here is considered the objective truth of the event; yet in my next example these same physiological attributes are taken to reveal the opposite—a lie.
A piece of software called Layered Voice Analysis 6.50 (LVA 6.50), developed by Israeli company Nemesysco Ltd, is the major application of this new form of forensic voice profiling; it is currently employed as a lie detection method by the Los Angeles Police Department, European, Russian and Israeli governments, and insurance companies all over the world. In the UK, Harrow council and many others are using it to measure the veracity of benefit claims made by disabled citizens11. Lynn Robbins, director of the company Voice Analysis Technologies LLC, the main retailer of the software, told me in an interview that based on analysis of the voice as it resonates through the body, LVA 6.50 can not only determine whether a person is lying, but is able to deliver a whole series of verdicts—detecting, for example, embarrassment, over-emphasis, inaccuracy, voice manipulation, anxiety, and whether or not the interviewee is attempting to outsmart his/her interlocutor; in the future, I was told, it will even be able to hear sex offending tendencies.12
Commander Sid Hale is piloting the same software for the Los Angeles Police Department and explains that: “Unlike the polygraph we don’t need to cooperate with the suspect, we don’t need to wire them up with skin responses or respirators, it does it in real time.” This idea of being able to access the body of the person who is the object of one’s interest without touching it is very attractive to law enforcement agencies, just as it was to doctors who first used the stethoscope in 1816. Reports from that time say that one of the benefits of the stethoscope was that it meant doctors no longer needed to press an ear to the patient’s body, and hence it provided them with a hygienic distance from the potentially diseased patient.
One key, politically sensitive effect of the fact that LVA 6.50 can operate without physical interaction—the voice analysis might be conducted during a telephone conversation, or using a pre-recorded sample—is that testing can be undertaken without the consent or knowledge of the subject.
In the context of borders and prisons, this hygienic distance allows the authorities to access the emotional and bodily content of the non-citizen (e.g. the prisoner or refugee) without needing them to formally enter the society of citizenship. At the border this test can be performed before a person formally enters the country, or even before they leave their country of origin—meaning that LVA 6.50, in making use of the distance of audibility, enables the extension of the border itself. This software simultaneously extends the range of the law’s juris-diction while also designating those who must remain beyond its range of responsibility/audibility, differentiating between those to be afforded the rights of a citizen and those to be denied those rights, and distancing the possibility of claiming refugee status.
Although in the legal context there has never been a need for an ear to be pressed against the suspect’s body, the principle of habeas corpus, as discussed above, requires that the subject be brought physically before the law (e.g. in an interrogation room or courtroom) in order to have a legal hearing. Yet we could easily imagine how LVA 6.50 would eradicate the necessity for the physical presence of the suspect, as it requires only a voice to access the corpus. In this sense, LVA 6.50 short-circuits the process of habeas corpus,13 using an algorithm and a visual interface to give the law access to what a person’s body is “really” saying as they speak, even if that body is thousands of miles away.
Voice stress analysis is not only designed to distance the user from the subject of analysis; it also works to remove or minimize the presence and role of the user (the interrogator, insurance broker, or border guard, etc.). In an interview situation, the visual interface flashes up its verdicts as the interviewee speaks. This machine thus promises to listen on behalf of its operator, reducing or putting into question their interpretative and intuitive capacities. In this sense this technology not only mutes the words of the speaker, but also deafens the listener. And although a direct lineage can be traced from the stethoscope to voice stress analysis technologies, the removal of the necessity for the operator to listen articulates the fundamental break with auscultation as a practice. Unlike the work of forensic listeners like Dr. French, in the microscopic analysis of the frequencies of the human voice LVA 6.50 can hear beyond the range of human audibility and therefore excludes the possibility of building new auditory skills.
Not only does LVA 6.50 listen on behalf of its user, but in registering emotional content this software feels on behalf of its user as well. Using this software, the interviewer no longer needs to be sensitive to the psychological condition of their subject. The machine thus produces apathetic operators who listen neither to words nor tone of voice, and therefore minimizes the extent to which the interviewer dirties themselves with the subjectivity of the interviewee. This machine is so attractive to law enforcers because it recognizes the fundamental flaw of previous modes of forensic listening: that the subjectivity of the speaker is replaced by that of the listener/interpreter/aural investigator. In order to produce the laboratory conditions for justice and a completely objectified realm of listening, law enforcement recognizes that listening must be relegated to the machine. Yet in voice stress analysis there still remains the glitch of the subject contaminating the legal laboratory, as these algorithms first have to be programmed by people who could have bigoted ears and economic agendas. To produce a verdict the algorithm needs to learn the logics of those verdicts: for example, in order for it to profile the voice of a sex offender it first needs someone to teach it the vocal attributes of a sex offender.
In response to the astounding claims of LVA 6.50’s highly sensitive and microscopic listening, a group of speech scientists and mathematicians in the department of phonetics at the University of Stockholm closely examined the product’s technical patent and reverse engineered the software in order to test its scientific credibility. The idea that the machine would work “regardless of language” was taken seriously by the group, who tested the software using only vowel speech sounds and single phonemes. Interested to see how the machine produced its wide range of judgments, the group used the pure object of speech: de-subjectified voices speaking only vowels, without thought or semantics. After months of testing the machine and collecting large amounts of data, they understood that the software’s analysis was operating on the very basic level of amplitude, and found that it simply had to do with a person’s capacity to hold a steady pitch and volume. They also claim that the distinctions between the various verdicts (e.g. between embarrassment and out-smart or excitement and inaccuracy) are arbitrarily placed along this scale. According to their investigation, the claim that the technology functions as a lie detector is bogus; one of the mathematicians working on the reverse engineering project told me that its logic was akin to “a horoscope or a prophecy” in its pseudo-scientific nature.14
LVA 6.50 amplifies the dark phrenology of the voice which is operative today. Regardless of its accuracy, software that uses the voice as a biometric tool deeply confuses the voice’s role as a conduit for language and negotiation. Simply by virtue of the fact that insurance companies, government councils and police departments use the forms of listening offered by LVA 6.50, the software is weaponized, regardless of its credibility amongst the scientific community.
In the sites where speech acts, it is our speech which is under attack. The reorientation of the speaking subjects contained within any given juris-diction, promised (emptily or not) by LVA 6.50 or by LADO (the accent analysis of asylum seekers), is already underway. We arrive at an uncertain future of the voice and a moment to question its very legitimacy as both an object of legal investigation and the means through which the law becomes enacted. The increasing proliferation of these emergent and mutated strands of forensic listening forces us to ask more general questions about the role of the voice as a central legal infrastructure: will it still be a fair and just hearing when nobody is listening?
1 Thomas Keenan and Eyal Weizman, “Mengele’s Skull”, Cabinet, issue 43, Fall 2011, pp.61–67, p.62.
2 Peter French in interview with author, 2010.
4 Laennec’s work to classify the sounds of the body is a major contribution to medical diagnosis and the image of the stethoscope is now a symbol of the medical profession at large.
5 When we consider the contemporary spelling of Egophony.
The image documents voiceprints (voice fingerprints) of two different voices saying the word “you.” The horizontal axis is time and the vertical axis is frequency. The contour lines then illustrate the amplitude of a specific frequency at a specific time in the pronunciation of the word “you.” In the use of cartographic techniques to produce voiceprints we can see clearly the interrelation of the control over both voice and territory. Image source: Ira Freeman, Sound and Ultrasonics.
6 Peter French in interview with author, 2010.
7 Don Ihde, Listening and Voice: A Phenomenology of Sound, Athens, OH: Ohio University Press, 1976. The continuing prevalence of this school of thought is demonstrated in the 2009 book Sounding New Media by Frances Dyson, who states in the introduction: “As Don Ihde pointed out decades ago ‘a sound is always multiple, always heterogeneous, being neither visible or tangible, sound is never quite an object, never a full guarantor of knowledge.’”
8 Diana Eades, “Applied Linguistics and Language Analyses in Asylum Seeker Cases”, Applied Linguistics 26/4, 2005, pp.503–526.
9 Helen Fraser, “The role of linguistics and native speakers in language analysis for the determination of speaker origin. A response to Tina Cambier-Langeveld”, The International Journal of Speech, Language and the Law, 18.1, 2011, pp.121–130.
10 Known as Miranda rights in the United States.
11 Harrow council claims they have saved roughly £330,000 of benefit payouts in the first seven months of using this software. ITN News, February 2010.
12 Robbins, 2012.
13 “Short circuit” understood not in its everyday use to mean an electrical malfunction but in its original sense whereby a current travels along an unintended path, following the route of least resistance.
14 Takanen in interview with author, 2012, Stockholm.