08 May 2015

What can the Turing Test Tell Us?

Alan Turing's contribution to mathematics, cryptography and computer science was inestimable. Not only did he shorten World War Two, saving thousands of lives, he advanced us onto the path of digital computers. His suicide after being coerced into hormone treatment is a massive blot on the intellectual landscape in Britain. It is an enduring source of shame. Turing's work remained classified for decades because of the fear that war might break out again and knowing how to break the complex codes used by the Germans was too valuable an advantage to throw away. Nowadays, cryptography has advanced to the point where keeping Turing's work a secret no longer confers much advantage.

Turing was prescient in many ways. Not only did he set the paradigm for how digital computers work, but he understood that one day such machines might become so sophisticated that they were indistinguishable from intelligent beings. He was the first person to consider artificial intelligence (AI). Thinking about AI led him to construct one of the most famous thought experiments ever proposed. The Turing Test is not only a way to distinguish intelligence, it is actually a way of thinking about intelligence without getting bogged down in the details of how intelligence works. For Turing and many of us, the argument is that if a machine can communicate in a way that is indistinguishable from a human being, then we must assume that it is intelligent, however it achieves this. It's a pragmatic definition of intelligence and one that leads to a practical threshold, beyond which all AI researchers wish to pass.

However, underpinning the test are some assumptions about communication, language, and intelligence that I wish to examine. The first is that all human beings seem to be considered good judges for the Turing Test. I think a good case can be made for considering this a false assumption. The second is the assumption that mere word use is how we define not only intelligence, but language. Both of these are demonstrably false. If the assumptions the test is built on are false, then we need to rethink what the test is measuring, and whether we still feel this is a sufficient measure of intelligence.

Turing Judges.

The idea of the Turing Test is that a person sits at a teletype machine that prints texts and allows the operator to type text. The human and the test subject sit in different rooms and use the teletype machines to communicate. A machine can be said to pass the Turing Test if a human operator of the teletype cannot tell that the subject is not human. This puts word use at the forefront of Turing's definition of what it means to be intelligent. 

Human beings' use of language is indeed one of our defining features. Animals use faculties that hint at a proto-language facility. No animal uses language in the sense that we do. At best animals show one or two of the target properties that define language. They might for example have several grunts that indicate objects (often types of predator), but no syntax or grammar. There has been significant interest in programs that sought to teach apes to use language either as symbols or gestures. But most of this research has been discredited. Koko the gorilla was supposedly one of the most sophisticated language users, but her "language" in fact consisted of rapidly cycling through the repertoire of signs, with the handler picking the signs that made most sense to them. In other experiments subtle cues from handlers told the animals what signs to use. More rigorous experiments show that chimps can understand some language, particularly nouns, but then so can grey parrots, some dogs, and other animals. Crucially they don't use language to communicate. In fact a far more impressive demonstration of intelligence is the ability of crows to improvise tools to retrieve food, or the coordinated pack hunting of aquatic mammals like orca and dolphins. So animals do not use language, but are none the less intelligent.

Humans are all at different levels when it comes to language use. Some of us are extraordinarily gifted with language and others struggle with the basics. The distinctions are magnified when we restrict language to just written words. This restriction alone is doubtful. Language as written language, even if used for a dialogue, is only a small part of what language use consists of. A great deal of what we communicate in language is conveyed by tone of voice, facial expression, hand gestures, or body posture. Those people who can use written language well are rare. So a Turing judge is not simply distinguishing a machine from a human, but is placing a machine on a scale that includes novelists and football hooligans. What happens when the subject responds to any question by chanting "Oi, oi, oi, Come on you reds!"? Intelligence, particularly as measured by word use, is not a simple proposition.

The Turing Test using text alone would be more interesting if we could define in advance what elements would convince us that the generator of the text was human. To the best of my knowledge this has never been achieved. We don't know what criteria constitute a valid or successful test. We just assume that any generic human being is a good judge. There's no reason to believe that this is true. As I've mentioned many times now, individuals are actually quite poor at solo reasoning tasks (See An Argumentative Theory of Reason). Reason does not work the way we thought it did. Mercier & Sperber have argued that at least one of the many fallacies that we almost inevitably fall prey to, confirmation bias, is a feature of reason, rather than a bug. M&S argue that this is because reason evolved to help small groups make decisions and those who make proposals think and argue differently to those who critique them. On this account, any given individual would most likely be a poor Turing judge.

Human beings evolved to use language. Almost without exception, we all use it without giving it much thought. Certain disorders or diseases may prevent language use, but these stand out against the background of general language use: from the Amazon jungles to the African veldt, humans speak. The likelihood is that we've been using language for tens of thousands of years (See When Did Language Evolve?). But writing is another story. Writing is unusual amongst the world's languages, in that only a minority of living languages are written, or were before contact with Europe. Writing was absent from the Americas, from the Pacific, from Australia and New Guinea. The last two have hundreds of languages each. Unlike speaking, writing is something that we learn with difficulty. No child spontaneously begins to communicate in writing. Writing co-opts skills evolved for other purposes. And as a consequence our ability to use writing to express ourselves is extremely variable. Most people are not very good at it. Those who are, are usually celebrated as extraordinary individuals. Writers and their oeuvre are very important in literary cultures.

So to choose writing as the medium of a test for intelligence is an extremely doubtful choice. We don't expect intelligent human beings to be good at writing. Many highly intelligent people are lousy writers. We don't even expect people who are gifted speakers to be good at writing, which is why politicians do not write their own speeches! Writing is not a representative skill. Indeed it masks our inherent verbal skill.

In fact it might be better to use another skill altogether, i.e. tool making. A crow can modify found objects (specifically bending wire into a hook) to retrieve food items. Another important manifestation of intelligence is the ability to work in groups. Some orca, for example, coordinate their movements to create a bow-wave that can knock a seal off an ice floe. This is a feat that involves considerable ability at abstract thought, and they pass this acquired knowledge on to their offspring. The ability to fashion a tool or coordinate actions to achieve a goal are at least as interesting as manifestations of intelligence as language is.

Language and Recognition.

My landlady talks to her cats as though they understand her. She has one-sided conversations with them, explaining at length when their behaviour causes her discomfort, as though they might understand and desist (they never do). She's not peculiar in this. Many people feel their pets are intelligent and can understand them even if they cannot speak. Why is this? Well, at least in part, it's because we recognise certain elements of posture in animals corresponding to emotions. The basic emotions are not so different in our pets that we cannot accurately understand their disposition: happy, content, excited, tired, frightened, angry, desirous. With a little study we can even pick up nuances. A dog that barks with ears pinned back is saying something different to one that has its ears forward. A wagging tail or a purr can be a different signal depending on circumstances. A lot of it has to do with displays of and reception of affection.

Intelligence is not simply about words or language. Depending on our expectations the ability to follow instructions (dogs) or the ability to ignore instructions (cats) can be judged intelligent. The phrase emotional intelligence is now something of a cliché, but it tells us something very important about what intelligence is. A dog that responds to facial expressions, to posture and tone of voice is displaying intelligence of the kind that has a great deal of value to us. Some people value relationships with animals precisely because the communication is stuck at this level. A dog does not try to deceive or communicate in confusingly abstract terms. An animal broadcasts its own disposition ("emotions") without filtering and it responds directly to human dispositions. Many people would say that this type of relationship is more honest.

There's a terrible, but morbidly fascinating, neurological condition called Capgras Syndrome. In this condition a person can recognise the physical features of humans, but their ability to connect those features with emotions is compromised. Usually when one sees a familiar face there is an accompanying emotion that tells us what our relationship with the person is. If we feel disgust or anger on recognition, then we know them to be enemies, perhaps dangerous and we act to avoid or perhaps confront them. If the emotion is joy or love then we know it's a friend or loved one. In Capgras the emotional resonance is absent. With loved ones the absence of that emotion is so strange that the most plausible explanation often seems to be that these are mere replicas of loved ones, or lookalikes. The lack of emotion in response to a known face can be incapacitating in the sense of disrupting every existing relationship. In the classic novel, The Echo Maker, by Richard Powers, the man with Capgras is able to recognise and respond to his sister's voice on the telephone, but does not feel anything when he sees her. The same is true for his home and even his dog. The only way he can explain it is that they are all substitutes cleverly recreated to fool him. Only he isn't "fooled" which creates a nightmarish situation for him. 

The problem, then, with the Turing Test is that it is rooted in the old Victorian conceit about reason being our highest faculty. Reason was, until quite recently, considered to float above the mere bodily processes of emotion. In other words it was very much caught up in Cartesian mind/body dualism and the metaphors associated with matter and spirit (See Metaphors and Materialism). Reason is associated, by default, with spirit, since it seems to be distinct from emotion. We now know that nothing could be further from the truth. Cut off from emotions our minds cannot function properly. We cannot make decisions, cannot assess information, and cannot take responsibility for our actions. The Turing test assumes that intelligence is an abstract quality, separable from the body. But these assumptions are demonstrably false.

What Kind of Intelligence?

I've already pointed out that language is more than words. I've expanded the idea of language to include the prosody, gesture and posture associated with the words (which as we know shapes the meaning of the words). An ironic eyebrow lift can make words mean something quite different than their face value. The ability to use and detect irony depends on non-verbal cues. This is why, for example, irony seldom works on Twitter. Text tends to be taken on face value, and attempts at irony simply cause misunderstanding. This is true in all text based media. In the absence of emotional cues we are forced to try to interpolate the disposition of the interlocutor. Getting a computer to work with irony would be an interesting test of intelligence!

Indeed trying to assess the internal disposition of the hidden interlocutor is a key aspect of the Turing Test. Faced with a Turing Test subject I suspect that most of us would ask questions designed to evoke emotional responses. This is because we intuit that what makes us human is not the words we use, but the feelings we communicate. Someone who acts without remorse is routinely referred to as "inhuman". In most cases humans are not good at making empathetic connections using text, which is why text-based online forums seem to be populated with borderline, if not outright, sociopaths. It's the medium, not the message. Personally I find that doing a lot of online communication produces a profound sense of alienation and brings out my underlying psycho-pathology. Writing an essay however is a far more productive exercise than trying to dialogue in text. Even the telephone, with its limited frequency range, is better for communicating, because tone of voice and inflection communicate enough to establish an empathetic connection.

So if a computer can play chess better than a human being (albeit with considerable help from a team of programmers) then that is impressive, but not intelligent. The computer plays well because it does not feel anything, does not have to respond to its environment (internal or external), and does not have any sense of having won or lost. It has nothing for us to relate to. Similarly, even if a computer ever managed to use language with any kind of facility, i.e. if it could form grammatically and idiomatically correct sentences, it would probably still seem inhuman because it would not share our concerns and values. It would not empathise with us, nor us with it. 

I suppose that in the long run a computer might be able to simulate both language and an interest in our values so that in text form it might fool a human being. But would this constitute intelligence? I think not. A friendly dog would be more intelligent by far. Which is not to say that such a computer would not be a powerful tool. But we'd be better off using it to predict the weather or model a genome than trying to simulate what any of us, or any dog, can do effortlessly.

An argument against this point of view is that our minds are tuned to over-estimate intelligence or emotions in objects we see. So we see faces in clouds and agency in inanimate objects. So an approximation of intelligence would not have to be all that sophisticated to stimulate the emotions in us that would make us judge it intelligent. For example, in movies robots are often given a minimal ability to emote in order to make them sympathetic characters. The robot, Number five, in the film Short Circuit has "eyebrows" and an emotionally expressive voice and this is enough for us to empathise with it. So perhaps we will be easily fooled into believing in machine intelligence. But this means that simulation of intelligence is insufficiently impressive because people are easily fooled.

This point is brilliantly made in the movie Blade Runner. The Voight-Kampff test is designed to distinguish "replicants" from humans based on subtle differences in emotional responses. The replicants are otherwise indistinguishable from humans. The test of Rachael is particularly difficult because she has been raised to believe she is human (the logic of the movie breaks down to some extent because we do not learn why Deckard persists in asking 100 questions if Rachael is answering satisfactorily). Ridley Scott has muddied the waters further by suggesting that the blade runner, Deckard, is himself a replicant, though based on the original story and the context of the film this seems an unlikely twist.

So there are two major problems here: what makes a good Turing test; and who makes a good Turing judge. The whole set up seems under-defined and poorly thought out at present. My impression is that passing the Turing test as it is usually specified is a trivial matter that would tell us nothing about artificial intelligence or humanity that we do not already know. 


It seems to me that we have many reasons to rethink the Turing Test. It seems to be rooted in a series of assumptions that are untenable in light of contemporary knowledge. As a test for intelligence the Turing Test no longer seems reasonable. On one hand the way that it defines intelligence is far too limited. The definition of intelligence it uses is rooted in Cartesian Dualism which sees intelligence as an abstract quality, not rooted in physicality, not embodied. And this is simply false. Emotions, as felt in the body, for example, play a key role in how we process information and make decisions.

As much as anything, our decision on whether or not an entity is intelligent will be based on how we feel about it, how interacting with it feels to us. We will compare the feeling of interacting with the unknown entity to how it feels to interact with an intelligent being. And until it feels right we will not judge that entity intelligent.

In Turing's day we simply did not understand how decision making worked. We still thought of abstract reasoning as a detachable mental function unrelated to being embodied. We still saw reason as the antithesis of emotion. Now we know that emotion is an indivisible part of the process. We must now consider that reason itself may not have evolved for seeking truth, but merely for optimising decision making in small groups. At the very least, the lone teletype operator needs to be replaced with a group of people; and mere words must be replaced by tasks that involve creativity and cooperation. A machine ought to show the ability to cooperate with a human being to achieve a shared goal before being judged "intelligent". The idea that we can judge intelligence at arm's length, rationally, dispassionately has little interest or value any more. We judge intelligence through interaction, physical interaction as much as anything.

As George Lakoff and his colleagues have shown, abstract thought is rooted in metaphors deriving from how we physically interact with the world. Our intelligence is embodied and the idea of disembodied intelligence is no longer tenable. As interesting as the idea may appear, there is no ghost in the machine that can be extracted or instantiated and maintained apart from the body. Any attempts to create disembodied intelligence will only result in a simulacrum, not in intelligence that we can recognise as such.

Buddhists will often smugly claim this as their own insight, though most Buddhists I know are crypto-dualists (most believe in life after death and karma for example). I've argued at length that the Buddha's insight was into the nature of experience and that he avoided drawing ontological conclusions. Thus, although we read the texts as being a critique of doctrines involving souls, the methods of Buddhism were always different from the methods of Brahmanism. The Brahmins sought to experience the ātman as a reality, and from the Upaniṣadic description ātman could be experienced as a sense of oneness or connection with everything in the world (oceanic boundary loss). Buddhists deconstructed experience itself to show that nothing in experience persisted and that therefore, even if there was a soul we must either always experience it, or it could never be experienced, and since we start off not experiencing it, no permanent soul can ever be experienced (which is not a comment on whether or not such a soul exists!). Therefore the experiences of the Brahmins are of something other than ātman. Only after Buddhists had started down the road of misguided ontological speculation did this become an opinion about the existence of a soul. So the superficial similarities between ancient Buddhist and modern scientific views are an accident of a philosophical wrong turn on the part of Buddhists. They got it partly right by accident, which is not really worth being smug over.

History shows that we must proceed with real caution here. Our Western views on intelligence have been subject to extreme bias in the past and this has led to some horrific consequences for those people who failed our tests for completely bogus reasons. We must constantly subject our views on intelligence to the most rigorous criticism and scepticism we are capable of. Our mistakes in this field ought to haunt us and make us extremely uncomfortable. This is yet another reason why tests for intelligence ought to require more interactivity. If we do create intelligence we need to know we can get along with it, and it with us. And we know that we have a poor record on this score.

The Turing Test seems not to have been updated to take account of what we know about ourselves nowadays. The test itself is anachronistic. The method is faulty, because it is based on a faulty understanding of intelligence and decision making. We are not even asking the correct question about intelligence. With all due respect to Alan Turing, he was a man of his time, a glorious pioneer, but we've moved on since he came up with this idea and it's had its day.


See also: Why Artificial Intelligences Will Never Be Like Us and Aliens Will Be Just Like Us. (27 June 2014)