The film The Imitation Game has helped a wider audience understand the genius of Alan Turing and the important role he played in the development of computing and artificial intelligence. Turing is the father of computer science, no less.
But as well as his work on the Turing machine and on breaking Enigma, he is rightly famed for devising the Turing test to determine whether a computer is intelligent. It is a simple idea. Can you tell if you are talking to a computer or a human? If the computer fools you into thinking it's a person, then it must be intelligent.
In 2014, a chatbot was claimed to have passed the Turing test for the first time. But instead of celebrating another AI milestone, attention focused on the test itself. Is it the best test for artificial intelligence? New Scientist ran an article proposing the Lovelace 2.0 test as a better way of determining intelligence. This is named after another British pioneer of computer science, Ada Lovelace.
The Lovelace test works on the basis that creativity requires intelligence, so a computer is asked to create something, such as draw a Dalek surfing the waves at Westward Ho!, or tell a story about why Ken Dodd is at the airport with a suitcase not full of cash. This is repeated with increasingly difficult requests until the computer is judged to have failed. The result is a comparative score rather than a pass/fail.
Both these tests are trying to determine if a computer is intelligent. But defining intelligence is very difficult and experts continue to argue about whether current theories and tests reflect the important characteristics of intelligence.
What both these tests do instead is look for proxy indicators from which you could infer the presence of intelligence. Looking for proxy indicators is theoretically imperfect, but there are other situations where this pragmatic approach to testing works rather well: for example, if it is difficult to measure the characteristic for which you are testing, or if the answer to the proposed test can't be known in advance. Many systems that analyse big data, where the exact dataset or the real-world conditions are not known, fit this bill. In day-to-day use, we can't be certain that the process we have performed has produced the outcome we desired, because we don't know what that outcome should be.
The most popular approach to finding out whether the service provides the right answers is to test predominantly with a known set of data, because if you know the inputs you can work out the expected outputs and so demonstrate the desired behaviour. However, this only provides confidence that the system works with this artificial data, which rarely represents the variety of conditions found in its real-world counterparts.
It is better to test with real data sets and devise tests that either show definitively that the system doesn't work or provide us with a defined level of confidence that the process has worked, although there is a measured chance that it hasn’t. You can’t be 100 per cent certain if you’ve passed, but you know if you’ve failed. This is similar to checking if a number is prime using a primality test based on Fermat’s Little Theorem. It’s possible to be certain that a number is not a prime but not that it definitely is a prime.
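The Fermat analogy can be made concrete with a short sketch. This is a minimal Python implementation of the classic Fermat primality test: a failure is definitive (the number is certainly composite), while a pass only raises our confidence that the number is prime.

```python
import random

def fermat_test(n: int, rounds: int = 20) -> bool:
    """Fermat primality test.

    Returns False if n is *definitely* composite.
    Returns True if n is *probably* prime -- we can never be
    100 per cent certain of a pass, only of a fail.
    """
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False  # even numbers > 2 are definitely composite
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        # Fermat's Little Theorem: if n is prime, a^(n-1) == 1 (mod n).
        # Any a that violates this is a witness that n is composite.
        if pow(a, n - 1, n) != 1:
            return False  # witness found: definitely not prime
    return True  # no witness found: probably prime

print(fermat_test(97))   # prime, so this prints True
print(fermat_test(100))  # even composite, so this prints False
```

One caveat worth knowing: certain rare composites (Carmichael numbers, such as 561) can fool the Fermat test, which is why production code uses the stronger Miller-Rabin test. But the asymmetry is the same: a "fail" verdict is certain, a "pass" verdict is a confidence level.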
We might also make intelligent assumptions and build a model that provides guidelines for what a good enough outcome will look like. Or we may run some trials and extrapolate from the data samples they collect to create expected results. Or we may do both, and correlate success in producing one correct answer with the likelihood of seeing success elsewhere. The extrapolation approach is common and often appropriate in non-functional testing, such as performance testing (always taking care that the extrapolation is not built on a false assumption).
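As a hypothetical illustration of the extrapolation approach, the sketch below fits a straight line to response times measured in a few small load trials and projects an expected result at a higher load. The figures and the linearity assumption are invented for illustration, and the closing comment flags exactly the kind of false assumption the text warns about.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit: y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical trial data: (concurrent users, mean response time in ms)
users = [10, 20, 40, 80]
latency_ms = [110, 122, 145, 190]

a, b = linear_fit(users, latency_ms)

# Extrapolated "expected result" for a load we never actually measured.
expected_at_200 = a + b * 200
print(f"Expected latency at 200 users: {expected_at_200:.0f} ms")

# Caution: this is only valid if the trend really is linear. If the
# system saturates somewhere above 80 users, the extrapolation is
# built on a false assumption and the expected result is worthless.
```

The extrapolated figure then serves as the oracle for a later test: if the measured latency at 200 users is wildly above it, the test has failed; if it matches, we gain confidence, not certainty.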
You don't need to be a genius like Turing or Lovelace to devise such tests, but they do require more thought, skill and technique. Before that puts you off the approach, remember that the cost and effort of deriving the right test is much less than that of executing many tests that create a false sense of confidence.
In the case of artificial intelligence tests, even if the Lovelace test supplants the Turing test, the latter lives on across the internet. Those annoying little security tests that ask you to type in a distorted word or number are called CAPTCHAs: Completely Automated Public Turing test to tell Computers and Humans Apart.