Title:
About the science of AI.
Publish Date:
Abstract:
The science of AI is concerned with the study of intelligent forms of behaviour in computational terms.
But what does it tell us when a good semblance of a behaviour can be achieved using cheap tricks that seem to have little to do with what we intuitively imagine intelligence to be? Are these intuitions wrong, and is intelligence really just a bag of tricks? Or are the philosophers right, and is a behavioural understanding of intelligence simply too weak? I think both of these are wrong. I suggest in the context of question-answering that what matters when it comes to the science of AI is not a good semblance of intelligent behaviour at all, but the behaviour itself, what it depends on, and how it can be achieved. I go on to discuss two major hurdles that I believe will need to be cleared.
Tags:
NLP
Reviewer:
Dennis Kuriakose
Date:
27 August 2024
Review:
Related paper: the Winograd Schema Challenge, from Levesque et al. (2012)
The paper marks a paradigm shift in the evaluation of natural language processing systems, proposed by researchers at the University of Toronto in 2012/2014. It underpins the philosophical development of today's chatbots, also known as LLMs: AI systems that are aware of the world around them and of the conventions and notions that drive our lives forward.
1. Key Question the Paper is Trying to Answer:
The paper seeks to address the inadequacies of current AI evaluation methods, particularly the Turing Test, by asking: How can we more accurately assess true machine intelligence, beyond mere mimicry of human behavior?
2. Different Approaches Taken So Far:
Turing Test: The classic method to evaluate AI by determining if it can imitate human conversation well enough to deceive a human judge. However, this method has been criticized for allowing AI to rely on tricks and superficial responses rather than demonstrating true understanding.
Loebner Competition: A practical implementation of the Turing Test, which often highlights the weaknesses of the test by showing how AI systems can use deception and superficial strategies to appear intelligent.
Recognizing Textual Entailment (RTE): An approach that tests whether one sentence follows from another. It probes relationships between sentences but is still limited as a measure of deep comprehension.
3. Author's Proposed Approach:
The author proposes using the Winograd Schema Challenge as a superior alternative to the Turing Test. This challenge consists of binary-choice questions where AI must resolve ambiguous pronoun references in sentences. It requires a deeper understanding of context and language, and it is specifically designed to avoid the pitfalls of the Turing Test, such as reliance on cheap tricks or superficial linguistic patterns.
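To make the format concrete, here is a minimal sketch of how a Winograd schema could be represented and scored. It uses the well-known trophy and suitcase example from the Winograd schema literature. The names `WinogradQuestion`, `make_schema_pair`, `accuracy`, and `first_mentioned` are hypothetical and chosen for illustration only; this is not the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class WinogradQuestion:
    """One binary-choice question: which candidate does the pronoun refer to?"""
    sentence: str                 # sentence containing the ambiguous pronoun
    pronoun: str                  # the pronoun to resolve
    candidates: Tuple[str, str]   # the two possible referents
    answer: int                   # index (0 or 1) of the correct referent


def make_schema_pair(
    template: str,
    pronoun: str,
    candidates: Tuple[str, str],
    special_words: Sequence[str],
    answers: Sequence[int],
) -> List[WinogradQuestion]:
    """Build the two variants of a schema by swapping a single special word."""
    return [
        WinogradQuestion(template.format(word=word), pronoun, candidates, answer)
        for word, answer in zip(special_words, answers)
    ]


# Swapping "big" for "small" flips the correct referent of "it".
TROPHY = make_schema_pair(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    pronoun="it",
    candidates=("the trophy", "the suitcase"),
    special_words=("big", "small"),
    answers=(0, 1),
)

# A resolver is any function that maps a question to a candidate index.
Resolver = Callable[[WinogradQuestion], int]


def accuracy(resolver: Resolver, questions: Sequence[WinogradQuestion]) -> float:
    """Fraction of questions for which the resolver picks the correct referent."""
    correct = sum(1 for q in questions if resolver(q) == q.answer)
    return correct / len(questions)


def first_mentioned(question: WinogradQuestion) -> int:
    """A 'cheap trick' baseline: always pick the first candidate."""
    return 0


print(accuracy(first_mentioned, TROPHY))  # 0.5
```

Because the two variants differ in a single word yet have opposite answers, a strategy that ignores what the sentence means, such as always picking the first-mentioned noun, gets exactly one of the two right and so scores no better than chance on the pair.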
4. Challenges (from Section 5):
The author identifies two primary scientific hurdles:
"Much of what we come to know about the world and the people around us is not from personal experience but is due to our use of language. People talk to us, we listen to weather reports and the dialogue in movies, and we read text messages, sports scores, mystery novels, etc. And yet, it appears that we need to use extensive knowledge to make good sense of all this language.
Even the most basic child-level knowledge seems to call upon a wide range of logical constructs. Cause and effect and non-effect, counterfactuals, generalized quantifiers, uncertainty, other agent’s beliefs, desires and intentions, etc. And yet, symbolic reasoning over these constructs seems to be much too demanding computationally."
5. Conclusion:
The author concludes that while the Turing Test has been a valuable tool historically, it falls short in evaluating genuine intelligence. The Winograd Schema Challenge offers a more robust alternative by focusing on deep understanding and linguistic reasoning. This approach pushes AI systems to demonstrate true comprehension rather than relying on superficial tricks, making it a better measure of machine intelligence.