首页    期刊浏览 2024年07月03日 星期三
登录注册

文章基本信息

  • 标题:Visual Turing test for computer vision systems
  • 本地全文:下载
  • 作者:Donald Geman ; Stuart Geman ; Neil Hallonquist
  • 期刊名称:Proceedings of the National Academy of Sciences
  • 印刷版ISSN:0027-8424
  • 电子版ISSN:1091-6490
  • 出版年度:2015
  • 卷号:112
  • 期号:12
  • 页码:3618-3623
  • DOI:10.1073/pnas.1422953112
  • 语种:English
  • 出版社:The National Academy of Sciences of the United States of America
  • 摘要:SignificanceIn computer vision, as in other fields of artificial intelligence, the methods of evaluation largely define the scientific effort. Most current evaluations measure detection accuracy, emphasizing the classification of regions according to objects from a predefined library. But detection is not the same as understanding. We present here a different evaluation system, in which a query engine prepares a written test ("visual Turing test") that uses binary questions to probe a system's ability to identify attributes and relationships in addition to recognizing objects. Today, computer vision systems are tested by their accuracy in detecting and localizing instances of objects. As an alternative, and motivated by the ability of humans to provide far richer descriptions and even tell a story about an image, we construct a "visual Turing test": an operator-assisted device that produces a stochastic sequence of binary questions from a given test image. The query engine proposes a question; the operator either provides the correct answer or rejects the question as ambiguous; the engine proposes the next question ("just-in-time truthing"). The test is then administered to the computer-vision system, one question at a time. After the system's answer is recorded, the system is provided the correct answer and the next question. Parsing is trivial and deterministic; the system being tested requires no natural language processing. The query engine employs statistical constraints, learned from a training set, to produce questions with essentially unpredictable answers--the answer to a question, given the history of questions and their correct answers, is nearly equally likely to be positive or negative. In this sense, the test is only about vision. The system is designed to produce streams of questions that follow natural story lines, from the instantiation of a unique object, through an exploration of its properties, and on to its relationships with other uniquely instantiated objects.
  • 关键词:scene interpretation ; computer vision ; Turing test ; binary questions ; unpredictable answers
国家哲学社会科学文献中心版权所有