Visual features of intermediate complexity and their use in classification, by Shimon Ullman, Michel Vidal-Naquet, and Erez Sali
WRT: 1.0 hours
On a total of one side of one sheet of paper, using 10 pt type or larger, with standard interline spacing and margins, respond to all the following.
Ullman is part of a group that is submitting a big proposal to the National Science Foundation to establish a center for Brains, Minds, and Machines.
He decides to solicit a letter of support from Patrick Winston. Winston says he would be delighted to help, but, alas, he asks you, his student, for talking points in the form of a draft letter.
Winston says the proposal is still under construction, but the paragraph introducing Shimon's portion of the proposal is as follows. Winston tells you it contains all you need to know about the proposed work, because your endorsement is to focus on lavishing praise on past work, particularly the work described in Visual features of intermediate complexity and their use in classification.
| The goal of this thrust is to combine vision with aspects of language and social cognition to obtain complex knowledge about the surrounding environment. Over the last decade, computational models have made significant progress in the task of recognizing hundreds of natural object categories under realistic viewing conditions. However, to obtain full understanding of visual scenes, computational models should be able to extract from the scene any meaningful information that a human observer can extract, about actions, agents, goals, object properties, scenes and object configurations, social interactions, and more. We refer to this as the Turing test for vision: being able to use vision to answer a large and flexible set of queries about objects and agents in the image in a human-like manner. The object domain goes below and above single objects, i.e., the recognition of meaningful object parts (door knob, zipper) and configurations (a table set for dinner). Agent domain queries include actions, goals, and interactions (hugging, quarreling). Understanding queries and formulating appropriate answers requires interactions between vision and natural language. Interpreting goals and interactions requires connections between vision and social cognition. Answering queries also requires task-dependent processing, i.e., different visual processes achieve different goals. These problems can be divided into sub-tasks described below. |
Winston suggests that, before you begin work on your draft letter, you read the abstract of Analogical Retrieval via Intermediate Features: The Goldilocks Principle.
Supply the draft letter.