[Talk Summary 11] Explain and answer: Intelligent systems which can communicate about what they see

Dr. Marcus Rohrbach from the University of California, Berkeley gave a talk titled "Explain and answer: Intelligent systems which can communicate about what they see" on Friday, 2016/12/02. In the talk, Marcus presented models that can answer questions while remaining modular and exposing their semantic reasoning structure, and showed how to generate explanations given only image captions as training data.

To begin the talk, Marcus explained the motivation: enabling a computer to talk about the visual world. He introduced two components of successful communication: (1) the ability to answer natural language questions about the visual world, and (2) the ability of the system to explain itself in natural language, allowing a human to trust and understand it. To deal with tasks such as visual question answering, he emphasized that it is important to integrate the representations of textual and visual information.

Marcus described the system as a sequence of steps. The first step is to tell what a picture is about, generating captions that cover novel objects, that is, objects in the picture for which no prior description is available. The second step relates to natural language processing: the system analyzes the question to understand what is being asked. After that, the system locates the object in question in the picture to give an answer.

For example, suppose a picture contains a basket, a plate, a fork, a carrot on the plate, and an orange in the basket, and the question is "What is the fruit on the plate?". The system first locates the fruit on the plate and then recognizes what kind of fruit it is.
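The "analyze the question, locate the object, then recognize it" flow can be illustrated with a small toy sketch. This is not the neural model from the talk: the scene is a hand-written list of objects, and the names `SceneObject`, `parse_question`, `locate`, and `answer` are hypothetical helpers introduced only for this illustration. The category labels ("fruit", "vegetable") are likewise assumptions added for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SceneObject:
    name: str       # what the object is, e.g. "orange"
    category: str   # assumed coarse category, e.g. "fruit"
    relation: str   # spatial relation to its container: "on" or "in"
    container: str  # e.g. "basket"


def parse_question(question: str) -> Optional[Tuple[str, str, str]]:
    """Step 1: extract (category, relation, container) from
    questions shaped like 'What is the X on/in the Y?'."""
    pattern = r"what is the (\w+) (on|in) the (\w+)\?"
    match = re.match(pattern, question.strip().lower())
    return match.groups() if match else None


def locate(scene: List[SceneObject], category: str,
           relation: str, container: str) -> List[SceneObject]:
    """Step 2: find the objects that satisfy the parsed constraints."""
    return [
        obj for obj in scene
        if obj.category == category
        and obj.relation == relation
        and obj.container == container
    ]


def answer(scene: List[SceneObject], question: str) -> Optional[str]:
    """Step 3: name the located object, or return None if nothing matches."""
    parsed = parse_question(question)
    if parsed is None:
        return None
    matches = locate(scene, *parsed)
    return matches[0].name if matches else None


# Hand-written stand-in for what a vision model would detect.
# Only objects with a stated location are listed.
SCENE = [
    SceneObject("carrot", "vegetable", "on", "plate"),
    SceneObject("orange", "fruit", "in", "basket"),
]

print(answer(SCENE, "What is the fruit in the basket?"))
```

Keeping the three steps as separate functions mirrors the modular idea: each stage can be inspected on its own, so it is visible which constraint was parsed and which object was located before an answer is produced.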

For the explanation task, Marcus presented an architecture for visual grounding. His grounding approach is the fully supervised version of GroundeR. Additionally, he applied a deep fine-grained classifier to the visual information. Based on these features, the system is able to give a more descriptive explanation for its answer.

Two answers and explanations from the system:
(1) What is the bird doing?
      Because they are on the ground
(2) What is the bird?
      This is a Western Grebe because this bird has a long white neck, pointy yellow beak and red eye.

100 Porter Hall
Carnegie Mellon University  


