Showing posts from 2016

[Talk Summary 13] Concept Map Extraction from Textbooks

Shuting Wang, a fifth-year PhD student in the Computer Science department at Penn State University, gave the talk "Concept Map Extraction from Textbooks" on Dec 05, 2016. In the talk, she presented three parts of her work: extracting a concept hierarchy from a textbook, using prerequisites to extract concept maps from textbooks, and using the concept maps for automatic assessment.

First of all, Shuting talked about how to extract a concept hierarchy from a textbook. In her work, she extracted important concepts in each book chapter using Wikipedia as a resource and then constructed a concept hierarchy for that book. She presented the process of constructing a concept hierarchy as follows:
Build a concept dictionary from Wikipedia entities related to the topic of the book.
Select concept candidates in the concept dictionary based on title and content similarity between a section (subchapter) in the book and Wikipedia articles.
Construct the concept hierarchy from the table of contents order of the book w…

[Talk Summary 12] A/B Testing at Scale

Dr. Pavel Dmitriev, a Principal Data Scientist on Microsoft's Analysis and Experimentation team, gave a talk on "A/B Testing at Scale" on Thursday, 2016/12/08. The talk covered an introduction to controlled experiments, four real experiments that Microsoft had run, and five challenges of testing at scale.

Dr. Pavel started the talk with a brief introduction to controlled experiments, also known as A/B tests. A/B testing is a method of comparing two versions of a webpage or app against each other to determine which one performs better. It is also used to evaluate a new feature of an application. If the feature has an effect on users, the results will show a statistically significant difference (p<0.05); the assumption that there is no difference is called the null hypothesis.
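
To make the significance check concrete, here is a minimal sketch of a two-sided, two-proportion z-test on conversion counts. It is only one common way to compare two variants and is not taken from the talk; the function name `two_proportion_z_test` and the sample counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and users in the control (A).
    conv_b / n_b: conversions and users in the treatment (B).
    Returns (z, p_value). Hypothetical helper, not from the talk.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate,
    # so the standard error uses the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 10% vs. 11% conversion with 10,000 users per variant.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

A p-value below 0.05 means the null hypothesis of "no difference" is rejected at that threshold; it does not by itself say how large or how valuable the effect is.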

With the evolving product development process, Dr. Pavel presented the motivation for A/B testing. In classical software development, a product is usually designed, developed, tested and then released. However, in cust…

[Talk Summary 11] Explain and answer: Intelligent systems which can communicate about what they see

Dr. Marcus Rohrbach from the University of California, Berkeley gave the talk "Explain and answer: Intelligent systems which can communicate about what they see" on Friday, 2016/12/02. In the talk, Marcus presented models that can answer questions while remaining modular and exposing their semantic reasoning structure, and showed how to generate explanations given only image captions as training data.

To begin the talk, Marcus motivated the problem of making computers able to talk about the visual world. He introduced two components of successful communication: (1) the ability to answer natural language questions about the visual world, and (2) the ability of the system to explain itself in natural language, allowing a human to trust and understand it. To deal with tasks such as visual question answering, he emphasized that it is important to integrate the representations of textual and visual information.

 Marcus described the whole process of the system …

[Talk Summary 10] Parse Tree Fragmentation of Ungrammatical Sentences

Huma Hashemi, an ISP graduate student at the University of Pittsburgh, gave a talk on "Parse Tree Fragmentation of Ungrammatical Sentences" on Friday, 2016/11/18. She presented an evaluation of parser robustness for ungrammatical sentences.

Huma started the talk with an introduction to natural language processing (NLP) that motivated her proposal. One of the most challenging issues that NLP has to deal with is "noisier" text, such as English-as-a-second-language writing and machine translation output. For many NLP applications that require a parser, such as information extraction, question answering, and summarization systems, the input sentences may not be well-formed. Therefore, to build a good NLP application, a parser should be able to parse ungrammatical sentences.

Huma's research focuses on answering the question "how much does a parser's performance degrade when it deals with grammar mistakes?" and evaluation of a parser on ungrammatical sentences…

[Talk Summary 9] The Next Frontier in AI: Unsupervised Learning

Yann LeCun, Director of AI Research at Facebook and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, gave a talk about unsupervised learning, the next frontier in AI, on Friday, 2016/11/18 at CMU.
At the beginning of the talk, Prof. LeCun introduced neuroscience, supervised learning, deep learning, multi-layer neural nets, convolutional network architectures, very deep ConvNet architectures, and memory-augmented networks. He presented different kinds of applications of machine learning, such as image recognition and question answering.
The main part of Prof. LeCun's talk was about obstacles to AI. The challenge of the next several years is to let machines learn from raw, unlabeled data, such as video or text. This is known as unsupervised learning. AI systems today do not possess "common sense", which humans and animals acquire by observing the world, acting in it, and understanding the physical…

[Talk Summary 8] Data-Driven Science of Science

Dr. Ying Ding from the School of Informatics and Computing, Indiana University gave the talk "Data-Driven Science of Science" on November 04, 2016 at the School of Information Sciences, University of Pittsburgh.

In the talk, Dr. Ding presented an overview of data science and the current layers of bibliometrics: the macro level in complex networks, the meso level in bibliometrics, and the micro level in collaboration and team science. Currently, most research focuses on analyzing data from complex networks and bibliometrics. Dr. Ding suggested that collaboration and team science, at the micro level, should be a new trend that deserves the attention of data scientists.

In addition, Dr. Ding concisely summarized her data science work that goes beyond bibliometrics. The following three areas are the main themes of her research:
Data-Driven Discovery: entity metrics, computational hypothesis generation, and digital innovation (e.g. machine reading)
Data-Driven Decision Making: un…

Writing Research Articles

By reading the first chapter, "Becoming an author", of the book "Authoring a PhD: How to plan, draft, write and finish a doctoral thesis or dissertation", I found some useful information about the differences between the classical and taught PhD models, which draws our attention to what knowledge and skills we should focus on and how to allocate our time efficiently. For example, in the "classical model", the thesis requirement is to write a "big book" thesis integrating a set of chapters, from 80,000 to 100,000 words; in the "taught PhD model", on the other hand, it is a papers-model dissertation including four or five papers of publishable quality, around 60,000 words. My school's doctoral program is a taught PhD model that includes coursework, examinations and a dissertation. Therefore, practicing writing papers is crucial for us, even in our very first year. It helps us build up our academic writing and authoring s…

[Talk Summary 7] Modeling Human Communication Dynamics

Professor Louis-Philippe Morency from CMU presented the talk "Modeling Human Communication Dynamics" on October 21, 2016 at the CS department, University of Pittsburgh.

He started the talk with an introduction to his research, which focuses on creating computational technologies to analyze, recognize and predict subtle human communicative behaviors in social contexts. Regarding human communicative behaviors, he identified three main aspects: verbal, vocal and visual.

Prof. Morency stated four challenges in modeling human communication dynamics: behavioral, multimodal, interpersonal, and societal dynamics. He suggested that such models can be broadly applied to healthcare, education, and online multimedia.

In the three main parts of the talk, he presented his group's recent achievements, multimodal machine learning, and predicting listener behaviors. Firstly, he gave a demo of a healthcare decision support system using Multisense, ana…

[Talk Summary 6] Language and Social Dynamics

Cristian Danescu-Niculescu-Mizil from Cornell University gave the talk "Language and Social Dynamics" on September 30, 2016 at CMU. He presented his past work, including exploring the relation between users and their community, a computational framework for identifying and characterizing politeness, and predicting the future evolution of a dyadic relationship.

At the beginning, Cristian talked about how users' language changes when they get involved in a community. He showed that users follow a determined life-cycle in how they adopt new community norms. When taking part in a community, new members adapt to existing community norms. Over time, members may also adopt new norms or become innovators themselves, setting new trends. Others may keep their old styles and not react to the change. Those members are more likely to leave their communities. Based on this assumption, a system can predict how long users will stay active in the community.

[Talk Summary 5] Two Case Studies in Semantic Inference

Dipanjan Das from Google presented the talk "Two Case Studies in Semantic Inference" on October 14, 2016. He presented two different semantic inference tasks. The first focuses on the structure of natural language questions, and the other deals with more unstructured forms.
Firstly, Dipanjan described a method for parsing natural language questions into logical forms. These logical forms can be softly mapped to the information stored in a structured knowledge base. The system then matches the forms (sub-graphs) from the knowledge base in its question answering mechanism. In the semantic parsing process, he presented a dependency parser that extracts the structure of sentences, and DepLambda, which is based on lambda calculus, to parse that structure into logical forms.
Dipanjan carried out an empirical study to compare DepLambda with other baselines. He showed that DepLambda, on two test collections, performs better than Simple Graph, CCG Graph and Deptree.
In the second part of the t…

[Talk Summary 4] Entity/Event-level Sentiment Detection and Inference

Lingjia Deng from the Intelligent Systems Program, University of Pittsburgh gave a talk on "Entity/Event-level Sentiment Detection and Inference" on October 07, 2016.

Lingjia's work focuses on sentiment analysis and opinion mining. She introduced a sentiment analysis model that aims at detecting both explicit and implicit sentiments expressed among entities and events in text.

As stated in the talk, most of the work in opinion mining focuses on extracting explicit sentiments. For example, in the sentence "she loves traveling", the sentiment that is explicitly expressed is positive. However, another example, "People celebrate that Trump was defeated", contains two explicit opinions. One is positive, based on the word "celebrate": people celebrate something. The other is negative, based on the word "defeated": Trump was defeated. Besides these explicit opinions, we can implicitly infer another sentiment in the sentence, which is that people are negativ…

A Survival Guide to a PhD - Andrej Karpathy

I found this article on a professor's Facebook page. After reading it in full, I learned some useful lessons from the author. I quote some ideas here:

"There are very few people who make it to the top PhD programs. You’d be joining a group of a few hundred distinguished individuals in contrast to a few tens of thousands (?) that will join some company." (just personal opinion. Don't condemn :) )

"As a PhD student you’re your own boss. Want to sleep in today? Sure. Want to skip a day and go on a vacation? Sure. All that matters is your final output and no one will force you to clock in from 9am to 5pm. Of course, some advisers might be more or less flexible about it and some companies might be as well, but it’s a true first order statement." (just motivate myself)

"You will inevitably find yourself working very hard (especially before paper deadlines). You need to be okay with the suffering and have enough mental stamina and determina…

[Talk Summary 3] Personalized Recommendations using Knowledge Graphs

Rose Catherine from the Language Technologies Institute, Carnegie Mellon University presented the talk "Personalized Recommendations using Knowledge Graphs" on 09/23/2016.

At the beginning, Rose introduced the general ideas behind personalized recommendations and gave some examples, such as Amazon recommending products that customers would like to buy and services suggesting movies to users based on their favorites.

Rose then focused on improving the performance of recommender systems using knowledge graphs (KGs). She introduced other works that combine content-based and collaborative filtering techniques and that apply connected concept-based KGs to recommendation. To improve on the current methods, Rose's research group investigates three techniques for making recommendations. These KG-based recommendations use a probabilistic logic system called ProPPR, which stands for Programming with Personalized PageRank.
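
As background for the name, the following is a minimal sketch of plain personalized PageRank on a toy knowledge graph. It is not ProPPR and not the method from the talk; the graph, the node names, the `alpha` restart probability and the function name are all assumptions made for illustration.

```python
def personalized_pagerank(graph, seed, alpha=0.15, iterations=50):
    """Power iteration for a random walk that restarts at `seed`.

    graph: dict mapping each node to a list of neighbor nodes
           (every neighbor must also be a key of the dict).
    alpha: probability of jumping back to the seed at each step.
    Hypothetical helper for illustration only.
    """
    rank = {node: 1.0 if node == seed else 0.0 for node in graph}
    for _ in range(iterations):
        # Restart mass always goes to the seed node.
        nxt = {node: alpha if node == seed else 0.0 for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dangling node: send its walk mass back to the seed.
                nxt[seed] += (1 - alpha) * rank[node]
                continue
            share = (1 - alpha) * rank[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        rank = nxt
    return rank


# Toy graph (made up): users, movies and directors linked in both directions.
toy_graph = {
    "alice": ["inception"],
    "inception": ["alice", "nolan"],
    "nolan": ["inception", "interstellar"],
    "interstellar": ["nolan"],
    "bob": ["titanic"],
    "titanic": ["bob", "cameron"],
    "cameron": ["titanic"],
}
scores = personalized_pagerank(toy_graph, seed="alice")
movies = ["inception", "interstellar", "titanic"]
print(sorted(movies, key=scores.get, reverse=True))
```

Because the walk keeps restarting at one user, nodes reachable from that user through shared entities accumulate score, while unconnected parts of the graph stay at zero.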
The first approach is EntitySim, which uses only the links of the graph to find the…

[Talk Summary 2] Dynamic Information Retrieval Model

In this seminar, Professor Grace Hui Yang presents a dynamic information retrieval model which helps users explore the information space in order to find out which documents are relevant and which aren't, satisfying their information need. The dynamic IR task aims to find relevant documents for a session of multiple queries. This arises when information needs are complex, vague, evolving, and often contain multiple subtopics.

A dynamic system is one which changes or adapts over time, based on a sequence of events. While static IR does not learn directly from users and its parameters are updated only periodically, dynamic IR developed from interactive IR, which exploits users' feedback to give recommendations, improve search results, or optimize ranking; this is called the dynamic ranking principle. However, interactive IR treats each interaction independently and responds only to immediate feedback. Dynamic IR tries to optimize over all interactions, uses them for long-term gain, and models future …

[Talk Summary 1] Web as a textbook: Curating Targeted Learning Paths through the Heterogeneous Learning Resources on the Web

In this seminar, Igor presents an idea for organizing heterogeneous educational resources on the web into a structure similar to a textbook or course. Thanks to this structure, engines might be able to let learners navigate a sequence of webpages that takes them from their prior knowledge to the material they want to learn. He observes that educational resources on the internet are diverse; they could be articles, lecture notes, tutorials, slides, etc. These materials are provided by various kinds of creators from different perspectives, and thus serve a variety of learners who do not necessarily rely on textbooks.
To approach this task, Igor first represents a document as a bag-of-technical-terms consisting of two multisets, a set of Explained terms and a set of Assumed terms. Explained: the term appears in the context and is explained so that readers can understand it. Assumed: the term, corresponding to an explained term, is assumed to be familiar to readers, and is required for und…


I am a first-year Ph.D. student at the School of Information Sciences, University of Pittsburgh. I received my Master's and Bachelor's degrees from the University of Information Technology. I am very interested in building Adaptive Educational Systems, Intelligent Tutoring Systems, Educational Data Mining, Semantic Information Retrieval, User Modeling and Cognitive Systems. I am currently working at the Personalized Adaptive Web Systems Lab (PAWS). My advisor is Dr. Peter Brusilovsky. I am also taking part in the projects Open Corpus Personalized Learning and cWadeIn.

Here is my homepage: HUNG K CHAU