Wednesday, November 6, 2013

Sentiment Analysis and Opinion Mining

In this post, I am sharing my notes on the first three chapters of the book Sentiment Analysis and Opinion Mining by Bing Liu.
  • Sentiment Analysis Research
    • Document level, sentence level, entity and aspect level
  • Sentiment Lexicon and Its Issues
    • Context matters: "It sucks" is negative, but "this vacuum cleaner sucks" can be positive
    • Questions and conditional statements might not express an opinion: "Is android good?", "If android is good, then I will buy it"
    • Sarcastic sentences: "Nokia 5310 is great to use as a brick"
    • No opinionated words mentioned, yet an opinion is implied: "its color changed after one use"
  • Opinion Spam Detection
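The lexicon issues listed above are easy to reproduce with a toy scorer. The sketch below is only an illustration: the `LEXICON` dictionary and its strengths are made up for this post, and the scorer simply sums word strengths without looking at context.

```python
import re

# Hypothetical toy lexicon (not from the book): word -> strength in [-2, +2].
LEXICON = {"good": 1, "great": 2, "sucks": -2, "short": -1}


def lexicon_score(sentence: str) -> int:
    """Sum the lexicon strengths of the words in a sentence, ignoring context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0) for word in words)


examples = [
    "It sucks",
    "This vacuum cleaner really sucks",        # context: likely praise
    "Is android good?",                        # question: no opinion
    "If android is good, then I will buy it",  # conditional: no opinion
    "Nokia 5310 is great to use as a brick",   # sarcasm: actually negative
    "Its color changed after one use",         # no sentiment word, yet negative
]

for sentence in examples:
    print(f"{lexicon_score(sentence):+d}  {sentence}")
```

The scorer rates the vacuum cleaner sentence as negative, the question and the conditional as positive, the sarcastic sentence as strongly positive, and the color complaint as neutral, which is wrong in every case except the first sentence.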
Definition (opinion): An opinion is a quintuple (e_i, a_ij, s_ijkl, h_k, t_l), where e_i is the name of an entity, a_ij is an aspect of e_i, s_ijkl is the sentiment on aspect a_ij of entity e_i, h_k is the opinion holder, and t_l is the time when the opinion is expressed by h_k. The sentiment s_ijkl is positive, negative, or neutral, or is expressed with different strength/intensity levels, e.g., the 1–5 stars used by most review sites on the Web. When an opinion is on the entity itself as a whole, the special aspect GENERAL is used to denote it. Here, e_i and a_ij together represent the opinion target.

Task 1 (entity extraction and categorization): Extract all entity expressions in the document set D, and group synonymous entity expressions into clusters (or categories). Each entity expression cluster indicates a unique entity e_i.
Task 2 (aspect extraction and categorization): Extract all aspect expressions of the entities, and categorize them into clusters. Each aspect expression cluster of entity e_i represents a unique aspect a_ij.
Task 3 (opinion holder extraction and categorization): Extract opinion holders for opinions from text or structured data and categorize them. The task is analogous to the above two tasks.
Task 4 (time extraction and standardization): Extract the times when opinions are given and standardize different time formats. The task is also analogous to the above tasks.
Task 5 (aspect sentiment classification): Determine whether an opinion on an aspect a_ij is positive, negative, or neutral, or assign a numeric sentiment rating to the aspect.
Task 6 (opinion quintuple generation): Produce all opinion quintuples (e_i, a_ij, s_ijkl, h_k, t_l) expressed in document d based on the results of the above tasks. This task looks simple, but in many cases it is very difficult, as the example below shows.
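Of these tasks, time standardization (Task 4) is the easiest to sketch in code. The snippet below is a minimal sketch under my own assumptions: the list of accepted input formats, the handling of the "Sept." abbreviation, and the choice of ISO 8601 as the target format are all mine, not the book's.

```python
import re
from datetime import datetime

# Assumed input formats; a real system would need to cover many more.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y", "%B %d, %Y"]


def standardize_time(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    # "Sept." is a common abbreviation that strptime's %b does not accept.
    cleaned = re.sub(r"\bSept\b\.?", "Sep", raw.strip())
    # Drop the dots of abbreviated month names such as "Oct."
    cleaned = cleaned.replace(".", "")
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time format: {raw!r}")


print(standardize_time("Sept. 15, 2011"))
print(standardize_time("15/09/2011"))
```

Both calls map to the same standardized date, so opinions posted in different formats can be compared on one timeline.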

Example: Posted by: big John Date: Sept. 15, 2011

(1) I bought a Samsung camera and my friends brought a Canon camera yesterday. (2) In the past week, we both used the cameras a lot. (3) The photos from my Samy are not that great, and the battery life is short too. (4) My friend was very happy with his camera and loves its picture quality. (5) I want a camera that can take good photos. (6) I am going to return it tomorrow.
Output:
(Samsung, picture_quality, negative, big John, Sept-15-2011)
(Samsung, battery_life, negative, big John, Sept-15-2011)
(Canon, GENERAL, positive, big John’s_friend, Sept-15-2011)
(Canon, picture_quality, positive, big John’s_friend, Sept-15-2011)
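The four quintuples above map naturally onto a small record type. This is a sketch of one possible representation; the `Opinion` class and its field names are my own choice, and only the data comes from the example.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Opinion(NamedTuple):
    entity: str     # name of the entity
    aspect: str     # aspect of the entity, or "GENERAL" for the entity as a whole
    sentiment: str  # "positive", "negative", or "neutral"
    holder: str     # who holds the opinion
    time: str       # when the opinion was expressed


opinions = [
    Opinion("Samsung", "picture_quality", "negative", "big John", "Sept-15-2011"),
    Opinion("Samsung", "battery_life", "negative", "big John", "Sept-15-2011"),
    Opinion("Canon", "GENERAL", "positive", "big John’s_friend", "Sept-15-2011"),
    Opinion("Canon", "picture_quality", "positive", "big John’s_friend", "Sept-15-2011"),
]

# Count sentiments per entity to get a simple opinion summary.
summary = defaultdict(Counter)
for op in opinions:
    summary[op.entity][op.sentiment] += 1

for entity, counts in summary.items():
    print(entity, dict(counts))
```

Once opinions are in this structured form, summaries per entity, per aspect, or per holder are simple group-by operations, which is the point of turning unstructured text into quintuples.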
  • Sentiment classification using supervised learning
    • Terms and their frequency. Part of speech. Sentiment words and phrases. Rules of opinions. Sentiment shifters. Syntactic dependency
    • Utilize the features listed to run traditional or new ML algorithms
  • Sentiment classification using unsupervised learning
    • Five patterns of POS tags used for extracting two-word phrases, such as adjective followed by a noun
    • Sentiment orientation (SO) of the extracted phrases is computed using pointwise mutual information (PMI)
    • Lexicon-based method: words and phrases are mapped to a sentiment strength, e.g., in the range [-2, +2]
  • Sentiment rating prediction
    • SVM one-vs-all (OVA) approach (reported to perform poorly)
    • A similarity graph is built to smooth the ratings produced by SVM OVA
    • Constrained ridge regression on bag of opinions (sentiment-word, negator, modifier)
    • Aggregate rating of aspects
    • Learning from comprehensive reviews only with a Bayesian model
  • Cross-domain sentiment classification
  • Cross-language sentiment classification
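The PMI-based sentiment orientation from the unsupervised learning notes above can be sketched in a few lines. As I understand the method, a phrase is scored by how strongly it co-occurs with the reference word "excellent" compared to "poor". The counts below are invented for illustration, and the small smoothing constant is an assumption to avoid division by zero.

```python
import math


def sentiment_orientation(near_excellent, near_poor,
                          hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    With PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), the probability of the
    phrase itself cancels, leaving a ratio of co-occurrence counts.
    """
    return math.log2(
        ((near_excellent + smoothing) * hits_poor)
        / ((near_poor + smoothing) * hits_excellent)
    )


# Hypothetical counts, invented for illustration only:
# phrase -> (co-occurrences with "excellent", co-occurrences with "poor")
counts = {
    "good photos": (80, 20),
    "short battery": (5, 40),
}
HITS_EXCELLENT, HITS_POOR = 1000, 1000  # assumed corpus frequencies

for phrase, (near_exc, near_poor) in counts.items():
    so = sentiment_orientation(near_exc, near_poor, HITS_EXCELLENT, HITS_POOR)
    label = "positive" if so > 0 else "negative"
    print(f"{phrase}: SO = {so:+.2f} ({label})")
```

A review can then be classified by averaging the SO of all its extracted phrases: a positive average suggests a positive review, a negative average a negative one.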
