Sentiment Analysis

Sentiment Analysis#

What is sentiment analysis#

Sentiment analysis is the process of determining whether a piece of writing is positive, negative or neutral.
It’s also known as opinion mining, deriving the opinion or attitude of a speaker.
It applies natural language processing, text analysis, computational linguistics, and machine learning to identify and extract subjective information in source materials such as reviews, comments, and news articles.
The goal of sentiment analysis is to know the attitude of a speaker or writer with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event.

Why sentiment analysis#

Sentiment analysis is a useful tool for businesses to understand the sentiment of their customers towards their brand, product or service.
It can also be used to understand the sentiment of their competitors.
Sentiment analysis can also be used to understand the sentiment of the general public towards certain issues.

Sentiment analysis types#

Depending on the scale of the sentiment analysis, there are three types of sentiment analysis:

Document-level sentiment analysis: This is the most common type of sentiment analysis. It is used to determine the overall sentiment of a document.
Sentence-level sentiment analysis: This type of sentiment analysis is used to determine the sentiment of a sentence within a document.
Aspect-level sentiment analysis: This type of sentiment analysis is used to determine the sentiment of a specific aspect of a document.

These three types of sentiment analysis can be grouped into two categories:

Coarse-grained sentiment analysis: This type of sentiment analysis is used to determine the overall sentiment of a document. It is also known as document-level sentiment analysis.
Fine-grained sentiment analysis: This type of sentiment analysis is used to determine the sentiment of a specific aspect of a document. It is also known as aspect-level sentiment analysis.

Or, depending on the number of classes, there are two types of sentiment analysis:

Binary sentiment analysis: This type of sentiment analysis is used to determine whether a document is positive or negative.
Multi-class sentiment analysis: This type of sentiment analysis is used to determine whether a document is positive, negative or neutral.

Or, depending on the method, there are two types of sentiment analysis:

Lexicon-based sentiment analysis: This type of sentiment analysis is used to determine the sentiment of a document by using a predefined list of positive and negative words.
Machine learning-based sentiment analysis: This type of sentiment analysis is used to determine the sentiment of a document by using machine learning algorithms.

Coarse-grained sentiment analysis#

Coarse-grained sentiment analysis entails two tasks: subjectivity classification and polarity classification.

Subjectivity classification
- Subjective: The document expresses some personal feelings, views, or beliefs.
- Objective: The document does not express any personal feelings, views, or beliefs.
- Example: “I love this movie” is subjective, “The movie was released in 2019” is objective.
Polarity classification
- Positive: The document expresses a positive sentiment.
- Negative: The document expresses a negative sentiment.
- Neutral: The document expresses no sentiment.
- Example: “I love this movie” is positive, “I hate this movie” is negative, “I don’t care about this movie” is neutral.

Fine-grained sentiment analysis#

Fine-grained sentiment analysis entails one more task: aspect classification. Aspect means a specific feature of a product or service.

Lexicon-Based Methods#

Lexicon-based methods are based on a predefined list of positive and negative words.
Corpus-specific lexicons are built for a specific domain.
- i.e. a lexicon for law-related documents (number of times a judge uses the word “guilty” in a document)
- i.e. a lexicon for movie reviews (number of times a reviewer uses the word “good” in a document)
General dictionaries are built for a general domain.
- WordNet: a general dictionary for English words
- LIWC: Linguistic Inquiry and Word Count
- MPQA: Multi-Perspective Question Answering

Measuring Economic Policy Uncertainty (EPU)#

This proxy for Economic Policy Uncertainty (EPU) comes from computer searches of newspapers

US index: 10 major papers get monthly counts of articles with:
- E {economic or economy}, and
- P {regulation or deficit or federal reserve or congress or legislation or white house}, and
- U {uncertain or uncertainty}
Divide the count for each month by the count of all articles
Normalize and sum 10 papers to get the U.S monthly index

US News-based economic policy uncertainty index

h:500px

Monetary Policy Stance#

WordNet#

WordNet is a general dictionary for English words.
English words are grouped into sets of synonyms called synsets.
Synonym sets (synsets) are interlinked by means of conceptual-semantic and lexical relations.
Words are organized into a category hierarchy.
- antonym: a word that means the opposite of another word
- holonym: a word that denotes a whole of which the other word is a part
- meronym: a word that denotes a part of which the other word is a whole
- hypernym: a word that denotes a more general concept than the other word
- hyponym: a word that denotes a more specific concept than the other word

Example: the word “bass” has the following synsets:

WordNet Supersenses (Word Categories)

General Dictionaries#

LIWC (pronounced “Luke”): Linguistic Inquiry and Word Counts
- 2300 words 70 lists of category-relevant words, e.g. “emotion”, “cognition”, “work”, “family”, “positive”, “negative” etc.
Mohammad and Turney (2011):
- code 10,000 words along four emotional dimensions: joy–sadness, anger-fear, trust-disgust, anticipation-surprise
Warriner et al (2013):
- code 14,000 words along three emotional dimensions: valence, arousal, dominance.
Bing Liu’s Opinion Lexicon:
- Positive words: 2006
- Negative words: 4783
- Useful properties: includes mis-spellings, morphological variants, slang, and social-media mark-up
MPQA Subjectivity Lexicon: maintained by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann
SentiWordNet: attaches polarity scores to synsets in WordNet
Harvard General Inquirer:
- a lexicon attaching syntactic, semantic, and pragmatic information to part-of-speech tagged words

MPQA Subjectivity Lexicon#

SentiWordNet#

Harvard General Inquirer#

LIWC#

Disagreement among Lexicons#

Underlying vocabularies are different among lexicons. Therefore, it is difficult to compare the results of different lexicons.

Bias in Lexicons#

NLP “Bias” is statistical bias

Sentiment scores that are trained on annotated datasets also learn from the correlated non-sentiment information.
Supervised sentiment models are confounded by correlated language factors.
- e.g., in the training set maybe people complain about Mexican food more often than Italian food because Italian restaurants tend to be more upscale.

This is a universal problem

supervised models (classifiers, regressors) learn features that are correlated with the label being annotated.
unsupervised models (topic models, word embeddings) learn correlations between topics / contexts.
dictionary methods, while having other limitations, mitigate this problem
- the researcher intentionally “regularizes” out spurious confounders with the targeted language dimension.
- helps explain why economists often still use dictionary methods.

Building your own lexicons#

Much larger lexicons can be inferred from large corpora.
We can capture different dimensions of sentiment that might be important for a specific domain.
We can develop lexicons that are sensitive to the norms of specific domains.

Machine Learning Methods#

Supervised Methods#

Supervised methods are based on a predefined set of documents with known sentiment labels.
The sentiment labels are used to train a classifier.
The classifier is then used to predict the sentiment labels of new documents.
The classifier can be a linear classifier (e.g. logistic regression) or a non-linear classifier (e.g. support vector machine).

Unsupervised Methods#

Unsupervised methods are based on a predefined set of documents without known sentiment labels.
The documents are used to train a topic model.
The topic model is then used to predict the sentiment labels of new documents.
The topic model can be a probabilistic topic model (e.g. LDA) or a non-probabilistic topic model (e.g. LSA).
The sentiment labels are inferred from the topic model.

Supervised Classification#

What is supervised classification?

The learned prediction of the most likely of a set of \(k > 1\) predefined nominal classes for an instance.

Learning phase (training)

Input: a set of known instances \(x^{(i)}\) with correct output class \(c(x^{(i)} )\).
Output: a model \(X \to C\) that maps any instance to its output class.

Application phase (prediction)

Input: a set of unknown instances \(x^{(i)}\) without output classes.
Output: the output class \(c(x^{(i)} )\) for each instance.

Feature-based Classification#

Feature-based representation

A feature vector is an ordered set of values of the form \(x = (x_1 , \ldots , x_m )\).
Each feature \(x_j\) denotes a measurable property of an input, \(1 \le j \lt m\).
Each instance \(o_j\) is mapped to a vector \(x^{(i)} = (x_1^{(u)} , \ldots , x_m^{(i)} )\) where \(x_j^{(i)}\) denotes the value of feature \(x_j\) .

Text mining using feature-based classification

The main challenge is to engineer features that help solve a given task.
In addition, a suitable classification algorithm needs to be chosen.

Classification Algorithms#

Binary vs. multiple-class classification (recap)

Binary: many classification algorithms work for \(k = 2\) classes only.
Multiple: some algorithms can be extended to \(k > 2\) classes via multiple binary classifiers, e.g., one-versus-all.

Selected supervised classification algorithms

Naïve Bayes: predicts classes based on conditional probabilities.
Support vector machine: maximizes the margin between classes.
Decision tree: sequentially compares instances on single features.
Random forest: majority voting based on several decision trees.
Neural network: learns complex functions on feature combinations.
… and many more

Sentiment Classification in Practice#

Sentiment classification of reviews

Classification of the nominal sentiment polarity or score of a customer review on a product, service, or work of art.

Data

2100 English hotel reviews from TripAdvisor. 900 training, 600 validation, and 600 test reviews.
Each review has a sentiment score from {1, …, 5}.

Tasks

3-class sentiment: 1–2 mapped to negative, 3 to neutral, 4–5 to positive. Training set balanced with random undersampling.
5-class sentiment: each score interpreted as one (nominal) class.

Approach

Algorithm: linear SVM with one-versus-all multi-class handling.
Features: combination of several standard and specific feature types.

Feature Engineering#

What is feature engineering?

The design and development of the feature representation of instances used to address a given task.
The representation governs what patterns can be found during learning.

Standard vs. specific features

Standard: features that can be derived from (more or less) general linguistic phenomena and that may help in several tasks.
Specific: features that are engineered for a specific tasks, usually based on expert knowledge about the task.

Features covered here

Standard content features: token n-grams, target class features.
Standard style features: POS and phrase n-grams, stylometric features.
Specific features: local sentiment, discourse relations, flow patterns.

Some General Linguistic Phenomena

Standard Content Feature Types#

Token n-grams

Token unigrams (bag-of-words): the distribution of all token 1-grams that occur in at least 5% of all training texts.
Token bigrams/trigrams

Target class features

Core vocabulary: the distribution of all words that occur at least three times as often in one class as in every other.
Sentiment scores: the mean positivity, negativity, and objectivity of all first and average word senses in SentiWordNet.
Sentiment words: the distribution of all subjective words in SentiWordNet.

Standard Style Feature Types#

Part-of-speech (POS) tag n-grams

POS unigrams. The distribution of all part-of-speech 1-grams that occur in at least 5% of all training texts.
POS bigrams/trigrams. Analog for 2-grams and 3-grams.

Phrase type n-grams

Phrase unigrams. The distribution of all phrase type 1-grams that occur in at least 5% of all training texts.
Phrase bigrams/trigrams. Analog for 2-grams and 3-grams.

Stylometric features

Character trigrams. The distribution of all character 3-grams that occur in at least 5% of all training texts.
Function words. The distribution of the top 100 words in the training set.
Lexical statistics. Average numbers of tokens, clauses, and sentences.

Evaluation of the Standard Feature Types#

Evaluation

One linear SVM for each feature type alone and for their combination.
Training on training set, tuning on validation set, test on test set.

Discussion

Token unigrams: best, but some other types close.
Combination does not outperform best single feature type.
60.8% accuracy does not seem very good.

Review Argumentation#

Example hotel review

“We spent one night at that hotel. Staff at the front desk was very nice, the room was clean and cozy, and the hotel lies in the city center… but all this never justifies the price, which is outrageous!”

A shallow model of review argumentation

A review can be seen as a flow of local sentiments on domain concepts that are connected by discourse relations.

Specific Feature Types for Review Sentiment Analysis#

Local sentiment distribution

The frequencies of positive, neutral, and negative local sentiment as well as of changes of local sentiments.

positive 0.4 neutral 0.4 negative 0.2 (neutral, positive) 0.25 …
The average local sentiment value from 0.0 (negative) to 1.0 (positive).

average sentiment 0.6
The interpolated local sentiment at each normalized position in the text.

e.g., normalization length 9: (0.5, 0.75, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0)

Discourse relation distribution

The distribution of discourse relation types in the text.

background 0.25 elaboration 0.5 contrast 0.25 (all others 0.0)
The distribution of combinations of relation types and local sentiments.

background(neutral, positive) 0.25 elaboration(positive, positive) 0.25 …

Sentiment flow patterns

The similarity of the normalized flow of the text to each flow pattern.

Content and style features

Content: token n-grams, sentiment scores.
Style: part-of-speech n-grams, character trigrams, lexical statistics.

Evaluation of the Specific Feature Types#