Lab: Lexicon-based Sentiment Analysis#
In this lab session, we will explore various lexicon-based methods for sentiment analysis. Lexicon-based methods rely on a predefined list of words with associated sentiment scores to determine the overall sentiment of a text. We will use the following lexicon-based methods:
TextBlob
AFINN
VADER
We will perform sentiment analysis on the movie_reviews dataset from the NLTK package.
Setup#
First, let’s import the required packages and load the dataset.
import nltk
from nltk.corpus import movie_reviews
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    classification_report,
    confusion_matrix,
)
nltk.download("movie_reviews")
nltk.download("vader_lexicon")
[nltk_data] Downloading package movie_reviews to
[nltk_data] /home/yjlee/nltk_data...
[nltk_data] Unzipping corpora/movie_reviews.zip.
[nltk_data] Downloading package vader_lexicon to
[nltk_data] /home/yjlee/nltk_data...
True
Next, we’ll extract the reviews and their categories (positive or negative).
fileids = movie_reviews.fileids()
reviews = [movie_reviews.raw(fileid) for fileid in fileids]
categories = [movie_reviews.categories(fileid)[0] for fileid in fileids]
Now, we’ll define a function to evaluate the classification performance of each lexicon-based method.
def evaluate_classification_performance(true_labels, predicted_labels):
    print("Accuracy: ", accuracy_score(true_labels, predicted_labels))
    print(
        "Precision: ",
        precision_score(true_labels, predicted_labels, average="weighted"),
    )
    print("Recall: ", recall_score(true_labels, predicted_labels, average="weighted"))
    print("F1 Score: ", f1_score(true_labels, predicted_labels, average="weighted"))
    print("Model Report: \n___________________________________________________")
    print(classification_report(true_labels, predicted_labels))
TextBlob#
TextBlob is a simple Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
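Under the hood, TextBlob's `sentiment` property returns a polarity score in [-1.0, 1.0] by averaging per-word polarities from its bundled lexicon. The sketch below illustrates that averaging idea with a tiny made-up lexicon; the words and scores are hypothetical, not TextBlob's actual values.

```python
# Toy illustration of the polarity-averaging idea behind lexicon scoring.
# TOY_POLARITY is a made-up mini-lexicon, not TextBlob's real word list.
TOY_POLARITY = {"great": 1.0, "good": 0.5, "bad": -0.5, "awful": -1.0}

def toy_polarity(text):
    """Average the polarity of known words; texts with no known words score 0."""
    scores = [TOY_POLARITY[w] for w in text.lower().split() if w in TOY_POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

print(toy_polarity("a great film with a good cast"))  # (1.0 + 0.5) / 2 = 0.75
```

Because the average of an empty list is undefined, texts containing no lexicon words fall back to a neutral 0.0, which is why the classifier below treats polarity > 0 as "pos" and everything else as "neg".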
Performing sentiment analysis using TextBlob#
%pip install textblob
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting textblob
Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
Requirement already satisfied: nltk>=3.1 in /home/yjlee/.cache/pypoetry/virtualenvs/lecture-_dERj_9R-py3.8/lib/python3.8/site-packages (from textblob) (3.8.1)
Requirement already satisfied: joblib in /home/yjlee/.cache/pypoetry/virtualenvs/lecture-_dERj_9R-py3.8/lib/python3.8/site-packages (from nltk>=3.1->textblob) (1.2.0)
Requirement already satisfied: regex>=2021.8.3 in /home/yjlee/.cache/pypoetry/virtualenvs/lecture-_dERj_9R-py3.8/lib/python3.8/site-packages (from nltk>=3.1->textblob) (2023.3.23)
Requirement already satisfied: click in /home/yjlee/.cache/pypoetry/virtualenvs/lecture-_dERj_9R-py3.8/lib/python3.8/site-packages (from nltk>=3.1->textblob) (8.1.3)
Requirement already satisfied: tqdm in /home/yjlee/.cache/pypoetry/virtualenvs/lecture-_dERj_9R-py3.8/lib/python3.8/site-packages (from nltk>=3.1->textblob) (4.65.0)
Installing collected packages: textblob
Successfully installed textblob-0.17.1
Note: you may need to restart the kernel to use updated packages.
from textblob import TextBlob
def sentiment_TextBlob(docs):
    results = []
    for doc in docs:
        testimonial = TextBlob(doc)
        if testimonial.sentiment.polarity > 0:
            results.append("pos")
        else:
            results.append("neg")
    return results
predictions = sentiment_TextBlob(reviews)
evaluate_classification_performance(categories, predictions)
Accuracy: 0.6
Precision: 0.7225010902553423
Recall: 0.6
F1 Score: 0.5361560556566348
Model Report:
___________________________________________________
              precision    recall  f1-score   support

         neg       0.89      0.23      0.36      1000
         pos       0.56      0.97      0.71      1000

    accuracy                           0.60      2000
   macro avg       0.72      0.60      0.54      2000
weighted avg       0.72      0.60      0.54      2000
AFINN#
AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive). The words have been manually labeled by Finn Årup Nielsen in 2009-2011. The file is tab-separated. There are two versions:
AFINN-111: Newest version with 2477 words and phrases.
AFINN-96: 1468 unique words and phrases on 1480 lines (some words are listed twice). Its word list is a subset of AFINN-111, with 1009 fewer words and phrases.
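The scoring rule itself is simple: sum the integer valences of every matched word. Here is a minimal sketch with a hypothetical four-word lexicon (the real AFINN-111 file maps 2477 words and phrases to scores from -5 to +5):

```python
# AFINN-style scoring: sum the integer valence of each matched word.
# TOY_AFINN is a hypothetical mini-lexicon for illustration only.
TOY_AFINN = {"love": 3, "wonderful": 4, "hate": -3, "terrible": -3}

def toy_afinn_score(text):
    return sum(TOY_AFINN.get(word, 0) for word in text.lower().split())

print(toy_afinn_score("i love this wonderful movie"))  # 3 + 4 = 7
print(toy_afinn_score("what a terrible ending"))       # -3
```

Unlike TextBlob's averaged polarity, this sum is unbounded, so longer texts with many sentiment words can accumulate large positive or negative totals.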
Performing sentiment analysis using AFINN#
%pip install afinn
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting afinn
Downloading afinn-0.1.tar.gz (52 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: afinn
  Building wheel for afinn (setup.py) ... done
  Created wheel for afinn: filename=afinn-0.1-py3-none-any.whl size=53430 sha256=3a6e16220649c10b4cd5b3ca2aa3ac50b9e5b24f322be56bbed0737c3feefaaa
Stored in directory: /tmp/pip-ephem-wheel-cache-b1osgjx9/wheels/f6/6f/c3/b305c5107a17618f2938a067d5ffcbb556909d82398762089e
Successfully built afinn
Installing collected packages: afinn
Successfully installed afinn-0.1
Note: you may need to restart the kernel to use updated packages.
from afinn import Afinn
def sentiment_Afinn(docs):
    afn = Afinn(emoticons=True)
    results = []
    for doc in docs:
        if afn.score(doc) > 0:
            results.append("pos")
        else:
            results.append("neg")
    return results
predictions = sentiment_Afinn(reviews)
evaluate_classification_performance(categories, predictions)
Accuracy: 0.664
Precision: 0.6783880680137142
Recall: 0.664
F1 Score: 0.6570854714462421
Model Report:
___________________________________________________
              precision    recall  f1-score   support

         neg       0.73      0.52      0.61      1000
         pos       0.63      0.81      0.71      1000

    accuracy                           0.66      2000
   macro avg       0.68      0.66      0.66      2000
weighted avg       0.68      0.66      0.66      2000
VADER#
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. VADER uses a combination of a sentiment lexicon, a set of linguistic rules, and grammatical heuristics to predict the sentiment of a given text.
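One of those rules is negation handling: a negator such as "not" before a sentiment word flips and dampens its score rather than discarding it. The toy sketch below imitates that behavior; the lexicon values and the 0.74 damping factor are illustrative stand-ins, not VADER's actual implementation.

```python
# Simplified imitation of one VADER-style rule: "not" before a sentiment
# word flips its score and dampens it. The lexicon values and 0.74 factor
# here are illustrative stand-ins, not VADER's real constants or logic.
TOY_LEXICON = {"good": 1.9, "bad": -2.5}

def toy_rule_score(text):
    words = text.lower().split()
    score = 0.0
    for i, word in enumerate(words):
        if word in TOY_LEXICON:
            value = TOY_LEXICON[word]
            if i > 0 and words[i - 1] == "not":  # negation: flip and dampen
                value = -0.74 * value
            score += value
    return score

print(toy_rule_score("good"))      # positive
print(toy_rule_score("not good"))  # flipped negative
```

Rules like this are why VADER often handles short, informal text better than a plain word-sum lexicon.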
Performing sentiment analysis using VADER#
from nltk.sentiment.vader import SentimentIntensityAnalyzer
def sentiment_vader(docs):
    analyser = SentimentIntensityAnalyzer()
    results = []
    for doc in docs:
        score = analyser.polarity_scores(doc)
        if score["compound"] > 0:
            results.append("pos")
        else:
            results.append("neg")
    return results
predictions = sentiment_vader(reviews)
evaluate_classification_performance(categories, predictions)
Accuracy: 0.635
Precision: 0.6580655585685583
Recall: 0.635
F1 Score: 0.6211802777111816
Model Report:
___________________________________________________
              precision    recall  f1-score   support

         neg       0.72      0.44      0.55      1000
         pos       0.60      0.83      0.69      1000

    accuracy                           0.64      2000
   macro avg       0.66      0.64      0.62      2000
weighted avg       0.66      0.64      0.62      2000
Summary#
In this lab session, we explored various lexicon-based sentiment analysis methods, including TextBlob, AFINN, and VADER. We applied these methods to the movie_reviews dataset and evaluated their classification performance using accuracy, precision, recall, and F1 score metrics.
It is important to note that lexicon-based methods have limitations, such as sensitivity to the context in which words are used and the inability to capture complex semantic relationships between words. However, they can still provide valuable insights for sentiment analysis tasks, especially when combined with other approaches like machine learning-based techniques.
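The context limitation is easy to demonstrate: a plain word-sum lexicon scores a clearly negative sentence as positive because the negation carries no weight (the mini-lexicon below is hypothetical):

```python
# A plain word-sum lexicon misreads negated sentiment: "not good at all"
# still scores positive because "not" has no entry. Mini-lexicon is
# illustrative only.
TOY = {"good": 2, "bad": -2}

def naive_score(text):
    return sum(TOY.get(word, 0) for word in text.lower().split())

print(naive_score("the plot was not good at all"))  # 2 -> wrongly "pos"
print(naive_score("a bad film"))                    # -2 -> "neg"
```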