GloVe#

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. It is based on the co-occurrence matrix of words from a corpus.

GloVe stands for Global Vectors for Word Representation. It was introduced by [Pennington et al., 2014].

Improvement over word2vec#

  • Word2vec uses a window-based approach: each training step sees only the words that fall inside a fixed-size window around the target word. Corpus-wide co-occurrence statistics are therefore never used directly.

  • GloVe is fit to co-occurrence counts aggregated over the whole corpus, so it uses global statistics in addition to the local context that defines each count. The authors argue that this makes more efficient use of the corpus statistics than word2vec.

Co-occurrence matrix#

What are the global contexts of a word? The global contexts of a word are the words that co-occur with it in a corpus.

For example, we have the following sentences:

Document 1: “All that glitters is not gold.”

Document 2: “All is well that ends well.”

Then, with a window size of 1, the co-occurrence matrix of the words in the corpus is:

[Figure: co-occurrence matrix]

  • The rows and columns represent the words in the corpus.

  • <START> and <END> are special tokens that represent the start and end of a sentence.

  • Since "that" and "is" each occur exactly once within the window of "glitters", the values of (that, glitters) and (is, glitters) are both 1.
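The counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tokenizer (lowercasing, stripping punctuation, splitting on whitespace) and the function name `cooccurrence_counts` are choices made for this example, not part of GloVe itself.

```python
from collections import Counter

def cooccurrence_counts(documents, window=1):
    """Count ordered pairs (word, context_word) that lie within `window` tokens.

    Assumption: tokens are lowercased, punctuation-free, whitespace-separated words.
    """
    counts = Counter()
    for doc in documents:
        # Wrap every document in the special start and end tokens.
        tokens = ["<START>"] + doc.lower().split() + ["<END>"]
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    counts[(word, tokens[j])] += 1
    return counts

docs = [
    "All that glitters is not gold",
    "All is well that ends well",
]
counts = cooccurrence_counts(docs, window=1)

print(counts[("that", "glitters")])  # neighbours in document 1
print(counts[("is", "glitters")])    # neighbours in document 1
print(counts[("<START>", "all")])    # both documents start with "all"
```

Because the window is symmetric, the resulting counts are symmetric as well: `counts[(a, b)]` equals `counts[(b, a)]` for every pair.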

Training GloVe#

The GloVe model is a weighted least-squares regression model whose parameters are the word vectors. The objective function is the weighted sum of squared errors between the logarithm of each co-occurrence count and the dot product of the corresponding word vectors.

\[ \begin{align} \mathcal{L} = \frac{1}{2}\sum_{i,j=1}^V f(X_{ij}) (\log X_{ij} - \mathbf{u}_i^\top \mathbf{v}_j)^2 \end{align} \]

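where \(V\) is the vocabulary size, \(X_{ij}\) is the entry of the co-occurrence matrix that counts how often word \(j\) appears in the context of word \(i\), \(\mathbf{u}_i\) is the word vector of the \(i\)th word, and \(\mathbf{v}_j\) is the context vector of the \(j\)th word. The function \(f\) is a weighting function that limits the influence of very frequent co-occurrences and gives zero weight to pairs that never co-occur, so \(\log X_{ij}\) is never evaluated at zero.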
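To make the objective concrete, here is a minimal pure-Python sketch that evaluates it for a small matrix. It follows the formula as written above. The specific form of \(f\) and its defaults (`x_max=100`, `alpha=0.75`) are taken from the original paper rather than from this page, and the function names `weight` and `glove_loss` are illustrative.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f: zero at x = 0, grows as (x / x_max) ** alpha, capped at 1.

    Assumption: this form and its defaults come from the original paper.
    """
    if x >= x_max:
        return 1.0
    return (x / x_max) ** alpha

def glove_loss(X, U, V):
    """0.5 * sum over (i, j) of f(X_ij) * (log X_ij - u_i . v_j) ** 2."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x_ij in enumerate(row):
            if x_ij == 0:
                continue  # f(0) = 0, so the pair contributes nothing
            dot = sum(a * b for a, b in zip(U[i], V[j]))
            total += weight(x_ij) * (math.log(x_ij) - dot) ** 2
    return 0.5 * total

# Toy 2-word example: the two words co-occur twice with each other.
X = [[0, 2],
     [2, 0]]
U = [[1.0, 0.0], [0.0, 1.0]]
V_fit = [[0.0, math.log(2)], [math.log(2), 0.0]]  # u_i . v_j equals log X_ij exactly
V_zero = [[0.0, 0.0], [0.0, 0.0]]                 # every dot product is 0

print(glove_loss(X, U, V_fit))   # vectors that reproduce log X_ij
print(glove_loss(X, U, V_zero))  # vectors that do not
```

Training amounts to adjusting `U` and `V` by gradient descent until this value is small; the sketch only evaluates the loss and does not perform the optimization.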

GloVe vs word2vec#

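  • GloVe is a global model: it is fit to co-occurrence counts collected over the whole corpus. Word2vec is a local model: it is trained on one context window at a time.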

  • In the evaluations reported by [Pennington et al., 2014], GloVe outperforms word2vec on word analogy, word similarity, and Named Entity Recognition (NER) tasks.

  • If the nature of the problem is similar to the above tasks, then GloVe is a better choice than word2vec.

  • Since it uses a global context, GloVe is better at capturing the meaning of rare words even on small datasets.

  • GloVe is slower than word2vec.

Using GloVe#

Pre-trained GloVe word vectors and the reference implementation are available to download from the Stanford NLP website.
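The pre-trained vectors are distributed as plain-text files with one token per line followed by its space-separated vector components. Assuming that format, a minimal loader and a cosine-similarity helper might look like the sketch below. The function names are illustrative, and the three sample lines are made-up numbers used only to exercise the code, not real GloVe values.

```python
import math

def parse_glove_lines(lines):
    """Parse lines of the form 'token v1 v2 ... vd' into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional vectors in the same text format (NOT real GloVe values).
sample = [
    "king 0.5 0.1 0.3",
    "queen 0.4 0.2 0.3",
    "apple -0.3 0.6 0.0",
]
vectors = parse_glove_lines(sample)
print(cosine_similarity(vectors["king"], vectors["queen"]))
print(cosine_similarity(vectors["king"], vectors["apple"]))
```

For a downloaded file, the same function can be given an open file handle, for example `with open(path, encoding="utf-8") as f: vectors = parse_glove_lines(f)`, where `path` is a placeholder for wherever the unzipped file was saved.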