When I’m building models, I frequently run into situations where I’ve trained multiple models over a few datasets or tasks and I’m curious about how they compare. For instance, it’s clear that if I train two word vector models on random subsets of Wikipedia, the trained models will be “similar” to each other. In contrast, if I train word vectors over a Twitter dataset, the new vectors will be “different” from the Wikipedia-trained vectors.

In order to make it easier to quantify this difference, I wrote repcomp, a python package for comparing embeddings. repcomp supports the following embedding comparison approaches:

  • Nearest Neighbors: Fetch the nearest neighbor set of each entity according to embedding distances, and compare model A’s neighbor sets to model B’s neighbor sets.
  • Canonical Correlation: Treat embedding components as observations of random variables and compute the canonical correlations between model A and model B.
  • Unit Match: Form a unit-to-unit matching between model A’s embedding components and model B’s embedding components and measure the correlations of the matched units.

You can install repcomp with pip install repcomp

We can use repcomp to easily compare two word embedding models that have been pre-trained on Twitter and Wikipedia data:

  import gensim.downloader as api
  import numpy as np
  from repcomp.comparison import NeighborsComparison

  # Load word vectors from gensim
  glove_wiki_50 = api.load("glove-wiki-gigaword-50")
  glove_twitter_50 = api.load("glove-twitter-50")

  # Build the embedding matrices over the shared vocabularies
  shared_vocab = set(glove_wiki_50.vocab.keys()).intersection(
  glove_wiki_50_vectors = np.vstack([glove_wiki_50.get_vector(word) for word in shared_vocab])
  glove_twitter_50_vectors = np.vstack([glove_twitter_50.get_vector(word) for word in shared_vocab])

  # Run the comparison
  comparator = NeighborsComparison()
  print("The neighbors similarity between glove-wiki-gigaword-50 and glove-twitter-50 is {}".format(
    comparator.run_comparison(glove_wiki_50_vectors, glove_twitter_50_vectors)["similarity"]))

We can use also use repcomp to:

  • Quantify the impact of changing a hyperparameter on an embedding model. In this notebook we use matrix factorization to train a couple of movie embedding models based on the MovieLens dataset, and then use embedding comparison to see how changing the value of hyperparameters affects the learned embedding spaces.
  • Measure the difference between the representations that different neural network architectures learn. In this notebook, we compare the final-layer embeddings for Imagenet-trained VGG16, VGG19, and InceptionV3 models

If you’re interested in contributing to repcomp, please feel free to open up a Pull Request here!