Introducing repcomp - A Python Package for Comparing Trained Embedding Models

When I’m building models, I frequently run into situations where I’ve trained multiple models over a few datasets or tasks and I’m curious about how they compare. For instance, it’s clear that if I train two word vector models on random subsets of Wikipedia, the trained models will be “similar”...
Tags: Embeddings, Machine Learning, ML, Python, Comparison, Neural Network, Word Vector
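The excerpt above doesn't show repcomp’s own interface, so the following is not its API. It sketches one common way to put a number on how “similar” two embedding models are: compare each word’s nearest neighbors in the two spaces. This comparison ignores the arbitrary rotation that separates independently trained models. It assumes each model is a plain dict mapping words to equal-length vectors. The helper names (`nearest_neighbors`, `neighbor_overlap`) and the toy vectors are hypothetical and exist only for illustration.

```python
import math

# Hypothetical helpers for illustration only; this is NOT repcomp's API.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbors(embeddings, word, k):
    """Return the k words most cosine-similar to `word`, excluding the word itself."""
    target = embeddings[word]
    others = [w for w in embeddings if w != word]
    others.sort(key=lambda w: cosine(target, embeddings[w]), reverse=True)
    return set(others[:k])

def neighbor_overlap(model_a, model_b, k=2):
    """Average Jaccard overlap of each shared word's k nearest neighbors in both models."""
    shared = sorted(set(model_a) & set(model_b))
    # Restrict both models to the shared vocabulary so neighbor sets are comparable.
    a = {w: model_a[w] for w in shared}
    b = {w: model_b[w] for w in shared}
    scores = []
    for w in shared:
        na, nb = nearest_neighbors(a, w, k), nearest_neighbors(b, w, k)
        scores.append(len(na & nb) / len(na | nb))
    return sum(scores) / len(scores)

# Toy 2-d "word vectors" (made up for illustration).
model_a = {"king": [1.0, 0.1], "queen": [0.9, 0.2], "apple": [0.1, 1.0], "pear": [0.2, 0.9]}
# model_b is model_a rotated by 90 degrees: different coordinates, same geometry.
model_b = {w: [-v[1], v[0]] for w, v in model_a.items()}

print(neighbor_overlap(model_a, model_b, k=2))
```

Because cosine similarity is unchanged by rotation, the rotated copy scores a perfect overlap, even though comparing the raw coordinates directly would suggest the two models have nothing in common. A model whose neighborhoods actually differ scores below 1.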

My Thoughts on KDD 2018

Last week I was at KDD 2018 in London. This was my first time at KDD, and I had the opportunity to present our paper on embeddings at the Common Model Infrastructure workshop. I was really impressed by both the workshops and the main program, and I thought I’d share...
Tags: KDD, Machine Learning, ML, Data, Conference

Representing Graphs with Low Dimensional Matrix Factorization for Fun and Profit

A solid laptop computer in 2018 has about 1TB (1000GB) of disk space and about 16GB of RAM. In comparison, internet users in the United States generate about 3000TB of data every minute [1]. An enormous amount of this data takes the form of...
Tags: Embeddings, Matrix, Factorization, Graph, Recommendation, Word2Vec

Don't trust data too much

There’s a famous scene in the HBO show “The Wire” where the unscrupulous Deputy Commissioner Rawls is addressing the police colonels and majors, and he says: Gentlemen, the word from on high is that the felony rates district by district will decline by five percent before the end of...
Tags: Data, Statistics, Lying, Probability, p-value, Misuse, Trust

R and R^2, the relationship between correlation and the coefficient of determination.

There are two closely related quantities in statistics: correlation (often referred to as R) and the coefficient of determination (often referred to as R^2). Today we’ll explore the nature of the relationship between R and R^2, go over some common use cases for each statistic, and address some misconceptions. Correlation...
Tags: Correlation, R, R2, R^2, Coefficient of Determination, Regression, Performance