My solutions to Bartosz Milewski's "Category Theory for Programmers"

I recently worked through Bartosz Milewski’s excellent free book “Category Theory for Programmers.” The book is available online here and here. I had an awesome time reading the book and learning about Category Theory, so I figured I’d post my solutions to the book problems online to make it easier... [Read More]
Tags: Category Theory, Functional Programming, Mathematics, Solutions

Introducing repcomp - A Python Package for Comparing Trained Embedding Models

When I’m building models, I frequently run into situations where I’ve trained multiple models over a few datasets or tasks and I’m curious about how they compare. For instance, it’s clear that if I train two word vector models on random subsets of Wikipedia, the trained models will be “similar”... [Read More]
Tags: Embeddings, Machine Learning, ML, Python, Comparison, Neural Network, Word Vector

My Thoughts on KDD 2018

Last week I was at KDD 2018 in London. This was my first time at KDD, and I had the opportunity to present our paper on embeddings at the Common Model Infrastructure workshop. I was really impressed by both the workshops and the main program, and I thought I’d share... [Read More]
Tags: KDD, Machine Learning, ML, Data, Conference

Representing Graphs with Low Dimensional Matrix Factorization for Fun and Profit

A solid laptop computer in 2018 has about 1TB (1000GB) of disk space and about 16GB of RAM. In comparison, internet users in the United States generate about 3000TB of data every minute [1]. An enormous amount of this data takes the form of... [Read More]
Tags: Embeddings, Matrix, Factorization, Graph, Recommendation, Word2Vec

Don't trust data too much

There’s a famous scene in the HBO show “The Wire” where the unscrupulous Deputy Commissioner Rawls is addressing the police colonels and majors, and he says: Gentlemen, the word from on high is that the felony rates district by district will decline by five percent before the end of... [Read More]
Tags: Data, Statistics, Lying, Probability, p-value, Misuse, Trust