Distributional and frequency effects in word embeddings: different and larger embeddings

© 2018 Chris Culy, August 2018

chrisculy.net

Overview

This is part of ongoing work on word embeddings. In the previous series of posts we saw a wide range of distributional and frequency effects in word embeddings. In this series, I use a set of summary tests to look at another embedding technique (continuous bag of words, cbow) as well as embeddings based on large corpora.

TL;DR: Results and Contributions

High level only; more details are in the individual sections.

Summary tests
- Summary tests for distributional and frequency effects
- sgns and cbow show different frequency effects with Vanity Fair

Large corpora
- Frequency encoding is stronger for the larger corpora than for Vanity Fair
- Frequency stratification tends to be stronger for Vanity Fair than for the larger corpora
- Frequency stratification tends to be direct for the larger corpora and indirect for Vanity Fair
- The power law for nearest neighbors is stronger for the larger corpora than for Vanity Fair
- Similarity skewness is moderate
