How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings

Dec 10, 2019 in NLP

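What is this paper about?

This paper examines the properties of contextualized word vectors. It investigates how far a word's vector moves from layer to layer, how far it deviates from the mean vector of its sentence, how high the cosine similarity is between randomly sampled vectors, and how much of a word's set of vectors can be explained by a single vector.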
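As a rough illustration of the cosine-based measurements described above, here is a minimal sketch in plain Python. The function names (`cosine`, `mean_pairwise_cosine`, `intra_sentence_similarity`) and the toy vectors are hypothetical and chosen only for this example. It assumes the vectors for one layer have already been extracted from a model, and it is not the authors' code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors.

    Given the vectors of one word in different contexts, this measures how
    much the word's representation changes with context. Given vectors of
    randomly sampled tokens, it measures how similar random pairs are.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def intra_sentence_similarity(token_vectors):
    """Average cosine similarity between each token vector and the sentence mean."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    mean_vec = [sum(v[d] for v in token_vectors) / n for d in range(dim)]
    return sum(cosine(v, mean_vec) for v in token_vectors) / n


# Toy 2-dimensional vectors, made up for illustration only.
same_word_in_three_contexts = [[1.0, 0.2], [0.9, 0.4], [0.2, 1.0]]
tokens_of_one_sentence = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

print(mean_pairwise_cosine(same_word_in_three_contexts))
print(intra_sentence_similarity(tokens_of_one_sentence))
```

If random pairs already have a high average cosine similarity, the space is anisotropic, so the other two numbers are best read relative to that value rather than in isolation.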

Novelty

Instead of using probing tasks, the paper examines the properties of the latent space directly.

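Results

The space of contextualized word vectors is anisotropic: the vectors are not spread uniformly over all directions.
Vectors in upper layers are not very similar to vectors in lower layers. The upper layers are thought to encode task-specific features.
For the vector representations of the tokens within a single sentence, ELMo contextualizes them so that they all become similar to one another, BERT's become progressively more dissimilar across layers, and GPT-2's are not similar to one another at all.
A single principal component explains less than 5% of the variance in a word's contextualized vectors, so these vectors are unlikely to correspond to a finite set of word senses.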
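To make the last point concrete, the following sketch estimates what fraction of the variance in one word's vectors is captured by the first principal component. It is an illustration under assumptions, not the paper's implementation. The function name `explained_by_first_component` is hypothetical, the occurrence matrix is not centered, and power iteration on the Gram matrix is used here only to keep the example free of external libraries (an SVD routine would serve equally well).

```python
import math
import random


def explained_by_first_component(occurrences, iterations=200, seed=0):
    """Share of total variance captured by the first principal component.

    `occurrences` holds one vector per context in which the word appears.
    The result is sigma_1^2 / sum_i sigma_i^2, where sigma_i are the singular
    values of the (uncentered) occurrence matrix X.
    """
    n = len(occurrences)
    # Gram matrix G = X X^T; its eigenvalues are the squared singular values of X.
    gram = [[sum(a * b for a, b in zip(u, v)) for v in occurrences]
            for u in occurrences]
    total = sum(gram[i][i] for i in range(n))  # trace = sum of squared singular values
    if total == 0:
        raise ValueError("all occurrence vectors are zero")

    # Power iteration from a random start to find the largest eigenvalue of G.
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]

    # Rayleigh quotient v^T G v / v^T v gives the eigenvalue estimate.
    g_vec = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
    top = sum(a * b for a, b in zip(vec, g_vec)) / sum(a * a for a in vec)
    return top / total


# Toy vectors, made up for illustration only.
word_occurrences = [[3.0, 0.0], [0.0, 1.0]]
print(explained_by_first_component(word_occurrences))
```

A value close to 1 would mean a single static vector could stand in for the word in every context, while a value below 0.05 corresponds to the situation the summary describes.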

Papers to read next

David Mimno and Laure Thompson. 2017. The strange geometry of skip-gram with negative sampling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pages 2873–2878
