On the Origins of Linear Representations in Large Language Models


arXiv:2403.03867v1 Announce Type: cross
Abstract: Recent works have argued that high-level semantic concepts are encoded "linearly" in the representation space of large language models. In this work, we study the origins of such linear representations. To that end, we introduce a simple latent variable model to abstract and formalize the concept dynamics of next token prediction. We use this formalism to show that the next token prediction objective (softmax with cross-entropy) and the implicit bias of gradient descent together promote the linear representation of concepts. Experiments show that linear representations emerge when learning from data matching the latent variable model, confirming that this simple structure already suffices to yield linear representations. We additionally confirm some predictions of the theory using the LLaMA-2 large language model, giving evidence that the simplified model yields generalizable insights.
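To make the "linear representation" claim concrete, here is a minimal synthetic sketch (not the paper's actual latent variable model or experiments): we generate representations in which a binary concept shifts the embedding along a single direction, then recover that direction with a standard difference-of-class-means probe. All names and parameter choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a binary concept is encoded along one direction
# in representation space, as the linear representation hypothesis posits.
dim, n = 64, 500
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)

labels = rng.integers(0, 2, size=n)             # concept on/off per example
base = rng.normal(size=(n, dim))                # concept-unrelated variation
reps = base + 2.0 * labels[:, None] * true_dir  # concept shifts reps linearly

# A standard linear probe: the difference of class means recovers
# the concept direction when the encoding is linear.
est_dir = reps[labels == 1].mean(axis=0) - reps[labels == 0].mean(axis=0)
est_dir /= np.linalg.norm(est_dir)

cosine = float(true_dir @ est_dir)
print(f"cosine(true, estimated) = {cosine:.3f}")
```

Under this toy generative model the estimated direction aligns closely with the true one; the paper's contribution is explaining why next-token training with softmax cross-entropy and gradient descent produces such linear structure in the first place.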

Source link

