Publication:
Asymptotic Study of In-Context Learning with Random Transformers Through Equivalent Models

KU Authors

Co-Authors

Demir, Samet (58662508900)
Doğan, Zafer (35101767100)

Publication Date

2025

Language

English

Embargo Status

No

Abstract

We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head, where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime where the context length, input dimension, hidden dimension, number of training tasks, and number of training samples jointly grow. In this setting, we show that the random Transformer behaves equivalently to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance. © 2025 IEEE.
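
The abstract mentions two concrete ingredients that a small numerical sketch can illustrate: a head whose first layer is random and fixed while only the second layer is trained, and a finite-degree Hermite polynomial surrogate for the nonlinearity. The NumPy sketch below compares the two on a plain nonlinear regression task. The data model, dimensions, activation (tanh), truncation degree, and ridge penalty are illustrative assumptions and are not taken from the paper, whose actual setting is in-context learning with a Transformer.

import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)

# Hypothetical sizes and ridge penalty, chosen only for this demo
d, k, n_train, n_test = 32, 256, 512, 2048   # input dim, hidden width, sample sizes
lam = 1e-3

# Synthetic nonlinear regression task (illustrative, not the paper's ICL data model)
beta = rng.standard_normal(d) / np.sqrt(d)
def target(X):
    z = X @ beta
    return z + 0.5 * z**2

X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = target(X_tr) + 0.1 * rng.standard_normal(n_train)
y_te = target(X_te)

# First layer: random and kept fixed
W = rng.standard_normal((d, k)) / np.sqrt(d)

def ridge_fit_predict(Phi_tr, Phi_te, y, lam):
    """Train only the second-layer weights by ridge regression on the features."""
    A = Phi_tr.T @ Phi_tr + lam * np.eye(Phi_tr.shape[1])
    a = np.linalg.solve(A, Phi_tr.T @ y)
    return Phi_te @ a

# (1) Nonlinear random-features model with a fixed activation
act = np.tanh
pred_rf = ridge_fit_predict(act(X_tr @ W), act(X_te @ W), y_tr, lam)

# (2) Finite-degree Hermite surrogate: replace the activation by its truncated
#     expansion act(z) ~ sum_j c_j He_j(z), with c_j = E[act(G) He_j(G)] / j!
#     for G ~ N(0, 1); the coefficients are estimated here by Monte Carlo
degree = 3
G = rng.standard_normal(200_000)
He = [hermeval(G, np.eye(degree + 1)[j]) for j in range(degree + 1)]
c = np.array([(act(G) * He[j]).mean() / math.factorial(j) for j in range(degree + 1)])

def hermite_act(Z):
    return sum(c[j] * hermeval(Z, np.eye(degree + 1)[j]) for j in range(degree + 1))

pred_he = ridge_fit_predict(hermite_act(X_tr @ W), hermite_act(X_te @ W), y_tr, lam)

print("test MSE, random-features model :", np.mean((pred_rf - y_te) ** 2))
print("test MSE, Hermite surrogate     :", np.mean((pred_he - y_te) ** 2))

Under these assumptions the two test errors should come out close to each other, mirroring in a much simpler setting the equivalence in ICL error that the abstract reports.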

Publisher

IEEE Computer Society

Source

IEEE International Workshop on Machine Learning for Signal Processing, MLSP

DOI

10.1109/MLSP62443.2025.11204336

Rights

CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)

Copyrights Note

Creative Commons license
