Publication:
Asymptotic Study of In-Context Learning with Random Transformers Through Equivalent Models

dc.conference.date: 2025-08-31 through 2025-09-03
dc.conference.location: Istanbul
dc.contributor.coauthor: Demir, Samet (58662508900)
dc.contributor.coauthor: Doğan, Zafer (35101767100)
dc.date.accessioned: 2025-12-31T08:19:08Z
dc.date.available: 2025-12-31
dc.date.issued: 2025
dc.description.abstract: We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head, where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime in which the context length, input dimension, hidden dimension, number of training tasks, and number of training samples grow jointly. In this setting, we show that the random Transformer behaves equivalently to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and into how nonlinearity and over-parameterization influence model performance. © 2025 IEEE.
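
The construction the abstract describes — a frozen random first layer with a trained readout, compared against a finite-degree Hermite polynomial model — can be illustrated outside the full ICL setting. Below is a minimal NumPy sketch, not the authors' implementation: it drops the in-context/token structure and assumes a single-index tanh target, illustrative dimensions, and a ridge penalty, but it shows the "train only the second layer" procedure and a degree-3 Hermite truncation of the activation side by side.

import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(0)
d, k, n, n_test, lam = 64, 256, 4096, 2048, 1e-2   # illustrative sizes, not the paper's

w_star = rng.standard_normal(d)                    # hidden target direction (assumed)

def sample(m):
    """Gaussian inputs with a nonlinear single-index target (toy stand-in)."""
    X = rng.standard_normal((m, d))
    return X, np.tanh(X @ w_star / np.sqrt(d))

X, y = sample(n)
X_te, y_te = sample(n_test)

W1 = rng.standard_normal((d, k)) / np.sqrt(d)      # first layer: random, then frozen

def fit_readout(Phi, y):
    """Train only the second (readout) layer via ridge regression."""
    G = Phi.T @ Phi + lam * n * np.eye(Phi.shape[1])
    return np.linalg.solve(G, Phi.T @ y)

# Model 1: random features with the true activation.
a1 = fit_readout(np.tanh(X @ W1), y)
mse1 = np.mean((np.tanh(X_te @ W1) @ a1 - y_te) ** 2)

# Model 2: same frozen weights, activation replaced by its degree-3
# probabilists' Hermite truncation tanh(z) ≈ sum_j c_j He_j(z) — a toy
# stand-in for the paper's finite-degree Hermite equivalent model.
g = rng.standard_normal(200_000)                   # Monte Carlo Gaussian nodes
c = (hermevander(g, 3) * np.tanh(g)[:, None]).mean(axis=0)
c /= np.array([factorial(j) for j in range(4)])    # normalize: E[He_j(g)^2] = j!

def herm_act(Z):
    """Degree-3 Hermite polynomial activation."""
    return hermevander(Z, 3) @ c

a2 = fit_readout(herm_act(X @ W1), y)
mse2 = np.mean((herm_act(X_te @ W1) @ a2 - y_te) ** 2)

print(f"random-feature model test MSE:     {mse1:.4f}")
print(f"Hermite-equivalent model test MSE: {mse2:.4f}")

The paper's equivalence is an asymptotic statement about matched ICL error in the joint limit of context length, input dimension, hidden dimension, tasks, and samples; this finite-size toy only gives a qualitative sense of the two models tracking each other.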
dc.description.fulltext: Yes
dc.description.harvestedfrom: Manual
dc.description.indexedby: Scopus
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsorship: Türkiye Bilimsel ve Teknolojik Araştırma Kurumu, TÜBİTAK (124E063)
dc.identifier.doi: 10.1109/MLSP62443.2025.11204336
dc.identifier.embargo: No
dc.identifier.isbn: 9798331570293
dc.identifier.isbn: 9781467374545
dc.identifier.isbn: 9781728166629
dc.identifier.isbn: 9781538654774
dc.identifier.isbn: 9781509063413
dc.identifier.isbn: 9781728163383
dc.identifier.isbn: 9781728108247
dc.identifier.isbn: 9781509007462
dc.identifier.isbn: 9781467310260
dc.identifier.isbn: 9781479936946
dc.identifier.issn: 2161-0363
dc.identifier.quartile: N/A
dc.identifier.uri: https://doi.org/10.1109/MLSP62443.2025.11204336
dc.identifier.uri: https://hdl.handle.net/20.500.14288/31436
dc.keywords: deep learning theory
dc.keywords: high-dimensional asymptotics
dc.keywords: in-context learning
dc.keywords: transformer
dc.language.iso: eng
dc.publisher: IEEE Computer Society
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
dc.relation.openaccess: Yes
dc.rights: CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Asymptotic Study of In-Context Learning with Random Transformers Through Equivalent Models
dc.type: Conference Proceeding
dspace.entity.type: Publication