Publication:
Asymptotic study of in-context learning with random transformers through equivalent models

dc.conference.date: AUG 31-SEP 03, 2025
dc.conference.location: Istanbul
dc.contributor.department: Graduate School of Sciences and Engineering
dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.department: KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.kuauthor: Demir, Samet
dc.contributor.kuauthor: Doğan, Zafer
dc.contributor.schoolcollegeinstitute: Research Center
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned: 2025-12-31T08:19:08Z
dc.date.available: 2025-12-31
dc.date.issued: 2025
dc.description.abstract: We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head, where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime in which the context length, input dimension, hidden dimension, number of training tasks, and number of training samples grow jointly. In this setting, we show that the random Transformer behaves equivalently to a finite-degree Hermite polynomial model in terms of ICL error. This equivalence is validated through simulations across varying activation functions, context lengths, hidden-layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance.
dc.description.fulltext: Yes
dc.description.harvestedfrom: Manual
dc.description.indexedby: Scopus
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsorship: TÜBİTAK ARDEB 1001 program. S.D. is supported by an AI Fellowship provided by KUIS AI Research Center and a PhD Scholarship (BİDEB 2211) from TÜBİTAK.
dc.identifier.doi: 10.1109/MLSP62443.2025.11204336
dc.identifier.embargo: No
dc.identifier.grantno: 124E063
dc.identifier.isbn: 9798331570293
dc.identifier.issn: 2161-0363
dc.identifier.quartile: N/A
dc.identifier.uri: https://doi.org/10.1109/MLSP62443.2025.11204336
dc.identifier.uri: https://hdl.handle.net/20.500.14288/31436
dc.keywords: Deep learning theory
dc.keywords: High-dimensional asymptotics
dc.keywords: In-context learning
dc.keywords: Transformer
dc.language.iso: eng
dc.publisher: IEEE Computer Society
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
dc.relation.openaccess: Yes
dc.rights: CC BY-NC-ND (Attribution-NonCommercial-NoDerivs)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Engineering
dc.title: Asymptotic study of in-context learning with random transformers through equivalent models
dc.type: Conference Proceeding
dspace.entity.type: Publication
person.familyName: Demir
person.familyName: Doğan
person.givenName: Samet
person.givenName: Zafer
relation.isOrgUnitOfPublication: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication: 77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication.latestForDiscovery: 3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isParentOrgUnitOfPublication: d437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication: 8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication: 434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery: d437580f-9309-4ecb-864a-4af58309d287

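To give a concrete feel for the abstract's setup, the sketch below is a minimal, hypothetical simulation and not the paper's code: it trains only the readout (second) layer on top of a fixed random first layer, as in the paper's model, and compares the ridge-regression test error against the same ridge fit on a finite-degree Hermite expansion of the random projections. All dimensions, the toy target function, the activation choice, and the regularization strength are illustrative assumptions, and the sketch drops the Transformer/ICL context structure entirely.

import numpy as np

# Minimal illustrative sketch (not the paper's code): a fixed random first
# layer with a trained linear readout, compared against a ridge fit on a
# finite-degree Hermite expansion of the same random projections.
# Dimensions, target, and regularization below are hypothetical choices.
rng = np.random.default_rng(0)
d, p, n_train, n_test = 50, 200, 1000, 500      # input dim, hidden width, sample sizes

W = rng.normal(size=(p, d)) / np.sqrt(d)        # random first layer, kept fixed
beta = rng.normal(size=d) / np.sqrt(d)
target = lambda X: np.tanh(X @ beta)            # toy nonlinear regression target

def ridge_fit_predict(Ztr, ytr, Zte, lam=1e-2):
    # Train only the second (readout) layer by ridge regression.
    A = Ztr.T @ Ztr + lam * np.eye(Ztr.shape[1])
    w = np.linalg.solve(A, Ztr.T @ ytr)
    return Zte @ w

def hermite_features(Z, degree=3):
    # Probabilist's Hermite polynomials He_1, He_2, He_3 of each projection.
    return np.concatenate([Z, Z**2 - 1.0, Z**3 - 3.0 * Z][:degree], axis=1)

Xtr = rng.normal(size=(n_train, d))
Xte = rng.normal(size=(n_test, d))
ytr, yte = target(Xtr), target(Xte)
Ztr, Zte = Xtr @ W.T, Xte @ W.T                 # fixed random projections

relu = lambda Z: np.maximum(Z, 0.0)             # nonlinearity of the MLP head
err_rf = np.mean((ridge_fit_predict(relu(Ztr), ytr, relu(Zte)) - yte) ** 2)
err_he = np.mean((ridge_fit_predict(hermite_features(Ztr), ytr,
                                    hermite_features(Zte)) - yte) ** 2)
print(f"random-feature test MSE: {err_rf:.4f} | Hermite-model test MSE: {err_he:.4f}")

Under these assumptions the two test errors land in the same ballpark, which is the flavor of the paper's equivalence claim; the paper itself establishes the correspondence precisely in the joint asymptotic regime described in the abstract.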