Publication: Asymptotic study of in-context learning with random transformers through equivalent models
| dc.conference.date | AUG 31-SEP 03, 2025 | |
| dc.conference.location | Istanbul | |
| dc.contributor.department | Graduate School of Sciences and Engineering | |
| dc.contributor.department | Department of Electrical and Electronics Engineering | |
| dc.contributor.department | KUIS AI (Koç University & İş Bank Artificial Intelligence Center) | |
| dc.contributor.kuauthor | Demir, Samet | |
| dc.contributor.kuauthor | Doğan, Zafer | |
| dc.contributor.schoolcollegeinstitute | Research Center | |
| dc.contributor.schoolcollegeinstitute | College of Engineering | |
| dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
| dc.date.accessioned | 2025-12-31T08:19:08Z | |
| dc.date.available | 2025-12-31 | |
| dc.date.issued | 2025 | |
| dc.description.abstract | We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head, where the first layer is randomly initialized and fixed while the second layer is trained. Furthermore, we consider an asymptotic regime in which the context length, input dimension, hidden dimension, number of training tasks, and number of training samples grow jointly. In this setting, we show that the random Transformer is equivalent, in terms of ICL error, to a finite-degree Hermite polynomial model. This equivalence is validated through simulations across varying activation functions, context lengths, hidden-layer widths (revealing a double-descent phenomenon), and regularization settings. Our results offer theoretical and empirical insights into when and how MLP layers enhance ICL, and how nonlinearity and over-parameterization influence model performance. | |
| dc.description.fulltext | Yes | |
| dc.description.harvestedfrom | Manual | |
| dc.description.indexedby | Scopus | |
| dc.description.publisherscope | International | |
| dc.description.readpublish | N/A | |
| dc.description.sponsoredbyTubitakEu | TÜBİTAK | |
| dc.description.sponsorship | This work was supported by the TÜBİTAK ARDEB 1001 program. S.D. is supported by an AI Fellowship provided by the KUIS AI Research Center and a PhD scholarship (BİDEB 2211) from TÜBİTAK. | |
| dc.identifier.doi | 10.1109/MLSP62443.2025.11204336 | |
| dc.identifier.embargo | No | |
| dc.identifier.grantno | 124E063 | |
| dc.identifier.isbn | 9798331570293 | |
| dc.identifier.issn | 2161-0363 | |
| dc.identifier.quartile | N/A | |
| dc.identifier.uri | https://doi.org/10.1109/MLSP62443.2025.11204336 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14288/31436 | |
| dc.keywords | Deep learning theory | |
| dc.keywords | High-dimensional asymptotics | |
| dc.keywords | In-context learning | |
| dc.keywords | Transformer | |
| dc.language.iso | eng | |
| dc.publisher | IEEE Computer Society | |
| dc.relation.affiliation | Koç University | |
| dc.relation.collection | Koç University Institutional Repository | |
| dc.relation.ispartof | IEEE International Workshop on Machine Learning for Signal Processing, MLSP | |
| dc.relation.openaccess | Yes | |
| dc.rights | CC BY-NC-ND (Attribution-NonCommercial-NoDerivs) | |
| dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
| dc.subject | Engineering | |
| dc.title | Asymptotic study of in-context learning with random transformers through equivalent models | |
| dc.type | Conference Proceeding | |
| dspace.entity.type | Publication | |
| person.familyName | Demir | |
| person.familyName | Doğan | |
| person.givenName | Samet | |
| person.givenName | Zafer | |
| relation.isOrgUnitOfPublication | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
| relation.isOrgUnitOfPublication | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
| relation.isOrgUnitOfPublication | 77d67233-829b-4c3a-a28f-bd97ab5c12c7 | |
| relation.isOrgUnitOfPublication.latestForDiscovery | 3fc31c89-e803-4eb1-af6b-6258bc42c3d8 | |
| relation.isParentOrgUnitOfPublication | d437580f-9309-4ecb-864a-4af58309d287 | |
| relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
| relation.isParentOrgUnitOfPublication | 434c9663-2b11-4e66-9399-c863e2ebae43 | |
| relation.isParentOrgUnitOfPublication.latestForDiscovery | d437580f-9309-4ecb-864a-4af58309d287 |
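
The abstract describes a setup in which only the second (readout) layer of a random MLP head is trained, and an equivalence in error to a finite-degree Hermite polynomial model. The snippet below is a minimal, self-contained sketch of that comparison in a plain (non-ICL) regression setting: a fixed random first layer with a ridge-trained readout versus ridge regression on degree-truncated Hermite features. It is not the authors' code and does not reproduce their Transformer ICL experiments; all dimensions, the activation, the target function, and the truncation degree are illustrative assumptions.

```python
# Sketch (illustrative, not the paper's code): compare a random-features model
# (fixed random first layer, trained ridge readout) with a finite-degree
# Hermite polynomial feature model on the same synthetic regression task.
import numpy as np

rng = np.random.default_rng(0)

d, k, n_train, n_test = 20, 400, 1000, 2000   # input dim, hidden width, sample sizes (assumed)
lam = 1e-3                                    # ridge regularization strength (assumed)

# Synthetic nonlinear regression task: y = f(<beta, x>) + noise (illustrative target)
beta = rng.normal(size=d) / np.sqrt(d)
f = lambda z: z + 0.5 * z**2

def make_data(n):
    X = rng.normal(size=(n, d))
    y = f(X @ beta) + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def ridge_fit_predict(Phi_tr, y, Phi_te):
    """Train a ridge-regression readout on features Phi_tr and predict on Phi_te."""
    A = Phi_tr.T @ Phi_tr + lam * np.eye(Phi_tr.shape[1])
    w = np.linalg.solve(A, Phi_tr.T @ y)
    return Phi_te @ w

# Random-features model: fixed random first layer, only the readout is trained.
W = rng.normal(size=(d, k)) / np.sqrt(d)      # random, untrained first-layer weights
act = np.tanh                                 # illustrative activation
pred_rf = ridge_fit_predict(act(X_tr @ W), y_tr, act(X_te @ W))

def hermite_features(X, degree=3):
    """Coordinate-wise probabilists' Hermite polynomials He_0..He_degree, stacked as features."""
    He = [np.ones_like(X), X, X**2 - 1, X**3 - 3 * X]   # He_0..He_3
    return np.concatenate(He[: degree + 1], axis=1)

# Finite-degree Hermite polynomial model with the same ridge readout.
pred_he = ridge_fit_predict(hermite_features(X_tr), y_tr, hermite_features(X_te))

mse = lambda p: np.mean((p - y_te) ** 2)
print(f"random-features test MSE : {mse(pred_rf):.4f}")
print(f"Hermite-features test MSE: {mse(pred_he):.4f}")
```

In this toy setting the two test errors track each other as the width grows, which mirrors the flavor of the paper's equivalence claim; the paper itself establishes the correspondence for ICL error in a joint asymptotic regime over context length, input dimension, hidden dimension, tasks, and samples.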
