Séminaire Images Optimisation et Probabilités
Pierre Marion
(INRIA)
Conference room
5 February 2026 at 11:15
Attention-based models, such as Transformers, excel across
various tasks, but a comprehensive theoretical understanding of them is still lacking. To
address this gap, we introduce the single-location regression task,
where only one token in a sequence determines the output, and its
position is a latent random variable, retrievable via a linear
projection of the input. To solve this task, we propose a dedicated
predictor, which turns out to be a simplified version of a non-linear
self-attention layer. We study its theoretical properties, both in terms
of statistics (gap to Bayes optimality) and optimization (convergence of
gradient descent). In particular, despite the non-convex nature of the
problem, the predictor effectively learns the underlying structure. This
highlights the capacity of attention mechanisms to handle sparse token
information. Based on Marion et al., Attention Layers Provably Solve
Single-Location Regression, ICLR 2025, and Duranthon et al., Statistical
Advantage of Softmax Attention: Insights from Single-Location
Regression, submitted.
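
For readers who want a concrete picture, here is a minimal sketch in Python of the single-location regression task and of a simplified softmax-attention predictor. It assumes a toy data model (the relevant token is shifted along a direction k_star, and the label is its projection onto v_star); the exact formulation, scaling, and non-linearity studied by Marion et al. and Duranthon et al. may differ.

# Minimal, self-contained sketch (not the authors' exact construction): a toy
# single-location regression dataset and a simplified softmax-attention predictor.
# All names (k_star, v_star, the shift size, ...) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 10, 32, 1000                      # sequence length, token dimension, sample size

k_star = rng.standard_normal(d); k_star /= np.linalg.norm(k_star)   # direction marking the relevant token
v_star = rng.standard_normal(d); v_star /= np.linalg.norm(v_star)   # regression (read-out) direction

def sample_batch(n):
    """Each sequence holds one relevant token at a latent position J0; that token
    is shifted along k_star, and the label is its projection onto v_star."""
    X = rng.standard_normal((n, L, d))
    J0 = rng.integers(0, L, size=n)          # latent location, uniform over positions
    X[np.arange(n), J0] += 5.0 * k_star      # signal making the position recoverable via a linear projection
    y = X[np.arange(n), J0] @ v_star         # output determined by the single relevant token
    return X, y

def attention_predictor(X, k, v):
    """Simplified non-linear self-attention layer: softmax scores over the tokens'
    projections onto k, then a linear read-out along v of the attended token."""
    scores = X @ k                                            # (n, L) relevance scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over positions
    pooled = np.einsum('nl,nld->nd', weights, X)              # attention-weighted token
    return pooled @ v

X, y = sample_batch(n)
y_hat = attention_predictor(X, k_star, v_star)                # oracle parameters: attention concentrates on J0
print("oracle mean squared error:", np.mean((y - y_hat) ** 2))

With the oracle parameters, the softmax weights concentrate on the relevant position, so the predictor approximately recovers the label; the papers cited above analyze when such parameters can actually be learned by gradient descent and how close the resulting predictor gets to Bayes optimality.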