EndPrompt: Efficient Long-Context Extension via Terminal Anchoring

Abstract

Extending the context window of large language models typically requires training on sequences at the target length, incurring quadratic memory and computational costs that make long-context adaptation expensive and difficult to reproduce. We propose EndPrompt, a method that achieves effective context extension using only short training sequences. The core insight is that exposing a model to long-range relative positional distances does not require constructing full-length inputs: we preserve the original short context as an intact first segment and append a brief terminal prompt as a second segment, assigning it positional indices near the target context length. This two-segment construction introduces both local and long-range relative distances within a short physical sequence while maintaining the semantic continuity of the training text, a property absent in chunk-based simulation approaches that split contiguous context. We provide a theoretical analysis grounded in Rotary Position Embedding and the Bernstein inequality, showing that position interpolation induces a rigorous smoothness constraint over the attention function, with shared Transformer parameters further suppressing unstable extrapolation to unobserved intermediate distances. Applied to LLaMA-family models, extending the context window from 8K to 64K, EndPrompt achieves an average RULER score of 76.03 and the highest average on LongBench, surpassing LCEG (72.24), LongLoRA (72.95), and full-length fine-tuning (69.23) while requiring substantially less computation. These results demonstrate that long-context generalization can be induced from sparse positional supervision, challenging the prevailing assumption that dense long-sequence training is necessary for reliable context-window extension. The code is available at https://github.com/clx1415926/EndPrompt.
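
To make the two-segment construction concrete, the following minimal sketch builds position indices for a short training sequence and lists the relative distances it exposes. It is an illustration under stated assumptions, not the released implementation: the function names `endprompt_position_ids` and `relative_distances` are hypothetical, and placing the terminal prompt at the very last positions of the target window is only one reading of "near the target context length".

```python
def endprompt_position_ids(context_len, prompt_len, target_len):
    """Hypothetical helper: position ids for a two-segment training sequence."""
    if context_len + prompt_len > target_len:
        raise ValueError("segments must fit inside the target context length")
    # First segment: the original short context keeps its native positions.
    first = list(range(context_len))
    # Second segment: the terminal prompt is placed at the end of the
    # target window (an assumption made for this sketch).
    second = list(range(target_len - prompt_len, target_len))
    return first + second


def relative_distances(position_ids):
    """Causal query-key distances (query position minus key position)."""
    return {q - k
            for i, q in enumerate(position_ids)
            for k in position_ids[: i + 1]}


# Toy scale: 8 context tokens, 2 prompt tokens, target window of 64.
ids = endprompt_position_ids(context_len=8, prompt_len=2, target_len=64)
dists = relative_distances(ids)
print(ids)            # 10 physical tokens
print(sorted(dists))  # local distances plus long-range ones, nothing in between
```

In this toy setting the sequence has only 10 physical tokens, yet attention from the terminal prompt to the context covers distances close to the full target length, while intermediate distances are never observed directly. Those unobserved distances are the ones the smoothness argument in the abstract is meant to cover.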