SwiLTra-Bench: The Swiss Legal Translation Benchmark

Abstract

In Switzerland, legal translation is uniquely important because of the country's four official languages and its requirements for multilingual legal documentation. However, this process traditionally relies on professionals who must be both legal experts and skilled translators, which creates bottlenecks and impairs effective access to justice. To address this challenge, we introduce SwiLTra-Bench, a comprehensive multilingual benchmark of over 180K aligned Swiss legal translation pairs comprising laws, headnotes, and press releases across all Swiss languages along with English, designed to evaluate LLM-based translation systems. Our systematic evaluation reveals that frontier models achieve superior translation performance across all document types, while specialized translation systems excel specifically in laws but underperform in headnotes. Through rigorous testing and human expert validation, we demonstrate that while fine-tuning open SLMs significantly improves their translation quality, they still lag behind the best zero-shot prompted frontier models such as Claude-3.5-Sonnet. Additionally, we present SwiLTra-Judge, a specialized LLM evaluation system that aligns best with human expert assessments.
