Johann Frei from the Kramer Lab at the University of Augsburg and Raphael Scheible from the Technical University of Munich presented their work on GottBERT at the prestigious Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024, held in Miami from November 12-18.
EMNLP is one of the leading conferences in natural language processing. With an acceptance rate of only 20.8% for main conference papers, the selection is a testament to the quality and relevance of this research.
GottBERT is the first purely German RoBERTa model, pre-trained on the German portion of the OSCAR dataset. It achieves outstanding results in Named Entity Recognition (NER) and text classification and sets new standards in the German-speaking NLP community.
The models are freely available under the MIT license on Hugging Face.
The manuscript “GottBERT: a pure German Language Model” is also publicly available.
Congratulations on this success! We look forward to further contributions and developments in German-language NLP research.