The DICE Group has been actively involved in the development and application of Large Language Models (LLMs) across various fields. Following the successful publication of our first massively multilingual LLM, LOLA (https://aclanthology.org/2025.coling-main.428/), we are now aiming to scale our research to cover even more languages and modalities.
We have also completed one full iteration of this Project Group, which started in Summer Semester 2025. The outcomes of that iteration (datasets, code, experiments, and lessons learned) will be directly available to the incoming students, giving them a solid foundation of knowledge to build upon.
For a detailed overview of the previous project group, see their conclusion slides: Final_Presentation_HTYLLM_PG_SoSe_25.pdf.
With this background, the current PG offers a unique opportunity to collaborate on developing the next generation of multilingual and multimodal language models. The project will push the boundaries of current LLM capabilities while providing hands-on experience with cutting-edge Natural Language Processing (NLP) and Machine Learning (ML) techniques.
Our project group aims to train a large, open-source multilingual language model and address the challenges posed by the curse of multilinguality. Specifically, our goals include:
For more information, check out the slides: HTYLLM2_PG_SoSe_26.pdf.
Q: What is the selection process for this project?
A: Candidates will need to submit an assignment and undergo an interview as part of the selection process.
Q: Is there a seminar connected to this PG?
A: No.
Q: What are the prerequisites for this PG?
A: The ideal candidate should possess foundational knowledge of NLP and ML, along with strong programming skills in Python and shell scripting. Proficiency in Linux is also essential. The ability to learn quickly and adapt to new technologies and methodologies is equally important, as the PG domain is expected to have a steep learning curve.
In case you have further questions, feel free to contact Nikit Srivastava.
Coming soon