MEDAL Summer School in Computational Modelling

The 2025 summer school will take place in Birmingham between the 23rd and 27th of June.
If you do not need a visa to enter the UK, you can still register via the link above; you will automatically be placed on a waiting list.
Important announcements:
- Visas: The registration list for those who need a visa application letter from the University of Birmingham is now closed!
- Consortium: If you are affiliated with one of our partnering universities/research institutes, please register before the 31st of January (or before registration is full) to secure a spot in this year's summer school!
- Others: If you are not affiliated with one of our partnering universities and registered after the 6th of January, you will be placed on a waiting list.
Join us this summer in the vibrant city of Birmingham for an immersive journey into the exciting world of Computational Modelling for the language sciences! This program offers an unparalleled opportunity to deepen your expertise and network with leading scholars.
🔍 What to Expect:
- Inspiring Keynotes: Gain insights into the latest advancements from top researchers.
- Expert-Led Workshops: Learn from renowned academics on topics such as LDL models, LLMs for linguists, the NDL algorithm, Construction Grammar and LLMs, child language modelling, and computational approaches to language evolution. More will be added!
- Hands-On Labs: Get practical experience with cutting-edge methods.
- Individual Consultation Sessions: Get personalised insights and hands-on guidance in a one-on-one session with a plenary speaker or workshop instructor, tailored to your learning goals!
👩💻 Who Should Attend?
- The talks and workshops are designed for beginners as well as advanced learners.
- Beginners are supported on the first, set-up day, where we help with equipment and software installation.
Each day of the summer school will consist of a keynote session and parallel classes on Python for linguists, consultations, and poster presentations. On four days, parallel workshops will be offered that you can choose from, such as:
- LLMs for linguists
- LDL
- Construction Grammar: LLMs
- Computational models for L1/L2
- Multimodal interaction
- Language evolution
After the workshops, each day will be wrapped up with social activities.
The programme is now available; however, more workshops will be added soon! Check the programme page for updates and changes.
Workshops and social events
Detailed information about the workshops and social events will be published here.
Plenaries
Workshops
You will explore the barriers that have limited the uptake of computational methods in cognitive linguistics, such as steep learning curves, reliance on Big Data, and the demand for exact instruction. Through a rich mix of the "history of ideas" and hands-on examples, the workshop will demonstrate how computational models can be tailored to address linguistic complexity, delivering empirically testable predictions and actionable insights. During the hands-on sessions you will learn to use computational models based on psychological research on learning. After an introduction to the principles of learning, we will demonstrate how error-correction models can be used for the analysis of linguistic data, using some of our own work. Using findings from work on L1 and L2, we will teach you how annotated corpus data can be used in computational models, and how computational modelling can be used to generate hypotheses that can be tested in experimental or classroom settings. The workshop is tailored for participants with no programming experience and makes use of our cloud computing infrastructure. It includes hands-on work with existing data while also setting aside time for participants to apply what they have learned to their own data.
Level of participation: Beginners to Advanced
Software requirements: None; we will be using our cloud computing interface, but the training can also be run in Python
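To give a concrete sense of what an error-correction model does, here is a minimal Python sketch of the Rescorla-Wagner learning rule (the standard error-correction rule behind NDL-style models) applied to invented cue-outcome data. The learning rate, cues, and outcomes are purely illustrative and are not taken from the workshop materials.

```python
# Minimal sketch of Rescorla-Wagner error-correction learning on toy data.
# Cues (e.g. letter bigrams) predict outcomes (e.g. word meanings); weights
# are nudged in proportion to the prediction error on every learning event.
from collections import defaultdict

ALPHA = 0.1  # learning rate (illustrative value)

# Hypothetical learning events: (set of cues, set of outcomes present)
events = [
    ({"#h", "ha", "an", "nd", "d#"}, {"HAND"}),
    ({"#h", "ha", "at", "t#"},       {"HAT"}),
    ({"#h", "ha", "an", "nd", "d#"}, {"HAND"}),
]

weights = defaultdict(float)          # (cue, outcome) -> association strength
outcomes = {o for _, outs in events for o in outs}

for cues, present in events:
    for outcome in outcomes:
        # Current activation of this outcome given the cues on this trial
        activation = sum(weights[(c, outcome)] for c in cues)
        target = 1.0 if outcome in present else 0.0
        error = target - activation   # error-correction step
        for c in cues:
            weights[(c, outcome)] += ALPHA * error

print({k: round(v, 3) for k, v in weights.items() if k[1] == "HAND"})
```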
Texts of disputed or questioned authorship are common in historical, literary, political, and forensic contexts, and a range of methods has been developed for inferring information about the authors of such texts through computational linguistic analysis. In this workshop, we introduce computational methods for linguistic authorship analysis. On day one, we introduce the field of linguistic authorship analysis, define different types of authorship problems, including attribution, verification, and profiling, and discuss the differences between manual and computational approaches to resolving these tasks. We then introduce methods for geolinguistic profiling, which involves predicting the geographic background of an author, with a focus on the German language. On day two, we discuss the task of authorship attribution, which involves selecting the most likely author of an anonymous text from a set of candidate authors, using large language models. We discuss how to fine-tune authorial large language models for authorship analysis and demonstrate this methodology using a range of standard English-language benchmarking corpora.
Level: Postgraduate students and higher
Software: We will be presenting techniques implemented in R/R Studio (geolinguistic profiling) and Python (authorship attribution) and will provide notebooks that students are welcome to follow along with during our sessions; however, students will not be required to run code themselves.
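As a rough illustration of the attribution task setup (a simple character n-gram baseline, not the LLM fine-tuning approach covered in the workshop), here is a minimal Python sketch that treats closed-set attribution as text classification. The texts and author labels are invented.

```python
# Minimal baseline sketch: closed-set authorship attribution as text
# classification over character n-grams. This is NOT the LLM fine-tuning
# methodology taught in the workshop; it only illustrates the task setup.
# All texts and author labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "It was a truth universally acknowledged, more or less.",
    "The fog crept in over the river and would not lift.",
    "Universally, one might say, the truth was acknowledged.",
    "The river fog would not lift; it crept over everything.",
]
train_authors = ["A", "B", "A", "B"]   # candidate authors (closed set)

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_authors)

questioned = "The truth, acknowledged or not, crept in like fog."
print(model.predict([questioned]), model.predict_proba([questioned]))
```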
The workshop will present an overview of the (psycho)linguistic application of data-driven models trained on language usage data. The issue will be addressed from both a historical and a methodological perspective, with a discussion ranging from traditional distributional approaches to modern large language models. The talk will highlight the continuity between these different modelling traditions, as well as their viability as instruments for scientific investigation and their degree of cognitive plausibility. The talk will mostly focus on systems trained on text corpora, but the possibility of using other types of data sources (such as databases of annotated images) will also be explored. Examples of empirical work will be provided, combining the analysis of such models with psychological and linguistic questions.
Level: Suitable for everyone
Software: N/A
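For readers new to the distributional tradition, here is a minimal Python sketch, on an invented toy corpus, of the count-based approach the talk starts from: words are represented by their co-occurrence counts with nearby words and compared by cosine similarity.

```python
# Minimal sketch of a count-based distributional model: words are represented
# by co-occurrence counts with context words within a small window, and
# similarity is the cosine between count vectors. The toy corpus is invented.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug the cat chased the dog".split()
window = 2
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            counts[index[w], index[corpus[j]]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print("cat ~ dog:", round(cosine(counts[index["cat"]], counts[index["dog"]]), 3))
print("cat ~ on: ", round(cosine(counts[index["cat"]], counts[index["on"]]), 3))
```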
Large language models (LLMs) based on the transformer architecture are deep neural networks trained on textual corpora for general-purpose language understanding and generation. This workshop will introduce attendees to popular Python libraries devoted to interrogating LLMs on measures of language processing and representation. Specifically, attendees will learn to obtain representations of linguistic units at various levels of granularity (sub-word tokens, words, sentences) and to probe the impact of context on these representations (e.g., how the meaning of ambiguous words is modulated by the sentences in which they occur). Then, attendees will learn to extract LLMs’ predictability measures, which will be applied to obtain estimates of sentence grammaticality and semantic plausibility. Finally, the workshop will introduce the basic tools to probe the connection between language and vision with vision-language models. The workshop will cover foundational LLMs of different families (i.e., BERT encoder language models, GPT decoder language models, CLIP multimodal models). After the workshop, attendees will know how to use basic Python resources to start working independently with LLMs.
Level: Theoretically, the workshop assumes a very basic knowledge of large language models. Thus, while some theoretical concepts will be reviewed throughout the workshop, we highly recommend that attendees follow the theoretical lectures on LLMs. Practically, the workshop will not assume experience with Python; however, inexperienced attendees are encouraged to participate in the “Python for linguists” sessions in order to familiarize themselves with the coding environment.
Software: The workshop will be carried out on popular cloud-based coding platforms and thus will not require installing software on attendees' local machines. Since Google Colab will most likely be the cloud-based platform of choice, a Google account will be required.
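As a taster of the kind of probing the workshop covers, the sketch below assumes the Hugging Face transformers and torch libraries and standard public checkpoints (which may differ from those used in the workshop): it extracts a contextual representation of an ambiguous word with BERT and a simple sentence-level predictability score with GPT-2.

```python
# Minimal sketch: contextual word representations (BERT) and a sentence
# predictability score (GPT-2). Assumes `transformers`, `torch`, and an
# internet connection to download the pretrained checkpoints named below.
import torch
from transformers import AutoTokenizer, AutoModel, AutoModelForCausalLM

# 1) Contextual representation of "bank" in two different sentences
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    enc = bert_tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]          # (tokens, dim)
    tokens = bert_tok.convert_ids_to_tokens(enc["input_ids"][0])
    return hidden[tokens.index(word)]                      # first matching token

v1 = word_vector("she sat by the river bank", "bank")
v2 = word_vector("he deposited cash at the bank", "bank")
print("cosine:", torch.cosine_similarity(v1, v2, dim=0).item())

# 2) Sentence predictability with GPT-2 (higher = more expected)
gpt_tok = AutoTokenizer.from_pretrained("gpt2")
gpt = AutoModelForCausalLM.from_pretrained("gpt2")

def avg_logprob(sentence):
    ids = gpt_tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        loss = gpt(ids, labels=ids).loss                   # mean cross-entropy
    return -loss.item()

print(avg_logprob("the cat sat on the mat"))
print(avg_logprob("the mat sat on the cat"))
```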
The workshop introduces key computational approaches for testing different theories of child word learning using Python. By the end of the workshop, participants will have a foundation for integrating computational modelling into their research workflow, along with Jupyter notebooks covering all workshop activities (input preprocessing, implementation, output visualisation, and evaluation using real and simulated data). The workshop focuses on two influential theories in language acquisition: transitional probability, which explains how infants discover words in fluent speech using statistical cues, and chunking, which captures how children build a vocabulary through exposure to parental input. Participants will implement these theories using two major approaches: first, by testing a transitional probability model and aligning its output with infant behavioural data, and second, by running simulations on conversational corpus data to examine how manipulating a chunking-based learning system and its input affects vocabulary acquisition.
Level: This hands-on workshop is designed for students and researchers in psychology, linguistics, cognitive science, and related fields who are interested in applying computational methods to language research.
Software: Participants should bring their own laptop and can use Google Colab without installing any software. All that is needed is a modern web browser (e.g., Chrome, Safari), a Google account, and a stable internet connection. However, in case of internet dropouts, we recommend that participants install Python (version 3.8+) and Jupyter Notebook. A list of required packages, setup instructions, and further details about the workshop are available here.
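As a minimal illustration of the first of the two theories, the sketch below computes forward transitional probabilities, P(B|A) = count(A B) / count(A), over an invented syllable stream; the workshop's own notebooks work with real and simulated data.

```python
# Minimal sketch of the transitional-probability idea over a toy syllable
# stream: within-word transitions should be more predictable than transitions
# across word boundaries. Syllables and "words" below are invented.
from collections import Counter

# Stream of syllables from an artificial language with words: bida, kupa, goti
stream = "bi da ku pa go ti bi da go ti ku pa bi da ku pa".split()

bigrams = Counter(zip(stream, stream[1:]))
unigrams = Counter(stream[:-1])

def transitional_probability(a, b):
    return bigrams[(a, b)] / unigrams[a]

print("within-word  P(da | bi):", round(transitional_probability("bi", "da"), 3))
print("across-words P(ku | da):", round(transitional_probability("da", "ku"), 3))
```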
This workshop will introduce fundamental methods for analysing multimodal signals in conversation. On the first day, we will discuss how to process kinematic information (i.e., how to extract key body points) and automatically transcribe and align speech from dialogue video recordings. On the second day, we will build on this knowledge to develop methods that allow us to automatically detect gestures using speech and kinematic features. Each workshop day will consist of a short presentation of at most 45 minutes, followed by hands-on practical exercises and discussion.
Level: The workshop is suitable for anyone with an interest in multimodality. We expect students to have basic programming skills, preferably in Python.
Software: We plan to use the following software: VS Code, Python (e.g., Miniconda), MediaPipe, and Whisper-X. You do not need to install this software beforehand: we will help you out during the workshop.
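To preview the kinematic side of day one, here is a minimal sketch (assuming the mediapipe and opencv-python packages and a hypothetical local video file named below) that extracts body key points frame by frame with MediaPipe Pose; speech transcription and alignment with Whisper-X are handled separately in the workshop.

```python
# Minimal sketch: extract body key points from a dialogue video with
# MediaPipe Pose. The video file name is hypothetical; if it does not exist,
# the loop simply processes zero frames.
import cv2
import mediapipe as mp

VIDEO_PATH = "dialogue_recording.mp4"   # hypothetical file name

cap = cv2.VideoCapture(VIDEO_PATH)
keypoints = []                           # one list of (x, y) per frame

with mp.solutions.pose.Pose(static_image_mode=False) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV reads frames as BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            keypoints.append([(lm.x, lm.y) for lm in results.pose_landmarks.landmark])
        else:
            keypoints.append(None)

cap.release()
print(f"processed {len(keypoints)} frames")
```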
In this workshop we will play with simulation models of the processes implicated in the emergence of language structure: individual learning, cultural transmission, and genetic evolution. I will provide simple recreations in Python of two key models in the literature: a model showing that compositional structure in language arises as a trade-off between simplicity and expressivity, and a model showing that strong linguistic nativism cannot evolve. Both are based on very simple Bayesian models of individuals that are placed in simulated populations that interact and learn from one another. We will explore the parameter space of the models and talk about how they might be extended.
Level: Beginners
Software: Jupyter notebooks with matplotlib, scipy, and numpy installed.
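As a flavour of this style of modelling (not a recreation of the workshop's own models), here is a minimal Python sketch of Bayesian iterated learning: each generation infers a language from the previous generation's utterances and produces data for the next. The prior, noise rate, and chain length are invented for illustration.

```python
# Minimal sketch of Bayesian iterated learning over two toy hypotheses.
# A learner observes noisy utterances from the previous generation, computes
# a posterior over languages, samples a language from it, and then produces
# data for the next generation. Parameter values are illustrative only.
import random

random.seed(1)

HYPOTHESES = ["compositional", "holistic"]
PRIOR = {"compositional": 0.6, "holistic": 0.4}   # weak simplicity bias
NOISE = 0.05                                       # production error rate
N_UTTERANCES = 10
GENERATIONS = 20

def produce(language, n):
    """Produce n utterances; each matches the speaker's language unless noise flips it."""
    other = "holistic" if language == "compositional" else "compositional"
    return [language if random.random() > NOISE else other for _ in range(n)]

def learn(data):
    """Compute the posterior over hypotheses given data, then sample a language."""
    posts = {}
    for h in HYPOTHESES:
        like = 1.0
        for d in data:
            like *= (1 - NOISE) if d == h else NOISE
        posts[h] = PRIOR[h] * like
    r = random.random() * sum(posts.values())
    return "compositional" if r < posts["compositional"] else "holistic"

language = "holistic"                 # the chain starts with a holistic language
for gen in range(GENERATIONS):
    data = produce(language, N_UTTERANCES)
    language = learn(data)
    print(gen, language)
```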
I will present the Discriminative Lexicon Model (DLM), an error-driven computational model of the mental lexicon that provides a set of algorithms for probing visual and auditory comprehension, as well as speech production. The first half of the workshop will introduce basic concepts and key elements of the DLM theory. The second half of the workshop will provide participants with hands-on experience with the open-source implementation of the model, the JudiLing package for the Julia programming language. Participants will be guided through a Jupyter notebook that illustrates how the DLM can be used both as a linguistic model and as a cognitive model generating detailed predictions for lexical processing.
Level: PhD students, postdocs
Software: R, Julia
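For orientation, here is a minimal numpy sketch in Python of the core idea behind the DLM: comprehension and production as linear mappings between a form matrix and a meaning matrix, estimated here with the pseudoinverse. The toy matrices are invented; the workshop itself uses the JudiLing package for Julia.

```python
# Minimal numpy sketch of the DLM's core idea: linear mappings between a form
# matrix C (words x form cues) and a meaning matrix S (words x semantic
# dimensions). The tiny word set, cues, and semantic vectors are invented.
import numpy as np

words = ["walk", "walks", "walked"]
cues  = ["#wa", "alk", "lk#", "ks#", "ked"]           # toy triphone-like cues
C = np.array([[1, 1, 1, 0, 0],                         # walk
              [1, 1, 0, 1, 0],                         # walks
              [1, 1, 0, 0, 1]], dtype=float)           # walked

# Toy semantic vectors (e.g. a lexeme dimension plus tense/number dimensions)
S = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

F = np.linalg.pinv(C) @ S      # comprehension mapping: form -> meaning
G = np.linalg.pinv(S) @ C      # production mapping: meaning -> form

S_hat = C @ F                  # predicted meanings from forms
print(np.round(S_hat, 2))
```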
Social Events
Additional information
Programme
This year's summer school starts with a set-up day on Monday. From Tuesday to Friday, every day starts with two keynotes, followed by parallel sessions on Python for linguists, consultations, and poster presentations, and then a lunch break. On the first day, an opening roundtable is hosted by Petar Milin and a MEDAL project update is given. On the remaining days, different parallel workshop sessions are organised.
Last but not least, every day closes with social activities!
The workshop titles in this programme are short forms of the official titles; for more detailed information about the content of the workshops, please check the Workshops and social events page. Please also be aware that the exact locations will follow.
Please note the following information regarding course duration and structure:
- The following courses will run over four days:
• LLMs for Linguists (Marco Marelli and Marco Ciapparelli)
• LDL (Melanie Bell and Harald Baayen)
• Construction Grammar (Florent Perek and Harish Tayyar Madabushi)
- The first two sessions of LLMs and LDL, as well as the first session of Construction Grammar, are introductory. Attending these is likely necessary before joining the more advanced final sessions.
- All remaining courses are two days long.
- The length and sequence of each workshop are indicated in brackets (e.g. 2/4 = second session in a four-day course).
Click on the sentence below to see the programme as a downloadable PDF file, or scroll down for the browser version:
Click here to see the programme as a downloadable PDF file!
Monday June 23rd
11:00-13:30
Opening & Registration
13:30-15:00
Opening Roundtable (Petar Milin)
15:00-15:30
Coffee break
15:30-17:00
MEDAL Project update
17:00-
Social activities
Tuesday June 24th
08:30-09:00
Registration
09:00-10:00
Plenary 1:
10:15-11:15
Plenary 2:
Marco Marelli & Marco Ciapparelli
11:30-12:30
Poster Presentations
Python for linguists
Consultations
*Note that these are parallel sessions
12:30-14:00
Lunch break
14:00-15:30
Parallel session A: From distributional semantics to LLMs (1/4): Marco Marelli
Parallel session B: LDL Low (1/4): Melanie Bell
Parallel session C: Building Computational Models of Child Word Learning (1/2): Gary Jones and Francesco Cabiddu
Parallel session D: Methods for the automatic processing of multimodal interaction (1/2): Raquel Fernandez and Esam Ghaleb
Parallel session E: Computational simulations of error-driven learning in L1 and L2 (1/2): Dagmar Divjak and Petar Milin
15:30-15:45
Coffee break
15:45-17:00
Parallel sessions A-E continued
Parallel session A: From distributional semantics to LLMs (1/4): Marco Marelli
Parallel session B: LDL Low (1/4): Melanie Bell
Parallel session C: Building Computational Models of Child Word Learning (1/2): Gary Jones and Francesco Cabiddu
Parallel session D: Methods for the automatic processing of multimodal interaction (1/2): Raquel Fernandez and Esam Ghaleb
Parallel session E: Computational simulations of error-driven learning in L1 and L2 (1/2): Dagmar Divjak and Petar Milin
Wednesday June 25th
08:30-09:00
Registration
09:00-10:00
Plenary 1:
Gary Jones & Francesco Cabiddu
10:15-11:15
Plenary 2:
Raquel Fernandez & Esam Ghaleb
11:30-12:30
Poster Presentations
Python for linguists
Consultations
*Note that these are parallel sessions
12:30-14:00
Lunch break
14:00-15:30
Parallel session A: From distributional semantics to LLMs (2/4): Marco Marelli
Parallel session B: LDL Low (2/4): Melanie Bell
Parallel session C: Building Computational Models of Child Word Learning (2/2): Gary Jones and Francesco Cabiddu
Parallel session D: Methods for the automatic processing of multimodal interaction (2/2): Raquel Fernandez and Esam Ghaleb
Parallel session E: Using distributional semantics in linguistic research (1/3): Florent Perek
Parallel session F: Computational simulations of error-driven learning in L1 and L2 (2/2): Dagmar Divjak and Petar Milin
15:30-15:45
Coffee break
15:45-17:00
Parallel sessions A-F continued
Parallel session A: From distributional semantics to LLMs (2/4): Marco Marelli
Parallel session B: LDL Low (2/4): Melanie Bell
Parallel session C: Building Computational Models of Child Word Learning (2/2): Gary Jones and Francesco Cabiddu
Parallel session D: Methods for the automatic processing of multimodal interaction (2/2): Raquel Fernandez and Esam Ghaleb
Parallel session E: Using distributional semantics in linguistic research (1/3): Florent Perek
Parallel session F: Computational simulations of error-driven learning in L1 and L2 (2/2): Dagmar Divjak and Petar Milin
17:00-19:00
Reception
Thursday June 26th
08:30-09:00
Registration
09:00-10:00
Plenary 1:
10:15-11:15
Plenary 2:
11:30-12:30
Poster Presentations
Python for linguists
Consultations
*Note that these are parallel sessions
12:30-14:00
Lunch break
14:00-15:30
Parallel session A: Large language models for (psycho)linguistics (3/4): Marco Ciapparelli
Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (3/4): Harald Baayen
Parallel session C: Simulating the evolution of language (1/2): Simon Kirby
Parallel session D: Construction grammar (2/3): Harish Tayyar Madabushi
Parallel session E: Computational Authorship Analysis (1/2): Jack Grieve
15:30-15:45
Coffee break
15:45-17:00
Parallel sessions A-E continued
Parallel session A: Large language models for (psycho)linguistics (3/4): Marco Ciapparelli
Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (3/4): Harald Baayen
Parallel session C: Simulating the evolution of language (1/2): Simon Kirby
Parallel session D: Construction grammar (2/3): Harish Tayyar Madabushi
Parallel session E: Computational Authorship Analysis (1/2): Jack Grieve
Friday June 27th
08:30-09:00
Registration
09:00-10:00
Plenary 1:
10:15-11:15
Plenary 2:
11:30-12:30
Python for linguists
Consultations
*Note that these are parallel sessions
12:30-14:00
Lunch break
14:00-15:30
Parallel session A: Large language models for (psycho)linguistics (4/4): Marco Ciapparelli
Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (4/4): Harald Baayen
Parallel session C: Simulating the evolution of language (2/2): Simon Kirby
Parallel session D: Construction grammar (3/3): Harish Tayyar Madabushi
Parallel session E: Computational Authorship Analysis (2/2): Jack Grieve
15:30-15:45
Coffee break
15:45-17:00
Parallel sessions A-E continued
Parallel session A: Large language models for (psycho)linguistics (4/4): Marco Ciapparelli
Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (4/4): Harald Baayen
Parallel session C: Simulating the evolution of language (2/2): Simon Kirby
Parallel session D: Construction grammar (3/3): Harish Tayyar Madabushi
Parallel session E: Computational Authorship Analysis (2/2): Jack Grieve