MEDAL Summer School in Computational Modelling

University of Birmingham

The 2025 summer school will take place in Birmingham between the 23rd and 27th of June.

Registrations have closed!

If you do not need a visa to enter the UK, you can still register via the link above, and you will automatically be placed on a waiting list.

Important announcements:

  • Visas: The registration list for those who need a visa application letter from the University of Birmingham is now closed!
  • Consortium: If you're affiliated with one of our partner universities or research institutes, please register before the 31st of January (or before registration fills up) to save a spot in this year's summer school!
  • Others: If you are not affiliated with one of our partner universities and registered after the 6th of January, you will be placed on a waiting list.

Join us this summer in the vibrant city of Birmingham for an immersive journey into the exciting world of Computational Modelling for the language sciences! This program offers an unparalleled opportunity to deepen your expertise and network with leading scholars.

🔍 What to Expect:

  • Inspiring Keynotes: Gain insights into the latest advancements from top researchers.
  • Expert-Led Workshops: Learn from renowned academics on topics such as LDL models, LLMs for linguists, the NDL algorithm, Construction Grammar and LLMs, child language modelling, and computational approaches to language evolution. More will be added!
  • Hands-On Labs: Get practical experience with cutting-edge methods.
  • Individual Consultation Sessions: Get personalized insights and hands-on guidance in a one-on-one session with the plenary or workshop instructor, tailored to your learning goals!

👩‍💻 Who Should Attend?

  • The talks and workshops are designed for beginners as well as advanced learners.
  • Beginners are supported during the first day, a set-up day where we help with equipment and software installation.

The summer school will consist of a daily keynote session plus daily parallel sessions on Python for linguists, consultations, and poster presentations. On four days you can choose from parallel workshops such as:

  • LLMs for linguists
  • LDL
  • Construction Grammar: LLMs
  • Computational models for L1/L2
  • Multimodal interaction
  • Language evolution

After the workshops, each day will be wrapped up by social activities.

The programme is now available; however, more workshops will be added soon! Check the programme page for updates and changes.

Workshops and social events

Detailed information about the workshops and social events will be published here.

Plenaries

Who is Satoshi Nakamoto? Using computational authorship analysis to help resolve the bitcoin authorship problem - Jack Grieve

Understanding the unknown: How to make sense of unfamiliar words, from a computational psycholinguistic perspective - Marco Marelli & Marco Ciapparelli


A CLASSIC explanation of early language acquisition - Gary Jones & Francesco Cabiddu

Co-speech gestures in face-to-face dialogue: A representation learning perspective - Raquel Fernandez & Esam Ghaleb

Cultural evolution builds the statistical structure of language: Evidence from human and whale song datasets - Simon Kirby

How does language work? Challenges and opportunities in the age of deep learning - Harald Baayen & Melanie Bell

Workshops

Computational simulations of error-driven learning in L1 and L2 - Dagmar Divjak & Petar Milin

You will explore the barriers that have limited the uptake of computational methods in cognitive linguistics, such as steep learning curves, reliance on Big Data, and the demand for exact instruction. Through a rich mix of the “history of ideas” and hands-on examples, the workshop will demonstrate how computational models can be tailored to address linguistic complexity, delivering empirically testable predictions and actionable insights. During the hands-on sessions you will learn to use computational models grounded in psychological research on learning. After an introduction to the principles of learning, we will demonstrate how error-correction models can be used for the analysis of linguistic data, drawing on some of our own work. Using findings from work on L1 and L2, we will teach you how annotated corpus data can be fed into these computational models, and how computational modelling can be used to generate hypotheses that can be tested in experimental or classroom settings. The workshop is tailored for participants with no programming experience and makes use of our cloud computing infrastructure. It includes hands-on work with existing data while also setting aside time for participants to apply what they have learned to their own data.
 
Level of participation: Beginners to Advanced
Software requirements: None; we will be using our cloud computing interface, but the training can also be run in Python
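The error-correction models referred to above are commonly variants of the Rescorla-Wagner learning rule, which also underlies NDL. As a rough illustration only (made-up cues and outcomes, not the workshop's own materials), here is a minimal Python sketch:

```python
# Rescorla-Wagner / error-driven learning: association weights between
# cues (e.g. letter bigrams) and outcomes (e.g. lexemes) are nudged by
# the prediction error on every learning event.

def rescorla_wagner(events, learning_rate=0.1, n_epochs=50):
    """events: list of (cues, outcomes) pairs, each a set of strings."""
    weights = {}  # (cue, outcome) -> association strength
    all_outcomes = {o for _, outs in events for o in outs}
    for _ in range(n_epochs):
        for cues, outcomes in events:
            for outcome in all_outcomes:
                # predicted activation = summed weights of the present cues
                prediction = sum(weights.get((c, outcome), 0.0) for c in cues)
                error = (1.0 if outcome in outcomes else 0.0) - prediction
                for c in cues:
                    weights[(c, outcome)] = weights.get((c, outcome), 0.0) \
                        + learning_rate * error
    return weights

# Toy data: the cue "wa" reliably predicts the outcome "water", while
# "xx" co-occurs with everything and so stays uninformative.
events = [({"wa", "xx"}, {"water"}), ({"mi", "xx"}, {"milk"})]
w = rescorla_wagner(events)
print(w[("wa", "water")] > w[("xx", "water")])  # → True: the reliable cue wins
```

The key property, which the workshop exploits for linguistic analysis, is that uninformative cues lose out to discriminative ones purely through error correction, without any explicit rule.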

Computational authorship analysis - Jack Grieve, Dana Roemling & Weihang Huang

Given a text of disputed or questioned authorship -- as is common in historical, literary, political, and forensic contexts -- a range of methods have been developed for inferring information about the author of that text through computational linguistic analysis. In this workshop, we introduce computational methods for linguistic authorship analysis. On day one, we introduce the field, defining different types of authorship problems (attribution, verification, and profiling) and discussing the differences between manual and computational approaches to resolving these tasks. We then introduce methods for geolinguistic profiling, which involves predicting the geographic background of an author, with a focus on the German language. On day two, we discuss the task of authorship attribution, which involves selecting the most likely author of an anonymous text from a set of candidate authors, using large language models. We discuss how to fine-tune authorial large language models for authorship analysis and demonstrate this methodology using a range of standard English-language benchmarking corpora.

Level: Postgraduate students and higher
Software: We will be presenting techniques implemented in R/R Studio (geolinguistic profiling) and Python (authorship attribution) and will provide notebooks that students are welcome to follow along with during our sessions; however, students will not be required to run code themselves.
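To give a flavour of the attribution task, here is a deliberately simple classical baseline -- function-word frequency profiles compared by cosine similarity -- on made-up mini-texts. The workshop itself goes well beyond this, covering geolinguistic profiling in R and LLM fine-tuning in Python:

```python
# A minimal attribution baseline: represent each text by its relative
# frequencies of common function words, then attribute a disputed text
# to the candidate whose profile it is closest to.
import math
from collections import Counter

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is"]

def profile(text):
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(disputed, candidates):
    """candidates: dict author -> known text. Returns the closest author."""
    p = profile(disputed)
    return max(candidates, key=lambda a: cosine(p, profile(candidates[a])))

# Made-up mini-texts: author A leans on "the/of/and", author B on "a/in/is".
candidates = {
    "A": "the cat of the house and the dog of the barn",
    "B": "a bird in a tree is a joy in spring",
}
print(attribute("the fox of the wood and the owl of the night", candidates))  # → A
```

Function words are a classic choice because they are frequent, topic-independent, and hard for an author to manipulate consciously.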

From distributional semantics to LLMs - Marco Marelli

The workshop will present an overview of the (psycho)linguistic application of data-driven models trained on data of language usage. The issue will be addressed from both a historical and a methodological perspective, with a discussion ranging from traditional distributional approaches to modern large language models. The talk will highlight the continuity between such different modelling traditions, as well as their viability as instruments for scientific investigation and their degree of cognitive plausibility. The talk will mostly focus on systems trained on text corpora, but the possibility of using other types of data sources (such as databases of annotated images) will be also explored. Examples of empirical works will be provided, combining the analysis of such models with psychological and linguistic issues.

Level: Suitable for everyone
Software: N/A
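To illustrate the traditional distributional approaches the talk starts from, here is a toy sketch on a hypothetical five-sentence corpus: each word is represented by its co-occurrence counts with every other word, and words are compared by cosine similarity.

```python
# Toy distributional semantics: "you shall know a word by the company
# it keeps" -- words with similar contexts get similar vectors.
import math
from collections import defaultdict
from itertools import permutations

corpus = [
    "dogs chase cats",
    "dogs chase squirrels",
    "cats chase mice",
    "linguists analyse corpora",
    "linguists analyse sentences",
]

# Word-by-word co-occurrence counts, using the sentence as the context window.
cooc = defaultdict(lambda: defaultdict(int))
vocab = sorted({w for s in corpus for w in s.split()})
for sentence in corpus:
    for w1, w2 in permutations(sentence.split(), 2):
        cooc[w1][w2] += 1

def vector(word):
    return [cooc[word][w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Animals pattern together; they share contexts that linguists do not:
print(cosine(vector("dogs"), vector("cats")) >
      cosine(vector("dogs"), vector("linguists")))  # → True
```

Modern LLM embeddings replace raw counts with learned dense vectors, but the underlying distributional intuition is the same -- which is exactly the continuity the talk highlights.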

Large language models for (psycho)linguistics - Marco Ciapparelli

Large language models (LLMs) based on the transformer architecture are deep neural networks trained on textual corpora for general-purpose language understanding and generation. This workshop will introduce attendees to popular Python libraries devoted to interrogating LLMs on measures of language processing and representation. Specifically, attendees will learn to obtain representations of linguistic units at various levels of granularity (sub-word tokens, words, sentences) and to probe the impact of context on these representations (e.g., how the meaning of ambiguous words is modulated by the sentences in which they occur). Then, attendees will learn to extract LLMs’ predictability measures, which will be applied to obtain estimates of sentence grammaticality and semantic plausibility. Finally, the workshop will introduce the basic tools to probe the connection between language and vision with vision-language models. The workshop will cover foundational LLMs of different families (i.e., BERT encoder language models, GPT decoder language models, CLIP multimodal models). After the workshop, attendees will know how to use basic Python resources to start working independently with LLMs.

Level: Theoretically, the workshop assumes only a very basic knowledge of large language models. While some theoretical concepts will be reviewed throughout the workshop, we highly recommend that attendees follow the theoretical lectures on LLMs. Practically, the workshop does not assume experience with Python; however, inexperienced attendees are encouraged to participate in the “Python for linguists” sessions to familiarize themselves with the coding environment.
Software: The workshop will be carried out on popular cloud-based coding platforms and thus will not require installing software on attendees' local machines. Since Google Colab will most likely be the cloud-based platform of choice, a Google account will be required.
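The predictability measures mentioned above are typically surprisal values, -log2 p(word | context). As a toy illustration of the concept only, here a smoothed bigram model stands in for an LLM (the workshop itself queries transformer models via cloud notebooks):

```python
# Surprisal in bits: -log2 p(word | context). An LLM supplies these
# probabilities from a deep network; a tiny bigram model stands in here,
# purely to show how per-word surprisal profiles a sentence.
import math
from collections import Counter

# Tiny made-up training "corpus".
tokens = "the dog barks . the dog sleeps . the cat sleeps .".split()
bigrams = Counter(zip(tokens, tokens[1:]))
unigrams = Counter(tokens)

def surprisal(prev, word, alpha=0.5):
    """Add-alpha smoothed bigram surprisal of `word` given `prev`."""
    v = len(unigrams)
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * v)
    return -math.log2(p)

# "dog" is far more predictable after "the" than "barks" is:
print(round(surprisal("the", "dog"), 2),
      round(surprisal("the", "barks"), 2))  # → 1.26 3.58
```

High summed surprisal is one simple proxy for low grammaticality or plausibility, which is the logic behind the LLM-based estimates covered in the workshop.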

Building computational models of child word learning: Case studies on transitional probability and chunking - Gary Jones & Francesco Cabiddu

This hands-on workshop is designed for students and researchers in psychology, linguistics, cognitive science, and related fields who are interested in applying computational methods to language research. The workshop introduces key computational approaches for testing different theories of child word learning using Python. By the end of the workshop, participants will have a foundation for integrating computational modelling into their research workflow, along with Jupyter notebooks covering all workshop activities (input preprocessing, implementation, output visualisation, and evaluation using real and simulated data). The workshop focuses on two influential theories in language acquisition: transitional probability, which explains how infants discover words in fluent speech using statistical cues, and chunking, which captures how children build a vocabulary through exposure to parental input. Participants will implement these theories using two major approaches: first, by testing a transitional probability model and aligning its output with infant behavioural data, and second, by running simulations on conversational corpus data to examine how manipulating a chunking-based learning system and its input affects vocabulary acquisition.
 
Level: Students and researchers in psychology, linguistics, cognitive science, and related fields who are interested in applying computational methods to language research.
Software: Participants should bring their own laptop and can use Google Colab without installing any software. All that is needed is a modern web browser (e.g., Chrome, Safari), a Google account, and a stable internet connection. However, in case of internet dropouts, we recommend that participants install Python (version 3.8+) and Jupyter Notebook. A list of required packages, setup instructions, and further details about the workshop are available here.
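As a taste of the first case study, here is a minimal sketch of transitional-probability segmentation on a made-up syllable stream (the workshop's own notebooks are more complete and align model output with infant data):

```python
# Saffran-style segmentation: infants can exploit the drop in
# TP(next syllable | current syllable) at word boundaries.
from collections import Counter

def transitional_probabilities(syllables):
    """TP(b | a) = count(a, b) / count(a) over adjacent syllable pairs."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

def segment(syllables, threshold=0.75):
    """Insert a word boundary wherever the TP dips below the threshold."""
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# A fluent stream built from three made-up "words" (babi, gudo, tupi):
stream = ["ba", "bi", "gu", "do", "tu", "pi",
          "ba", "bi", "tu", "pi", "gu", "do",
          "ba", "bi", "gu", "do", "tu", "pi"]
print(segment(stream))
# → ['babi', 'gudo', 'tupi', 'babi', 'tupi', 'gudo', 'babi', 'gudo', 'tupi']
```

Within-word TPs here are 1.0 while between-word TPs are at most 2/3, so a threshold of 0.75 recovers every word boundary.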
 

Methods for the automatic processing of multimodal interaction - Raquel Fernandez & Esam Ghaleb

This workshop will introduce fundamental methods for analysing multimodal signals in conversation. On the first day, we will discuss how to process kinematic information (i.e., how to extract key body points) and automatically transcribe and align speech from dialogue video recordings. On the second day, we will build on this knowledge to develop methods that allow us to automatically detect gestures using speech and kinematic features. Each workshop day will consist of a short presentation of at most 45 minutes, followed by hands-on practical exercises and discussion.

Level: The workshop is suitable for anyone with an interest in multimodality. We expect students to have basic programming skills, preferably in Python.
Software: We plan to use the following software: VS Code, Python (e.g., via Miniconda), MediaPipe, and WhisperX. You do not need to install this software beforehand; we will help you out during the workshop.
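As a toy preview of the second day's idea -- detecting candidate gesture strokes from kinematic features -- here is a sketch on made-up wrist-keypoint data (in the workshop, real keypoints are extracted with MediaPipe):

```python
# Toy gesture detection from kinematics: given per-frame (x, y) positions
# of a wrist keypoint, flag frames whose frame-to-frame speed exceeds a
# threshold as candidate gesture strokes.
import math

def speeds(track):
    """Frame-to-frame Euclidean displacement of the keypoint."""
    return [math.dist(p, q) for p, q in zip(track, track[1:])]

def gesture_frames(track, threshold=0.05):
    """Indices of frames reached with above-threshold speed."""
    return [i + 1 for i, s in enumerate(speeds(track)) if s > threshold]

# A hand resting, then a rapid stroke, then resting again (made-up data):
track = [(0.5, 0.5), (0.5, 0.5), (0.6, 0.7), (0.7, 0.9), (0.7, 0.9)]
print(gesture_frames(track))  # → [2, 3]
```

Real pipelines combine such kinematic features with the time-aligned speech signal, which is exactly the multimodal step the workshop builds up to.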

Simulating the evolution of language - Simon Kirby

In this workshop we will play with simulation models of the processes implicated in the emergence of language structure: individual learning, cultural transmission, and genetic evolution. I will provide simple recreations in Python of two key models in the literature: a model showing that compositional structure in language arises as a trade-off between simplicity and expressivity, and a model showing that strong linguistic nativism cannot evolve. Both are based on very simple Bayesian models of individuals that are placed in simulated populations that interact and learn from one another. We will explore the parameter space of the models and talk about how they might be extended.

Level: Beginners
Software: Jupyter notebooks with matplotlib, scipy, and numpy installed.
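The kind of model described above can be sketched in a few lines. Below is a hypothetical two-grammar illustration of Bayesian iterated learning (not Kirby's own code): each generation samples a grammar from its posterior given the previous generation's output, and the chain drifts toward the prior.

```python
# Iterated learning with a two-hypothesis Bayesian learner. With learners
# that sample from the posterior, the chain's long-run distribution over
# grammars reflects the prior -- here the prior favours grammar "A".
import random

PRIOR = {"A": 0.9, "B": 0.1}          # learner's prior over grammars
P_A_UTTERANCE = {"A": 0.8, "B": 0.2}  # p(utterance == "a" | grammar)

def produce(grammar, n=10):
    """An agent with this grammar produces n utterances."""
    p = P_A_UTTERANCE[grammar]
    return ["a" if random.random() < p else "b" for _ in range(n)]

def learn(data):
    """Bayesian learner: sample a grammar from the posterior given data."""
    post = {}
    for g, prior in PRIOR.items():
        p = P_A_UTTERANCE[g]
        likelihood = 1.0
        for utterance in data:
            likelihood *= p if utterance == "a" else (1 - p)
        post[g] = prior * likelihood
    r = random.random() * sum(post.values())
    acc = 0.0
    for g, weight in post.items():
        acc += weight
        if r <= acc:
            return g
    return g

def chain(generations=200, seed_grammar="B"):
    """Each generation learns from the previous generation's productions."""
    grammar, history = seed_grammar, []
    for _ in range(generations):
        grammar = learn(produce(grammar))
        history.append(grammar)
    return history

random.seed(1)
history = chain()
# Even though the chain starts at "B", it drifts toward the prior:
print(history.count("A") > history.count("B"))
```

Replacing the two grammars with structured mappings from meanings to signals gives models of exactly the kind explored in the workshop.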

Modeling lexical processing with the Discriminative Lexicon Model - Harald Baayen

I will present an error-driven computational model of the mental lexicon that provides a set of algorithms for probing visual and auditory comprehension, as well as speech production. The first half of the workshop will introduce basic concepts and key elements of the DLM theory. The second half will provide participants with hands-on experience with the open-source implementation of the model, the JudiLing package for the Julia programming language. Participants will be guided through a Jupyter notebook that illustrates how the DLM can be used both as a linguistic model and as a cognitive model generating detailed predictions for lexical processing.

Level: PhD students, postdocs
Software: R, Julia
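At its core, the DLM's comprehension network is a linear mapping from form vectors to semantic vectors. A toy numpy sketch with made-up binary vectors (the workshop itself uses the JudiLing package in Julia):

```python
# LDL-style comprehension: learn a linear mapping F from form vectors
# (rows of C) to semantic vectors (rows of S) by solving C @ F ≈ S with
# the pseudoinverse, then read out predicted meanings as C @ F.
import numpy as np

# Three toy words; columns of C are form cues, columns of S are
# semantic features (all values made up for illustration).
C = np.array([[1, 0, 1],    # form of "cat"
              [0, 1, 1],    # form of "cap"
              [1, 1, 0]])   # form of "tap"
S = np.array([[1.0, 0.0],   # meaning of "cat"
              [0.0, 1.0],   # meaning of "cap"
              [0.5, 0.5]])  # meaning of "tap"

F = np.linalg.pinv(C) @ S   # endstate-of-learning mapping
S_hat = C @ F               # predicted semantic vectors

# Comprehension check: each predicted vector is closest to its target.
for i, s_hat in enumerate(S_hat):
    nearest = int(np.argmin(np.linalg.norm(S - s_hat, axis=1)))
    print(i, nearest == i)  # → True for every word
```

The same machinery run in the other direction (meanings to forms) yields a production network, which is how the DLM covers both comprehension and production with linear mappings.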

Social Events

TBA!


Python set-up


Programme

This year's summer school starts with a set-up day on Monday. From Tuesday to Friday, every day starts with two keynotes, followed by parallel sessions on Python for linguists, consultations, and poster presentations, and then a lunch break. On the first day, an opening roundtable is hosted by Petar Milin and a MEDAL projects update is given. On the remaining days, different parallel workshop sessions are organized.

Last but not least, every day is closed by social activities!

The workshop titles in this programme are short forms of the official titles; for more detailed information about the content of the workshops, please check the Workshops and social events page. Please also be aware that the exact locations are still to be announced.

Please note the following information regarding course duration and structure:

  • The following courses will run over four days:
    • LLMs for Linguists (Marco Marelli and Marco Ciapparelli)
    • LDL (Melanie Bell and Harald Baayen)
  • Construction Grammar (Florent Perek and Harish Tayyar Madabushi) runs over three days.
  • The first two sessions of LLMs and LDL, as well as the first session of Construction Grammar, are introductory. Attending these is likely necessary before joining the more advanced final sessions.
  • All remaining courses are two days long.
  • The length and sequence of each workshop are indicated in brackets (e.g. 2/4 = second session in a four-day course).
Click on the sentence below to see the programme as a downloadable PDF file, or scroll down for the browser version:

Monday June 23rd

11:00-13:30

Opening & Registration


13:30-15:00

Opening Roundtable  (Petar Milin)


15:00-15:30

Coffee break


15:30-17:00

MEDAL Project update


17:00-

Social activities


Tuesday June 24th

08:30-09:00

Registration


09:00-10:00

Plenary 1:

Harish Tayyar Madabushi


10:15-11:15

Plenary 2:

Marco Marelli & Marco Ciapparelli


11:30-12:30

Poster Presentations

Python for linguists

Consultations

*Note that these are parallel sessions


12:30-14:00

Lunch break


14:00-15:30

Parallel session A: From distributional semantics to LLMs (1/4): Marco Marelli

Parallel session B: LDL Low (1/4): Melanie Bell

Parallel session C: Building Computational Models of Child Word Learning (1/2): Gary Jones and Francesco Cabiddu

Parallel session D: Methods for the automatic processing of multimodal interaction (1/2): Raquel Fernandez and Esam Ghaleb

Parallel session E: Computational simulations of error-driven learning in L1 and L2 (1/2): Dagmar Divjak and Petar Milin


15:30-15:45

Coffee break


15:45-17:00

Parallel sessions A-E continued

Parallel session A: From distributional semantics to LLMs (1/4): Marco Marelli

Parallel session B: LDL Low (1/4): Melanie Bell

Parallel session C: Building Computational Models of Child Word Learning (1/2): Gary Jones and Francesco Cabiddu

Parallel session D: Methods for the automatic processing of multimodal interaction (1/2): Raquel Fernandez and Esam Ghaleb

Parallel session E: Computational simulations of error-driven learning in L1 and L2 (1/2): Dagmar Divjak and Petar Milin


Wednesday June 25th

08:30-09:00

Registration


09:00-10:00

Plenary 1:

Gary Jones & Francesco Cabiddu


10:15-11:15

Plenary 2:

Raquel Fernandez & Esam Ghaleb


11:30-12:30

Poster Presentations

Python for linguists

Consultations

*Note that these are parallel sessions


12:30-14:00

Lunch break


14:00-15:30

Parallel session A: From distributional semantics to LLMs (2/4): Marco Marelli

Parallel session B: LDL Low (2/4): Melanie Bell

Parallel session C: Building Computational Models of Child Word Learning (2/2): Gary Jones and Francesco Cabiddu

Parallel session D: Methods for the automatic processing of multimodal interaction (2/2): Raquel Fernandez and Esam Ghaleb

Parallel session E: Using distributional semantics in linguistic research (1/3): Florent Perek

Parallel session F: Computational simulations of error-driven learning in L1 and L2 (2/2): Dagmar Divjak and Petar Milin


15:30-15:45

Coffee break


15:45-17:00

Parallel sessions A-F continued

Parallel session A: From distributional semantics to LLMs (2/4): Marco Marelli

Parallel session B: LDL Low (2/4): Melanie Bell

Parallel session C: Building Computational Models of Child Word Learning (2/2): Gary Jones and Francesco Cabiddu

Parallel session D: Methods for the automatic processing of multimodal interaction (2/2): Raquel Fernandez and Esam Ghaleb

Parallel session E: Using distributional semantics in linguistic research (1/3): Florent Perek

Parallel session F: Computational simulations of error-driven learning in L1 and L2 (2/2): Dagmar Divjak and Petar Milin


17:00-19:00

Reception


Thursday June 26th

08:30-09:00

Registration


09:00-10:00

Plenary 1:

Harald Baayen & Melanie Bell


10:15-11:15

Plenary 2:

Florent Perek


11:30-12:30

Poster Presentations

Python for linguists

Consultations

*Note that these are parallel sessions


12:30-14:00

Lunch break


14:00-15:30

Parallel session A: Large language models for (psycho)linguistics (3/4): Marco Ciapparelli

Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (3/4): Harald Baayen  

Parallel session C: Simulating the evolution of language (1/2): Simon Kirby

Parallel session D: Construction grammar (2/3): Harish Tayyar Madabushi

Parallel session E: Computational Authorship Analysis (1/2): Jack Grieve


15:30-15:45

Coffee break


15:45-17:00

Parallel sessions A-E continued

Parallel session A: Large language models for (psycho)linguistics (3/4): Marco Ciapparelli

Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (3/4): Harald Baayen  

Parallel session C: Simulating the evolution of language (1/2): Simon Kirby

Parallel session D: Construction grammar (2/3): Harish Tayyar Madabushi

Parallel session E: Computational Authorship Analysis (1/2): Jack Grieve


Friday June 27th

08:30-09:00

Registration


09:00-10:00

Plenary 1:

Simon Kirby


10:15-11:15

Plenary 2:

Jack Grieve


11:30-12:30

Python for linguists

Consultations

*Note that these are parallel sessions


12:30-14:00

Lunch break


14:00-15:30

Parallel session A: Large language models for (psycho)linguistics (4/4): Marco Ciapparelli

Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (4/4): Harald Baayen  

Parallel session C: Simulating the evolution of language (2/2): Simon Kirby

Parallel session D: Construction grammar (3/3): Harish Tayyar Madabushi

Parallel session E: Computational Authorship Analysis (2/2): Jack Grieve


15:30-15:45

Coffee break


15:45-17:00

Parallel sessions A-E continued

Parallel session A: Large language models for (psycho)linguistics (4/4): Marco Ciapparelli

Parallel session B: Modeling lexical processing with the Discriminative Lexicon Model (4/4): Harald Baayen  

Parallel session C: Simulating the evolution of language (2/2): Simon Kirby

Parallel session D: Construction grammar (3/3): Harish Tayyar Madabushi

Parallel session E: Computational Authorship Analysis (2/2): Jack Grieve