Summer School Workshops and Events

Descriptions of the various workshops and social events. Workshops take place after the plenary lecture and lunch. Social events usually take place in the evenings.

Monday, 19 June

Natalia Levshina - Grammatical Variation and Deep Learning

Deep Learning and artificial neural networks are among the most common algorithms powering AI applications in our daily lives. However, the use of Deep Learning has been rather limited in linguistics, including research on grammatical variation. This workshop aims to fill this gap. I will first introduce the basic concepts, such as layers, weights, loss functions and backpropagation. I will then present case studies in which I use neural networks to test theoretical hypotheses about the genitive alternation and the role of information structure in determining word order. I will also demonstrate how one can perform Deep Learning with the help of Keras, a convenient and intuitive API for TensorFlow.

Caroline Rowland - Collecting and Analysing Child Language Data

Collecting and analysing naturalistic data from children, especially from children engaging in conversations with others in the home or community, is challenging. In this workshop we will a) cover some basic principles of collecting child language data, including how to deal with ethical and legal issues in different countries and communities, b) introduce you to the most common transcription and analysis systems, with a focus on the CHILDES programs (CHAT and CLAN; see https://childes.talkbank.org/), and c) finish by briefly introducing you to some of the new automated tools available, including LENA (https://www.lena.org/) and a new pipeline we are developing for semi-automated annotation using Whisper (https://openai.com/research/whisper).

Joshua Wilbur - ELAN for Beginners

This workshop provides an introduction to ELAN (tla.mpi.nl/tools/tla-tools/elan/), a tool used to annotate multi-media recordings. It is intended for people who have little or no experience with ELAN. We will cover terminology as well as theoretical and practical considerations necessary to set up your own set of ELAN files for a linguistics project, thus enabling you to create your own ELAN corpus of linguistically annotated audio/video recordings. In addition to completing practice exercises, participants will create their own initial ELAN files. We will also consider the strengths and weaknesses of using ELAN as a corpus search tool.

Evening social event: Guided tour of Tartu

In the evening of day 1, we will gather for a social exploration of Tartu city centre from 18:00 to 19:30. Our guide will take us on a tour through Tartu city streets and history. You will have a chance to get to know each other, enjoy the sights and perhaps have a pint after the tour. Meet in front of the university main building at Ülikooli 18.

Attire: casual

Tuesday, 20 June

Anke Lüdeling - Corpora and Variation: Concepts, Options and Challenges

This workshop is concerned with different types of variation and how to analyse them. Many corpus studies rely on annotated corpora. In most annotation tasks we find cases that are difficult to decide, and the more interesting a linguistic problem is, the more difficult a decision may be. Rather than viewing such cases merely as difficult annotation tasks, we will work on topics like variation, concept building (tagsets), research questions, and modelling. We will distinguish between different types of problems, such as (a) unclear research questions, (b) unclear/vague theories, (c) pre-trained annotation procedures with unsuitable parameter sets, (d) genuine ambiguity, etc., and discuss what can be learned from each of these cases. Students are encouraged to bring their own research questions and examples.

Doğuş Öksüz - Tracking the development of multi-word and multi-morphemic expressions in learner language

This workshop is concerned with tracking second language learners’ development in the use of multi-word units such as collocations, binomials, lexical bundles and multi-morphemic expressions. In this workshop we will critically analyse corpus-based association measures, specifically focusing on phrasal frequencies and commonly used measures of collocation strength such as mutual information, Log Dice, Delta P, and lexical gravity. We will examine the extent to which learners’ proficiency levels affect their use of multi-word units through the lens of the above-mentioned measures of association. We will then explore the similarities and differences between multi-word units in morphologically isolating languages like English and multi-morphemic units in morphologically rich (i.e., agglutinating) languages like Turkish and Estonian.
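
As a rough illustration of three of the measures named above, the following Python sketch computes mutual information, Log Dice and Delta P from raw corpus counts. It uses the standard textbook formulas, not necessarily the exact variants used in the workshop; the function name, argument names and counts are hypothetical, and lexical gravity is left out.

```python
import math


def association_measures(f_xy, f_x, f_y, n):
    """Return simple association scores for a word pair (x, y).

    f_xy: how often x and y occur together
    f_x:  total frequency of x
    f_y:  total frequency of y
    n:    corpus size (number of tokens or word pairs)
    All names here are illustrative assumptions, not workshop code.
    """
    # Mutual information: log2 of observed over expected co-occurrence.
    mi = math.log2((f_xy * n) / (f_x * f_y))
    # Log Dice: 14 + log2 of the Dice coefficient.
    log_dice = 14 + math.log2(2 * f_xy / (f_x + f_y))
    # Delta P (y given x): P(y | x) minus P(y | not x); it is directional.
    delta_p = f_xy / f_x - (f_y - f_xy) / (n - f_x)
    return {"mi": mi, "log_dice": log_dice, "delta_p_y_given_x": delta_p}


# Made-up counts, for illustration only.
scores = association_measures(f_xy=30, f_x=100, f_y=200, n=100_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Mutual information and Log Dice are symmetric, whereas Delta P depends on which word is treated as the cue, so swapping f_x and f_y gives a different Delta P score.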

Andres Karjus - Descriptive Stats and Data Visualisation

This workshop introduces techniques for exploring and manipulating linguistic and other data using R and in particular the tidyverse packages, including ggplot2 for visualization, and additional packages like plotly for producing interactive graphs. The workshop also integrates ChatGPT as a coding assistant to expedite learning. Basic familiarity with R is expected, but beginners are otherwise very welcome.

Evening social event: Summer school reception

In the evening of day 2, we will meet for socialising and a light reception. The reception will take place from 18:30 until 21:00 at Ülikooli Kohvik (located at Ülikooli 20). It's a great opportunity for networking and getting to know each other further.

Attire: smart casual

Wednesday, 21 June

Amanda Potts - Identity Analysis in Sketch Engine: Basics

In this two-part workshop, participants will be introduced to the web-based corpus analysis tool Sketch Engine. Sketch Engine is a powerful tool that allows users to upload their own corpora in nearly any language and applies advanced part-of-speech tagging. In Part 1 of this workshop, participants will be introduced to the fundamentals of Sketch Engine, uploading their own data and applying corpus linguistic methods. In Part 2 of this workshop, participants will explore more advanced resources, including the distinctive Word Sketch feature, which makes use of part-of-speech tags and collocation to visualise the grammatical ‘behaviour’ of a lemma in a given corpus. By the end of the workshop, participants will be able to perform frequency, concordance, collocation, and keyness analysis in Sketch Engine using their own data. They will be able to describe discourses and representations of social actors and/or phenomena within the corpus (for instance, by comparing alternative phrasing) and in relation to other contexts (i.e. in comparison to reference corpora).

Peeter Tinits - Using Newspapers in Estonia for text analytics

A large share of Estonian historical newspapers have been digitised and made available for research (roughly 25%). This can be a powerful resource for linguists as well as for historians, literary scholars, and social scientists. The workshop will demonstrate resources available for text analytics on historical newspaper texts, particularly those offered by the National Library of Estonia. It will cover: 1) how the materials can be accessed (via a JupyterLab environment and otherwise); 2) what tools and helpful visualisations are available to plan your study; and 3) simple techniques to analyse historical texts based on keyword searches, frequency analysis, and co-occurrence patterns. Historical digitised newspapers bring a few extra technical difficulties: 1) technical errors made in digitisation (e.g. OCR errors), 2) variation in language use, and 3) imbalance in the datasets. These will be discussed and a few solutions offered. The workshop will take 1.5 h + 1.5 h. The workshop code will be in R, and knowledge of R will be useful. However, at a basic level, changing a few parameters in the provided code is also possible without prior training.

Satu Saalasti - Using CLARIN resources for corpus linguistics

The main goal of the workshop is to introduce participants to CLARIN, the research infrastructure for language as social and cultural data. The workshop will present an overview of CLARIN and of how its resources support corpus-linguistics-based research all over Europe. After the overview, participants will be able to familiarize themselves with the CLARIN infrastructure through a few hands-on learning tasks. The second part of the workshop presents naturalistic neuroscience methods that combine natural language processing and imaging to study the brain basis of meaning.

Thursday, 22 June

Peter Uhrig - A workflow for Multimodal Corpus Research

In this workshop, participants will learn step by step how to carry out their own multimodal corpus study based on the NewsScape 2016 corpus, a collection of more than 30,000 hours of American TV News. The workshop will start with a discussion of the types of research questions that might be addressed with such a corpus approach, followed by a hands-on session introducing CQPweb and the Rapid Annotator. Students should bring a laptop with a working Internet connection and ideally headphones/earphones they can connect to their computer. Students are invited to send ideas for potential research questions via email before the workshop.

Anita Slonimska - Annotation and coding of multimodal corpora in ELAN

In this workshop you will learn the basics of how to use ELAN, a free annotation tool, for coding and annotating multimodal communication corpora. The workshop is divided into three blocks: theoretical foundations of gesture, ELAN tutorial, and hands-on practice. Participants will first gain theoretical knowledge about different types of gestures, their structure, their interaction with speech and their role in discourse. We will then use this theoretical foundation in a step-by-step ELAN tutorial in order to learn how to create and structure annotation tiers, segment and code gestures, and use the coding for analysing and visualising data. In the final part of the workshop, you will engage in hands-on practice, applying your newly gained theoretical and practical knowledge. By the end of the workshop, you will be equipped with the skills that will enable you to conduct your own research on multimodal communication.

James Trujillo - Bringing together manual coding and motion-tracking for advanced analysis of multimodal communication

Analyses of corpus data often end up either taking a qualitative, manually coded approach or relying on automated methods and computer vision to extract and summarize data. However, manual coding and computer-vision-based approaches can be highly complementary and work very well together. In this workshop, I will provide an introduction to using manual coding to focus and inform automated methods, which in turn can provide a rich method of analysis. Specifically, we will cover 1) easy-to-use automated movement detection to speed up manual coding of visual signals, 2) automatically extracting movement data using manual annotations, and 3) quantifying the temporal relationship between visual and linguistic or acoustic signals. The workshop will include tutorial-like walkthroughs using open code and materials, as well as open discussions of current issues and future directions.

Petar Milin - Multi-level / mixed-effect models

Over the past 15 years, multilevel or mixed-effect statistical modelling has evolved from being the "new kid on the block" to becoming the gold standard for data analysis in the language sciences. With advances in computational implementations and researchers' growing confidence, the use of these models has grown significantly and rapidly. In this workshop, we aim to revisit the fundamental principles and explore the essential requirements associated with these models. To facilitate understanding, we will provide practical examples that demonstrate their application.

Friday, 23 June

Afternoon social event: Bog hike

The hike takes place at Selli-Sillaotsa. The 4.3 km route follows a dirt trail in the woods, extensive boardwalks across the bog and a short stretch along a gravel road.

14:00 - bus leaves from Jakobi 1

16:30 - bus returns to Tartu

Attire: sporty or casual. Mosquitoes and other insects can sometimes be really annoying, so long sleeves and long trousers might be preferable; also consider using insect repellent.

Evening social event: Midsummer celebration

For centuries, the Midsummer holiday has brought family members of all ages together to have fun. At Midsummer celebrations, you can listen to good music, enjoy the midsummer bonfire, games and dances.

The event will take place at Raadi Park and is organised by the city of Tartu and the Estonian National Museum.

19:30 - Meet up by the huge #TARTU2024 sign on Raekoja plats to go there together

20:00 - everyone is welcome to the shore of Lake Raadi, where the band Svjata Vatra will start with a concert

21:00 - the victory fire arrives at the party site and Tartu city fire is lit

21:30 - live music by Svjata Vatra, Legend and Nedsaja Village Band continues

Attire: casual