Resources

On this page we collect external resources that may be useful to early-career researchers!

Online corpora and language databases

These are links to several language repositories and corpora. Most of them are freely accessible; for others you need special permission.

  • The Language Archive
    This is an archive of data on many languages, including many endangered ones. The archive is hosted at the Max Planck Institute for Psycholinguistics.
  • Universal Dependencies
    Universal Dependencies (UD) is a framework for cross-linguistically consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies). A treebank is a collection of sentences with morphological and syntactic annotations. UD treebanks are available online for over 100 languages.
  • The SpeechReporting Corpus
    The SpeechReporting Corpus is a collection of corpora of traditional folk stories, annotated for a number of discourse phenomena using the ELAN-CorpA software and tools (Chanard 2015; Nikitina et al. 2019). It is updated regularly with newly available data, including data from new languages. All texts are transcribed, glossed, translated, and annotated.
     
  • Pangloss Collection
    The Pangloss Collection is an archive of fieldwork data from CNRS-affiliated research. It is developed by CNRS-LACITO.
     
  • OPUS
    A collection of parallel translated texts in multiple languages.
     
  • DELAMAN
    DELAMAN stands for Digital Endangered Languages and Musics Archives Network. It is an international network of archives of data on linguistic and cultural diversity, in particular on small languages and cultures under pressure.

  • Bambara Reference Corpus
    This Sketch Engine corpus contains texts of the Mande language Bambara. It contains about 1 million words and was built by Valentin Vydrin, Kirill Maslinsky, Jean Jacques Méric and Andrij Rovenchak.

  • TalkBank
    TalkBank is a project that contains many different language corpora, including CHILDES, a collection of child language corpora.

Blogs and popular science websites

Here you can find links to interesting blogs and popular science websites.

  • MPI TalkLing
    This is the blog of the Max Planck Institute for Psycholinguistics.

  • Novaator (Estonian)
    Novaator is an Estonian popular science website.

  • NEMO Kennislink (Dutch)
    NEMO Kennislink is a Dutch popular science website.

Open Science resources

OSF (Open Science Framework) is an open platform for sharing your data, materials, and scripts!

Here you can read about the importance of pre-registration and find useful resources about it: Preregistration (cos.io)