Individual Submission Summary

Parallel Corpora with Estonian, Latvian, and Lithuanian Texts in the Russian National Corpus

Sun, June 3, 8:15 to 9:45am, History Corner (450 Serra Mall, Building 200), 015


Texts in Estonian, Latvian, and Lithuanian have been extensively translated from and into Russian for a number of historical reasons. The body of translations of culturally significant works spans a wide range of historical periods and is fairly representative of the corresponding literatures. In our talk, we discuss how texts in the state languages of the three Baltic countries have been integrated into the parallel subcorpus of the Russian National Corpus (RNC).
In parallel corpora, the original text is aligned sentence by sentence with its translation. In addition, the Estonian and Latvian texts in the RNC are lemmatized and morphologically annotated. The online search interface supports the analysis of how particular lexemes, constructions, and grammatical categories are used. The Lithuanian-Russian corpus is currently under construction; its launch is planned for 2018, so that all three state languages of the Baltic republics are represented by the 100th anniversary of independence.
The choice of texts is primarily determined by the availability of their translations. Many of them were written by the most prominent authors of their periods; the selection includes the earliest works written in the Russian Empire, texts from the interwar independence period, Soviet-era texts, texts written in exile, and texts from the post-Soviet independence period.
The parallel corpora within the RNC are a valuable tool not only for linguistic studies, but also for learners of these languages and for translators.
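To illustrate the idea of sentence-level alignment combined with lemmatization, here is a minimal Python sketch of a lemma search over aligned sentence pairs. It is not the RNC's actual data model or search engine: the names `Token`, `AlignedPair`, `CORPUS`, and `search_by_lemma` are hypothetical, and the Latvian-Russian sentences are invented toy examples rather than corpus data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str   # surface word form as it appears in the text
    lemma: str  # dictionary form assigned by lemmatization


@dataclass(frozen=True)
class AlignedPair:
    source: tuple[Token, ...]  # annotated sentence in the original language
    target: str                # aligned translation, kept as plain text


# Invented toy sentences (assumption: not taken from the RNC).
CORPUS = [
    AlignedPair(
        (Token("Es", "es"), Token("lasu", "lasīt"), Token("grāmatu", "grāmata")),
        "Я читаю книгу.",
    ),
    AlignedPair(
        (Token("Māsa", "māsa"), Token("lasa", "lasīt"), Token("avīzi", "avīze")),
        "Сестра читает газету.",
    ),
    AlignedPair(
        (Token("Brālis", "brālis"), Token("guļ", "gulēt")),
        "Брат спит.",
    ),
]


def search_by_lemma(corpus: list[AlignedPair], lemma: str) -> list[AlignedPair]:
    """Return every aligned pair whose source sentence contains the lemma."""
    wanted = lemma.lower()
    return [
        pair
        for pair in corpus
        if any(tok.lemma.lower() == wanted for tok in pair.source)
    ]


if __name__ == "__main__":
    # Searching by lemma finds both inflected forms, "lasu" and "lasa".
    for pair in search_by_lemma(CORPUS, "lasīt"):
        print(" ".join(tok.form for tok in pair.source), "|", pair.target)
```

Because the search is keyed on lemmas rather than surface forms, a single query retrieves every inflected form of a word together with its aligned translation, which is the kind of lookup that lemmatization makes possible in a parallel corpus.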

RNC: (Estonian-Russian), (Latvian-Russian)

Short Bio

Natalia Perkova, Master of Linguistics (General Linguistics, Saint Petersburg State University, 2011), currently a PhD candidate in linguistics at Stockholm University

PhD thesis "Presuppositional comitatives in the Circum-Baltic area"

Research interests:
● Circum-Baltic languages, most particularly Latvian;
● corpus linguistics, parallel corpora;
● linguistic typology.

Selected papers:
Perkova, Natalia & Sitchinava, Dmitri. 2016. On the Development of a Latvian-Russian Parallel Corpus. In: Inguna Skadiņa, Roberts Rozis (eds.). Human Language Technologies - The Baltic Perspective: Proceedings of the Seventh International Conference Baltic HLT 2016. Amsterdam: IOS Press. 130-135.
Perkova, Natalia. 2016. Latyšskie komitativnye konstrukcii v areal’noj perspektive. [Latvian comitative constructions from the perspective of areal linguistics]. Acta Linguistica Petropolitana XII (1): 186-192.
Perkova, Natalia. 2015. Adjectives of temperature in Latvian. In: Koptjevskaja-Tamm, Maria (ed.), The Linguistics of Temperature. Amsterdam: Benjamins. 216-253.
Perkova, Natalia. 2014. S soboj v russkom jazyke: komitativnye konstrukcii kauzacii peremeščenija i ih svojstva [С собой in Russian: comitative constructions of caused motion and their properties]. Acta Linguistica Petropolitana X (2): 157-179.
Perkova, Natalia. 2012. Review of Aurelija Usonienė, Nicole Nau, Ineta Dabašinskienė (eds.), Multiple Perspectives in Linguistic Research on Baltic Languages. Newcastle upon Tyne: Cambridge Scholars Publishing. Baltic Linguistics 3: 179-192.

Dmitri Sitchinava, PhD in linguistics (2005), senior researcher, V.V. Vinogradov Russian Language Institute of the Russian Academy of Sciences; Associate Professor, National Research University Higher School of Economics

Research interests:
● Russian and other Slavic languages;
● corpus linguistics, parallel corpora;
● linguistic typology;
● poetics and metrics.

The most complete list of papers available at: