THE GENERAL INTERNET CORPUS OF RUSSIAN AND THE NOTION OF REPRESENTATIVENESS IN CORPUS LINGUISTICS
Piperski A.C.
The present article deals with the notion of representativeness in corpus linguistics. There are no exact methods for assessing representativeness, so the representativeness of a corpus is nothing more than a tacit agreement between its creators and its users. The General Internet Corpus of Russian (GICR), which is currently under development, tries to make such an agreement explicit. It encourages its users to study register variation in the Russian language of the Internet. The linguistic community will be able to use a research tool to study different segments of the Web and to create subcorpora using automatically extracted metadata. As of June 2013, GICR contains two segments of the Russian Web: the blog platform LiveJournal.com and the “Magazine Reading Room” (http://magazines.russ.ru/). More segments will be added soon.
Bibliographic reference
Piperski A.C. The General Internet Corpus of Russian and the Notion of Representativeness in Corpus Linguistics // Научное обозрение. Физико-математические науки. 2020. No. 1. P. 47-48; URL: https://physics-mathematics.ru/en/article/view?id=63 (accessed: 24.06.2026).