{"id":9876,"date":"2023-12-05T13:55:11","date_gmt":"2023-12-05T12:55:11","guid":{"rendered":"https:\/\/dev.test-wpo.pl\/?post_type=projekty-nib&#038;p=9876"},"modified":"2024-02-29T12:32:22","modified_gmt":"2024-02-29T11:32:22","slug":"rozbudowa-elektronicznego-korpusu-tekstow-polskich-xvii-i-xviii-w-i-jego-integracja-z-elektronicznym-slownikiem-jezyka-polskiego-xvii-i-xviii-w","status":"publish","type":"projekty-nib","link":"https:\/\/ijppan.pl\/en\/projekty-nib\/rozbudowa-elektronicznego-korpusu-tekstow-polskich-xvii-i-xviii-w-i-jego-integracja-z-elektronicznym-slownikiem-jezyka-polskiego-xvii-i-xviii-w\/","title":{"rendered":"THE EXTENDING OF THE ELECTRONIC CORPUS OF THE 17TH- AND 18TH-CENTURY POLISH TEXTS AND ITS INTEGRATION WITH THE ELECTRONIC DICTIONARY OF THE 17TH- AND 18TH-CENTURY POLISH"},"content":{"rendered":"<p><strong>Principal investigator:<\/strong> Prof. W\u0142odzimierz Gruszczy\u0144ski<\/p>\n<p><strong>Contractors<\/strong><\/p>\n<p><strong>Employees of the Institute of Polish Language, PAS<\/strong>:\u00a0 Dorota Adamiec, Bart\u0142omiej Borek, Renata Bronikowska, Mirella Gliwi\u0144ska, Katarzyna Kry\u0144ska, Magdalena Majdak, Jagoda Marsza\u0142ek, Wies\u0142aw Morawski, Ewa Rodek, Aleksandra Wieczorek<\/p>\n<p><strong>Employees of the Institute of Computer Science, PAS<\/strong>: Tomasz Bartosiak, Witold Kiera\u015b, Dorota Komosi\u0144ska, Bart\u0142omiej Nito\u0144, Maciej Ogrodniczuk, Marcin Woli\u0144ski, Alina Wr\u00f3blewska<\/p>\n<p><strong>Other contractors<\/strong>: Magdalena Awianowicz, Halina Bedeniczuk, Joanna Bili\u0144ska-Brynk, Alina Borsewicz, Marta Chomaniuk, Anna Dzier\u017cawska, Zbigniew Gaw\u0142owicz, Micha\u0142 Godlewski, Norbert Go\u0142dys, Artur Goszczy\u0144ski, Bo\u017cena Itoya, Klaudia Jovanovska, Hanna Jurczyk, Ilona Jurkiewicz-Bucha\u0142a, Ewa Karasi\u0144ska-Gajo, Kacper Kardas, Agnieszka Kirsztejn, Ludwika Klejnowska, Joanna Koc, Magdalena Ko\u0142odziejczyk, Bartosz Kossakowski, Matylda Koz\u0142owska, 
Grzegorz Kulesza, Weronika Lachowicz, Ma\u0142gorzata Maciejewska, Agnieszka Ma\u0142ochleb, Emanuel Modrzejewski, Aleksandra Opali\u0144ska, Ewa Oranowska-Wr\u00f3bel, Natalia Owsianka, Ma\u0142gorzata Pachulska, Katarzyna P\u0142o\u0144ska, Paulina Rosalska, Martyna Saba\u0142a-Bolek, Andrea Smolarz, Katarzyna Stankiewicz, Olga Stolarczyk, Jacek Stwora, Monika Szafra\u0144ska, Agnieszka Szuli\u0144ska, Bartosz Szyma\u0144ski, Renata \u015ali\u017c, Klaudia Wieczorek, Micha\u0142 Wieczorek, Patrycja Wojtasik, Krzysztof Wr\u00f3bel, Maciej Zboch, Mateusz \u017b\u00f3\u0142tak<\/p>\n<p><strong>Project number:<\/strong> 11H 180413 86<br \/>\n<strong>Start date:<\/strong> 2018-12-06<br \/>\n<strong>End date:<\/strong> 2023-12-05<br \/>\n<strong>Funding entity:<\/strong> MEiN \u2013 NPRH 86<\/p>\n<h2>PROJECT DESCRIPTION<\/h2>\n<p>The aim of the project was to continue work on the Electronic Corpus of the 17<sup>th<\/sup>&#8211; and 18<sup>th<\/sup>-century Polish Texts (until 1772) created in 2013-2018. The corpus, initially numbering 13.5 million segments and including texts from the Baroque period (hence the short name KorBa for \u201cBaroque Corpus\u201d), was expanded to include texts from the late 18<sup>th<\/sup> century belonging to the Enlightenment period. Since these cultural trends have left a clear mark on the language, the new version of KorBa comprises two subcorpora that can be searched separately: Baroque (1601-1740) and Enlightenment (1741-1800). New texts from the 17<sup>th<\/sup> and early 18<sup>th<\/sup> centuries have also been added to the corpus, selected to ensure greater chronological, geographical, genre and thematic balance. In total, the new KorBa contains nearly 27 million tokens from more than 2,000 texts. 
An experimental syntactically annotated corpus consisting of 1,000 sentences has also been developed.<\/p>\n<p>The new version of KorBa has been built using two new tools based on neural networks: a transcriber, designed to automatically convert transliterated text into modern spelling, and the KFTT tagging system, comprising a tokenizer, a morphosyntactic tagger and a lemmatizer. Thanks to its neural architecture, KFTT handles the less common tokenization and spelling found in historical texts with high accuracy. These technologies reduced the number of errors occurring during data processing and thus increased the reliability of the results.<\/p>\n<p>The project also included the integration of four sources for research on the Polish language of the 17<sup>th<\/sup> and 18<sup>th<\/sup> centuries: KorBa, the Electronic Dictionary of the 17<sup>th<\/sup>&#8211; and 18<sup>th<\/sup>-century Polish (e-SXVII), the Digital Library of Polish and Poland-Related News Pamphlets from the 16<sup>th<\/sup> to the 18<sup>th<\/sup> Century (CBDU) and the Card-index of the Dictionary of the Polish Language of the 17<sup>th<\/sup> and First Half of the 18<sup>th<\/sup> Century (KXVII). For this purpose, a dedicated website, Polish Language of the 17<sup>th<\/sup> and 18<sup>th<\/sup> Centuries (<a href=\"https:\/\/polszczyzna17-18.ijppan.pl\/\">https:\/\/polszczyzna17-18.ijppan.pl<\/a>), has been developed; it allows these resources to be searched simultaneously. In addition, purpose-specific connections between individual resources have been created. For instance, the connections between the KorBa and e-SXVII websites make it easier for dictionary editors to use the corpus. The links between CBDU and e-SXVII make it possible to explain archaic words appearing in CBDU texts by referring to the appropriate e-SXVII entries. 
All these connections are dynamic, which means that data is retrieved each time from the current database of the resource concerned.<\/p>\n<h2>LIST OF PUBLICATIONS RELATED TO THE PROJECT<\/h2>\n<p>Bili\u0144ska-Brynk, J., Rodek, E., <em>Paper Quotation Slips to the Electronic Dictionary of the 17th- and 18th-Century Polish \u2013 Digital Index and its Integration with the Dictionary<\/em>, [in:] Gavriilidou, Z., Mitsiaki, M., Fliatouras, A. (eds.) <em>Proceedings of the XIX EURALEX Congress: Lexicography for Inclusion<\/em>, vol. I, Democritus University of Thrace (2020), pp. 465-470.<\/p>\n<p>Bronikowska, R., Kry\u0144ska, K., <em>\u0141acina w KorBie. U\u017cyteczno\u015b\u0107 Elektronicznego Korpusu Tekst\u00f3w Polskich XVII i XVIII Wieku dla filologa neolatynisty<\/em>, <em>\u201c<\/em>Polonica<em>\u201d<\/em> XL (2020), pp. 123-135.<\/p>\n<p>Bronikowska, R., Majdak, M., Wieczorek, A., \u017b\u00f3\u0142tak, M., <em>The Electronic Dictionary of the 17th- and 18th-century Polish &#8211; towards the open formula asset of the historical vocabulary<\/em>, [in:] Gavriilidou, Z., Mitsiaki, M., Fliatouras, A. (eds.) <em>Proceedings of the XIX EURALEX Congress: Lexicography for Inclusion<\/em>, vol. I, Democritus University of Thrace (2020), pp. 471-475.<\/p>\n<p>Bronikowska, R., <em>Predykatywne u\u017cycia przymiotnik\u00f3w w rodzaju \u017ce\u0144skim w dawnej polszczy\u017anie \u2013 semantyczna charakterystyka na podstawie danych korpusowych<\/em>, <em>\u201c<\/em>Prace Filologiczne\u201d 76, 2021, pp. 49-65. <a href=\"https:\/\/doi.org\/10.32798\/pf.869\">https:\/\/doi.org\/10.32798\/pf.869<\/a> .<\/p>\n<p>Bronikowska, R., <em>Unfinished \u201cverbization\u201d process \u2013 the development of predicative constructions with an adjective of the feminine gender in the 17th and 18th centuries in the light of corpus data<\/em>, <em>\u201c<\/em>Polonica\u201d, 41(1), 2021, pp. 97-110. 
<a href=\"https:\/\/doi.org\/10.17651\/POLON.41.7\">https:\/\/doi.org\/10.17651\/POLON.41.7<\/a>.<\/p>\n<p>Bronikowska, R., <em>Verbification of feminine forms of adjectives mo\u017cna \u2018possible\u2019, niemo\u017cna \u2018impossible\u2019 and niepodobna \u2018impossible\u2019 \u2013 corpus-based approach<\/em>, \u201cJazykovedn\u00fd \u010casopis\u201d, vol. 74(1) (2023), pp. 9-18. <a href=\"https:\/\/www.juls.savba.sk\/ediela\/jc\/2023\/1\/jc23-01.pdf\">https:\/\/www.juls.savba.sk\/ediela\/jc\/2023\/1\/jc23-01.pdf<\/a><\/p>\n<p>Gruszczy\u0144ski, W., Adamiec, D., Bronikowska, R., Kiera\u015b, W., Modrzejewski, E., Wieczorek, A. and Woli\u0144ski, M., <em>The Electronic Corpus of 17th- and 18th-century Polish Texts<\/em>, <em>\u201c<\/em>Language Resources and Evaluation\u201d vol. 56, issue 1, 2021, pp. 309-332. <a href=\"https:\/\/link.springer.com\/article\/10.1007%2Fs10579-021-09549-1\">https:\/\/link.springer.com\/article\/10.1007%2Fs10579-021-09549-1<\/a><\/p>\n<p>Gruszczy\u0144ski, W., Adamiec, D., Bronikowska, R., Wieczorek, A., <em>Elektroniczny Korpus Tekst\u00f3w Polskich z XVII i XVIII w. \u2013 problemy teoretyczne i warsztatowe<\/em>, <em>\u201c<\/em>Poradnik J\u0119zykowy\u201d 8 (2020), pp. 32\u201351.<\/p>\n<p>Gruszczy\u0144ski, W., Adamiec, D., Majdak, M., <em>Barokowa polszczyzna w internecie, czyli Elektroniczny s\u0142ownik j\u0119zyka polskiego XVII i XVIII wieku<\/em>, \u201cLingVaria\u201d 1 (2023), pp. 113\u2013124. <a href=\"https:\/\/doi.org\/10.12797\/LV.18.2023.35.08\">https:\/\/doi.org\/10.12797\/LV.18.2023.35.08<\/a><\/p>\n<p>Majdak, M., <em>Keywords in religious literature of 17th and 18th centuries in light of the data from the Electronic Corpus of 17th- and 18th-century Polish Texts<\/em>, \u201cJazykovedn\u00fd \u010casopis\u201d, vol. 74(1) (2023), pp. 100-107. 
<a href=\"https:\/\/www.juls.savba.sk\/ediela\/jc\/2023\/1\/jc23-01.pdf\">https:\/\/www.juls.savba.sk\/ediela\/jc\/2023\/1\/jc23-01.pdf<\/a><\/p>\n<p>Majdak, M., <em>Znaczenia wyrazu <\/em>g\u0142os<em> w s\u0142ownikach i tekstach<\/em>, [in:] M. Majdak <em>\u201c<\/em>G\u0142os. Studium leksykograficzne\u201d series Prace Instytutu J\u0119zyka Polskiego PAN 153, Krak\u00f3w 2019, pp. 50-148.<\/p>\n<p>Ogrodniczuk, M., Gruszczy\u0144ski, W., <em>Connecting Data for Digital Libraries: The Library, the Dictionary and the Corpus<\/em> (in:) <em>Jatowt A., Maeda A., Syn S. (eds.) Digital Libraries at the Crossroads of Digital Information for the Future. ICADL 2019. Lecture Notes in Computer Science<\/em>, vol. 11853. Springer, Cham (2019), pp. 125-138.<\/p>\n<p>Ogrodniczuk, M., Gruszczy\u0144ski, W., <em>Wikipedia-Based Entity Linking for the Digital Library of Polish and Poland-Related News Pamphlets.<\/em> [in:] Ishita E., Pang N.L.S., Zhou L. (eds.) Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol. 12504. Springer, Cham (2020), pp. 81-88.<\/p>\n<p>Ogrodniczuk, M., Kry\u0144ska, K., <em>Evaluating Machine Translation of\u00a0Latin Interjections in\u00a0the\u00a0Digital Library of\u00a0Polish and\u00a0Poland-related News Pamphlets<\/em>, [in:] Tseng, YH., Katsurai, M., Nguyen, H.N. (eds.) From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries. ICADL 2022. Lecture Notes in Computer Science, vol. 13636. Springer, Cham. <a href=\"https:\/\/doi.org\/10.1007\/978-3-031-21756-2_34\">https:\/\/doi.org\/10.1007\/978-3-031-21756-2_34<\/a><\/p>\n<p>Rodek, E., <em>Rzeczowniki \u017ce\u0144skoosobowe zako\u0144czone na -yni\/ -ini, -ica, -iczka, -aczka, -anka, -arka w XVII i\u00a0XVIII wieku (na materiale z Elektronicznego Korpusu Tekst\u00f3w Polskich z XVII i XVIII w.)<\/em>, <em>\u201c<\/em>Prace Filologiczne\u201d 78 (2023), pp. 337-358. 
<a href=\"https:\/\/wuw.pl\/data\/include\/cms\/Prace_Filologiczne_2023_78.pdf\">https:\/\/wuw.pl\/data\/include\/cms\/\/Prace_Filologiczne_2023_78.pdf<\/a><\/p>\n<p>Wieczorek, A., <em>Integracja Elektronicznego s\u0142ownika j\u0119zyka polskiego XVII i XVIII wieku i Elektronicznego Korpusu Tekst\u00f3w Polskich z XVII i XVIII Wieku okiem u\u017cytkownika i redaktora<\/em>, [in:] E. Hory\u0144, E. M\u0142ynarczyk and P. \u017bmigrodzki (eds.) <em>J\u0119zyk polski \u2013 mi\u0119dzy tradycj\u0105 a wsp\u00f3\u0142czesno\u015bci\u0105. Ksi\u0119ga jubileuszowa z okazji stulecia Towarzystwa Mi\u0142o\u015bnik\u00f3w J\u0119zyka Polskiego<\/em>, Krak\u00f3w 2021, pp. 547\u2013560.<\/p>\n<h2>USEFUL WEBSITES<\/h2>\n<p><a href=\"https:\/\/korba.edu.pl\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/korba.edu.pl<br \/>\n<\/a><a href=\"https:\/\/sxvii.pl\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/sxvii.pl<\/a><br \/>\n<a href=\"https:\/\/www.rcin.org.pl\/dlibra\/publication\/20029\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.rcin.org.pl\/dlibra\/publication\/20029<\/a><br \/>\n<a href=\"https:\/\/cbdu.ijppan.pl\">https:\/\/cbdu.ijppan.pl<\/a><br \/>\n<a 
href=\"https:\/\/polszczyzna17-18.ijppan.pl\">https:\/\/polszczyzna17-18.ijppan.pl<\/a><\/p>\n<p>&nbsp;<\/p>","protected":false},"featured_media":0,"template":"","projekty":[24,23],"class_list":["post-9876","projekty-nib","type-projekty-nib","status-publish","hentry","projekty-projekty","projekty-projekty-zrealizowane"],"acf":[],"_links":{"self":[{"href":"https:\/\/ijppan.pl\/en\/wp-json\/wp\/v2\/projekty-nib\/9876","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ijppan.pl\/en\/wp-json\/wp\/v2\/projekty-nib"}],"about":[{"href":"https:\/\/ijppan.pl\/en\/wp-json\/wp\/v2\/types\/projekty-nib"}],"wp:attachment":[{"href":"https:\/\/ijppan.pl\/en\/wp-json\/wp\/v2\/media?parent=9876"}],"wp:term":[{"taxonomy":"projekty","embeddable":true,"href":"https:\/\/ijppan.pl\/en\/wp-json\/wp\/v2\/projekty?post=9876"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}