Lectures on Digital Humanities

The ninth lecture

The risks and rewards of quantitative lexical analysis of literary texts 

Dr. hab. Jan Rybicki 
7 June 2019, 10 am
Instytut Informacji Naukowej i Bibliotekoznawstwa UWr,
pl. Uniwersytecki 9/13
Room 104


Quantitative analysis of the vocabulary of literary texts is a way to discern the stylometric features of individual writers, and thus allows authorship attribution or plagiarism detection. Applying the same methods to texts of known authorship also helps to bring together stylometry and distant reading, at times highlighting interesting connections between texts that are not visible in traditional qualitative analysis. Particularly interesting results are obtained by counting very simple, “mechanical” features of texts, such as the frequencies of very frequent words, and examining them through multivariate statistical analysis. However, attempts to venture into the semantic level of the text, even ones as simple as sentiment analysis, yield much less clear-cut results.

Jan Rybicki is an Assistant Professor at the Institute of English Studies, Jagiellonian University in Kraków, Poland. He has written extensively on the application of quantitative methods in the study of literature, tracing the stylometric signals of authors, translators, genres and genders in literary texts in several languages. Together with Maciej Eder and Mike Kestemont, he is a co-author of the “stylo” package for R, which has become a well-known tool for stylometric analysis. He is also an active literary translator and has translated some 30 novels from English into Polish, by such authors as John le Carré, Kazuo Ishiguro and William Golding.


