International Journal On Advances in Software, volume 16, numbers 1 and 2, 2023


Pattern Discovery and Stylometric Analysis in English Literature and Literary Translation Through State Integration in Markovian Representations

Authors:
Clement Leung
Chenjie Zeng

Keywords: Victorian novels; English poems; multi-step Markov chain; Shakespearean plays; Brontës; sparse matrix

Abstract:
In analysing English literary work, the main aims are to determine authorship, period, style, motif, and purpose. Here, the evaluation proceeds in two steps: first, place known literary works in a machine learning model and discover their patterns and styles; second, compare the corresponding metrics with those of an unknown literary work. Since obtaining such knowledge from human experts is laborious and highly subjective, we adopt a data analysis method based on extensions of Markovian representations, which can be generalised to more versatile descriptions as the context develops. In particular, we consider the simple Markovian model and more elaborate generalisations that aim to remove the limitations imposed by the memoryless property of the basic Markovian representation. The first generalisation extends the state space by using the Cartesian product to form a composite state space, while the second exploits the stanza structure to integrate the states. The first approach can incorporate arbitrarily long sequences of time steps but leads to a high-dimensional transition matrix. In contrast, the second, preferable approach yields a transition matrix of relatively small dimension, which is computationally much more efficient. In addition, the latter approach lends itself to further state integration by judiciously analysing the purpose of each line of a passage, and provides scope for analysing much larger corpora. Through the appropriate use of generalised Markovian representations, examination of the pattern of probability entries in the transition matrix, and application of this characterisation to the vast body of English literature, more scientific, objective, and reliable decisions can be reached concerning authorship, writing style, and other literary qualities.
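
As a rough illustration of the first generalisation described in the abstract, the sketch below estimates transition probabilities over a composite state space in which each state is a tuple of consecutive symbols. It is not the authors' implementation: the function name `transition_matrix`, the `order` parameter, and the toy label sequence are all assumptions made for this example, and the abstract does not specify how states are defined in the paper.

```python
from collections import Counter, defaultdict


def transition_matrix(sequence, order=1):
    """Estimate transition probabilities for a chain whose states are
    tuples of `order` consecutive symbols (a composite state space).

    Hypothetical helper for illustration only; returns a sparse
    dict-of-dicts rather than a dense matrix.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    # Composite states: sliding windows of length `order` over the sequence.
    states = [
        tuple(sequence[i:i + order])
        for i in range(len(sequence) - order + 1)
    ]
    # Count observed transitions between successive composite states.
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalise each row of counts into probabilities.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Toy sequence of state labels (assumed, e.g. one label per line of text).
labels = list("ABABABCC")
first = transition_matrix(labels, order=1)   # simple Markovian model
second = transition_matrix(labels, order=2)  # composite states (pairs)
```

With an alphabet of size |S| and window length k, the composite state space can contain up to |S|^k states, which is why this construction tends to produce a large and sparse transition matrix. Storing only the observed transitions, as the dictionary above does, is one way to keep that manageable.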

Pages: 47 to 58

Copyright: Copyright (c) to authors, 2023. Used with permission.

Publication date: June 30, 2023

Published in: journal

ISSN: 1942-2628