HUSO 2017, The Third International Conference on Human and Social Analytics
Approach for Identification of Artificially Generated Texts
Authors:
Katerina Korenblat
Zeev Volkovich
Keywords: Scientific Frauds; SCIgen; Classification
Abstract:
This paper presents a new method for identifying artificially composed scientific papers. We approach the problem from the general standpoint of writing style. It is natural to suppose that the style of artificially generated manuscripts differs substantially from that of human-written articles, because the human writing process proceeds in an inherently different manner. The Mean Dependency Distance, introduced in the authors' previous work, is used to quantify how the writing process develops. A set of artificially generated manuscripts is taken, and the distance values are calculated for sequential chunks of all papers. A suspected document is likewise divided into chunks, and a version of the well-known KNN method is applied together with a distance-based outlier detection method to classify it as a real or a fake document. The numerical experiments demonstrate a high ability of the method to distinguish between the two types of documents.
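The chunk-then-classify pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how the Mean Dependency Distance is computed, so the sketch assumes each chunk has already been reduced to a single precomputed score. The function names (`chunks`, `knn_distance`, `classify`), the parameters `k`, `threshold`, and `min_outlier_share`, and the majority-vote rule are all hypothetical choices made for illustration.

```python
def chunks(items, size):
    """Split a sequence into consecutive chunks of `size` items.

    A trailing remainder shorter than `size` is dropped (an assumption).
    """
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def knn_distance(score, reference_scores, k):
    """Mean absolute distance from `score` to its k nearest reference scores."""
    nearest = sorted(abs(score - r) for r in reference_scores)[:k]
    return sum(nearest) / len(nearest)


def classify(doc_scores, fake_scores, k=3, threshold=0.05, min_outlier_share=0.5):
    """Label a document from its per-chunk scores.

    `fake_scores` holds per-chunk scores of known generated papers.
    A chunk counts as an outlier when its k-nearest-neighbour distance
    to the fake reference set exceeds `threshold` (hypothetical value).
    The document is called "real" when at least `min_outlier_share`
    of its chunks are outliers, otherwise "fake".
    """
    flags = [knn_distance(s, fake_scores, k) > threshold for s in doc_scores]
    share = sum(flags) / len(flags)
    return "real" if share >= min_outlier_share else "fake"


# Made-up per-chunk scores standing in for the paper's distance values.
fake_reference = [0.10, 0.12, 0.11, 0.09, 0.13]
print(classify([0.11, 0.10, 0.12], fake_reference))
print(classify([0.50, 0.55, 0.12], fake_reference))
```

With these made-up scores, the first document sits inside the reference cluster, while two of the three chunks of the second lie far from it. In practice the threshold and the neighbourhood size would have to be tuned on labelled data, which the abstract does not describe.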
Pages: 7 to 10
Copyright: (c) IARIA, 2017
Publication date: July 23, 2017
Published in: conference
ISSN: 2519-8351
ISBN: 978-1-61208-578-4
Location: Nice, France
Dates: from July 23, 2017 to July 27, 2017