HUSO 2017, The Third International Conference on Human and Social Analytics


Approach for Identification of Artificially Generated Texts

Authors:
Katerina Korenblat
Zeev Volkovich

Keywords: Scientific Frauds; SCIgen; Classification

Abstract:
This paper presents a new method for identifying artificially composed scientific papers. We consider the problem from the general point of view of writing style. It is natural to suppose that the style of artificially generated manuscripts differs substantially from that of human-written articles, because human writing proceeds in an inherently different manner. The Mean Dependency Distance, introduced in the authors' previous works, is used to quantify how the writing process develops. A set of artificially generated manuscripts is taken, and the distance values are calculated for sequential chunks of each paper. A suspected document is also divided into chunks, and a version of the well-known k-nearest neighbors (KNN) method is applied together with a distance-based outlier detection method to classify it as a real or a fake document. The numerical experiments provided demonstrate the method's high ability to distinguish between the two types of documents.
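The abstract does not define the Mean Dependency Distance or the exact KNN variant, so the following is only a minimal sketch of the general pattern it describes: a reference set of artificially generated documents, a distance-based outlier test, and a real/fake decision. It assumes each document has already been reduced to an equal-length vector of per-chunk style values, uses Euclidean distance, and uses a leave-one-out quantile threshold as the outlier rule. The names `knn_score`, `fit_threshold`, `classify` and the parameters `k` and `quantile` are hypothetical; this is a generic kNN distance outlier detector, not the authors' method.

```python
import math
from typing import List, Sequence

# Assumed representation (hypothetical): one equal-length vector of
# per-chunk style values per document, one value per sequential chunk.
Vector = Sequence[float]


def knn_score(query: Vector, references: List[Vector], k: int) -> float:
    """Mean Euclidean distance from `query` to its k nearest reference vectors."""
    if not 1 <= k <= len(references):
        raise ValueError("k must be between 1 and the number of references")
    # math.dist raises ValueError if the two vectors differ in length.
    dists = sorted(math.dist(query, r) for r in references)
    return sum(dists[:k]) / k


def fit_threshold(references: List[Vector], k: int, quantile: float = 0.95) -> float:
    """Outlier threshold: a quantile of the leave-one-out kNN scores
    of the reference (artificially generated) documents."""
    scores = sorted(
        knn_score(r, references[:i] + references[i + 1:], k)
        for i, r in enumerate(references)
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def classify(query: Vector, references: List[Vector], k: int = 3,
             quantile: float = 0.95) -> str:
    """Return 'fake' if `query` lies within the generated-document cloud,
    or 'real' if it is a distance-based outlier with respect to it."""
    threshold = fit_threshold(references, k, quantile)
    return "fake" if knn_score(query, references, k) <= threshold else "real"


if __name__ == "__main__":
    # Toy per-chunk vectors standing in for generated manuscripts (made up).
    generated = [[1.0, 1.0, 1.0], [1.1, 0.9, 1.0], [0.9, 1.1, 1.0], [1.0, 1.0, 1.1]]
    print(classify([1.0, 1.0, 1.05], generated, k=2))  # fake
    print(classify([3.0, 3.0, 3.0], generated, k=2))   # real
```

The threshold is learned only from the generated set, which matches the one-class setting suggested by the abstract: a document is called real when it is far from every known fake, rather than close to a set of known human-written papers.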

Pages: 7 to 10

Copyright: Copyright (c) IARIA, 2017

Publication date: July 23, 2017

Published in: conference

ISSN: 2519-8351

ISBN: 978-1-61208-578-4

Location: Nice, France

Dates: from July 23, 2017 to July 27, 2017