eKNOW 2012, The Fourth International Conference on Information, Process, and Knowledge Management


Automatic Keyphrase Extraction: A Comparison of Methods

Authors:
Richard Hussey
Shirley Williams
Richard Mitchell

Keywords: Term Frequency, Inverse Document Frequency, C-Value, NC-Value, Synonyms, Comparisons, Automated Keyphrase Extraction, Document Classification.

Abstract:
There are many published methods for creating keyphrases for documents. Previous work in the field has shown that, in a significant proportion of cases, author-selected keyphrases are not appropriate for the document they accompany: often the keyphrases are not updated when the focus of a paper changes, or they are more classificatory than explanatory. Automated methods are therefore needed to improve the use of keyphrases. The published methods are all evaluated using different corpora, typically one relevant to their field of study. This not only makes it difficult to incorporate the useful elements of algorithms in future work but also makes comparing the results of each method inefficient and ineffective. This paper describes the work undertaken to compare five methods across a common baseline of six corpora. The methods chosen were term frequency, inverse document frequency, the C-Value, the NC-Value, and a synonym-based approach. These methods were compared to evaluate performance and quality of results, and to provide a future benchmark. It is shown that, with the comparison metric used for this study, term frequency and inverse document frequency were the best algorithms, with the synonym-based approach following them. Further work in the area is required to determine an appropriate (or more appropriate) comparison metric.
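
As a rough illustration of the two best-performing methods named in the abstract, the following sketch scores single words by term frequency and by inverse document frequency over a toy corpus. It uses the common textbook definitions (relative count within a document, and log(N / df) across documents); the abstract does not state the paper's exact formulations or how multi-word phrases are handled, so the function names, the toy corpus, and the word-level scoring are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each term within one tokenized document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def inverse_document_frequency(corpus):
    """log(N / df) per term, where df is the number of documents containing it."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def top_terms(scores, k=3):
    """The k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus (not from the paper), already tokenized.
corpus = [
    "keyphrase extraction compares keyphrase methods".split(),
    "term frequency counts term occurrences".split(),
    "document frequency counts documents".split(),
]

tf = term_frequency(corpus[0])
idf = inverse_document_frequency(corpus)
print(top_terms(tf, 1))
print(idf["frequency"], idf["keyphrase"])
```

In this sketch a term that appears in fewer documents receives a higher inverse document frequency, while term frequency rewards repetition within a single document; a real comparison would also need candidate phrase generation and stop-word handling, which are omitted here.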

Pages: 18 to 23

Copyright: Copyright (c) IARIA, 2012

Publication date: January 30, 2012

Published in: conference

ISSN: 2308-4375

ISBN: 978-1-61208-181-6

Location: Valencia, Spain

Dates: from January 30, 2012 to February 4, 2012