CONTENT 2019, The Eleventh International Conference on Creative Content Technologies
Using Domain Taxonomy to Model Generalization of Thematic Fuzzy Clusters
Authors:
Dmitry Frolov
Susana Nascimento
Trevor Fenner
Boris Mirkin
Keywords: Generalization; gap-offshoot penalty; fuzzy cluster; spectral clustering; annotated suffix tree
Abstract:
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its 'head subject' in the higher ranks of the taxonomy tree. The head subject is supposed to cover the query set 'tightly', possibly at the cost of some errors, both 'gaps' and 'offshoots'. Our method globally minimizes a penalty function that combines the differently weighted numbers of head subjects, gaps, and offshoots. We apply the method to a collection of about 18,000 research papers published in Springer journals on Data Science over the past 20 years. We extract a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use the lifted head subjects of these thematic clusters to comment on the tendencies of current research in the corresponding aspects of the domain.
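The penalty described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy taxonomy `TREE`, the function names, the weight values, the use of a crisp (non-fuzzy) query set, the counting of gaps as individual leaves, and the brute-force search, which stands in for the authors' actual optimization procedure.

```python
from itertools import combinations

# Hypothetical toy taxonomy: parent -> children (not taken from ACM-CCS).
TREE = {
    "root": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
}


def leaves_under(node, tree=TREE):
    """Return the set of leaves in the subtree rooted at node."""
    children = tree.get(node, [])
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child, tree)
    return result


def penalty(heads, query, gap_weight, offshoot_weight, tree=TREE):
    """Assumed penalty: #heads + gap_weight * #gaps + offshoot_weight * #offshoots.

    A gap is taken here to be a leaf covered by a head subject but absent
    from the query set; an offshoot is a query leaf that no head subject covers.
    """
    covered = set()
    for head in heads:
        covered |= leaves_under(head, tree)
    gaps = covered - query
    offshoots = query - covered
    return len(heads) + gap_weight * len(gaps) + offshoot_weight * len(offshoots)


def best_heads(query, gap_weight, offshoot_weight, tree=TREE):
    """Exhaustively search sets of interior nodes (feasible only for toy trees)."""
    interior = list(tree)
    candidates = [()]
    for size in range(1, len(interior) + 1):
        candidates.extend(combinations(interior, size))
    return min(
        candidates,
        key=lambda hs: penalty(set(hs), query, gap_weight, offshoot_weight, tree),
    )
```

With the query set {a1, a2}, a gap weight of 0.4, and an offshoot weight of 0.9, this sketch selects the single head subject A: one head plus one gap (a3) gives a penalty of 1.4, which is lower than leaving both leaves as offshoots (1.8) or lifting to the root (2.2).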
Pages: 20 to 25
Copyright: Copyright (c) IARIA, 2019
Publication date: May 5, 2019
Published in: conference
ISSN: 2308-4162
ISBN: 978-1-61208-707-8
Location: Venice, Italy
Dates: from May 5, 2019 to May 9, 2019