DBKDA 2011, The Third International Conference on Advances in Databases, Knowledge, and Data Applications
From Synchronous Corpus to Monitoring Corpus, LIVAC: The Chinese Case
Authors:
Benjamin K. Tsou
Andy C. Chin
Oi Yee Kwong
Keywords: monitoring corpus; synchronous corpus; homothematic corpus; LIVAC; the Chinese language
Abstract:
Very large corpora of properly processed textual materials are uncommon, but they can provide important resources for language modeling in natural language processing, with applications ranging from speech processing and text input to automatic information retrieval (IR) and patent translation. Moreover, when properly cultivated in spatial-temporal terms, they can foster innovative knowledge discovery in database applications by functioning as a monitoring corpus, and can enhance the human-centered communication environment by allowing more substantive introspection and comparison of linguistic and socio-cultural developments of the relevant speech communities. This paper discusses how LIVAC, the gigantic synchronous and homothematic corpus of Chinese, can contribute to monitoring linguistic homogeneity and heterogeneity both diachronically and synchronically. After processing media texts of more than 400 million Chinese characters over 16 years, LIVAC has yielded a lexical corpus of 1.5 million words. This paper examines some aspects of the nature and extent of lexical and morphological divergence and convergence in the Chinese language of Hong Kong, Taipei and Beijing. Additional discussions cover the creation and relexification of neologisms, categorial fluidity, and the associated challenges to terminology standardization, such as renditions of non-Chinese personal names. This paper also explores how the associated socio-cultural developments can be fruitfully monitored by means of this unique spatial-temporal corpus.
Pages: 175 to 180
Copyright: Copyright (c) IARIA, 2011
Publication date: January 23, 2011
Published in: conference
ISSN: 2308-4332
ISBN: 978-1-61208-115-1
Location: St. Maarten, The Netherlands Antilles
Dates: from January 23, 2011 to January 28, 2011