DBKDA 2011, The Third International Conference on Advances in Databases, Knowledge, and Data Applications


From Synchronous Corpus to Monitoring Corpus, LIVAC: The Chinese Case

Authors:
Benjamin K. Tsou
Andy C. Chin
Oi Yee Kwong

Keywords: monitoring corpus; synchronous corpus; homothematic corpus; LIVAC; the Chinese language

Abstract:
Very large corpora of properly processed textual materials are uncommon, but they can provide important resources for language modeling in natural language processing, with applications ranging from speech processing and text input to automatic information retrieval (IR) and patent translation. Moreover, when properly cultivated in spatial-temporal terms, they can foster innovative knowledge discovery in database applications by functioning as a monitoring corpus, and can enhance the human-centered communication environment by allowing more substantive introspection and comparison of the linguistic and socio-cultural developments of the relevant speech communities. This paper discusses how LIVAC, the gigantic synchronous and homothematic corpus of Chinese, can contribute to monitoring linguistic homogeneity and heterogeneity both diachronically and synchronically. Having processed media texts totaling more than 400 million Chinese characters over 16 years, LIVAC has yielded a lexical corpus of 1.5 million words. This paper examines some aspects of the nature and extent of lexical and morphological divergence and convergence in the Chinese language of Hong Kong, Taipei and Beijing. Additional discussion covers the creation and relexification of neologisms, categorial fluidity, and the associated challenges to terminology standardization, such as renditions of non-Chinese personal names. This paper also explores how the associated socio-cultural developments can be fruitfully monitored by means of this unique spatial-temporal corpus.

Pages: 175 to 180

Copyright: Copyright (c) IARIA, 2011

Publication date: January 23, 2011

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-61208-115-1

Location: St. Maarten, The Netherlands Antilles

Dates: from January 23, 2011 to January 28, 2011