eTELEMED 2019, The Eleventh International Conference on eHealth, Telemedicine, and Social Medicine


TCM Named Entity Recognition Based On Character Vector With Bidirectional LSTM-CRF

Authors:
Jigen Luo
Jianqiang Du
Bin Nie
Wangping Xiong
Jia He
Yanyun Yang

Keywords: Named Entity Recognition; Character vector; Bidirectional LSTM-CRF; Chinese Medicine Informatics.

Abstract:
Extracting named entities from Chinese medicine text suffers from low accuracy caused by inaccurate word segmentation, and existing extraction techniques rely heavily on manually engineered features and require guidance from domain knowledge. To address these problems, this paper proposes a named entity recognition method for the field of Traditional Chinese Medicine (TCM) based on character vectors and a Bidirectional Long Short-Term Memory network with a Conditional Random Field (Bidirectional LSTM-CRF). First, the model uses the word2vec tool to convert the corpus into character vectors, which avoids the influence of inaccurate word segmentation in the Chinese medicine field on entity recognition. Next, a Bidirectional LSTM neural network extracts deep sentence-level features, reducing the manual feature engineering required by traditional methods. Finally, the output is passed to a CRF layer, where the Viterbi algorithm uses dynamic programming to find the most plausible tag sequence for the sentence while taking the correlation between output tags into account. We conduct experiments with different models on a TCM corpus. The results show that the proposed model performs well: the F-score reaches 90% on the three entity types of Chinese medicine, prescription, and syndrome type.
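The final decoding step described in the abstract can be illustrated with a minimal sketch of Viterbi decoding over per-character tag scores. This is not the authors' implementation: the tag set `TAGS`, the example characters, and all emission and transition scores below are hypothetical toy values, standing in for what a trained Bidirectional LSTM and CRF layer would produce.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM).
    transitions[i][j] : score of moving from tag i to tag j (CRF layer).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers = []                   # backpointers[t-1][j] = best previous tag for j at t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and toy scores, for illustration only.
TAGS = ["O", "B-MED", "I-MED"]
chars = list("服用黄芪")               # character-level input, no word segmentation
emissions = [
    [2.0, 0.1, 0.1],
    [2.0, 0.2, 0.1],
    [0.3, 1.5, 1.0],
    [0.5, 0.4, 1.2],
]
transitions = [
    [0.5, 0.5, -5.0],                  # from O: an I-MED tag directly after O is penalized
    [0.0, -1.0, 1.0],                  # from B-MED
    [0.2, -0.5, 0.5],                  # from I-MED
]
best_path = viterbi_decode(emissions, transitions)
labels = [TAGS[i] for i in best_path]
```

With these toy scores, `labels` pairs each character in `chars` with one tag. The transition matrix is what lets the decoder account for the correlation between neighboring tags instead of choosing each tag independently.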

Pages: 56 to 60

Copyright: Copyright (c) IARIA, 2019

Publication date: February 24, 2019

Published in: conference

ISSN: 2308-4359

ISBN: 978-1-61208-688-0

Location: Athens, Greece

Dates: from February 24, 2019 to February 28, 2019