Speaker Labelling Using Closed-Captioning

Yamamuro, Keita; Itou, Katunobu

CONTENT 2011, The Third International Conference on Creative Content Technologies

Authors:
Keita Yamamuro
Katunobu Itou

Keywords: Mobile security; Biometric security; speech processing; Speaker verification system; Gaussian mixture model.

Abstract:
Annotation systems for television broadcasting have recently attracted much research because of interest in retrieving highlights from television programs. However, most of the methods developed so far specialize in a single genre. In this study, we therefore targeted three genres (drama, animation, and variety) and developed a system that annotates indexical information using metadata obtained from television captions. Specifically, the caption information is used to create a phoneme HMM that is then used for speaker identification. The proposed system selects the most appropriate phonemic model from several candidate models using the Bayesian information criterion (BIC), computed from the model likelihood and the amount of data. Characters in 70 television programs were identified with a recognition accuracy of 39.6%. Television captioning alone can already identify about 50.0% of the characters in a show, and when captioning was combined with the proposed system, 70.0-80.0% of the utterances in one program were correctly identified.
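
The BIC-based selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook definition BIC = k ln(n) - 2 ln(L), where lower is better; the abstract does not give the authors' exact formulation. The function names, candidate labels, log-likelihoods, and parameter counts below are hypothetical values chosen only for illustration.

```python
import math


def bic(log_likelihood: float, num_params: int, num_samples: int) -> float:
    """Textbook BIC (assumed form): k * ln(n) - 2 * ln(L). Lower is better."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


def select_model(candidates, num_samples: int) -> str:
    """Return the label of the candidate with the lowest BIC.

    candidates: iterable of (label, log_likelihood, num_params) tuples.
    """
    return min(candidates, key=lambda c: bic(c[1], c[2], num_samples))[0]


# Hypothetical candidate phonemic models for one speaker:
# (label, total log-likelihood on the training frames, free parameters)
candidates = [
    ("1-mixture", -5600.0, 78),
    ("2-mixture", -5050.0, 156),
    ("4-mixture", -5000.0, 312),
]

# num_samples is the (hypothetical) number of training frames.
best = select_model(candidates, num_samples=400)
print(best)
```

In this toy setting the largest model has the highest likelihood, but the k ln(n) penalty outweighs its small gain, so a mid-sized model is chosen. This is the general trade-off that BIC encodes between fit and model size when training data is limited.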

Pages: 38 to 42

Copyright: Copyright (c) IARIA, 2011

Publication date: September 25, 2011

Published in: conference

ISSN: 2308-4162

ISBN: 978-1-61208-157-1

Location: Rome, Italy

Dates: from September 25, 2011 to September 30, 2011