MMEDIA 2017, The Ninth International Conference on Advances in Multimedia
Authors:
David Doukhan
Jean Carrive
Keywords: Speech/music discrimination; Audio segmentation; Convolutional Neural Networks; Music Information Retrieval; Multimedia Indexation
Abstract:
A convolutional neural network architecture, trained with a semi-supervised strategy, is proposed for speech/music classification (SMC) and segmentation (SMS). It is compared to baseline machine learning algorithms on three SMC corpora and demonstrates superior performance, together with perfect media-level speech recall scores. The evaluation corpora include speech-over-music segments with durations ranging from 3 to 30 seconds. Early SMS results are presented. Segmentation errors are associated with musical genres not covered in the training database and/or with acoustic properties close to those of speech. These experiments are aimed at helping the design of novel speech/music annotated resources and evaluation protocols suited to TV and radio stream indexation.
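For readers unfamiliar with this class of models, the following is a minimal illustrative sketch of a convolutional network that classifies fixed-size log-mel spectrogram patches as speech or music. It is not the authors' architecture: the PyTorch framework, input dimensions, and layer sizes are all assumptions made for the example.

# Illustrative sketch only (not the architecture from the paper):
# a small CNN that maps log-mel spectrogram patches to speech/music logits.
import torch
import torch.nn as nn

class SpeechMusicCNN(nn.Module):
    def __init__(self, n_mels: int = 64, n_frames: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                       # halve time/frequency resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # global average pooling -> (64, 1, 1)
        )
        self.classifier = nn.Linear(64, 2)         # two logits: speech, music

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, n_frames) log-mel spectrogram patches
        h = self.features(x).flatten(1)
        return self.classifier(h)

if __name__ == "__main__":
    model = SpeechMusicCNN()
    patches = torch.randn(4, 1, 64, 128)           # four random dummy patches
    print(model(patches).shape)                    # torch.Size([4, 2])

Patch-level predictions from a model of this kind are commonly smoothed over time (for example, with a median filter) to produce speech/music segments; this is a generic post-processing step, not necessarily the segmentation procedure evaluated in the paper.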
Pages: 16 to 19
Copyright: Copyright (c) IARIA, 2017
Publication date: April 23, 2017
Published in: conference
ISSN: 2308-4448
ISBN: 978-1-61208-548-7
Location: Venice, Italy
Dates: from April 23, 2017 to April 27, 2017