DBKDA 2025, The Seventeenth International Conference on Advances in Databases, Knowledge, and Data Applications


Visualizing Proximity of Audio Signals from Different Musical Instruments - A Two Step Approach

Authors:
Goutam Chakraborty
Cedric Bornand
Lokesh Reddy
Subhash Molaka
Pawan Reddy
Lakshman Patti

Keywords: MFCC; STFT; Spectrogram; CNN; U-Net; t-SNE; UMAP.

Abstract:
We perceive music from various perspectives: the melody, the rhythm, the emotions or passions it evokes, the richness of sound, and how it correlates with the time of day (like Morning Raga) or with the seasons (like Vivaldi’s Four Seasons). This is a multimodal classification challenge for which correct data annotation is difficult. In this work, we propose a method for visualizing audio signals from various musical instruments to identify their variances and quantify their similarities and distances. The appropriate tools (algorithms) for this task were identified through experimental analysis. The work is conducted in two stages: the first is audio feature extraction and compression, and the second is the projection of the high-dimensional audio features onto a two-dimensional plane using various unsupervised visualization techniques. The aim is to determine which feature compression and visualization tools produce clearly separated clusters of audio signals. The features of the STFT spectrogram extracted using a CNN provide the best compressed representations, which are better visualized using t-SNE and UMAP, achieving silhouette scores of 84% and 81%, respectively. The STFT spectrogram features are compressed more effectively using U-Net, resulting in improved cluster visualization with t-SNE, UMAP, and even PCA, with silhouette scores of around 75%.
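
The abstract judges cluster separation by silhouette score. As a rough illustration of what that number measures, the sketch below computes the mean silhouette coefficient for a small two-dimensional embedding using only the Python standard library. The points, the instrument labels, and the function name are hypothetical and are not taken from the paper. The coefficient lies between -1 and 1, so the percentages quoted above are presumably that value scaled by 100, which is an assumption here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Mean Euclidean distance from point i to the members, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: a singleton cluster scores 0
        a = mean_dist(i, own)  # cohesion: distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 2-D embedding (e.g. the output of a projection step).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["flute", "flute", "flute", "sitar", "sitar", "sitar"]

score = silhouette_score(points, labels)
print(f"silhouette: {score:.2f} ({score * 100:.0f}%)")
```

A score close to 1 means each point sits much nearer to its own cluster than to the nearest other cluster. Assigning the same points to clusters that mix the two groups gives a clearly lower score.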

Pages: 25 to 31

Copyright: Copyright (c) IARIA, 2025

Publication date: March 9, 2025

Published in: conference

ISSN: 2308-4332

ISBN: 978-1-68558-244-9

Location: Lisbon, Portugal

Dates: from March 9, 2025 to March 13, 2025