A. Dataset
A subset of the Kaggle dataset [25] was used for training and testing. The dataset consists of 31 raags with durations ranging from 2 minutes to 60 minutes. The recordings consist of solo vocals and solo instruments; the percussion and the drone typically heard in raag performances have been removed from the recordings. Each raag is also annotated with its tonic frequency.
The fundamental components of the Raga Identification System are described in Figure 1. The system comprises two modules: a training phase and a testing phase. In the training phase, film songs are given as input to the system. For the analysis of the results, we have considered two datasets, each containing 5 ragas.
The audio song is segmented into overlapping frames; allowing neighbouring frames to overlap reduces the discontinuity at frame boundaries. Raga identification performance is evaluated by considering two different systems in our approach. In the first system, thirteen MFCC coefficients are extracted from each frame. In the second system, twelve MFCC coefficients along with one pitch frequency are extracted from each frame and then concatenated.
The extracted features are modelled using the K-means clustering algorithm. With a cluster size of 256, two sets of models, each containing 5 ragas, have been created. To identify the raga of a test song, the song is tested against all 5 models, and the best-matching model is selected.
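The model-matching step described above can be sketched as follows. This is a minimal illustration, assuming each raga model is a K-means codebook of centroids and that the best match is the codebook with the lowest average quantization distortion; the function names and the tiny toy codebooks are hypothetical, not from the paper.

```python
import numpy as np

def distortion(features, codebook):
    # average distance from each feature vector to its nearest codeword
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.sqrt(d2.min(axis=1)).mean()

def identify_raga(features, codebooks):
    # codebooks: dict mapping raga name -> (cluster_size, dim) centroid array
    # (cluster_size would be 256 in the paper's setup)
    return min(codebooks, key=lambda r: distortion(features, codebooks[r]))
```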
B. Feature Extraction and Normalization
1. Frame Blocking
In audio signal frame blocking, the signal S(n) is divided into frames of N samples each. Adjacent frames are separated by M samples. Frame blocking with M = (1/2)N is illustrated in Figure 4.a. The audio signal samples and input frame samples are shown in Figures 4.b and 4.c.
The first frame contains the first N samples of the audio. The second frame begins M samples after the first, so the two frames overlap by N - M samples. The third frame begins 2M samples after the first frame and overlaps it by N - 2M samples. This process continues until the entire audio signal is covered by one or more frames.
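The frame blocking described above can be sketched as follows; this is a minimal NumPy illustration, and the function name and the toy values of N and M in the usage are illustrative rather than taken from the paper.

```python
import numpy as np

def frame_blocking(signal, N, M):
    # frames of N samples each; successive frames start M samples apart,
    # so adjacent frames overlap by N - M samples
    n_frames = 1 + max(0, (len(signal) - N) // M)
    return np.array([signal[i * M : i * M + N] for i in range(n_frames)])
```

For example, with N = 4 and M = 2 (i.e. M = (1/2)N), each frame shares its last two samples with the first two samples of the next frame.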
2. DWT Coefficient Computation
The speech samples in the database and the input test speech signal are decomposed into approximation and detail coefficients using the Discrete Wavelet Transform (DWT). Among the wavelet family, Daubechies wavelets have been reported to be highly successful in speech applications, and hence they are used in this work; specifically, the Daubechies 4 wavelet with 5-level decomposition (db4, lev5) is used. An end-point detection algorithm is used to detect the beginning and end points of the speech signal and remove the unwanted silence portions.
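As a simplified illustration of the pyramid decomposition into approximation and detail coefficients, the sketch below uses the simple Haar wavelet instead of the db4 wavelet used in this work (in practice a library such as PyWavelets supplies db4 with 5-level decomposition); the recursion over levels is the same.

```python
import numpy as np

def dwt_level(x):
    # one level of the Haar DWT: approximation (low-pass) and detail (high-pass)
    x = x[: len(x) - len(x) % 2]            # ensure even length
    cA = (x[0::2] + x[1::2]) / np.sqrt(2)
    cD = (x[0::2] - x[1::2]) / np.sqrt(2)
    return cA, cD

def wavedec(x, levels):
    # multi-level decomposition: recursively decompose the approximation
    details = []
    cA = np.asarray(x, dtype=float)
    for _ in range(levels):
        cA, cD = dwt_level(cA)
        details.append(cD)
    return cA, details
```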
3. MFCC Computation
Mel Frequency Cepstral Coefficients (MFCC) are widely used features in speech recognition, since the Mel scale closely matches the way the human ear perceives sound. The Mel scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz, as shown in Figure 5.
MFCC computation using DWT co-efficient consists of the following steps:
- The Discrete Wavelet Transform coefficients are obtained.
- The spectrum of each frame is calculated using the FFT.
- The spectral components are passed through a Mel filter bank.
- The logarithm of the Mel-filtered spectrum is obtained.
- The Discrete Cosine Transform is applied to achieve energy compaction.
The relation between the frequency f in Hz and the Mel-scale frequency is given by equation (1):

mel(f) = 2595 log10(1 + f/700)    (1)
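The steps above, together with the Mel mapping of equation (1), can be sketched as follows. This is a simplified illustration: the 26-filter Mel bank is an assumed parameter (the paper does not specify its filter bank), and for simplicity the sketch operates on a raw frame rather than on the DWT coefficients of the paper's pipeline.

```python
import numpy as np

def hz_to_mel(f):
    # equation (1): near-linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # triangular filters spaced uniformly on the Mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, ce, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, ce):
            fb[i - 1, k] = (k - lo) / max(ce - lo, 1)
        for k in range(ce, hi):
            fb[i - 1, k] = (hi - k) / max(hi - ce, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frame)) ** 2        # spectrum via FFT
    fb = mel_filterbank(n_filters, len(frame), sr)
    log_e = np.log(fb @ power + 1e-10)             # log Mel filter bank energies
    # DCT-II (written out explicitly) for energy compaction
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return basis @ log_e
```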
c. Cepstrum method of Pitch Estimation
Pitch estimation can be performed by cepstral analysis. For pitch estimation, the excitation source information must be separated from the vocal-tract-related information in the speech signal. The cepstrum is given by equation (2) as:

c(n) = IDFT{ log |DFT{ s(n) }| }    (2)
The input audio signal, its cepstrum, and the pitch tracking waveforms are shown in Figures 4.a-4.c. In the waveforms we can observe that the slowly varying components of the log magnitude spectrum appear in the low-quefrency region, while the fast varying components appear in the high-quefrency region. In the log magnitude spectrum, the slowly varying components represent the vocal tract, whereas the fast varying components represent the excitation source.
The cepstrum represents the pitch lag in terms of "quefrency". The pitch is estimated by locating the quefrency lag at which the cepstrum has the most energy; this lag corresponds to the dominant pitch frequency.
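A minimal sketch of cepstral pitch estimation follows, assuming an illustrative search range of 50-500 Hz for plausible pitch values (the range and the function name are not from the paper).

```python
import numpy as np

def cepstral_pitch(frame, sr, fmin=50.0, fmax=500.0):
    # real cepstrum, equation (2): inverse DFT of the log magnitude spectrum
    spectrum = np.abs(np.fft.fft(frame)) + 1e-10
    cepstrum = np.fft.ifft(np.log(spectrum)).real
    # search the quefrency range corresponding to plausible pitch lags
    qmin, qmax = int(sr / fmax), int(sr / fmin)
    lag = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return sr / lag          # dominant pitch frequency in Hz
```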
d. Chromagram
In the musical context, the chroma feature or chromagram is closely related to the 12 pitch classes. Chroma-based features, also known as pitch class profiles, are a powerful tool for analysing music whose pitches can be meaningfully categorized (often into 12 categories) and whose tuning approximates the equal-tempered scale. The main property of chroma features is that they capture the harmonic and melodic quality of the track while being robust to changes in timbre and instrumentation. The chromagram is a visual representation of the energy within the 12 semitones (or chroma) of the Western music octave, namely C, C#, D, D#, E, F, F#, G, G#, A, A# and B; it shows the energy distribution over the 12 pitch classes.
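A chroma vector can be sketched by folding spectral energy into the 12 pitch classes; in the sketch below the reference frequency C4 ≈ 261.63 Hz and the function name are illustrative choices, not specified in the paper.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chroma_vector(frame, sr, f_ref=261.63):
    # fold spectral energy into the 12 pitch classes; f_ref is C4 (assumed)
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    chroma = np.zeros(12)
    for f, e in zip(freqs[1:], power[1:]):      # skip the DC bin
        pc = int(np.round(12 * np.log2(f / f_ref))) % 12
        chroma[pc] += e
    return chroma
```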
K-Means Clustering
K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means partitions the observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. New data can be classified into the existing clusters by applying 1-nearest-neighbour classification to the cluster centres obtained from k-means. Given a set of n observations, each a d-dimensional real vector, k-means clustering partitions the n observations into k sets S = {S1, S2, ..., Sk} so as to minimize the within-cluster sum of squares (WCSS), i.e., the sum of squared distances from each point to the centre of its cluster. In other words, its objective is to find:

arg min over S of Σ_{i=1}^{k} Σ_{x ∈ Si} ||x − μi||²

where μi is the mean of the points in Si.
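The objective above can be illustrated with a minimal Lloyd-style K-means sketch (random initialization; the sensitivity of this initialization and the proposed incremental alternative are discussed in the next subsection). The function name and parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    # X: (n, d) array of n d-dimensional observations
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point joins the cluster with the nearest centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid becomes the mean mu_i of its cluster S_i
        new = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # final assignment and the WCSS objective being minimized
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = d2.min(axis=1).sum()
    return centroids, labels, wcss
```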
Classification using Clustering Algorithm
K-means is simple and can be used for a wide variety of data types, but it is sensitive to the initial positions of the cluster centres. The final cluster centroids may not be optimal because the algorithm can converge to a locally optimal solution. Empty clusters can also arise if no points are assigned to a cluster during the assignment step. It is therefore important to have good initial cluster centres for K-means to work properly. A new cluster-centre initialization algorithm is proposed to provide the initial cluster centres for K-means. The incremental K-means algorithm is as follows. Input: the number of initial clusters (M) and the target number of clusters (K), where M > K.