Search results
9 results
SSRN
The impact of the 2007 reforms on the level of information disclosure by the Chinese A-share market
In: China Economic Review, Vol. 30, pp. 221-234
ISSN: 1043-951X
Identification of potential Music Information Retrieval technologies for computer-aided jingju singing training
Paper presented at the 5th China Conference on Sound and Music Technology, Chinese Traditional Music Technology Session, held on 21 November 2017 in Suzhou, China. ; Music Information Retrieval (MIR) technologies have proven useful in assisting Western classical singing training. Jingju (also known as Beijing or Peking opera) singing differs from Western singing in most perceptual dimensions, and trainees are taught using the mouth/heart method. In this paper, we first present the training method used in the professional jingju training classroom scenario and show the potential benefits of introducing MIR technologies into the training process. The main part of this paper is dedicated to identifying potential MIR technologies for jingju singing training. To this end, we answer the question: how do jingju singing tutors and trainees value the importance of each jingju musical dimension (intonation, rhythm, loudness, tone quality and pronunciation)? This is done by (i) classifying the classroom singing practices and the tutors' verbal feedback into these five dimensions, and (ii) surveying the trainees. Then, with the help of music signal analysis, a finer inspection of the classroom practice recordings reveals the detailed elements of the training process. Finally, based on the above analysis, several potential MIR technologies that would be useful for jingju singing training are identified. ; This work was funded by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
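The dimension analysis in the entry above amounts to tallying how often classroom practices and tutor feedback touch each of the five dimensions. A minimal sketch of such a tally in Python; the label strings and the example data are hypothetical, as the paper does not specify its annotation format:

```python
from collections import Counter

DIMENSIONS = ("intonation", "rhythm", "loudness", "tone quality", "pronunciation")

def dimension_frequencies(feedback_labels):
    """Relative frequency of each musical dimension among annotated
    tutor feedback items (hypothetical label strings)."""
    counts = Counter(lbl for lbl in feedback_labels if lbl in DIMENSIONS)
    total = sum(counts.values()) or 1
    return {d: counts[d] / total for d in DIMENSIONS}

# Toy annotation list; real labels would come from classroom recordings.
print(dimension_frequencies(
    ["intonation", "pronunciation", "intonation", "rhythm", "tone quality"]))
```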
Audio to score matching by combining phonetic and duration information
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held 23-27 October 2017 in Suzhou, China. ; We approach the singing phrase audio to score matching problem by using phonetic and duration information, with a focus on the jingju a cappella singing case. We argue that, due to the existence of a basic melodic contour for each mode in jingju music, using only melodic information (such as the pitch contour) results in ambiguous matching. This leads us to propose a matching approach based on the use of phonetic and duration information. Phonetic information is extracted with an acoustic model trained on our data, and duration information is incorporated through the Hidden Markov Model (HMM) variants we investigate. We build a model for each lyric path in our scores and achieve the matching by ranking the posterior probabilities of the decoded most likely state sequences. Three acoustic models are investigated: (i) convolutional neural networks (CNNs), (ii) deep neural networks (DNNs) and (iii) Gaussian mixture models (GMMs). Two duration models are also compared: (i) a hidden semi-Markov model (HSMM) and (ii) a post-processor duration model. Results show that CNNs perform better on our (small) audio dataset, and that the HSMM outperforms the post-processor duration model. ; This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
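The matching procedure in the entry above (one model per lyric path, ranked by decoded path probability) can be sketched as Viterbi decoding over left-to-right HMMs. This is a rough sketch, not the paper's implementation: the self-loop probability, the toy phoneme posteriors and the candidate paths are all assumptions.

```python
import numpy as np

def viterbi_log_score(obs_logprob, trans_logprob):
    """Log-probability of the best state path through a left-to-right HMM.
    obs_logprob: (T, S) frame-wise state log-likelihoods; trans_logprob: (S, S)."""
    T, S = obs_logprob.shape
    delta = np.full(S, -np.inf)
    delta[0] = obs_logprob[0, 0]                    # start in the first state
    for t in range(1, T):
        # best predecessor per state, plus the emission term at frame t
        delta = np.max(delta[:, None] + trans_logprob, axis=0) + obs_logprob[t]
    return delta[-1]                                # end in the last state

def rank_lyric_paths(frame_posteriors, lyric_paths, self_loop=0.9):
    """Score a sung phrase against each candidate lyric path and rank them.
    frame_posteriors: (T, P) phoneme posteriors from some acoustic model;
    lyric_paths: dict mapping a path name to its phoneme-index sequence."""
    scores = {}
    for name, phones in lyric_paths.items():
        S = len(phones)
        obs = np.log(frame_posteriors[:, phones] + 1e-10)
        trans = np.full((S, S), -np.inf)            # left-to-right topology
        for s in range(S):
            trans[s, s] = np.log(self_loop)         # stay on the same phoneme
            if s + 1 < S:
                trans[s, s + 1] = np.log(1.0 - self_loop)  # or advance
        scores[name] = viterbi_log_score(obs, trans)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: 50 frames, 4 phoneme classes, two candidate lyric lines.
rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(4), size=50)
print(rank_lyric_paths(post, {"line_a": [0, 2, 1], "line_b": [3, 1, 2]}))
```

A duration model such as the paper's HSMM would replace the fixed self-loop probability with explicit state-duration distributions.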
Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks
Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held 23-27 October 2017 in Suzhou, China. ; This paper introduces a new score-informed method for the segmentation of jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and the score. We first examine the structure of jingju syllables and propose a definition of the term "syllable onset". Then, we identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. In addition, we propose using a score-informed Viterbi algorithm instead of thresholding the onset function, because the available musical knowledge (the score) can inform the Viterbi algorithm and help overcome the identified challenges. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions. ; This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
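The score-informed decoding idea in the entry above can be illustrated with a small dynamic program that places one onset per syllable, trading ODF salience against a Gaussian prior on the inter-onset gaps given by the score. A sketch under assumed settings (frame rate, prior width); it stands in for, rather than reproduces, the paper's Viterbi formulation:

```python
import numpy as np

def score_informed_onsets(odf, note_durs, frame_rate=100.0, sigma=0.3):
    """Pick one onset frame per syllable from an onset detection function.
    odf: (T,) onset salience, e.g. from a CNN; note_durs: score durations
    in seconds; sigma: tolerated deviation from the score duration."""
    T, K = len(odf), len(note_durs)
    log_odf = np.log(odf + 1e-10)
    dp = np.full((K, T), -np.inf)       # dp[k, t]: best score, onset k at t
    back = np.zeros((K, T), dtype=int)
    dp[0] = log_odf                     # the first onset may fall anywhere
    for k in range(1, K):
        for t in range(k, T):
            gaps = (t - np.arange(t)) / frame_rate   # candidate onset gaps
            prior = -0.5 * ((gaps - note_durs[k - 1]) / sigma) ** 2
            cand = dp[k - 1, :t] + prior
            back[k, t] = int(np.argmax(cand))
            dp[k, t] = cand[back[k, t]] + log_odf[t]
    onsets = [int(np.argmax(dp[-1]))]   # backtrace from the best last onset
    for k in range(K - 1, 0, -1):
        onsets.append(back[k, onsets[-1]])
    return onsets[::-1]

# Toy ODF with peaks near frames 10, 60 and 110; score expects 0.5 s gaps.
odf = np.full(150, 0.05)
odf[[10, 60, 110]] = 1.0
print(score_informed_onsets(odf, [0.5, 0.5, 0.4]))
```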
Creating an a cappella singing audio dataset for automatic Jingju singing evaluation research
Paper presented at the 4th International Workshop on Digital Libraries for Musicology, held on 28 October 2017 in Shanghai, China. ; The data-driven computational research on automatic jingju (also known as Beijing or Peking opera) singing evaluation lacks a suitable and comprehensive a cappella singing audio dataset. In this work, we present an a cappella singing audio dataset which consists of 120 arias, accounting for 1265 melodic lines. This dataset is also an extension of our existing CompMusic jingju corpus. Both professional and amateur singers were invited to the dataset recording sessions, and the most common jingju musical elements have been covered. The dataset is also accompanied by metadata per aria and melodic line, annotated for automatic singing evaluation research purposes. All the gathered data is openly available online. ; This research was funded by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
Comparison of the singing style of two jingju schools
Paper presented at the 16th International Society for Music Information Retrieval Conference (ISMIR 2015), held 26-30 October 2015 in Málaga, Spain. ; Performing schools (liupai) in jingju (also known as Peking or Beijing opera) are one of the most important elements in the appreciation of this genre among connoisseurs. In the current paper, we study the potential of MIR techniques for supporting and enhancing musicological descriptions of the singing style of two of the most renowned jingju schools for the dan role-type, namely the Mei and Cheng schools. To this end, from the characteristics commonly used for describing singing style in the musicological literature, we have selected those that can be studied using standard audio features. We have selected eight recordings from our jingju music research corpus and applied current algorithms for the measurement of the selected features. The obtained results support the descriptions from musicological sources in all cases but one, and also add precision to them by providing specific measurements. In addition, our methodology suggests some characteristics not accounted for in our musicological sources. Finally, we discuss the need for engaging jingju experts in our future research and for applying this approach for musicological and educational purposes as a way of better validating our methodology. ; This research is funded by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
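As one flavour of the "standard audio features" mentioned in the entry above, the sketch below measures a crude pitch-register descriptor per recording with librosa's pyin. The pitch bounds, reference note and file names are illustrative assumptions, not the paper's actual feature set.

```python
import numpy as np
import librosa

def pitch_register_stats(path):
    """Median pitch and pitch range (in cents re. A4) of one recording,
    a crude stand-in for the register descriptors one might compare."""
    y, sr = librosa.load(path, sr=44100)
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("G3"),
                                 fmax=librosa.note_to_hz("C6"), sr=sr)
    cents = 1200 * np.log2(f0[voiced] / librosa.note_to_hz("A4"))
    return {"median_cents": float(np.median(cents)),
            "range_cents": float(np.percentile(cents, 95)
                                 - np.percentile(cents, 5))}

# Hypothetical file names, one aria per school:
# print(pitch_register_stats("mei_aria.wav"), pitch_register_stats("cheng_aria.wav"))
```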
Score-informed syllable segmentation for Jingju a cappella singing voice with Mel-frequency intensity profiles
Paper presented at the International Workshop on Folk Music Analysis, held 14-16 June in Málaga, Spain. ; This paper introduces a new unsupervised and score-informed method for the segmentation of the singing voice into syllables. The main idea of the proposed method is to detect syllable onsets on a probability density function by incorporating a priori syllable durations derived from the score. First, intensity profiles are used to exploit the characteristics of the singing voice in different Mel-frequency regions. Then, the syllable onset probability density function is obtained by selecting candidates over the intensity profiles and weighting them to emphasize the onset regions. Finally, the syllable duration distribution shaped by the score is incorporated into Viterbi decoding to determine the optimal sequence of onset time positions. The proposed method outperforms conventional methods for syllable segmentation on a jingju (also known as Peking or Beijing opera) a cappella dataset. An analysis of the precision errors is conducted to provide directions for future improvement. ; This work is partly supported by the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
BASE
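The Mel-frequency intensity profiles in the entry above can be approximated by averaging a log Mel spectrogram over a few frequency regions and smoothing the result; onset candidates are then local rises in each profile. A sketch with assumed band boundaries, smoothing window and rise threshold:

```python
import numpy as np
import librosa

def mel_band_intensity_profiles(path, bands=((0, 20), (20, 50), (50, 80))):
    """One smoothed dB intensity envelope per Mel-frequency region."""
    y, sr = librosa.load(path, sr=44100)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80, hop_length=512)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    kernel = np.hanning(9) / np.hanning(9).sum()     # light smoothing
    return np.stack([np.convolve(mel_db[lo:hi].mean(axis=0), kernel, mode="same")
                     for lo, hi in bands])

def onset_candidates(profile, delta=1.0):
    """Frames where a band's intensity rises by more than `delta` dB."""
    rise = np.diff(profile, prepend=profile[0])
    return np.flatnonzero(rise > delta)
```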
Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks
Paper presented at the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), held on 16 November 2017 in Munich, Germany. ; This work describes our contribution to the acoustic scene classification task of the DCASE 2017 challenge. We propose a system that consists of an ensemble of two methods of different nature: a feature engineering approach, where a collection of hand-crafted features is input to a Gradient Boosting Machine, and an approach based on learning representations from data, where log-scaled mel-spectrograms are input to a Convolutional Neural Network. This CNN is designed with multiple filter shapes in the first layer. We use a simple late fusion strategy to combine both methods. We report the classification accuracy of each method alone and of the ensemble system on the provided cross-validation setup of the TUT Acoustic Scenes 2017 dataset. The proposed system outperforms each of its component methods and improves on the provided baseline system by 8.2%. ; This work is partially supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 688382 "AudioCommons", and the European Research Council under the European Union's Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583), and the Spanish Ministry of Economy and Competitiveness under the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502).
BASE
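The abstract above says only that a "simple late fusion strategy" combines the two branches; a weighted average of per-clip class probabilities is one plausible reading, sketched here with an assumed weight and toy inputs.

```python
import numpy as np

def late_fusion(gbm_probs, cnn_probs, w=0.5):
    """Weighted average of the two branches' class probabilities.
    gbm_probs, cnn_probs: (N, C) per-clip probabilities; w weights the GBM."""
    fused = w * gbm_probs + (1.0 - w) * cnn_probs
    return fused.argmax(axis=1)          # predicted scene index per clip

# Toy usage: 3 clips, 15 acoustic scene classes (as in TUT Acoustic Scenes 2017).
rng = np.random.default_rng(1)
gbm = rng.dirichlet(np.ones(15), size=3)
cnn = rng.dirichlet(np.ones(15), size=3)
print(late_fusion(gbm, cnn))
```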