2004 Fiscal Year Final Research Report Summary
Music Information Processing Using Continuous Speech Recognition Methods
Project/Area Number | 14380156 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | The University of Tokyo |
Principal Investigator | SAGAYAMA Shigeki, The University of Tokyo, Graduate School of Information Science and Technology, Professor (00303321) |
Co-Investigator (Kenkyū-buntansha) |
SHINODA Koichi, The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (10343097)
TABARU Tetsuya, The University of Tokyo, Graduate School of Information Science and Technology, Research Associate (Specially Appointed Faculty, Science and Technology Promotion) (90272393)
NISHIMOTO Takuya, The University of Tokyo, Graduate School of Information Science and Technology, Research Associate (80283696)
|
Project Period (FY) | 2002 – 2004 |
Keywords | continuous speech recognition / music information processing / rhythm recognition / automatic harmonization / automatic counterpoint / harmonic clustering / specmurt |
Research Abstract |
We formulated music rhythm recognition, i.e. transcribing MIDI performance data into a music score, as a Viterbi path search in a hidden Markov model (HMM) whose hidden states represent the intended note values and whose output probabilities model the actually played note lengths. We also solved rhythm recognition for polyphonic music by reducing polyphony to monophony. Tempo modeling and tempo-change detection were made possible by applying the segmental k-means algorithm used in speech recognition.

Harmonization (chord finding) for a given melody was formulated as a problem isomorphic to continuous speech recognition: the given melody serves as the output, the chords behind the melody as the hidden states, and a stochastic model of chord sequences as the language model. Automatic counterpoint was developed with a two-step maximum-likelihood approach, consisting of rhythm design and pitch allocation, solved by dynamic programming.

For polyphonic signal analysis, we developed an algorithm named Harmonic-structured Clustering. It models the spectrum observed in each frame as overlapping harmonic structures and, treating the energy distributed over one harmonic structure as belonging to a single cluster, applies the k-means clustering algorithm under a harmonic constraint. Introducing probabilistic assignment to clusters generalized this k-means procedure into an EM algorithm, which achieved higher multi-pitch estimation accuracy. Using an information criterion such as AIC also made it possible to estimate the number of sources and their octave positions.

We also proposed "specmurt analysis" for polyphonic signal analysis. We call the inverse Fourier transform of the linear spectrum along a log-scaled frequency axis the "specmurt". Along the log-frequency axis, the observed linear spectrum can be regarded as the convolution of the distribution density of fundamental frequencies with a harmonic structure assumed to be identical for all tones, so the fundamental-frequency distribution can be recovered by deconvolution, which becomes a division in the specmurt domain. This idea opens up new signal processing capabilities.
|
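The rhythm-recognition formulation in the abstract can be illustrated with a small Viterbi search. This is a minimal Python sketch, not the project's implementation: the three-value note vocabulary, the note-value bigram table, the fixed tempo, and the Gaussian output model whose spread grows with note length are all illustrative assumptions, and `recognize_rhythm` is a hypothetical name.

```python
import math

# Hypothetical note-value vocabulary (hidden states), in beats.
NOTE_VALUES = {"eighth": 0.5, "quarter": 1.0, "half": 2.0}

# Hypothetical note-value bigram P(next | current); each row sums to 1.
TRANSITIONS = {
    "eighth": {"eighth": 0.6, "quarter": 0.3, "half": 0.1},
    "quarter": {"eighth": 0.2, "quarter": 0.6, "half": 0.2},
    "half": {"eighth": 0.1, "quarter": 0.5, "half": 0.4},
}


def log_gauss(x, mean, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def recognize_rhythm(played, sec_per_beat=0.5, rel_sigma=0.15):
    """Return the most likely note-value sequence for played lengths in seconds.

    Assumed output model: played length ~ N(mean, (rel_sigma * mean)^2),
    where mean = note value * sec_per_beat (tempo held fixed).
    """
    states = list(NOTE_VALUES)

    def emit(state, length):
        mean = NOTE_VALUES[state] * sec_per_beat
        return log_gauss(length, mean, rel_sigma * mean)

    # Uniform initial state distribution.
    score = {s: -math.log(len(states)) + emit(s, played[0]) for s in states}
    backpointers = []
    for length in played[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + math.log(TRANSITIONS[p][s]))
            new_score[s] = score[best_prev] + math.log(TRANSITIONS[best_prev][s]) + emit(s, length)
            ptr[s] = best_prev
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]
```

At 0.5 s per beat, played lengths of 0.26, 0.24, 0.52 and 0.95 s decode to eighth, eighth, quarter, half; the transition table only matters when a played length is ambiguous between two note values.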
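The segmental k-means approach to tempo modeling alternates an alignment step with a re-estimation step. The hypothetical Python sketch below estimates a single global tempo only (tempo-change detection is not shown). It assumes log-normal timing deviations and uniform note-value transitions, under which the Viterbi alignment reduces to picking, for each note, the nearest note value on a log scale; the vocabulary and the function name `segmental_kmeans_tempo` are illustrative.

```python
import math

# Hypothetical note-value vocabulary, in beats.
NOTE_VALUES = [0.5, 1.0, 2.0]


def segmental_kmeans_tempo(played, sec_per_beat=0.5, max_iter=20):
    """Jointly estimate note values and one global tempo (seconds per beat).

    Alternates (1) alignment: choose each note's value given the tempo, and
    (2) re-estimation: refit the tempo given the values, until the
    alignment stops changing.
    """
    assignment = None
    for _ in range(max_iter):
        # Alignment: with log-normal timing noise and uniform transitions,
        # the Viterbi path is the nearest note value on a log scale.
        new_assignment = [
            min(NOTE_VALUES, key=lambda q: abs(math.log(d / (sec_per_beat * q))))
            for d in played
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Re-estimation: the ML tempo under log-normal noise is the
        # geometric mean of (played length / note value).
        sec_per_beat = math.exp(
            sum(math.log(d / q) for d, q in zip(played, assignment)) / len(played)
        )
    return assignment, sec_per_beat
```

Like k-means, this finds only a local optimum: on the same data, a start at 0.4 s per beat settles on doubled note values at about 0.34 s per beat, the familiar metrical ambiguity.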
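The harmonization mapping can be made concrete with the same kind of decoder: chords play the role of words, melody pitch classes the role of acoustic observations, and a chord bigram the role of the language model. Everything in this Python sketch is hypothetical: a three-chord vocabulary, one chord per melody note, an output model that simply favors chord tones, and made-up bigram probabilities.

```python
import math

# Hypothetical chord vocabulary: pitch classes of each triad (0 = C).
CHORDS = {"C": {0, 4, 7}, "F": {5, 9, 0}, "G": {7, 11, 2}}

# Hypothetical chord bigram "language model" P(next chord | chord).
CHORD_BIGRAM = {
    "C": {"C": 0.4, "F": 0.3, "G": 0.3},
    "F": {"C": 0.4, "F": 0.2, "G": 0.4},
    "G": {"C": 0.7, "F": 0.1, "G": 0.2},
}


def melody_logprob(chord, pitch_class):
    """Output model: chord tones are far likelier than non-chord tones."""
    # 3 chord tones * 0.3 + 9 other pitch classes * (0.1 / 9) = 1.0
    return math.log(0.3 if pitch_class in CHORDS[chord] else 0.1 / 9)


def harmonize(melody):
    """Viterbi search for the chord sequence behind a melody (one chord per note)."""
    chords = list(CHORDS)
    # Uniform start distribution; its constant term is dropped.
    score = {c: melody_logprob(c, melody[0]) for c in chords}
    back = []
    for pc in melody[1:]:
        new, ptr = {}, {}
        for c in chords:
            p = max(chords, key=lambda q: score[q] + math.log(CHORD_BIGRAM[q][c]))
            new[c] = score[p] + math.log(CHORD_BIGRAM[p][c]) + melody_logprob(c, pc)
            ptr[c] = p
        score = new
        back.append(ptr)
    path = [max(chords, key=score.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

For the melody E, A, D, C (pitch classes 4, 9, 2, 0) the sketch returns C, F, G, C: the last note fits both C and F, and the bigram's strong G to C transition decides it.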
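A rough NumPy sketch of the soft-assignment (EM) version of harmonic clustering follows. It is a simplified stand-in, not the published algorithm: the log-frequency axis is in semitones, each source is a cluster of Gaussians at its F0 plus 12·log2(n) semitones for a fixed number of partials, all Gaussians share a fixed width, and only the F0s and per-partial mixture weights are re-estimated. The power at each bin weights its contribution, so energy rather than bins is clustered. Model selection with AIC is not shown, and `harmonic_clustering_em` is a hypothetical name.

```python
import numpy as np


def harmonic_clustering_em(freqs, power, f0_init, n_partials=4, sigma=0.3, n_iter=50):
    """EM for a mixture of harmonic structures on a log-frequency (semitone) axis.

    freqs: bin centres in semitones; power: nonnegative spectrum at those bins.
    Source k is a cluster of n_partials Gaussians at mu_k + 12*log2(n).
    """
    offsets = 12 * np.log2(np.arange(1, n_partials + 1))                  # (N,)
    mu = np.asarray(f0_init, dtype=float)                                  # (K,)
    weights = np.full((len(mu), n_partials), 1.0 / (len(mu) * n_partials))  # (K, N)
    for _ in range(n_iter):
        # E-step: responsibility of partial n of source k for each bin j.
        # The shared Gaussian normalizer cancels, so it is omitted.
        centres = mu[:, None] + offsets[None, :]                           # (K, N)
        diff = freqs[:, None, None] - centres[None, :, :]                  # (J, K, N)
        lik = weights[None] * np.exp(-0.5 * (diff / sigma) ** 2)           # (J, K, N)
        resp = lik / np.maximum(lik.sum(axis=(1, 2), keepdims=True), 1e-300)
        # M-step: energy-weighted updates. A bin explained by partial n
        # votes for F0 = freq - 12*log2(n) (the harmonic constraint).
        mass = power[:, None, None] * resp                                 # (J, K, N)
        implied_f0 = freqs[:, None, None] - offsets[None, None, :]         # (J, 1, N)
        mu = (mass * implied_f0).sum(axis=(0, 2)) / mass.sum(axis=(0, 2))
        weights = mass.sum(axis=0) / mass.sum()
    return mu
```

Replacing `resp` with a one-hot assignment to the best (source, partial) pair would give the hard, k-means-like variant that the abstract describes as the starting point.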
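The specmurt idea reduces multi-pitch analysis to deconvolution by division. The NumPy sketch below assumes a noise-free spectrum, a known common harmonic pattern, circular convolution on a grid of 12 bins per octave, and a pattern whose transform has no zeros; real recordings need regularization that this sketch omits. Following the abstract's definition, the specmurt is taken as the inverse Fourier transform along log-frequency; the function name is hypothetical.

```python
import numpy as np


def specmurt_deconvolve(v, h):
    """Recover the F0 distribution u from a log-frequency spectrum v = u (*) h.

    v: observed linear-amplitude spectrum sampled on a log-frequency grid.
    h: common harmonic pattern on the same grid (h[0] is the fundamental).
    Convolution is treated as circular for simplicity.
    """
    # Specmurt = inverse Fourier transform along log-frequency. With
    # norm="forward" the inverse transform is unscaled, so
    # specmurt(u (*) h) = specmurt(u) * specmurt(h) exactly.
    V = np.fft.ifft(v, norm="forward")
    H = np.fft.ifft(h, norm="forward")
    U = V / H  # deconvolution becomes division in the specmurt domain
    # fft with norm="forward" is the exact inverse of the transform above.
    return np.fft.fft(U, norm="forward").real
```

If h[0] exceeds the sum of the other entries of h, the transform of h can never vanish, so the division is always well defined; with 12 bins per octave, the 2nd, 3rd and 4th harmonics sit about 12, 19 and 24 bins above the fundamental.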
Research Products
(76 results)