Ph.D. Thesis Proposal: Human Utterance Classification: Speech, Song and Incoherent Mumbling


This paper proposes research on features, techniques and strategies for human utterance classification: assigning individual human vocal utterances to one or more of a group of possible classes, including speech, song, and others. This is a sub-problem of audio signal classification (assigning individual sounds to one or more of a group of possible classes, including music, human speech, noise, environmental sounds, and others), and it is useful in applications such as speech recognition and multimedia databases. I will present a motivation for investigating human utterance classification, specifically distinguishing between speech and singing, as well as a motivation for my proposed approach to this problem, which is based on extraction and fuzzy clustering of problem-specific features. Part of the proposed work is to determine which features are most appropriate for the problem. So far, I have implemented and investigated feature extraction based on instantaneous measurement of pitch and energy. The next stages will include developing further feature extractors based on value tracking of pitch and energy, as well as on bandwidth, rhythm, rhyme and other information; developing appropriate grouping schemes and evaluation techniques; and expanding the current data corpus.
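
As an illustration of what instantaneous pitch and energy measurement might look like, the following is a minimal Python sketch and not the implementation described in the proposal. It assumes fixed-length frames, mean squared amplitude as the energy measure, and a plain autocorrelation peak picker as the pitch estimator; the function names, frame sizes and the 80 to 500 Hz search range are all hypothetical choices made for this example.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (assumed energy measure)."""
    return sum(x * x for x in frame) / len(frame)


def frame_pitch(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Estimate pitch in Hz from the largest autocorrelation peak.

    Only lags corresponding to [fmin, fmax] are searched (an assumed
    vocal range). Returns None when no positive peak exists, for
    example on a silent frame.
    """
    n = len(frame)
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else None


def extract_features(signal, sample_rate, frame_len=400, hop=200):
    """Return one (energy, pitch) pair per overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        feats.append((frame_energy(frame), frame_pitch(frame, sample_rate)))
    return feats


if __name__ == "__main__":
    # Synthetic 200 Hz tone sampled at 8 kHz, standing in for a sung note.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
    for energy, pitch in extract_features(tone, sr):
        print(energy, pitch)
```

Per-frame pairs like these could then be passed to a clustering stage. Real recordings would likely need windowing, voicing detection and a more robust pitch tracker than this sketch provides.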

