Improving vowel classification accuracy using a variable signal frame length

Abstract:
Voiced segments of the speech signal are shaped by glottal pulse excitation, the vocal tract, and the speaker's lips. The semantic information in speech, however, is carried primarily by the vocal tract. Irregularity in the periodicity of the glottal excitation introduces fluctuations into the amplitude spectrum and thus contributes to a significant dispersion of the parameterization coefficients. This study proposes a technique to mitigate the impact of this irregularity on the feature vector. The technique uses a variable signal frame length synchronized with the fundamental period T_0 and averages amplitude spectra over a single period to minimize noise effects, smooth the spectral characteristics, and reduce the estimator variance. From the resulting HFCC parameters, statistical models of individual Polish vowels were built using mixtures of Gaussian distributions. The impact of the proposed corrections on the classification accuracy of speech frames containing Polish vowels was then examined.
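
The frame-length idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the autocorrelation-based estimate of T_0, the Hann window, the number of shifted frames, and the names estimate_period and period_synchronous_spectrum are all assumptions made for illustration. The sketch sets the frame length to an integer number of fundamental periods and averages the amplitude spectra of frames whose start points are spread across one period.

```python
import numpy as np


def estimate_period(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate T_0 in samples from the autocorrelation peak.

    Assumption: a plain autocorrelation estimator stands in for whatever
    pitch tracker the study actually used.
    """
    frame = frame - np.mean(frame)
    # Keep non-negative lags only; index 0 corresponds to lag 0.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)
    lag_max = min(int(fs / f0_min), len(ac) - 1)
    return lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))


def period_synchronous_spectrum(signal, start, period,
                                n_periods=2, n_shifts=8, n_fft=1024):
    """Average amplitude spectra of frames shifted across one period.

    The frame length is n_periods * period samples, so it follows T_0.
    n_periods, n_shifts, n_fft and the Hann window are hypothetical choices.
    """
    frame_len = n_periods * period
    if frame_len > n_fft:
        raise ValueError("n_fft must be at least the frame length")
    window = np.hanning(frame_len)
    # Frame start offsets spread evenly over a single fundamental period.
    shifts = np.linspace(0, period, n_shifts, endpoint=False).astype(int)
    spectra = []
    for shift in shifts:
        begin = start + shift
        frame = signal[begin:begin + frame_len]
        if len(frame) < frame_len:
            break  # ran past the end of the signal
        spectra.append(np.abs(np.fft.rfft(frame * window, n=n_fft)))
    if not spectra:
        raise ValueError("signal too short for the requested frame")
    # Averaging the amplitude spectra lowers the variance of the estimate.
    return np.mean(spectra, axis=0)
```

A typical use would be to call estimate_period on a short voiced segment and pass the result to period_synchronous_spectrum. The averaged spectrum could then feed a filter bank and cepstral stage to obtain coefficients such as HFCC. That stage is omitted here because the abstract does not specify it.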