Human hearing and speech are very important means of communication, and they are gaining importance for IT systems. The course objective is to introduce fundamental models for the production, perception, and recognition of speech, which are necessary for understanding, constructing, and evaluating the performance of IT systems that use speech as one of their input/output media.
Course content:
Course Background: Natural and synthetic speech are becoming increasingly important in IT systems, where they are applied in, among other things, automatic information delivery systems; reservation and information retrieval systems; animated films and cartoons; and teleconferencing systems. Furthermore, they are expected to be important in future systems for immersive telepresence connecting geographically distant sites, and in networked systems for e-commerce, maintenance, and monitoring.
Course Contents: Models for Speech Production: The human vocal tract. Linear prediction used for parameter estimation. Parameters for male, female, and child voices.
Models for Speech Perception: The human ear. Frequency analysis and pitch perception. Intensity discrimination. Time/frequency masking. Sound localization and auditory perception. The interaction between visual and auditory information.
Speech Coding, Recognition and Film Animation: Speech coding using CELP (Code Excited Linear Prediction) algorithms. Principles of MP3 audio coding. Speech recognition using HMM (Hidden Markov Model) algorithms. Combining audio-visual information for the animation of films and cartoons. Noise reduction of speech.
Performance Evaluation: Estimation of the subjective quality of a speech-based system. Future applications in Quality of Service (QoS) measures.
Demonstrations of the psychoacoustic properties of the human ear that are important for the coding of audio and speech.
Hands-on exercises on: Spectral Analysis of Speech. Speech Coding and Synthesis. Speech Recognition.