ICASSP
30th Anniversary

Tutorial TUT-13: Machine learning in speech and language processing

Instructors

J. Bilmes; University of Washington
P. Haffner; AT&T Laboratories

Time & Location

Saturday, March 19, 13:30 - 16:30, Location: CC: Room 113-B

Abstract

This tutorial will explore new learning approaches that enable the researcher to go beyond traditional HMM and natural language techniques. We distinguish two main paradigms, Graphical Models and Large Margin classifiers/Kernel methods, which will structure this tutorial in two parts. We will also show how these paradigms, which are two branches with the same underlying roots, cover different problems and often complement each other.

1. Graphical models are a promising paradigm for studying both existing and novel techniques in speech and language processing. This part of the tutorial will show that many pattern recognition techniques are simply instances of graphical-model algorithms. The best-known example is the Baum-Welch algorithm for HMMs, but graphs and their algorithms generalize many other techniques as well. The tutorial will also show that many advanced models proposed for speech and language processing can easily be described by and understood with a graph, including segment models, HMM decomposition, speaker adaptation, and some machine translation models. In addition, the tutorial will survey a number of speech recognition techniques that grew directly out of the graphical-model paradigm, such as Buried Markov models, structural discriminability, and explicit graphical structures for speech recognition. Practical issues in using such models for speech and language will be discussed, as well as methods for fast inference.

2. The use of classification algorithms for language understanding or tagging was limited by their poor ability to deal with high-dimensional problems and their lack of flexibility. The advent of large margin and regularized techniques such as Support Vector Machines (SVMs), AdaBoost and Regularized Maximum Entropy, combined with optimization techniques with fast and guaranteed convergence, allows effective inference on very large corpora. The second part of this tutorial will provide a unified introduction to these methods and show how they lead to optimization algorithms with very different scaling properties. Kernel methods are a powerful tool for extending some of these algorithms to sequences and graphs, in particular through weighted automata kernels. Far from being abstract and computationally complex objects, these kernels can be readily implemented using general weighted automata algorithms that have been extensively used in speech and language applications. Our examples will cover applications such as spoken language understanding and tagging, leading to comparisons of learning algorithms on some of the largest corpora available in the industry.
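
As a rough illustration of the sequence-kernel idea in part 2 (not taken from the tutorial material), the sketch below computes an n-gram count kernel between two token sequences. N-gram kernels are among the simplest kernels that can be expressed as weighted automata (rational) kernels; here the counts are computed directly with Python counters rather than with automata algorithms, and the function names `ngram_counts` and `ngram_kernel` are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(x, y, n=2):
    """Inner product of the n-gram count vectors of sequences x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    # Iterate over the smaller table; n-grams missing from the other count as 0.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(count * cy[gram] for gram, count in cx.items())


x = "a b a b".split()
y = "b a b a".split()
print(ngram_kernel(x, y, n=2))
```

Because the value is an inner product of count vectors, the kernel is symmetric and positive semidefinite, so it can be passed to any kernel-based learner such as an SVM.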

Targeted Audience and Objectives: This tutorial targets students, researchers and engineers looking for learning techniques that match a given problem. Our coverage will be as comprehensive as possible to enable an informed choice:

Unified understanding of the underlying statistical learning theory.
Scope of application and limitations of each learning technique.
Flexibility to handle exotic data representations and explore new models.
Optimization techniques and software available.
Rigorous comparative results, showing how these techniques scale on very large corpora.
The engineer will be able to assess what is feasible with machine learning, and what type of gains in productivity can be expected.

Prerequisites: Participants are expected to be familiar with basic statistics, linear algebra and some convex optimization. However, we will strive to attach common-sense explanations to most of the mathematical concepts presented in this tutorial.

Presenter Information

Jeff A. Bilmes is an Assistant Professor in the Department of Electrical Engineering at the University of Washington, Seattle (and adjunct in Linguistics and also in Computer Science and Engineering). He co-founded the Signal, Speech, and Language Interpretation Laboratory at the University. He received a master's degree from MIT and a Ph.D. in Computer Science from the University of California, Berkeley. Jeff is the author of the graphical models toolkit (GMTK), and has done much research on both structure learning of and fast probabilistic inference in dynamic Bayesian networks (DBNs). Jeff was a leader of the 2001 Johns Hopkins summer workshop team applying graphical models to speech and language, and has continued to lead in the speech/language community in this endeavor. His primary research lies in statistical graphical models, speech, language and time series processing, human-computer interaction, and probabilistic machine learning. He was a general co-chair for IEEE Automatic Speech Recognition and Understanding 2003, and is a member of IEEE, ACM, and ACL. Additional information is available at: http://ssli.ee.washington.edu/~bilmes

Patrick Haffner received his B.S. degree from Ecole Polytechnique in 1987 and his Ph.D. degree from Ecole Nationale Supérieure des Télécommunications in 1994 (both located in Paris, France). He is a Research Technology Consultant with AT&T Labs-Research. He has been interested in global and large-scale learning techniques for speech and image processing since 1988. He first generalized Time-Delay Neural Networks to continuous speech recognition applications, and unified them with HMMs. He then joined Yann LeCun's team at AT&T Labs to work on an architecture that combines Neural Networks and Finite State Transducers for document recognition. In 1998, he applied statistical learning principles to implement a robust document segmentation algorithm for the DjVu document distribution system; this segmenter still represents the state of the art today. More recently, he has been working on algorithms and software solutions for speech, language and sequence processing. In particular, with rational kernels, he has demonstrated applications that combine finite state machine (FSM) and machine learning software libraries.