ICASSP
30th Anniversary

Human Language Technology: Applications and Challenges for Speech Processing

Organizers: M. Ostendorf, U. Washington; E. Shriberg, SRI International & ICSI; A. Stolcke, SRI International & ICSI

Over the last decade, the speech processing and natural language processing communities have developed largely independently, even though many of their algorithms stem from the same fundamental theory. As speech recognition improves, language processing of "spoken documents" is of increasing interest, raising the need to bridge this gap. One necessary step is a richer representation of audio that includes speaker and structural information in addition to word transcriptions, promulgated as "Rich Transcription" by the DARPA EARS program. The goal of this Special Session is to advance this interdisciplinary effort by outlining specific ways in which speech technology can influence downstream natural language technology.

The session will introduce speech researchers to the many interesting and useful downstream applications that take speech recognition output as input. In turn, human language technology (HLT) researchers will have an opportunity to help direct speech engineers toward these particular problems. Finally, the session will bring to light a wealth of shared algorithmic methods that could be useful in both fields and where cross-fertilization is likely to provide mutual benefits. Such shared techniques include statistical language modeling, dimensionality reduction, and machine learning in general.

Overview lecture:
Title: Human language technology: Opportunities & challenges
Authors: M. Ostendorf, U. Washington; E. Shriberg, SRI International & ICSI; A. Stolcke, SRI International & ICSI

Regular lectures:

Title: Approaches and applications of speech diarization
Authors: D.A. Reynolds and P.A. Torres-Carrasquillo (MIT-LL)
Title: Structural metadata research in the EARS program
Authors: Y. Liu (ICSI), E. Shriberg (SRI/ICSI), A. Stolcke (SRI/ICSI), B. Peskin (ICSI), J. Ang (ICSI), D. Hillard (UW), M. Ostendorf (UW), M. Tomalin (Cambridge U), M. Harper (Purdue)
Title: Metadata and parsing
Authors: M. Lease, E. Charniak and M. Johnson (Brown)
Title: Machine translation
Authors: D. Marcu and K. Knight (ISI/USC)
Title: Information extraction
Authors: R. Weischedel and L. Ramshaw (BBN)
Title: Summarization
Authors: K. McKeown and J. Hirschberg (Columbia)
Title: Real-world audio indexing systems
Authors: J.M. Van Thong, B. Logan and D. Goddeau (HP/CRL)
Title: Combining text and audio-visual features in video retrieval and topic clustering
Authors: S.-F. Chang (Columbia), R. Manmatha (UMass) and T.-S. Chua (NUS Singapore)
Title: Interactive dialog systems
Authors: Y. Gao et al. (IBM)
Title: Measuring human readability of machine generated text
Authors: D. Jones (MIT-LL) and T. Gibson (MIT)
Title: User-centered evaluation for machine translation of spoken language
Authors: D. Palmer (Virage)