Abstract
The world of multimedia is rapidly growing and reshaping. Multimedia has become ubiquitous, and
the tipping point has arrived with improvements in ease of use and simple media search.
However, video annotation and content-based search are still a great challenge. In this talk I will
present our work towards this goal. Using automatic annotation tools, a multimodal video search
system allows the user to combine speech, OCR, image-based, and visual-concept cues into a single multimodal
query. Mutual relevance feedback is used to assist an expert user with query formulation.
I will give an introduction to the NIST TRECVID international benchmark and present our
evaluation results.
In the last part of the talk I will present ViaScribe, a project that aims to assist disabled employees
with access to corporate information.
Dr. Arnon Amir is a research staff member in the Multimodal Medical Decision Support project, Computer Science department, IBM Almaden Research Center. His research touches various topics in computer vision and multimedia information retrieval, including video analysis, speech indexing, multimodal video search and efficient video browsing, as well as eye detection and gaze tracking and their applications for human-computer interaction. Dr. Amir holds a D.Sc. (1997) and an M.Sc. (1992) in Computer Science from the Technion, Israel Institute of Technology, and a B.Sc. (Magna Cum Laude, 1989) in Computer and Electrical Engineering from the Ben Gurion University. Dr. Amir has coauthored more than sixty refereed technical papers and holds ten US patents.