Donald Metzler
Senior Software Engineer
Google Inc.
340 Main St.
Venice, CA 90291
lastname AT google DOT com
Teaching / Tutorials
Tutorial Presenter, Probabilistic Models for Information Retrieval (Full Day), SIGIR 2009
[ Part I Slides - Part II Slides ]
Publications
[ Complete Publication List - Collaborators - Google Scholar - DBLP ]
Search Engines: Information Retrieval in Practice [ Web Site - Publisher - Amazon ]
W. Bruce Croft, Donald Metzler, and Trevor Strohman
Search Engines: Information Retrieval in Practice is designed to give
undergraduate students the understanding and tools they need to
evaluate, compare and modify search engines. The book covers the
important issues in IR at a level appropriate for undergraduate
computer science or computer engineering majors. Key mathematical
models are included. The programming exercises in the book make
extensive use of Galago, a Java-based open source search engine.
A Feature-Centric View of Information Retrieval [ Publisher - Amazon ]
Donald Metzler
In a shift away from heuristic, hand-tuned ranking functions and
complex probabilistic models, this book presents feature-based
retrieval models. The Markov random field model detailed in the book goes
beyond the traditional yet ill-suited bag-of-words assumption in two
ways. First, the model can easily exploit various types of dependencies
that exist between query terms, eliminating the term independence
assumption that often accompanies bag-of-words models. Second,
arbitrary textual or non-textual features can be used within the model.
Combining term dependencies and arbitrary features results in a very
robust, powerful retrieval model capable of obtaining state-of-the-art
effectiveness across a wide range of tasks and data sets.
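The two ideas above, term dependencies plus arbitrary weighted features, can be illustrated with a small ranking function in the style of a sequential dependence model. This is a hypothetical, minimal Python sketch and not the book's or Ivory's implementation. It assumes Dirichlet-smoothed log-probability features (prior MU = 2500), an unordered window of 8 tokens, and weights of 0.85 / 0.10 / 0.05 for single-term, exact-phrase, and unordered-window features. All function names are illustrative.

```python
import math

# Assumed parameters (illustrative, not taken from this page):
MU = 2500.0                                      # Dirichlet smoothing prior
LAMBDA_T, LAMBDA_O, LAMBDA_U = 0.85, 0.10, 0.05  # term / ordered / unordered weights
WINDOW = 8                                       # unordered window size in tokens


def term_count(doc, t):
    """Occurrences of a single query term."""
    return doc.count(t)


def ordered_count(doc, a, b):
    """Occurrences of the exact phrase 'a b' (adjacent, in order)."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def unordered_count(doc, a, b, window=WINDOW):
    """Position pairs where a and b co-occur, in either order, within a span
    of at most `window` tokens (a simplified unordered-window count)."""
    pos_a = [i for i, w in enumerate(doc) if w == a]
    pos_b = [j for j, w in enumerate(doc) if w == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < window)


def dirichlet_log(tf, doc_len, cf, coll_len, mu=MU):
    """Dirichlet-smoothed log probability of a feature in a document."""
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))


def sdm_score(query, doc, collection):
    """Weighted sum of term, ordered-phrase, and unordered-window features.
    Assumes `doc` is itself part of `collection`."""
    coll_len = sum(len(d) for d in collection)

    def feature(count_fn, *args):
        cf = sum(count_fn(d, *args) for d in collection)
        if cf == 0:
            # Feature absent from the whole collection: its raw log-probability
            # would be log(0) for every document, so dropping it keeps the ranking.
            return 0.0
        return dirichlet_log(count_fn(doc, *args), len(doc), cf, coll_len)

    pairs = list(zip(query, query[1:]))  # adjacent query-term pairs
    return (LAMBDA_T * sum(feature(term_count, q) for q in query)
            + LAMBDA_O * sum(feature(ordered_count, a, b) for a, b in pairs)
            + LAMBDA_U * sum(feature(unordered_count, a, b) for a, b in pairs))


docs = [
    "markov random field model for term dependencies".split(),
    "random walks on a markov chain model".split(),
    "field guide to random sampling".split(),
]
query = "markov random field".split()
ranked = sorted(docs, key=lambda d: sdm_score(query, d, docs), reverse=True)
print(" ".join(ranked[0]))  # → markov random field model for term dependencies
```

Under a pure bag-of-words model the phrase "markov random field" would get no credit for its words appearing adjacently; here the ordered and unordered features reward the first document for exactly that, and any other document-level feature could be added as one more weighted term in the sum.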
Selected publications:
Metzler, D. and Croft, W.B., "A Markov Random Field Model for Term Dependencies," in the Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), 472-479, 2005. [pdf][slides][Best Student Paper]
Strohman, T., Metzler, D., Turtle, H., and Croft, W.B., "Indri: A Language Model-Based Search Engine for Complex Queries," in the online Proceedings of the International Conference on Intelligence Analysis. [pdf]
Metzler, D. and Croft, W.B., "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management, Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004. [pdf]
Metzler, D., Dumais, S., and Meek, C., "Similarity Measures for Short Segments of Text," in the Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 16-27, 2007. [pdf][slides]
Diaz, F. and Metzler, D., "Improving the Estimation of Relevance Models Using Large External Corpora," in the Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 154-161, 2006. [pdf][slides]
Last updated: July 16, 2012
Software
I was involved, to varying degrees, in the research and development of the following software:
• Mavuno - Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part-of-speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.
• Ivory - Ivory is a Hadoop-based retrieval toolkit that supports various retrieval models, including the Markov Random Field model for IR. The project is a joint effort with Jimmy Lin and his colleagues at the University of Maryland.
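The distributional similarity mining mentioned for Mavuno rests on a simple idea: words that occur in similar contexts tend to have similar meanings. The following is a hypothetical, single-machine Python sketch of that idea, not Mavuno's API or algorithm. It assumes raw context-word counts over a window of 2 tokens on each side and cosine similarity; Mavuno itself runs such computations at scale on Hadoop.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of words seen within `window` positions."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(word, vecs, k=3):
    """Top-k words whose context vectors are closest to `word`'s."""
    target = vecs.get(word, Counter())
    scored = [(other, cosine(target, v)) for other, v in vecs.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


sentences = [s.split() for s in [
    "the car drove down the road",
    "the automobile drove down the street",
    "the cat sat on the mat",
]]
vecs = context_vectors(sentences)
print(most_similar("car", vecs, k=1))  # automobile ranks first
```

On this toy corpus, "car" and "automobile" have identical contexts, so "automobile" comes out on top. Practical systems commonly reweight the raw counts (for example with pointwise mutual information) before comparing vectors, and distribute the counting step across many machines.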
Program / Organizing Committees
• Program Chair: OAIR 2013
• Area Chair / Senior PC: CIKM 2012, SIGIR 2012, AAAI 2012, WWW 2011, SIGIR 2011
• Editorial board: Transactions on Information Systems (TOIS), Information Retrieval Journal (IRJ)
• Poster Co-Chair: SIGIR 2009
• Workshop Co-Chair: SIGIR 2011 Workshop on Social Web Search and Mining: Analysis of User Generated Content Under Crisis
• Best Paper Award Committee Co-Chair: ICTIR 2011
• Publicity Chair: CIKM 2011
Awards and Honors
• Honorable Mention for Best Paper, SIGIR 2011
• Short-listed for Best Interdisciplinary Paper, CIKM 2010
• Microsoft Live Labs Graduate Fellowship, 2006-2007
• Best Student Paper, SIGIR 2005