Donald Metzler

Senior Software Engineer

Google Inc.

340 Main St.

Venice, CA 90291

lastname AT google DOT com




Interests: Information Retrieval, Natural Language Processing, Social Media, Web Search, Computational Advertising, Large-Scale Text Mining

Teaching / Tutorials

Tutorial Presenter, Probabilistic Models for Information Retrieval (Full Day), SIGIR 2009

[ Part I Slides - Part II Slides ]


[ Complete Publication List - Collaborators - Google Scholar - DBLP ]

Search Engines: Information Retrieval in Practice [ Web Site - Publisher - Amazon ]

W. Bruce Croft, Donald Metzler, and Trevor Strohman

Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. The book covers the important issues in IR at a level appropriate for undergraduate computer science or computer engineering majors. Key mathematical models are included. The programming exercises in the book make extensive use of Galago, a Java-based open source search engine.

A Feature-Centric View of Information Retrieval [ Publisher - Amazon ]

Donald Metzler

In a shift away from heuristic, hand-tuned ranking functions and complex probabilistic models, this book presents feature-based retrieval models. The Markov random field model it details goes beyond the traditional, yet ill-suited, bag-of-words assumption in two ways. First, the model can easily exploit various types of dependencies that exist between query terms, eliminating the term independence assumption that often accompanies bag-of-words models. Second, arbitrary textual or non-textual features can be used within the model. Combining term dependencies and arbitrary features results in a robust, powerful retrieval model capable of state-of-the-art effectiveness across a wide range of tasks and data sets.
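The first idea, exploiting dependencies between adjacent query terms, can be illustrated with a minimal Python sketch of a simplified sequential-dependence variant of a Markov random field ranking function. Each query term contributes a unigram feature, and each pair of adjacent query terms contributes an exact-phrase (ordered) feature and an unordered-window feature; all features are Dirichlet-smoothed log probabilities combined linearly. The function names, the weights (0.85, 0.10, 0.05), the window size of 8, the Dirichlet parameter mu=2500, the +1 background smoothing, and the pair-counting rule for unordered windows are all assumptions made for illustration. This is not the book's reference implementation, and it does not reproduce Indri's exact window-matching semantics.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def ordered_count(doc: Sequence[str], a: str, b: str) -> int:
    """Number of times `a` is immediately followed by `b` (exact phrase)."""
    return sum(1 for i in range(len(doc) - 1) if doc[i] == a and doc[i + 1] == b)


def unordered_count(doc: Sequence[str], a: str, b: str, window: int) -> int:
    """Simplified (assumed) window count: position pairs of `a` and `b`, in
    either order, that fit together inside a span of `window` tokens."""
    pos_a = [i for i, tok in enumerate(doc) if tok == a]
    pos_b = [j for j, tok in enumerate(doc) if tok == b]
    return sum(1 for i in pos_a for j in pos_b if i != j and abs(i - j) < window)


def smoothed_log_prob(tf: int, doc_len: int, cf: int, coll_len: int, mu: float) -> float:
    """Dirichlet-smoothed log probability of a feature in a document.
    The +1 in the background estimate is an assumption that keeps
    features unseen in the whole collection finite."""
    background = (cf + 1) / (coll_len + 1)
    return math.log((tf + mu * background) / (doc_len + mu))


def sdm_score(
    query: List[str],
    doc: List[str],
    collection: List[List[str]],
    weights: Tuple[float, float, float] = (0.85, 0.10, 0.05),  # assumed weights
    window: int = 8,  # assumed unordered-window size
    mu: float = 2500.0,  # assumed Dirichlet parameter
) -> float:
    """Linear combination of unigram, ordered-pair, and unordered-pair features."""
    lam_t, lam_o, lam_u = weights
    coll_len = sum(len(d) for d in collection)
    coll_tf = Counter(tok for d in collection for tok in d)
    n = len(doc)

    score = 0.0
    # Term features: one per query term (the bag-of-words part).
    for q in query:
        score += lam_t * smoothed_log_prob(doc.count(q), n, coll_tf[q], coll_len, mu)

    # Dependency features: one ordered and one unordered feature per adjacent pair.
    for a, b in zip(query, query[1:]):
        cf_o = sum(ordered_count(d, a, b) for d in collection)
        cf_u = sum(unordered_count(d, a, b, window) for d in collection)
        score += lam_o * smoothed_log_prob(ordered_count(doc, a, b), n, cf_o, coll_len, mu)
        score += lam_u * smoothed_log_prob(unordered_count(doc, a, b, window), n, cf_u, coll_len, mu)
    return score


if __name__ == "__main__":
    coll = [["information", "retrieval", "is", "fun"],
            ["retrieval", "of", "some", "information"]]
    q = ["information", "retrieval"]
    for d in coll:
        print(" ".join(d), round(sdm_score(q, d, coll), 4))
```

In this toy collection both documents contain each query term once and have the same length, so a bag-of-words score cannot separate them; the ordered feature gives the first document, which contains the exact phrase, a slightly higher score. Setting the weights to (1.0, 0.0, 0.0) drops the dependency features and reduces the sketch to a plain query-likelihood score. Arbitrary additional features, such as a document prior, could be added as further weighted terms in the same sum.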

Selected publications:

Metzler, D. and Croft, W.B., "A Markov Random Field Model for Term Dependencies," in the Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), 472-479, 2005. [pdf][slides][Best Student Paper]

Strohman, T., Metzler, D., Turtle, H., and Croft, W.B., "Indri: A Language Model-Based Search Engine for Complex Queries," in the online Proceedings of the International Conference on Intelligence Analysis. [pdf]

Metzler, D. and Croft, W.B., "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004. [pdf]

Metzler, D., Dumais, S., and Meek, C., "Similarity Measures for Short Segments of Text," in the Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 16-27, 2007. [pdf][slides]

Diaz, F. and Metzler, D., "Improving the Estimation of Relevance Models Using Large External Corpora," in the Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 154-161, 2006. [pdf][slides]

Last updated: July 16, 2012


I was involved, to varying degrees, in the research and development of the following software:

  1. Mavuno - Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part-of-speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.

  2. Ivory - Ivory is a Hadoop-based retrieval toolkit that supports various retrieval models, including the Markov Random Field model for IR. The project is a joint effort with Jimmy Lin and his colleagues at the University of Maryland.

  3. Indri - Indri is an efficient, scalable search engine with a robust query language. The project synthesizes and enhances the Lemur and Inquery search tools.

Program / Organizing Committees

  1. Program Chair: OAIR 2013

  2. Area Chair / Senior PC: CIKM 2012, SIGIR 2012, AAAI 2012, WWW 2011, SIGIR 2011

  3. Editorial board: Transactions on Information Systems (TOIS), Information Retrieval Journal (IRJ)

  4. Poster Co-Chair: SIGIR 2009

  5. Workshop Co-Chair: SIGIR 2011 Workshop on Social Web Search and Mining: Analysis of User Generated Content Under Crisis

  6. Best Paper Award Committee Co-Chair: ICTIR 2011

  7. Publicity Chair: CIKM 2011

Awards and Honors

  1. Honorable Mention for Best Paper, SIGIR 2011

  2. Short-listed for Best Interdisciplinary Paper, CIKM 2010

  3. Microsoft Live Labs Graduate Fellowship, 2006-2007

  4. Best Student Paper, SIGIR 2005