Donald Metzler

Staff Software Engineer

Google Inc.

Email: lastname AT google DOT com



[ Brief Bio - CV - LinkedIn ]

Interests: Information Retrieval, Natural Language Processing, Social Media, Web Search, Computational Advertising, Large-Scale Text Mining


[ Complete Publication List - Collaborators - Google Scholar - DBLP ]

Search Engines: Information Retrieval in Practice [ Web Site - Publisher - Amazon ]

W. Bruce Croft, Donald Metzler, and Trevor Strohman

Search Engines: Information Retrieval in Practice is designed to give undergraduate students the understanding and tools they need to evaluate, compare and modify search engines. The book covers the important issues in IR at a level appropriate for undergraduate computer science or computer engineering majors. Key mathematical models are included. The programming exercises in the book make extensive use of Galago, a Java-based open source search engine.

A Feature-Centric View of Information Retrieval [ Publisher - Amazon ]

Donald Metzler

In a shift away from heuristic, hand-tuned ranking functions and complex probabilistic models, this book presents feature-based retrieval models. The Markov random field model it details goes beyond the traditional yet ill-suited bag-of-words assumption in two ways. First, the model can easily exploit various types of dependencies that exist between query terms, eliminating the term independence assumption that often accompanies bag-of-words models. Second, arbitrary textual or non-textual features can be used within the model. Combining term dependencies and arbitrary features results in a very robust, powerful retrieval model capable of obtaining state-of-the-art effectiveness across a wide range of tasks and data sets.

Selected publications:

Metzler, D. and Croft, W.B., "A Markov Random Field Model for Term Dependencies," Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2005), 472-479, 2005. [pdf][slides][Best Student Paper]

Strohman, T., Metzler, D., Turtle, H., and Croft, W.B., "Indri: A Language Model-Based Search Engine for Complex Queries," in the online Proceedings of the International Conference on Intelligence Analysis. [pdf]

Metzler, D. and Croft, W.B., "Combining the Language Model and Inference Network Approaches to Retrieval," Information Processing and Management Special Issue on Bayesian Networks and Information Retrieval, 40(5), 735-750, 2004. [pdf]

Metzler, D., Dumais, S., and Meek, C. "Similarity Measures for Short Segments of Text," in the Proceedings of the 29th European Conference on Information Retrieval (ECIR 2007), 16-27, 2007. [pdf][slides]

Diaz, F., and Metzler, D. "Improving the Estimation of Relevance Models Using Large External Corpora," in the Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2006), 154-161, 2006. [pdf][slides]


Experimental Methods for Information Retrieval (Half Day), SIGIR 2012

[ Slides ]

Probabilistic Models for Information Retrieval (Full Day), SIGIR 2009

[ Part I Slides - Part II Slides ]

Program / Organizing Committees

  1. Program Chair: OAIR 2013, ICTIR 2013, WSDM 2014

  2. Area Chair / Senior PC: AAAI (2012), AIRS (2013), CIKM (2012, 2013, 2015), SIGIR (2011-2013, 2015-2016), WWW (2011)

  3. Editorial board: Information Retrieval Journal (past), PeerJ Computer Science Journal (current), Transactions on Information Systems (past)

  4. Steering Committee: ICTIR

  5. Poster Chair: SIGIR 2009

  6. Workshop Organizer: SIGIR 2011 Workshop on Social Web Search and Mining: Analysis of User Generated Content Under Crisis

  7. Best Paper Award Committee: ICTIR 2011 (Chair), SIGIR 2015

  8. Publicity Chair: CIKM 2011

Awards and Honors

  1. ACM Senior Member, 2012

  2. Runner-Up for Best Search Paper, WSDM 2012

  3. Honorable Mention for Best Paper, SIGIR 2011

  4. Short-listed for Best Interdisciplinary Paper, CIKM 2010

  5. Microsoft Live Labs Graduate Fellowship, 2006-2007

  6. Best Student Paper, SIGIR 2005


I was involved, to various extents, in the research and development of the following software:

  1. Mavuno - Mavuno is an open source, modular, scalable text mining toolkit built upon Hadoop. It supports basic natural language processing tasks (e.g., part of speech tagging, chunking, parsing, named entity recognition), is capable of large-scale distributional similarity computations (e.g., synonym, paraphrase, and lexical variant mining), and has information extraction capabilities (e.g., instance and semantic relation mining). It can easily be adapted to new input formats and text mining tasks.

  2. Ivory - Ivory is a Hadoop-based retrieval toolkit that supports various retrieval models, including the Markov Random Field model for IR. The project is a joint effort with Jimmy Lin and his colleagues at the University of Maryland.

  3. Indri - Indri is an efficient, scalable search engine with a robust query language. The project synthesizes and enhances the Lemur and Inquery search tools.

Last updated: January 1, 2016