University of Michigan

Workshop on Data, Text, Web, and Social Network Mining

Friday, April 23, 2010

9:30 AM - 6 PM

Sponsored by Yahoo!, CSE, and SI





Over the last several years, the University of Michigan research community focused on mining large amounts of data (whether structured, semi-structured, textual, or multimedia) has grown significantly. Faculty interested in developing new data mining techniques are now hosted in several units, including Computer Science and Engineering, Information, Statistics, Linguistics, and Mathematics. Several domain units in the natural sciences, medical sciences, social sciences, and humanities also host faculty who use data mining techniques to advance research in their fields. The goal of this workshop is to bring these researchers together and to set the research agenda for the next 10 years and beyond.

Who is invited?

All UM faculty and graduate students working in text and data mining, broadly construed to include models and technologies for statistical data analysis, Web search technology, analysis of user behavior, social network analysis, and data visualization, as well as related areas. External visitors are also welcome to attend.

How to participate

  1. All faculty doing research in data and text mining and related areas automatically receive a lab overview slot to describe their work and their interests in the field. To reserve your spot, email your lab's name and/or talk title to dm2010@umich.edu by March 31, 2010.
  2. In addition to the overview slots, we offer faculty and graduate students the opportunity to present other work as a technical presentation, poster, or demo. Email dm2010@umich.edu by March 31, 2010, indicating the type of slot you are interested in. Include a title, an abstract, the list of authors, and a short note stating whether the talk has been presented elsewhere (e.g., as your most recent SIGMOD or SIGIR talk). If the talk is based on an existing paper (published or not), please attach the paper as well.
  3. Additional graduate student demos and posters will be presented during the afternoon reception. Email a list of poster titles and their presenters to dm2010@umich.edu by March 31, 2010.

Invited Speaker

Raghu Ramakrishnan, Chief Scientist for Audience & Cloud Computing, and Fellow, Yahoo!: Building and Searching a Web of Concepts


Workshop Program

The workshop consists of an invited talk, faculty presentations, discussions, and a poster session. The full program is available in MS Word and PDF formats.


Invited Talk

Raghu Ramakrishnan: Building and Searching a Web of Concepts


Search engines are increasingly offering results based on a semantically rich interpretation of the user's intent and of the content available to satisfy that intent. A natural question is how far along we are in understanding content on the web. The Semantic Web seeks to enable publication of data with rich markup that facilitates automated interpretation; Yahoo!'s Search Monkey is an example of a service in this spirit. However, much useful data is not semantically marked up, and in many domains the coverage of existing structured data feeds is low. In this talk, I will discuss the goal of constructing a web of "concepts" (a term I use to denote entities, categories of entities, and relationships), starting from the current view of the web as a collection of hyperlinked pages, or documents, each seen as a bag of words.

We need to extract concept-centric metadata for a broad and deep set of important concepts, and stitch it together to create a semantically rich aggregate view of all the information available on the web for each concept instance. Building and maintaining such a web of concepts presents many challenges, but it also promises to enable many powerful applications, including novel search and information discovery paradigms. I will describe a research agenda toward this goal and discuss related work, including the PSOX project at Yahoo!.


Raghu Ramakrishnan is Chief Scientist for Audience and Cloud Computing at Yahoo! and a Yahoo! Fellow. His work has influenced query optimization in commercial database systems and the design of window functions in SQL:1999. His paper on the BIRCH clustering algorithm received the SIGMOD 10-Year Test-of-Time Award, and he has written the widely used textbook "Database Management Systems" (with Johannes Gehrke). Ramakrishnan is a Fellow of the ACM and IEEE and has received several awards, including the ACM SIGKDD Innovations Award, the ACM SIGMOD Contributions Award, a Distinguished Alumnus Award from IIT Madras, a Packard Foundation Fellowship in Science and Engineering, and an NSF Presidential Young Investigator Award. He is Chair of ACM SIGMOD and serves on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment. Ramakrishnan was Professor of Computer Sciences at the University of Wisconsin-Madison, and founder and CTO of QUIQ, a company that pioneered question-answering communities, powering Ask Jeeves' AnswerPoint as well as customer support for companies such as Compaq. He received his B.Tech. from IIT Madras in 1983 and his Ph.D. from the University of Texas at Austin in 1987.