2 July 2009

Roger Barga talks about Tools and Services for Data Intensive Research

Orsay, Bat I, 1st floor,
MSR-INRIA Joint Centre,
Tools and Services for Data Intensive Research
Roger Barga
Microsoft Research, Redmond

For many important research investigations, especially in science, efficiently analyzing large data sets is a major challenge. Microsoft’s Dryad is a high-performance, general-purpose distributed computing engine that handles some of the most difficult aspects of cluster-based distributed computing. It’s powerful: Microsoft routinely uses Dryad applications to analyze petabytes of data on clusters of thousands of computers. Microsoft Research has also developed DryadLINQ, which allows developers to use an extended version of the LINQ programming model and API to implement Dryad applications in managed code. DryadLINQ code is similar to what you’ll see in a conventional LINQ-to-objects application, and the application core is often only a few lines of code. Behind the scenes, though, the DryadLINQ provider automatically converts the LINQ query into a Dryad job and executes the query as a distributed application on a cluster. Using Dryad through DryadLINQ, even a novice at parallel processing or cluster-based computing can implement a high-performance distributed application to efficiently analyze terabytes of data.
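To make the idea concrete, here is a minimal single-machine sketch in Python of the general pattern described above: a short query is split into per-partition stages whose partial results are then merged. It is not DryadLINQ code and uses no Dryad API. The names `partition`, `count_words`, and `run_query` are hypothetical, and the "cluster" is simulated by a list of in-memory partitions.

```python
from collections import Counter
from functools import reduce


def partition(records, n):
    """Split records into n partitions (a stand-in for data spread across cluster nodes)."""
    return [records[i::n] for i in range(n)]


def count_words(lines):
    """Per-partition stage: the same logic a single-machine query would run."""
    return Counter(word for line in lines for word in line.lower().split())


def run_query(records, n_partitions=3):
    """Run the per-partition stage on every partition, then merge the partial counts."""
    parts = partition(records, n_partitions)
    # On a real cluster these stages would run in parallel on different machines;
    # here they run one after another in a single process.
    partials = [count_words(p) for p in parts]
    return reduce(lambda a, b: a + b, partials, Counter())


lines = [
    "dryad runs jobs",
    "linq queries become dryad jobs",
    "jobs run on clusters",
]
totals = run_query(lines)
```

The point of the sketch is that the core query (`count_words`) stays a few lines long, while partitioning, scheduling, and merging are handled around it. In this toy version that surrounding work is `run_query`; the abstract attributes it to the DryadLINQ provider and the Dryad engine.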

In this talk we present an introduction to Dryad and DryadLINQ for data-intensive research, and we compare and contrast them with related technologies. We describe our ongoing efforts to collaborate with external researchers to explore the application of Dryad and DryadLINQ to big data research problems in science. We also highlight our efforts to offer software and services to researchers across the world through the academic release of Dryad and DryadLINQ, with associated programming and user documentation being prepared by MSR’s External Research.