Large-scale Parallel Software Group

Research in the Large-scale Parallel Software Group focuses on software for parallel and distributed computing systems. Building efficient, portable software for parallel and distributed platforms is notoriously difficult; our research aims to make this easier. Our research falls into three broad areas: making parallel and distributed programs easier to write; building efficient runtime systems for parallel and distributed programs; and bridging the gap between parallel and distributed systems to allow a common hardware base to be used for a broad range of applications. The LPS group collaborates extensively with other groups in LCS and at other institutions.

Our work on making parallel and distributed programs easier to write began with the Prelude system, and is currently represented by the Autopilot project (a collaboration with Prof. Kaashoek's Parallel and Distributed Operating System group). Autopilot combines an easy-to-use shared-memory programming model with an efficient runtime system based on a scalable distributed-memory message-passing architecture. Autopilot simplifies the programmer's job by managing locality and the decomposition of work into parallel tasks.

Our work on efficient runtime system mechanisms covers two areas. The first is scheduling and resource management (register relocation and lottery scheduling), designed to reduce the cost of managing parallel tasks and to let multiple tasks share computing resources fairly. The second is efficient communication mechanisms that provide very low-overhead communication with predictable performance, for both point-to-point messages and global communication patterns.
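
As a rough illustration of how lottery scheduling can share a resource in proportion to each task's allocation, here is a minimal sketch. It is not the group's implementation: the function names (hold_lottery, simulate), the task names, and the ticket counts are all hypothetical, and it assumes each task simply holds a fixed integer number of tickets.

```python
import random


def hold_lottery(tickets, rng):
    """Pick one task; its chance of winning is proportional to its ticket count.

    `tickets` maps a (hypothetical) task name to a non-negative ticket count.
    """
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one ticket is required")
    winner = rng.randrange(total)  # winning ticket number in [0, total)
    running = 0
    for task, count in tickets.items():
        running += count
        if winner < running:  # the winning ticket falls in this task's range
            return task
    raise AssertionError("unreachable: winner is always below the total")


def simulate(tickets, quanta, seed=0):
    """Hold one lottery per scheduling quantum and count wins per task."""
    rng = random.Random(seed)
    wins = {task: 0 for task in tickets}
    for _ in range(quanta):
        wins[hold_lottery(tickets, rng)] += 1
    return wins


if __name__ == "__main__":
    # Task "a" holds three times as many tickets as "b", so over many
    # quanta it should receive roughly three times as many turns.
    print(simulate({"a": 3, "b": 1}, quanta=10_000))
```

Because each quantum is decided by an independent draw, shares are only approximately proportional over short intervals and converge toward the ticket ratio as the number of quanta grows.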

To bridge the gap between parallel and distributed systems, we are collaborating with other groups in LCS on the Exokernel and Fugu projects. A principal goal of these projects is to provide low-overhead protected communication, thus enabling a convergence of architectures between parallel and distributed systems. We are also developing resource management mechanisms (gang scheduling and global load distribution) that permit a range of applications, including parallel supercomputing, sequential, and distributed jobs, to share the same underlying computing resources.

Usage statistics are kept for the PSG WWW server.

carl@lcs.mit.edu