A Laboratory for Scalable Systems

Larry L. Peterson, Eugene W. Myers
Department of Computer Science
University of Arizona
Tucson, AZ 85721
{llp, gene}@cs.arizona.edu

The goal of this project is to investigate the viability of building scalable systems from commodity components, the key to which lies in the software. Our primary research thrusts include developing a system infrastructure that scales with processor performance and addressing aggregate scalability. Funds are being used to build a state-of-the-art workstation cluster for use in our research.

1. Project Overview

The University of Arizona, a Land Grant institution, has an outstanding research reputation. It ranks 10th in external research support among state universities, and 14th among all universities in the country. The Department of Computer Science has an equally strong track-record, evidenced in having been awarded three NSF Infrastructure grants over the past 13 years. The department has a long history of doing experimental systems research, and a tradition of making useful software prototypes available in the public domain.

The current infrastructure grant is providing the facilities to continue this experimental systems work, with a particular focus on scalable systems. Now in its second year, the goal of the project is to investigate the viability of building scalable systems from commodity components. Towards this end, we are using the NSF funds to build a state-of-the-art workstation cluster from PentiumPro and Alpha processors, and 100Mbps switched ethernet. The University is also supporting the project by contributing $75k a year for maintenance and the purchase of additional hardware (e.g., file servers and workstations for software development).

The key to building scalable systems from commodity components lies in the software, and with this in mind, we have two major research thrusts: (1) Develop a system infrastructure that scales with processor performance. This is being accomplished through three different sub-projects: The Scout Operating System, Support for Mobile Code, and Protocol Design. (2) Address aggregate scalability by designing, implementing, and experimenting with three different scalable systems: DNA Assembly Server, Filaments, and Scalable Storage Server.

In addition to conducting high quality research, the department is committed to increasing the participation of underrepresented groups in computer science. We have been addressing this issue in a number of ways, including a summer outreach program for minority junior high students, a mentoring program aimed at attracting and retaining women graduate students, actively recruiting female faculty, and hosting various events such as the annual Daughters Day on Campus.

2. Scout Operating System

Scout is a configurable, communication-oriented OS designed for network appliances. It's central abstraction is a path, which is essentially the extension of a network connection into the host operating system. An application using Scout can associate with a given path all the resourcesCPU, memory, bus, cachenecessary to provide the same quality of service as provided by the network connection to which it is attached.

Prototype implementation of Scout with full support for the path abstraction has been completed. The current system runs on Alpha and Pentium processors, and includes the TCP/IP protocol suite; video, network, and disk device drivers; a window manager; and a suite of MPEG-related protocols used to encode, transport, and decode video. In addition, several compiler-based optimizations that can be applied to paths have been implemented and evaluated. These optimizations reduce the instruction count and improve I-cache behavior. These techniques applied to both TCP/IP and RPC paths, are found to reduce the path latency by 25-90%, depending on how pessimally the code was loaded in the non-optimized case. Finally, the advantages of using paths in a particular application domainreceiving, decoding, and displaying MPEG-compressed video have been demonstrated. A Scout-based implementation of such an MPEG player is able to process frames 20-30% faster than an equivalent player implemented in Linux, and under high load, is able to meet a significantly greater percentage of the frame deadlines.

Publications

Students

Abhiram Khune, Brady Montz, David Mosberger, Oliver Spatscheck, Dave Larson

3. Support for Mobil Code

We developed a large Java translation system. The system, Toba, translates Java bytecode into C or subsequent compilation by an ANSI C compiler. The resulting object files link with the Java runtime system that we developed. With the exception of dynamic linking, we fully support Java. Toba-generated executables are many times faster than interpreted Java, and significantly faster than Just-In-Time (JIT) compiled programs. Toba has been available via ftp since October, 1996 and works on Solaris and Linux platforms. A prototype interpreter for Java bytecodes provides modest dynamic loading facilities, and is currently being re-engineered. Once the interpreter is stable, we will begin development of a JIT compiler for Java bytecode. Our research emphasis will be on JIT-speed.

Another project, Krakatoa, is a Java bytecode "decompiler." It turns Java bytecode back into Java source. The system is almost completely functional, and surprisingly simple. It appears that bytecode is significantly easier to reverse-engineer than ordinary object files.

Finally, we've begun work on a new compiler for the Icon programming language. This new compiler translates Icon into Java bytecode. The compiler utilizes a completely new mechanism for translating Icon's goal-directed control flow, which promises to be simpler and more efficient than previous techniques. This work should be completed in 1997.

Publications

Students

Patrick Bridges, Jens Ernst, Tim Newsham, Denise Todd, Scott Watterson

4. Protocol Design

Our research on distributed systems over the past several years can be divided into two major areas: configurable systems and high-performance heterogeneous scientific computing. In the first area, we have been developing a new approach to building highly customizable distributed

services such as atomic multicast, group RPC, and membership. The approach is based on a new event-driven model that allows different semantic properties of a service to be implemented as separate modules that can be configured together to provide a customized version of the service. We have applied the model to group RPC and membership services and completed a prototype implementation of the model based on augmenting version 3.2 of the x-kernel. We are extending this approach to include real-time guarantees.

In the area of heterogeneous scientific computing, we have pursued two lines of investigation: (1) developing new techniques for performing automatic high-level semantic data transformations between simulations; and (2) optimizing for RPC invocations in scientific applications by folding multiple calls into a single call, with intermediate code executed remotely by a Java applet.

Publications

Students

Nina Bhatti (Ph.D. graduate), Ilwoo Chang, Xiaonan Han, Matti Hiltunen, Donald Waugaman

5. DNA Assembly Server

Our work on large-scale DNA sequencing consists of work in three areas: fundamental algorithms for underlying assembly problems; software system/environment packaging for the assembly algorithms; and design of protocols for actually obtaining the sequence. The algorithmic work is embodied in our second Fragment Assembly Kernel (FAK II) and the encapsulating software environment is called the FAKtory.

An update of the FAK II algorithmic library, Version 4.2, was released this fall which includes improvements to the overlap and layout algorithms. In addition, the kernel is now incorporated into Roger Staden's Gap4 software environment for DNA sequencing. This year, most of our efforts and resources have been invested in the development of the FAKtory environment. We currently have a usable system that is being internally beta-tested and have signed on several major labs to serve as beta testers.

We have been examining the idea of directly shotgun sequencing the entire human genome using the current 100kb resolution STS maps to guide the assembly. We completed and reported the results of simulations on the coverage and linking characteristics of the expected data set. Currently we are designing algorithms for the inter-STS-marker assembly problem and will then perform trials on simulated data sets as a pilot feasibility study.

Publications

Students

Brad Traweek, Deepak Lakhanpal, Kedarnath Dubhashi, Eric Anson, Mudita Jain (Ph.D. graduate)

6. Filaments: Efficient Fine-Grain Parallelism

This project addresses the problem of architecture-independent parallel programming.

Our goals are to make it easier to develop parallel programs that are simple (easy to write) and portable while also being efficient. Toward this end we have developed a software system called Filaments, which provides fine-grain parallelism and a shared address space. These are in a sense the least common denominators for parallel computing, and hence they help make Filaments easy to use. The package has been implemented on shared-memory multiprocessors and networks of workstations, so an application can run unchanged on a range of parallel machines. Finally, the implementation is efficient: the performance of an application is typically within 10% of the performance of a customized coarse-grain program and is sometimes faster due to the use of run-time decision making.

In the past year we have developed an adaptive protocol for placing data on pages so as to avoid false sharing and an adaptive method for placing pages on nodes. We have also ported Filaments to a network of Suns running Solaris and begun porting it to a network of Pentiums running Linux.

Students

Vincent Freeh, David K. Lowenthal (Ph.D. graduates)

Publications

7. Swarm Scalable Storage Server

The goal of the Swarm project is to design and implement a scalable network storage system based on a cluster of personal computers. Swarm is a storage service, rather than a file service, because it can be configured to provide a variety of storage abstractions and access protocols. For example, a single Swarm cluster could simultaneously support Sun's Network File System (NFS), a parallel file system, HTTP, and a specialized Swarm interface. Current network file systems are designed to work well with only a single storage abstraction and access protocol (e.g. the Unix file abstraction and the NFS access protocol). Swarm, on the other hand, decouples these two facets of storage service, allowing for a more flexible storage system. The research challenge is to develop techniques for allocating Swarm's resources so that the storage abstractions and access protocols are configurable, without sacrificing performance and predictability.

We have designed and simulated a cooperative caching scheme that allows clients of a distributed file system to share the contents of their caches. A Linux implementation is in progress. We have also completed a Scout NFS client implementation, allowing Scout machines to access NFS file servers. We are using the NFS client to study the integration of caches into Scout. The low-level Swarm software has also made great progress. We completed a Linux-based Swarm storage server, and are nearly finished with a Scout version. The Swarm client-side logging software has been designed and is now being written.

Publications

Students

Prasenjit Sarkar, Wanda Chiu, Rajesh Sundaram, Tim Newsham