Matti A. Hiltunen, Richard D. Schlichting, and Carlos A. Ugarte
Department of Computer Science
The University of Arizona
Tucson, AZ 85721, USA
Survivability, the ability of a system to tolerate intentional attacks or accidental failures or errors, is becoming increasingly important with the extended use of computer systems in society. Techniques such as cryptographic methods, intrusion detection, and traditional fault tolerance are being developed to improve the survivability of such systems. One of the special challenges of survivability is the need to work with existing systems, whether legacy or off-the-shelf, that were not designed with survivability in mind. The Cactus project offers some potential solutions to survivability in the form of highly customizable middleware.
This paper outlines potential contributions to survivability by the Cactus project at the University of Arizona.
The Cactus approach to constructing highly customizable middleware services is based on implementing abstract properties and functional components of a service as separate modules that interact using an event-driven execution model. The basic building block of this model is a micro-protocol, a software module that implements a well-defined property of the desired service. A micro-protocol is, in turn, structured as a collection of event handlers, which are procedure-like segments of code that are executed when a specified event occurs. Events are used to signify state changes of interest, e.g., "message arrival from the network". When such an event occurs, all event handlers bound to that event are executed. Events can be raised explicitly by micro-protocols or implicitly by the runtime system. Execution of handlers is atomic with respect to concurrency, i.e., each handler is executed to completion without interruption. The binding of handlers to events can be changed at runtime.
Event handler binding and invocation are implemented by a standard runtime system or framework that is linked with the micro-protocols to form a composite protocol. The framework also supports shared data (e.g., messages) that can be accessed by the micro-protocols configured into the framework. Once created, a composite protocol can be composed with other protocols in a traditional hierarchical manner, using a system such as the x-kernel [HP91], to construct a middleware service.
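The event-driven model described above can be illustrated with a minimal sketch. It is not the Cactus API: the names Framework, bind, unbind, raise_event, shared, Counting, Delivery, and build_composite are all hypothetical, and atomic handler execution is approximated by simply running handlers one after another in a single thread.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Any], None]


class Framework:
    """Hypothetical runtime: stores event-to-handler bindings and shared data."""

    def __init__(self) -> None:
        self._bindings: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.shared: Dict[str, Any] = {}  # data visible to all micro-protocols

    def bind(self, event: str, handler: Handler) -> None:
        self._bindings[event].append(handler)

    def unbind(self, event: str, handler: Handler) -> None:
        self._bindings[event].remove(handler)

    def raise_event(self, event: str, arg: Any = None) -> None:
        # Each handler runs to completion before the next one starts.
        # Iterating over a copy lets a handler change bindings safely.
        for handler in list(self._bindings[event]):
            handler(arg)


class Counting:
    """Hypothetical micro-protocol: counts arriving messages."""

    def __init__(self, fw: Framework) -> None:
        self.fw = fw
        fw.shared["received"] = 0
        fw.bind("msg_from_net", self.on_arrival)

    def on_arrival(self, msg: Any) -> None:
        self.fw.shared["received"] += 1


class Delivery:
    """Hypothetical micro-protocol: hands arriving messages to the user."""

    def __init__(self, fw: Framework) -> None:
        self.fw = fw
        fw.shared["delivered"] = []
        fw.bind("msg_from_net", self.on_arrival)

    def on_arrival(self, msg: Any) -> None:
        self.fw.shared["delivered"].append(msg)


def build_composite() -> Framework:
    """Link the chosen micro-protocols with the framework."""
    fw = Framework()
    Counting(fw)
    Delivery(fw)
    return fw
```

In this sketch, customizing the service amounts to choosing which micro-protocol classes are instantiated in build_composite; raising "msg_from_net" then runs every handler bound to that event.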
Another aspect of Cactus that may have a great impact on survivability is the diversity provided by the high level of customizability. If users customize the underlying services to their exact requirements and to the characteristics of the execution environment, the different service instances may become different enough that a method of breaking one instance might not work on the others [CP97].
Furthermore, an important aspect we are exploring in the Cactus project is adaptability. By adaptability, we mean a system's ability to react dynamically at runtime to changes in both the execution environment (e.g., failures, intrusions) and user requirements (e.g., a user's request to increase the security level of the system). The Cactus model offers a method of adaptation in event-handler rebinding: different modes of operation can be implemented in different event handlers, and a mode switch only involves unbinding the old event handlers and binding the new ones. We are also exploring methods of dynamic code modification, where new event handlers can be inserted into a running composite protocol. To coordinate the adaptation of distributed programs, we have developed a three-phase model for adaptations consisting of change detection, agreement, and action [HS96]. The agreement phase, in which the nodes of the distributed computation agree on whether an adaptation is necessary and what it should be, is the most complicated. To simplify the implementation of this phase, we have developed a library of agreement micro-protocols.
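The three phases can be sketched as follows. This is only an illustration under strong assumptions: all nodes live in one process, the agreement phase is replaced by a simple majority vote rather than one of the agreement micro-protocols from the library, and the names Node, detect, act, agree, and adapt are hypothetical.

```python
from typing import Callable, Dict, List


class Node:
    """Hypothetical node whose "send" event has one handler per mode."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mode = "normal"
        self.handlers: Dict[str, Callable[[str], str]] = {"send": self.send_normal}

    def send_normal(self, msg: str) -> str:
        return msg

    def send_protected(self, msg: str) -> str:
        # Placeholder for a more defensive mode of operation.
        return "protected(" + msg + ")"

    def detect(self, intrusion_observed: bool) -> str:
        """Phase 1, change detection: propose a mode from a local observation."""
        return "protected" if intrusion_observed else "normal"

    def act(self, mode: str) -> None:
        """Phase 3, action: rebind the handler for the agreed mode."""
        self.mode = mode
        self.handlers["send"] = (
            self.send_protected if mode == "protected" else self.send_normal
        )


def agree(proposals: List[str]) -> str:
    """Phase 2, agreement: a majority vote stands in for a real protocol."""
    votes = sum(1 for p in proposals if p == "protected")
    return "protected" if votes > len(proposals) // 2 else "normal"


def adapt(nodes: List[Node], observations: List[bool]) -> str:
    """Run detection, agreement, and action across all nodes."""
    proposals = [n.detect(o) for n, o in zip(nodes, observations)]
    decision = agree(proposals)
    for node in nodes:
        node.act(decision)
    return decision
```

The point of the sketch is that the action phase is cheap: once the nodes have agreed, the mode switch is a single rebinding per node, and every node ends up in the same mode even if only some of them observed the change.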
Finally, as mentioned above, it is often necessary to increase the survivability of existing legacy or off-the-shelf applications. This can often be accomplished either by replacing some of the underlying communication and operating system services with survivable ones or by transparently inserting new middleware services between the application and the underlying services. One of the Cactus prototypes runs on the MK operating system and CORDS communication subsystem from the OpenGroup Research Institute. This platform allows all or part of a protocol stack to be inserted into the operating system kernel, and could thus replace existing communication services provided by the operating system. We are exploring the transparent insertion of new middleware services in the context of CORBA. The goal is to enhance the QoS for existing CORBA clients and servers without modifying either their code or the underlying CORBA ORBs. We are working on accomplishing this goal on the Orbix CORBA ORB by inserting composite protocols into smart proxies (at the clients) and filters (at the servers).
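The idea of transparent insertion can be illustrated generically, without reference to the Orbix smart proxy or filter interfaces, which are not reproduced here. The sketch below assumes a plain Python object as the unmodified server stub; InterceptingProxy and its hook signature are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Hook = Callable[[str, Tuple[Any, ...]], None]


class InterceptingProxy:
    """Stands in front of an unmodified target and runs hooks before each call."""

    def __init__(self, target: Any, before: List[Hook]) -> None:
        self._target = target
        self._before = before

    def __getattr__(self, name: str) -> Any:
        # Called only for names not found on the proxy itself,
        # so every lookup of a target method is forwarded here.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for hook in self._before:
                hook(name, args)  # e.g. logging, checks, added protocol steps
            return attr(*args, **kwargs)

        return wrapper
```

A client that holds the proxy instead of the original object keeps calling the same methods with the same arguments; the added behavior lives entirely in the hooks, so neither the client code nor the target needs to change.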
GroupRPC service. GroupRPC provides a remote procedure call service to a group of servers. Using a group of servers rather than a single server naturally improves survivability in the sense that the failure of one server does not prevent a remote procedure call from completing successfully. Furthermore, the service allows the client to specify a collation function that is used to combine the responses from different servers into one. Such a collation function can be used to implement majority voting and even sanity checks on the responses, which allows the masking of different types of failures and even intrusions into some of the servers. Another special feature of GroupRPC is that it allows the user to specify the failure model that the service is to tolerate. The failure models currently supported include crash, omission, timing, and Byzantine (arbitrary). The failure model chosen affects not only the behavior at the client but also the interaction between servers. In particular, if the service is configured to tolerate Byzantine failures, the servers run a Byzantine agreement protocol to agree on the set of requests to be processed. Thus, a faulty or compromised server cannot cause the server group to become inconsistent.
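A collation function for majority voting can be sketched as follows. The sketch assumes servers are local callables and that a crashed server shows up as a raised exception; group_call and majority are hypothetical names, not the GroupRPC interface, and no agreement among servers is modeled.

```python
from collections import Counter
from typing import Any, Callable, Hashable, List, Optional, Sequence


def majority(responses: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value given by a strict majority of responses, else None."""
    if not responses:
        return None
    value, count = Counter(responses).most_common(1)[0]
    return value if count > len(responses) // 2 else None


def group_call(
    servers: Sequence[Callable[[Any], Hashable]],
    request: Any,
    collate: Callable[[List[Hashable]], Any],
) -> Any:
    """Send the request to every server and collate whatever comes back."""
    responses: List[Hashable] = []
    for server in servers:
        try:
            responses.append(server(request))
        except Exception:
            # A failed server contributes no response; the call continues.
            continue
    return collate(responses)
```

With three servers of which one returns a wrong value, majority masks the bad response; with one crashed server, the two remaining responses still determine the result. Note that this sketch votes over the responses actually received, which is a simplification: tolerating arbitrary failures would require voting against the size of the whole group.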
Security service. The security service is a composite protocol that provides enhanced security features such as authentication, integrity, and privacy for messages traversing that layer. The service provides a choice of micro-protocols for each of the aspects of secure communication. For example, we can provide a choice of methods for authentication, encryption, and detection of replay attacks. All aspects of security are optional. For example, a user might require integrity but not privacy (e.g., using an encrypted message signature). Also, the service allows not only a choice of cryptographic protocol from a list of options but also arbitrary combinations of these protocols. This makes it more difficult for an intruder to break the encryption.
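The optional, combinable nature of these micro-protocols can be sketched with two layers built from the Python standard library: an integrity layer using HMAC-SHA256 and a replay-detection layer using sequence numbers. The class names and the push/pop interface are hypothetical, the specific mechanisms are our choice for illustration rather than those of the Cactus service, and a privacy (encryption) layer is omitted; it would follow the same interface.

```python
import hashlib
import hmac
import struct
from typing import List


class Integrity:
    """Appends an HMAC-SHA256 tag on send and verifies it on receive."""

    TAG_LEN = 32

    def __init__(self, key: bytes) -> None:
        self.key = key

    def push(self, data: bytes) -> bytes:
        return data + hmac.new(self.key, data, hashlib.sha256).digest()

    def pop(self, data: bytes) -> bytes:
        body, tag = data[:-self.TAG_LEN], data[-self.TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return body


class ReplayDetection:
    """Prefixes a sequence number and rejects numbers already seen."""

    def __init__(self) -> None:
        self.next_out = 0
        self.highest_seen = -1

    def push(self, data: bytes) -> bytes:
        header = struct.pack("!Q", self.next_out)
        self.next_out += 1
        return header + data

    def pop(self, data: bytes) -> bytes:
        (seq,) = struct.unpack("!Q", data[:8])
        if seq <= self.highest_seen:
            raise ValueError("replayed message")
        self.highest_seen = seq
        return data[8:]


class SecurityService:
    """Applies the configured layers in order on send, in reverse on receive."""

    def __init__(self, layers: List) -> None:
        self.layers = list(layers)

    def send(self, payload: bytes) -> bytes:
        for layer in self.layers:
            payload = layer.push(payload)
        return payload

    def receive(self, wire: bytes) -> bytes:
        for layer in reversed(self.layers):
            wire = layer.pop(wire)
        return wire
```

Configuring SecurityService([ReplayDetection(), Integrity(key)]) gives integrity and replay detection without privacy, and the integrity tag also covers the sequence number because that layer is applied last on send. Leaving a layer out of the list, or adding another, changes the properties of the service without touching the remaining layers.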