Bongki Moon
Department of Computer Science
University of Arizona
P.O. Box 210077
Tucson, AZ 85721-0077
Phone: (520) 621-4326
Fax: (520) 621-4246
Email: bkmoon@cs.arizona.edu
URL: http://www.cs.arizona.edu/~bkmoon
A major issue facing the molecular biology community is the need to set up
and maintain distributed databases that are linked, coordinated, and
integrated over the web as the basis for analyzing and interpreting
biological organisms, together with the technical difficulty of doing so.
The Distributed Cooperative Apache (DC-Apache) web server system, one of the
main components of the proposed research, will provide the necessary
infrastructure for searching, browsing, and sharing biological and genomic
data.
The main objective of the proposed research activities is to make
scientific and geospatial data archives accessible through the Internet.
Given the explosive growth of data traffic on the World Wide Web (WWW), it is
crucial for web servers to achieve scalable performance. Overall
performance and resource utilization can be improved by spreading
document requests among a group of web servers. This observation led to the
design and implementation of the Distributed Cooperative Apache (DC-Apache)
web server. We have built the DC-Apache system atop Apache web servers
(version 1.3, based on the pool-of-processes model) by augmenting them with
new functionality so that individual web servers can cooperate and share the
workload as a collective unit.
We have also addressed the issue of storage management for more effective document replication under limited storage capacity. We have evaluated the DC-Apache system with real-world data sets such as the Sequoia scientific data and with the standard benchmark suite SPECweb99. In all experiments, the DC-Apache system achieved high performance and scalability by effectively distributing load among a group of cooperating Apache servers and by eliminating hot spots and performance bottlenecks through document replication. In particular, the Resource-Aware method, proposed for data replication under limited storage, proved very effective in replicating and replacing documents.
The second major research activity was to develop new techniques to process
distance join queries for spatial and multimedia database applications.
Additional requirements for ranking and stopping cardinality are often combined
with the spatial distance join in on-line query processing or Internet search
environments. These requirements pose new challenges as well as opportunities
for more efficient processing of spatial distance join queries. We have
developed an efficient k-distance join algorithm that uses new plane-sweeping
techniques for fast pruning of distant pairs. We have also developed adaptive
multi-stage algorithms for k-distance join and incremental distance join
operations. Furthermore, we have found that the priority strategy for tied
pairs in the priority queue during distance join processing greatly affects
performance. We have proposed a probabilistic tie-breaking priority method
to address this issue. Our performance study shows that the proposed strategies
outperform previous work by up to an order of magnitude for both k-distance join
and incremental distance join queries, under various operational conditions.
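To make the plane-sweep pruning idea concrete, here is a minimal Python sketch, assuming in-memory 2-D points rather than the R-tree nodes a spatial database would normally index. It is not the algorithm developed in this project (which also involves adaptive multi-stage processing and tie-breaking in the priority queue), and the function name `k_distance_join` is hypothetical. Both inputs are swept together in x order, and a candidate pair is discarded as soon as its x-gap alone exceeds the distance of the current k-th best pair.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def k_distance_join(r: List[Point], s: List[Point], k: int) -> List[Tuple[float, Point, Point]]:
    """Return the k closest (r, s) pairs as (distance, r_point, s_point), nearest first.

    Hypothetical simplified plane sweep over points, not R-tree nodes.
    """
    if k <= 0:
        return []
    # Merge both inputs into one list sorted on x; tag 0 = from r, 1 = from s.
    events = sorted([(p[0], 0, p) for p in r] + [(q[0], 1, q) for q in s])
    best: List[Tuple[float, Point, Point]] = []  # max-heap via negated distance
    cutoff = math.inf  # distance of the current k-th best pair
    for i, (x, side, p) in enumerate(events):
        # Sweep forward from p; later events never have a smaller x.
        for j in range(i + 1, len(events)):
            x2, side2, q = events[j]
            if x2 - x > cutoff:
                break  # the x-gap alone exceeds the cutoff: prune all later points
            if side2 == side:
                continue  # only join pairs drawn from different inputs
            d = math.dist(p, q)
            rp, sp = (p, q) if side == 0 else (q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, rp, sp))
            elif d < cutoff:
                heapq.heapreplace(best, (-d, rp, sp))  # evict the current worst pair
            if len(best) == k:
                cutoff = -best[0][0]
    return sorted((-neg_d, rp, sp) for neg_d, rp, sp in best)
```

Because `cutoff` only shrinks as closer pairs are found, the inner sweep for each point stops earlier and earlier, which is where the pruning benefit comes from.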
The DC-Apache system is a scalable web server solution designed to meet the
explosive growth of data traffic on the World Wide Web. Our solution takes a
graph-based approach and is built on the hypothesis that most web sites have
only a few well-known entry points from which users
start navigating through the site's documents. The DC-Apache system can
dynamically manipulate the hyperlinks embedded in web documents in order to
distribute access requests among multiple cooperating web servers.
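The following Python sketch illustrates the general idea of rewriting hyperlinks to spread requests across servers. DC-Apache's internal data structures and policies are not described here, so everything in it is an assumption: `replicas` (which servers hold a copy of each document), `load` (a per-server request count), the least-loaded selection rule, and the function `rewrite_links` are all hypothetical.

```python
import re
from typing import Dict, List

# Matches site-relative links such as href="/docs/a.html".
HREF = re.compile(r'href="(/[^"]*)"')


def rewrite_links(html: str, replicas: Dict[str, List[str]], load: Dict[str, int]) -> str:
    """Point each site-relative link at the least-loaded server holding that document.

    Hypothetical policy: `load` is bumped on every rewrite to anticipate the
    request the link is likely to generate.
    """

    def pick(match: re.Match) -> str:
        path = match.group(1)
        servers = replicas.get(path)
        if not servers:
            return match.group(0)  # document not replicated: leave the link unchanged
        # Least-loaded server first; ties broken by server name for determinism.
        target = min(servers, key=lambda srv: (load.get(srv, 0), srv))
        load[target] = load.get(target, 0) + 1
        return f'href="http://{target}{path}"'

    return HREF.sub(pick, html)
```

For example, with `replicas = {"/a.html": ["s1", "s2"]}` and both servers idle, the link `href="/a.html"` becomes `href="http://s1/a.html"` and the count for `s1` rises to 1, so the next link to a document replicated on both servers would be sent to `s2`.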
XISS: XML Indexing and Storage System
XISS
utilizes the extended preorder
numbering scheme for XML documents.
The extended preorder numbering scheme
provides a way to encode the
elements and attributes in an XML document, such that the
ancestor-descendant relationship can be determined quickly
and future insertions can be accommodated
gracefully. This numbering scheme also
provides opportunities for storing XML data
using relational databases, as
demonstrated in the XML Indexing and Storage System using RDBMS (XISS/R).
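A minimal Python sketch of one way to assign extended preorder labels, assuming the (order, size) formulation: each node receives a preorder number `order` and a `size` large enough to cover all of its descendants, so x is an ancestor of y exactly when order(x) < order(y) <= order(x) + size(x). The `spare` parameter, which reserves unused numbers in every node's range for future insertions, and the names `Node`, `assign`, and `is_ancestor` are illustrative assumptions, not part of XISS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    order: int = 0
    size: int = 0


def assign(node: Node, order: int = 1, spare: int = 0) -> int:
    """Label node's subtree in preorder; return the first number it does not reserve."""
    node.order = order
    nxt = order + 1
    for child in node.children:
        nxt = assign(child, nxt, spare)
    # The range (order, order + size] covers all descendants plus `spare` unused slots.
    node.size = nxt - 1 + spare - order
    return node.order + node.size + 1


def is_ancestor(x: Node, y: Node) -> bool:
    """Constant-time ancestor test using only the two labels."""
    return x.order < y.order <= x.order + x.size
```

For `<play><act><scene/></act><act/></play>` with `spare=0`, the labels are play (1, 3), first act (2, 1), scene (3, 0), and second act (4, 0). With a positive `spare`, a newly inserted child can later be numbered inside its parent's reserved range without relabeling the rest of the document.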
This web page includes real-world data sets, such as Shakespeare's Plays and SIGMOD Record, and synthetic data sets based on the News Industry Text Format (NITF) DTD.