FYI: CHEP'98 GCA Architecture Abstract




            An Architecture for Optimizing Query Processing 
          and Data Delivery in Multilevel Storage Environments


When, in the next several years, high energy and nuclear physics experiments 
begin to generate hundreds and even thousands of terabytes of data annually, 
significant portions of that data will necessarily reside on tertiary storage.
In these settings, tools that support prefetching, and that attempt to 
optimize the order in which data are returned to querying programs, will 
play a vital role.  When a few keystrokes can be the difference between 
a query that returns 100 events and one that returns 100,000,000 events and 
requires thousands of tape mounts, it will be important for physicists to 
understand the scope and ramifications of their queries before the 
machinery is set in motion.  

This paper describes an approach to query estimation, cache management,  
prefetching, and parallel, order-optimized iteration under development 
as one part of a U.S. Department of Energy Grand Challenge project whose aim 
is to provide tools for large-scale data handling in high energy and nuclear 
physics.  The architecture has been tested on the Parallel Distributed 
Systems Facility at the National Energy Research Scientific Computing Center 
(NERSC) using NERSC's High Performance Storage System (HPSS) for tertiary 
storage.  The underlying data are simulated STAR events, instantiated in a 
federation of Objectivity object-oriented databases.  The software will be 
deployed and evaluated in the context of the Relativistic Heavy Ion Collider's 
Mock Data Challenge in the late summer and early fall of 1998.
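
As a rough illustration of the two ideas named above (estimating a query before it runs, and returning events in a storage-friendly order), here is a minimal Python sketch. It is not the project's code: the index layout, the names EVENT_INDEX, estimate and ordered_events, and the rule "cached files first, then group by tape" are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical index: event id -> (tape label, file name).
# Illustrative only; not taken from the GCA software.
EVENT_INDEX = {
    101: ("tapeA", "f1"),
    102: ("tapeB", "f3"),
    103: ("tapeA", "f1"),
    104: ("tapeA", "f2"),
    105: ("tapeB", "f3"),
}

def estimate(event_ids, index, cached_files=frozenset()):
    """Summarize what a query would touch before any tape is mounted."""
    files = {index[e] for e in event_ids}
    # Only files that are not already on disk cache require a tape mount.
    uncached = {(tape, f) for tape, f in files if f not in cached_files}
    return {
        "events": len(event_ids),
        "files": len(files),
        "tape_mounts": len({tape for tape, _ in uncached}),
    }

def ordered_events(event_ids, index, cached_files=frozenset()):
    """Yield events file by file: cached files first, then grouped by tape."""
    by_file = defaultdict(list)
    for e in event_ids:
        by_file[index[e]].append(e)

    def sort_key(location):
        tape, f = location
        # False sorts before True, so cached files come first.
        return (f not in cached_files, tape, f)

    for location in sorted(by_file, key=sort_key):
        yield from sorted(by_file[location])
```

A physicist could call estimate(...) first, look at the number of tape mounts, and only then iterate with ordered_events(...), which delivers events in the order the storage system can serve them rather than in the order the query listed them.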