MDC2 requirements



 Requirements for MDC2

We address here requirements that will affect the "Storage Manager"
for MDC2.  Some may have an effect on the "query object" and the
"event iterators".  Since there is little time between MDC1 and MDC2,
we need to prioritize what should be done and proceed as soon as
possible.  For several items we have assigned a proposed priority for
MDC2 (priority 1 is highest).  These priorities need to be agreed on
by all GC participants.

1)  Multiple event components

This is a requirement to support the splitting of event data into
multiple components, such as the "track" component, the "vertices"
component, and the "raw data" component.  In the current design at
RHIC for STAR and Phenix, it is expected that the "tag" database will
reside on disk cache, and the other components on tape storage.

A query can then ask for any combination of the components, for events
that satisfy "range" conditions over the event properties.  Thus, it is
important to determine how the component databases will be formed.  This will
affect both the indexing modules that the query estimator uses as well
as the scheduling of caching by the query monitor.  It will also
affect the efficiency of the system.  We need to have answers to the
following questions:

Question 1:
Will all the component databases contain event data in the same order?
Can we assume that as event components are generated they will be
stored on different databases in the federation in the same order?
Answer:
No.  While the event components can be assumed to be in roughly the
same order, this cannot be relied on.  If the events are generated in
parallel by several processors, the components can be generated out
of order.  (This was confirmed by Torre.)

Question 2:
Will each component file contain components of the same events only?
Can we assume that as soon as one of the components files fills up
(say at 1 GB), all the component files will be closed (even if their
size is much smaller)?
Answer:
No.  Typically, a single file of small components will hold many more
components than a file of large components.  Furthermore, we cannot
assume that files will be roughly the same size (e.g., 1 GB).  File
sizes will vary depending on the components and the complexity of the
events.

Implications:
1.  For the QE: multiple indexes, one for each component type
2.  For the QM: coordination of file caching
3.  For the QO and QE: extended query language
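
As an illustration of implications 1 and 2, the following is a minimal
sketch in Python, not the actual QE or QM code.  It assumes one index
per component type that maps an event ID to the file holding that
component.  The index contents, file names, and function names are all
hypothetical.

```python
# Hypothetical per-component indexes: event ID -> file holding that component.
# Component files are not aligned: one "vertices" file covers events that are
# spread over several "raw" files, as described in the answers above.
COMPONENT_INDEX = {
    "tracks":   {1: "trk_001", 2: "trk_001", 3: "trk_002"},
    "vertices": {1: "vtx_001", 2: "vtx_001", 3: "vtx_001"},
    "raw":      {1: "raw_001", 2: "raw_002", 3: "raw_003"},
}


def files_per_event(event_ids, components, index=COMPONENT_INDEX):
    """Return {event_id: set of files} that must be in cache together
    before the event can be delivered with all requested components."""
    return {ev: {index[c][ev] for c in components} for ev in event_ids}


def files_to_cache(event_ids, components, index=COMPONENT_INDEX):
    """Return the sorted list of distinct files a query needs overall."""
    needed = set()
    for files in files_per_event(event_ids, components, index).values():
        needed |= files
    return sorted(needed)


if __name__ == "__main__":
    # Events 1 and 3 qualified; the query asks for tracks and raw data.
    print(files_per_event([1, 3], ["tracks", "raw"]))
    print(files_to_cache([1, 2, 3], ["tracks", "vertices"]))
```

Because the files holding one event's components have to be cached at
the same time, the per-event file sets are the unit that file caching
would need to coordinate on.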

Priority: 1


2) Add time estimate to the QE

It was pointed out that one of the first things that analysts will
want is a precise time estimate, in addition to the number of files
and number of events per query.

It is possible to provide time estimates if we assume the system is
available stand-alone.  The most basic estimate can be based on the
average access time per file, assuming that the system is continuously
available.  More sophisticated estimates can take into account
what's in the cache at the time of the inquiry, variable file
sizes, and query load on the system.
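
The two simplest estimates described above can be sketched as follows.
This is an illustrative Python sketch only.  The function names, the
90-second average access time, and the file names are assumptions made
for the example, not measured values or existing QE interfaces.

```python
def basic_time_estimate(num_files, avg_seconds_per_file):
    """Basic estimate: every file costs the average access time.

    Assumes the system is continuously available and ignores the cache,
    file sizes, and load from other queries.
    """
    return num_files * avg_seconds_per_file


def cache_aware_estimate(files, cached_files, avg_seconds_per_file):
    """Refinement: files already in the disk cache are treated as free."""
    uncached = [f for f in files if f not in cached_files]
    return len(uncached) * avg_seconds_per_file


if __name__ == "__main__":
    # A query that needs 12 files at an assumed 90 s per file.
    print(basic_time_estimate(12, 90.0))  # 1080.0 seconds
    # Three files needed, one of them already cached.
    print(cache_aware_estimate(["a", "b", "c"], {"b"}, 90.0))  # 180.0 seconds
```

Variable file sizes and query load are left out of this sketch.  They
would replace the single average with a per-file or load-dependent
cost.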

The suggestion is to have the basic time estimate for MDC2, so we can
run some tests and see how close we are to the real system behavior.

Priority: 2


3)  Dynamic indexing

In the production system, events will be added continuously.  As new
files are generated, there will be a process to update the TagDB.