[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

revised component design doc



For those who have seen this before, this is a revised version.  

I added an explanation in the intro about event components, and the
requirement to have them together in cache.  I also fixed the
notation and some typos.  

I made some corrections on what is returned by estimates, following
Henrik's comments.  I also modified the policy on when to cache a
bundle a bit.  It says now that if there is a tie on which bundle to
cache, the largest bundle that fits in cache will be cached.  This is
to correct the problem we observed in the MDC1 runs.

Otherwise, it is the same as before.

Arie.

------------------------------------------------------------------
Supporting Events Data Split into Multiple Event-Components

This document is the result of several discussions between people who
are developing the Storage Manager.  We consulted others as well.  It
is intended as a design document for the next version of the Storage
Manager.

We describe here in some detail the design decisions we made for the
next version of the Storage Manager (version 2.0) to support events
split into multiple event-components.  Briefly, this extension comes
from the need to store various aspects of event data separately.  The
data that is accessed most frequently is stored permanently on disk.
Other components of events (such as "tracks", "hits" and "vertices")
will be stored in separate files on tape.  Thus, an event will have
each of its components stored in separate files.  Each "component
file" will contain only data for its type: a "tracks" file will
contains only the tracks components (usually, for many events), a
'hits' files will contain only hits components, etc.

Generally, component files are not coordinated in their event
components.  For example, a "tracks" file may contain tracks
components for 100 events, while a related "vertices" file that has
vertices components for 50 of these events, may have 500 additional
components of other events. 

Queries can specify any combination of components (see below). A
requirement of the system is that processing an event requires all its
components, as specified in the query, to be in cache at the same
time.  The task of the Storage Manager is to coordinate the caching of
files in a way that satisfies this requirement and optimizes
performance. 

In addition, Version 2.0  will be extended to include file_size
information.  It will be used to get more precise query estimations,
as well as improved caching policies.

1.  File Bundles

Before we proceed we need to introduce the concept of a "file bundle"
that is used to manage multiple event-components.  A "file bundle"
refers to a set of files, one for each component, that needs to be in
cache AT THE SAME TIME to process events whose components are in these
files.

We use the following notation:
The set of events: E1, E2, ..., Ei, ..., En;
The set of components: C1, C2, ..., Cj, ..., Cm;
The set of files: F1, ..., Fk, ..., Fp;

Thus, if an event-component for Ei, component Cj is stored in file Fk,
then we refer to it as EC(i,j,k).

For example, suppose that a query requested components C3 and C7.
Suppose that files F13 , F206 are 2 (of many) files for components C3
and C7, respectively.  Assume that F13 and F206 contain
event-components in common for events {E7, E36, E102}, and no other.
Then the following must exist: EC(7,3,13), EC(7,7,206), EC(36,3,13),
EC(36,7,206), EC(102,3,13), EC(102,7,206).  

We say that the ordered set {F13, F206} is a "file bundle" for the
ordered set of components {C3, C7}, and {E7, E36, E102} is the set of
events associated with this file bundle.  We use the following concise
notation for this file bundle and its associated event set:
C13, F206: {E7, E36, E102}

A query can have a large number of file bundles, where some files can
be in more than one file bundle. 

We generalize the notation for "an ordered set of file bundles and
their associated event set" as follows:

Given that the query has q components: 
C1, ..., C(j), ..., Cq
Then each file bundle x has q files: 
Fx(1), ..., Fx(j), ..., Fx(q).
If the file bundle has r events associated with it:
Ex(1), ..., Ex(i), ..., Ex(r),
Then, for each Ex(i), there must exist an event component in each of
the q files, EC(i,j,Fx(j)) for
1<= i <=r, 1<= j <=q.

The following is the notation used for "an ordered set of file bundles
and their associated event set".

{Fx(1), Fx(2), ..., Fx(q): {Ex(1), Ex(2), ..., Ex(r)},
where x ranges over all file bundles for this query.

To simplify, we drop the bundle index x, and write:
  
{F1, F2, ..., Fq: {E1, E2, ..., Er}}, where
q = the number of components requested in the query, and
r = the number of events in common in the q component files.

Note that a special case of that for a single component is what we had
the previous version: {F: {E1, E2, ..., Er}}.  

A detailed example of a set of file bundles for a query is given in
Appendix 1.


2. The query interface language

The query language will be extended to support the following
psedo-query format:

SELECT (component_name, component_name, ...)
FROM dataset_name
WHERE (predicate_conditions_on_properties)

Where:
'Component_name"  is a string representing the name of a component of
the "dataset_name", and "predicate_conditions_on_properties" is the
same predicate structure on properties as we had previously.  

Example:
SELECT tracks, hits
FROM Run17
WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3

Note:
If dataset_name is "null", a default dataset is assumed.  Actually,
currently we assume that there is a separate Storage Manager for each
dataset, and a dataset_name parameter is not supported.

1. Extensions to the Query Estimator (QE) 

The QE index will be extended to include file information for all
components.  Specifically, for each event component the index will
include the file_oid.  A separate index will contain the file sizes of
all the files.  The following extensions will be made to the QE
functions:

2a)  The "quick query estimation" will return (min, max) figures for
the "number of events"and the "number of files" (for all components
requested).

2b)  The "full query estimate" returns precise figures for the above,
as well as an event_oid set.  In addition, it returns the "total MBs
of all the files" and "time estimate" (assuming the query ran stand
alone, all files in cache for that query stay in cache, and no file
gets cached more than once).  This "time estimate" may not be
available for MDC2.

2c)  Upon "execute" request, the QE will use its indexes to generate
and pass to the Query Monitor (QM) a set of file bundles and their
associated events:
{F1, F2, ..., Fq: {E1, E2, ..., Er}}.


3. Extensions to the Query Monitor (QM)

The QM will be extended to support file bundles, and caching policies
to support their coordination in cache.  This includes:

3a)  Each query in the query queue will now have a set of file bundles
associated with it.

3b)  Each file bundle will have a score value associated with it. The
scheduling of a file bundle to be brought to cache will be based on
the score values.  The score values are part of the caching policy.

3c)  The caching policy will have 5 elements.  Each element can have
several options.  The combination of such options can provide a wide
range of policy options.   The explanation of each caching policy
element is given in Appendix 2, along with the default policy we plan
to implement for MDC2.

3d)  Depending on what's in cache, a request for caching a file bundle
may be partial, i.e. a request to cache the files in the bundle that
are not in cache.  The files in the cache will be marked as "in use"
until the rest of the files requested are cached.

3e)  Caching requests to the Cache Manager for file bundles (or
partial file bundles) are integral: either all files can be cached or
none are cached.

3f)  After a file bundle is cached the list of event-ids for that
bundle is passed to the requesting Event_Iterator (EI).  Currently, we
do not see the need to pass to the EI the list of file, but this can
be done if there is some advantage to it.


4) Extensions to the Cache Manager (CM)

The CM will be extended to handle caching of file bundles.  This
includes:

4a)  Accept a request for a (partial) file bundle caching.  Note that
the CM does not need to distinguish between partial or non-partial
requests.  

4b)  The CM will find out if enough cache exist for the entire caching
request.  If so, it pre-allocated the cache space (to avoid
deadlocks), and schedules all PFTPs.

4c)  The CM informs the QM as soon as each file is cached.  Thus, the
QM can take advantage of that for other queries even if the entire
bundle is not cached yet.

4d)  The CM will cache files into the directory path that Objectivity
maintains for each file.  This will be determined dynamically for each
files cached.  This includes the "dummy directories" for files not
stored in Objectivity, such as raw data files.



APPENDIX 1

Example:  
Suppose that for a given query the following event list qualifies: 11,
12, 13, 21, 22, 23, 24.  For each event and component there is another
OID and the file_ID for it.  For simplicity, we label them according
to the components follows:

For example: for event 11
its event_oid = e11,
its track_oid = t11, and
its hits_Oid  = h11.

So, suppose that for the example above, we have IDs and files as
follows:

Event_oid  tracks_oid  tracks_fid  hits_oid  hits_fid

   e11        t11        F1         h11        F2
   e12        t12        F1         h12        F2
   e13        t13        F1         h13        F3
   e21        t21        F4         h21        F5
   e22        t22        F4         h22        F6
   e23        t23        F4         h23        F6
   e24        t24        F4         h24        F6

Note that:
F1 and F2 have 2 events in common,
F1 and F3 have 1 events in common,
F4 and F5 have 1 events in common,
F4 and F6 have 3 events in common.

Thus, the set of file bundles and their associated event_ids will be:

{F1, F2: {e11, e12}],
 F1, F3: {e13},
 F4, F5: {e21},
 F4, F6: {e22, e23, e24}}

Note that:
1. All file bundles have the same number of files, one for each
component requested in the query.
2. We assume that all the event component IDs (i.e. t11, t12,  ...,
and h11, h12, ..., for the above query) are stored in the event
header, and will be picked up by the user code from there.  Thus, only
the event IDs need to be passed to the Event Iterator (EI).


APPENDIX 2

Caching Policy Elements

1. File scoring policy

File scores can be assigned on the basis of various criteria, such as
the number of events per file that qualify for a query, the number of
queries in the query queue that need that file, or combinations of
these.

The default method (the one to be implemented first) is:
File score = SUM (all bundles for each query that the file appears in)
over all pending queries.  

For example, if there are 2 queries in the queue, and file Fk appears
in 5 bundles for query 1 and in 3 bundles for query 2, then score (Fk)
= 8.

The file scores are dynamically decremented after each file bundle is
processed (i.e. released by the EI).  The file scores are dynamically
incremented for each new query request arriving to the QM.

The default method to determine the "score of a file bundle" is simply
the sum of the weights of all the files in the bundle.  Thus, for a
file bundle x, FBx, that has k files, we have:
Score (FBx) = SUM (score (Fk)), where k ranges over all files in FBx.

2. Query service policy

The QM keeps a queue of all queries according to their arrival time.
The order of servicing queries can use various policies, such as Round
Robin (RR) or Shortest Query First (SQF).  Care also needs to be
provided that queries are not starved perpetually. For example, a
query may be skipped if not enough space is found in the cache for any
of the query bundles.  In principle, query service policies can be
tuned to types of users or types of queries based on priority
assignment, but we will not consider these options at first.

Another service policy consideration may be set according to a user's
priority.  We do not plan to include such policies at this time.

The default policy is RR.  Service for a query is skipped if the query
has all the bundles it requested satisfied (subject to pre-fetching
limits - see below) and it is still processing them.

An additional provision is a counter per query for each time it cannot
be serviced. If so, the counter is incremented.  When that counter
reaches a certain value (no_service_parameter) no other query is
serviced until it is possible to service the current query.  The
no_service_parameter can be dynamically set.

3. Bundle caching policy

This policy determines which bundle to cache when it is a query's turn
to be serviced.  This is based on the "file bundle score".

The default policy is to cache the file bundle with the most files in
cache.  In case of a tie, the bundle with the highest score is
selected.  In case of a tie on the score, the largest bundle in
attempted first.  In case that there is no space in cache for the
selected bundle, the next eligible bundle that will fit the cache is
selected.

In addition, the default policy will pass a bundle to any query (out
of RR order) that has all the files for any of its bundles in cache.
This is also subject to the pre-fetching limit.

4. File purging policy

File purging policy is the policy that determines what to release from
cache when a bundle is released and cache space is needed.  No file
purging occurs until space is needed by some query.

Rather than considering purging all files in a bundle to be purged,
this policy is based on one file at a time.  This is to insure that if
a file is needed by more than one bundle, its purging is deferred as
long as possible.  The policy is based on the "file score" of the
files in cache.

The default policy is simply to purge the file with the smallest
score.  In case of a tie, the largest file will be purged (an
alternative policy for a tie is to choose the file longest in the
cache since it was released, but this will not be implemented at
first).

5. Pre-fetching policy

This policy simply states how many bundles will be pre-fetched for a
query.  For example, if this is set to 2, then each query can have
only 2 bundles at any one time in the cache.  If a high level of
parallel processing is anticipated, this parameter should be high.

In principle, a pre-fetching number can be associated with each type
of user or even with each query.  For example, users can be assign a
priority level, and their pre-fetching parameter can be set
accordingly.  However, we will not consider such policies at this
time.

The default policy will be set to "infinity".  But, this parameter can
be set dynamically.  This parameter will be assigned to all queries.

-----------------------------------------------------------------------