
Multi-component design document



Enclosed is a fairly long (and hopefully complete) document that
describes in some detail the design of what we plan to implement 
in the Storage Manager in order to support multiple event components.
This document is the result of several discussions between people who
are developing the Storage Manager.  We consulted others as well.  It
is intended as a design document for the next version of the Storage
Manager.

I urge people interested in this (and especially people affected
by our decisions) to read it carefully and give us feedback as soon
as possible, so that your comments can still influence the
implementation.

Thanks, Arie.
---------------------------------------------------------------------

Supporting Multiple Event Components

We describe here in some detail the design decisions we made for the
next version of the Storage Manager (version 2.0) to support multiple
event components.  This includes the extensions to each of the Storage
Manager's modules as well as new caching policies to handle multiple
event components.

In addition, Version 2.0 will be extended to include file_size
information.  This will be used to produce more precise query
estimates, as well as to support improved caching policies.

1.  File Bundles

Before we proceed we need to introduce the concept of a "file bundle"
that is used to manage multiple event components.  A "file bundle"
refers to the ordered set of files, one for each component, that
needs to be in cache AT THE SAME TIME to process events whose
components are in these files.

For example, suppose that a query requested components C1 and C2.
Suppose that files F13 and F206 are two (of many) files for components
C1 and C2, respectively, and that they contain the components of the
events {E7, E36, E102} in common.  Then, we refer to the ordered set
(F13, F206) as a "file bundle" and to the set {E7, E36, E102} as the
set of events associated with that file bundle for that query.  A
query can have a large number of file bundles, and some files can
appear in more than one file bundle.

For a given query, the following is a concise representation of "a set
of file bundles and their associated event set":

{(F1, F2, ... Fi){E1, E2, ... Ej}}, where
i = the number of components requested in the query, and
j = the number of events in common in the i component files.

Note that a special case of this for a single component is what we had
in the previous version: {Fi {E1, E2, ... Ej}}.  A detailed example
is given in Appendix 1.
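
As an illustration only (the structure below is hypothetical, not part
of the Storage Manager interface), such a set of file bundles could be
sketched in Python as a list of pairs, each pairing an ordered tuple of
file ids with the set of associated event ids:

  # Illustrative sketch only; not the actual Storage Manager interface.
  # One file bundle pairs an ordered tuple of file ids (one per requested
  # component) with the set of event ids common to those files.
  bundle_set = [
      (("F13", "F206"), {"E7", "E36", "E102"}),
      # ... one entry per file bundle of the query
  ]

  # The single-component case of the previous version reduces to a
  # one-file tuple per bundle:
  single_component_bundles = [
      (("F13",), {"E7", "E36", "E102"}),
  ]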


2. The query interface language

The query language will be extended to support the following
pseudo-query format:

SELECT (component_name, component_name, ...)
FROM dataset_name
WHERE (predicate_conditions_on_properties)

Where:
"component_name" is a string representing the name of a component of
the "dataset_name", and "predicate_conditions_on_properties" is the
same predicate structure on properties as we had previously.  If
dataset_name is "null", a default dataset is assumed.

Example:
SELECT tracks, hits
FROM Run17
WHERE glb_trk_tot>0 & glb_trk_tot<10 & n_vert_total<3

3. Extensions to the Query Estimator (QE)

The QE index will be extended to include file information for all
components.  Specifically, for each event component the index will
include the file_oid.  A separate index will contain the file sizes of
all the files.  The following extensions will be made to the QE
functions:

2a)  The "quick query estimation" will return (min, max) figures for
"number of events", "number of files" (for all components requested),
"total MBs of the files" and "time estimate" (assuming the query ran
stand alone, all files in cache for that query stay in cache, and no
file gets cached more than once).  This "time estimate" may not be
available for MDC2.

2b)  The "full query estimate" returns precise figures for the above,
as well as an event_oid set.

3c)  Upon "execute" request, the QE will use its indexes to generate
and pass to the Query Monitor (QM) a set of file bundles and their
associated events:
{(F1, F2, ... Fi){E1, E2, ... Ej}}.
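
The following is a rough Python sketch of the information produced at
each of the three steps above; the names and types are hypothetical and
are given only to make the intent concrete:

  # Hypothetical sketch of the QE results described in 3a-3c above;
  # the names and types are illustrative, not the actual interface.
  from dataclasses import dataclass
  from typing import List, Set, Tuple

  @dataclass
  class QuickEstimate:                    # 3a: (min, max) figures
      num_events: Tuple[int, int]
      num_files: Tuple[int, int]          # over all requested components
      total_mb: Tuple[float, float]       # from the file_size index
      time_estimate: Tuple[float, float]  # may not be available for MDC2

  @dataclass
  class FullEstimate:                     # 3b: precise figures
      num_events: int
      num_files: int
      total_mb: float
      time_estimate: float
      event_oids: Set[str]

  # 3c: on "execute", the QE passes the set of file bundles and their
  # associated events to the QM, e.g. as a list of (files, events) pairs.
  BundleSet = List[Tuple[Tuple[str, ...], Set[str]]]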


4. Extensions to the Query Monitor (QM)

The QM will be extended to support file bundles, and caching policies
that coordinate them in the cache.  This includes:

4a)  Each query in the query queue will now have a set of file bundles
associated with it.

4b)  Each file bundle will have a weight value associated with it.
The scheduling of a file bundle to be brought into cache will be based
on these weight values.  The weight values are part of the caching
policy.

4c)  The caching policy will have 5 elements.  Each element can have
several options, and the combination of those options provides a wide
range of possible policies.  The explanation of each caching policy
element is given in Appendix 2, along with the default policy we plan
to implement for MDC2.

4d)  Depending on what's in cache, a request for caching a file bundle
may be partial, i.e. a request to cache only the files in the bundle
that are not yet in cache.  The files already in the cache will be
marked as "in use" until the rest of the requested files are cached
(see the sketch at the end of this section).

4e)  Caching requests to the Cache Manager for file bundles (or
partial file bundles) are integral: either all files can be cached or
none are cached.

4f)  After a file bundle is cached, the list of event-ids for that
bundle is passed to the requesting Event_Iterator (EI).  Although we
do not see the need to pass the list of files to the EI, this can be
done if there is some advantage to it.
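
As a sketch of item 4d above (the names are hypothetical and this is
not the actual QM code), a partial caching request for a bundle could
be formed as follows:

  # Hypothetical sketch of forming a partial bundle caching request (4d).
  def partial_bundle_request(bundle_files, cache_contents):
      """Split a bundle into files already cached (marked 'in use') and
      files that still have to be requested from the Cache Manager."""
      to_request = [f for f in bundle_files if f not in cache_contents]
      for f in bundle_files:
          if f in cache_contents:
              # Pin files already in cache so they are not purged while
              # the rest of the bundle is being transferred.
              cache_contents[f]["in_use"] = True
      return to_request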


5. Extensions to the Cache Manager (CM)

The CM will be extended to handle caching of file bundles.  This
includes:

5a)  Accept a request to cache a (partial) file bundle.  Note that
the CM does not need to distinguish between partial and non-partial
requests.

5b)  The CM will find out if enough cache space exists for the entire
caching request.  If so, it pre-allocates the cache space (to avoid
deadlocks), and schedules all PFTPs (see the sketch at the end of this
section).

5c)  The CM informs the QM as soon as each file is cached.  Thus, the
QM can take advantage of that for other queries even if the entire
bundle is not cached yet.

5d)  The CM will cache files into the directory path that Objectivity
maintains for each file.  This will be determined dynamically for each
file cached.  This includes the "dummy directories" for files not
stored in Objectivity, such as raw data files.
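
The following is a minimal sketch of the space check and pre-allocation
of item 5b, assuming the CM knows each file's size and the free cache
space (the names are hypothetical):

  # Hypothetical sketch of cache pre-allocation (5b); illustrative only.
  def try_cache_bundle(files, file_size_mb, free_mb, schedule_pftp):
      """Reserve space for all files of a (partial) bundle request, or
      refuse the whole request.  Pre-allocating up front avoids deadlocks
      between requests that would otherwise hold partial allocations."""
      needed = sum(file_size_mb[f] for f in files)
      if needed > free_mb:
          return None        # nothing is cached; the QM may try another bundle
      free_mb -= needed      # pre-allocate the cache space
      for f in files:
          schedule_pftp(f)   # schedule the file transfer (PFTP)
      return free_mb         # free space remaining after the reservation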



APPENDIX 1

Example:  
Suppose that for a given query the following event list qualifies: 11,
12, 13, 21, 22, 23, 24.  For each event and component there is an OID
and a file_ID for it.  For simplicity, we label them according to the
component as follows:

For example, for event 11:
its event_oid = e11,
its tracks_oid = t11, and
its hits_oid  = h11.

So, suppose that for the example above, we have IDs and files as
follows:

Event_oid  tracks_oid  tracks_fid  hits_oid  hits_fid

   e11        t11        F1         h11        F2
   e12        t12        F1         h12        F2
   e13        t13        F1         h13        F3
   e21        t21        F4         h21        F5
   e22        t22        F4         h22        F6
   e23        t23        F4         h23        F6
   e24        t24        F4         h24        F6

Note that:
F1 and F2 have 2 events in common,
F1 and F3 have 1 event in common,
F4 and F5 have 1 event in common,
F4 and F6 have 3 events in common.

Thus, the set of file bundles and their associated event_ids will be:

{[F1, F2 {e11, e12}],
 [F1, F3 {e13}],
 [F4, F5 {e21}],
 [F4, F6 {e22, e23, e24}]}

Note that:
1. All file bundles have the same number of files, one for each
component requested in the query.
2. We assume that all the event component IDs (i.e. t11, t12, ...,
and h11, h12, ... for the above query) are stored in the event
header, and will be picked up by the user code from there.
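
The bundle set above can be reproduced by grouping the qualifying
events on their pair of component file ids, as in the following sketch
(illustrative only, not the QE implementation):

  # Sketch: derive the file bundles of this example by grouping events
  # on their (tracks_fid, hits_fid) pair.
  from collections import defaultdict

  rows = [  # (event_oid, tracks_fid, hits_fid) from the table above
      ("e11", "F1", "F2"), ("e12", "F1", "F2"), ("e13", "F1", "F3"),
      ("e21", "F4", "F5"), ("e22", "F4", "F6"), ("e23", "F4", "F6"),
      ("e24", "F4", "F6"),
  ]

  bundles = defaultdict(set)
  for event_oid, tracks_fid, hits_fid in rows:
      bundles[(tracks_fid, hits_fid)].add(event_oid)

  # bundles == {("F1","F2"): {"e11","e12"}, ("F1","F3"): {"e13"},
  #             ("F4","F5"): {"e21"},       ("F4","F6"): {"e22","e23","e24"}}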


APPENDIX 2

Caching Policy Elements

1. File weight policy

File weights can be assigned on the basis of various criteria, such as
the number of events per file that qualify for a query, the number of
queries in the query queue that need the file, or combinations of
these.

The default method (the one to be implemented first) is:
File weight = SUM, over all pending queries, of the number of that
query's bundles that contain the file.

For example, if there are 2 queries in the queue, and file Fi appears
in 5 bundles for query 1 and 3 bundles for query 2, then
Weight (Fi) = 8.

The file weights are dynamically decremented after each file bundle is
processed (i.e. released by the EI).  The file weights are dynamically
incremented as each new query request arrives at the QM.

The default method to determine the "weight of a file bundle" is
simply the sum of the weights of all the files in the bundle.
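
As a sketch (with hypothetical names, not the actual QM code), the
default weights could be computed as follows:

  # Hypothetical sketch of the default file and bundle weight policy.
  def file_weight(file_id, pending_queries):
      """Weight of a file = number of pending bundles that contain it,
      summed over all queries in the queue."""
      return sum(
          sum(1 for bundle_files, _events in query_bundles
              if file_id in bundle_files)
          for query_bundles in pending_queries
      )

  def bundle_weight(bundle_files, pending_queries):
      """Default bundle weight: the sum of the weights of its files."""
      return sum(file_weight(f, pending_queries) for f in bundle_files)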

2. Query service policy

The QM keeps a queue of all queries according to their arrival time.
The order of servicing queries can use various policies, such as Round
Robin (RR) or Shortest Query First (SQF).  Care must also be taken
that queries are not starved perpetually.  For example, a query may be
skipped if not enough space is found in the cache for any of the
query's bundles.  In principle, query service policies can be tuned to
types of users or types of queries based on priority assignment, but
we will not consider these options at first.

The default policy is RR.  Service for a query is skipped if the query
has all the bundles it requested satisfied (subject to pre-fetching
limits - see below) and it is still processing them.

An additional provision is a per-query counter that is incremented
each time the query cannot be serviced.  When that counter reaches a
certain value (no_service_parameter), no other query is serviced until
it is possible to service the current query.  The
no_service_parameter can be set dynamically.
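
A minimal sketch of this default policy follows (the names are
hypothetical, not the actual QM scheduler):

  # Hypothetical sketch of round-robin service with a no-service counter.
  def next_query_to_service(queue, no_service_parameter,
                            needs_service, can_service):
      """Walk the queue in round-robin order (the queue is assumed to be
      rotated after each service).  Queries still busy with the bundles
      they already hold are skipped; a query that needs service but
      cannot get it has its counter incremented, and once the counter
      reaches no_service_parameter no later query is serviced."""
      for query in queue:
          if not needs_service(query):
              continue              # still processing its cached bundles
          if can_service(query):
              query.no_service_count = 0
              return query
          query.no_service_count += 1
          if query.no_service_count >= no_service_parameter:
              return None           # block until this query can be serviced
      return None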

3. Bundle caching policy

This policy determines which bundle to cache when it is a query's turn
to be serviced.  This is based on the "file bundle weight".

The default policy is to cache the file bundle with the most files
already in cache.  In case of a tie, the bundle with the highest
weight is selected.  If there is no space in the cache for the
selected bundle, the next eligible bundle that fits in the cache is
selected.

In addition, the default policy will pass a bundle to any query (out
of RR order) that has all the files for any of its bundles in cache.
This is also subject to the pre-fetching limit.
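
A sketch of the default bundle selection (hypothetical names,
illustrative only):

  # Hypothetical sketch of the default bundle caching policy.
  def select_bundle(bundles, cached_files, free_mb, file_size_mb, weight):
      """Prefer the bundle with the most files already in cache, break
      ties by bundle weight, and skip bundles whose missing files do not
      fit in the free cache space."""
      def cached_count(b):
          return sum(1 for f in b if f in cached_files)
      def missing_mb(b):
          return sum(file_size_mb[f] for f in b if f not in cached_files)
      for b in sorted(bundles, key=lambda b: (cached_count(b), weight(b)),
                      reverse=True):
          if missing_mb(b) <= free_mb:
              return b
      return None  # no eligible bundle fits; the query is skipped this round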

4. File purging policy

The file purging policy determines what to release from the cache
when a bundle is released and cache space is needed.  No file purging
occurs until space is needed by some query.

Rather than purging all the files of a released bundle together, this
policy considers one file at a time.  This is to ensure that if a file
is needed by more than one bundle, its purging is deferred as long as
possible.  The policy is based on the "file weight" of the files in
cache.

The default policy is simply to purge the file with the smallest
weight.  In case of a tie, the largest file will be purged (an
alternative policy for ties is to choose the file that has been in the
cache longest since it was released, but this will not be implemented
at first).
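
A sketch of the default purge choice (hypothetical names, illustrative
only):

  # Hypothetical sketch of the default file purging policy.
  def choose_file_to_purge(released_files, weight, size_mb):
      """Purge one file at a time: the released file with the smallest
      weight; among files of equal weight, the largest one."""
      return min(released_files, key=lambda f: (weight(f), -size_mb(f)))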

5. Pre-fetching policy

This policy simply states how many bundles will be pre-fetched for a
query.  For example, if it is set to 2, then each query can have
only 2 bundles in cache at any one time.  If a high level of parallel
processing is anticipated, this parameter should be set high.

In principle, a pre-fetching number can be associated with each type
of query, or even with each query.  For example, users can be assigned
priority levels, and their pre-fetching parameters can be set
accordingly.  However, we will not consider such policies at first.

The default will be set to "infinity", but this parameter can be
changed dynamically.  The same value will be assigned to all queries.