
One more task on the to-do list



There is one more task that I did not put on the to-do list.
Enclosed is the updated list, with the additional task (no. 4)
added.  It mainly affects Henrik, but Dave Zimmerman's help is
needed, too.

Arie.
-----------------------------------------------------------------


1.  Cache status initialization.  

When the Storage Manager is initialized, the QM will initialize its
cache status, and the CM will initialize its available cache status.

To do that, a new method will be added between the QM and the CM,
asking "what's in the cache".  The response is a list of FIDs.

The CM will check what's in the cache, get the names and sizes, and
go to Objectivity to obtain the FIDs.  It will then update its
"available cache", and return the list of FIDs to the QM.

People involved: Luis, Alex.
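As a rough illustration of item 1, here is a Python sketch of the CM side of the "what's in the cache" request. The class name, the `whats_in_cache` method, and the `catalog` dict (a stand-in for the Objectivity lookup from file name to FID) are hypothetical and are not the actual CM interface.

```python
from dataclasses import dataclass, field


@dataclass
class CacheManager:
    """Hypothetical sketch of the CM side of cache status initialization."""
    cache_capacity: int        # total cache size in bytes (assumed known)
    catalog: dict[str, int]    # stand-in for the Objectivity name -> FID lookup
    available_cache: int = 0
    cached_fids: list[int] = field(default_factory=list)

    def whats_in_cache(self, cache_listing: dict[str, int]) -> list[int]:
        """Answer the QM's "what's in the cache" request.

        cache_listing maps each file name found in the cache to its size.
        """
        used = sum(cache_listing.values())
        # Resolve each cached file name to its FID (Objectivity in the real system).
        fids = [self.catalog[name] for name in cache_listing]
        # Update the "available cache", then hand the FID list back to the QM.
        self.available_cache = self.cache_capacity - used
        self.cached_fids = fids
        return fids


cm = CacheManager(cache_capacity=1000, catalog={"run1.db": 11, "run2.db": 12})
print(cm.whats_in_cache({"run1.db": 300, "run2.db": 200}))  # [11, 12]
print(cm.available_cache)                                    # 500
```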


2.  Get "what's in cache for this query" for estimation.

The QO can at any time ask for a "query estimation".  At that time, the
QE will request "what's in cache for this query" from the QM.  The QE
passes the FID list and a query token.

The QM returns 2 lists:
1) list of files in cache, and 2) list of files to be cached.
If the query is not in execute status (i.e., it was not launched yet),
the above lists are "what's in cache" and "the rest".  If the query was
launched, all the files that were "done" are first removed, and then
the above lists are "what's in cache" and "the rest".
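The split described above can be sketched as a pure function. The name `whats_in_cache_for_query` and its parameters are hypothetical. FIDs are modeled as plain integers, and `done_fids` stands for the files the QM has already marked "done" for a launched query.

```python
def whats_in_cache_for_query(query_fids, cached_fids, done_fids=frozenset(), launched=False):
    """Return (in_cache, to_cache) for one query's FID list.

    Hypothetical sketch of the QM's reply to the QE; both lists keep
    the order of query_fids.
    """
    # "Done" files are only removed once the query has been launched.
    done = set(done_fids) if launched else set()
    cached = set(cached_fids)
    remaining = [fid for fid in query_fids if fid not in done]
    in_cache = [fid for fid in remaining if fid in cached]   # "what's in cache"
    to_cache = [fid for fid in remaining if fid not in cached]  # "the rest"
    return in_cache, to_cache


# Files 2 and 4 are cached; file 2 is already done for the launched query.
print(whats_in_cache_for_query([1, 2, 3, 4], {2, 4}))                                # ([2, 4], [1, 3])
print(whats_in_cache_for_query([1, 2, 3, 4], {2, 4}, done_fids={2}, launched=True))  # ([4], [1, 3])
```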

What's needed: 
1) One more estimation variable: "what's in cache".
2) One more method from the QE to the QM, and an associated variable.

People involved: Henrik, Alex.


3.  Add estimation variables

Two variables were recommended: 
1) "time remaining to process query", and
2) "clustering efficiency" of the query.

For 1), the estimate covers moving data to cache only (no processing
or Objectivity retrieval included).  For MDC1, we assume a constant
time to cache a file.  This estimate assumes no competition from other
queries, and that all files are cached sequentially (no parallel
caching from HPSS), so it is a "best case estimate without parallel
caching".  All that will be calculated is: (number of files to be
cached) x (time to cache a file).
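The estimate for 1) reduces to a single product. The function below is a hypothetical sketch. The 90-second constant in the example is a made-up placeholder, since no MDC1 value is given here.

```python
def time_remaining_seconds(files_to_cache: int, seconds_per_file: float) -> float:
    """Best-case time to move a query's files into cache.

    Assumes no competing queries and strictly sequential caching from HPSS,
    so the estimate is (number of files to be cached) x (time per file).
    """
    if files_to_cache < 0 or seconds_per_file < 0:
        raise ValueError("inputs must be non-negative")
    return files_to_cache * seconds_per_file


# Hypothetical constant for MDC1; the real value is not specified.
SECONDS_PER_FILE = 90.0
print(time_remaining_seconds(12, SECONDS_PER_FILE))  # 1080.0
```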

For 2) the estimate gives the ratio of "number of events that qualified
for this query" to "number of events that have to be moved to cache"
(regardless of what's in the cache).  For this purpose the QE will
have to maintain "total number of events" per file.
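A sketch of the ratio for 2) follows. It assumes that any file holding at least one qualifying event must be moved in full, so the denominator sums the per-file "total number of events" that the QE maintains. The function name and the dict-based inputs are hypothetical.

```python
def clustering_efficiency(qualifying_per_file: dict[int, int],
                          total_per_file: dict[int, int]) -> float:
    """Ratio of qualifying events to events that must be moved to cache.

    Assumption: every file containing a qualifying event is moved whole,
    regardless of what is already in the cache.
    """
    touched = [fid for fid, n in qualifying_per_file.items() if n > 0]
    moved = sum(total_per_file[fid] for fid in touched)
    qualified = sum(qualifying_per_file[fid] for fid in touched)
    return qualified / moved if moved else 0.0


# 60 qualifying events spread over files holding 1100 events in total.
print(round(clustering_efficiency({1: 50, 2: 10}, {1: 100, 2: 1000, 3: 500}), 4))  # 0.0545
```

A value near 1.0 means the qualifying events are well clustered into a few files. A small value means most of the data moved to cache is not needed by the query.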

People involved: Henrik.


4.  Maintain separate file IDs in addition to event OIDs.

In MDC1, the event headers will be stored in cache, while other
objects associated with events will be stored on tape.  Thus, it is
not possible to extract the FIDs from the event OIDs.  The effect is
that the bit-sliced index and the tag index will now include OIDs and
FIDs.  These additional FIDs will be extracted from the tagDB.

This change only affects the QE, but the file extracted from the tagDB
will have to include the FIDs.
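As an illustration, here is a minimal sketch of an index that stores an explicit FID next to each event OID, so the FIDs for a query come straight from the index instead of being derived from the OIDs. `IndexEntry`, `fids_for_hits`, and the string OIDs are hypothetical placeholders, not the actual bit-sliced or tag index layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    oid: str   # event OID (placeholder format)
    fid: int   # FID taken from the tagDB, not derived from the OID


def fids_for_hits(entries: list[IndexEntry], hit_oids: list[str]) -> list[int]:
    """Return the sorted, de-duplicated FIDs needed by the events that qualified."""
    hits = set(hit_oids)
    return sorted({e.fid for e in entries if e.oid in hits})


index = [IndexEntry("e1", 7), IndexEntry("e2", 7), IndexEntry("e3", 9)]
print(fids_for_hits(index, ["e1", "e3"]))  # [7, 9]
```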

Note: in the future, there will be multiple files associated with each
event (e.g., one for tracks, one for vertices, etc.).

People involved: Henrik and Dave Zimmerman