Components of the Storage Manager and interfaces to it

Doug Olson, Craig Tull, and Arie Shoshani
updated 14 Oct 1997

 

The Storage Manager (SM) has three components: the Query Estimator (QE), the Cache Manager (CM), and the Query Monitor (QM).

The Query Estimator responds to a request to estimate a query. It returns an estimate of the number of events that qualify for the query, the number of cells (files) that will have to be read, and the time required assuming that none of the cells are cached. The Cache Manager determines which cells to move to the cache and which to remove from it. The Query Monitor executes a query and monitors its progress. It keeps track of which cells have been cached for each query and which still have to be cached.

Mode of operation: Entire cells are cached (not only the events that qualify in each cell). Thus, cached cells can be shared by multiple analysis programs.

Discussion: We envision the interaction between the Analysis Framework and the Storage Manager to proceed as follows. First, a Query Estimation request is sent to the Query Estimator, and a response with the estimate is returned. Additional Query Estimation requests may follow until a decision is made to execute a query. When a Query Execution request is made to the Query Monitor, it creates a Cache Map entry for the request and decides which cells to cache. The Query Monitor uses a caching policy that can be tuned (see below). It then asks the Cache Manager to cache the cells. The Cache Manager tries to optimize execution of the request by maximizing the number of cells that are read from the same tape. When a cell is cached, the Cache Manager notifies two components of the Analysis Framework: it notifies the Object Manager which cell was cached and its location, and it notifies the Event Iterator which cell was cached and the offsets within the cell of the events that qualify for the query. When the Event Iterator is finished with a cell, it notifies the Query Monitor that it is done. The Query Monitor may then schedule another cell to be cached if necessary.
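The tape optimization mentioned above can be illustrated with a short sketch. This is not the Cache Manager's actual algorithm: the function name order_by_tape, the cell-to-tape mapping, and the rule of serving the tape with the most requested cells first are all assumptions made for illustration.

```python
from collections import defaultdict


def order_by_tape(cell_ids, cell_to_tape):
    """Return cell_ids reordered so that cells on the same tape are adjacent.

    Tapes holding the most requested cells are served first (an assumed
    heuristic); ties are broken by tape id so the order is deterministic.
    """
    by_tape = defaultdict(list)
    for cell_id in cell_ids:
        by_tape[cell_to_tape[cell_id]].append(cell_id)
    tapes = sorted(by_tape, key=lambda tape: (-len(by_tape[tape]), tape))
    return [cell_id for tape in tapes for cell_id in by_tape[tape]]


# Hypothetical catalog: five cells spread over two tapes.
catalog = {"c1": "T1", "c2": "T2", "c3": "T1", "c4": "T2", "c5": "T2"}
print(order_by_tape(["c1", "c2", "c3", "c4", "c5"], catalog))
# prints ['c2', 'c4', 'c5', 'c1', 'c3']
```

Reading the cells in this order needs one mount per tape instead of up to five.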

Handling interruptions: A time-out mechanism is employed so that unexpected interruptions of processes do not lead to ineffective use of the cache. For example, suppose that a process is interrupted or inadvertently discontinued. There is a danger that the cells cached on behalf of this process will never be removed. Therefore, the Query Monitor can send an inquiry to the Event Iterator asking whether it is alive and whether it is done with the cell. If there is no answer after a set time, the process is presumed dead, and all cells associated with it are removed (provided no other processes use them). To accommodate processes that have been intentionally interrupted and are not dead (for example, an analyst may suspend the program to evaluate partial results), there is a mechanism to ask that a cell be re-cached. This implies that the Query Monitor will maintain the cache map entries for a non-responsive process for a period of time (set by the system administrator).
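A minimal sketch of this time-out bookkeeping follows. The class name LivenessTracker, the two parameters timeout_s and grace_s, and the explicit now argument (used in place of a real clock so the example is reproducible) are assumptions, not part of the design above.

```python
class LivenessTracker:
    """Tracks when each process last answered a Query Monitor inquiry."""

    def __init__(self, timeout_s, grace_s):
        self.timeout_s = timeout_s  # silence after which a process is presumed dead
        self.grace_s = grace_s      # extra time its cache map entry is retained
        self.last_seen = {}         # process id -> time of its last answer
        self.cells = {}             # process id -> set of cell ids cached for it

    def register(self, pid, cell_ids, now):
        self.last_seen[pid] = now
        self.cells[pid] = set(cell_ids)

    def heard_from(self, pid, now):
        self.last_seen[pid] = now

    def sweep(self, now):
        """Return (removable_cells, dropped_entries) as of time now."""
        dead = {p for p, t in self.last_seen.items() if now - t > self.timeout_s}
        # Cells still used by a responsive process must stay in the cache.
        live_cells = set()
        for pid, cells in self.cells.items():
            if pid not in dead:
                live_cells |= cells
        removable = set()
        for pid in dead:
            removable |= self.cells[pid] - live_cells
        # Entries are kept for a grace period so a suspended process can
        # still ask for its cells to be re-cached.
        limit = self.timeout_s + self.grace_s
        dropped = {p for p in dead if now - self.last_seen[p] > limit}
        for pid in dropped:
            del self.last_seen[pid]
            del self.cells[pid]
        return removable, dropped


tracker = LivenessTracker(timeout_s=60, grace_s=600)
tracker.register("A", ["c1", "c2"], now=0)
tracker.register("B", ["c2", "c3"], now=0)
tracker.heard_from("B", now=50)
removable, dropped = tracker.sweep(now=100)
print(sorted(removable), sorted(dropped))
# prints ['c1'] []
```

Process A is presumed dead at time 100, but only c1 is removable because c2 is still used by B, and A's cache map entry is retained until the grace period ends.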

Logs: The entries in the cache maps will be recorded in a log so that the effectiveness of system usage can be analyzed. It is beyond the scope of the Storage Manager to perform such analysis, but the logs can be made available to external monitoring programs.
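As an illustration only, such a log could be written as one JSON record per line, which external programs can read without knowing anything about the Storage Manager. The record fields (time, query, cell, action) and both function names are hypothetical; the design above does not specify a log format.

```python
import io
import json


def log_cache_event(stream, timestamp, query_id, cell_id, action):
    """Append one cache map event to the log as a single JSON line."""
    record = {"time": timestamp, "query": query_id, "cell": cell_id, "action": action}
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def cells_cached_per_query(stream):
    """Example of what an external monitoring program might compute."""
    counts = {}
    for line in stream:
        record = json.loads(line)
        if record["action"] == "cached":
            counts[record["query"]] = counts.get(record["query"], 0) + 1
    return counts


log = io.StringIO()  # stands in for a log file
log_cache_event(log, 0.0, "q1", "c1", "cached")
log_cache_event(log, 1.5, "q1", "c2", "cached")
log_cache_event(log, 2.0, "q1", "c1", "removed")
log.seek(0)
print(cells_cached_per_query(log))
# prints {'q1': 2}
```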

Caching policy: The caching policy determines which cells to cache on behalf of a query request. When the cache is shared, it is not wise to cache all the requested cells right away. Instead, one can use a policy under which only n cells are cached at a time, and another cell is cached only after a process gives notification that it is finished with a cell. For example, if n=2, 2 cells are cached; when the first has been read, it can be purged if it is not needed by another process, and a third cell is moved to the cache. The caching policy can schedule the caching of cells dynamically according to shared access or other criteria (for example, a cell that is requested by several processes has priority over others). The caching policy can also include other provisions, such as pre-empting cached cells when more urgent priorities arise.
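The n-cell policy described above might be sketched as follows. The class and method names are hypothetical, and the sketch covers only the window of n cells and the purge decision; priorities and pre-emption are left out.

```python
from collections import deque


class WindowedCachingPolicy:
    """Keep at most n cells cached per query; admit the next one on release."""

    def __init__(self, n):
        self.n = n
        self.pending = {}   # query id -> deque of cells still to be cached
        self.in_cache = {}  # query id -> set of cells currently cached for it

    def start_query(self, query_id, cell_ids):
        """Register a query and return the first cells to cache for it."""
        self.pending[query_id] = deque(cell_ids)
        self.in_cache[query_id] = set()
        return self._fill(query_id)

    def _fill(self, query_id):
        to_cache = []
        while self.pending[query_id] and len(self.in_cache[query_id]) < self.n:
            cell = self.pending[query_id].popleft()
            self.in_cache[query_id].add(cell)
            to_cache.append(cell)
        return to_cache

    def release(self, query_id, cell_id):
        """The process is finished with cell_id. Return (purge, to_cache)."""
        self.in_cache[query_id].discard(cell_id)
        # A cell may be purged only if no query holds it or is waiting for it.
        still_needed = any(
            cell_id in cached or cell_id in self.pending[q]
            for q, cached in self.in_cache.items()
        )
        return (not still_needed), self._fill(query_id)


policy = WindowedCachingPolicy(n=2)
print(policy.start_query("q1", ["c1", "c2", "c3"]))  # prints ['c1', 'c2']
print(policy.start_query("q2", ["c1"]))              # prints ['c1']
print(policy.release("q1", "c1"))                    # prints (False, ['c3'])
print(policy.release("q2", "c1"))                    # prints (True, [])
```

In this run c1 is shared: it stays in the cache when q1 releases it because q2 still uses it, and it becomes purgeable only after q2 releases it as well.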

Message passing: The necessary communication messages between the Analysis Environment and the Storage Manager are described next. There are 7 such messages in total. They are shown schematically in the enclosed figure.

  1. A request from the Query Factory (or other interfaces) to estimate a query.

    -- information passed: a "range query", i.e. a conjunctive set of ranges over the event properties.

    -- response: no_of_events, no_of_cells, total_MBs_to_be_moved, %_of_events_in_cells (that qualify for the query), no_of_events_in_cache, time_to_process_query.

    There are 2 levels of response:

      1. Quick response: the ranges of the query are rounded to the nearest bin boundaries of the event properties. In this case no_of_events is given as a (min, max) pair, obtained by rounding the ranges both up and down to bin boundaries.
      2. Slower response: gives the answer for the exact ranges of the query when they do not fall on bin boundaries.

  2. A request from the Query Factory to execute a query.

    -- information passed: a "range query", i.e. a conjunctive set of ranges over the event properties.

    -- response: no_of_events, no_of_cells, total_MBs_to_be_moved, %_of_events_in_cells (as above), and in addition the list_of_cells.

  3. A notification from the Cache Manager to the Object Manager that a cell was cached or removed from the cache.

    -- information passed: the cell_ID, location_in_cache, cached/removed.

    -- response: acknowledgement

  4. A notification from the Cache Manager to the Event Iterator that a cell was cached.

    -- information passed: the cell_ID, a last_cell flag (which indicates whether this is the last cell cached in response to the query), and a vector of {0,1} to indicate the events that qualify (or a list of offset addresses for these events).

    -- response: acknowledgement

  5. A notification from the Event Iterator to the Query Monitor that it has finished reading a cell or is aborting.

    -- information passed: cell_ID, or "all" if aborting

    -- response: acknowledgement

  6. An inquiry from the Query Monitor to the Event Iterator asking whether it is done with a cell.
    (This is used to execute time-outs for non-responsive processes.)

    -- information passed: cell_ID

    -- response: yes/no

  7. A request from the Event Iterator to the Query Monitor to re-cache a cell.
    (This covers the case where the Query Monitor preempts or times out a cell and removes it from the cache.)

    -- information passed: cell_ID

    -- response: acknowledgement
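
The quick response to an estimation request (message 1) can be illustrated for a single event property. The sketch below assumes that the Query Estimator holds a histogram of event counts per bin and that ranges are half-open (lo <= x < hi); the function name quick_estimate, the data layout, and the sample numbers are all assumptions. A real range query is a conjunction over several properties, which this one-dimensional example does not cover.

```python
from bisect import bisect_left, bisect_right


def quick_estimate(edges, counts, lo, hi):
    """Return (min_events, max_events) for the range lo <= x < hi.

    edges  -- sorted bin boundaries; bin i covers edges[i] <= x < edges[i + 1]
    counts -- number of events in each bin (len(edges) - 1 values)
    """
    # Rounding inward: only bins that lie entirely inside the range.
    first_inner = bisect_left(edges, lo)        # index of first edge >= lo
    last_inner = bisect_right(edges, hi) - 1    # index of last edge <= hi
    if last_inner > first_inner:
        min_events = sum(counts[first_inner:last_inner])
    else:
        min_events = 0
    # Rounding outward: every bin that overlaps the range at all.
    first_outer = max(bisect_right(edges, lo) - 1, 0)         # last edge <= lo
    last_outer = min(bisect_left(edges, hi), len(edges) - 1)  # first edge >= hi
    max_events = sum(counts[first_outer:last_outer])
    return min_events, max_events


# Hypothetical histogram of one event property with four bins.
edges = [0, 10, 20, 30, 40]
counts = [5, 7, 3, 9]
print(quick_estimate(edges, counts, 12, 35))  # prints (3, 19)
print(quick_estimate(edges, counts, 10, 30))  # prints (10, 10)
```

When the query ranges fall exactly on bin boundaries, as in the second call, min and max coincide and the quick response is already exact; otherwise the slower response is needed to resolve the partially covered bins.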