The indexes that the QE uses will need to be updated dynamically in an
append-only fashion. Thus, new queries (or even repeated estimates for
the same old query) will see the updated indexes. This will have
no effect on the execution of currently running queries.
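As a rough illustration of the append-only idea, the following Python sketch shows one way a running query could keep a stable view while the index grows. The class name, the method names, and the "energy" property are hypothetical and are not taken from the QE; the sketch only assumes that entries are never modified or removed once appended.

```python
import threading


class AppendOnlyIndex:
    """Hypothetical append-only index: entries are only ever added at the end."""

    def __init__(self):
        self._entries = []            # (event_id, properties), in arrival order
        self._lock = threading.Lock()

    def append(self, event_id, properties):
        with self._lock:
            self._entries.append((event_id, properties))

    def snapshot(self):
        """Return a high-water mark; a query reads only entries below it."""
        with self._lock:
            return len(self._entries)

    def estimate(self, predicate, upto=None):
        """Count matching events among the first `upto` entries (all if None)."""
        with self._lock:
            limit = len(self._entries) if upto is None else upto
            visible = self._entries[:limit]
        return sum(1 for _, props in visible if predicate(props))


index = AppendOnlyIndex()
index.append(1, {"energy": 5.0})
index.append(2, {"energy": 12.0})

running_query_mark = index.snapshot()   # a query starts and records its mark
index.append(3, {"energy": 20.0})       # the index grows while the query runs


def high_energy(props):
    return props["energy"] > 10.0


old_view = index.estimate(high_energy, upto=running_query_mark)  # running query
new_view = index.estimate(high_energy)                           # new estimate
```

Because existing entries never change, the running query's view is fixed by its mark, while a new estimate for the same predicate picks up the appended event.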
4) Dynamic clustering
This refers to Dave Malon's suggestion that events may be
partitioned into multiple "streams" as they are generated. The
streams will be based on some predetermined property values of the
Given that dynamic indexing (item above) is implemented, this
clustering method will have no additional effect on the Storage
Is that a realistic expectation? Would some clustered files exist for
5) Coordinated cache manager
This refers to the ability of a single cache manager to support file
caching requests from two processes: the Storage Manager (SM) and
Objectivity. Requests from the SM are for file caching and purging on
behalf of queries, and from Objectivity for satisfying Object
It is possible that at any one time one process or the other will
dominate the cache. Therefore some coordination is needed. A good
scheme should allocate resources to each in some preset ratio, but
permit either process to use unused cache beyond its
allocation. The QM will have to know of files cached on behalf of
Objectivity, so that it can take advantage of that for queries.
Currently, we believe that we can take advantage of the work done by
SLAC's oofs, and extend it with this capability.
We believe that it is worth trying to have this capability available
for MDC2, but are not sure that it can be achieved in this short time.
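The preset-ratio scheme with borrowing described in this item can be sketched as follows. This is only an illustration under stated assumptions: the class, its methods, the 60/40 split, and the file names are all hypothetical, and nothing here reflects the actual oofs or SM interfaces. A client may borrow unused space beyond its quota, and a client that is still within its own quota may reclaim space by evicting the oldest files of a client that is over quota.

```python
from collections import OrderedDict


class CoordinatedCache:
    """Hypothetical cache shared by two clients with a preset split."""

    def __init__(self, capacity, percent):
        # percent, e.g. {"SM": 60, "Objectivity": 40}, is an assumed ratio
        self.capacity = capacity
        self.quota = {c: capacity * p // 100 for c, p in percent.items()}
        self.files = OrderedDict()    # name -> (owner, size), oldest first

    def used(self, client=None):
        return sum(s for o, s in self.files.values() if client in (None, o))

    def _borrowed_victim(self, requester):
        """Oldest file owned by another client that is over its quota."""
        for name, (owner, _) in self.files.items():
            if owner != requester and self.used(owner) > self.quota[owner]:
                return name
        return None

    def request(self, client, name, size):
        if name in self.files:        # already cached, whoever asked for it
            return True
        within_quota = self.used(client) + size <= self.quota[client]
        while self.used() + size > self.capacity:
            # Only a client within its own quota may reclaim borrowed space.
            victim = self._borrowed_victim(client) if within_quota else None
            if victim is None:
                return False
            del self.files[victim]    # whole-file eviction; may overshoot
        self.files[name] = (client, size)
        return True

    def cached_files(self):
        return list(self.files)


cache = CoordinatedCache(100, {"SM": 60, "Objectivity": 40})
for db in ("a.db", "b.db", "c.db"):
    cache.request("Objectivity", db, 30)           # Objectivity borrows up to 90

granted = cache.request("SM", "run1.dat", 50)      # SM reclaims its share
refused = cache.request("SM", "run2.dat", 30)      # over quota, cache is full
shared = cache.request("SM", "c.db", 30)           # already cached for Objectivity
```

The last call shows the point about the QM knowing what is cached on behalf of Objectivity: a file that is already present is reported as available rather than cached a second time.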
6) Caching policies
Current caching policies are common-sense heuristics that have not been
proven to be the most effective. We may want to experiment with
additional policies in MDC2.
Caching policies are only beneficial when file sharing between queries
is possible. The current policies include: 1) round-robin service to
queries, 2) passing files to all EIs that need them as soon as such a
file is cached, 3) for a given query, selecting the file with the most
events to be cached first, and 4) removing files from cache on the basis
of the oldest untouched file.
The choice of policies should be set up as parameters that can be
changed by a system administrator. For MDC2 several policies will be
chosen and experimented with.
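One way to make the policy choice a parameter is a table of named policy functions selected by a configuration value. The sketch below is a hedged illustration only: the policy names, the configuration keys, and the "fewest_events_first" and "largest_file" alternatives are hypothetical and are included just to show how an administrator could swap policies. Only "most_events_first" and "oldest_untouched" correspond to policies 3 and 4 listed above.

```python
# Policies for choosing which file to cache next for a given query.
SELECTION_POLICIES = {
    # Policy 3 above: the file with the most events goes first.
    "most_events_first": lambda files, counts: max(files, key=lambda f: counts[f]),
    # Hypothetical alternative, for experiments only.
    "fewest_events_first": lambda files, counts: min(files, key=lambda f: counts[f]),
}

# Policies for choosing which cached file to remove.
EVICTION_POLICIES = {
    # Policy 4 above: remove the file that has gone untouched the longest.
    "oldest_untouched": lambda last_touched, sizes: min(last_touched, key=last_touched.get),
    # Hypothetical alternative, for experiments only.
    "largest_file": lambda last_touched, sizes: max(sizes, key=sizes.get),
}

# Settings an administrator could change without touching the code.
config = {"selection": "most_events_first", "eviction": "oldest_untouched"}

select_file = SELECTION_POLICIES[config["selection"]]
evict_file = EVICTION_POLICIES[config["eviction"]]

# Assumed example data: events per file needed by one query,
# last access time (arbitrary clock ticks), and file sizes.
event_counts = {"f1": 120, "f2": 900, "f3": 45}
last_touched = {"f1": 17, "f2": 3, "f3": 11}
sizes = {"f1": 10, "f2": 40, "f3": 25}

next_to_cache = select_file(["f1", "f2", "f3"], event_counts)
next_to_evict = evict_file(last_touched, sizes)
```

Keeping every policy behind the same call signature means an experiment can change one configuration value and rerun, which makes comparisons between policies easier to control.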
7) Recovery from failures
The system design should include the goal of recovering from possible
failures in such a way that interrupted queries continue from
the state they were in. We need to develop all possible failure
scenarios, including single failures of components (user code, QO-EI,
SM, Objectivity, HPSS), as well as combinations of these. The
experience with MDC1 showed that with proper planning, we could
recover from certain HPSS failures. We should try and anticipate all
The information needed to restore the system to its state before the
failure should be dynamically stored in persistent storage, such as in
files or a database.
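As a minimal sketch of storing recovery state in a file, the following Python code writes the state to a temporary file and then renames it over the previous copy, so that a crash in the middle of a write leaves the last complete state intact. The file name, the function names, and the layout of the state (files done and files pending per query) are assumptions made for illustration, not part of the system design.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state to a temporary file, then atomically replace the old copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())          # make sure the bytes reach the disk
    os.replace(tmp_path, path)        # atomic rename on the same filesystem


def load_state(path):
    """Return the last saved state, or an empty state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


with tempfile.TemporaryDirectory() as workdir:
    state_file = os.path.join(workdir, "query_state.json")   # hypothetical name

    state = {"query_42": {"files_done": ["f1"], "files_pending": ["f2", "f3"]}}
    save_state(state_file, state)

    # After a simulated restart, the query resumes from the pending files.
    restored = load_state(state_file)
    resume_from = restored["query_42"]["files_pending"]
    missing = load_state(os.path.join(workdir, "no_such_file.json"))
```

Saving the state after each file is processed would bound the work lost in a failure to at most one file per query, at the cost of one small write per file.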
8) Performance optimization
We may want to plan some experiments to help improve performance.
This includes the effectiveness of the QE's index compared to
Objectivity indexes or other index methods, as well as caching
We may also want to identify potential bottlenecks in the system given
certain query load profiles.
9) Design of tests/measurements
During MDC1 many tests had to be designed dynamically as we discovered
various aspects of system behavior, such as control over HPSS cache
during tests and the parameter to set for the desired HPSS behavior.
We should use such knowledge to design ahead of time the experiments
we wish to carry out.
Although the primary goal of the MDC tests is the robustness of the
software, we should again plan to have some controlled tests. We also
need to reevaluate the design of the logs generated during tests, as
well as how to visualize the logs. The design could include a dynamic
log visualization tool, but that could be left as a goal beyond