Summary of tests
Below is a summary of the tests we ran over the last few days at BNL.
Each test has a summary of the setup followed by comments about the
outcome. The comments include some preliminary observations that need
to be confirmed by the logs.
Arie.
-----------------------------------------------------------------------
TEST 1
--------------------
Fed: star
SII/UC/STAF: UC
Cache_size: 1 GB
Query: queres.all4 (four queries)
Proc.time: 0.01 (except one)
Policy: yes
Cycle: 10
Comments:
1. This test was run overnight as a robustness
test for the entire system, including UC. It ran
for 10 hrs and completed.
2. Note: Yuri ran his jobs during the last 3 hours.
TEST 2
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 2 GB
Query: query.q1.twice
Proc.time: 0.01
Policy: no
Cycle: 1
Comments:
1. This test was an attempt to run the same query
twice with a 30 min time delay without caching
coordination. The purpose was to show that no
coordination causes many files to be read twice.
2. This is when we noticed that the times we were
observing included the wait until a drive was
available. So we decided to serialize the PFTP
queries in the next test.
TEST 3
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 1 GB
Query: query.q1, delay, then query.q1
Proc.time: 0.01
Policy: no
Cycle: 1
Comments:
1. This test was a second attempt to run the same query
twice with a time delay and without caching
coordination.
2. We ran this test with the PFTPs scheduled serially.
This is when we noticed that the tape was dismounted and
remounted for every file, even if the next file was on the
same tape. We later learned that this is because the dismount
time on HPSS was set to 15 sec. So serializing the PFTPs was
bad: there was never a pending PFTP from the same tape,
so the tape was dismounted unnecessarily.
3. We decided to abandon serialization of PFTPs.
TEST 4
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 1 GB
Query: query.cluster
Proc.time: 10
Policy: yes
Cycle: 1
Comments:
1. This test was made to show the benefits of clustering.
We made a query that holds the same number of events
as query1, but in 4 files only. The benefit was a several-
fold speedup, even when the processing time per event
is large (10 sec), so that the processing time
dominates the caching time.
2. We noticed that even with one PFTP pending (i.e. one cache-
ahead request) we still got tapes dismounted unnecessarily,
even if the next file was on the same tape. We think this
happens if the transfer time between the HPSS cache and the local
cache is longer than the transfer time of the next PFTP file to
the HPSS cache. We verified that if we have many (more than 2)
PFTPs pending from the same tape, the tape does not dismount.
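The dismount behavior described in Tests 3 and 4 can be illustrated with a toy model. This is only a sketch, assuming the drive dismounts a tape whenever it has been idle for longer than the 15 sec dismount time mentioned above. The function name, the request times and the read durations are hypothetical and are not taken from the logs.

```python
DISMOUNT_TIMEOUT = 15.0  # seconds; the HPSS dismount time reported in Test 3


def count_dismounts(requests, timeout=DISMOUNT_TIMEOUT):
    """Count idle-timeout dismounts for requests that all hit one tape.

    requests: list of (arrival_time, read_duration) in seconds, sorted
    by arrival time. Toy model only, not the real HPSS scheduler.
    """
    dismounts = 0
    drive_free_at = None  # time at which the previous read finished
    for arrival, duration in requests:
        if drive_free_at is not None and arrival - drive_free_at > timeout:
            # The tape sat idle longer than the timeout before this request.
            dismounts += 1
        start = arrival if drive_free_at is None else max(arrival, drive_free_at)
        drive_free_at = start + duration
    return dismounts


# Hypothetical serialized schedule: each PFTP is issued only after the
# previous file has reached the local cache (60 sec after the read ends).
serialized = [(0, 30), (90, 30), (180, 30)]

# Hypothetical schedule with all three PFTPs pending from the start.
queued = [(0, 30), (0, 30), (0, 30)]

print(count_dismounts(serialized), count_dismounts(queued))
```

In this model the serialized schedule dismounts the tape between every pair of files, while the queued schedule never leaves the tape idle long enough to dismount.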
TEST 5
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query6 (six queries)
Proc.time: 20 (except one)
Policy: yes
Cycle: 1
Comments:
1. This test was run overnight as a robustness
test. All 6 queries provided by Dave Z.
completed.
TEST 6
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query7
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the first part of testing the
effect of clustering. This run is not clustered.
2. HPSS_cache did not empty - this test is not useful
for measurement, but it shows that system ran OK.
3. Query7 is a modified Query2 to have more
events (131 instead of 66)
4. We asked for purge policy to be changed to:
purge within 5 min, when cache is 1% full (to guarantee
that hpss_cache empties.)
TEST 7
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query7
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the first part of testing the
effect of clustering. This run is not clustered.
This test is valid.
2. This is the same as the previous run, but
this time hpss_cache was properly emptied.
3. Query7 is a modified Query2 to have more
events (131 instead of 66)
4. Note: This test should be dominated by caching
time, even though proc_time is large, because
there are few relevant events per file.
5. Note: Total time was about 2 hrs (check)
6. Note: We observed tapes 43 and 44 switching
several times, inducing long delays,
e.g. a switch between files 51 and 172, then
again on the next file. Also at 11:31 a switch to 44,
and at 11:33 a switch back to 43.
7. Note: 2 files were missing, 103 and another (see log).
TEST 8
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query8
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the second part of testing the
effect of clustering. This run is clustered.
This test is valid.
2. The query used here was selected to have the
same number of events as the previous query
(131 events), but concentrated in 4 files
(see queries.stat for distribution).
3. Note: This test should be dominated by processing
time, because there are about 33 events per file
on the average, and processing time is 20 sec/event
(or about 600 sec per file).
4. Note: Total time was about 40 min, (check log)
as opposed to 2 hours for the unclustered case.
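The estimate in comment 3 can be checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted above (131 events, 4 files, 20 sec/event); the variable names are just for illustration.

```python
events = 131           # events selected by query8
files = 4              # files the events are concentrated in
proc_per_event = 20.0  # seconds of processing per event

events_per_file = events / files                   # about 33 events per file
proc_per_file = events_per_file * proc_per_event   # seconds of processing per file
total_proc_min = events * proc_per_event / 60.0    # total processing time in minutes

print(events_per_file, proc_per_file, round(total_proc_min, 1))
```

The processing time alone comes out at roughly 44 min, which is about the same as the reported total of about 40 min (still to be checked against the log). That is consistent with this run being dominated by processing time.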
TEST 9
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query8
Proc.time: 1 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the third part of testing the
effect of clustering. This run is clustered,
but the processing time was set to 1 second.
This test is valid.
2. As in the previous run,
the query used here was selected to have the
same number of events as query7
(131 events), but concentrated in 4 files
(see queries.stat for distribution).
3. Note: This test should be dominated by cache
time, because there are about 33 events per file
on the average, but processing time is only 1
sec/event (or about 33 sec per file).
4. Note: Total time was about 15 min (check log)
as opposed to 40 min for the previous run
(with proc. time 20 sec) and as opposed to
2 hours for the unclustered case.
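The three clustering runs (Tests 7, 8 and 9) can be compared directly. This is a small sketch using the approximate totals quoted above, which still need to be confirmed by the logs; the variable names are illustrative.

```python
unclustered_min = 120.0   # Test 7: query7, 20 sec/event, about 2 hrs
clustered_20s_min = 40.0  # Test 8: query8, 20 sec/event
clustered_1s_min = 15.0   # Test 9: query8, 1 sec/event

speedup_20s = unclustered_min / clustered_20s_min
speedup_1s = unclustered_min / clustered_1s_min

# Share of the Test 9 total spent on processing (131 events at 1 sec/event).
proc_sec = 131 * 1.0
proc_share = proc_sec / (clustered_1s_min * 60.0)

print(speedup_20s, speedup_1s, round(proc_share, 2))
```

With these numbers, processing accounts for only about 15% of the Test 9 total, so most of the 15 min is caching time.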
TEST 10
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 250 MB
Query: query8
Proc.time: 1 sec
Policy: no
Cycle: 1
Comments:
1. This test was not run to completion.
2. This run was supposed to be the first part of
a test to show the value of caching coordination.
It was set up as the same query run twice with a
delay of 10 min. The idea is that by the time the
second query starts the first file is removed from
the cache. Since there is no policy, the files
will not be synchronized, and will be cached twice.
3. This test got as far as asking to cache the first
file a second time, and then the QM got stuck because
of a race condition that was not anticipated.
4. The second part of this test was not conducted.
Alex was contacted, and he sent a fix. The plan is
to run this test again on Oct 5th (tomorrow),
when we switch over to STAR.
TEST 11
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 1
Comments:
1. The purpose of this test was to check the
efficiency of accessing Objectivity. It was
set up in 3 steps:
a) Make the cache size large enough to hold
10 files, and cache them.
Selected files and sizes were (sizes provided
by Dave Z. in parenthesis):
98(194), 94(194), 71(182), 57(173), 39(156)
73(128), 72(148), 58(155), 47(102), 41(119)
In total the query selected 500 events out of
these files.
b) Run a test with SII only, running this query
10 times, where processing time is very small.
No caching takes place since all the files were
left in the cache. Thus 5000 events are "retrieved".
c) Run the same test with UC. This will make Objectivity
get 5000 events from 10 files.
2. This run is only the cache loading setup.
It ran successfully.
TEST 12
--------------------
Fed: Phenix
SII/UC/STAF: SII
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 10
Comments:
1. This is step b) of the test, running SII -
see previous run
2. Processing time was 32 sec (as expected)
since all the files were in cache.
TEST 13
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 10
Comments:
1. This is step c) of the test with UC (i.e.
Objectivity) - see previous runs
2. Processing time was 130 sec, with all the
files already in cache.
3. This is only about 100 sec more than the SII-only
run (32 sec) to access 5000 events (pretty good!).
4. A more interesting test will be with UC running
on a linux machine over the net. Then we will test
the cost of transfer time over the net.
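The per-event cost of going through Objectivity follows from the totals of Tests 12 and 13. This is a back-of-the-envelope sketch, assuming the whole difference between the two runs is due to Objectivity access; the variable names are illustrative.

```python
sii_total_sec = 32.0   # Test 12: SII only, 10 cycles, all files in cache
uc_total_sec = 130.0   # Test 13: same query through UC (Objectivity)
events = 500 * 10      # 500 events per cycle, 10 cycles

extra_sec = uc_total_sec - sii_total_sec
extra_ms_per_event = 1000.0 * extra_sec / events

print(extra_sec, round(extra_ms_per_event, 1))
```

Under that assumption, the extra cost is on the order of 20 ms per event.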
TEST 14
--------------------
Fed: Phenix
SII/UC/STAF: STAF
Cache_size: 1 GB
Query: query7
Proc.time: determined by UC,
estimated 30 sec/event
Policy: yes
Cycle: 10
Comments:
1. This was a "robustness" test for STAF codes.
2. HPSS went down in the middle of the test,
and a certain file could not be PFTP'd. The CM
and QM repeatedly reissued the request until HPSS
came back up, then continued properly.
3. Test was completed.
TEST 15
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: Queries11 (3 queries), then each individually.
Proc.time: 1
Policy: yes
Cycle: 1
Comments:
1. This was a combined test to be run overnight
to test caching coordination effectiveness.
3 queries were chosen so that they mutually
overlap by 50% in terms of the files they use
(each accesses 8 files; each pair has 4 overlapping
files).
2. First all three were run together. Then each was
run individually.
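The overlap structure of the three queries can be sketched with sets. The file numbers below are hypothetical: they are just one layout in which each query uses 8 files and each pair shares 4, not the actual file lists in Queries11.

```python
# Hypothetical file IDs; only the sizes and pairwise overlaps match the test.
query_a = set(range(0, 8))
query_b = set(range(4, 12))
query_c = set(range(8, 12)) | set(range(0, 4))

pair_overlaps = [
    len(query_a & query_b),
    len(query_b & query_c),
    len(query_a & query_c),
]

# File reads if each query caches its own files independently.
reads_uncoordinated = len(query_a) + len(query_b) + len(query_c)

# File reads if every shared file is cached once and used by all queries.
reads_coordinated = len(query_a | query_b | query_c)

print(pair_overlaps, reads_uncoordinated, reads_coordinated)
```

In this particular layout, perfect coordination would halve the number of files cached. The real saving depends on how the actual overlaps are arranged and on whether shared files stay in the cache long enough.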