
Summary of tests



Below is a summary of the tests we ran over the last few days at BNL.
Each test has a summary of the setup followed by comments about the
outcome.  The comments include some preliminary observations that need
to be confirmed by the logs.

Arie.
-----------------------------------------------------------------------

TEST 1
--------------------
Fed: star
SII/UC/STAF: UC
Cache_size: 1 GB
Query: queries.all4 (four queries)
Proc.time: 0.01 (except one)
Policy: yes
Cycle: 10
Comments:
1. This test was run overnight as a robustness
   test for the entire system, including UC.  It ran
   for 10 hrs and completed.
2. Note: Yuri ran his jobs during the last 3 hours.

TEST 2
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 2 GB
Query: query.q1.twice
Proc.time: 0.01
Policy: no
Cycle: 1
Comments:
1. This test was an attempt to run the same query
   twice with a 30 min time delay, without caching
   coordination.  The purpose was to show that lack of
   coordination causes many files to be read twice.
2. This is when we noticed that the time we were
   observing included the wait time until a drive was
   available.  So we decided to serialize the PFTP
   transfers in the next test.

TEST 3
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 1 GB
Query: query.q1, delay, then query.q1
Proc.time: 0.01
Policy: no
Cycle: 1
Comments:
1. This test was a second attempt to run the same query
   twice with a time delay, without caching
   coordination.
2. We ran this test serially, scheduling one PFTP
   at a time.  This is when we noticed that the tape was
   dismounted and remounted for every file, even when the
   next file was on the same tape.  We later learned that
   this is because the dismount time on hpss was set to
   15 sec.  So serializing the PFTPs was bad: there was
   never a pending PFTP from the same tape, so the tape
   was dismounted unnecessarily (see the timeline sketch
   below).
3. We decided to abandon serialization of PFTPs.
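
A minimal timeline sketch (Python) of why serialization tripped the
dismount timeout.  The 15 sec timeout is from the test; the 90 sec
hpss-to-local copy time is a made-up placeholder, not a measurement:

    # The 15 sec dismount timeout is from the test; the 90 sec
    # hpss-to-local copy time is a made-up placeholder.
    DISMOUNT_TIMEOUT = 15.0
    LOCAL_COPY = 90.0   # hypothetical hpss cache -> local cache time

    # With strictly serialized PFTPs there is never a pending request
    # while a file drains to the local cache, so the drive idles for
    # the whole copy and the timeout always fires.
    if LOCAL_COPY > DISMOUNT_TIMEOUT:
        print("tape dismounted after every file")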

TEST 4
--------------------
Fed: star
SII/UC/STAF: SII
Cache_size: 1 GB
Query: query.cluster
Proc.time: 10
Policy: yes
Cycle: 1
Comments:
1. This test was made to show the benefits of clustering.
   We made a query that holds the same number of events
   as query1, but in only 4 files.  The benefit was a
   severalfold speedup, even though the processing time
   per event is large (10 sec) and thus dominates the
   caching time.
2. We noticed that even with one PFTP pending (i.e. one
   cache-ahead request) we still got tapes dismounted
   unnecessarily, even when the next file was on the same
   tape.  We think this happens when the transfer time from
   the hpss cache to the local cache is longer than the
   transfer time of the next PFTP file to the hpss cache.
   We verified that if we have many (more than 2) PFTPs
   pending from the same tape, the tape does not dismount
   (a crude model of this is sketched below).
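
A crude model of the cache-ahead effect.  Only the 15 sec timeout is
real; both transfer times are placeholders, and the queueing model is
deliberately simplistic.  With these placeholder numbers a depth of 2
already keeps the drive busy, while the test only verified depths
above 2:

    # Only the 15 sec timeout is real; both transfer times are
    # placeholders, and the queueing model is deliberately crude.
    DISMOUNT_TIMEOUT = 15.0
    TAPE_READ = 60.0    # hypothetical: tape -> hpss cache, per file
    LOCAL_COPY = 90.0   # hypothetical: hpss cache -> local cache

    def idle_gap(pending):
        """Drive idle time per file with `pending` cache-ahead
        requests: depth d gives the drive ~d reads of queued work
        before it waits LOCAL_COPY for the next request."""
        return max(0.0, LOCAL_COPY - pending * TAPE_READ)

    for depth in (1, 2, 3):
        gap = idle_gap(depth)
        verdict = "dismounts" if gap > DISMOUNT_TIMEOUT else "stays mounted"
        print("%d pending PFTP(s): idle %.0f sec -> tape %s"
              % (depth, gap, verdict))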


TEST 5
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query6 (six queries)
Proc.time: 20 (except one)
Policy: yes
Cycle: 1
Comments:
1. This test was run overnight as a robustness
   test.  All 6 queries provided by Dave Z.
   completed.

TEST 6
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query7
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the first part of testing the
   effect of clustering.  This run is not clustered.
2. The HPSS_cache did not empty - this test is not useful
   for measurement, but it shows that the system ran OK.
3. Query7 is a modified Query2 to have more
   events (131 instead of 66)
4. We asked for the purge policy to be changed to:
   purge within 5 min when the cache is 1% full (to
   guarantee that the hpss_cache empties).  Our reading
   of this rule is sketched below.
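
The sketch (the helper and its file-tuple argument are ours for
illustration, not an HPSS interface; the 5 min and 1% numbers are
from the request above):

    import time

    PURGE_AGE = 5 * 60        # sec: purge files untouched for 5 min
    PURGE_THRESHOLD = 0.01    # act once the cache is at least 1% full

    def files_to_purge(files, capacity_bytes, now=None):
        """files: list of (name, size_bytes, last_access_epoch)."""
        now = time.time() if now is None else now
        used = sum(size for _, size, _ in files)
        if used < PURGE_THRESHOLD * capacity_bytes:
            return []
        return [name for name, _, last in files
                if now - last > PURGE_AGE]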

TEST 7
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query7
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the first part of testing the
   effect of clustering.  This run is not clustered.
   This test is valid.
2. This is the same as the previous run, but
   this time hpss_cache was properly emptied.
3. Query7 is a modified Query2 to have more
   events (131 instead of 66)
4. Note: This test should be dominated by caching
   time, even though proc_time is large, because
   there are few relevant events per file.
5. Note: Total time was about 2 hrs (check)
6. Note: We observed tapes 43 and 44 being switched
   several times, inducing long delays:
   e.g. a switch between files 51 and 172, then
   again on the next file; also at 11:31 a switch to 44,
   and at 11:33 a switch back to 43.
7. Note: 2 files were missing, 103 and another (see log).

TEST 8
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query8
Proc.time: 20 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the second part of testing the
   effect of clustering.  This run is clustered.
   This test is valid.
2. The query used here was selected to have the
   same number of events as the previous query
   (131 events), but concentrated in 4 files
   (see queries.stat for the distribution).
3. Note: This test should be dominated by processing
   time, because there are about 33 events per file
   on the average, and processing time is 20 sec/event
   (or about 600 sec per file).
4. Note: Total time was about 40 min, (check log)
   as opposed to 2 hours for the unclustered case.

TEST 9
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 1 GB
Query: query8
Proc.time: 1 sec
Policy: yes
Cycle: 1
Comments:
1. This run is the third part of testing the
   effect of clustering.  This run is clustered,
   but the processing time was set to 1 second.
   This test is valid.
2. As in the previous run,
   the query used here was selected to have the
   same number of events as the previous query
   (131 events), but concentrated in 4 files
   (see queries.stat for the distribution).
3. Note: This test should be dominated by cache
   time, because there are about 33 events per file
   on the average, but processing time is only 1
   sec/event (or about 33 sec per file).
4. Note: Total time was about 15 min (check log),
   as opposed to 40 min for the previous run
   (with proc. time 20 sec) and 2 hours for the
   unclustered case; the sketch below works through
   these numbers.
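
The back-of-envelope arithmetic behind the "dominated by" claims in
tests 7-9; every input number is quoted in the comments above, the
rest is division:

    # Inputs quoted above: 131 events, 4 files when clustered,
    # proc. time 20 sec/event (tests 7-8) or 1 sec/event (test 9).
    EVENTS = 131
    FILES_CLUSTERED = 4

    events_per_file = EVENTS / FILES_CLUSTERED    # ~33 events/file
    proc_per_file = events_per_file * 20          # ~655 sec, "about 600"
    total_proc_20 = EVENTS * 20 / 60.0            # ~44 min of processing
    total_proc_1 = EVENTS * 1 / 60.0              # ~2 min of processing

    # Test 8: observed ~40 min vs ~44 min of pure processing, so
    # processing dominates and caching is largely hidden behind it.
    # Test 9: observed ~15 min vs ~2 min of processing, so caching
    # the 4 files dominates.
    # Test 7: observed ~2 hrs with the same ~44 min of processing,
    # so very roughly 76 min went to caching and tape switching.
    print(round(events_per_file), round(proc_per_file),
          round(total_proc_20), round(total_proc_1))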

TEST 10
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 250 MB
Query: query8
Proc.time: 1 sec
Policy: no
Cycle: 1
Comments:
1. This test was not run to completion.
2. This run was supposed to be the first part of
   a test to show the value of caching coordination.
   It was set up as the same query run twice with a
   delay of 10 min.  The idea is that by the time the
   second query starts, the first file has been removed
   from the cache.  Since there is no policy, the files
   will not be synchronized, and will be cached twice
   (the sketch below illustrates the expected effect).
3. This test got as far as asking to cache the first
   file a second time, and then the QM got stuck because
   of a race condition that was not anticipated.
4. The second part of this test was not conducted.
   Alex was contacted, and he sent a fix.  The plan is
   to run this test again on Oct 5th (tomorrow), when
   we switch over to STAR.
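
A minimal sketch of the effect this test was meant to show.  The file
count, the 100 MB file size, and the FIFO eviction are assumptions;
only the 250 MB cache size is from the setup:

    from collections import OrderedDict

    def run_query(files, cache, capacity_mb):
        fetches = 0
        for name, size in files:
            if name not in cache:
                fetches += 1                   # staged from HPSS again
                cache[name] = size
                while sum(cache.values()) > capacity_mb:
                    cache.popitem(last=False)  # evict oldest (FIFO)
        return fetches

    files = [("f%d" % i, 100) for i in range(8)]  # assumed 100 MB files
    cache = OrderedDict()
    first = run_query(files, cache, capacity_mb=250)
    second = run_query(files, cache, capacity_mb=250)
    print(first, second)   # 8 and 8: every file gets cached twice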

TEST 11
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 1
Comments:
1. The purpose of this test was to check the
   efficiency of accessing Objectivity.  It was
   set up as 3 steps (a sanity check of the numbers
   follows below):
   a) Make the cache size large enough to hold
      10 files, and cache them.
      Selected files and sizes were (sizes provided
      by Dave Z. in parentheses):
       98(194), 94(194), 71(182), 57(173), 39(156)
       73(128), 72(148), 58(155), 47(102), 41(119)
      In total the query selected 500 events out of
      these files.
   b) Run a test with SII only, running this query
      10 times, where processing time is very small.
      No caching takes place since all the files were
      left in the cache.  Thus 5000 events are "retrieved".
   c) Run the same test with UC.  This will make Objectivity
      get 5000 events from 10 files.
2.  This run is only the cache loading setup.
    It ran successfully.
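
A quick sanity check of these numbers (sizes in MB, as quoted above):

    # File sizes in MB, as listed in step a); keys are file numbers.
    sizes = {98: 194, 94: 194, 71: 182, 57: 173, 39: 156,
             73: 128, 72: 148, 58: 155, 47: 102, 41: 119}
    total_mb = sum(sizes.values())     # 1551 MB across the 10 files
    assert total_mb < 2 * 1024         # fits in the 2 GB cache
    print(total_mb, 500 * 10)          # 1551 MB staged, 5000 events read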

TEST 12
--------------------
Fed: Phenix
SII/UC/STAF: SII
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 10
Comments:
1. This is step b) of the test, running SII -
   see previous run
2. Processing time was 32 sec (as expected)
   since all the files were in cache.

TEST 13
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: query10
Proc.time: 0.01 sec
Policy: yes
Cycle: 10
Comments:
1. This is step c) of the test with UC (i.e.
   Objectivity) - see previous runs
2. Processing time was 130 sec
   since all the files were in cache.
3. This is only about 100 sec to access the 5000 events
   (pretty good!); the arithmetic is sketched below.
4. A more interesting test will be with UC running
   on a Linux machine over the net.  Then we will test
   the cost of transfer time over the net.
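
The arithmetic behind comment 3: subtracting the SII baseline of
test 12 from the UC run isolates the Objectivity access cost:

    uc_total = 130       # sec, test 13 (UC, i.e. through Objectivity)
    sii_baseline = 32    # sec, test 12 (SII, files already in cache)
    events = 500 * 10    # 500 events per cycle, 10 cycles
    overhead = uc_total - sii_baseline           # 98 sec, "about 100"
    print(overhead, 1000.0 * overhead / events)  # ~20 ms per event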

TEST 14
--------------------
Fed: Phenix
SII/UC/STAF: STAF
Cache_size: 1 GB
Query: query7
Proc.time: determined by UC,
           estimated 30 sec/event
Policy: yes
Cycle: 10
Comments:
1. This was a "robustness" test for STAF codes.
2. HPSS came down in the middle of the test,
   and a certain file could not be PFTP'd.  The CM
   and QM repeatedly issued the query until HPSS
   came back up, then continued properly (this retry
   loop is sketched below).
3. Test was completed.
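
The retry behavior of comment 2, as a sketch.  pftp_get is a
hypothetical stand-in for the actual transfer call, and the 60 sec
retry interval is an assumption; the real CM/QM logic surely differs
in detail:

    import time

    def fetch_with_retry(pftp_get, filename, retry_interval=60):
        """Keep re-issuing a transfer until HPSS serves it."""
        while True:
            try:
                return pftp_get(filename)    # succeeds once HPSS is up
            except OSError:
                time.sleep(retry_interval)   # HPSS down: wait, re-issue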

TEST 15
--------------------
Fed: Phenix
SII/UC/STAF: UC
Cache_size: 2 GB
Query: Queries11 (3 queries), then each individually.
Proc.time: 1
Policy: yes
Cycle: 1
Comments:
1. This was a combined test to be run overnight
   to test the effectiveness of caching coordination.
   3 queries were chosen so that they mutually
   overlap by 50% in terms of the files they use
   (each accesses 8 files; each pair has 4 overlapping
   files).  The file-count arithmetic is worked out below.
2. First all three were run together.  Then each was
   run individually.
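
The file-count arithmetic, by inclusion-exclusion.  The triple
overlap is not stated above, so it is assumed to be zero:

    queries, files_each, pair_overlap = 3, 8, 4
    triple_overlap = 0   # assumed: no file shared by all three queries
    distinct = queries * files_each - 3 * pair_overlap + triple_overlap
    naive_reads = queries * files_each
    print(distinct, naive_reads)   # 12 distinct files vs 24 reads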