
MDC1 Tests



>  
>  -----Original Message-----
>  From:	David M. Malon [SMTP:malon@anl.gov]
>  Sent:	Wednesday, September 30, 1998 9:13 AM
>  To:	malon@dis.anl.gov
>  Subject:	MDC1 tests
>  
>  Enclosed is a summary of the tests we plan to run in the next few
>  days.  It reflects discussions I recently had with Henrik,
>  Alex, and Doug.
>  
>  Dave, Dave, and Jeff,
>  
>  Please take the time to look at that and give us comments
>  (if any) as soon as you can.  We'll start some of these tests
>  tomorrow.
>  
>  Arie.
>  ------------------------------------------------------------------
>  TESTS PLANNED FOR GC-MDC1
>  
>  To figure out the scale of the tests, we use the following
>  notation:
>  
>  N - total number of events in the system
>  S  - average size of event
>  F - number of files in the system
>  F_size - average size of each file
>  T_capacity - tape capacity in GB
>  N_tapes - number of tapes to store the files
>  N_event_per_file = F_size / S
>  
>  For the tests we currently have:
>  
>  N = 5,000 - 10,000 (we use 10,000)
>  S  = 2 MB
>  F_size = 250 MB
>  F = 10,000*2 / 250 = 40
>  T_capacity = 25 GB
>  N_tapes = 1 (but data was spread out on 4 tapes)
>  N_event_per_file = 250 / 2 = 125
>  
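   A quick check of the arithmetic in the list above: 10,000 * 2 / 250 is
   80, so the quoted F = 40 matches the lower end of the range (N = 5,000)
   rather than the N = 10,000 the plan says it uses.

```python
# Sanity-check the planned test-scale arithmetic (values from the list above).
N = 10_000          # total number of events (the plan's range is 5,000-10,000)
S = 2               # average event size in MB
F_size = 250        # average file size in MB

F = N * S // F_size               # number of files
N_event_per_file = F_size // S    # events per file

print(F)                 # 80 with N = 10,000; the quoted 40 matches N = 5,000
print(N_event_per_file)  # 125
```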
   Some of these numbers are off: in the actual data, F = 241 and
   N_event_per_file = 7918/241 ~= 33.  It matters in a couple of places
   below.
   
   David Zimmerman is compiling F_size statistics and the like.
   
   The file sizes vary wildly--some are >650 MB, some <10 MB.  (My crude
   statistics: 75% of files are <250 MB, 50% are <90 MB, 25% are <31 MB.)
   
   It will be CRUCIAL to log the actual file sizes and data quantities 
   delivered.  Note that a 2 GB cache will hold only
   3 of the largest DB files, but for some queries, 60 files may be cached at 
   once.
>  
>  Scale down factor:
>  
>  Assume a total of 15*10**6 events/year
>  fraction = 15*10**6 / 10,000 = 1,500 (we'll use 1,000)
>  
>  Cache size for tests:
>  1000 GB / 1000 = 1 GB
>  (We'll initially use 2 GB - an 8-file cache)
>  
>  Query size for tests:
>  In real situations we expect queries of size: 10,000 - 50,000 events
>  scale down by 1000 gives: 10 - 50 events
>  (We'll initially use 100 events to see the effects with more files
>  cached)
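   The scale-down arithmetic above, gathered in one place (the 2 GB cache
   and the 250 MB average file size give the 8-file figure):

```python
# Scale-down arithmetic from the section above.
events_per_year = 15 * 10**6
N = 10_000
fraction = events_per_year // N          # 1,500 (the tests round down to 1,000)

scale = 1000
F_size_MB = 250
files_in_cache = 2 * 1000 // F_size_MB   # 8 files fit in the 2 GB test cache

query_real = (10_000, 50_000)            # expected real query sizes (events)
query_scaled = tuple(q // scale for q in query_real)  # (10, 50)
print(fraction, files_in_cache, query_scaled)
```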
   
   Why are we scaling by 10,000 sometimes and 1,000 other times?
   
>  TESTS SETUP
>  
>  To describe queries and their relationship we use the following
>  parameters:
>  
>  n - number of events that qualify for a query
>  f - number of files that contain the events that qualified
>  CF = clustering_factor = n / f = average number of events that qualify
>  per file
>                      (this factor is between 1 and N_event_per_file)
>  OF = overlap factor = % of files that overlap between any 2 queries on
>  the average
>                      (this factor is between 0 and 100%)
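   CF and OF can be computed from a query's event-to-file mapping.  A small
   sketch, under one plausible reading of OF (files shared between the two
   queries, relative to the smaller query's file set; the plan does not pin
   this down), with made-up data:

```python
# Illustrative computation of clustering factor (CF) and overlap factor (OF).
def clustering_factor(event_to_file):
    n = len(event_to_file)                # events that qualify
    f = len(set(event_to_file.values()))  # files containing them
    return n / f

def overlap_factor(files_q1, files_q2):
    # percent of files shared between two queries, relative to the smaller set
    shared = files_q1 & files_q2
    return 100 * len(shared) / min(len(files_q1), len(files_q2))

q1 = {e: e // 5 for e in range(100)}  # 100 events over files 0..19, so CF = 5
q2_files = set(range(10, 30))         # 20 files, 10 of them shared with q1
print(clustering_factor(q1))                        # 5.0
print(overlap_factor(set(q1.values()), q2_files))   # 50.0
```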
>  
>  Parameters we can set for each test:
>  
>  (Q1, Q2, ..., Qn) - ordered set of queries to run
>  PT - processing time per event (in seconds)
>  CT - time to cache a file (in seconds)
>  AT - arrival time: time delay between queries (in seconds)
>  NC - number of times to cycle over the set of queries
>  
>  example: to run the same query 5 times spaced every 10 sec, with
>  processing
>  time of 8 sec/event.
>  (Q1)
>  PT = 8
>  CT = F_size / stage speed + dismount/mount = 250 / 5 + 15 = 75 sec
   (250 / 5 + 15 = 65 sec, not 75.)
>  AT = 10 sec
>  NC = 5
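   The CT arithmetic in the example can be checked directly (the 5 MB/s
   stage speed and the 15 sec dismount/mount penalty are as implied by the
   250 / 5 + 15 expression):

```python
# Caching time per file for the example above.
F_size = 250       # MB
stage_speed = 5    # MB/s, implied by the 250 / 5 term
mount_time = 15    # s, dismount/mount

CT = F_size / stage_speed + mount_time
print(CT)          # 65.0 -- the example's 75 appears to be a slip
```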
>  
>  BASELINE TESTS
>  
>  The baseline tests are for getting the minimum time per query, i.e.,
>  running each query by itself (stand-alone).  This should help us
>  determine whether the system behaves as expected.
>  
>  TEST A.  Make caching time dominate
>  
>  Assume a query has:
>  n - 100
>  f - 20
>  CF = n / f = 5
>  PT = 4 sec
>  PT_per_file = 4*5 = 20
>  
>  Since CT = 75, total time will be dominated by caching
>  We should expect:  20*75= 1500 sec = about 0.5 hour
>  
   
   I know this is just an example, but why not just make PT (nearly) 0?  It's 
   safer with the small databases
   (<10 MB), particularly if the tape is already mounted--otherwise, there is 
   a risk that PT will exceed caching time
   for some files.
   
>  TEST A1: run above for each query with SII
   
   Clear the disk cache between queries?  (Does this require relaunching the 
   CM?)  What about the HPSS disk
   cache?  If our purpose is to see whether the system behaves as expected, we 
   will need to know for each query which files and how much data were 
   retrieved from HPSS vs disk, for example.
   
>  TEST A2: run above for each query with full system
>      (if time is significantly > TEST A1, check transfer time
>        from Objectivity to User Code)
>  
>  TEST B.  Make processing time dominate
>  
>  Assume the same query as above, but increase processing time:
>  n - 100
>  f - 20
>  CF = n / f = 5
>  PT = 30 sec
>  PT_per_file = 30*5 = 150
>  
>  Since PT_per_file > CT, processing time dominates.
>  We should expect: 20*150= 3000 sec = about 1 hour
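   A quick sketch of the expected stand-alone times for Tests A and B,
   assuming (as the examples above do) that for each file the larger of the
   caching time and the per-file processing time dominates:

```python
# Expected stand-alone query times for Tests A and B above.
f = 20              # files touched by the query
CF = 5              # qualifying events per file
CT = 75             # caching time per file used in the examples (s)

# Test A: PT = 4 s/event -> PT_per_file = 20 s, caching dominates
total_A = f * max(CT, 4 * CF)
# Test B: PT = 30 s/event -> PT_per_file = 150 s, processing dominates
total_B = f * max(CT, 30 * CF)

print(total_A, total_B)   # 1500 3000  (~0.5 h and ~1 h)
```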
>  
>  TEST B1: run above for each query with SII
>  
>  TEST B2: run above for each query with full system
>  
>  
>  TEST C:  to see that all files are readable
>  
>  Use a query that reads all files;
>  Set PT very small = 0.1 sec, thus PT_per_file = 125*0.1 = 12.5 sec
>  Measure total time:
>  expect: 40*75 = 3000 sec
   
   Expect 241 files * 65 seconds if 250MB average is correct (but not every 
   file read requires a tape mount...)--closer to 4 hours?
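   The two Test C estimates (the plan's 40 files at 75 sec, and the
   corrected 241 files at 65 sec) work out as follows, which is where the
   roughly-4-hours figure comes from:

```python
# Test C time estimates: planned numbers vs the corrected file count.
CT_plan, CT_corr = 75, 65    # s per file (65 from the corrected arithmetic)
planned = 40 * CT_plan       # 3000 s, as in the quoted text
actual = 241 * CT_corr       # using F = 241 from the comments above
print(planned, actual, actual / 3600)   # 3000 15665, about 4.35 hours
```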
   
>  TESTS THAT SHOW GAIN IN PERFORMANCE
   
   I'll provide comments about performance gain testing soon; I assume we'll 
   do the baseline tests first.
   
>  TEST D: to show effect of overlap
>  
>  Need: ability to turn off overlap policy by QM.
>             When this is turned off, the QM schedules files
>             according to the order in which they were submitted.
>             (another possibility to be determined later:
>               choose random order for file scheduling)
>  
>  Each test below is run with and without caching policy.
>  
>  TEST D1: full overlap
>  
>  Run a single query 10 times;
>  Vary the arrival time (AT) until AT > PT_per_file;
>  Measure the effect of the AT variation
>  
>  TEST D2: partial overlap
>  
>  Run all queries in some order;
>  Vary AT;  measure effect
>  Change queries to vary Overlap Factor (OF): 10%, 25%, 50%, 75%;
>        Keep AT short, measure effect
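   One hypothetical way to construct a pair of queries with a target
   overlap factor for Test D2 (the function name, the file universe, and
   the numbers here are illustrative, not part of the plan):

```python
# Build two f-file queries sharing a chosen percentage of files.
import random

def make_overlapping_queries(all_files, f, of_percent, rng):
    shared_count = round(f * of_percent / 100)
    q1 = rng.sample(sorted(all_files), f)
    fresh = [x for x in all_files if x not in q1]       # files not in q1
    q2 = q1[:shared_count] + rng.sample(fresh, f - shared_count)
    return set(q1), set(q2)

rng = random.Random(0)
q1, q2 = make_overlapping_queries(set(range(241)), f=20, of_percent=25, rng=rng)
print(100 * len(q1 & q2) / 20)   # 25.0
```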
>  
>  
>  TEST E: to show effect of clustering
>  
>  TEST E1:  vary clustering factor (CF)
>  
>  Each test below should be run with caching policy on.
>  (We may want to try the same with caching policy off).
>  
>  Check CF of the given queries;
>  Change queries so that CF is doubled, tripled, etc.
>  leave arrival time small;
>  measure effect.
>  
>  TEST E2:  Vary AT with a few selected CFs
>  See effects.
>  
>  
>  TEST F: to show effect of order-optimization
>  
>  The QE gives the QO a list of OIDs (full estimate).
>  The QO scrambles the order, and requests each OID as a separate query.
>  Measure the effect.
>  
>  
>  WHAT WILL BE MEASURED
>  
>  As was mentioned in a previous message from Henrik, various things
>  will be logged.  Our goal is to detect any anomalies, and then to
>  measure the effect of each test in terms of the total time to perform
>  it, and the breakdown of times among the various components of the
>  system.
>  
>  Henrik covered in his message the measurements we expect to extract
>  from the logs.
>  ----------------------------------------------------------------------
>