From: Arie Shoshani [shoshani@lbl.gov]
Sent: Monday, April 12, 1999 1:19 PM
RUN | Query, Code used | Purpose | #bundles x comp = files | Policy on | Cache Size (GB) | Proc Time (sec) | Total Run Time (sec) | Comments |
1 | SII,Q1 | Robustness, small files | 32 x 4 = 128 | Yes | 5 | 0.1 | 3167 | OK. Could benefit from more pre-fetching. |
2 | SII,Q2 | Robustness, large files | 15 x 4 = 60 | Yes | 5 | 0.1 | 12086 | OK, but interrupted twice by HPSS being down (40 min + 10 min). |
3 | SII,Q3 | Overlap (same query twice) | 12 x 3 = 36 | Yes | 6 | 0.1 | 3316 | NOT OK. 15 min delay between queries. Policy on. Problem: lock on staged file released too early. FIXED. |
4 | SII,Q3 | Overlap (same query twice) | 12 x 3 = 36 | No | 6 | 0.1 | 5925 | NOT OK. 15 min delay between queries. Policy off. Problem: second query was serviced out of turn more frequently. Skipping queries worked incorrectly. FIXED. |
5 | SII,Q4 | 3 queries, 50% overlap bundles | 15 x 3 = 45 | Yes | 12 | 0.1 | 6208 | NOT OK. Problem: no purging occurred; bug in cache size setup. FIXED. |
5a | SII,Q4 | 3 queries, 50% overlap bundles | 15 x 3 = 45 | Yes | 12 | 0.1 | 6615 | NOT OK. Problem: a bundle request was made before the previous bundle finished processing. Bug found. FIXED. |
6 | SII,Q4 | 3 queries, 50% overlap bundles | 15 x 3 = 45 | No | 12 | 0.1 | 6411 | NOT OK. Problem: 3 extra files were requested at the end of the run and not pushed to SII. Not traced, but believed to be related to the previous bug. Needs to be checked. |
7 | SII,Q5 | Robustness, overnight | 102 x 6 = 612 total (20 Q) | Yes | 12 | 0.1 | ? | NOT OK. 20 queries, 10 min apart, 10 cycles. Problem: QM crashed early. Traced to earlier bug. FIXED. |
8 | SII, Q7 | 1 drive | 4 x 7 = 28 | Yes | 80 | 0.1 | 3865 | OK. Bundles selected with components on 5 tapes; 7 components. |
9 | SII, Q7 | 2 drives | 4 x 7 = 28 | Yes | 80 | 0.1 | 2200 | OK. Same as above. Big time improvement!! |
10 | SII, Q7 | 3 drives | 4 x 7 = 28 | Yes | 80 | 0.1 | 20 | OK. Same as above, but we forgot to empty the cache. |
11 | SII, Q7 | 3 drives | 4 x 7 = 28 | Yes | 80 | 0.1 | 1993 | OK. Small time improvement. Few pending PFTPs; not enough files requested between bundle processing. More pre-fetching could help. More parallel queries could also show speedup. |
12 | SII, Q7 | 4 drives | 4 x 7 = 28 | Yes | 80 | 0.1 | 1699 | OK. Same as above. Small gain. |
13 | SII, Q7 | 5 drives | 4 x 7 = 28 | Yes | 80 | 0.1 | 1432 | OK. Same as above. Small gain. |
14 | UC,Q4 | Run UC code | 15 x 3 = 45 | Yes | 12 | 0.1 | short | NOT OK. UC failed to find files (e.g. requested a Hit comp instead of raw). |
15 | SII, Q1 | Large proc time | 32 x 4 = 128 | Yes | 5 | 5 | ~2 hrs | OK. We stopped the run after 15 bundles; nothing more to be learned. |
16 | SII,Q5 | Robustness, overnight | Total: 102 x 6 = 612 | Yes | 12 | 0.1 | ~1 hr | NOT OK. 20 queries scheduled. Crashed after 2 queries completed. Traced to not locking query queue in QM. FIXED. |
17 | SII,Q1+Q2 | Robustness, overnight | 32 x 4 = 128; 15 x 4 = 60 | Yes | 12 | 0.1 | ~1 hr | NOT OK. Crashed after only 1 query completed. Same as above. FIXED. |
18 | SII,Q1 | Robustness, overnight | 15 x 4 = 60 | Yes | 12 | 0.1 | ~1.5 hrs | NOT OK. Crashed after only 1 query completed. Same as above. FIXED. |
19 | SII,Q3 | Repeat of Run 3 with policy | 12 x 3 = 36 | Yes | 6 | 0.1 | 4023 | OK. Slower than run 3 because the network was slower today. |
20 | SII,Q3 | Repeat of Run 4 no policy | 12 x 3 = 36 | No | 6 | 0.1 | 7900 | OK. One bundle was delayed unnecessarily; needs to be fixed. We adjusted total time to compensate for that. |
21 | UC, Q7a | Trying UC code again | 4 x 4 = 16 | Yes | 6 | 0.1 | 302 | OK. We picked a small query to verify that it works. Same as Q7, but only 4 comp. |
22 | SII,Q12 | Cache starving, 5 queries | 3 x 3 = 9 each | Yes | 1 | 1 | 982 | OK. 5 queries requested 6 files each. 30 PFTPs pending; worked OK. |
23 | SII, Q8 | Stress test, 99 queries | 3 x 3 = 9 | Yes | 80 | 0.1 | -- | NOT OK. Only 3 different queries, repeated 30 times. Failed: too many remote windows. Need to set to "no window". |
24 | SII, Q8 | Stress test, 39 queries | 3 x 3 = 9 | Yes | 80 | 0.1 | -- | NOT OK. Only 3 different queries, repeated 13 times. Failed: QM crashed on rmds03. Believed to be a patch level problem. Need to verify. |
25 | SII, Q8 | Stress test, 39 queries | 3 x 3 = 9 | Yes | 80 | 0.1 | 802 | OK. Only 3 different queries, repeated 13 times. QM ran OK at LBNL. |
26 | SII, Q8 | Stress test (try rmds03 again) | 3 x 3 = 9 | Yes | 80 | 0.1 | -- | NOT OK. Repeat of run 24. Failed: QM crashed on rmds03. |
27 | SII,Q9 | Overlap 3 query components | 4 x 2 = 8 x 3 queries | Yes | 5 | 0.1 | 2737 | OK. Ran slower than with no policy because pre-fetching hogged the small cache. Need to discuss this policy. |
28 | SII,Q9 | Overlap 3 query components | 4 x 2 = 8 x 3 queries | No | 5 | 0.1 | 2349 | OK. This query is optimal for No Policy. |
29 | SII,Q11 | Short query, stand-alone | 4 x 3 = 12 small files | Yes | 12 | 0.1 | 375 | OK. This short query will be "injected" 4 times into a large background query. |
30 | SII,Q10 | Long background | 36 x 7 = 252 x 2 queries | Yes | 12 | 1 | 7883 | OK for the test. This large background query ran fine and was killed after the experiment was done. BUT the second query did not find the files it needed. Problem traced to HPSS limit on PFTPs. Need to fix problem. |
30a | SII,Q11 | Short query, injected 1 | 4 x 3 = 12 small files | Yes | 12 | 0.1 | 530 | OK. 1st injection. Note: 40% slower than stand-alone. |
30b | SII,Q11 | Short query, injected 2 | 4 x 3 = 12 small files | Yes | 12 | 0.1 | 643 | OK. 2nd injection. Note: 70% slower than stand-alone. |
30c | SII,Q11 | Short query, injected 3 | 4 x 3 = 12 small files | Yes | 12 | 0.1 | 41 | OK. 3rd injection. Note: we waited only a short time, and all files were in cache. |
30d | SII,Q11 | Short query, injected 4 | 4 x 3 = 12 small files | Yes | 12 | 0.1 | 548 | OK. 4th injection. Note: 45% slower than stand-alone. |
31 | Linux-SII, Q7 | Check running from Linux | 4 x 7 = 28 | Yes | 12 | 0.1 | --- | NOT OK. Failed because Objectivity lockserver was down. Ran until Objectivity was needed. |
34 | SII, Q5 | Robustness, overnight | 102 x 6 = 612 total (20 Q) | Yes | 12 | 0.1 | --- | NOT OK. Could not run because Objectivity lockserver was down. |
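
As a rough cross-check of the timing figures in the table, the short Python sketch below recomputes the speedup from adding tape drives (runs 8, 9, 11, 12, 13) and the slowdown of the injected short query relative to its stand-alone run (run 29 vs. runs 30a, 30b, 30d). The total run times are copied from the table; the variable and function names are illustrative only and are not part of the test setup. Run 10 (cache not emptied) and run 30c (all files already in cache) are left out on the assumption that they are not comparable.

```python
# Total run times in seconds, copied from the table.
# All names here are illustrative, not part of the test harness.
DRIVE_RUNS = {1: 3865, 2: 2200, 3: 1993, 4: 1699, 5: 1432}  # runs 8, 9, 11, 12, 13
STANDALONE_SEC = 375                                         # run 29
INJECTED_RUNS = {"30a": 530, "30b": 643, "30d": 548}         # run 30c excluded


def speedups(times_by_drives):
    """Return the speedup of each drive count relative to the 1-drive run."""
    base = times_by_drives[1]
    return {n: base / t for n, t in sorted(times_by_drives.items())}


def slowdown_percent(run_sec, baseline_sec):
    """Return how much slower run_sec is than baseline_sec, in percent."""
    return 100.0 * (run_sec - baseline_sec) / baseline_sec


drive_speedups = speedups(DRIVE_RUNS)
injection_slowdowns = {
    run: slowdown_percent(sec, STANDALONE_SEC) for run, sec in INJECTED_RUNS.items()
}

if __name__ == "__main__":
    for drives, factor in drive_speedups.items():
        print(f"{drives} drive(s): {factor:.2f}x vs. 1 drive")
    for run, pct in injection_slowdowns.items():
        print(f"run {run}: {pct:.0f}% slower than stand-alone")
```

Computed this way, the injected runs come out at roughly 41%, 71%, and 46% slower than stand-alone, in line with the rounded 40%, 70%, and 45% noted in the comments column.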