From: Arie Shoshani [shoshani@lbl.gov]
Sent: Monday, April 12, 1999 1:19 PM

Run | Query, code used | Purpose                        | #bundles x comp = files    | Policy on | Cache size (GB) | Proc time (sec) | Total run time (sec) | Comments
----+------------------+--------------------------------+----------------------------+-----------+-----------------+-----------------+----------------------+----------
1   | SII, Q1          | Robustness, small files        | 32 x 4 = 128               | Yes       | 5               | 0.1             | 3167                 | OK. Could benefit from more pre-fetching.
2   | SII, Q2          | Robustness, large files        | 15 x 4 = 60                | Yes       | 5               | 0.1             | 12086                | OK, but interrupted twice by HPSS being down (40 min + 10 min).
3   | SII, Q3          | Overlap (same query twice)     | 12 x 3 = 36                | Yes       | 6               | 0.1             | 3316                 | NOT OK. 15 min delay between queries. Policy on. Problem: lock on staged file released too early. FIXED.
4   | SII, Q3          | Overlap (same query twice)     | 12 x 3 = 36                | No        | 6               | 0.1             | 5925                 | NOT OK. 15 min delay between queries. Policy off. Problem: second query got serviced more frequently, out of turn. Skipping queries worked incorrectly. FIXED.
5   | SII, Q4          | 3 queries, 50% overlap bundles | 15 x 3 = 45                | Yes       | 12              | 0.1             | 6208                 | NOT OK. Problem: no purging occurred; bug in cache size setup. FIXED.
5a  | SII, Q4          | 3 queries, 50% overlap bundles | 15 x 3 = 45                | Yes       | 12              | 0.1             | 6615                 | NOT OK. Problem: made a bundle request before previous bundle finished processing. Bug found. FIXED.
6   | SII, Q4          | 3 queries, 50% overlap bundles | 15 x 3 = 45                | No        | 12              | 0.1             | 6411                 | NOT OK. Problem: 3 extra files requested at end of run and not pushed to SII. Not traced, but believed related to previous bug. Need to check over.
7   | SII, Q5          | Robustness, overnight          | 102 x 6 = 612 total (20 Q) | Yes       | 12              | 0.1             | ?                    | NOT OK. 20 queries, 10 min apart, 10 cycles. Problem: QM crashed early. Traced to earlier bug. FIXED.
8   | SII, Q7          | 1 drive                        | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 3865                 | OK. Bundles selected with components on 5 tapes; 7 components.
9   | SII, Q7          | 2 drives                       | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 2200                 | OK. Same as above. Big time improvement!!
10  | SII, Q7          | 3 drives                       | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 20                   | OK. Same as above, but forgot to empty cache.
11  | SII, Q7          | 3 drives                       | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 1993                 | OK. Small time improvement. Low pending PFTPs; not enough files requested between bundle processing. More pre-fetching could help. More parallel queries could also show speedup.
12  | SII, Q7          | 4 drives                       | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 1699                 | OK. Same as above. Small gain.
13  | SII, Q7          | 5 drives                       | 4 x 7 = 28                 | Yes       | 80              | 0.1             | 1432                 | OK. Same as above. Small gain.
14  | UC, Q4           | Run UC code                    | 15 x 3 = 45                | Yes       | 12              | 0.1             | short                | NOT OK. UC failed to find files (e.g. requested a Hit comp instead of raw).
15  | SII, Q1          | Large proc time                | 32 x 4 = 128               | Yes       | 5               | 5               | ~2 hrs               | OK. We stopped run after 15 bundles – nothing more to be learned.
16  | SII, Q5          | Robustness, overnight          | Total: 102 x 6 = 612       | Yes       | 12              | 0.1             | ~1 hr                | NOT OK. 20 queries scheduled. Crashed after 2 queries completed. Traced to not locking query queue in QM. FIXED.
17  | SII, Q1+Q2       | Robustness, overnight          | 32 x 4 = 128; 15 x 4 = 60  | Yes       | 12              | 0.1             | ~1 hr                | NOT OK. Crashed after only 1 query completed. Same as above. FIXED.
18  | SII, Q1          | Robustness, overnight          | 15 x 4 = 60                | Yes       | 12              | 0.1             | ~1.5 hrs             | NOT OK. Crashed after only 1 query completed. Same as above. FIXED.
19  | SII, Q3          | Repeat of run 3 with policy    | 12 x 3 = 36                | Yes       | 6               | 0.1             | 4023                 | OK. It was slower than run 3 since network was slower today.
20  | SII, Q3          | Repeat of run 4, no policy     | 12 x 3 = 36                | No        | 6               | 0.1             | 7900                 | OK. One bundle was delayed unnecessarily; needs to be fixed. We adjusted total time to compensate for that.
21  | UC, Q7a          | Trying again UC code           | 4 x 4 = 16                 | Yes       | 6               | 0.1             | 302                  | OK. We picked a small query to verify it works. Same as Q7, but only 4 comp.
22  | SII, Q12         | Cache starving                 | 5 queries, 3 x 3 = 9 each  | Yes       | 1               | 1               | 982                  | OK. 5 queries requested 6 files each. 30 PFTPs pending – worked OK.
23  | SII, Q8          | Stress test                    | 99 queries, 3 x 3 = 9      | Yes       | 80              | 0.1             | --                   | NOT OK. Only 3 different queries, repeated 30 times. Failed: too many remote windows. Need to set to "no window".
24  | SII, Q8          | Stress test                    | 39 queries, 3 x 3 = 9      | Yes       | 80              | 0.1             | --                   | NOT OK. Only 3 different queries, repeated 13 times. Failed: QM crashed on rmds03. Believed to be a patch level problem. Need to verify.
25  | SII, Q8          | Stress test                    | 39 queries, 3 x 3 = 9      | Yes       | 80              | 0.1             | 802                  | OK. Only 3 different queries, repeated 13 times. QM ran OK at LBNL.
26  | SII, Q8          | Stress test (try rmds03 again) | 3 x 3 = 9                  | Yes       | 80              | 0.1             | --                   | NOT OK. Repeat of run 24. Failed: QM crashed on rmds03.
27  | SII, Q9          | Overlap 3 query components     | 4 x 2 = 8 x 3 queries      | Yes       | 5               | 0.1             | 2737                 | OK. It ran slower than no policy because pre-fetching hogged the small cache. Need to discuss this policy.
28  | SII, Q9          | Overlap 3 query components     | 4 x 2 = 8 x 3 queries      | No        | 5               | 0.1             | 2349                 | OK. This query is optimal for No Policy.
29  | SII, Q11         | Short query, stand-alone       | 4 x 3 = 12 small files     | Yes       | 12              | 0.1             | 375                  | OK. This short query will be "injected" 4 times into a large background query.
30  | SII, Q10         | Long background                | 36 x 7 = 252 x 2 queries   | Yes       | 12              | 1               | 7883                 | OK for the test. This large background query ran fine and was killed after the experiment was done. BUT, second query did not find files it needed. Problem traced to HPSS limit on PFTPs. Need to fix problem.
30a | SII, Q11         | Short query, injected 1        | 4 x 3 = 12 small files     | Yes       | 12              | 0.1             | 530                  | OK. 1st injection. Note: 40% slower than stand-alone.
30b | SII, Q11         | Short query, injected 2        | 4 x 3 = 12 small files     | Yes       | 12              | 0.1             | 643                  | OK. 2nd injection. Note: 70% slower than stand-alone.
30c | SII, Q11         | Short query, injected 3        | 4 x 3 = 12 small files     | Yes       | 12              | 0.1             | 41                   | OK. 3rd injection. Note: we waited only a short time, and all files were in cache.
30d | SII, Q11         | Short query, injected 4        | 4 x 3 = 12 small files     | Yes       | 12              | 0.1             | 548                  | OK. 4th injection. Note: 45% slower than stand-alone.
31  | Linux-SII, Q7    | Check running from Linux       | 4 x 7 = 28                 | Yes       | 12              | 0.1             | ---                  | NOT OK. Failed because Objectivity lockserver was down. Ran until Objectivity was needed.
34  | SII, Q5          | Robustness, overnight          | 102 x 6 = 612 total (20 Q) | Yes       | 12              | 0.1             | ---                  | NOT OK. Could not run, because Objectivity lockserver was down.
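
The "small gain" and "% slower" notes above can be checked with a little arithmetic. Below is a minimal Python sketch, assuming the total run times are copied correctly from runs 8, 9, 11, 12, 13 (drive scaling) and runs 29, 30a, 30b, 30d (injections). The function and variable names are illustrative only and are not part of the test software.

```python
# Total run times (sec) for query Q7, keyed by number of tape drives.
# Run 10 (20 sec) is excluded because the cache was not emptied first.
drive_runs = {1: 3865, 2: 2200, 3: 1993, 4: 1699, 5: 1432}

# Total run times (sec) for the short query Q11.
# Run 30c (41 sec) is excluded because all files were already in cache.
standalone_sec = 375
injected_runs = {"30a": 530, "30b": 643, "30d": 548}


def speedups(times_by_drives):
    """Return {drives: speedup relative to the 1-drive run}, rounded to 2 places."""
    base = times_by_drives[1]
    return {n: round(base / t, 2) for n, t in sorted(times_by_drives.items())}


def slowdown_pct(baseline_sec, measured_sec):
    """Return how much slower measured_sec is than baseline_sec, in whole percent."""
    return round(100 * (measured_sec - baseline_sec) / baseline_sec)


if __name__ == "__main__":
    for drives, s in speedups(drive_runs).items():
        print(f"{drives} drive(s): speedup {s}x")
    for run, t in injected_runs.items():
        print(f"run {run}: {slowdown_pct(standalone_sec, t)}% slower than stand-alone")
```

With these inputs, going from 1 to 2 drives gives a speedup of about 1.76x, while 5 drives reach only about 2.7x, which matches the "small gain" comments. The injected runs come out at 41%, 71% and 46% slower than stand-alone, in line with the rounded 40%, 70% and 45% noted in the table.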