-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLOG.txt
88 lines (69 loc) · 2.44 KB
/
LOG.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
27.1.15
=======
How to order events? -> std::queue?
What needs to happen?
- Read single event -> evbuffer
- Check if a thread is waiting
- if thread available, give it the evbuffer
- else (no thread available) read more events (-> queue)
- if queue not at some max size, get next event
What parts get threaded?
- EVIO -> no
- Decoder -> yes
- Detectors -> yes
- Tests/Cuts -> yes, I guess
- Output -> maybe ...
Details of EVIO/Decoder:
What about those event buffers?
I think EVIO maintains the buffer internally. It is overwritten for each new
event. At least we need to be prepared for that.
So -> Must copy each event buffer
Event sizes range up to a few 100kB -> few MB needed with 16 or 32 threads.
No problem.
These aren't reallocated with every event, only if necessary.
But we need time to copy the buffer for every event. This is a slowdown.
So the event queue needs to contain pointer to copied event buffers.
-------------------
Before I start hacking away, let's do some timing tests:
Compile with -O2 -DNDEBUG
Generate 1M events:
./generate -c2 -n1000000 test.dat
Analyze all, writing compressed output, all defined variables (detA.*, detB.*):
time ./ppodd -z test.dat
real 0m21.265s
user 0m20.635s
sys 0m0.105s
Save output of this as test-st-opt-reference.odat.gz.
For comparison: Debug code compiled with -g -O0:
real 0m25.018s
user 0m24.171s
sys 0m0.308s
Generated output is exactly identical to replay with optmizations:
md5sum test.odat.gz test-st-opt-reference.odat.gz
3a4fb242bcfe5fc54dd62240e5faa63a test.odat.gz
3a4fb242bcfe5fc54dd62240e5faa63a test-st-opt-reference.odat.gz
Not compressing the output: (back with optimizations):
time ./ppodd test.dat
real 0m7.842s
user 0m6.551s
sys 0m0.139s
(Debug code:
real 0m9.915s
user 0m8.394s
sys 0m0.166s
)
So, yeah, most time is spent in the compression part. This won't get much faster
with multithreading. So we should compare uncompressed replays.
To avoid problems with different compression parameters, let's make the
reference file: test-st-opt-reference.odat
Move the large *.dat and *.odat files to /aonl4/work1/ole/parallel/
---------------------
4.4.20
======
Some ideas from "C++ Concurrency in Action", 2nd ed.
shared_ptr in ConcurrentQueue?
try_pop in ConcurrentQueue?
end_of_data member in QueuingThreadPool, use it to end threads
(not optimal since the thread function isn't controlled by QueuingThreadPool)
Run both analysis and output threads in the same thread pool?
Consider a pipeline design?