The epoch design of this project mostly follows mes; the novelty is using PEBS to construct the topology and to calculate the hierarchy latency based on it. See the talk
$ uname -a
Linux gpu01 5.19.0-29-generic #30-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 4 12:14:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ sudo apt install llvm-dev clang libbpf-dev libclang-dev libcxxopts-dev libfmt-dev librange-v3-dev
LOGV=1 ./CXL-MEM-Simulator -t ./microbench/many_calloc -i 5 -c 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
- -t Target: The path to the executable
- -i Interval: The epoch length of the simulator, in milliseconds
- -c CPUSet: The core IDs on which to run the executable; the rest will be pinned with setaffinity to one other core
- -d Dram Latency: The current platform's DRAM latency, default is 85ns
- -b, -l Bandwidth, Latency: Each takes two values per entry in the vector, the first for read and the second for write
- -c Capacity: The memory capacities; the first entry is local memory and the remaining entries follow the order of the input vector.
- -w Weight: Use the heuristic to calculate the bandwidth
- -o Topology: Construct the topology using Newick tree syntax; (1,(2,3)) stands for
            1
          /
0 - local
          \
                   2
          switch /
                 \
                   3
- env LOGV: Sets the log level, which controls how much logging you see.
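
To make the -o syntax concrete, here is a minimal Python sketch that parses a string such as (1,(2,3)) into nested lists and counts how many hops each endpoint is from the local host. It is not the simulator's own parser: the function names `parse_newick` and `endpoint_hops` are hypothetical, and treating every pair of parentheses as one extra hop (the outermost being the local host, inner ones being switches) is an assumption made for illustration.

```python
def parse_newick(text):
    """Parse a minimal Newick string of integer ids, e.g. "(1,(2,3))"."""
    text = text.replace(" ", "").rstrip(";")
    pos = 0

    def peek():
        # Return the current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            pos += 1  # consume ")"
            return children
        start = pos
        while peek().isdigit():
            pos += 1
        if start == pos:
            raise ValueError(f"expected an endpoint id at offset {start}")
        return int(text[start:pos])

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected trailing input at offset {pos}")
    return tree


def endpoint_hops(tree, depth=0):
    """Map each endpoint id to its nesting depth (assumed hop count)."""
    if isinstance(tree, int):
        return {tree: depth}
    hops = {}
    for child in tree:
        hops.update(endpoint_hops(child, depth + 1))
    return hops


tree = parse_newick("(1,(2,3))")
print(tree)
print(endpoint_hops(tree))
```

Under that assumption, endpoint 1 sits one hop from the local host, while endpoints 2 and 3 sit two hops away because they are behind the switch.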
PEBS allows no more than 5 perf_event_open events attached to a given PID, so I limit the bpf program to munmap (kprobe) and sbrk (kprobe/kretprobe); you can configure them. For multi-process applications, I need to first SIGSTOP the process and send/recv back the PID information. For client and server applications, I need to SIGSTOP/SIGCONT both client and server simultaneously, which is not implemented yet.
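
The stop-and-resume pattern behind the epoch design can be sketched roughly as follows, in Python for brevity. This is only an illustration under stated assumptions, not the simulator's implementation: `run_with_epochs` and the `delay_for_epoch` callback are hypothetical names, and the injected delay here is supplied by the caller rather than derived from PEBS samples.

```python
import signal
import subprocess
import time


def run_with_epochs(argv, interval_ms, delay_for_epoch):
    """Run argv, pausing it once per epoch to inject an extra delay.

    delay_for_epoch(epoch_index) returns the number of seconds to keep
    the target stopped (a stand-in for the emulated extra latency).
    """
    proc = subprocess.Popen(argv)
    epochs = 0
    while proc.poll() is None:
        time.sleep(interval_ms / 1000.0)      # let the target run one epoch
        if proc.poll() is not None:
            break                             # target finished during the epoch
        proc.send_signal(signal.SIGSTOP)      # freeze the target
        time.sleep(delay_for_epoch(epochs))   # hold it for the injected delay
        proc.send_signal(signal.SIGCONT)      # resume the target
        epochs += 1
    return proc.returncode, epochs


# Example: 50 ms epochs with a fixed 1 ms injected delay per epoch.
returncode, epochs = run_with_epochs(["sleep", "0.3"], 50, lambda _: 0.001)
print(returncode, epochs)
```

A multi-process target would need the same SIGSTOP/SIGCONT pair applied to every PID involved, which is why the PID information has to be collected first.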
@article{yangyarch23,
title={CXLMemSim: A pure software simulated CXL.mem for performance characterization},
author={Yiwei Yang and Pooneh Safayenikoo and Jiacheng Ma and Tanvir Ahmed Khan and Andrew Quinn},
journal={arXiv preprint arXiv:2303.06153},
booktitle={The fifth Young Architect Workshop (YArch'23)},
year={2023}
}