[CI] Add CI workflow to run compute-benchmarks on incoming syclos PRs #14454
.github/workflows/sycl-benchmark-aggregate.yml (new file)
@@ -0,0 +1,105 @@
name: Aggregate compute-benchmark averages from historical data

# The benchmarking workflow in sycl-linux-run-tests.yml passes or fails based on
# how the benchmark results compare to a historical average: This historical
# average is calculated in this workflow, which aggregates historical data and
# produces measures of central tendency (median in this case) used for this
# purpose.

on:
  workflow_dispatch:
    inputs:
      cutoff_timestamp:
        description: |
          Timestamp indicating the age limit of data used in average calculation:
          Any benchmark results created before this timestamp are excluded from
          being aggregated.

          Any valid date string supported by GNU coreutils is valid here:
          https://www.gnu.org/software/coreutils/manual/html_node/Date-input-formats.html
        type: string
        required: false
  workflow_call:
    inputs:
      cutoff_timestamp:
        type: string
        required: false

permissions:
  contents: read

jobs:
  aggregate:
    name: Aggregate average (median) value for all metrics
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          path: llvm
          sparse-checkout: |
            devops/scripts/benchmarking
      - name: Load benchmarking configuration
        run: |
          CONFIG_FILE="$GITHUB_WORKSPACE/llvm/devops/scripts/benchmarking/benchmark-ci.conf"

          # Load default values from configuration file
          . "$GITHUB_WORKSPACE/llvm/devops/scripts/benchmarking/utils.sh"
          # utils.sh contains functions to sanitize config file settings
          load_single_config $CONFIG_FILE PERF_RES_GIT_REPO
          load_single_config $CONFIG_FILE PERF_RES_BRANCH
          load_single_config $CONFIG_FILE PERF_RES_PATH
          echo "PERF_RES_GIT_REPO=$PERF_RES_GIT_REPO" >> $GITHUB_ENV
          echo "PERF_RES_BRANCH=$PERF_RES_BRANCH" >> $GITHUB_ENV
          echo "PERF_RES_PATH=$PERF_RES_PATH" >> $GITHUB_ENV

          # Determine a "cutoff timestamp" used by the aggregator script
          #
          # This timestamp controls which historical results are used to compute
          # measures of central tendency: Any files timestamped *before* this time
          # will be *excluded* from the central tendency calculation.

          load_single_config $CONFIG_FILE TIMESTAMP_FORMAT
          echo "TIMESTAMP_FORMAT=$TIMESTAMP_FORMAT" >> $GITHUB_ENV
          if [ -z '${{ inputs.cutoff_timestamp }}' ]; then
            # No time given, use default time period from config file:
            load_single_config $CONFIG_FILE AVERAGE_CUTOFF_RANGE
            echo "CUTOFF_TIMESTAMP=$(date --date="$AVERAGE_CUTOFF_RANGE" +"$TIMESTAMP_FORMAT")" >> $GITHUB_ENV
          else
            # If the provided time is a valid GNU coreutils date string, convert
            # the time to our format:
            _converted_timestamp="$(date --date '${{ inputs.cutoff_timestamp }}' +"$TIMESTAMP_FORMAT" 2> /dev/null)"
            if [ -n "$_converted_timestamp" ]; then
              echo "CUTOFF_TIMESTAMP=$_converted_timestamp" >> $GITHUB_ENV
            else
              # If not a valid GNU date string, it could be in our timestamp format already.
              # aggregate.py will ensure the timestamp is in the proper format, so we can pass the
              # time forward regardless:
              echo 'CUTOFF_TIMESTAMP=${{ inputs.cutoff_timestamp }}' >> $GITHUB_ENV
            fi
          fi
      - name: Checkout historical performance results repository
        run: |
          git clone -b $PERF_RES_BRANCH https://github.com/$PERF_RES_GIT_REPO $PERF_RES_PATH
      - name: Run aggregator on historical results
        run: |
          # The current format of the historical results repository is:
          #   /<runner type>/<test case name>
          # Thus, a min/max depth of 2 is used to enumerate all test cases in the
          # repository. Runner type and test case name are also extracted from
          # this path.
          for dir in $(find "$PERF_RES_PATH" -mindepth 2 -maxdepth 2 -type d ! -path '*.git*'); do
            _runner="$(basename $(dirname $dir))"
            _testcase="$(basename $dir)"
            python llvm/devops/scripts/benchmarking/aggregate.py "$_runner" "$_testcase" "$CUTOFF_TIMESTAMP"
          done
      - name: Upload average to the repo
        env:
          GITHUB_TOKEN: ${{ secrets.LLVM_SYCL_BENCHMARK_TOKEN }}
        run: |
          # TODO -- waiting on security clearance
          cd "$PERF_RES_PATH"
          git config user.name "SYCL Benchmarking Bot"
          git config user.email "[email protected]"
          git add .
          git commit -m "[GHA] Aggregate median data from $CUTOFF_TIMESTAMP to $(date +"$TIMESTAMP_FORMAT")"
          git push "https://[email protected]/$PERF_RES_GIT_REPO.git" "$PERF_RES_BRANCH"
.github/workflows/sycl-linux-run-tests.yml (modified)
@@ -25,7 +25,7 @@ on:
       required: False
     tests_selector:
       description: |
-        Two possible options: "e2e" and "cts".
+        Three possible options: "e2e", "cts", and "benchmark".
       type: string
       default: "e2e"

@@ -153,6 +153,7 @@ on:
         options:
           - e2e
           - cts
+          - benchmark

       env:
         description: |

@@ -192,8 +193,14 @@ permissions:
   packages: read

 jobs:
+  benchmark_aggregate:
+    if: ${{ inputs.tests_selector == 'benchmark' }}
+    name: (Benchmark only) Aggregate benchmark data
+    uses: ./.github/workflows/sycl-benchmark-aggregate.yml
+
   run:
-    if: github.event_name == 'workflow_dispatch' || inputs.skip_run == 'false'
+    if: ${{ always() && ( github.event_name == 'workflow_dispatch' || inputs.skip_run == 'false' ) }}
+    needs: benchmark_aggregate
     name: ${{ inputs.name }}
     runs-on: ${{ fromJSON(inputs.runner) }}
     container:

@@ -316,12 +323,12 @@ jobs:
           fi

       - name: Download E2E Binaries
-        if: inputs.e2e_binaries_artifact != ''
+        if: inputs.tests_selector == 'e2e' && inputs.e2e_binaries_artifact != ''
         uses: actions/download-artifact@v4
         with:
           name: ${{ inputs.e2e_binaries_artifact }}
       - name: Extract E2E Binaries
-        if: inputs.e2e_binaries_artifact != ''
+        if: inputs.tests_selector == 'e2e' && inputs.e2e_binaries_artifact != ''
         run: |
           mkdir build-e2e
           tar -I 'zstd' -xf e2e_binaries.tar.zst -C build-e2e

@@ -389,25 +396,25 @@ jobs:
           ninja -C build-cts -k0 $( [ -n "$CTS_TESTS_TO_BUILD" ] && echo "$CTS_TESTS_TO_BUILD" || echo "test_conformance")

       - name: Pack SYCL-CTS binaries
-        if: always() && !cancelled() && inputs.cts_testing_mode == 'build-only'
+        if: inputs.tests_selector == 'cts' && always() && !cancelled() && inputs.cts_testing_mode == 'build-only'
         run: tar -I 'zstd -9' -cf sycl_cts_bin.tar.zst -C ./build-cts/bin .

       - name: Upload SYCL-CTS binaries
-        if: always() && !cancelled() && inputs.cts_testing_mode == 'build-only'
+        if: inputs.tests_selector == 'cts' && always() && !cancelled() && inputs.cts_testing_mode == 'build-only'
         uses: actions/upload-artifact@v4
         with:
           name: sycl_cts_bin
           path: sycl_cts_bin.tar.zst
           retention-days: ${{ inputs.retention-days }}

       - name: Download SYCL-CTS binaries
-        if: inputs.sycl_cts_artifact != ''
+        if: inputs.tests_selector == 'cts' && inputs.sycl_cts_artifact != ''
         uses: actions/download-artifact@v4
         with:
           name: ${{ inputs.sycl_cts_artifact }}

       - name: Extract SYCL-CTS binaries
-        if: inputs.sycl_cts_artifact != ''
+        if: inputs.tests_selector == 'cts' && inputs.sycl_cts_artifact != ''
         run: |
           mkdir -p build-cts/bin
           tar -I 'zstd' -xf sycl_cts_bin.tar.zst -C build-cts/bin

@@ -427,7 +434,7 @@ jobs:
       # these files may differ from each other, so when there is a pre-built set of
       # tests, we need to filter it according to the filter-file.
       - name: Filter SYCL CTS test categories
-        if: inputs.sycl_cts_artifact != ''
+        if: inputs.tests_selector == 'cts' && inputs.sycl_cts_artifact != ''
         shell: bash
         run: |
           cts_exclude_filter=""

@@ -481,12 +488,40 @@ jobs:

           exit $ret
       - name: Pack E2E binaries
-        if: ${{ always() && !cancelled() && inputs.e2e_testing_mode == 'build-only'}}
+        if: inputs.tests_selector == 'e2e' && always() && !cancelled() && inputs.e2e_testing_mode == 'build-only'
         run: tar -I 'zstd -9' -cf e2e_binaries.tar.zst -C ./build-e2e .
       - name: Upload E2E binaries
-        if: ${{ always() && !cancelled() && inputs.e2e_testing_mode == 'build-only'}}
+        if: inputs.tests_selector == 'e2e' && always() && !cancelled() && inputs.e2e_testing_mode == 'build-only'
         uses: actions/upload-artifact@v4
         with:
           name: sycl_e2e_bin_${{ inputs.artifact_suffix }}
           path: e2e_binaries.tar.zst
           retention-days: ${{ inputs.retention-days }}

+      - name: Run compute-benchmarks
+        if: inputs.tests_selector == 'benchmark'
+        run: |
+          export ONEAPI_DEVICE_SELECTOR="${{ inputs.target_devices }}"
+          export CMPLR_ROOT=$PWD/toolchain
+          sycl-ls
+          ./devops/scripts/benchmarking/benchmark.sh -t '${{ inputs.runner }}' -s
+      - name: Push compute-benchmarks results
+        if: inputs.tests_selector == 'benchmark'
+        env:
+          GITHUB_TOKEN: ${{ secrets.LLVM_SYCL_BENCHMARK_TOKEN }}
+        run: |
+          # TODO -- waiting on security clearance
+
+          # Load configuration values
+          . "./devops/scripts/benchmarking/utils.sh"
+          CONFIG_FILE="./devops/scripts/benchmarking/benchmark-ci.conf"
+          load_single_config "$CONFIG_FILE" PERF_RES_PATH
+          load_single_config "$CONFIG_FILE" PERF_RES_GIT_REPO
+          load_single_config "$CONFIG_FILE" PERF_RES_BRANCH
+
+          cd "$PERF_RES_PATH"
+          git config user.name "SYCL Benchmarking Bot"
+          git config user.email "[email protected]"
+          git add .
+          git commit -m "[GHA] Upload compute-benchmarks results from ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
+          git push "https://[email protected]/$PERF_RES_GIT_REPO.git" "$PERF_RES_BRANCH"
devops/scripts/benchmarking/aggregate.py (new file)
@@ -0,0 +1,126 @@
import csv
import sys
from pathlib import Path
import heapq
import statistics

import common


# Simple median calculation
class SimpleMedian:

    def __init__(self):
        self.elements = []

    def add(self, n: float):
        self.elements.append(n)

    def get_median(self) -> float:
        return statistics.median(self.elements)
[Review thread anchored on the class below]

Reviewer: I didn't look at these scripts in depth, but my initial reaction is that they seem really complicated, and it seems like it would be really difficult for anyone else to debug or extend them. Is there no prebuilt tool, either a Linux program or a GitHub Action, that we can rely on?

Author: Would more documentation help? A lot of the complexity, specifically in aggregate.py, can be removed by simply removing StreamingMedian. However, the median would then have to be calculated in a less efficient manner, and I'm not sure how that scales with a ton of historical data. Although my short-term goal is for this to run nightly, a less efficient algorithm might not be feasible if e.g. this is put into precommit eventually. As for the complexity of the general project, I'm actually not sure if this is too simple -- the Unified Runtime team has a whole proper Python program written for this here: https://github.com/oneapi-src/unified-runtime/tree/main/scripts/benchmarks. There are plans for me to integrate my solution into theirs when the UR pulldown happens, but the deadline for this workflow was unexpectedly moved up to next week: I simply cannot wait any longer.

pbalcer: If it would help, one option may be to use UR scripts in an intel/llvm workflow like this: Reach out to me if you want help.

pbalcer: Also, if there's any urgent need to run those benchmarks, you can rely on the already set-up unified-runtime infrastructure. It's already possible to run the benchmark workflow with a specific intel/llvm commit and fork. We will just need to grant access to anyone who needs to run those workflows (described here: https://github.com/oneapi-src/unified-runtime/tree/main/scripts/benchmarks#running-in-ci).

Author: Hey, thanks for reaching out @pbalcer! Currently it is possible to run this workflow manually to gather the required data, and that's what I've been relying on. With regards to merging this with the benchmarking infrastructure in UR, the scope of this workflow has changed: beyond running the benchmarks, we are also looking to instrument the benchmarks with a tracing system. That would add extra complexity on top of simply running the benchmarks, and I haven't really mulled over how a merge with the UR infrastructure would work just yet, never mind whether you are open to having this in your infrastructure in the first place. I'll reach out to you in private once I have a better picture of how we are going to proceed.

# Calculate medians incrementally using a heap: Useful for when dealing with a
# large number of samples.
#
# TODO how many samples are we going to realistically get? I had written this
# with precommit in mind, but if this only runs nightly, it would actually be
# faster to do a normal median calculation.
class StreamingMedian:

    def __init__(self):
        # Gist: we keep a maxheap of the smaller half of the samples and a
        # minheap of the larger half, and read the median off the top of the
        # maxheap (or average both tops when the halves are equal in size).
        # When a new element comes in, it is pushed onto a heap based on how
        # it compares against the current median, and the heaps are then
        # rebalanced so their sizes never differ by more than one.
        self.minheap_larger = []
        self.maxheap_smaller = []

        # Note: numbers on the maxheap should be negative, as heapq
        # is a minheap by default

    def add(self, n: float):
        if len(self.maxheap_smaller) == 0 or -self.maxheap_smaller[0] >= n:
            heapq.heappush(self.maxheap_smaller, -n)
        else:
            heapq.heappush(self.minheap_larger, n)

        # Rebalance: ensure maxheap_smaller holds either as many elements as
        # minheap_larger, or exactly one more
        if len(self.maxheap_smaller) > len(self.minheap_larger) + 1:
            heapq.heappush(self.minheap_larger, -heapq.heappop(self.maxheap_smaller))
        elif len(self.maxheap_smaller) < len(self.minheap_larger):
            heapq.heappush(self.maxheap_smaller, -heapq.heappop(self.minheap_larger))

    def get_median(self) -> float:
        if len(self.maxheap_smaller) == len(self.minheap_larger):
            # Equal number of elements smaller and larger than the "median":
            # thus, there are two middle values, and the median becomes
            # their average.
            return (-self.maxheap_smaller[0] + self.minheap_larger[0]) / 2.0
        else:
            # Otherwise, the median is always the top of maxheap_smaller,
            # which holds exactly one extra element
            return -self.maxheap_smaller[0]
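A quick way to sanity-check the two-heap scheme above (an illustrative snippet for review purposes, not part of this file): fed the same samples, the streaming median must agree with statistics.median, while each add() costs only O(log n) instead of re-sorting on every query.

import random

data = [random.uniform(0, 100) for _ in range(999)]
sm = StreamingMedian()
for x in data:
    sm.add(x)  # O(log n) per insertion
# statistics is already imported at the top of this file
assert abs(sm.get_median() - statistics.median(data)) < 1e-9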
def aggregate_median(runner: str, benchmark: str, cutoff: str):

    # Get all .csv benchmark samples for the requested runner + benchmark
    # that are timestamped after the cutoff:
    def csv_samples() -> list[Path]:
        # TODO check that the path below is a valid directory
        cache_dir = Path(f"{common.PERF_RES_PATH}/{runner}/{benchmark}")
        # TODO check for time range; What time range do I want?
        return [
            f
            for f in cache_dir.glob(f"{benchmark}-*_*.csv")
            if f.is_file()
            and common.valid_timestamp(str(f)[-19:-4])
            and str(f)[-19:-4] > cutoff
        ]
    # Calculate the median of every desired metric:
    aggregate_s = dict()
    for sample_path in csv_samples():
        with open(sample_path, "r") as sample_file:
            for s in csv.DictReader(sample_file):
                test_case = s["TestCase"]
                # Construct an entry in aggregate_s for the test case if it
                # does not exist already:
                if test_case not in aggregate_s:
                    aggregate_s[test_case] = {
                        metric: SimpleMedian() for metric in common.metrics_variance
                    }

                for metric in common.metrics_variance:
                    aggregate_s[test_case][metric].add(common.sanitize(s[metric]))

    # Write the calculated medians (aggregate_s) as a new .csv file:
    with open(
        f"{common.PERF_RES_PATH}/{runner}/{benchmark}/{benchmark}-median.csv", "w"
    ) as output_csv:
        writer = csv.DictWriter(
            output_csv, fieldnames=["TestCase", *common.metrics_variance.keys()]
        )
        writer.writeheader()
        for test_case in aggregate_s:
            writer.writerow(
                {"TestCase": test_case}
                | {
                    metric: aggregate_s[test_case][metric].get_median()
                    for metric in common.metrics_variance
                }
            )

if __name__ == "__main__":
    if len(sys.argv) < 4:
        print(
            f"Usage: {sys.argv[0]} <runner name> <test case name> <cutoff date YYYYMMDD_HHMMSS>"
        )
        exit(1)
    if not common.valid_timestamp(sys.argv[3]):
        print(sys.argv)
        print("Bad cutoff timestamp, please use YYYYMMDD_HHMMSS.")
        exit(1)
    common.load_configs()
    # <runner>, <test case>, <cutoff>
    aggregate_median(sys.argv[1], sys.argv[2], sys.argv[3])
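aggregate.py leans on a common module that does not appear in this diff. Judging purely from the call sites above (valid_timestamp, sanitize, metrics_variance, PERF_RES_PATH, load_configs), it might look roughly like the sketch below; every definition here, including the path, metric names, and tolerances, is an assumption rather than the module's real contents.

# Hypothetical common.py, inferred from how aggregate.py calls it; the real
# module under devops/scripts/benchmarking may differ substantially.
import re

PERF_RES_PATH = "./llvm-ci-perf-results"  # placeholder; really set by load_configs()

# Metric column -> allowed variance; the keys double as the CSV columns that
# aggregate.py aggregates. The names and values here are invented.
metrics_variance = {"Median": 0.1, "StdDev": 0.2}

def valid_timestamp(ts: str) -> bool:
    # Matches the YYYYMMDD_HHMMSS format embedded in result file names.
    return re.fullmatch(r"\d{8}_\d{6}", ts) is not None

def sanitize(value: str) -> float:
    # Strip units/whitespace so CSV cells can be treated as numbers.
    return float(re.sub(r"[^0-9.eE+-]", "", value))

def load_configs():
    # In the real module this would read benchmark-ci.conf and populate
    # globals such as PERF_RES_PATH.
    pass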
[Review thread on the "Load benchmarking configuration" step]

Reviewer: Am I correct to assume that all these lines do is read variables from the .conf file and add them to GITHUB_ENV? If so, can we have a function in utils.sh (say, read_conf_and_populate_github_env) to encapsulate all this? The same goes for the TIMESTAMP_FORMAT and AVERAGE_CUTOFF_RANGE env vars too.

Author: That sounds like it'd make life easier; I can add a smaller one next to load_all_configs for GITHUB_ENV only.