
Add GPU functionality #126

Merged: 90 commits, Nov 27, 2023
2529f7a
initial commit for GPU API
computablee Oct 13, 2023
4d04a5f
rename ParallelTests to CPUTests
computablee Oct 13, 2023
f3ddf4c
add GPU tests
computablee Oct 13, 2023
a3a1d8d
add reference to GPU class
computablee Oct 13, 2023
e31b723
add ILGPU as dependency
computablee Oct 13, 2023
2d790e1
redo devicehandle api
computablee Oct 13, 2023
c8f8556
major updates, rewrites
computablee Oct 13, 2023
5a81011
final testing idea for base functionality
computablee Oct 13, 2023
2b8e2b4
add cecil as dependency
computablee Oct 13, 2023
7853b47
new API for DotMP.GPU
computablee Oct 14, 2023
313c9ef
got basic tester working!
computablee Oct 14, 2023
054a62d
rename namespace to include gpu
computablee Oct 14, 2023
45197c8
remove using for testing
computablee Oct 14, 2023
58474a8
update GPU tests for new GPU API
computablee Oct 16, 2023
2b3d7af
implement new API for parallelfor bodies
computablee Oct 16, 2023
42bce7c
Merge pull request #95 from computablee/gpu
computablee Oct 17, 2023
61b93af
test using new GPU data transfer API
computablee Nov 10, 2023
60a1c9f
implement new memory model
computablee Nov 10, 2023
49706b3
tidying up duplicate code
computablee Nov 10, 2023
68092ff
remove unnecessary implicit operators
computablee Nov 10, 2023
0ac98e5
add overloads for dispatching loops with 5 or 6 variables
computablee Nov 10, 2023
b76af51
add parfor_dump to gitignore
computablee Nov 10, 2023
c2f591a
add parallel for overload code gen
computablee Nov 10, 2023
dd56407
add overloads for up to 16 kernel variables
computablee Nov 10, 2023
82165da
move python to python folder
computablee Nov 10, 2023
89b5e44
update python to only generate up to 13 data params
computablee Nov 10, 2023
7b90933
add parfor overloads for up to 13 data parameters
computablee Nov 10, 2023
8a87b5c
fix cardinality in documentation
computablee Nov 10, 2023
ef3035f
implement 2D arrays into GPUArray and Buffer objects
computablee Nov 10, 2023
4eb4445
initial commit of GPU heat transfer benchmark
computablee Nov 10, 2023
b5365b7
add nocopy behavior
computablee Nov 10, 2023
c092eb9
fix exception on OpenCL devices
computablee Nov 10, 2023
f66b227
better accelerator selection
computablee Nov 10, 2023
79c4a35
fix copying back 2D arrays
computablee Nov 10, 2023
528cba0
added start offset for index calculations
computablee Nov 10, 2023
6121c0d
get HeatTransferVerify running properly
computablee Nov 10, 2023
62b5ebe
add LGPL license header
computablee Nov 10, 2023
d11913c
prepare benchmark
computablee Nov 10, 2023
8b12422
remove dispose for benchmarking
computablee Nov 10, 2023
20eebd6
change to .net 6
computablee Nov 10, 2023
708d9d5
add attributes to prevent exceptions when collecting code coverage
computablee Nov 11, 2023
2e328e7
fix bug
computablee Nov 11, 2023
f3b7735
mark old single/ordered/critical regions as obsolete, implement new v…
computablee Nov 11, 2023
429bde2
remove sln files
computablee Nov 11, 2023
fab2c47
exclude obsolete methods from code coverage
computablee Nov 11, 2023
8b40582
add missing using
computablee Nov 11, 2023
594b6ba
use new critical/ordered/single methods
computablee Nov 11, 2023
5e90c89
testing forcollapse performance in heat transfer, will fully implemen…
computablee Nov 11, 2023
348ad0c
implement T4 template for acceleratorhandler
computablee Nov 11, 2023
91f1692
ignore generated acceleratorhandler file
computablee Nov 11, 2023
6540825
add T4 stuff
computablee Nov 11, 2023
a8872d1
delete now unnecessary files
computablee Nov 11, 2023
cdf34c1
more autogen
computablee Nov 11, 2023
267a789
revert collapse
computablee Nov 11, 2023
3d610b4
remove excess newlines
computablee Nov 11, 2023
d03d31b
get parallelfor t4 gen working
computablee Nov 12, 2023
945f7e7
implement collapsed for loops
computablee Nov 12, 2023
d90279e
remove erroneous comment line
computablee Nov 12, 2023
f22ac63
test with 500x500 instead of 514x514
computablee Nov 12, 2023
aeba3b8
properly handle loops not divisible by block size
computablee Nov 12, 2023
a93d9fa
turn array bounds into off-256-divisble size for better testing
computablee Nov 12, 2023
11d4baf
enable more optimizations
computablee Nov 13, 2023
1c12e39
add GPU kernel launch overhead benchmark
computablee Nov 13, 2023
f2b2360
begin progress towards better index integration
computablee Nov 13, 2023
1f20188
add test for forcollapse
computablee Nov 13, 2023
141aafd
tidy up calls
computablee Nov 13, 2023
b5ca731
implement caching of indices
computablee Nov 13, 2023
2275b97
add support for 3D buffers
computablee Nov 13, 2023
81c39ba
migrate to new index technique
computablee Nov 13, 2023
af259ec
WIP
computablee Nov 13, 2023
7e4070a
new index technique via index caching
computablee Nov 13, 2023
a6d0072
update benchmarks
computablee Nov 13, 2023
ca0b54a
run dotnet format
computablee Nov 13, 2023
1c72a2f
add support for .NET 8
computablee Nov 14, 2023
8432ed7
comments, optimizations
computablee Nov 16, 2023
97571c5
temporary
computablee Nov 16, 2023
f87b687
add optimizations
computablee Nov 16, 2023
15e814b
Merge pull request #125 from computablee/gpu
computablee Nov 26, 2023
3ea76d2
resolve merge conflicts
computablee Nov 26, 2023
9df57a0
fix merge conflicts
computablee Nov 26, 2023
e0b139e
fix compile issues
computablee Nov 26, 2023
4561480
uncomment 1D and 3D arrays
computablee Nov 26, 2023
8f9450d
fix linting errors
computablee Nov 26, 2023
ca35b92
fix constructor
computablee Nov 26, 2023
ffc09ad
fix bug with getnumthreads
computablee Nov 26, 2023
b5dc84c
add test for 3D buffers and 3D collapsed for loops
computablee Nov 26, 2023
cc314e9
remove debug console writeline
computablee Nov 27, 2023
6541e09
fix 3D allocation (ZY->XY stride)
computablee Nov 27, 2023
b349701
omit unreachable code
computablee Nov 27, 2023
6fbab81
modify site publishing to build templates before running doxygen
computablee Nov 27, 2023
10 changes: 9 additions & 1 deletion .github/workflows/publish_site.yml
@@ -12,6 +12,14 @@ jobs:
steps:
- uses: actions/checkout@v4

- name: Setup .NET
uses: actions/setup-dotnet@v3
with:
dotnet-version: 8.0.x

- name: Build Templates
run: make build

- name: Install Doxygen
run: sudo apt-get install doxygen graphviz -y
shell: bash
@@ -28,4 +36,4 @@ jobs:
uses: JamesIves/github-pages-deploy-action@v4
with:
token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
folder: docs/html
folder: docs/html
2 changes: 2 additions & 0 deletions .gitignore
@@ -9,6 +9,8 @@ docs/*
.vscode
*.opencover.xml
*.sln
AcceleratorHandler.cs
Gpu.cs
ProcessedREADME.md

# User-specific files
39 changes: 19 additions & 20 deletions DotMP-Tests/ParallelTests.cs → DotMP-Tests/CPUTests.cs
@@ -28,17 +28,17 @@
namespace DotMPTests
{
/// <summary>
/// Tests for the DotMP library.
/// CPU tests for the DotMP library.
/// </summary>
public class ParallelTests
public class CPUTests
{
private readonly ITestOutputHelper output;

/// <summary>
/// Constructor to write output.
/// </summary>
/// <param name="output">Output object.</param>
public ParallelTests(ITestOutputHelper output)
public CPUTests(ITestOutputHelper output)
{
this.output = output;
}
@@ -522,7 +522,7 @@ public void Critical_works()
DotMP.Parallel.ParallelRegion(num_threads: threads, action: () =>
{
for (int i = 0; i < iters; i++)
DotMP.Parallel.Critical(0, () => ++total);
DotMP.Parallel.Critical(() => ++total);
});

total.Should().Be((int)threads * iters);
@@ -531,14 +531,13 @@ public void Critical_works()

DotMP.Parallel.ParallelRegion(num_threads: 4, action: () =>
{
if (DotMP.Parallel.GetThreadNum() == 0) DotMP.Parallel.Critical(0, () => Thread.Sleep(1000));
if (DotMP.Parallel.GetThreadNum() == 1) DotMP.Parallel.Critical(1, () => Thread.Sleep(1000));
if (DotMP.Parallel.GetThreadNum() == 2) DotMP.Parallel.Critical(0, () => Thread.Sleep(1000));
if (DotMP.Parallel.GetThreadNum() == 3) DotMP.Parallel.Critical(1, () => Thread.Sleep(1000));
if (DotMP.Parallel.GetThreadNum() % 2 == 0) DotMP.Parallel.Critical(() => Thread.Sleep(1000));
if (DotMP.Parallel.GetThreadNum() % 2 == 1) DotMP.Parallel.Critical(() => Thread.Sleep(1000));
});

double elapsed = DotMP.Parallel.GetWTime() - start;
elapsed.Should().BeLessThan(2200);
elapsed.Should().BeLessThan(2.2);
elapsed.Should().BeGreaterThan(2.0);
}

/// <summary>
@@ -571,7 +570,7 @@ public void Single_works()
{
for (int i = 0; i < 10; i++)
{
DotMP.Parallel.Single(0, () => DotMP.Atomic.Inc(ref total));
DotMP.Parallel.Single(() => DotMP.Atomic.Inc(ref total));
}
});

@@ -583,7 +582,7 @@ public void Single_works()
{
for (int i = 0; i < 10; i++)
{
DotMP.Parallel.Single(0, () => DotMP.Atomic.Inc(ref total));
DotMP.Parallel.Single(() => DotMP.Atomic.Inc(ref total));
}
});

@@ -749,7 +748,7 @@ public void Ordered_works()
DotMP.Parallel.ParallelFor(0, 1024, schedule: DotMP.Schedule.Static,
num_threads: threads, action: i =>
{
DotMP.Parallel.Ordered(0, () =>
DotMP.Parallel.Ordered(() =>
{
incrementing[i] = ctr++;
});
@@ -1111,7 +1110,7 @@ public void Tasking_works()

DotMP.Parallel.ParallelRegion(num_threads: threads, action: () =>
{
DotMP.Parallel.Single(0, () =>
DotMP.Parallel.Single(() =>
{
for (int i = 0; i < threads * 2; i++)
{
@@ -1139,7 +1138,7 @@ public void Tasking_works()

DotMP.Parallel.ParallelRegion(num_threads: threads, action: () =>
{
DotMP.Parallel.Single(0, () =>
DotMP.Parallel.Single(() =>
{
for (int i = 0; i < tasks_to_spawn; i++)
{
@@ -1199,7 +1198,7 @@ public void Nested_tasks_work()

DotMP.Parallel.ParallelRegion(num_threads: threads, action: () =>
{
DotMP.Parallel.Single(0, () =>
DotMP.Parallel.Single(() =>
{
DotMP.Parallel.Task(() =>
{
@@ -1369,7 +1368,7 @@ public void Non_parallel_single_should_except()
{
Assert.Throws<DotMP.Exceptions.NotInParallelRegionException>(() =>
{
DotMP.Parallel.Single(0, () => { });
DotMP.Parallel.Single(() => { });
});
}

@@ -1381,7 +1380,7 @@ public void Non_parallel_critical_should_except()
{
Assert.Throws<DotMP.Exceptions.NotInParallelRegionException>(() =>
{
DotMP.Parallel.Critical(0, () => { });
DotMP.Parallel.Critical(() => { });
});
}

@@ -1395,15 +1394,15 @@ public void Nested_worksharing_should_except()
{
DotMP.Parallel.ParallelFor(0, 10, num_threads: 4, action: i =>
{
DotMP.Parallel.Single(0, () => { });
DotMP.Parallel.Single(() => { });
});
});

Assert.Throws<DotMP.Exceptions.CannotPerformNestedWorksharingException>(() =>
{
DotMP.Parallel.ParallelRegion(num_threads: 4, action: () =>
{
DotMP.Parallel.Single(0, () =>
DotMP.Parallel.Single(() =>
{
DotMP.Parallel.For(0, 10, action: i => { });
});
@@ -1427,7 +1426,7 @@ public void Non_for_ordered_should_except()
{
Assert.Throws<DotMP.Exceptions.NotInParallelRegionException>(() =>
{
DotMP.Parallel.Ordered(0, () => { });
DotMP.Parallel.Ordered(() => { });
});
}

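The updated Critical_works test above asserts a wall-clock time between 2.0 and 2.2 seconds (note the fix from `BeLessThan(2200)` to `BeLessThan(2.2)`: `GetWTime` reports seconds, not milliseconds). The expectation follows if each keyless `Critical` call site acts as its own lock, so the two threads sharing a call site serialize their 1-second sleeps while the other pair runs concurrently. A minimal Python model of that timing (plain `threading`, not DotMP):

```python
import threading
import time

# one lock per Critical call site (assumed: keyless Critical derives a
# distinct lock from the call site, as the test's two branches suggest)
locks = [threading.Lock(), threading.Lock()]

def worker(tid):
    # threads with the same parity hit the same call site, hence the same lock
    with locks[tid % 2]:
        time.sleep(1.0)

start = time.time()
threads = [threading.Thread(target=worker, args=(t,)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# the two holders of each lock serialize: ~2 s total, not 4 s
```

If all four threads instead shared one lock, the same model would take about 4 seconds, which is exactly what the assertion pair (`> 2.0`, `< 2.2`) rules out.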
136 changes: 136 additions & 0 deletions DotMP-Tests/GPUTests.cs
@@ -0,0 +1,136 @@
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.Json.Serialization;
using System.Threading;
using DotMP;
using DotMP.GPU;
using FluentAssertions;
using Xunit;
using Xunit.Abstractions;


namespace DotMPTests
{
/// <summary>
/// CPU tests for the DotMP library.
/// </summary>
public class GPUTests
{
/// <summary>
/// Tests to make sure that for loops work in GPU kernels.
/// </summary>
[Fact]
public void GPU_for_works()
{
double[] a = new double[50000];
double[] x = new double[50000];
double[] y = new double[50000];
float[] res = new float[50000];
float[] res_cpu = new float[50000];

random_init(a);
random_init(x);
random_init(y);

{
using var a_gpu = new DotMP.GPU.Buffer<double>(a, DotMP.GPU.Buffer.Behavior.To);
using var x_gpu = new DotMP.GPU.Buffer<double>(x, DotMP.GPU.Buffer.Behavior.To);
using var y_gpu = new DotMP.GPU.Buffer<double>(y, DotMP.GPU.Buffer.Behavior.To);
using var res_gpu = new DotMP.GPU.Buffer<float>(res, DotMP.GPU.Buffer.Behavior.From);

DotMP.GPU.Parallel.ParallelFor(0, a.Length, a_gpu, x_gpu, y_gpu, res_gpu,
(i, a, x, y, res) =>
{
res[i] = (float)(a[i] * x[i] + y[i]);
});
}

for (int i = 0; i < a.Length; i++)
{
res_cpu[i] = (float)(a[i] * x[i] + y[i]);
}

Assert.Equal(res_cpu, res);

double[] a_old = a.Select(a => a).ToArray();

using (var a_gpu = new DotMP.GPU.Buffer<double>(a, DotMP.GPU.Buffer.Behavior.ToFrom))
{
DotMP.GPU.Parallel.ParallelFor(0, a.Length, a_gpu, (i, a) =>
{
a[i]++;
});
}

for (int i = 0; i < a.Length; i++)
{
a_old[i]++;
}

Assert.Equal(a, a_old);
}

/// <summary>
/// Tests to make sure that DotMP.GPU.Parallel.ForCollapse produces correct results.
/// </summary>
[Fact]
public void Collapse_works()
{
int[,] iters_hit = new int[1024, 1024];

using (var buf = new Buffer<int>(iters_hit, DotMP.GPU.Buffer.Behavior.ToFrom))
{
DotMP.GPU.Parallel.ParallelForCollapse((258, 512), (512, 600), buf, (i, j, iters_hit) =>
{
iters_hit[i, j]++;
});
}

for (int i = 0; i < 1024; i++)
for (int j = 0; j < 1024; j++)
if (i >= 258 && i < 512 && j >= 512 && j < 600)
iters_hit[i, j].Should().Be(1);
else
iters_hit[i, j].Should().Be(0);

iters_hit = null;

int[,,] iters_hit_3 = new int[128, 128, 64];

using (var buf = new Buffer<int>(iters_hit_3, DotMP.GPU.Buffer.Behavior.ToFrom))
{
DotMP.GPU.Parallel.ParallelForCollapse((35, 64), (16, 100), (10, 62), buf, action: (i, j, k, iters_hit_3) =>
{
iters_hit_3[i, j, k]++;
});
}

for (int i = 0; i < 128; i++)
for (int j = 0; j < 128; j++)
for (int k = 0; k < 64; k++)
if (i >= 35 && i < 64 && j >= 16 && j < 100 && k >= 10 && k < 62)
iters_hit_3[i, j, k].Should().Be(1);
else
iters_hit_3[i, j, k].Should().Be(0);

iters_hit_3 = null;
}

/// <summary>
/// Randomly initialize an array of type T.
/// </summary>
/// <typeparam name="T">The type to initialize to.</typeparam>
/// <param name="arr">The allocated array to store values into.</param>
private void random_init<T>(T[] arr)
{
Random r = new Random();

for (int i = 0; i < arr.Length; i++)
{
arr[i] = (T)Convert.ChangeType(r.NextDouble() * 128, typeof(T));
}
}
}
}
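The Collapse_works test added above asserts that every (i, j) in [258, 512) × [512, 600) is hit exactly once and nothing outside it is touched. A collapsed GPU loop is typically dispatched by flattening the two ranges into one 1-D index space and recovering (i, j) with division and modulus; the following Python sketch illustrates that semantics (an illustration only, not DotMP's actual generated code):

```python
def collapse_2d(r1, r2):
    """Yield (i, j) pairs for a 2-D loop collapsed into one flat index space."""
    (lo1, hi1), (lo2, hi2) = r1, r2
    n2 = hi2 - lo2                    # inner trip count
    total = (hi1 - lo1) * n2          # combined trip count
    for flat in range(total):         # flat index, as a GPU thread id would be
        i = lo1 + flat // n2          # recover outer index
        j = lo2 + flat % n2           # recover inner index
        yield i, j

# replay the ranges from Collapse_works
hits = {}
for i, j in collapse_2d((258, 512), (512, 600)):
    hits[(i, j)] = hits.get((i, j), 0) + 1
```

Every pair inside the rectangle appears exactly once, matching the `Should().Be(1)` / `Should().Be(0)` checks in the test; the 3-D case extends the same div/mod scheme by one level.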
19 changes: 18 additions & 1 deletion DotMP/DotMP.csproj
@@ -4,7 +4,7 @@
<TargetFrameworks>net6.0;net7.0;net8.0</TargetFrameworks>
<RootNamespace>DotMP</RootNamespace>
<PackageId>DotMP</PackageId>
<Version>1.6.0</Version>
<Version>2.0-pre1</Version>
<Authors>Phillip Allen Lane,et al.</Authors>
<PackageDescription>A library for fork-join parallelism in .NET, with an OpenMP-like API.</PackageDescription>
<RepositoryUrl>https://github.com/computablee/DotMP</RepositoryUrl>
@@ -23,4 +23,21 @@
<None Include="../ProcessedREADME.md" Pack="true" PackagePath="." />
</ItemGroup>

<ItemGroup>
<PackageReference Include="ILGPU" Version="1.5.1" />
<PackageReference Include="T4.Build" Version="0.2.4" />

<None Include="GPU/AcceleratorHandler.cs">
<DesignTime>True</DesignTime>
<AutoGen>True</AutoGen>
<DependentUpon>GPU/AcceleratorHandler.tt</DependentUpon>
</None>

<None Include="GPU/Gpu.cs">
<DesignTime>True</DesignTime>
<AutoGen>True</AutoGen>
<DependentUpon>GPU/Gpu.tt</DependentUpon>
</None>
</ItemGroup>

</Project>
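Several commits in this PR ("properly handle loops not divisible by block size", "turn array bounds into off-256-divisble size for better testing") address a standard GPU launch detail: kernels run in fixed-size blocks (256 threads, per the commit messages), so the dispatcher must round the block count up and guard indices that overshoot the trip count. A Python sketch of that pattern (the guard idiom is the common technique, not necessarily DotMP's exact code):

```python
def launch_1d(n, block_size=256):
    """Simulate a guarded 1-D kernel launch over n iterations."""
    num_blocks = (n + block_size - 1) // block_size  # ceil division
    executed = 0
    for block in range(num_blocks):                  # each block runs block_size threads
        for thread in range(block_size):
            idx = block * block_size + thread
            if idx < n:                              # guard: last block may overshoot n
                executed += 1
    return num_blocks, executed

# 50000 iterations (as in GPU_for_works): 196 blocks launch,
# but exactly 50000 iterations execute thanks to the guard
blocks, executed = launch_1d(50000)
```

Without the `idx < n` guard, the last block would execute up to 255 out-of-range iterations, which is precisely the bug class the "off-256-divisible" test sizes (500×500 instead of 514×514) were chosen to expose.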