Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp memory management, and add USM support. #264

Open
wants to merge 33 commits into
base: master
Choose a base branch
from

Conversation

VarLad
Copy link

@VarLad VarLad commented Oct 25, 2024

Heavy WIP, for now. Expect very little to work.

This exists so that I can, with time, ask for feedbacks and at a later point, for other people to be able to try this PR.

Edit:

Most of the functionality is there now. There's a bug in array interface, where operations between two CLArray results in an error. This has been fixed in the latest commit.

Most recent update: It should be on par with former SVM backend, though tests haven't been ported yet. Needs testing!

Most recent update: 79 errors in GPUArrays test suite. All errors basically say: Attempt to use a freed reference

All tests pass! Should be usable as of the moment!

Most recent update: Going into WIP again

Most recent update: All tests pass as per latest commit. Feel free to try!

Copy link

codecov bot commented Oct 28, 2024

Codecov Report

Attention: Patch coverage is 71.81208% with 84 lines in your changes missing coverage. Please review.

Project coverage is 72.92%. Comparing base (adafca1) to head (4be457c).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/array.jl 75.37% 49 Missing ⚠️
src/memory.jl 58.92% 23 Missing ⚠️
src/random.jl 0.00% 10 Missing ⚠️
src/compiler/execution.jl 83.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #264      +/-   ##
==========================================
- Coverage   80.00%   72.92%   -7.08%     
==========================================
  Files           9       12       +3     
  Lines         440      613     +173     
==========================================
+ Hits          352      447      +95     
- Misses         88      166      +78     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VarLad VarLad changed the title Draft: Add USM support to OpenCL.jl WIP: Add USM support to OpenCL.jl Jan 12, 2025
@VarLad VarLad marked this pull request as ready for review January 13, 2025 07:48
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@VarLad
Copy link
Author

VarLad commented Jan 13, 2025

Many tests still need to be ported, but any reviews in current functionality would be welcome!
Although, 10305/10418 tests seem to pass with the current tests.

It appears that a majority of the errors are Attempt to use a freed reference

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

@VarLad VarLad changed the title WIP: Add USM support to OpenCL.jl Add USM support to OpenCL.jl Jan 16, 2025
github-actions[bot]

This comment was marked as outdated.

Copy link
Member

@maleadt maleadt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick partial review. My biggest gripes are that this shouldn't have formatted the entire codebase, which makes reviewing very hard, and shouldn't have removed support for regular Buffers or SVM. Intel USM being an extension I don't think we should rely on it being available (SVM is questionable enough already, but at least is mandated by specific OpenCL versions).

lib/cl/CL.jl Outdated Show resolved Hide resolved
lib/cl/api.jl Outdated Show resolved Hide resolved
lib/cl/api.jl Outdated Show resolved Hide resolved
lib/cl/state.jl Outdated Show resolved Hide resolved
lib/cl/cmdqueue.jl Outdated Show resolved Hide resolved
lib/cl/error.jl Outdated Show resolved Hide resolved
lib/cl/kernel.jl Outdated Show resolved Hide resolved
@maleadt maleadt marked this pull request as draft January 17, 2025 08:37
@maleadt
Copy link
Member

maleadt commented Jan 17, 2025

julia> cl.svm_capabilities(cl.device())
(coarse_grain_buffer = true, fine_grain_buffer = false, fine_grain_system = false)

julia> cl.usm_capabilities(cl.device())
ERROR: CLError(code=-30, CL_INVALID_VALUE)

That should probably behave more nicely when USM isn't supported at all.

lib/cl/kernel.jl Outdated Show resolved Hide resolved
lib/cl/kernel.jl Outdated Show resolved Hide resolved
lib/cl/memory.jl Outdated Show resolved Hide resolved
@maleadt
Copy link
Member

maleadt commented Jan 17, 2025

Something is wrong with CI (on master too); the CUDA tests are empty.

@maleadt
Copy link
Member

maleadt commented Jan 20, 2025

I've squashed the commits. Please use rebase in the future; merge commits only make things more annoying.

@VarLad VarLad marked this pull request as ready for review January 20, 2025 14:32
Copy link

github-actions bot commented Jan 20, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Click here to view the suggested changes.
diff --git a/examples/vadd.jl b/examples/vadd.jl
index 744f84d..3ca6edf 100644
--- a/examples/vadd.jl
+++ b/examples/vadd.jl
@@ -21,7 +21,8 @@ prog = cl.Program(; source) |> cl.build!
 kern = cl.Kernel(prog, "vadd")
 
 len = prod(dims)
-clcall(kern, Tuple{CLPtr{Float32}, CLPtr{Float32}, CLPtr{Float32}},
+clcall(
+    kern, Tuple{CLPtr{Float32}, CLPtr{Float32}, CLPtr{Float32}},
        d_a, d_b, d_c; global_size=(len,))
 c = Array(d_c)
 @test a+b ≈ c
diff --git a/lib/cl/api.jl b/lib/cl/api.jl
index c123811..2b84716 100644
--- a/lib/cl/api.jl
+++ b/lib/cl/api.jl
@@ -79,10 +79,12 @@ macro ext_ccall(ex)
     _, fn = target.args
 
     @gensym fptr
-    esc(quote
-        $fptr = $clGetExtensionFunctionAddressForPlatform(platform(), $fn)
-        @ccall $(Expr(:($), fptr))($(argexprs...))::$ret
-    end)
+    return esc(
+        quote
+            $fptr = $clGetExtensionFunctionAddressForPlatform(platform(), $fn)
+            @ccall $(Expr(:($), fptr))($(argexprs...))::$ret
+        end
+    )
 end
 
 include("libopencl.jl")
diff --git a/lib/cl/device.jl b/lib/cl/device.jl
index e7afc1c..e1bdf9e 100644
--- a/lib/cl/device.jl
+++ b/lib/cl/device.jl
@@ -199,7 +199,7 @@ function usm_capabilities(d::Device)
     usm_supported(d) || throw(ArgumentError("Unified Shared Memory not supported on this device"))
 
     function check_capability_bits(mask::cl_device_unified_shared_memory_capabilities_intel)
-        (;
+        return (;
             access = mask & CL_UNIFIED_SHARED_MEMORY_ACCESS_INTEL != 0,
             atomic_access = mask & CL_UNIFIED_SHARED_MEMORY_ATOMIC_ACCESS_INTEL != 0,
             concurrent_access = mask & CL_UNIFIED_SHARED_MEMORY_CONCURRENT_ACCESS_INTEL != 0,
diff --git a/lib/cl/kernel.jl b/lib/cl/kernel.jl
index 6d78972..7544f68 100644
--- a/lib/cl/kernel.jl
+++ b/lib/cl/kernel.jl
@@ -177,7 +177,8 @@ function enqueue_kernel(k::Kernel, global_work_size, local_work_size=nothing;
     end
 
     ret_event = Ref{cl_event}()
-    clEnqueueNDRangeKernel(queue(), k, work_dim, goffset, gsize, lsize,
+    clEnqueueNDRangeKernel(
+        queue(), k, work_dim, goffset, gsize, lsize,
                            n_events, wait_event_ids, ret_event)
     return Event(ret_event[], retain=false)
 end
diff --git a/lib/cl/libopencl.jl b/lib/cl/libopencl.jl
index 8e85bbb..584150d 100644
--- a/lib/cl/libopencl.jl
+++ b/lib/cl/libopencl.jl
@@ -21,7 +21,7 @@ function check(f)
 end
 
 macro CL_MAKE_VERSION(major, minor, patch)
-    quote
+    return quote
         VersionNumber($major, $minor, $patch)
     end
 end
@@ -446,11 +446,12 @@ end
 
 function clSVMAlloc(context, flags, size, alignment)
     @ccall libopencl.clSVMAlloc(context::cl_context, flags::cl_svm_mem_flags, size::Csize_t,
-                                alignment::cl_uint)::CLPtr{Cvoid}
+        alignment::cl_uint
+    )::CLPtr{Cvoid}
 end
 
 function clSVMFree(context, svm_pointer)
-    @ccall libopencl.clSVMFree(context::cl_context, svm_pointer::PtrOrCLPtr{Cvoid})::Cvoid
+    return @ccall libopencl.clSVMFree(context::cl_context, svm_pointer::PtrOrCLPtr{Cvoid})::Cvoid
 end
 
 function clCreateSamplerWithProperties(context, sampler_properties, errcode_ret)
@@ -607,7 +608,8 @@ end
 
 @checked function clSetKernelArgSVMPointer(kernel, arg_index, arg_value)
     @ccall libopencl.clSetKernelArgSVMPointer(kernel::cl_kernel, arg_index::cl_uint,
-                                              arg_value::CLPtr{Cvoid})::cl_int
+        arg_value::CLPtr{Cvoid}
+    )::cl_int
 end
 
 @checked function clSetKernelExecInfo(kernel, param_name, param_value_size, param_value)
@@ -974,8 +976,8 @@ end
 @checked function clEnqueueSVMMemcpy(command_queue, blocking_copy, dst_ptr, src_ptr, size,
                                      num_events_in_wait_list, event_wait_list, event)
     @ccall libopencl.clEnqueueSVMMemcpy(command_queue::cl_command_queue,
-                                        blocking_copy::cl_bool, dst_ptr::PtrOrCLPtr{Cvoid},
-                                        src_ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
+        blocking_copy::cl_bool, dst_ptr::PtrOrCLPtr{Cvoid},
+        src_ptr::PtrOrCLPtr{Cvoid}, size::Csize_t,
                                         num_events_in_wait_list::cl_uint,
                                         event_wait_list::Ptr{cl_event},
                                         event::Ptr{cl_event})::cl_int
@@ -984,7 +986,7 @@ end
 @checked function clEnqueueSVMMemFill(command_queue, svm_ptr, pattern, pattern_size, size,
                                       num_events_in_wait_list, event_wait_list, event)
     @ccall libopencl.clEnqueueSVMMemFill(command_queue::cl_command_queue,
-                                         svm_ptr::CLPtr{Cvoid}, pattern::Ptr{Cvoid},
+        svm_ptr::CLPtr{Cvoid}, pattern::Ptr{Cvoid},
                                          pattern_size::Csize_t, size::Csize_t,
                                          num_events_in_wait_list::cl_uint,
                                          event_wait_list::Ptr{cl_event},
@@ -994,7 +996,7 @@ end
 @checked function clEnqueueSVMMap(command_queue, blocking_map, flags, svm_ptr, size,
                                   num_events_in_wait_list, event_wait_list, event)
     @ccall libopencl.clEnqueueSVMMap(command_queue::cl_command_queue, blocking_map::cl_bool,
-                                     flags::cl_map_flags, svm_ptr::CLPtr{Cvoid},
+        flags::cl_map_flags, svm_ptr::CLPtr{Cvoid},
                                      size::Csize_t, num_events_in_wait_list::cl_uint,
                                      event_wait_list::Ptr{cl_event},
                                      event::Ptr{cl_event})::cl_int
@@ -1002,8 +1004,9 @@ end
 
 @checked function clEnqueueSVMUnmap(command_queue, svm_ptr, num_events_in_wait_list,
                                     event_wait_list, event)
-    @ccall libopencl.clEnqueueSVMUnmap(command_queue::cl_command_queue,
-                                       svm_ptr::CLPtr{Cvoid},
+    @ccall libopencl.clEnqueueSVMUnmap(
+        command_queue::cl_command_queue,
+        svm_ptr::CLPtr{Cvoid},
                                        num_events_in_wait_list::cl_uint,
                                        event_wait_list::Ptr{cl_event},
                                        event::Ptr{cl_event})::cl_int
@@ -1014,7 +1017,7 @@ end
                                          event_wait_list, event)
     @ccall libopencl.clEnqueueSVMMigrateMem(command_queue::cl_command_queue,
                                             num_svm_pointers::cl_uint,
-                                            svm_pointers::Ptr{CLPtr{Cvoid}},
+        svm_pointers::Ptr{CLPtr{Cvoid}},
                                             sizes::Ptr{Csize_t},
                                             flags::cl_mem_migration_flags,
                                             num_events_in_wait_list::cl_uint,
@@ -1361,10 +1364,12 @@ const clGetCommandBufferInfoKHR_t = Cvoid
 const clGetCommandBufferInfoKHR_fn = Ptr{clGetCommandBufferInfoKHR_t}
 
 function clCreateCommandBufferKHR(num_queues, queues, properties, errcode_ret)
-    @ext_ccall libopencl.clCreateCommandBufferKHR(num_queues::cl_uint,
-                                                  queues::Ptr{cl_command_queue},
-                                                  properties::Ptr{cl_command_buffer_properties_khr},
-                                                  errcode_ret::Ptr{cl_int})::cl_command_buffer_khr
+    return @ext_ccall libopencl.clCreateCommandBufferKHR(
+        num_queues::cl_uint,
+        queues::Ptr{cl_command_queue},
+        properties::Ptr{cl_command_buffer_properties_khr},
+        errcode_ret::Ptr{cl_int}
+    )::cl_command_buffer_khr
 end
 
 @checked function clFinalizeCommandBufferKHR(command_buffer)
@@ -1381,152 +1386,181 @@ end
 
 @checked function clEnqueueCommandBufferKHR(num_queues, queues, command_buffer,
                                             num_events_in_wait_list, event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueCommandBufferKHR(num_queues::cl_uint,
-                                                   queues::Ptr{cl_command_queue},
-                                                   command_buffer::cl_command_buffer_khr,
-                                                   num_events_in_wait_list::cl_uint,
-                                                   event_wait_list::Ptr{cl_event},
-                                                   event::Ptr{cl_event})::cl_int
-end
-
-@checked function clCommandBarrierWithWaitListKHR(command_buffer, command_queue, properties,
+    @ext_ccall libopencl.clEnqueueCommandBufferKHR(
+        num_queues::cl_uint,
+        queues::Ptr{cl_command_queue},
+        command_buffer::cl_command_buffer_khr,
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
+end
+
+@checked function clCommandBarrierWithWaitListKHR(
+        command_buffer, command_queue, properties,
                                                   num_sync_points_in_wait_list,
                                                   sync_point_wait_list, sync_point,
                                                   mutable_handle)
-    @ext_ccall libopencl.clCommandBarrierWithWaitListKHR(command_buffer::cl_command_buffer_khr,
-                                                         command_queue::cl_command_queue,
-                                                         properties::Ptr{cl_command_properties_khr},
-                                                         num_sync_points_in_wait_list::cl_uint,
-                                                         sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                         sync_point::Ptr{cl_sync_point_khr},
-                                                         mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandCopyBufferKHR(command_buffer, command_queue, properties,
-                                         src_buffer, dst_buffer, src_offset, dst_offset,
-                                         size, num_sync_points_in_wait_list,
-                                         sync_point_wait_list, sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandCopyBufferKHR(command_buffer::cl_command_buffer_khr,
+    @ext_ccall libopencl.clCommandBarrierWithWaitListKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandCopyBufferKHR(
+        command_buffer, command_queue, properties,
+        src_buffer, dst_buffer, src_offset, dst_offset,
+        size, num_sync_points_in_wait_list,
+        sync_point_wait_list, sync_point, mutable_handle
+    )
+    @ext_ccall libopencl.clCommandCopyBufferKHR(
+        command_buffer::cl_command_buffer_khr,
                                                 command_queue::cl_command_queue,
-                                                properties::Ptr{cl_command_properties_khr},
+        properties::Ptr{cl_command_properties_khr},
                                                 src_buffer::cl_mem, dst_buffer::cl_mem,
-                                                src_offset::Csize_t, dst_offset::Csize_t,
-                                                size::Csize_t,
+        src_offset::Csize_t, dst_offset::Csize_t,
+        size::Csize_t,
                                                 num_sync_points_in_wait_list::cl_uint,
                                                 sync_point_wait_list::Ptr{cl_sync_point_khr},
                                                 sync_point::Ptr{cl_sync_point_khr},
                                                 mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
 end
 
-@checked function clCommandCopyBufferRectKHR(command_buffer, command_queue, properties,
-                                             src_buffer, dst_buffer, src_origin, dst_origin,
-                                             region, src_row_pitch, src_slice_pitch,
-                                             dst_row_pitch, dst_slice_pitch,
-                                             num_sync_points_in_wait_list,
-                                             sync_point_wait_list, sync_point,
-                                             mutable_handle)
-    @ext_ccall libopencl.clCommandCopyBufferRectKHR(command_buffer::cl_command_buffer_khr,
-                                                    command_queue::cl_command_queue,
-                                                    properties::Ptr{cl_command_properties_khr},
-                                                    src_buffer::cl_mem, dst_buffer::cl_mem,
-                                                    src_origin::Ptr{Csize_t},
-                                                    dst_origin::Ptr{Csize_t},
-                                                    region::Ptr{Csize_t},
-                                                    src_row_pitch::Csize_t,
-                                                    src_slice_pitch::Csize_t,
-                                                    dst_row_pitch::Csize_t,
-                                                    dst_slice_pitch::Csize_t,
-                                                    num_sync_points_in_wait_list::cl_uint,
-                                                    sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                    sync_point::Ptr{cl_sync_point_khr},
-                                                    mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandCopyBufferToImageKHR(command_buffer, command_queue, properties,
-                                                src_buffer, dst_image, src_offset,
-                                                dst_origin, region,
+@checked function clCommandCopyBufferRectKHR(
+        command_buffer, command_queue, properties,
+        src_buffer, dst_buffer, src_origin, dst_origin,
+        region, src_row_pitch, src_slice_pitch,
+        dst_row_pitch, dst_slice_pitch,
+        num_sync_points_in_wait_list,
+        sync_point_wait_list, sync_point,
+        mutable_handle
+    )
+    @ext_ccall libopencl.clCommandCopyBufferRectKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        src_buffer::cl_mem, dst_buffer::cl_mem,
+        src_origin::Ptr{Csize_t},
+        dst_origin::Ptr{Csize_t},
+        region::Ptr{Csize_t},
+        src_row_pitch::Csize_t,
+        src_slice_pitch::Csize_t,
+        dst_row_pitch::Csize_t,
+        dst_slice_pitch::Csize_t,
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandCopyBufferToImageKHR(
+        command_buffer, command_queue, properties,
+        src_buffer, dst_image, src_offset,
+        dst_origin, region,
                                                 num_sync_points_in_wait_list,
                                                 sync_point_wait_list, sync_point,
                                                 mutable_handle)
-    @ext_ccall libopencl.clCommandCopyBufferToImageKHR(command_buffer::cl_command_buffer_khr,
-                                                       command_queue::cl_command_queue,
-                                                       properties::Ptr{cl_command_properties_khr},
-                                                       src_buffer::cl_mem,
-                                                       dst_image::cl_mem,
-                                                       src_offset::Csize_t,
-                                                       dst_origin::Ptr{Csize_t},
-                                                       region::Ptr{Csize_t},
-                                                       num_sync_points_in_wait_list::cl_uint,
-                                                       sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                       sync_point::Ptr{cl_sync_point_khr},
-                                                       mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandCopyImageKHR(command_buffer, command_queue, properties,
-                                        src_image, dst_image, src_origin, dst_origin,
-                                        region, num_sync_points_in_wait_list,
-                                        sync_point_wait_list, sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandCopyImageKHR(command_buffer::cl_command_buffer_khr,
-                                               command_queue::cl_command_queue,
-                                               properties::Ptr{cl_command_properties_khr},
-                                               src_image::cl_mem, dst_image::cl_mem,
-                                               src_origin::Ptr{Csize_t},
-                                               dst_origin::Ptr{Csize_t},
-                                               region::Ptr{Csize_t},
-                                               num_sync_points_in_wait_list::cl_uint,
-                                               sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                               sync_point::Ptr{cl_sync_point_khr},
-                                               mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandCopyImageToBufferKHR(command_buffer, command_queue, properties,
-                                                src_image, dst_buffer, src_origin, region,
-                                                dst_offset, num_sync_points_in_wait_list,
+    @ext_ccall libopencl.clCommandCopyBufferToImageKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        src_buffer::cl_mem,
+        dst_image::cl_mem,
+        src_offset::Csize_t,
+        dst_origin::Ptr{Csize_t},
+        region::Ptr{Csize_t},
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandCopyImageKHR(
+        command_buffer, command_queue, properties,
+        src_image, dst_image, src_origin, dst_origin,
+        region, num_sync_points_in_wait_list,
+        sync_point_wait_list, sync_point, mutable_handle
+    )
+    @ext_ccall libopencl.clCommandCopyImageKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        src_image::cl_mem, dst_image::cl_mem,
+        src_origin::Ptr{Csize_t},
+        dst_origin::Ptr{Csize_t},
+        region::Ptr{Csize_t},
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandCopyImageToBufferKHR(
+        command_buffer, command_queue, properties,
+        src_image, dst_buffer, src_origin, region,
+        dst_offset, num_sync_points_in_wait_list,
                                                 sync_point_wait_list, sync_point,
                                                 mutable_handle)
-    @ext_ccall libopencl.clCommandCopyImageToBufferKHR(command_buffer::cl_command_buffer_khr,
-                                                       command_queue::cl_command_queue,
-                                                       properties::Ptr{cl_command_properties_khr},
-                                                       src_image::cl_mem,
-                                                       dst_buffer::cl_mem,
-                                                       src_origin::Ptr{Csize_t},
-                                                       region::Ptr{Csize_t},
-                                                       dst_offset::Csize_t,
-                                                       num_sync_points_in_wait_list::cl_uint,
-                                                       sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                       sync_point::Ptr{cl_sync_point_khr},
-                                                       mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandFillBufferKHR(command_buffer, command_queue, properties, buffer,
-                                         pattern, pattern_size, offset, size,
+    @ext_ccall libopencl.clCommandCopyImageToBufferKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        src_image::cl_mem,
+        dst_buffer::cl_mem,
+        src_origin::Ptr{Csize_t},
+        region::Ptr{Csize_t},
+        dst_offset::Csize_t,
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandFillBufferKHR(
+        command_buffer, command_queue, properties, buffer,
+        pattern, pattern_size, offset, size,
                                          num_sync_points_in_wait_list, sync_point_wait_list,
                                          sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandFillBufferKHR(command_buffer::cl_command_buffer_khr,
-                                                command_queue::cl_command_queue,
-                                                properties::Ptr{cl_command_properties_khr},
-                                                buffer::cl_mem, pattern::Ptr{Cvoid},
-                                                pattern_size::Csize_t, offset::Csize_t,
-                                                size::Csize_t,
-                                                num_sync_points_in_wait_list::cl_uint,
-                                                sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                sync_point::Ptr{cl_sync_point_khr},
-                                                mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandFillImageKHR(command_buffer, command_queue, properties, image,
-                                        fill_color, origin, region,
-                                        num_sync_points_in_wait_list, sync_point_wait_list,
-                                        sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandFillImageKHR(command_buffer::cl_command_buffer_khr,
-                                               command_queue::cl_command_queue,
-                                               properties::Ptr{cl_command_properties_khr},
-                                               image::cl_mem, fill_color::Ptr{Cvoid},
-                                               origin::Ptr{Csize_t}, region::Ptr{Csize_t},
-                                               num_sync_points_in_wait_list::cl_uint,
-                                               sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                               sync_point::Ptr{cl_sync_point_khr},
-                                               mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
+    @ext_ccall libopencl.clCommandFillBufferKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        buffer::cl_mem, pattern::Ptr{Cvoid},
+        pattern_size::Csize_t, offset::Csize_t,
+        size::Csize_t,
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandFillImageKHR(
+        command_buffer, command_queue, properties, image,
+        fill_color, origin, region,
+        num_sync_points_in_wait_list, sync_point_wait_list,
+        sync_point, mutable_handle
+    )
+    @ext_ccall libopencl.clCommandFillImageKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        image::cl_mem, fill_color::Ptr{Cvoid},
+        origin::Ptr{Csize_t}, region::Ptr{Csize_t},
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
 end
 
 @checked function clCommandNDRangeKernelKHR(command_buffer, command_queue, properties,
@@ -1535,26 +1569,30 @@ end
                                             num_sync_points_in_wait_list,
                                             sync_point_wait_list, sync_point,
                                             mutable_handle)
-    @ext_ccall libopencl.clCommandNDRangeKernelKHR(command_buffer::cl_command_buffer_khr,
-                                                   command_queue::cl_command_queue,
-                                                   properties::Ptr{cl_command_properties_khr},
-                                                   kernel::cl_kernel, work_dim::cl_uint,
-                                                   global_work_offset::Ptr{Csize_t},
-                                                   global_work_size::Ptr{Csize_t},
-                                                   local_work_size::Ptr{Csize_t},
-                                                   num_sync_points_in_wait_list::cl_uint,
-                                                   sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                   sync_point::Ptr{cl_sync_point_khr},
-                                                   mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
+    @ext_ccall libopencl.clCommandNDRangeKernelKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        kernel::cl_kernel, work_dim::cl_uint,
+        global_work_offset::Ptr{Csize_t},
+        global_work_size::Ptr{Csize_t},
+        local_work_size::Ptr{Csize_t},
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
 end
 
 @checked function clGetCommandBufferInfoKHR(command_buffer, param_name, param_value_size,
                                             param_value, param_value_size_ret)
-    @ext_ccall libopencl.clGetCommandBufferInfoKHR(command_buffer::cl_command_buffer_khr,
-                                                   param_name::cl_command_buffer_info_khr,
-                                                   param_value_size::Csize_t,
-                                                   param_value::Ptr{Cvoid},
-                                                   param_value_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetCommandBufferInfoKHR(
+        command_buffer::cl_command_buffer_khr,
+        param_name::cl_command_buffer_info_khr,
+        param_value_size::Csize_t,
+        param_value::Ptr{Cvoid},
+        param_value_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 # typedef cl_int CL_API_CALL clCommandSVMMemcpyKHR_t ( cl_command_buffer_khr command_buffer , cl_command_queue command_queue , const cl_command_properties_khr * properties , void * dst_ptr , const void * src_ptr , size_t size , cl_uint num_sync_points_in_wait_list , const cl_sync_point_khr * sync_point_wait_list , cl_sync_point_khr * sync_point , cl_mutable_command_khr * mutable_handle )
@@ -1567,33 +1605,40 @@ const clCommandSVMMemFillKHR_t = Cvoid
 
 const clCommandSVMMemFillKHR_fn = Ptr{clCommandSVMMemFillKHR_t}
 
-@checked function clCommandSVMMemcpyKHR(command_buffer, command_queue, properties, dst_ptr,
-                                        src_ptr, size, num_sync_points_in_wait_list,
+@checked function clCommandSVMMemcpyKHR(
+        command_buffer, command_queue, properties, dst_ptr,
+        src_ptr, size, num_sync_points_in_wait_list,
                                         sync_point_wait_list, sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandSVMMemcpyKHR(command_buffer::cl_command_buffer_khr,
-                                               command_queue::cl_command_queue,
-                                               properties::Ptr{cl_command_properties_khr},
-                                               dst_ptr::Ptr{Cvoid}, src_ptr::Ptr{Cvoid},
-                                               size::Csize_t,
-                                               num_sync_points_in_wait_list::cl_uint,
-                                               sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                               sync_point::Ptr{cl_sync_point_khr},
-                                               mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
-end
-
-@checked function clCommandSVMMemFillKHR(command_buffer, command_queue, properties, svm_ptr,
-                                         pattern, pattern_size, size,
-                                         num_sync_points_in_wait_list, sync_point_wait_list,
-                                         sync_point, mutable_handle)
-    @ext_ccall libopencl.clCommandSVMMemFillKHR(command_buffer::cl_command_buffer_khr,
-                                                command_queue::cl_command_queue,
-                                                properties::Ptr{cl_command_properties_khr},
-                                                svm_ptr::Ptr{Cvoid}, pattern::Ptr{Cvoid},
-                                                pattern_size::Csize_t, size::Csize_t,
-                                                num_sync_points_in_wait_list::cl_uint,
-                                                sync_point_wait_list::Ptr{cl_sync_point_khr},
-                                                sync_point::Ptr{cl_sync_point_khr},
-                                                mutable_handle::Ptr{cl_mutable_command_khr})::cl_int
+    @ext_ccall libopencl.clCommandSVMMemcpyKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        dst_ptr::Ptr{Cvoid}, src_ptr::Ptr{Cvoid},
+        size::Csize_t,
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
+end
+
+@checked function clCommandSVMMemFillKHR(
+        command_buffer, command_queue, properties, svm_ptr,
+        pattern, pattern_size, size,
+        num_sync_points_in_wait_list, sync_point_wait_list,
+        sync_point, mutable_handle
+    )
+    @ext_ccall libopencl.clCommandSVMMemFillKHR(
+        command_buffer::cl_command_buffer_khr,
+        command_queue::cl_command_queue,
+        properties::Ptr{cl_command_properties_khr},
+        svm_ptr::Ptr{Cvoid}, pattern::Ptr{Cvoid},
+        pattern_size::Csize_t, size::Csize_t,
+        num_sync_points_in_wait_list::cl_uint,
+        sync_point_wait_list::Ptr{cl_sync_point_khr},
+        sync_point::Ptr{cl_sync_point_khr},
+        mutable_handle::Ptr{cl_mutable_command_khr}
+    )::cl_int
 end
 
 const cl_platform_command_buffer_capabilities_khr = cl_bitfield
@@ -1605,13 +1650,15 @@ const clRemapCommandBufferKHR_fn = Ptr{clRemapCommandBufferKHR_t}
 
 function clRemapCommandBufferKHR(command_buffer, automatic, num_queues, queues, num_handles,
                                  handles, handles_ret, errcode_ret)
-    @ext_ccall libopencl.clRemapCommandBufferKHR(command_buffer::cl_command_buffer_khr,
-                                                 automatic::cl_bool, num_queues::cl_uint,
-                                                 queues::Ptr{cl_command_queue},
-                                                 num_handles::cl_uint,
-                                                 handles::Ptr{cl_mutable_command_khr},
-                                                 handles_ret::Ptr{cl_mutable_command_khr},
-                                                 errcode_ret::Ptr{cl_int})::cl_command_buffer_khr
+    return @ext_ccall libopencl.clRemapCommandBufferKHR(
+        command_buffer::cl_command_buffer_khr,
+        automatic::cl_bool, num_queues::cl_uint,
+        queues::Ptr{cl_command_queue},
+        num_handles::cl_uint,
+        handles::Ptr{cl_mutable_command_khr},
+        handles_ret::Ptr{cl_mutable_command_khr},
+        errcode_ret::Ptr{cl_int}
+    )::cl_command_buffer_khr
 end
 
 const cl_command_buffer_update_type_khr = cl_uint
@@ -1664,21 +1711,27 @@ const clGetMutableCommandInfoKHR_t = Cvoid
 
 const clGetMutableCommandInfoKHR_fn = Ptr{clGetMutableCommandInfoKHR_t}
 
-@checked function clUpdateMutableCommandsKHR(command_buffer, num_configs, config_types,
-                                             configs)
-    @ext_ccall libopencl.clUpdateMutableCommandsKHR(command_buffer::cl_command_buffer_khr,
-                                                    num_configs::cl_uint,
-                                                    config_types::Ptr{cl_command_buffer_update_type_khr},
-                                                    configs::Ptr{Ptr{Cvoid}})::cl_int
+@checked function clUpdateMutableCommandsKHR(
+        command_buffer, num_configs, config_types,
+        configs
+    )
+    @ext_ccall libopencl.clUpdateMutableCommandsKHR(
+        command_buffer::cl_command_buffer_khr,
+        num_configs::cl_uint,
+        config_types::Ptr{cl_command_buffer_update_type_khr},
+        configs::Ptr{Ptr{Cvoid}}
+    )::cl_int
 end
 
 @checked function clGetMutableCommandInfoKHR(command, param_name, param_value_size,
                                              param_value, param_value_size_ret)
-    @ext_ccall libopencl.clGetMutableCommandInfoKHR(command::cl_mutable_command_khr,
-                                                    param_name::cl_mutable_command_info_khr,
-                                                    param_value_size::Csize_t,
-                                                    param_value::Ptr{Cvoid},
-                                                    param_value_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetMutableCommandInfoKHR(
+        command::cl_mutable_command_khr,
+        param_name::cl_mutable_command_info_khr,
+        param_value_size::Csize_t,
+        param_value::Ptr{Cvoid},
+        param_value_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 # typedef cl_int CL_API_CALL clSetMemObjectDestructorAPPLE_t ( cl_mem memobj , void ( CL_CALLBACK * pfn_notify ) ( cl_mem memobj , void * user_data ) , void * user_data )
@@ -1687,9 +1740,11 @@ const clSetMemObjectDestructorAPPLE_t = Cvoid
 const clSetMemObjectDestructorAPPLE_fn = Ptr{clSetMemObjectDestructorAPPLE_t}
 
 @checked function clSetMemObjectDestructorAPPLE(memobj, pfn_notify, user_data)
-    @ext_ccall libopencl.clSetMemObjectDestructorAPPLE(memobj::cl_mem,
-                                                       pfn_notify::Ptr{Cvoid},
-                                                       user_data::Ptr{Cvoid})::cl_int
+    @ext_ccall libopencl.clSetMemObjectDestructorAPPLE(
+        memobj::cl_mem,
+        pfn_notify::Ptr{Cvoid},
+        user_data::Ptr{Cvoid}
+    )::cl_int
 end
 
 # typedef void CL_API_CALL clLogMessagesToSystemLogAPPLE_t ( const char * errstr , const void * private_info , size_t cb , void * user_data )
@@ -1708,22 +1763,28 @@ const clLogMessagesToStderrAPPLE_t = Cvoid
 const clLogMessagesToStderrAPPLE_fn = Ptr{clLogMessagesToStderrAPPLE_t}
 
 function clLogMessagesToSystemLogAPPLE(errstr, private_info, cb, user_data)
-    @ext_ccall libopencl.clLogMessagesToSystemLogAPPLE(errstr::Ptr{Cchar},
-                                                       private_info::Ptr{Cvoid},
-                                                       cb::Csize_t,
-                                                       user_data::Ptr{Cvoid})::Cvoid
+    return @ext_ccall libopencl.clLogMessagesToSystemLogAPPLE(
+        errstr::Ptr{Cchar},
+        private_info::Ptr{Cvoid},
+        cb::Csize_t,
+        user_data::Ptr{Cvoid}
+    )::Cvoid
 end
 
 function clLogMessagesToStdoutAPPLE(errstr, private_info, cb, user_data)
-    @ext_ccall libopencl.clLogMessagesToStdoutAPPLE(errstr::Ptr{Cchar},
-                                                    private_info::Ptr{Cvoid}, cb::Csize_t,
-                                                    user_data::Ptr{Cvoid})::Cvoid
+    return @ext_ccall libopencl.clLogMessagesToStdoutAPPLE(
+        errstr::Ptr{Cchar},
+        private_info::Ptr{Cvoid}, cb::Csize_t,
+        user_data::Ptr{Cvoid}
+    )::Cvoid
 end
 
 function clLogMessagesToStderrAPPLE(errstr, private_info, cb, user_data)
-    @ext_ccall libopencl.clLogMessagesToStderrAPPLE(errstr::Ptr{Cchar},
-                                                    private_info::Ptr{Cvoid}, cb::Csize_t,
-                                                    user_data::Ptr{Cvoid})::Cvoid
+    return @ext_ccall libopencl.clLogMessagesToStderrAPPLE(
+        errstr::Ptr{Cchar},
+        private_info::Ptr{Cvoid}, cb::Csize_t,
+        user_data::Ptr{Cvoid}
+    )::Cvoid
 end
 
 # typedef cl_int CL_API_CALL clIcdGetPlatformIDsKHR_t ( cl_uint num_entries , cl_platform_id * platforms , cl_uint * num_platforms )
@@ -1732,9 +1793,11 @@ const clIcdGetPlatformIDsKHR_t = Cvoid
 const clIcdGetPlatformIDsKHR_fn = Ptr{clIcdGetPlatformIDsKHR_t}
 
 @checked function clIcdGetPlatformIDsKHR(num_entries, platforms, num_platforms)
-    @ext_ccall libopencl.clIcdGetPlatformIDsKHR(num_entries::cl_uint,
-                                                platforms::Ptr{cl_platform_id},
-                                                num_platforms::Ptr{cl_uint})::cl_int
+    @ext_ccall libopencl.clIcdGetPlatformIDsKHR(
+        num_entries::cl_uint,
+        platforms::Ptr{cl_platform_id},
+        num_platforms::Ptr{cl_uint}
+    )::cl_int
 end
 
 # typedef cl_program CL_API_CALL clCreateProgramWithILKHR_t ( cl_context context , const void * il , size_t length , cl_int * errcode_ret )
@@ -1743,9 +1806,11 @@ const clCreateProgramWithILKHR_t = Cvoid
 const clCreateProgramWithILKHR_fn = Ptr{clCreateProgramWithILKHR_t}
 
 function clCreateProgramWithILKHR(context, il, length, errcode_ret)
-    @ext_ccall libopencl.clCreateProgramWithILKHR(context::cl_context, il::Ptr{Cvoid},
-                                                  length::Csize_t,
-                                                  errcode_ret::Ptr{cl_int})::cl_program
+    return @ext_ccall libopencl.clCreateProgramWithILKHR(
+        context::cl_context, il::Ptr{Cvoid},
+        length::Csize_t,
+        errcode_ret::Ptr{cl_int}
+    )::cl_program
 end
 
 const cl_context_memory_initialize_khr = cl_bitfield
@@ -1769,10 +1834,12 @@ const clCreateCommandQueueWithPropertiesKHR_t = Cvoid
 const clCreateCommandQueueWithPropertiesKHR_fn = Ptr{clCreateCommandQueueWithPropertiesKHR_t}
 
 function clCreateCommandQueueWithPropertiesKHR(context, device, properties, errcode_ret)
-    @ext_ccall libopencl.clCreateCommandQueueWithPropertiesKHR(context::cl_context,
-                                                               device::cl_device_id,
-                                                               properties::Ptr{cl_queue_properties_khr},
-                                                               errcode_ret::Ptr{cl_int})::cl_command_queue
+    return @ext_ccall libopencl.clCreateCommandQueueWithPropertiesKHR(
+        context::cl_context,
+        device::cl_device_id,
+        properties::Ptr{cl_queue_properties_khr},
+        errcode_ret::Ptr{cl_int}
+    )::cl_command_queue
 end
 
 # typedef cl_int CL_API_CALL clReleaseDeviceEXT_t ( cl_device_id device )
@@ -1800,11 +1867,13 @@ end
 
 @checked function clCreateSubDevicesEXT(in_device, properties, num_entries, out_devices,
                                         num_devices)
-    @ext_ccall libopencl.clCreateSubDevicesEXT(in_device::cl_device_id,
-                                               properties::Ptr{cl_device_partition_property_ext},
-                                               num_entries::cl_uint,
-                                               out_devices::Ptr{cl_device_id},
-                                               num_devices::Ptr{cl_uint})::cl_int
+    @ext_ccall libopencl.clCreateSubDevicesEXT(
+        in_device::cl_device_id,
+        properties::Ptr{cl_device_partition_property_ext},
+        num_entries::cl_uint,
+        out_devices::Ptr{cl_device_id},
+        num_devices::Ptr{cl_uint}
+    )::cl_int
 end
 
 const cl_mem_migration_flags_ext = cl_bitfield
@@ -1817,13 +1886,15 @@ const clEnqueueMigrateMemObjectEXT_fn = Ptr{clEnqueueMigrateMemObjectEXT_t}
 @checked function clEnqueueMigrateMemObjectEXT(command_queue, num_mem_objects, mem_objects,
                                                flags, num_events_in_wait_list,
                                                event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueMigrateMemObjectEXT(command_queue::cl_command_queue,
-                                                      num_mem_objects::cl_uint,
-                                                      mem_objects::Ptr{cl_mem},
-                                                      flags::cl_mem_migration_flags_ext,
-                                                      num_events_in_wait_list::cl_uint,
-                                                      event_wait_list::Ptr{cl_event},
-                                                      event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueMigrateMemObjectEXT(
+        command_queue::cl_command_queue,
+        num_mem_objects::cl_uint,
+        mem_objects::Ptr{cl_mem},
+        flags::cl_mem_migration_flags_ext,
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 const cl_image_pitch_info_qcom = cl_uint
@@ -1843,14 +1914,16 @@ const clGetDeviceImageInfoQCOM_fn = Ptr{clGetDeviceImageInfoQCOM_t}
 @checked function clGetDeviceImageInfoQCOM(device, image_width, image_height, image_format,
                                            param_name, param_value_size, param_value,
                                            param_value_size_ret)
-    @ext_ccall libopencl.clGetDeviceImageInfoQCOM(device::cl_device_id,
-                                                  image_width::Csize_t,
-                                                  image_height::Csize_t,
-                                                  image_format::Ptr{cl_image_format},
-                                                  param_name::cl_image_pitch_info_qcom,
-                                                  param_value_size::Csize_t,
-                                                  param_value::Ptr{Cvoid},
-                                                  param_value_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetDeviceImageInfoQCOM(
+        device::cl_device_id,
+        image_width::Csize_t,
+        image_height::Csize_t,
+        image_format::Ptr{cl_image_format},
+        param_name::cl_image_pitch_info_qcom,
+        param_value_size::Csize_t,
+        param_value::Ptr{Cvoid},
+        param_value_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 struct _cl_mem_ion_host_ptr
@@ -1881,23 +1954,27 @@ const clEnqueueReleaseGrallocObjectsIMG_fn = Ptr{clEnqueueReleaseGrallocObjectsI
 @checked function clEnqueueAcquireGrallocObjectsIMG(command_queue, num_objects, mem_objects,
                                                     num_events_in_wait_list,
                                                     event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueAcquireGrallocObjectsIMG(command_queue::cl_command_queue,
-                                                           num_objects::cl_uint,
-                                                           mem_objects::Ptr{cl_mem},
-                                                           num_events_in_wait_list::cl_uint,
-                                                           event_wait_list::Ptr{cl_event},
-                                                           event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueAcquireGrallocObjectsIMG(
+        command_queue::cl_command_queue,
+        num_objects::cl_uint,
+        mem_objects::Ptr{cl_mem},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueReleaseGrallocObjectsIMG(command_queue, num_objects, mem_objects,
                                                     num_events_in_wait_list,
                                                     event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueReleaseGrallocObjectsIMG(command_queue::cl_command_queue,
-                                                           num_objects::cl_uint,
-                                                           mem_objects::Ptr{cl_mem},
-                                                           num_events_in_wait_list::cl_uint,
-                                                           event_wait_list::Ptr{cl_event},
-                                                           event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueReleaseGrallocObjectsIMG(
+        command_queue::cl_command_queue,
+        num_objects::cl_uint,
+        mem_objects::Ptr{cl_mem},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 const cl_mipmap_filter_mode_img = cl_uint
@@ -1911,14 +1988,16 @@ const clEnqueueGenerateMipmapIMG_fn = Ptr{clEnqueueGenerateMipmapIMG_t}
                                              mipmap_filter_mode, array_region, mip_region,
                                              num_events_in_wait_list, event_wait_list,
                                              event)
-    @ext_ccall libopencl.clEnqueueGenerateMipmapIMG(command_queue::cl_command_queue,
-                                                    src_image::cl_mem, dst_image::cl_mem,
-                                                    mipmap_filter_mode::cl_mipmap_filter_mode_img,
-                                                    array_region::Ptr{Csize_t},
-                                                    mip_region::Ptr{Csize_t},
-                                                    num_events_in_wait_list::cl_uint,
-                                                    event_wait_list::Ptr{cl_event},
-                                                    event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueGenerateMipmapIMG(
+        command_queue::cl_command_queue,
+        src_image::cl_mem, dst_image::cl_mem,
+        mipmap_filter_mode::cl_mipmap_filter_mode_img,
+        array_region::Ptr{Csize_t},
+        mip_region::Ptr{Csize_t},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 # typedef cl_int CL_API_CALL clGetKernelSubGroupInfoKHR_t ( cl_kernel in_kernel , cl_device_id in_device , cl_kernel_sub_group_info param_name , size_t input_value_size , const void * input_value , size_t param_value_size , void * param_value , size_t * param_value_size_ret )
@@ -1930,14 +2009,16 @@ const clGetKernelSubGroupInfoKHR_fn = Ptr{clGetKernelSubGroupInfoKHR_t}
                                              input_value_size, input_value,
                                              param_value_size, param_value,
                                              param_value_size_ret)
-    @ext_ccall libopencl.clGetKernelSubGroupInfoKHR(in_kernel::cl_kernel,
-                                                    in_device::cl_device_id,
-                                                    param_name::cl_kernel_sub_group_info,
-                                                    input_value_size::Csize_t,
-                                                    input_value::Ptr{Cvoid},
-                                                    param_value_size::Csize_t,
-                                                    param_value::Ptr{Cvoid},
-                                                    param_value_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetKernelSubGroupInfoKHR(
+        in_kernel::cl_kernel,
+        in_device::cl_device_id,
+        param_name::cl_kernel_sub_group_info,
+        input_value_size::Csize_t,
+        input_value::Ptr{Cvoid},
+        param_value_size::Csize_t,
+        param_value::Ptr{Cvoid},
+        param_value_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 const cl_queue_priority_khr = cl_uint
@@ -1970,12 +2051,14 @@ const clGetKernelSuggestedLocalWorkSizeKHR_fn = Ptr{clGetKernelSuggestedLocalWor
 @checked function clGetKernelSuggestedLocalWorkSizeKHR(command_queue, kernel, work_dim,
                                                        global_work_offset, global_work_size,
                                                        suggested_local_work_size)
-    @ext_ccall libopencl.clGetKernelSuggestedLocalWorkSizeKHR(command_queue::cl_command_queue,
-                                                              kernel::cl_kernel,
-                                                              work_dim::cl_uint,
-                                                              global_work_offset::Ptr{Csize_t},
-                                                              global_work_size::Ptr{Csize_t},
-                                                              suggested_local_work_size::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetKernelSuggestedLocalWorkSizeKHR(
+        command_queue::cl_command_queue,
+        kernel::cl_kernel,
+        work_dim::cl_uint,
+        global_work_offset::Ptr{Csize_t},
+        global_work_size::Ptr{Csize_t},
+        suggested_local_work_size::Ptr{Csize_t}
+    )::cl_int
 end
 
 const cl_device_integer_dot_product_capabilities_khr = cl_bitfield
@@ -2007,24 +2090,28 @@ const clEnqueueReleaseExternalMemObjectsKHR_fn = Ptr{clEnqueueReleaseExternalMem
                                                         mem_objects,
                                                         num_events_in_wait_list,
                                                         event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueAcquireExternalMemObjectsKHR(command_queue::cl_command_queue,
-                                                               num_mem_objects::cl_uint,
-                                                               mem_objects::Ptr{cl_mem},
-                                                               num_events_in_wait_list::cl_uint,
-                                                               event_wait_list::Ptr{cl_event},
-                                                               event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueAcquireExternalMemObjectsKHR(
+        command_queue::cl_command_queue,
+        num_mem_objects::cl_uint,
+        mem_objects::Ptr{cl_mem},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueReleaseExternalMemObjectsKHR(command_queue, num_mem_objects,
                                                         mem_objects,
                                                         num_events_in_wait_list,
                                                         event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueReleaseExternalMemObjectsKHR(command_queue::cl_command_queue,
-                                                               num_mem_objects::cl_uint,
-                                                               mem_objects::Ptr{cl_mem},
-                                                               num_events_in_wait_list::cl_uint,
-                                                               event_wait_list::Ptr{cl_event},
-                                                               event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueReleaseExternalMemObjectsKHR(
+        command_queue::cl_command_queue,
+        num_mem_objects::cl_uint,
+        mem_objects::Ptr{cl_mem},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 mutable struct _cl_semaphore_khr end
@@ -2040,12 +2127,14 @@ const clGetSemaphoreHandleForTypeKHR_fn = Ptr{clGetSemaphoreHandleForTypeKHR_t}
 
 @checked function clGetSemaphoreHandleForTypeKHR(sema_object, device, handle_type,
                                                  handle_size, handle_ptr, handle_size_ret)
-    @ext_ccall libopencl.clGetSemaphoreHandleForTypeKHR(sema_object::cl_semaphore_khr,
-                                                        device::cl_device_id,
-                                                        handle_type::cl_external_semaphore_handle_type_khr,
-                                                        handle_size::Csize_t,
-                                                        handle_ptr::Ptr{Cvoid},
-                                                        handle_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetSemaphoreHandleForTypeKHR(
+        sema_object::cl_semaphore_khr,
+        device::cl_device_id,
+        handle_type::cl_external_semaphore_handle_type_khr,
+        handle_size::Csize_t,
+        handle_ptr::Ptr{Cvoid},
+        handle_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 const cl_semaphore_reimport_properties_khr = cl_properties
@@ -2056,9 +2145,11 @@ const clReImportSemaphoreSyncFdKHR_t = Cvoid
 const clReImportSemaphoreSyncFdKHR_fn = Ptr{clReImportSemaphoreSyncFdKHR_t}
 
 @checked function clReImportSemaphoreSyncFdKHR(sema_object, reimport_props, fd)
-    @ext_ccall libopencl.clReImportSemaphoreSyncFdKHR(sema_object::cl_semaphore_khr,
-                                                      reimport_props::Ptr{cl_semaphore_reimport_properties_khr},
-                                                      fd::Cint)::cl_int
+    @ext_ccall libopencl.clReImportSemaphoreSyncFdKHR(
+        sema_object::cl_semaphore_khr,
+        reimport_props::Ptr{cl_semaphore_reimport_properties_khr},
+        fd::Cint
+    )::cl_int
 end
 
 const cl_semaphore_properties_khr = cl_properties
@@ -2100,43 +2191,51 @@ const clRetainSemaphoreKHR_t = Cvoid
 const clRetainSemaphoreKHR_fn = Ptr{clRetainSemaphoreKHR_t}
 
 function clCreateSemaphoreWithPropertiesKHR(context, sema_props, errcode_ret)
-    @ext_ccall libopencl.clCreateSemaphoreWithPropertiesKHR(context::cl_context,
-                                                            sema_props::Ptr{cl_semaphore_properties_khr},
-                                                            errcode_ret::Ptr{cl_int})::cl_semaphore_khr
+    return @ext_ccall libopencl.clCreateSemaphoreWithPropertiesKHR(
+        context::cl_context,
+        sema_props::Ptr{cl_semaphore_properties_khr},
+        errcode_ret::Ptr{cl_int}
+    )::cl_semaphore_khr
 end
 
 @checked function clEnqueueWaitSemaphoresKHR(command_queue, num_sema_objects, sema_objects,
                                              sema_payload_list, num_events_in_wait_list,
                                              event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueWaitSemaphoresKHR(command_queue::cl_command_queue,
-                                                    num_sema_objects::cl_uint,
-                                                    sema_objects::Ptr{cl_semaphore_khr},
-                                                    sema_payload_list::Ptr{cl_semaphore_payload_khr},
-                                                    num_events_in_wait_list::cl_uint,
-                                                    event_wait_list::Ptr{cl_event},
-                                                    event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueWaitSemaphoresKHR(
+        command_queue::cl_command_queue,
+        num_sema_objects::cl_uint,
+        sema_objects::Ptr{cl_semaphore_khr},
+        sema_payload_list::Ptr{cl_semaphore_payload_khr},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueSignalSemaphoresKHR(command_queue, num_sema_objects,
                                                sema_objects, sema_payload_list,
                                                num_events_in_wait_list, event_wait_list,
                                                event)
-    @ext_ccall libopencl.clEnqueueSignalSemaphoresKHR(command_queue::cl_command_queue,
-                                                      num_sema_objects::cl_uint,
-                                                      sema_objects::Ptr{cl_semaphore_khr},
-                                                      sema_payload_list::Ptr{cl_semaphore_payload_khr},
-                                                      num_events_in_wait_list::cl_uint,
-                                                      event_wait_list::Ptr{cl_event},
-                                                      event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueSignalSemaphoresKHR(
+        command_queue::cl_command_queue,
+        num_sema_objects::cl_uint,
+        sema_objects::Ptr{cl_semaphore_khr},
+        sema_payload_list::Ptr{cl_semaphore_payload_khr},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clGetSemaphoreInfoKHR(sema_object, param_name, param_value_size,
                                         param_value, param_value_size_ret)
-    @ext_ccall libopencl.clGetSemaphoreInfoKHR(sema_object::cl_semaphore_khr,
-                                               param_name::cl_semaphore_info_khr,
-                                               param_value_size::Csize_t,
-                                               param_value::Ptr{Cvoid},
-                                               param_value_size_ret::Ptr{Csize_t})::cl_int
+    @ext_ccall libopencl.clGetSemaphoreInfoKHR(
+        sema_object::cl_semaphore_khr,
+        param_name::cl_semaphore_info_khr,
+        param_value_size::Csize_t,
+        param_value::Ptr{Cvoid},
+        param_value_size_ret::Ptr{Csize_t}
+    )::cl_int
 end
 
 @checked function clReleaseSemaphoreKHR(sema_object)
@@ -2155,10 +2254,12 @@ const clImportMemoryARM_t = Cvoid
 const clImportMemoryARM_fn = Ptr{clImportMemoryARM_t}
 
 function clImportMemoryARM(context, flags, properties, memory, size, errcode_ret)
-    @ext_ccall libopencl.clImportMemoryARM(context::cl_context, flags::cl_mem_flags,
-                                           properties::Ptr{cl_import_properties_arm},
-                                           memory::Ptr{Cvoid}, size::Csize_t,
-                                           errcode_ret::Ptr{cl_int})::cl_mem
+    return @ext_ccall libopencl.clImportMemoryARM(
+        context::cl_context, flags::cl_mem_flags,
+        properties::Ptr{cl_import_properties_arm},
+        memory::Ptr{Cvoid}, size::Csize_t,
+        errcode_ret::Ptr{cl_int}
+    )::cl_mem
 end
 
 const cl_svm_mem_flags_arm = cl_bitfield
@@ -2213,78 +2314,94 @@ const clSetKernelExecInfoARM_t = Cvoid
 const clSetKernelExecInfoARM_fn = Ptr{clSetKernelExecInfoARM_t}
 
 function clSVMAllocARM(context, flags, size, alignment)
-    @ext_ccall libopencl.clSVMAllocARM(context::cl_context, flags::cl_svm_mem_flags_arm,
-                                       size::Csize_t, alignment::cl_uint)::Ptr{Cvoid}
+    return @ext_ccall libopencl.clSVMAllocARM(
+        context::cl_context, flags::cl_svm_mem_flags_arm,
+        size::Csize_t, alignment::cl_uint
+    )::Ptr{Cvoid}
 end
 
 function clSVMFreeARM(context, svm_pointer)
-    @ext_ccall libopencl.clSVMFreeARM(context::cl_context, svm_pointer::Ptr{Cvoid})::Cvoid
+    return @ext_ccall libopencl.clSVMFreeARM(context::cl_context, svm_pointer::Ptr{Cvoid})::Cvoid
 end
 
 @checked function clEnqueueSVMFreeARM(command_queue, num_svm_pointers, svm_pointers,
                                       pfn_free_func, user_data, num_events_in_wait_list,
                                       event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueSVMFreeARM(command_queue::cl_command_queue,
-                                             num_svm_pointers::cl_uint,
-                                             svm_pointers::Ptr{Ptr{Cvoid}},
-                                             pfn_free_func::Ptr{Cvoid},
-                                             user_data::Ptr{Cvoid},
-                                             num_events_in_wait_list::cl_uint,
-                                             event_wait_list::Ptr{cl_event},
-                                             event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueSVMFreeARM(
+        command_queue::cl_command_queue,
+        num_svm_pointers::cl_uint,
+        svm_pointers::Ptr{Ptr{Cvoid}},
+        pfn_free_func::Ptr{Cvoid},
+        user_data::Ptr{Cvoid},
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueSVMMemcpyARM(command_queue, blocking_copy, dst_ptr, src_ptr,
                                         size, num_events_in_wait_list, event_wait_list,
                                         event)
-    @ext_ccall libopencl.clEnqueueSVMMemcpyARM(command_queue::cl_command_queue,
-                                               blocking_copy::cl_bool, dst_ptr::Ptr{Cvoid},
-                                               src_ptr::Ptr{Cvoid}, size::Csize_t,
-                                               num_events_in_wait_list::cl_uint,
-                                               event_wait_list::Ptr{cl_event},
-                                               event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueSVMMemcpyARM(
+        command_queue::cl_command_queue,
+        blocking_copy::cl_bool, dst_ptr::Ptr{Cvoid},
+        src_ptr::Ptr{Cvoid}, size::Csize_t,
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueSVMMemFillARM(command_queue, svm_ptr, pattern, pattern_size,
                                          size, num_events_in_wait_list, event_wait_list,
                                          event)
-    @ext_ccall libopencl.clEnqueueSVMMemFillARM(command_queue::cl_command_queue,
-                                                svm_ptr::Ptr{Cvoid}, pattern::Ptr{Cvoid},
-                                                pattern_size::Csize_t, size::Csize_t,
-                                                num_events_in_wait_list::cl_uint,
-                                                event_wait_list::Ptr{cl_event},
-                                                event::Ptr{cl_event})::cl_int
+    @ext_ccall libopencl.clEnqueueSVMMemFillARM(
+        command_queue::cl_command_queue,
+        svm_ptr::Ptr{Cvoid}, pattern::Ptr{Cvoid},
+        pattern_size::Csize_t, size::Csize_t,
+        num_events_in_wait_list::cl_uint,
+        event_wait_list::Ptr{cl_event},
+        event::Ptr{cl_event}
+    )::cl_int
 end
 
 @checked function clEnqueueSVMMapARM(command_queue, blocking_map, flags, svm_ptr, size,
                                      num_events_in_wait_list, event_wait_list, event)
-    @ext_ccall libopencl.clEnqueueSVMMapARM(command_queue::cl_command_queue,
-                                            blocking_map::cl_bool, flags::cl_map_flags,
-                     ...*[Comment body truncated]*

@maleadt
Copy link
Member

maleadt commented Jan 20, 2025

The Memoization.jl dependency isn't great, also because it doesn't offer us thread safety (unless when using a ridiculously expensive datatype, like ThreadSafeDicts.jl). I wonder if we should copy the CUDA.jl LazyInitialized/@memoize implementation. Alternatively, for now we could just put the buffer type in the task local state, initialized when switching devices.

I'll have a closer look tomorrow.

@maleadt
Copy link
Member

maleadt commented Jan 23, 2025

Doesn't work on Intel, ironically:

❯ jl --project -e 'using OpenCL; cl.platform!("intel"); CLArray([1]) .+ 1'
Precompiling OpenCL...
  1 dependency successfully precompiled in 3 seconds. 64 already precompiled.
ERROR: CLError(code=-30, CL_INVALID_VALUE)
Stacktrace:
  [1] throw_api_error(res::Int32)
    @ OpenCL.cl ~/Julia/pkg/OpenCL/lib/cl/libopencl.jl:6
  [2] check
    @ ~/Julia/pkg/OpenCL/lib/cl/libopencl.jl:17 [inlined]
  [3] clSetKernelExecInfo
    @ ~/Julia/pkg/OpenCL/lib/cl/api.jl:34 [inlined]

@VarLad
Copy link
Author

VarLad commented Jan 23, 2025

I don't own anything Intel so can't help debug/test this one...

Does SVM work here though? CLVector{Int64, cl.SharedVirtualMemory}([1]) .+ 1

@maleadt
Copy link
Member

maleadt commented Jan 23, 2025

No; turns out we weren't properly pushing to svm_pointers on master, so it didn't happen to call clSetKernelExecInfo.

@maleadt
Copy link
Member

maleadt commented Jan 23, 2025

This is the failing code path: https://github.com/intel/compute-runtime/blob/e48b52814958331d46bc6fb6398d6793fdc65049/opencl/source/api/api.cpp#L5300-L5302

Somehow the allocated pointer isn't recognized by the driver.

@maleadt maleadt dismissed their stale review January 23, 2025 21:23

Outdated.

@maleadt maleadt changed the title Add USM support to OpenCL.jl Revamp memory management, and add USM support. Jan 23, 2025
@VarLad
Copy link
Author

VarLad commented Jan 24, 2025

and this doesn't work for USM either?
Can we open a bug report on their side for this? It works with POCL and heck even with NVIDIA drivers...
Also, is your platform an Intel CPU or an Intel GPU? I think their CPU drivers might be problematic...
I'll try getting my hand on an Intel iGPU.

@maleadt
Copy link
Member

maleadt commented Jan 24, 2025

Somehow the allocated pointer isn't recognized by the driver.

Found the issue; we were using an abstract CLPtr[] resulting in boxed objects getting sent to the driver.

@maleadt
Copy link
Member

maleadt commented Jan 24, 2025

Alright, I think I finally got this in a place I'm happy with. It took a lot more work to get all the pieces (calling OpenCL extension functions, introduction of CLPtr, Managed and auto-synchronization, USM on PoCL as well as Intel, SVM and USM co-existence, etc) fully working and in a maintainable shape.

@VarLad Thanks for laying the groundwork here! For future work, I would suggest however to keep the scope of PRs minimal and create multiple ones. That will make it much easier to review as well as get everything working.

Feel free to give this a round of review before merging.

@VarLad
Copy link
Author

VarLad commented Jan 24, 2025

Does a version bump make sense for this PR? 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants