Trying Nerf with CUDA and Metal #13

Open
anandijain opened this issue Jun 15, 2023 · 2 comments

Thanks for this project. I'm stopping by to document my experience running it on two different systems.

On the first system I set the backend to CUDA, and pretty much everything works:

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, broadwell)
  Threads: 8 on 12 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8


Precompiling project...
  1 dependency successfully precompiled in 18 seconds. 154 already precompiled.
  1 dependency had warnings during precompilation:
┌ Nerf [2c86e8b6-813a-40f3-97f9-c72f78886291]
│  [ Info: [Nerf.jl] Backend: CUDA
│  [ Info: [Nerf.jl] Device: CUDA.CUDAKernels.CUDABackend(false, false)
└  
     Testing Running tests...
[ Info: [Nerf.jl] Testing on backend: CUDA.CUDAKernels.CUDABackend(false, false)
Check ray samples span: Error During Test at /home/anandijain/.julia/dev/Nerf/test/sampler.jl:1
  Got exception outside of a @test
  MethodError: no method matching Nerf.RayBundle(::Nerf.OccupancyGrid{CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}}, ::Nerf.Cone, ::Nerf.BBox, ::CUDA.CuArray{SMatrix{3, 3, Float32, 9}, 1, CUDA.Mem.DeviceBuffer}, ::CUDA.CuArray{SVector{3, Float32}, 1, CUDA.Mem.DeviceBuffer}, ::Nerf.CameraIntrinsics{true}; n_rays::Int64, rng_state::UInt64)
  
  Closest candidates are:
    Nerf.RayBundle(::Any; n_rays, n_steps) got unsupported keyword argument "rng_state"
     @ Nerf ~/.julia/dev/Nerf/src/sampler.jl:70

The second system is an M1. It doesn't seem like Metal.jl is supported yet, so I added it as a backend. I ran into some resolver issues when instantiating on Julia master (I confirmed that it resolves fine on 1.9.1, though). I have the following diff:

julia> versioninfo()
Julia Version 1.10.0-DEV.1397
Commit ed5bd4c9553 (2023-05-30 06:09 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 13 on 4 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8
diff --git a/Project.toml b/Project.toml
index 9355941..2d12a26 100644
--- a/Project.toml
+++ b/Project.toml
@@ -4,7 +4,6 @@ authors = ["Anton Smirnov <[email protected]>"]
 version = "0.1.0"
 
 [deps]
-AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
 Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
 BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
 CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
@@ -17,6 +16,7 @@ JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
 JpegTurbo = "b835a17e-a41a-41e7-81f0-2f016b05efe0"
 KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+Metal = "dde4c033-4e86-420c-a63e-0dd931031962"
 Preferences = "21216c6a-2e73-6563-6e65-726566657250"
 Quaternions = "94ee1d12-ae83-5a48-8b1c-48b8ff168ae0"
 Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
@@ -26,6 +26,5 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
 
 [compat]
-AMDGPU = "0.4"
 KernelAbstractions = "0.9"
 Zygote = "0.6.55"
diff --git a/src/kautils.jl b/src/kautils.jl
index 09b53f7..e0b93bd 100644
--- a/src/kautils.jl
+++ b/src/kautils.jl
@@ -1,16 +1,14 @@
-# Supported values are: AMD, CUDA.
-const BACKEND_NAME::String = @load_preference("backend", "AMD")
+# Supported values are: AMD, CUDA, Metal.
+const BACKEND_NAME::String = @load_preference("backend", "Metal")
 
-@static if BACKEND_NAME == "AMD"
-    using AMDGPU
-    AMDGPU.allowscalar(false)
-    const Backend::ROCBackend = ROCBackend()
-
-    Base.rand(::ROCBackend, ::Type{T}, shape) where T = AMDGPU.rand(T, shape)
-elseif BACKEND_NAME == "CUDA"
+@static if BACKEND_NAME == "CUDA"
     using CUDA
     CUDA.allowscalar(false)
     const Backend::CUDABackend = CUDABackend()
 
     Base.rand(::CUDABackend, ::Type{T}, shape) where T = CUDA.rand(T, shape)
+elseif BACKEND_NAME == "Metal"
+    using Metal
+    const Backend::MetalBackend = MetalBackend()
+    Base.rand(::MetalBackend, ::Type{T}, shape) where T = Metal.rand(T, shape)
 end
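
Since the backend is read with `@load_preference("backend", ...)`, it can probably also be selected without editing the default in `src/kautils.jl`. This is a minimal sketch, assuming Preferences.jl is available in the active environment and that the active project is Nerf itself or lists it as a dependency. `use_metal_backend!` is a hypothetical helper name, not part of Nerf.jl.

```julia
using Preferences, UUIDs

# UUID of Nerf, copied from the precompilation log in this issue.
const NERF_UUID = UUID("2c86e8b6-813a-40f3-97f9-c72f78886291")

# Hypothetical helper: records `backend = "Metal"` for Nerf in the
# LocalPreferences.toml next to the active Project.toml.
# `force = true` overwrites a previously stored value.
function use_metal_backend!()
    set_preferences!(NERF_UUID, "backend" => "Metal"; force = true)
end
```

The preference is read at precompile time, so Julia needs a restart (and Nerf a recompile) before a new value takes effect.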
(Nerf) pkg> add AMDGPU
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package AMDGPU [21141c5a]:
 AMDGPU [21141c5a] log:
 ├─possible versions are: 0.1.0-0.4.14 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions: 0.1.0-0.4.14
 ├─restricted by compatibility requirements with Adapt [79e6a3ab] to versions: 0.2.2-0.4.14 or uninstalled, leaving only versions: 0.2.2-0.4.14
 │ └─Adapt [79e6a3ab] log:
 │   ├─possible versions are: 0.3.0-3.6.2 or uninstalled
 │   ├─restricted to versions * by Nerf [2c86e8b6], leaving only versions: 0.3.0-3.6.2
 │   │ └─Nerf [2c86e8b6] log:
 │   │   ├─possible versions are: 0.1.0 or uninstalled
 │   │   └─Nerf [2c86e8b6] is fixed to version 0.1.0
 │   ├─restricted by compatibility requirements with CUDA [052768ef] to versions: 1.0.0-3.6.2
 │   │ └─CUDA [052768ef] log:
 │   │   ├─possible versions are: 0.1.0-4.3.2 or uninstalled
 │   │   ├─restricted to versions * by Nerf [2c86e8b6], leaving only versions: 0.1.0-4.3.2
 │   │   │ └─Nerf [2c86e8b6] log: see above
 │   │   ├─restricted by julia compatibility requirements to versions: [2.3.0, 2.5.0-4.3.2] or uninstalled, leaving only versions: [2.3.0, 2.5.0-4.3.2]
 │   │   ├─restricted by compatibility requirements with Adapt [79e6a3ab] to versions: 2.4.0-4.3.2 or uninstalled, leaving only versions: 2.5.0-4.3.2
 │   │   │ └─Adapt [79e6a3ab] log: see above
 │   │   └─restricted by compatibility requirements with GPUCompiler [61eb1bfa] to versions: [3.11.0-4.0.1, 4.1.3-4.2.0] or uninstalled, leaving only versions: [3.11.0-4.0.1, 4.1.3-4.2.0]
 │   │     └─GPUCompiler [61eb1bfa] log:
 │   │       ├─possible versions are: 0.1.0-0.21.0 or uninstalled
 │   │       ├─restricted by compatibility requirements with CUDA [052768ef] to versions: [0.3.0-0.7.3, 0.8.1-0.10.0, 0.11.1-0.12.9, 0.13.3-0.20.3]
 │   │       │ └─CUDA [052768ef] log: see above
 │   │       ├─restricted by compatibility requirements with Metal [dde4c033] to versions: [0.16.0-0.17.3, 0.19.0-0.20.3]
 │   │       │ └─Metal [dde4c033] log:
 │   │       │   ├─possible versions are: 0.0.1-0.4.1 or uninstalled
 │   │       │   ├─restricted to versions * by Nerf [2c86e8b6], leaving only versions: 0.0.1-0.4.1
 │   │       │   │ └─Nerf [2c86e8b6] log: see above
 │   │       │   └─restricted by compatibility requirements with GPUCompiler [61eb1bfa] to versions: 0.0.1-0.3.0 or uninstalled, leaving only versions: 0.0.1-0.3.0
 │   │       │     └─GPUCompiler [61eb1bfa] log: see above
 │   │       └─restricted by compatibility requirements with AMDGPU [21141c5a] to versions: [0.4.0-0.5.5, 0.7.0-0.17.3, 0.19.0-0.19.4], leaving only versions: [0.16.0-0.17.3, 0.19.0-0.19.4]
 │   │         └─AMDGPU [21141c5a] log: see above
 │   ├─restricted by compatibility requirements with Metal [dde4c033] to versions: 3.0.0-3.6.2
 │   │ └─Metal [dde4c033] log: see above
 │   └─restricted by compatibility requirements with ChainRules [082447d4] to versions: 3.4.0-3.6.2
 │     └─ChainRules [082447d4] log:
 │       ├─possible versions are: 0.0.1-1.51.0 or uninstalled
 │       └─restricted by compatibility requirements with Zygote [e88e6eb3] to versions: 1.44.1-1.51.0
 │         └─Zygote [e88e6eb3] log:
 │           ├─possible versions are: 0.1.0-0.6.62 or uninstalled
 │           └─restricted to versions 0.6.55-0.6 by Nerf [2c86e8b6], leaving only versions: 0.6.55-0.6.62
 │             └─Nerf [2c86e8b6] log: see above
 ├─restricted by compatibility requirements with GPUCompiler [61eb1bfa] to versions: 0.4.0-0.4.14 or uninstalled, leaving only versions: 0.4.0-0.4.14
 │ └─GPUCompiler [61eb1bfa] log: see above
 └─restricted by compatibility requirements with LLD_jll [d55e3150] to versions: 0.1.0-0.3.7 or uninstalled — no versions left
   └─LLD_jll [d55e3150] log:
     └─possible versions are: 15.0.7 or uninstalled

The test results for this were:

Precompiling project...
  59 dependencies successfully precompiled in 52 seconds. 66 already precompiled.
  2 dependencies had warnings during precompilation:
┌ ImageMagick [6218d12a-5da1-5696-b52f-db25d2ecc6d1]
│  WARNING: using deprecated binding Colors.RGB1 in ImageCore.
│  , use XRGB instead.
│  WARNING: using deprecated binding Colors.RGB4 in ImageCore.
│  , use RGBX instead.
└  
┌ Nerf [2c86e8b6-813a-40f3-97f9-c72f78886291]
│  [ Info: [Nerf.jl] Backend: Metal
│  [ Info: [Nerf.jl] Device: Metal.MetalKernels.MetalBackend()
└  
     Testing Running tests...
[ Info: [Nerf.jl] Testing on backend: Metal.MetalKernels.MetalBackend()
Deterministic result: Error During Test at /Users/anand/.julia/dev/Nerf/test/grid_encoding.jl:28
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for Nerf.gpu_grid_kernel!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, ::Metal.MtlDeviceArray{Float32, 3, 1}, ::Nothing, ::Metal.MtlDeviceMatrix{Float32, 1}, ::Metal.MtlDeviceMatrix{Float32, 1}, ::Metal.MtlDeviceVector{UInt32, 1}, ::Val{3}, ::Val{2}, ::UInt32, ::Float32) resulted in invalid LLVM IR
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
    [1] malloc
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:89
    [2] gc_pool_alloc
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:120
    [3] MArray
      @ ~/.julia/packages/StaticArraysCore/U2Z1K/src/StaticArraysCore.jl:180
    [4] macro expansion
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:15
    [5] _zeros
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:3
    [6] zeros
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:2
    [7] encode_grid_position
      @ ~/.julia/dev/Nerf/src/encoding/grid_utils.jl:6
    [8] macro expansion
      @ ~/.julia/dev/Nerf/src/encoding/grid_kernels.jl:14
    [9] gpu_grid_kernel!
      @ ~/.julia/packages/KernelAbstractions/LVKmi/src/macros.jl:81
   [10] gpu_grid_kernel!
      @ ./none:0
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
   [1] malloc
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:89
   [2] macro expansion
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:184
   [3] macro expansion
     @ ./none:0
   [4] box
     @ ./none:0
   [5] box_float32
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:213
   [6] multiple call sites
     @ unknown:0
  Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
  Stacktrace:
    [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, args::LLVM.Module)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
    [2] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
    [3] macro expansion
      @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
    [4] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
    [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
    [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
    [7] kwcall(::NamedTuple, ::typeof(GPUCompiler.codegen), output::Symbol, job::GPUCompiler.CompilerJob)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
    [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
    [9] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:59
   [10] (::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:55 [inlined]
   [11] LLVM.ThreadSafeContext(f::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
      @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
   [12] JuliaContext
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
   [13] compile
      @ GPUCompiler ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:54 [inlined]
   [14] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
   [15] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
   [16] macro expansion
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:162 [inlined]
   [17] macro expansion
      @ Metal ./lock.jl:267 [inlined]
   [18] mtlfunction(f::typeof(Nerf.gpu_grid_kernel!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, Metal.MtlDeviceArray{Float32, 3, 1}, Nothing, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceVector{UInt32, 1}, Val{3}, Val{2}, UInt32, Float32}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:157
   [19] mtlfunction(f::typeof(Nerf.gpu_grid_kernel!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, Metal.MtlDeviceArray{Float32, 3, 1}, Nothing, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceVector{UInt32, 1}, Val{3}, Val{2}, UInt32, Float32}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:155
   [20] macro expansion
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:77 [inlined]
   [21] (::KernelAbstractions.Kernel{Metal.MetalKernels.MetalBackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(Nerf.gpu_grid_kernel!)})(::Metal.MtlArray{Float32, 3}, ::Vararg{Any}; ndrange::Tuple{Int64, UInt32}, workgroupsize::Nothing)
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/MetalKernels.jl:105
   [22] (::Nerf.GridEncoding{Metal.MtlVector{UInt32}})(x::Metal.MtlMatrix{Float32}, θ::Metal.MtlMatrix{Float32})
      @ Nerf ~/.julia/dev/Nerf/src/encoding/grid.jl:73
   [23] macro expansion
      @ ~/.julia/dev/Nerf/test/grid_encoding.jl:34 [inlined]
   [24] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [25] top-level scope
      @ ~/.julia/dev/Nerf/test/grid_encoding.jl:29
   [26] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [27] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:25 [inlined]
   [28] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [29] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:25 [inlined]
   [30] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [31] top-level scope
      @ ~/.julia/dev/Nerf/test/runtests.jl:18
   [32] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [33] top-level scope
      @ none:6
   [34] eval(m::Module, e::Expr)
      @ Core ./boot.jl:383 [inlined]
   [35] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [36] _start()
      @ Base ./client.jl:541
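
The first `gpu_malloc` reason in the trace above goes through `zeros` and `MArray` inside `encode_grid_position`, so the mutable static array looks like the allocation Metal rejects. Below is a sketch of one possible workaround, assuming the per-element work can be written as a function of the index. `scale_coordinate`, `encode_mutable` and `encode_immutable` are hypothetical stand-ins, not Nerf.jl code, and I have not checked this against the Metal compiler.

```julia
using StaticArrays

# Hypothetical stand-in for whatever is computed per dimension in the kernel.
scale_coordinate(x::Float32, i::Int) = x * Float32(i)

# Mutable pattern: `zeros(MVector{N,Float32})` creates an MArray, which is
# the call that appears in the stack trace.
function encode_mutable(x::SVector{N,Float32}) where {N}
    out = zeros(MVector{N,Float32})
    for i in 1:N
        out[i] = scale_coordinate(x[i], i)
    end
    return SVector(out)
end

# Immutable pattern: build the result in one expression from a tuple,
# so no mutable container is created at all.
function encode_immutable(x::SVector{N,Float32}) where {N}
    return SVector{N,Float32}(ntuple(i -> scale_coordinate(x[i], i), Val(N)))
end
```

Both functions return the same `SVector` on the CPU. Only the second one avoids constructing an `MArray`, which may be enough to keep the kernel free of heap allocation.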
Hashgrid gradients: Error During Test at /Users/anand/.julia/dev/Nerf/test/grid_encoding.jl:38
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for Nerf.gpu_grid_kernel!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, ::Metal.MtlDeviceArray{Float32, 3, 1}, ::Nothing, ::Metal.MtlDeviceMatrix{Float32, 1}, ::Metal.MtlDeviceMatrix{Float32, 1}, ::Metal.MtlDeviceVector{UInt32, 1}, ::Val{3}, ::Val{2}, ::UInt32, ::Float32) resulted in invalid LLVM IR
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
    [1] malloc
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:89
    [2] gc_pool_alloc
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:120
    [3] MArray
      @ ~/.julia/packages/StaticArraysCore/U2Z1K/src/StaticArraysCore.jl:180
    [4] macro expansion
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:15
    [5] _zeros
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:3
    [6] zeros
      @ ~/.julia/packages/StaticArrays/O6dgq/src/arraymath.jl:2
    [7] encode_grid_position
      @ ~/.julia/dev/Nerf/src/encoding/grid_utils.jl:6
    [8] macro expansion
      @ ~/.julia/dev/Nerf/src/encoding/grid_kernels.jl:14
    [9] gpu_grid_kernel!
      @ ~/.julia/packages/KernelAbstractions/LVKmi/src/macros.jl:81
   [10] gpu_grid_kernel!
      @ ./none:0
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
   [1] malloc
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:89
   [2] macro expansion
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:184
   [3] macro expansion
     @ ./none:0
   [4] box
     @ ./none:0
   [5] box_float32
     @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:213
   [6] multiple call sites
     @ unknown:0
  Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
  Stacktrace:
    [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, args::LLVM.Module)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
    [2] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
    [3] macro expansion
      @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
    [4] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
    [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
    [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
    [7] kwcall(::NamedTuple, ::typeof(GPUCompiler.codegen), output::Symbol, job::GPUCompiler.CompilerJob)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
    [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
    [9] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:59
   [10] (::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:55 [inlined]
   [11] LLVM.ThreadSafeContext(f::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
      @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
   [12] JuliaContext
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
   [13] compile
      @ GPUCompiler ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:54 [inlined]
   [14] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
   [15] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
   [16] macro expansion
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:162 [inlined]
   [17] macro expansion
      @ Metal ./lock.jl:267 [inlined]
   [18] mtlfunction(f::typeof(Nerf.gpu_grid_kernel!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, Metal.MtlDeviceArray{Float32, 3, 1}, Nothing, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceVector{UInt32, 1}, Val{3}, Val{2}, UInt32, Float32}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:157
   [19] mtlfunction(f::typeof(Nerf.gpu_grid_kernel!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, Metal.MtlDeviceArray{Float32, 3, 1}, Nothing, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceMatrix{Float32, 1}, Metal.MtlDeviceVector{UInt32, 1}, Val{3}, Val{2}, UInt32, Float32}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:155
   [20] macro expansion
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:77 [inlined]
   [21] (::KernelAbstractions.Kernel{Metal.MetalKernels.MetalBackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(Nerf.gpu_grid_kernel!)})(::Metal.MtlArray{Float32, 3}, ::Vararg{Any}; ndrange::Tuple{Int64, UInt32}, workgroupsize::Nothing)
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/MetalKernels.jl:105
   [22] (::Nerf.GridEncoding{Metal.MtlVector{UInt32}})(x::Metal.MtlMatrix{Float32}, θ::Metal.MtlMatrix{Float32})
      @ Nerf ~/.julia/dev/Nerf/src/encoding/grid.jl:73
   [23] rrule(ge::Nerf.GridEncoding{Metal.MtlVector{UInt32}}, x::Metal.MtlMatrix{Float32}, θ::Metal.MtlMatrix{Float32})
      @ Nerf ~/.julia/dev/Nerf/src/encoding/grid.jl:120 [inlined]
   [24] rrule(::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Nerf.GridEncoding{Metal.MtlVector{UInt32}}, ::Metal.MtlMatrix{Float32}, ::Metal.MtlMatrix{Float32})
      @ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/rules.jl:134 [inlined]
   [25] chain_rrule(::Zygote.ZygoteRuleConfig{Zygote.Context{false}}, ::Nerf.GridEncoding{Metal.MtlVector{UInt32}}, ::Metal.MtlMatrix{Float32}, ::Metal.MtlMatrix{Float32})
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/chainrules.jl:223 [inlined]
   [26] macro expansion
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0 [inlined]
   [27] _pullback(::Zygote.Context{false}, ::Nerf.GridEncoding{Metal.MtlVector{UInt32}}, ::Metal.MtlMatrix{Float32}, ::Metal.MtlMatrix{Float32})
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:81 [inlined]
   [28] #3
      @ Zygote ~/.julia/dev/Nerf/test/grid_encoding.jl:46 [inlined]
   [29] _pullback(ctx::Zygote.Context{false}, f::var"#3#8"{Metal.MtlMatrix{Float32}, Nerf.GridEncoding{Metal.MtlVector{UInt32}}}, args::Metal.MtlMatrix{Float32})
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
   [30] pullback(f::Function, cx::Zygote.Context{false}, args::Metal.MtlMatrix{Float32})
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:44
   [31] pullback
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:42 [inlined]
   [32] gradient(f::Function, args::Metal.MtlMatrix{Float32})
      @ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:96
   [33] macro expansion
      @ ~/.julia/dev/Nerf/test/grid_encoding.jl:45 [inlined]
   [34] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [35] top-level scope
      @ ~/.julia/dev/Nerf/test/grid_encoding.jl:39
   [36] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [37] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:25 [inlined]
   [38] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [39] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:25 [inlined]
   [40] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [41] top-level scope
      @ ~/.julia/dev/Nerf/test/runtests.jl:18
   [42] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [43] top-level scope
      @ none:6
   [44] eval(m::Module, e::Expr)
      @ Core ./boot.jl:383 [inlined]
   [45] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [46] _start()
      @ Base ./client.jl:541
Density bitfield update: Error During Test at /Users/anand/.julia/dev/Nerf/test/occupancy.jl:1
  Got exception outside of a @test
  Scalar indexing is disallowed.
  Invocation of getindex resulted in scalar indexing of a GPU array.
  This is typically caused by calling an iterating implementation of a method.
  Such implementations *do not* execute on the GPU, but very slowly on the CPU,
  and therefore are only permitted from the REPL for prototyping purposes.
  If you did intend to index this array, annotate the caller with @allowscalar.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] assertscalar(op::String)
      @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
    [3] getindex(xs::Metal.MtlArray{Float32, 4}, I::Int64)
      @ GPUArrays ~/.julia/packages/GPUArrays/t0LfC/src/host/indexing.jl:9 [inlined]
    [4] getindex(V::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, i::Int64)
      @ Base ./subarray.jl:307 [inlined]
    [5] first(a::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true})
      @ Base ./abstractarray.jl:449 [inlined]
    [6] _mean(f::Nerf.var"#10#11", A::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, dims::Colon)
      @ Statistics /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:176
    [7] mean(f::Nerf.var"#10#11", A::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}; dims::Colon)
      @ Statistics /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:104 [inlined]
    [8] update_binary!(oc::Nerf.OccupancyGrid{Metal.MtlArray{Float32, 4}, Metal.MtlVector{UInt8}}; threshold::Float32)
      @ Nerf ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:117
    [9] update_binary!
      @ ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:114 [inlined]
   [10] macro expansion
      @ ~/.julia/dev/Nerf/test/occupancy.jl:15 [inlined]
   [11] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [12] top-level scope
      @ ~/.julia/dev/Nerf/test/occupancy.jl:2
   [13] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [14] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:34 [inlined]
   [15] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [16] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:34 [inlined]
   [17] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [18] top-level scope
      @ ~/.julia/dev/Nerf/test/runtests.jl:18
   [19] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [20] top-level scope
      @ none:6
   [21] eval(m::Module, e::Expr)
      @ Core ./boot.jl:383 [inlined]
   [22] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [23] _start()
      @ Base ./client.jl:541
Check ray samples span: Error During Test at /Users/anand/.julia/dev/Nerf/test/sampler.jl:1
  Got exception outside of a @test
  InvalidIRError: compiling MethodInstance for Nerf.gpu_generate_points!(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::Metal.MtlDeviceVector{SVector{3, Float32}, 1}, ::Metal.MtlDeviceVector{UInt32, 1}, ::UInt64, ::SubArray{Float32, 4, Metal.MtlDeviceArray{Float32, 4, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}, ::Nerf.BBox, ::Float32, ::UInt32) resulted in invalid LLVM IR
  Reason: unsupported call to an unknown function (call to gpu_malloc)
  Stacktrace:
    [1] malloc
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:89
    [2] macro expansion
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:184
    [3] macro expansion
      @ ./none:0
    [4] box
      @ ./none:0
    [5] box_float32
      @ ~/.julia/packages/GPUCompiler/NVLGB/src/runtime.jl:213
    [6] trunc
      @ ./float.jl:874
    [7] floor
      @ ./float.jl:383
    [8] macro expansion
      @ ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:203
    [9] gpu_generate_points!
      @ ~/.julia/packages/KernelAbstractions/LVKmi/src/macros.jl:81
   [10] gpu_generate_points!
      @ ./none:0
  Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
  Stacktrace:
    [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, args::LLVM.Module)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
    [2] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
    [3] macro expansion
      @ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
    [4] macro expansion
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
    [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
    [6] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
    [7] kwcall(::NamedTuple, ::typeof(GPUCompiler.codegen), output::Symbol, job::GPUCompiler.CompilerJob)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
    [8] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
    [9] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:59
   [10] (::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::LLVM.ThreadSafeContext)
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:55 [inlined]
   [11] LLVM.ThreadSafeContext(f::Metal.var"#60#61"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
      @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
   [12] JuliaContext
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
   [13] compile
      @ GPUCompiler ~/.julia/packages/Metal/9shJi/src/compiler/compilation.jl:54 [inlined]
   [14] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
   [15] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
      @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
   [16] macro expansion
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:162 [inlined]
   [17] macro expansion
      @ Metal ./lock.jl:267 [inlined]
   [18] mtlfunction(f::typeof(Nerf.gpu_generate_points!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Metal.MtlDeviceVector{SVector{3, Float32}, 1}, Metal.MtlDeviceVector{UInt32, 1}, UInt64, SubArray{Float32, 4, Metal.MtlDeviceArray{Float32, 4, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}, Nerf.BBox, Float32, UInt32}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:157
   [19] mtlfunction(f::typeof(Nerf.gpu_generate_points!), tt::Type{Tuple{KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, Metal.MtlDeviceVector{SVector{3, Float32}, 1}, Metal.MtlDeviceVector{UInt32, 1}, UInt64, SubArray{Float32, 4, Metal.MtlDeviceArray{Float32, 4, 1}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}, Nerf.BBox, Float32, UInt32}})
      @ Metal ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:155
   [20] macro expansion
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/compiler/execution.jl:77 [inlined]
   [21] (::KernelAbstractions.Kernel{Metal.MetalKernels.MetalBackend, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, typeof(Nerf.gpu_generate_points!)})(::Metal.MtlVector{SVector{3, Float32}}, ::Vararg{Any}; ndrange::Int64, workgroupsize::Nothing)
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/MetalKernels.jl:105
   [22] kwcall(::NamedTuple, obj::KernelAbstractions.Kernel{Metal.MetalKernels.MetalBackend}, args::Vararg{Any})
      @ Metal.MetalKernels ~/.julia/packages/Metal/9shJi/src/MetalKernels.jl:101 [inlined]
   [23] update!(density_eval_fn::var"#19#20", oc::Nerf.OccupancyGrid{Metal.MtlArray{Float32, 4}, Metal.MtlVector{UInt8}}; rng_state::UInt64, cone::Nerf.Cone, bbox::Nerf.BBox, step::Int64, update_frequency::Int64, n_levels::Int64, threshold::Float32, decay::Float32, warmup_steps::Int64)
      @ Nerf ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:87
   [24] macro expansion
      @ ~/.julia/dev/Nerf/test/sampler.jl:19 [inlined]
   [25] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [26] top-level scope
      @ ~/.julia/dev/Nerf/test/sampler.jl:2
   [27] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [28] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:37 [inlined]
   [29] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [30] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:37 [inlined]
   [31] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [32] top-level scope
      @ ~/.julia/dev/Nerf/test/runtests.jl:18
   [33] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [34] top-level scope
      @ none:6
   [35] eval(m::Module, e::Expr)
      @ Core ./boot.jl:383 [inlined]
   [36] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [37] _start()
      @ Base ./client.jl:541
Render cube: Error During Test at /Users/anand/.julia/dev/Nerf/test/renderer.jl:1
  Got exception outside of a @test
  Scalar indexing is disallowed.
  Invocation of getindex resulted in scalar indexing of a GPU array.
  This is typically caused by calling an iterating implementation of a method.
  Such implementations *do not* execute on the GPU, but very slowly on the CPU,
  and therefore are only permitted from the REPL for prototyping purposes.
  If you did intend to index this array, annotate the caller with @allowscalar.
  Stacktrace:
    [1] error(s::String)
      @ Base ./error.jl:35
    [2] assertscalar(op::String)
      @ GPUArraysCore ~/.julia/packages/GPUArraysCore/uOYfN/src/GPUArraysCore.jl:103
    [3] getindex(xs::Metal.MtlArray{Float32, 4}, I::Int64)
      @ GPUArrays ~/.julia/packages/GPUArrays/t0LfC/src/host/indexing.jl:9 [inlined]
    [4] getindex(V::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, i::Int64)
      @ Base ./subarray.jl:307 [inlined]
    [5] first(a::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true})
      @ Base ./abstractarray.jl:449 [inlined]
    [6] _mean(f::Nerf.var"#10#11", A::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}, dims::Colon)
      @ Statistics /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:176
    [7] mean(f::Nerf.var"#10#11", A::SubArray{Float32, 3, Metal.MtlArray{Float32, 4}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Int64}, true}; dims::Colon)
      @ Statistics /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Statistics/src/Statistics.jl:104 [inlined]
    [8] update_binary!(oc::Nerf.OccupancyGrid{Metal.MtlArray{Float32, 4}, Metal.MtlVector{UInt8}}; threshold::Float32)
      @ Nerf ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:117
    [9] update_binary!(oc::Nerf.OccupancyGrid{Metal.MtlArray{Float32, 4}, Metal.MtlVector{UInt8}})
      @ Nerf ~/.julia/dev/Nerf/src/acceleration/occupancy.jl:114
   [10] macro expansion
      @ ~/.julia/dev/Nerf/test/renderer.jl:21 [inlined]
   [11] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [12] top-level scope
      @ ~/.julia/dev/Nerf/test/renderer.jl:2
   [13] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [14] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:40 [inlined]
   [15] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [16] macro expansion
      @ ~/.julia/dev/Nerf/test/runtests.jl:40 [inlined]
   [17] macro expansion
      @ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1547 [inlined]
   [18] top-level scope
      @ ~/.julia/dev/Nerf/test/runtests.jl:18
   [19] include(fname::String)
      @ Base.MainInclude ./client.jl:478
   [20] top-level scope
      @ none:6
   [21] eval(m::Module, e::Expr)
      @ Core ./boot.jl:383 [inlined]
   [22] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:280
   [23] _start()
      @ Base ./client.jl:541
Test Summary:               | Pass  Error  Total   Time
Nerf                        |   99      5    104  50.5s
  BBox                      |   24            24   0.4s
  Utils                     |   42            42   1.2s
  Grid encoding             |   18      2     20  12.4s
    Grid Index              |    2             2   0.0s
    Fractional position     |   16            16   0.0s
    Deterministic result    |           1      1   9.0s
    Hashgrid gradients      |           1      1   3.2s
  Spherical harmonics       |    2             2   1.4s
  NN                        |   10            10  13.4s
  Occupancy                 |    3      1      4   3.2s
    Density bitfield update |    3      1      4   1.1s
  Sampler                   |           1      1   7.2s
    Check ray samples span  |           1      1   7.1s
  Renderer                  |           1      1  11.3s
    Render cube             |           1      1  11.3s
ERROR: LoadError: Some tests did not pass: 99 passed, 0 failed, 5 errored, 0 broken.
in expression starting at /Users/anand/.julia/dev/Nerf/test/runtests.jl:17
ERROR: Package Nerf errored during testing
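
A note on the Sampler error above (`call to gpu_malloc`): the kernel trace goes from `floor` into `trunc` and then `box_float32` at `occupancy.jl:203`. That pattern usually points to a checked float-to-integer conversion such as `floor(UInt32, x)`. The checked `trunc` can throw an `InexactError` that carries the offending value, and building that exception boxes the `Float32`, which needs an allocation inside the kernel. Below is a minimal sketch of a possible workaround, assuming line 203 contains a conversion of this kind (I have not confirmed the actual source). `checked_cell` and `unchecked_cell` are hypothetical names.

```julia
# Hypothetical helpers; the real expression at occupancy.jl:203 may differ.

# Checked conversion: may throw InexactError, and constructing that
# exception boxes `x`, which is what pulls an allocation into GPU code.
checked_cell(x::Float32) = floor(UInt32, x)

# Unchecked conversion: floor in floating point, then truncate without
# a range check. The caller must guarantee 0 <= x < 2^32, because
# out-of-range inputs give an unspecified result instead of an error.
unchecked_cell(x::Float32) = unsafe_trunc(UInt32, floor(x))

# For in-range inputs both versions agree.
for x in (0.0f0, 0.9f0, 3.2f0, 127.99f0)
    @assert checked_cell(x) == unchecked_cell(x)
end
```

The trade-off is that the unchecked version silently returns garbage for negative or very large inputs, so it only makes sense if the value is clamped to the grid range beforehand.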
@pxl-th
Member

pxl-th commented Jun 15, 2023

Looks like Metal.jl does not support atomics yet: https://github.com/JuliaGPU/Metal.jl/blob/bb6054e0a4a6195f278e56c2c51f16cf240dc7cb/src/MetalKernels.jl#L27

Without atomics, training won't work; only inference will.
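
For context on why missing atomics would block training: in a hash-grid encoding, several sample points can map to the same table entry, so a backward pass typically has many threads adding gradients into the same memory location. Here is a CPU-only sketch of that accumulation pattern using `Base.Threads`. It is an illustration only, not Nerf.jl's kernel code; `scatter_add!` and the index data are made up.

```julia
using Base.Threads: Atomic, atomic_add!, @threads

# Accumulate `vals[i]` into `table[idx[i]]` from several threads.
# Entries of `idx` may repeat, so each addition must be atomic;
# a plain `table[j] += v` could lose updates under contention.
function scatter_add!(table::Vector{Atomic{Float32}},
                      idx::Vector{Int}, vals::Vector{Float32})
    @threads for i in eachindex(idx, vals)
        atomic_add!(table[idx[i]], vals[i])
    end
    return table
end

table = [Atomic{Float32}(0f0) for _ in 1:4]
idx = repeat([1, 2, 2, 4], 250)      # many collisions on entry 2
vals = ones(Float32, length(idx))
scatter_add!(table, idx, vals)

# Read the accumulated values back out of the atomic cells.
totals = [t[] for t in table]
```

Inference only reads the table, so it does not hit this write contention.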

@pxl-th
Member

pxl-th commented Jun 15, 2023

Additionally, Metal.jl may be missing some other things.
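
One concrete gap visible in the log above: the `update_binary!` failures come from `mean(f, view)` at `occupancy.jl:117`. With Metal.jl the view is a `SubArray` wrapping an `MtlArray`, and `Statistics._mean` calls `first(A)` on it, which is scalar indexing. A possible workaround is to skip `mean` and write the reduction directly. The sketch below runs on a plain CPU `Array`. Whether `sum(f, view)` dispatches to a GPU reduction for a wrapped `MtlArray` is an assumption I have not verified, and `mean_via_sum` and the clamping function are stand-ins.

```julia
using Statistics: mean

# Hypothetical replacement for `mean(f, A)`: a single mapped reduction
# with no call to `first(A)`, so no element is read individually.
mean_via_sum(f, A) = sum(f, A) / length(A)

density = reshape(Float32.(1:24), 2, 3, 2, 2)   # stand-in for the 4-D grid
level = @view density[:, :, :, 1]               # same view shape as in the trace
f = x -> max(x, 0f0)                            # stand-in for Nerf.var"#10#11"

# On the CPU both approaches give the same result.
@assert mean_via_sum(f, level) ≈ mean(f, level)
```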
