Releases: JuliaGPU/Metal.jl
v1.5.0
Metal v1.5.0
Metal.jl 1.5 is a relatively minor release, with the most important change happening behind the scenes: GPUArrays.jl v11 has switched to KernelAbstractions.jl (#461).
There is also one (technically) breaking change: code_agx and @device_code_agx have been removed (#512) because of their heavy Python dependency and conflicts with PythonCall.jl. This functionality did not support recent M-series GPUs anyway, so the removal is unlikely to affect many users.
Features
Bug fixes
- Fix fill: #496
Merged pull requests:
- Add more tests to api validation testing (#447) (@christiangnrd)
- Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#461) (@maleadt)
- Switch CI to 1.11. (#462) (@maleadt)
- Remove old code and test cleanup (#464) (@christiangnrd)
- Adapt to JuliaGPU/GPUArrays.jl#567. (#475) (@maleadt)
- Bump LLVM downgrader (#479) (@maleadt)
- Store more debug files when encountering compilation errors. (#482) (@maleadt)
- Use OncePerProcess in 1.12+ (#483) (@christiangnrd)
- Don't run benchmarks from fork (#485) (@christiangnrd)
- Still run GH Action when merged (#486) (@christiangnrd)
- Bump IR downgrader (#489) (@maleadt)
- Move MTL tests and add a few (#491) (@christiangnrd)
- Generate MTL and MPS structs and enums with Clang.jl (#492) (@christiangnrd)
- Fix copy tests (#493) (@christiangnrd)
- Simplify benchmark runner and pipelines (#494) (@maleadt)
- Fix global linear indexing (fill!) (#496) (@christiangnrd)
- Couple typos and is_m4 function (#498) (@christiangnrd)
- Initial support for MPSNDArray (#499) (@christiangnrd)
- Tweak benchmark CI job (#501) (@maleadt)
- Fix MPSNDArrayDescriptor wrapper (#502) (@christiangnrd)
- Metal library parsing: using CodecBzip2 feature to ignore padding. (#504) (@maleadt)
- Followup to #492: Enable C function wrapping (#505) (@christiangnrd)
- Rerun random tests with chance of false negative once. (#506) (@christiangnrd)
- Bump LLVM downgrader (#507) (@maleadt)
- Test loading of package on unsupported platforms (#509) (@christiangnrd)
- Remove device_code_agx (#512) (@christiangnrd)
- Fix typo in random tests (#514) (@christiangnrd)
- Fix Documenter failures (#515) (@christiangnrd)
Closed issues:
- KernelAbstractions: add Atomix back-end (#218)
- @device_code_agx errors when Metal Shader Validation is enabled (#463)
- fill broken after KA integration (#466)
- Compilation to native code failed: NSError: Undefined symbols (#480)
- ObjectiveC.Foundation.NSErrorInstance(ObjectiveC.id{ObjectiveC.Foundation.NSError}(0x000000014cb8bd90)) (#487)
- phi-related IR downgrade issue (#488)
- Circular dependency when precompiling (#495)
- Bad interaction between PyCall and Metal (#500)
- Add github actions CI for linux, windows and non-functional macOS to ensure that precompilation and loading works (#508)
v1.4.2
Metal v1.4.2
Merged pull requests:
- Fix loading on unsupported platforms (#459) (@christiangnrd)
Closed issues:
v1.4.1
Metal v1.4.1
Merged pull requests:
- Update Readme (#444) (@christiangnrd)
- Use CPU copy with SharedStorage (#445) (@christiangnrd)
- Disable nightly CI and fix invalid Metal API usage (#448) (@christiangnrd)
- Don't report benchmarks on main branch commits (#450) (@christiangnrd)
- Fix #451 and a couple other fixes (#452) (@christiangnrd)
- Only load BFloat16s extension on Apple systems (#454) (@christiangnrd)
- CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#455) (@github-actions[bot])
Closed issues:
v1.4.0
Metal v1.4.0
Merged pull requests:
- Use unified memory for scalar indexing of permutation matrices (#313) (@tgymnich)
- Add MPSMatrixRandom (#321) (@christiangnrd)
- [.gitignore] Also ignore versioned Manifests (#410) (@christiangnrd)
- Remove broken link in Docs (#413) (@christiangnrd)
- Remove unused [extras] section in Project.toml (#415) (@christiangnrd)
- Small fix and typos (#417) (@christiangnrd)
- Add Benchmarking CI (#420) (@christiangnrd)
- [NFC] Fix warning in topk docstrings (#421) (@christiangnrd)
- Allow initialisation of MTLSize with tuples of different integer types (#425) (@tgymnich)
- Add CI for macOS 15 (#426) (@christiangnrd)
- Simplify versioninfo() and report more packages. (#429) (@maleadt)
- Allow controlling compilation target versions. (#430) (@maleadt)
- Add a missing memory fence to a SIMD test. (#432) (@maleadt)
- Fix MPS.synchronize_state (#434) (@christiangnrd)
- Make lu results have same storage mode as input (#435) (@christiangnrd)
- Fix benchmarking CI and benchmark Shared and Private storage modes (#437) (@christiangnrd)
- NFC tweak to MPSMatrixCopy tests (#439) (@christiangnrd)
- Get more descriptive errors from flaky test (#440) (@christiangnrd)
Closed issues:
- Port the opportunistic synchronization from CUDA.jl (#317)
- Control flow-related miscompilation: (#401)
- More sporadic 1.11 hangs (#412)
- Support for LinearAlgebra.kron (#422)
- Can't use gemm! methods with Metal (#423)
- Error for thread/group size with different integer types (#424)
- README example broken (#427)
- Intermittent load_store_tg test failure (#428)
v1.3.0
Metal v1.3.0
Merged pull requests:
- Fix typo in docs (#384) (@christiangnrd)
- Bump minimal Julia requirement to v1.10. (#385) (@maleadt)
- Remove Requires dependency (#386) (@christiangnrd)
- Reflection: Figure out kernel names by looking at metallib section. (#390) (@maleadt)
- Add tests for broadcasting minimum and maximum (#391) (@tgymnich)
- Don't export MTL (#392) (@christiangnrd)
- Add erfinv (#394) (@tgymnich)
- Add expm1 (#395) (@tgymnich)
- Cleanup some imports (#398) (@christiangnrd)
- Remove type-pirated function (#399) (@christiangnrd)
- Unexport some high-level MPS functionality from MPS (#400) (@christiangnrd)
- Adapt to new REPL precompile changes (JuliaLang/julia#55210) (#403) (@christiangnrd)
- Bump GPUCompiler. (#404) (@maleadt)
- Bump LLVM compat (#407) (@maleadt)
- Make 1.11 CI success mandatory. (#408) (@maleadt)
Closed issues:
- Audit exports/public symbols (#359)
- Compilation failure on 1.11 (#370)
- MTLBinaryArchive (#387)
- Metal.code_agx() failing in MacOS 15 Beta 3 (#388)
- Test for min / max broadcasting issue (#389)
- Type piracy (#396)
- Potentially unused code in gpuarrays.jl (#397)
- Shared vs SharedStorage in examples/unified_memory (#405)
- Unsupported call to an unknown function when calling Distributions (#406)
v1.2.0
Metal v1.2.0
Merged pull requests:
- Avoid constructing MulAddMuls on Julia v1.12+ (#295) (@dkarrasch)
- Trigger the runtime profiler when a test times out. (#330) (@maleadt)
- Add MPSMatrixSoftMax (#333) (@christiangnrd)
- Reorganize and add some MPS tests (#335) (@christiangnrd)
- Typo fix (#336) (#337) (@101001000)
- Add error message for running Metal.jl under Rosetta (#339) (@tgymnich)
- Add MPSCommandBuffer (#340) (@christiangnrd)
- Bump julia-actions/setup-julia from 1 to 2 (#341) (@dependabot[bot])
- Revert error message for Rosetta (#342) (@tgymnich)
- Update to ObjectiveC.jl v3. (#343) (@maleadt)
- Add autoreleasepools to MPS interface methods. (#344) (@maleadt)
- Don't redundantly return the cmdbuf from commit methods. (#345) (@maleadt)
- Whitespace fixes (#346) (@christiangnrd)
- CompatHelper: bump compat for LLVM to 7, (keep existing compat) (#347) (@github-actions[bot])
- CompatHelper: add new compat entry for SpecialFunctions in [weakdeps] at version 2, (keep existing compat) (#352) (@github-actions[bot])
- [NFC] Fix indentation (#353) (@christiangnrd)
- Bump LLVM downgrader (#354) (@maleadt)
- Don't export non-existent contents (#356) (@christiangnrd)
- Remove/fix unused exports (#357) (@christiangnrd)
- Unexport SimpleVersion and AS (#360) (@christiangnrd)
- Add support for opaque pointers (#361) (@maleadt)
- Docstrings (#362) (@christiangnrd)
- Initial MacOS 15 support (#365) (@christiangnrd)
- Replace current_device() with device() (#366) (@christiangnrd)
- Support reading metallib v1.2.8 files from macOS 15. (#367) (@maleadt)
- Add metallib (dis)assembly helper scripts. (#368) (@maleadt)
- Simplify testing of examples. (#369) (@maleadt)
- Temporarily allow 1.11 to fail. (#371) (@maleadt)
- CompatHelper: add new compat entry for PrecompileTools at version 1, (keep existing compat) (#372) (@github-actions[bot])
- Define complex sqrt (#374) (@mtfishman)
- Check the macOS version during initialization. (#375) (@maleadt)
- CompatHelper: bump compat for LLVM to 8, (keep existing compat) (#376) (@github-actions[bot])
- Add accumulate implementation (#377) (@chengchingwen)
- fix derived device array (#378) (@chengchingwen)
- avoid ReshapedArray using Int128 in metal kernel (#379) (@chengchingwen)
- improve type stability of derived array (#380) (@chengchingwen)
- add findall implementation (#382) (@zhenwu0728)
- Bump version (#383) (@christiangnrd)
Closed issues:
- Tests sporadically timing out on 1.11 (#329)
- ReshapedArray indexing broken because of Int128 operation (#332)
- KernelAbstractions copyto! typo (#336)
- Segmentation Faults (#338)
- Port accumulate! and findall from CUDA.jl (#348)
- Tests failing with GPUCompiler v0.26.5 and LLVM v7.1 (#350)
- downgrades LLVM (#355)
- sqrt(::Complex) unsupported due to conversion exceptions (#364)
v1.1.0
Metal v1.1.0
Merged pull requests:
- Add resize! (#279) (@mtfishman)
- Initial MTLTexture support (#280) (@christiangnrd)
- Avoid redundant pointer conversions for threadgroup memory. (#283) (@maleadt)
- Re-implement metallib generation in Julia. (#284) (@maleadt)
- CompatHelper: add new compat entry for SHA at version 0.7, (keep existing compat) (#286) (@github-actions[bot])
- Support more of the metallib format (#288) (@maleadt)
- Address potentially buggy mtl behaviour. (#290) (@christiangnrd)
- CompatHelper: add new compat entry for CodecBzip2 at version 0.8, (keep existing compat) (#292) (@github-actions[bot])
- Remove an unneeded pointer method. (#293) (@maleadt)
- Use NSAutoreleasePool to clean up memory. (#294) (@maleadt)
- adapt_storage-related improvements (#296) (@christiangnrd)
- CompatHelper: bump compat for ObjectiveC to 2, (keep existing compat) (#297) (@github-actions[bot])
- Add support for signposts (#300) (@maleadt)
- Retain NSError we rethrow to avoid an UAF. (#302) (@maleadt)
- Minor mapreduce improvements (#303) (@maleadt)
- Specialize broadcast to avoid integer divisions. (#304) (@maleadt)
- Better Support for Unified Memory (#305) (@tgymnich)
- Add 1.11 CI (#306) (@christiangnrd)
- Remove unused files (#307) (@tgymnich)
- Skip profiling tests on macOS 14.4/M1. (#310) (@maleadt)
- Increase test timeout limit to accommodate 1.8 (#311) (@christiangnrd)
- Test all storage modes (#314) (@christiangnrd)
- Fix doctests (#315) (@christiangnrd)
- Fix KernelAbstractions for Unified Memory (#316) (@tgymnich)
- CompatHelper: add new compat entry for Preferences at version 1, (keep existing compat) (#318) (@github-actions[bot])
- Minor cleanup (#319) (@christiangnrd)
- Create MtlArray using memory allocated by Array (#320) (@christiangnrd)
- Re-enable profiling tests on M1/14.4 when using Xcode 15.3. (#322) (@maleadt)
- Small typo and doc fixup (#325) (@christiangnrd)
- BFloat16s.jl extension and related improvements (#326) (@christiangnrd)
- Support for Julia 1.11 (#327) (@maleadt)
Closed issues:
- Validation-related back-end crash on macOS Ventura (#34)
- slow broadcast copy in 2D (#41)
- Poor performance of mapreduce (#46)
- Multiplication with SubArrays (#47)
- Add support to creating MtlArray using a memory allocated by Array (#62)
- Improve use of unified memory (#86)
- Use Autoreleasepools with Metal (#103)
- Unknown RFLT tag generated by macOS 13 Metal compiler (#167)
- mapreduce allocates a lot on the CPU (#211)
- Legalization errors with vectorized code (#257)
- Compilation Failure due to undefined symbols (#276)
- resize!, append! not defined (#277)
- tag new version (#278)
- Panic during profiling tests on 14.4 beta (#281)
- M3 backend cannot handle atomics with complicated pointer conversions (#282)
- Int128 does not compile (#287)
- Two suspicious mtl-related behaviours (#289)
- LU factorization: add allowsingular keyword argument (#299)
- Autorelease changes lead to use after free with errors (#301)
- Reductions don't work on Shared Arrays (#312)
v1.0.0
Metal v1.0.0
Merged pull requests:
- Matrix batches (#158) (@tgymnich)
- Add 1.10 CI. (#256) (@maleadt)
- Update manifest (#258) (@github-actions[bot])
- CompatHelper: bump compat for GPUCompiler to 0.25, (keep existing compat) (#259) (@github-actions[bot])
- Bump actions/checkout from 3 to 4 (#260) (@dependabot[bot])
- Update manifest (#261) (@github-actions[bot])
- CompatHelper: bump compat for CEnum to 0.5, (keep existing compat) (#262) (@github-actions[bot])
- Update manifest (#263) (@github-actions[bot])
- CompatHelper: add new compat entry for Artifacts at version 1, (keep existing compat) (#264) (@github-actions[bot])
- Reduce launch overhead by generating code to encode arguments. (#265) (@maleadt)
- Remove unused function argument (#266) (@tgymnich)
- Introduce application tracing profiler (#267) (@maleadt)
- Remove content(::MTLBuffer), use convert instead. (#268) (@maleadt)
- Allow more kwargs syntax with kernel launches (#269) (@maleadt)
- Don't re-use the IO object when shelling out to Python. (#271) (@maleadt)
- Preserve storage mode when broadcasting. (#273) (@maleadt)
Closed issues:
v0.5.1
Metal v0.5.1
Merged pull requests:
- MPSMatrix improvements (#157) (@tgymnich)
- Update manifest (#221) (@github-actions[bot])
- Update manifest (#222) (@github-actions[bot])
- Update manifest (#224) (@github-actions[bot])
- Update manifest (#227) (@github-actions[bot])
- CompatHelper: bump compat for ObjectiveC to 1, (keep existing compat) (#228) (@github-actions[bot])
- Update manifest (#230) (@github-actions[bot])
- Fix argument types in sincos (#232) (@fjebaker)
- Update manifest (#233) (@github-actions[bot])
- Improve docs (#235) (@christiangnrd)
- Remove linear algebra section of MPS docs (#237) (@christiangnrd)
- CompatHelper: bump compat for GPUCompiler to 0.22, (keep existing compat) (#238) (@github-actions[bot])
- Port openlibm log1pf as log1p (#239) (@sotlampr)
- Port openlibm erf (#240) (@tgymnich)
- Remove 1.6-era override mechanism. (#241) (@maleadt)
- CompatHelper: add new compat entry for Requires at version 1, (keep existing compat) (#242) (@github-actions[bot])
- Update manifest (#243) (@github-actions[bot])
- enable dependabot for GitHub actions (#244) (@ranocha)
- Bump actions/checkout from 2 to 3 (#245) (@dependabot[bot])
- Bump peter-evans/create-pull-request from 3 to 5 (#246) (@dependabot[bot])
- Show METAL_CAPTURE_ENABLED in Metal.versioninfo() when the environment variable is set (#248) (@christiangnrd)
- Update manifest (#249) (@github-actions[bot])
- Adapt to GPUCompiler.jl, and other small updates. (#250) (@maleadt)
- Switch to GPUArrays buffer management. (#251) (@maleadt)
- Update manifest (#252) (@github-actions[bot])
- Update manifest (#253) (@github-actions[bot])
- Bump GPUCompiler (#255) (@maleadt)
Closed issues:
- Random access indexing into MtlArray views cause scalar indexing (#149)
- Q: How to debug kernels - KA.@print? (#223)
- Crash during MTLDispatchListApply (#225)
- Unable to compile trig functions through ForwardDiff (#229)
- symbol multiply defined! Bug/crash on Julia master, fine on 1.10 (#231)
- log1p fails on MtlArray{Float32} (#234)
- When precompiling, UndefVarError: CompilerConfig not defined (#247)
v0.5.0
Metal v0.5.0
Metal.jl 0.5 is a feature release, bringing initial support for atomic operations (#168).
Low-level atomics that mimic Metal C are supported (atomic_store_explicit, atomic_load_explicit, etc.), as is a higher-level Metal.@atomic macro that can be used to update array values, similar to how CUDA.jl's @atomic works. It uses native atomics when supported, and falls back to a compare-exchange loop otherwise.
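To illustrate the general compare-exchange fallback technique mentioned above, here is a minimal CPU-side sketch in C11. It is not Metal.jl's implementation; it only uses the C11 atomic_load_explicit and atomic_compare_exchange_weak_explicit functions, whose naming Metal C's atomics follow. The helper name atomic_fetch_add_float_cas is hypothetical, and the sketch assumes a 32-bit float stored as raw bits in an atomic 32-bit integer.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of Metal.jl): atomically add `val` to a
 * 32-bit float whose bit pattern lives in an _Atomic uint32_t, using a
 * compare-exchange retry loop. Returns the value held before the add,
 * like a fetch-add. */
static float atomic_fetch_add_float_cas(_Atomic uint32_t *cell, float val) {
    uint32_t expected = atomic_load_explicit(cell, memory_order_relaxed);
    for (;;) {
        float old_val;
        memcpy(&old_val, &expected, sizeof old_val);   /* bits -> float */
        float new_val = old_val + val;
        uint32_t desired;
        memcpy(&desired, &new_val, sizeof desired);    /* float -> bits */
        /* Succeeds only if no other thread changed `cell` since we read it.
         * On failure, `expected` is overwritten with the current contents
         * and the loop recomputes the sum from that fresher value. */
        if (atomic_compare_exchange_weak_explicit(cell, &expected, desired,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            return old_val;
        }
    }
}
```

The loop only retries when another thread wrote to the cell between the load and the compare-exchange, so uncontended updates finish in a single iteration; under heavy contention it can spin repeatedly, which is why a native atomic instruction is preferable wherever the hardware provides one.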
Minor changes include an update for the @device_code_agx disassembler, the addition of a type variable to MtlArray encoding the storage mode (#194), and support for MPSVector (#199), which should accelerate matrix/vector multiplications.
Also note that Metal.jl now disallows the construction of Float64 arrays, as these are not supported by the Metal libraries.
Closed issues:
- Support for atomics (#79)
- Make MtlArray storage mode a type parameter (#190)
- Long stacktrace when trying to create Float64 rand arrays (#205)
- allowscalar equivalent for Metal.jl (#206)
- Define map! ? (#219)
Merged pull requests:
- Implement atomics using compiler intrinsics (#168) (@maleadt)
- Parameterize MtlArray storage mode (#194) (@christiangnrd)
- Implement MPSVector (#199) (@tgymnich)
- Update manifest (#200) (@github-actions[bot])
- Add Metal 3.1 to MTLLanguageVersion (#202) (@christiangnrd)
- Update manifest (#203) (@github-actions[bot])
- CompatHelper: bump compat for GPUCompiler to 0.21, (keep existing compat) (#204) (@github-actions[bot])
- Update manifest (#207) (@github-actions[bot])
- Disallow Float64 arrays entirely. (#209) (@maleadt)
- Adapt to LLVM.jl 6. (#213) (@maleadt)
- Update manifest (#215) (@github-actions[bot])
- Bump disassembler. (#216) (@maleadt)