Error handling and panic in observablehq notebook #165

Closed
4 tasks
jmatsushita opened this issue Mar 4, 2024 · 5 comments

@jmatsushita

Hi,

Sorry for opening the issue directly on GitHub, but I cannot see this repo in the dropdown menu at https://antv-issue-helper.surge.sh

I created a reproducer in ObservableHQ for a panic when using a specific shader:
https://observablehq.com/d/bece57b91ebd5a40

I'm using a compute-toys notebook fork which pulls in the latest version @antv/[email protected]

The setup works for the origami shader. However, with shader code that compiles and runs fine in compute.toys (see https://compute.toys/view/1079), I get the following stack trace in the console:

glsl_wgsl_compiler_bg.wasm:0x23257e Uncaught (in promise) RuntimeError: unreachable
    at __rust_start_panic (glsl_wgsl_compiler_bg.wasm:0x23257e)
    at rust_panic (glsl_wgsl_compiler_bg.wasm:0x2318d1)
    at std::panicking::rust_panic_with_hook::hdb857edf4e61fe11 (glsl_wgsl_compiler_bg.wasm:0x216e3e)
    at std::panicking::begin_panic_handler::{{closure}}::hef75e0b11b6899ae (glsl_wgsl_compiler_bg.wasm:0x21dd09)
    at std::sys_common::backtrace::__rust_end_short_backtrace::h2c4c1ce3acc2944e (glsl_wgsl_compiler_bg.wasm:0x2321dc)
    at rust_begin_unwind (glsl_wgsl_compiler_bg.wasm:0x228274)
    at core::panicking::panic_fmt::hf20f9922c43dc05c (glsl_wgsl_compiler_bg.wasm:0x22ca20)
    at core::result::unwrap_failed::h92d50af5668640d2 (glsl_wgsl_compiler_bg.wasm:0x21f486)
    at wgslcomposer_wgsl_compile (glsl_wgsl_compiler_bg.wasm:0x180ba4)
    at Rr.wgsl_compile (glsl_wgsl_compiler.js:211:12)
$__rust_start_panic    @    glsl_wgsl_compiler_bg.wasm:0x23257e
$rust_panic    @    glsl_wgsl_compiler_bg.wasm:0x2318d1
$std::panicking::rust_panic_with_hook::hdb857edf4e61fe11    @    glsl_wgsl_compiler_bg.wasm:0x216e3e
$std::panicking::begin_panic_handler::{{closure}}::hef75e0b11b6899ae    @    glsl_wgsl_compiler_bg.wasm:0x21dd09
$std::sys_common::backtrace::__rust_end_short_backtrace::h2c4c1ce3acc2944e    @    glsl_wgsl_compiler_bg.wasm:0x2321dc
$rust_begin_unwind    @    glsl_wgsl_compiler_bg.wasm:0x228274
$core::panicking::panic_fmt::hf20f9922c43dc05c    @    glsl_wgsl_compiler_bg.wasm:0x22ca20
$core::result::unwrap_failed::h92d50af5668640d2    @    glsl_wgsl_compiler_bg.wasm:0x21f486
$wgslcomposer_wgsl_compile    @    glsl_wgsl_compiler_bg.wasm:0x180ba4
wgsl_compile    @    glsl_wgsl_compiler.js:211
(anonymous)    @    compute-toys.js?v=4&…7b91ebd5a40@314:485
(anonymous)    @    compute-toys.js?v=4&…7b91ebd5a40@314:478
eval    @    observablehq-17:43
await in eval (async)        
eval    @    observablehq-295:4
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
ea    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
value    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
Pr    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
value    @    worker-488e6463.js:2
define    @    compute-toys.js?v=4&…7b91ebd5a40@314:698
value    @    worker-488e6463.js:2
(anonymous)    @    worker-488e6463.js:2
Promise.then (async)        
Ea    @    worker-488e6463.js:2
define    @    worker-488e6463.js:2
rs    @    worker-488e6463.js:2
(anonymous)

Additional context

Chrome on MacOS Version 122.0.6261.94 (Official Build) (arm64)

I have noticed the following in regard to errors and error handling in general:

  • Since I made the run function take the shader string as an argument, there are cases (in particular reusing the prelude without binding, for instance, time.elapsed) where these errors are spammed to the console continuously:
GPUValidationError {message: '[Invalid CommandBuffer] is invalid.\n - While calling [Queue].Submit([[Invalid CommandBuffer]])\n'}
GPUValidationError {message: '[Invalid BindGroup] is invalid.\n - While encoding …r].SetBindGroup(0, [Invalid BindGroup], 0, ...).\n'}

A trick such as the following bypasses the problem and avoids having to reinitialise each cell with a different set of bindings:

var ignore = time.elapsed / time.elapsed; // to avoid rewriting run without the time bindings.
var uv = (fragCoord * 2. - resolution.xy) / resolution.y * ignore;

  • Is there a better way to use a notebook to build a compute shader progressively? Maybe some variables, like device, could be initialised once and made available to the page, so that each cell only runs createProgram() and device.createComputePipeline()? If you had an example notebook with this approach, it would be great for using observablehq to explain shaders step by step.

  • Some code that runs on compute.toys doesn't work with g-device-api. This seems to have to do with type checking.

let a = hash1(p+vec2(0,0)); 

compute.toys seems able to infer the types but this will spam this error message with g-device-api:

Device.ts:210 
GPUValidationError {message: '[Invalid CommandBuffer] is invalid.\n - While calling [Queue].Submit([[Invalid CommandBuffer]])\n'}

The fix is easy enough, but the error handling and the lack of error messages make it difficult to use.

let a = hash1(p+float2(0,0)); 
// or let a = hash1(p+vec2(0.,0.));

  • In general, when there is an error that spams the console, even when the code is fixed in the same session/notebook, the messages continue to appear in the console (I'm not sure if there's still a context behind the scenes that is not cleaned up, or if the messages have been buffered). It requires reloading the notebook for the messages to stop.
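
One way to make the validation errors above easier to trace is the standard WebGPU error-scope API (pushErrorScope / popErrorScope on GPUDevice). The following is only a sketch: withValidationScope is a hypothetical helper, the two small interfaces are stand-ins for the real WebGPU types, and it assumes you can get hold of the underlying GPUDevice, which this thread does not show for g-device-api.

```typescript
// Minimal structural types for the parts of the WebGPU API used here,
// so the sketch type-checks without @webgpu/types.
interface GpuErrorLike {
  message: string;
}
interface ErrorScopeDevice {
  pushErrorScope(filter: "validation" | "out-of-memory" | "internal"): void;
  popErrorScope(): Promise<GpuErrorLike | null>;
}

// Hypothetical helper: runs `work` inside a validation error scope and
// rejects with the captured validation error, if there was one.
async function withValidationScope<T>(
  device: ErrorScopeDevice,
  work: () => T | Promise<T>,
): Promise<T> {
  device.pushErrorScope("validation");
  let result: T;
  try {
    result = await work();
  } catch (err) {
    // Pop the scope even when `work` throws, so the scope stack stays balanced.
    await device.popErrorScope();
    throw err;
  }
  const error = await device.popErrorScope();
  if (error) {
    throw new Error(`WebGPU validation error: ${error.message}`);
  }
  return result;
}
```

Wrapping pipeline creation and queue submission in such a scope ties each error to the cell that caused it. It does not explain why messages keep appearing after the shader is fixed.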

Thank you very much for the great library, I'm really looking forward to using it more in observablehq and I hope my remarks are helpful to improve the error handling experience!

Cheers,

Jun

@xiaoiver
Contributor

xiaoiver commented Mar 11, 2024

Try changing const to let:

// before
const tmax = 2000.0;

// after
let tmax = 2000.0;

Or declare it in module scope:

const tmax = 2000.0;

fn main_image() {}

I will check whether this is a bug in naga, since according to the WGSL spec a const declaration is allowed in function scope:
https://www.w3.org/TR/WGSL/#const-decls

xiaoiver reopened this Mar 11, 2024
xiaoiver self-assigned this Mar 13, 2024
xiaoiver added the bug (Something isn't working) label Mar 13, 2024

@jmatsushita
Author

Thank you. Indeed the change you suggested fixes the error:

// after
let tmax = 2000.0;

However, should this really be a panic (sometimes silently swallowed, sometimes shown only as RuntimeError: unreachable)? Would it be possible to get a more granular error message to help troubleshoot the shader next time?
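
Until the compiler reports proper diagnostics, the caller can at least stop the panic from being swallowed. A minimal sketch: tryCompile and CompileResult are hypothetical names, and the compile parameter stands in for whatever function invokes the WASM shader compiler (its real signature is not shown in this thread).

```typescript
// Result type for the hypothetical wrapper below.
type CompileResult =
  | { ok: true; wgsl: string }
  | { ok: false; message: string };

// `compile` is an assumption: any function that takes shader source and
// returns compiled WGSL, throwing on failure.
function tryCompile(
  compile: (source: string) => string,
  source: string,
): CompileResult {
  try {
    return { ok: true, wgsl: compile(source) };
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    // A Rust panic reaches JavaScript as "RuntimeError: unreachable",
    // which carries no shader diagnostics of its own.
    const hint = /unreachable/.test(detail)
      ? " (the compiler panicked; this message alone contains no diagnostic)"
      : "";
    return { ok: false, message: `Shader compilation failed: ${detail}${hint}` };
  }
}
```

Returning a result object instead of rethrowing lets a notebook cell display the message next to the shader source rather than only in the console.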

@xiaoiver
Contributor

You're right, I should throw the compile error message from naga.

@xiaoiver
Contributor

xiaoiver commented Mar 19, 2024

I updated both naga and naga-oil to their latest versions and now throw the error message from the Rust side.

# before
naga = { version = "0.14.1", features = ["glsl-in", "wgsl-in", "wgsl-out"] }
naga_oil = "0.11.0"

# after
naga = { version = "0.19.2", features = ["glsl-in", "wgsl-in", "wgsl-out"] }
naga_oil = "0.13.0"

Now the error message is shown correctly. For example, in the tmax case, naga-oil reports the following message:

make_naga_module Composer error: expected assignment or increment/decrement, found 'tmax'

However, aliases cannot be used for now: bevyengine/naga_oil#79
If we try to prepend alias int = i32; to our shader chunk, naga-oil reports:

Composable module identifiers must not require substitution according to naga writeback rules: `int`

So we have to expand the aliases ourselves when borrowing shaders from compute-toys. Here is the complete alias map used in compute-toys:

alias int = i32;
alias uint = u32;
alias float = f32;
alias int2 = vec2<i32>;
alias int3 = vec3<i32>;
alias int4 = vec4<i32>;
alias uint2 = vec2<u32>;
alias uint3 = vec3<u32>;
alias uint4 = vec4<u32>;
alias float2 = vec2<f32>;
alias float3 = vec3<f32>;
alias float4 = vec4<f32>;
alias bool2 = vec2<bool>;
alias bool3 = vec3<bool>;
alias bool4 = vec4<bool>;
alias float2x2 = mat2x2<f32>;
alias float2x3 = mat2x3<f32>;
alias float2x4 = mat2x4<f32>;
alias float3x2 = mat3x2<f32>;
alias float3x3 = mat3x3<f32>;
alias float3x4 = mat3x4<f32>;
alias float4x2 = mat4x2<f32>;
alias float4x3 = mat4x3<f32>;
alias float4x4 = mat4x4<f32>;
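
The alias map above can also be applied as a plain text substitution before the shader reaches the compiler. This is a rough sketch, not what g-device-api does: buildAliasMap and expandAliases are hypothetical helpers, and the replacement is purely textual, so it also rewrites matching identifiers inside comments or user-defined names.

```typescript
// Rebuilds the 24 aliases listed above: 3 scalars, 12 vectors, 9 matrices.
function buildAliasMap(): Map<string, string> {
  const aliases = new Map<string, string>();
  const scalars: Array<[string, string]> = [
    ["int", "i32"],
    ["uint", "u32"],
    ["float", "f32"],
  ];
  for (const [name, wgsl] of scalars) {
    aliases.set(name, wgsl);
  }
  // bool has vector aliases only; the scalar bool is already valid WGSL.
  const vectorBases: Array<[string, string]> = [...scalars, ["bool", "bool"]];
  for (const [name, wgsl] of vectorBases) {
    for (const n of [2, 3, 4]) {
      aliases.set(`${name}${n}`, `vec${n}<${wgsl}>`);
    }
  }
  for (const cols of [2, 3, 4]) {
    for (const rows of [2, 3, 4]) {
      aliases.set(`float${cols}x${rows}`, `mat${cols}x${rows}<f32>`);
    }
  }
  return aliases;
}

const ALIASES = buildAliasMap();

// Replaces every whole identifier that is a known alias with its WGSL type.
// Any `alias ... = ...;` declarations should be removed from the source first.
function expandAliases(source: string): string {
  return source.replace(
    /\b[A-Za-z_][A-Za-z0-9_]*\b/g,
    (word) => ALIASES.get(word) ?? word,
  );
}
```

For example, expandAliases("let a = hash1(p + float2(0., 0.));") returns "let a = hash1(p + vec2<f32>(0., 0.));".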

Or use predeclares instead, eg.

@xiaoiver xiaoiver mentioned this issue Mar 19, 2024
14 tasks
xiaoiver added a commit that referenced this issue Mar 19, 2024
* fix: throw error message from naga #165

* chore: commit changeset

* fix: remove new URL
xiaoiver added a commit that referenced this issue Mar 19, 2024
* Fix 165 (#169)

* fix: throw error message from naga #165

* chore: commit changeset

* fix: remove new URL

* chore(release): bump version (#170)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

@xiaoiver
Contributor

xiaoiver commented Mar 19, 2024

Just use the latest @antv/[email protected] & WASM:

import { WebGPUDeviceContribution } from '@antv/g-device-api';

const deviceContribution = new WebGPUDeviceContribution({
  shaderCompilerPath: '/glsl_wgsl_compiler_bg.wasm',
  // From CDN
  // shaderCompilerPath: 'https://unpkg.com/@antv/[email protected]/rust/pkg/glsl_wgsl_compiler_bg.wasm',
});

Call device.beginFrame() and device.endFrame() before and after the compute pass:

device.beginFrame();

const computePass = device.createComputePass();
computePass.setPipeline(computePipeline);
computePass.setBindings(bindings);
computePass.dispatchWorkgroups(1);
device.submitPass(computePass);

device.endFrame();
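
In a notebook, where a cell may be re-run after a failed attempt, it can help to wrap this sequence so that endFrame() always runs. A sketch only: runComputePass is a hypothetical helper, the interfaces are minimal stand-ins for the g-device-api types that cover just the calls shown above, and whether endFrame() should be called after a failure is an assumption.

```typescript
// Minimal stand-ins for the device and pass objects used in the snippet above.
interface ComputePassLike<P, B> {
  setPipeline(pipeline: P): void;
  setBindings(bindings: B): void;
  dispatchWorkgroups(x: number, y?: number, z?: number): void;
}
interface FrameDevice<P, B> {
  beginFrame(): void;
  endFrame(): void;
  createComputePass(): ComputePassLike<P, B>;
  submitPass(pass: ComputePassLike<P, B>): void;
}

// Hypothetical helper: records and submits one compute pass inside a
// beginFrame()/endFrame() pair, ending the frame even if recording throws.
function runComputePass<P, B>(
  device: FrameDevice<P, B>,
  pipeline: P,
  bindings: B,
  workgroups: number = 1,
): void {
  device.beginFrame();
  try {
    const pass = device.createComputePass();
    pass.setPipeline(pipeline);
    pass.setBindings(bindings);
    pass.dispatchWorkgroups(workgroups);
    device.submitPass(pass);
  } finally {
    device.endFrame();
  }
}
```

With this helper, the sequence above becomes runComputePass(device, computePipeline, bindings, 1).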

@xiaoiver xiaoiver closed this as completed Apr 9, 2024