Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the sifive_rvv configuration #832

Merged
merged 3 commits into from
Jan 20, 2025
Merged

Conversation

myeh01
Copy link
Contributor

@myeh01 myeh01 commented Nov 21, 2024

This PR adds a configuration called sifive_rvv to support RVV platforms beyond SiFive's x280. Essentially, the kernel code currently under sifive_x280 has been migrated to sifive_rvv, but the packm kernel and gemm and gemmtrsm microkernels have been modified slightly so that NR is defined in terms of the machine's VLEN instead of hardcoded to VLEN = 512. sifive_rvv is currently compiled with VLEN = 128 (Zvl128), the minimum VLEN required by the standard V extension, but users can modify make_defs.mk to change it to the VLEN of their target machine for potentially better performance. The sifive_x280 configuration is now defined in terms of sifive_rvv, calling the kernels from sifive_rvv and using VLEN = 512 for packm, gemm, and gemmtrsm.

This PR is based on #822.

Many thanks to Eric Love (@ericlove) and Aaron Hutchinson (@Aaron-Hutchinson) for their help with this PR.

@fgvanzee, @devinamatthews, and others, any feedback is appreciated!

@leekillough
Copy link
Collaborator

We should try to merge this with the rv32v and rv64v configurations. They are vector-length agnostic but only *GEMM was tuned. The rv32 and rv64 configurations are for non-vector RISC-V.

@devinamatthews
Copy link
Member

@myeh01 I just merged #822. Can you update this branch for conflicts?

@devinamatthews
Copy link
Member

@myeh01:

  • Can VLEN be determined at run-time, compile-time, or configure-time?
  • To @leekillough 's question, does this config work for both rv32v and rv64v? If merging is appropriate, perhaps it would be easiest to keep the sifive_rvv kernel set as-is and just merge the configurations?

@myeh01
Copy link
Contributor Author

myeh01 commented Dec 2, 2024

Thanks, @devinamatthews! Just rebased it.

  • Can VLEN be determined at run-time, compile-time, or configure-time?

I'm not sure whether it can be determined at run-time or configure-time. Currently, it is set at compile-time in make_defs.mk in the -march flag. We have set it to 128 bits (zvl128b), which is the minimum required by the standard vector (V) extension.

@leekillough
Copy link
Collaborator

Sorry for the late reply.

@angsch and I would like to merge the functionality of the different RISC-V configurations. Our RISC-V configurations are generalized, in that they will work for both RV32 and RV64, with or without the vector extension present, and our vector implementation is vector-length-agnostic,, but only the vector *GEMM functions have been optimized at this time (the rest of the functionality is the same as generic).

sifive_rvv can be kept as a separate configuration, but it would be good to merge its level-1 and level-2 code, and perhaps level-3 TRSM, into our generalized RISC-V configurations, which are detected if auto is used as the selected configuration. It uses the feature macros defined in the RISC-V C API to auto-detect RV32 and RV64, and whether the vector extension is present; I think that it can also fall back on SoftFloat, but I don't think that any serious BLIS users will be using SoftFloat.

@devinamatthews
Copy link
Member

@leekillough @myeh01

  1. If sifive_rvv is a separate configuration, when would it be selected by auto-configuration? The answer may be it's not which could be OK.
  2. Does current auto-configuration detect vector length?

@devinamatthews
Copy link
Member

OK, I've been reading around a bit and this is what I think I understand:

  • The -march extensions like zvl128b only specify a minimum vector length, and do not force a particular vector length. I do not think setting a higher minimum bit width has any effect on performance(?).
  • Vector length can be automatically detected using csrr (and there is a function for this -- get_vlenb()).
  • On ARM SVE, the detected vector length is actually used to select different GEMM kernels. I do not think that is relevant here at this time(?).

So, my suggestions are:

  • Accept the PR and sifive_rvv as a separate config. It will not currently participate in auto-configuration.
  • rv{32,64}iv configs can certainly use sifive_rvv kernels. The config_registry map would need to be updated. This will be a different PR.

#define BLIS_NR_s ( 4 * __riscv_v_min_vlen / 32 )
#define BLIS_NR_d ( 4 * __riscv_v_min_vlen / 64 )
#define BLIS_NR_c ( 2 * __riscv_v_min_vlen / 32 )
#define BLIS_NR_z ( 2 * __riscv_v_min_vlen / 64 )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary (to fix these sizes)? The kernels do need MR fixed but NR could be determined dynamically from VLEN at runtime. Setting these macros to -1 here (or not defining them) simply disables the unrolled reference GEMM kernel.

@devinamatthews
Copy link
Member

@myeh01 last comment: AFAICT it seems that NR is fixed based on the minimum vector length specified in-march. It would be much better if this could be determined at runtime, see armsve and rv64iv for example.

This shouldn't cause any problems for GEMM, but it might have an effect on packing, and I don't know how TRSM would be affected. If it would be too much work to support then we can go as-is.

@myeh01
Copy link
Contributor Author

myeh01 commented Dec 11, 2024

Thanks for the suggestions, @devinamatthews! I have updated the code so that vlen is computed at runtime.

@devinamatthews devinamatthews merged commit 769d73f into flame:master Jan 20, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants