
Compiling with CeedScalar defined as float #784

Closed
nbeams opened this issue Jun 17, 2021 · 7 comments · Fixed by #788


@nbeams
Contributor

nbeams commented Jun 17, 2021

As a first step toward #778, I tried building and testing libCEED with CeedScalar set to float instead of double. I ran into some minor issues. I was thinking we may want to address some or all of them in a PR prior to adding any mixed-precision functionality. (Of course, we would probably want to keep any potential future changes for mixed precision in mind when deciding what to do about these issues, or whether we want to address them at this time at all.)

Issue 1: Compiler warnings for three CeedVector functions

The affected functions are CeedVectorSetValue, CeedVectorScale, CeedVectorAXPY, which are currently only implemented in the cuda/hip-ref backends.

The warnings come from CeedSetBackendFunction, e.g.

backends/cuda/ceed-cuda-vector.c:659:33: warning: passing argument 5 of 'CeedSetBackendFunction'
       from incompatible pointer type [-Wincompatible-pointer-types]
                                 CeedVectorSetValue_Cuda); CeedChkBackend(ierr);
                                 ^~~~~~~~~~~~~~~~~~~~~~~
./include/ceed/backend.h:93:69: note: expected 'int (*)()' but argument is of type
       'int (*)(struct CeedVector_private *, CeedScalar)' {aka 'int (*)(struct CeedVector_private *, float)'}
                                 const char *func_name, int (*f)());
                                                        ~~~~~~^~~~

The reason only these three functions are affected is that they are the only ones that take a parameter of type CeedScalar (rather than CeedScalar *). Per the C standard, section 6.5.2.2, calls through a function declaration with unspecified parameters are subject to the "default argument promotions," which convert float to double. And per section 6.7.5.3, that makes the declaration int (*f)() incompatible with a prototype containing a CeedScalar parameter when CeedScalar is float, but compatible when it is double.

Now, from what I can tell, this incompatibility doesn't actually cause incorrect execution of the three functions, since they are still called through function pointers with the correct parameter types for the backend implementation, via the function pointers stored in the CeedVector object. Perhaps this is obvious to some libCEED developers/expert-level C coders, but I struggled a bit to make sure I understood the distinction between what libCEED does and things that would produce errors in this case. To convince myself, I confirmed the behavior with a minimal test code. Consider three functions with prototypes int func_dbl(double x), int func_flt(float x), and int func_pflt(float *x) and the following main():

  int ierr;
  float a = 0.5;
  double b = 0.5;

  int (*bar)() = &func_dbl;
  // OK: all floating-point arguments considered as double if unspecified
  ierr = bar(b);

  // Create function pointer with correct parameter type (like backend fcn pointers in libCEED objects);
  //  will set/use later
  int (*fbar)(float);
  // Pointer with unspecified parameters: compiler warning due to float promotion causing incompatible types
  int (*fbar_u)() = &func_flt; 

  // Not OK: undefined behavior to call function through this pointer
  ierr = fbar_u(a);

  // Minimal recreation of behavior of CeedSetBackendFunction and libCEED objects:
  // Create pointer to fcn pointer with no parameters; set equal to address of fbar, with cast.
  int (**pfbar_v)(void) = (int (**)(void))(&fbar);
  // Set value of pfbar_v to the pointer with unspecified parameters (which points to func_flt);
  // this also sets fbar due to previous line
  *pfbar_v = fbar_u;
  // Call function through pointer with correct parameter specification; OK
  ierr = fbar(a);

  // Version where the parameter is a pointer to a float instead of a float.
  // This one does not give a warning because there is no incompatible type promotion
  // when using a pointer.
  int (*ifbar)() = &func_pflt;
  // OK
  ierr = ifbar(&a);

I tried this with GCC 8.3.0 and clang 11.0.1. Both gave the "incompatible pointer type" warning seen in compiling the cuda-ref backend for the line creating fbar_u, and both produced the same runtime behavior. So there's no problem with the functions' behavior in single precision, despite the warnings.

Still, we most likely wouldn't want these warnings to be produced for every backend that implements these three functions (though it's currently only cuda-ref and hip-ref).

A simple solution, though maybe not "nice" since it would ruin the symmetry of setting up all the backend functions of a CeedVector object, would be to perform a cast on these three functions during CeedSetBackendFunction, e.g.

 ierr = CeedSetBackendFunction(ceed, "Vector", vec, "SetValue",
                                (int (*)())(CeedVectorSetValue_Cuda)); CeedChkBackend(ierr);

This removes the compiler warning and is valid per C function pointer conversion rules. I think this is preferable to having these functions take CeedScalar * instead, given how the variables are used in the functions. However, I would like to get input on this from other libCEED developers.

Issue 2: Places where we need to know the type of CeedScalar

There are several places we would need to know exactly what type CeedScalar is, e.g.:

  • In the runtime compilation for CUDA and HIP backends, namely the Ceed[Cuda/Hip]Compile function,
    where the definition of CeedScalar is added directly to the code string to be compiled,
    to avoid having to include any headers

  • In functions that call library routines/instructions for a specific precision: MAGMA non-tensor
    routines (GEMMs), AVX basis routines, the vector norm functions for CUDA/HIP backends...

  • Perhaps in the tests (see issue 3).

The question here is whether we want the user to have to define anything else while compiling, or only change CeedScalar. One option I tried to get the CUDA/HIP RTC working was to create variables of type CeedScalar, float, and double, then compare their sizes to determine which definition to add to the code string. Perhaps something like that could be done in a general helper function somewhere, so that any routine can check? Or is there a more elegant way to do this in C?
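As a rough illustration of the size-comparison idea, a general helper might look like the sketch below (the helper name and the build-time typedef are hypothetical, not part of libCEED, and as noted this approach cannot distinguish same-size types):

```c
#include <stddef.h>
#include <string.h>

typedef float CeedScalar; /* assumption: library built with CeedScalar = float */

/* Hypothetical helper: pick the typedef line to prepend to the RTC code
   string based on the size of CeedScalar at build time. */
static const char *CeedScalarTypedefSource(void) {
  if (sizeof(CeedScalar) == sizeof(float))  return "typedef float CeedScalar;\n";
  if (sizeof(CeedScalar) == sizeof(double)) return "typedef double CeedScalar;\n";
  return NULL; /* unsupported scalar type */
}
```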

Of course, moving toward mixed-precision support in the future, we may just want the compilation functions to take a parameter (or parameters) indicating the main/computational precision of the code to be compiled. Since the same compilation functions are used for all kernels, they must be able to support this even if not all kernels will have the option of differing computation and argument precisions.

Issue 3: Required changes to libCEED tests

Many libCEED tests fail when CeedScalar is float. In the case of t104, it just needs a cast to CeedScalar in the comparison, e.g. if (a[3] != (CeedScalar)(-3.14)), otherwise the comparison is done in double and fails. Many other tests seem to fail due to the tolerances being set too low for what we are guaranteed in single precision, though of course not all of them do.
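The t104 case can be reproduced standalone; this minimal sketch (function names are illustrative) shows why the cast matters:

```c
/* The literal -3.14 has type double. Storing it in a float rounds it;
   comparing that float back against the double literal promotes the float
   to double, and the two roundings do not agree. */
static int mismatch_without_cast(void) {
  float a = -3.14;            /* rounded to nearest float */
  return a != -3.14;          /* a promoted to double: values differ */
}

static int mismatch_with_cast(void) {
  float a = -3.14;
  return a != (float)(-3.14); /* both sides rounded to float: they agree */
}
```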

However, these tolerances are not all the same and seem tailored to the individual tests, somewhat complicating the situation if trying to set different tolerances based on CeedScalar (or future mixed-precision routines). This might also require a way to check what CeedScalar is or another definition from the header (mentioned in issue 2).

@jedbrown
Member

Thanks for taking the time to test and write this up.

Issue 1: function pointers

We don't ever call through the pointer and thus should use the cast, either as you mention or using void(*)(void), which is what PETSc does in this scenario. Note that the assignment below uses a function pointer that takes (void) rather than unspecified arguments ().

      int (**fpointer)(void) = (int (**)(void))((char *)object + offset); // *NOPAD*
      *fpointer = f;

Issue 2: type-aware dispatch

We should probably make a scalar types enum. Comparing sizes (sizeof(CeedScalar) == sizeof(float)) is okay for distinguishing normal float, double, and __float128, but does not distinguish FP16 from BF16, nor TF32 (which should have been called something like "BF19", since that's the number of active bits) from FP32. I don't think we'll ever have CeedScalar be those types, but we'll still need an explicit way to state types. We can define a macro for build-time types now, but will want to make it more dynamic.

Issue 3: tolerances

We should define a type-specific machine epsilon and make tolerances use that. We may want an inline toleranced comparison utility function. And it may be that some tests just aren't appropriate for single, in which case we need a lightweight way to skip them.
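A minimal sketch of such an inline comparison utility, assuming a type-specific CEED_EPSILON is available (the function name and the n_eps scaling are illustrative, not an existing libCEED API):

```c
#include <math.h>

#define CEED_EPSILON 1e-16 /* assumption: double-precision build */

/* Pass when |a - b| is within n_eps machine epsilons; each test would pick
   n_eps instead of hard-coding an absolute tolerance. */
static inline int CeedScalarNearlyEqual(double a, double b, double n_eps) {
  return fabs(a - b) <= n_eps * CEED_EPSILON;
}
```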

@nbeams
Contributor Author

nbeams commented Jun 18, 2021

Issue 1

I originally chose int(*)() because that's what the prototype for CeedSetBackendFunction uses. Is there a reason to prefer int(*)(void) or void(*)(void)? (Note, I don't really care what cast we use, I'm just curious.)

Issue 2

I tried adding an enum and a definition like this in ceed.h:

typedef enum {
  /// Single precision
  CEED_SCALAR_FP32,
  /// Double precision
  CEED_SCALAR_FP64
} CeedScalarType;
/// Base scalar type for the library to use
/// @ingroup Ceed
#ifndef CEED_SCALAR_TYPE
#define CEED_SCALAR_TYPE CEED_SCALAR_FP32
#endif
typedef float CeedScalar;

Then in the code we can use, e.g., if (CEED_SCALAR_TYPE == CEED_SCALAR_FP32) to check for float if we need to make a different kernel or library call, for example.
The problem with what I have here is that I don't see how to avoid having to change both the CEED_SCALAR_TYPE definition and the CeedScalar typedef, since the preprocessor doesn't know about enum values.

Issue 3

I see there is a CEED_EPSILON already defined to be 1e-16. Was this meant to be used like machine epsilon for double precision? Or should it be kept separate?

@jedbrown
Member

  1. IIRC, casting from any specific function pointer type to void(*)(void) and back is defined behavior (like casting data through void* or char*), but casting between other function pointer types is not. I think int(*)() was chosen because the conversion could be implicit due to the handling of unspecified arguments in C (but not C++).
  2. I would make the code always set those together. Perhaps have the user #include <ceed/ceed-f32.h> instead of defining macros ahead of time.
  3. Yeah, that's its intent.

@nbeams
Contributor Author

nbeams commented Jun 23, 2021

  1. I tried the cast to void(*)(void) and got the same compiler warnings (with GCC 8.3). Unless you meant we could change the prototype of CeedSetBackendFunction?

  2. (&3) I like this idea. I tried having just the enum in ceed.h:

/// Scalar (floating point) types
///
/// @ingroup Ceed
typedef enum {
  /// Single precision
  CEED_SCALAR_FP32,
  /// Double precision
  CEED_SCALAR_FP64
} CeedScalarType;
/// Base scalar type for the library to use: change which header is 
/// included to change the precision.
#include <ceed/ceed-f64.h>

Then, in ceed-f64.h:

#ifndef _ceed_f64_h
#define _ceed_f64_h

#include <ceed/ceed.h>

/// Set base scalar type to FP64.  (See CeedScalarType enum in ceed/ceed.h
/// for all options.)
#define CEED_SCALAR_TYPE CEED_SCALAR_FP64
typedef double CeedScalar;

/// Data alignment of 64 bits
#define CEED_ALIGN 64

/// Machine epsilon
#define CEED_EPSILON 1e-16

#endif

And of course, ceed-f32.h has the corresponding definitions for single precision, namely CEED_ALIGN 32 and CEED_EPSILON 6e-08 (guessing based on 1e-16 for double that we were going with the LAPACK definitions, but obviously it could be whatever). CEED_ALIGN and CEED_EPSILON were moved from backend.h.
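Spelled out, the corresponding ceed-f32.h would be a sketch like:

```c
#ifndef _ceed_f32_h
#define _ceed_f32_h

#include <ceed/ceed.h>

/// Set base scalar type to FP32.  (See CeedScalarType enum in ceed/ceed.h
/// for all options.)
#define CEED_SCALAR_TYPE CEED_SCALAR_FP32
typedef float CeedScalar;

/// Data alignment of 32 bits
#define CEED_ALIGN 32

/// Machine epsilon
#define CEED_EPSILON 6e-08

#endif
```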

This works nicely overall, with a few remaining issues. I disabled the AVX tensor contractions and MAGMA non-tensor basis creation if CEED_SCALAR_TYPE == CEED_SCALAR_FP32, which avoids crashes in the tests. However, it still gives a lot of warnings when compiling the MAGMA backend, due to type mismatch with the DGEMM calls (since the compiler doesn't know these will never get used with the non-tensor basis disabled). I suppose an option here would be to also define a macro in ceed-f[64/32].h, like MAGMA_USE_DGEMM, so we could skip DGEMM-related code at compile time. Another option would be to add wrappers for SGEMM calls, which we would probably want to do eventually in order to fully support using float.
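The macro idea could be sketched like this (ceed_magma_gemm is a hypothetical wrapper name; a plain preprocessor flag is needed because #if cannot compare the enum values):

```c
/* Sketch: ceed-f64.h would #define MAGMA_USE_DGEMM, ceed-f32.h would not.
   The unused precision's GEMM calls are then never compiled at all,
   avoiding the type-mismatch warnings. */
#ifdef MAGMA_USE_DGEMM
#  define ceed_magma_gemm magma_dgemm
#else
#  define ceed_magma_gemm magma_sgemm
#endif
```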

Some more specifics on remaining issues with the tests:

Question: what to do with the Fortran tests?

I found in ceed-fortran.h where I could define Fortran parameters corresponding to the CeedScalarType enum, as done for other enums. However, this doesn't solve the issue of how to make CEED_SCALAR_TYPE and CEED_EPSILON available to Fortran codes with the inclusion of ceed-fortran.h (as can be done for C/C++ codes with the inclusion of ceed.h). Since this is just for the tests, and assuming the idea is that any Fortran users will be responsible for properly adjusting their real types in any code using an instance of libCEED that's been compiled for single precision, then I suppose we can just add another header somewhere for the Fortran tests to include? I also suppose we'd have to switch all the real types to use the real(kind=) syntax so we could set them based on something in the header.

Alternatively: we could just not run the Fortran tests for float?

Back to test tolerances: more details

Starting from a list of tests that failed in float, I went through and replaced the tolerance with something based on the double precision CEED_EPSILON: e.g., a tolerance of 1.e-14 became 100.*CEED_EPSILON. I did not do this for tests with tolerance of 1.e-10, which would become, of course, 1000000.*CEED_EPSILON or only 6e-2 for float, which seemed higher than we'd want. Individually looking at the output of some of these tests, it does seem that many could have a lower tolerance for float -- maybe for double as well? This affects quite a few tests: (t312, t314, t315, t316, t321, t322, t323 in the basis tests, and t501, t502, t503, t505, t506, t507, t511, t521, t524, t530, t550, t551, t552, t553 in the operator tests). These tests would probably be good candidates for some sort of inline tolerance checker function as mentioned in your last comment, which could set different values based on CEED_SCALAR_TYPE. I guess a fancier type of tolerance check could be based on matching a certain number of digits rather than a tolerance (scale by magnitude of quantity being compared).

A separate remaining issue pops up with some of the tests that compare to things in output/:

  • t300, t301, t302, t304, t305 -- the output doesn't match exactly in single precision. For some of the tests, we could maybe just shorten the output by a digit or two, for double and float. t300 uses CeedBasisView, and we probably don't want to change the precision it uses for double precision. I'm not sure at the moment how we could choose a different output file to perform the diff with based on CEED_SCALAR_TYPE.
  • This also affects t402, because the QFunctionContext size is 20 with float and 40 with double.

A caveat

I have not yet looked in the Python, Julia, or Rust bindings to see if there are any problems (I suspect there could be -- I did notice a separate CEED_ALIGN definition as 64 in Julia, for example).

@jeremylt
Member

jeremylt commented Jun 23, 2021

I wonder if it is worth reworking our Fortran interface while we are in there to support more modern Fortran since Nek5000 is switching to NekRS using C++ with OCCA instead of incorporating libCEED.

@nbeams
Contributor Author

nbeams commented Jul 7, 2021

I'm curious what is known about libCEED's current or likely Fortran users -- do they need Fortran 77, or is 90/95 okay (or newer)? We could potentially modernize our tests without changing the interface, when it comes to the scalar types (I think).

I've looked at updating the Python bindings/tests -- it was more involved than I realized before I started, but I think I got it all working (except for the few tests which compare to output rather than test with a tolerance). Though I'm sure there are other options than what I did to get it working, so we could discuss/change it.

I've never really worked with Rust or Julia before, so I may need some help there. For example, how do I best propagate the information in the "extra" included header file (for f32 or f64) to the Julia bindings? (ping: @pazner)

I'm working on a draft PR for my branch right now, assuming we want to go ahead and pursue this all the way to solving the various outstanding issues and merging it, so maybe we can continue the discussion over there once it's opened.

Edit: Also, I apologize for spamming everyone with build failure emails when I push to this branch. Making progress though -- at least Python doesn't fail anymore :)

@jeremylt
Member

jeremylt commented Jul 7, 2021

The only reason we added Fortran 77 support was because of Nek5000, but with NekRS a) being in C++ and b) not being interested in libCEED, this is not really relevant. I haven't done a large survey of Fortran projects or anything, but I would assume that supporting Fortran 90 would be sufficient for any hypothetical Fortran users we may have in the future.

I already moved our Fortran tests away from fixed form, but I would support further updates to those tests. We should be able to update those tests without fully updating to Fortran 90 if we want to retain our Nek5000 example (which currently isn't running in CI for unknown reasons).
