
Compiling with CeedScalar defined as float #784

Closed
nbeams opened this issue Jun 17, 2021 · 7 comments · Fixed by #788


@nbeams
Contributor

nbeams commented Jun 17, 2021

As a first step toward #778, I tried building and testing libCEED with CeedScalar set to float instead of double. I ran into some minor issues. I was thinking we may want to address some or all of them in a PR prior to adding any mixed-precision functionality. (Of course, we would probably want to keep any potential future changes for mixed precision in mind when deciding what to do about these issues, or whether we want to address them at this time at all.)

Issue 1: Compiler warnings for three CeedVector functions

The affected functions are CeedVectorSetValue, CeedVectorScale, CeedVectorAXPY, which are currently only implemented in the cuda/hip-ref backends.

The warnings come from CeedSetBackendFunction, e.g.

backends/cuda/ceed-cuda-vector.c:659:33: warning: passing argument 5 of 'CeedSetBackendFunction'
       from incompatible pointer type [-Wincompatible-pointer-types]
                                 CeedVectorSetValue_Cuda); CeedChkBackend(ierr);
                                 ^~~~~~~~~~~~~~~~~~~~~~~
./include/ceed/backend.h:93:69: note: expected 'int (*)()' but argument is of type
       'int (*)(struct CeedVector_private *, CeedScalar)' {aka 'int (*)(struct CeedVector_private *, float)'}
                                 const char *func_name, int (*f)());
                                                        ~~~~~~^~~~

The reason only these three functions are affected is that they are the only ones that take a parameter of type CeedScalar (rather than CeedScalar *). Per the C standard, section 6.5.2.2, calls through a function declaration with unspecified parameters are subject to the "default argument promotions," which convert float to double. And per section 6.7.5.3, that makes the declaration int (*f)() incompatible with a prototype containing a CeedScalar parameter when CeedScalar is float, but compatible when it is double.

Now, from what I can tell, this incompatibility doesn't actually cause incorrect execution of the three functions, since they are still called through function pointers with the correct parameter types for the backend implementation, via the function pointers stored in the CeedVector object. Perhaps this is obvious to some libCEED developers/expert-level C coders, but I struggled a bit to make sure I understood the distinction between what libCEED does and things that would produce errors in this case. To convince myself, I confirmed the behavior with a minimal test code. Consider three functions with prototypes int func_dbl(double x), int func_flt(float x), and int func_pflt(float *x) and the following main():

  int ierr;
  float a = 0.5;
  double b = 0.5;

  int (*bar)() = &func_dbl;
  // OK: all floating-point arguments considered as double if unspecified
  ierr = bar(b);

  // Create function pointer with correct parameter type (like backend fcn pointers in libCEED objects);
  //  will set/use later
  int (*fbar)(float);
  // Pointer with unspecified parameters: compiler warning due to float promotion causing incompatible types
  int (*fbar_u)() = &func_flt; 

  // Not OK: undefined behavior to call function through this pointer
  ierr = fbar_u(a);

  // Minimal recreation of behavior of CeedSetBackendFunction and libCEED objects:
  // Create pointer to fcn pointer with no parameters; set equal to address of fbar, with cast.
  int (**pfbar_v)(void) = (int (**)(void))(&fbar);
  // Set value of pfbar_v to the pointer with unspecified parameters (which points to func_flt);
  // this also sets fbar due to previous line
  *pfbar_v = fbar_u;
  // Call function through pointer with correct parameter specification; OK
  ierr = fbar(a);

  // Version where the parameter is a pointer to a float instead of a float.
  // This one does not give a warning because there is no incompatible type promotion
  // when using a pointer.
  int (*ifbar)() = &func_pflt;
  // OK
  ierr = ifbar(&a);

I tried this with GCC 8.3.0 and clang 11.0.1. Both gave the "incompatible pointer type" warning seen in compiling the cuda-ref backend for the line creating fbar_u, and both produced the same runtime behavior. So there's no problem with the functions' behavior in single precision, despite the warnings.

Still, we most likely wouldn't want these warnings to be produced for every backend that implements these three functions (though it's currently only cuda-ref and hip-ref).

A simple solution, though maybe not "nice" since it would ruin the symmetry of setting up all the backend functions of a CeedVector object, would be to perform a cast on these three functions during CeedSetBackendFunction, e.g.

 ierr = CeedSetBackendFunction(ceed, "Vector", vec, "SetValue",
                                (int (*)())(CeedVectorSetValue_Cuda)); CeedChkBackend(ierr);

This removes the compiler warning and is valid per C function pointer conversion rules. I think this is preferable to having these functions take CeedScalar * instead, given how the variables are used in the functions. However, I would like to get input on this from other libCEED developers.

Issue 2: Places where we need to know the type of CeedScalar

There are several places we would need to know exactly what type CeedScalar is, e.g.:

  • In the runtime compilation for CUDA and HIP backends, namely the Ceed[Cuda/Hip]Compile function,
    where the definition of CeedScalar is added directly to the code string to be compiled,
    to avoid having to include any headers

  • In functions that call library routines/instructions for a specific precision: MAGMA non-tensor
    routines (GEMMs), AVX basis routines, the vector norm functions for CUDA/HIP backends...

  • Perhaps in the tests (see issue 3).

The question here is whether we want the user to have to define anything else while compiling, or only change CeedScalar. One option I tried to get the CUDA/HIP RTC working was to create variables of type CeedScalar, float, and double, then compare their sizes to determine which definition to add to the code string. Perhaps something like that could be done in a general helper function somewhere, so that any routine can check? Or is there a more elegant way to do this in C?
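As a rough illustration of the size-comparison idea, a general helper might look like the sketch below (the helper name and the build-time typedef are hypothetical, not part of libCEED, and as noted this approach cannot distinguish same-size types):

```c
#include <stddef.h>
#include <string.h>

typedef float CeedScalar; /* assumption: library built with CeedScalar = float */

/* Hypothetical helper: pick the typedef line to prepend to the RTC code
   string based on the size of CeedScalar at build time. */
static const char *CeedScalarTypedefSource(void) {
  if (sizeof(CeedScalar) == sizeof(float))  return "typedef float CeedScalar;\n";
  if (sizeof(CeedScalar) == sizeof(double)) return "typedef double CeedScalar;\n";
  return NULL; /* unsupported scalar type */
}
```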

Of course, moving toward mixed-precision support in the future, we may just want the compilation functions to take a parameter (or parameters) indicating the main/computational precision of the code to be compiled. Since the same compilation functions are used for all kernels, they must be able to support this even if not all kernels will have the option of differing computation and argument precisions.

Issue 3: Required changes to libCEED tests

Many libCEED tests fail when CeedScalar is float. In the case of t104, it just needs a cast to CeedScalar in the comparison, e.g. if (a[3] != (CeedScalar)(-3.14)), otherwise the comparison is done in double and fails. Many other tests seem to fail due to the tolerances being set too low for what we are guaranteed in single precision, though of course not all of them do.
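The t104 case can be reproduced standalone; this minimal sketch (function names are illustrative) shows why the cast matters:

```c
/* The literal -3.14 has type double. Storing it in a float rounds it;
   comparing that float back against the double literal promotes the float
   to double, and the two roundings do not agree. */
static int mismatch_without_cast(void) {
  float a = -3.14;            /* rounded to nearest float */
  return a != -3.14;          /* a promoted to double: values differ */
}

static int mismatch_with_cast(void) {
  float a = -3.14;
  return a != (float)(-3.14); /* both sides rounded to float: they agree */
}
```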

However, these tolerances are not all the same and seem tailored to the individual tests, somewhat complicating the situation if trying to set different tolerances based on CeedScalar (or future mixed-precision routines). This might also require a way to check what CeedScalar is or another definition from the header (mentioned in issue 2).

@jedbrown
Member

Thanks for taking the time to test and write this up.

Issue 1: function pointers

We don't ever call through the pointer and thus should use the cast, either as you mention or using void(*)(void), which is what PETSc does in this scenario. Note that the assignment below uses a function pointer that takes (void) rather than unspecified arguments ().

      int (**fpointer)(void) = (int (**)(void))((char *)object + offset); // *NOPAD*
      *fpointer = f;

Issue 2: type-aware dispatch

We should probably make a scalar types enum. Comparing sizes (sizeof(CeedScalar) == sizeof(float)) is okay for distinguishing normal float, double, and __float128, but does not distinguish FP16 from BF16, nor TF32 (which should have been called something like "BF19", since that's the number of active bits) from FP32. I don't think we'll ever have CeedScalar be those types, but we'll still need an explicit way to state types. We can define a macro for build-time types now, but will want to make it more dynamic.

Issue 3: tolerances

We should define a type-specific machine epsilon and make tolerances use that. We may want an inline toleranced comparison utility function. And it may be that some tests just aren't appropriate for single, in which case we need a lightweight way to skip them.
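A minimal sketch of such an inline comparison utility, assuming a type-specific CEED_EPSILON is available (the function name and the n_eps scaling are illustrative, not an existing libCEED API):

```c
#include <math.h>

#define CEED_EPSILON 1e-16 /* assumption: double-precision build */

/* Pass when |a - b| is within n_eps machine epsilons; each test would pick
   n_eps instead of hard-coding an absolute tolerance. */
static inline int CeedScalarNearlyEqual(double a, double b, double n_eps) {
  return fabs(a - b) <= n_eps * CEED_EPSILON;
}
```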

@nbeams
Contributor Author

nbeams commented Jun 18, 2021

Issue 1

I originally chose int(*)() because that's what the prototype for CeedSetBackendFunction uses. Is there a reason to prefer int(*)(void) or void(*)(void)? (Note, I don't really care what cast we use, I'm just curious.)

Issue 2

I tried adding an enum and a definition like this in ceed.h:

typedef enum {
  /// Single precision
  CEED_SCALAR_FP32,
  /// Double precision
  CEED_SCALAR_FP64
} CeedScalarType;
/// Base scalar type for the library to use
/// @ingroup Ceed
#ifndef CEED_SCALAR_TYPE
#define CEED_SCALAR_TYPE CEED_SCALAR_FP32
#endif
typedef float CeedScalar;

Then in the code we can use, e.g., if (CEED_SCALAR_TYPE == CEED_SCALAR_FP32) to check for float if we need to make a different kernel or library call, for example.
The problem with what I have here is that I don't see how to avoid having to change both the CEED_SCALAR_TYPE definition and the CeedScalar typedef, since the preprocessor doesn't know about enum values.

Issue 3

I see there is a CEED_EPSILON already defined to be 1e-16. Was this meant to be used like machine epsilon for double precision? Or should it be kept separate?

@jedbrown
Member

  1. IIRC, casting from any specific function pointer type to void(*)(void) and back is defined behavior (like casting data through void* or char*), but casting between other function pointer types is not. I think int(*)() was chosen because the conversion could be implicit due to the handling of unspecified arguments in C (but not C++).
  2. I would make the code always set those together. Perhaps have the user #include <ceed/ceed-f32.h> instead of defining macros ahead of time.
  3. Yeah, that's its intent.

@nbeams
Contributor Author

nbeams commented Jun 23, 2021

  1. I tried the cast to void(*)(void) and got the same compiler warnings (with GCC 8.3). Unless you meant we could change the prototype of CeedSetBackendFunction?

  2. (&3) I like this idea. I tried having just the enum in ceed.h:

/// Scalar (floating point) types
///
/// @ingroup Ceed
typedef enum {
  /// Single precision
  CEED_SCALAR_FP32,
  /// Double precision
  CEED_SCALAR_FP64
} CeedScalarType;
/// Base scalar type for the library to use: change which header is 
/// included to change the precision.
#include <ceed/ceed-f64.h>

Then, in ceed-f64.h:

#ifndef _ceed_f64_h
#define _ceed_f64_h

#include <ceed/ceed.h>

/// Set base scalar type to FP64.  (See CeedScalarType enum in ceed/ceed.h
/// for all options.)
#define CEED_SCALAR_TYPE CEED_SCALAR_FP64
typedef double CeedScalar;

/// Data alignment of 64 bits
#define CEED_ALIGN 64

/// Machine epsilon
#define CEED_EPSILON 1e-16

#endif

And of course, ceed-f32.h has the corresponding definitions for single precision, namely CEED_ALIGN 32 and CEED_EPSILON 6e-08 (guessing based on 1e-16 for double that we were going with the LAPACK definitions, but obviously it could be whatever). CEED_ALIGN and CEED_EPSILON were moved from backend.h.
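Spelled out, the corresponding ceed-f32.h would be a sketch like:

```c
#ifndef _ceed_f32_h
#define _ceed_f32_h

#include <ceed/ceed.h>

/// Set base scalar type to FP32.  (See CeedScalarType enum in ceed/ceed.h
/// for all options.)
#define CEED_SCALAR_TYPE CEED_SCALAR_FP32
typedef float CeedScalar;

/// Data alignment of 32 bits
#define CEED_ALIGN 32

/// Machine epsilon
#define CEED_EPSILON 6e-08

#endif
```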

This works nicely overall, with a few remaining issues. I disabled the AVX tensor contractions and MAGMA non-tensor basis creation if CEED_SCALAR_TYPE == CEED_SCALAR_FP32, which avoids crashes in the tests. However, it still gives a lot of warnings when compiling the MAGMA backend, due to type mismatch with the DGEMM calls (since the compiler doesn't know these will never get used with the non-tensor basis disabled). I suppose an option here would be to also define a macro in ceed-f[64/32].h, like MAGMA_USE_DGEMM, so we could skip DGEMM-related code at compile time. Another option would be to add wrappers for SGEMM calls, which we would probably want to do eventually in order to fully support using float.
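The macro idea could be sketched like this (ceed_magma_gemm is a hypothetical wrapper name; a plain preprocessor flag is needed because #if cannot compare the enum values):

```c
/* Sketch: ceed-f64.h would #define MAGMA_USE_DGEMM, ceed-f32.h would not.
   The unused precision's GEMM calls are then never compiled at all,
   avoiding the type-mismatch warnings. */
#ifdef MAGMA_USE_DGEMM
#  define ceed_magma_gemm magma_dgemm
#else
#  define ceed_magma_gemm magma_sgemm
#endif
```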

Some more specifics on remaining issues with the tests:

Question: what to do with the Fortran tests?

I found in ceed-fortran.h where I could define Fortran parameters corresponding to the CeedScalarType enum, as done for other enums. However, this doesn't solve the issue of how to make CEED_SCALAR_TYPE and CEED_EPSILON available to Fortran codes with the inclusion of ceed-fortran.h (as can be done for C/C++ codes with the inclusion of ceed.h). Since this is just for the tests, and assuming the idea is that any Fortran users will be responsible for properly adjusting their real types in any code using an instance of libCEED that's been compiled for single precision, then I suppose we can just add another header somewhere for the Fortran tests to include? I also suppose we'd have to switch all the real types to use the real(kind=) syntax so we could set them based on something in the header.

Alternatively: we could just not run the Fortran tests for float?

Back to test tolerances: more details

Starting from a list of tests that failed in float, I went through and replaced the tolerance with something based on the double precision CEED_EPSILON: e.g., a tolerance of 1.e-14 became 100.*CEED_EPSILON. I did not do this for tests with tolerance of 1.e-10, which would become, of course, 1000000.*CEED_EPSILON or only 6e-2 for float, which seemed higher than we'd want. Individually looking at the output of some of these tests, it does seem that many could have a lower tolerance for float -- maybe for double as well? This affects quite a few tests: (t312, t314, t315, t316, t321, t322, t323 in the basis tests, and t501, t502, t503, t505, t506, t507, t511, t521, t524, t530, t550, t551, t552, t553 in the operator tests). These tests would probably be good candidates for some sort of inline tolerance checker function as mentioned in your last comment, which could set different values based on CEED_SCALAR_TYPE. I guess a fancier type of tolerance check could be based on matching a certain number of digits rather than a tolerance (scale by magnitude of quantity being compared).

A separate remaining issue pops up with some of the tests that compare to things in output/:

  • t300, t301, t302, t304, t305 -- the output doesn't match exactly in single precision. For some of the tests, we could maybe just shorten the output by a digit or two, for double and float. t300 uses CeedBasisView, and we probably don't want to change the precision it uses for double precision. I'm not sure at the moment how we could choose a different output file to perform the diff with based on CEED_SCALAR_TYPE.
  • This also affects t402, because the QFunctionContext size is 20 with float and 40 with double.

A caveat

I have not yet looked in the Python, Julia, or Rust bindings to see if there are any problems (I suspect there could be -- I did notice a separate CEED_ALIGN definition as 64 in Julia, for example).

@jeremylt
Member

jeremylt commented Jun 23, 2021

I wonder if it is worth reworking our Fortran interface while we are in there to support more modern Fortran since Nek5000 is switching to NekRS using C++ with OCCA instead of incorporating libCEED.

@nbeams
Contributor Author

nbeams commented Jul 7, 2021

I'm curious what is known about libCEED's current or likely Fortran users -- do they need Fortran 77, or is 90/95 okay (or newer)? We could potentially modernize our tests without changing the interface, when it comes to the scalar types (I think).

I've looked at updating the Python bindings/tests -- it was more involved than I realized before I started, but I think I got it all working (except for the few tests which compare to output rather than test with a tolerance). Though I'm sure there are other options than what I did to get it working, so we could discuss/change it.

I've never really worked with Rust or Julia before, so I may need some help there. For example, how do I best propagate the information in the "extra" included header file (for f32 or f64) to the Julia bindings? (ping: @pazner)

I'm working on a draft PR for my branch right now, assuming we want to go ahead and pursue this all the way to solving the various outstanding issues and merging it, so maybe we can continue the discussion over there once it's opened.

Edit: Also, I apologize for spamming everyone with build failure emails when I push to this branch. Making progress though -- at least Python doesn't fail anymore :)

@jeremylt
Member

jeremylt commented Jul 7, 2021

The only reason we added Fortran 77 support was because of Nek5000, but with NekRS a) being in C++ and b) not being interested in libCEED, this is not really relevant. I haven't done a large survey of Fortran projects or anything, but I would assume that supporting Fortran 90 would be sufficient for any hypothetical Fortran users we may have in the future.

I already moved our Fortran tests away from fixed form, but I would support further updates to those tests. We should be able to update those tests without fully updating to Fortran 90 if we want to retain our Nek5000 example (which currently isn't running in CI for unknown reasons).
