-
Notifications
You must be signed in to change notification settings - Fork 713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf2go: ergonomic enums #1636
base: main
Are you sure you want to change the base?
bpf2go: ergonomic enums #1636
Conversation
0c3db6f
to
345d337
Compare
345d337
to
ad7e33b
Compare
bpf2go orders generated types by name to ensure the output is stable. Adjust data structures such that we can rely on implicit sorting in text/template, removing sortTypes alltogether. The added reservedNames hash is handy for rejecting "reserved" type names such as "<STEM>Specs", and for the upcoming "ergonomic" enum feature (generate short names for enum members if not yet taken). Signed-off-by: Nick Zavaritsky <[email protected]>
ad7e33b
to
d3bc971
Compare
Constant names emitted for enum elements end up quite unwieldy. They are prefixed with enum name because in C struct, union and enum names are in a separate namespace, allowing for overlaps with identifiers, e.g: enum E { E }; While such overlaps are possible in *theory*, people usually don't do it. If a typicall C naming convention is followed, we get enum something { SOMETHING_FOO, SOMETHING_BAR }; generating <STEM>SomethingSOMETHING_FOO and <STEM>SomethingSOMETHING_BAR. In addition to "safe" long names, generate shorter ones if the respective name is not taken. <STEM>SOMETHING_FOO and <STEM>SOMETHING_BAR are much nicer to work with. Signed-off-by: Nick Zavaritsky <[email protected]>
d3bc971
to
12b600e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While yes, this makes for shorter identifiers, it also makes it less deterministic who will get the prefix-less one in case of conflicts. The first-declared one? The one appearing first alphabetically? Is it random? This needs to be clearly defined.
I also kind of don't really see the problem, could you give a real-world example of an identifier that gets unwieldy because of this? There's a case to be made for anonymous enums, which I'm not sure we currently support; at least I couldn't find any examples.
I'm also not sure about adding GoFormatter.ShortEnumIdentifier, we already have EnumIdentifier that could probably be extended.
@@ -116,3 +65,46 @@ func TestObjects(t *testing.T) { | |||
qt.Assert(t, qt.StringContains(str, "Var1 *ebpf.Variable `ebpf:\"var_1\"`")) | |||
qt.Assert(t, qt.StringContains(str, "ProgFoo1 *ebpf.Program `ebpf:\"prog_foo_1\"`")) | |||
} | |||
|
|||
func TestEnums(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to write this test without the range over conflict true/false? I'm not a big fan of treating invariants like a list of test cases, because you inevitably end up sprinkling if <invariant>
around anyway.
These are technically separate tests; perhaps you could extract the series of qt.Assert
calls into a test()
closure within the function to avoid repetition.
} | ||
} | ||
|
||
func wsSeparated(terms ...string) *regexp.Regexp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this strictly required? IMO it's harder to read wsSeparated("barEnumNameV1", "barEnumName", "=", "1")
than simply "barEnumNameV1 barEnumName = 1"
. Or is the amount of whitespace not deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the amount of whitespace not deterministic?
The amount of whitespace depends on the length of other identifiers in the set since formatter inserts extra spaces for alignment. It doesn't improve readability indeed, but makes it easier to update test code in the future.
types, err := sortTypes(typeNames) | ||
if err != nil { | ||
return err | ||
typeByName := map[string]btf.Type{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use make()
for consistency?
// sortTypes returns a list of types sorted by their (generated) Go type name. | ||
// | ||
// Duplicate Go type names are rejected. | ||
func sortTypes(typeNames map[btf.Type]string) ([]btf.Type, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the output no longer need to be sorted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://pkg.go.dev/text/template#hdr-Actions
{{range pipeline}} T1 {{end}}
The value of the pipeline must be an array, slice, map, or channel.
If the value of the pipeline has length zero, nothing is output;
otherwise, dot is set to the successive elements of the array,
slice, or map and T1 is executed. If the value is a map and the
keys are of basic type with a defined order, the elements will be
visited in sorted key order.
args.ObjectFile, | ||
} | ||
|
||
var buf bytes.Buffer | ||
if err := commonTemplate.Execute(&buf, &ctx); err != nil { | ||
return fmt.Errorf("can't generate types: %s", err) | ||
return fmt.Errorf("can't generate types: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why %v over %s? This change feels a little arbitrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually meant %w.
Identifier: args.Identifier, | ||
ShortEnumIdentifier: func(_, element string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be made a part of EnumIdentifier
instead? Not sure why this deserves its own hook, it's just another step during enum identifier generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this deserves its own hook, it's just another step during enum identifier generation.
I propose generating BOTH long and short identifiers to maintain backwards compatibility.
@@ -12,58 +13,6 @@ import ( | |||
"github.com/cilium/ebpf/cmd/bpf2go/internal" | |||
) | |||
|
|||
func TestOrderTypes(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically same question as above. Do we have another way of ensuring stability in the generated output? Was this always unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we range
over a map in text/template
, items are first ordered by key.
// NB: This also deduplicates types. | ||
typeNames[typ] = args.Stem + args.Identifier(typ.TypeName()) | ||
tn := templateName(args.Stem) | ||
reservedNames := map[string]struct{}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first sight, I'd model this as a property to be specified during construction of the GoFormatter. Everytime the formatter generates an identifier (struct or otherwise), it can check uniqueness and push it to the set of seen names. I think it's a good feature to have in the formatter to prevent it from emitting invalid code, it doesn't need to be b2g-specific.
This way, you also avoid having to close over reservedNames
and typeByName
to get it into an ShortEnumIdentifier
function.
enum reassm_rc {
REASSM_RC_SUCCESS,
REASSM_RC_INTERNAL_ERROR,
REASSM_RC_TOO_MANY_REASSEMBLIES,
...
}; yields const (
frReassmRcREASSM_RC_SUCCESS frReassmRc = 0
frReassmRcREASSM_RC_INTERNAL_ERROR frReassmRc = 1
frReassmRcREASSM_RC_TOO_MANY_REASSEMBLIES frReassmRc = 2
.... This code is meant to be reused across several projects, therefore I can't simply have |
Let's talk about conflicts. Assuming a C compiler, the only conflict possible is with a type name. E.g. struct foo {};
enum E {foo}; Enum item name MUST be unique. The following program fails to compile: enum A {foo};
enum B {foo}; // error: redefinition of enumerator 'foo' In case of conflict, a type name wins. In theory, the input object file could have duplicate enum identifiers. It is possible with non-C languages or multiple C compilation units linked together with Types are visited in alphabetical order, items in a enum are visited in BTF order which typically matches declaration order. |
Constant names emitted for enum elements end up quite unwieldy. They are
prefixed with enum name because in C struct, union and enum names are in
a separate namespace, allowing for overlaps with identifiers, e.g:
While such overlaps are possible in theory, people usually don't do
it. If a typicall C naming convention is followed, we get
generating
<STEM>SomethingSOMETHING_FOO
and<STEM>SomethingSOMETHING_BAR
.In addition to "safe" long names, generate shorter ones if the
respective name is not taken.
<STEM>SOMETHING_FOO
and<STEM>SOMETHING_BAR
are much nicer to work with.