Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: export policy condition types #58

Merged
merged 1 commit into from
Mar 12, 2024
Merged

feat: export policy condition types #58

merged 1 commit into from
Mar 12, 2024

Conversation

sanjams2
Copy link
Contributor

== Motivation ==

Allow package consumers to use typed PolicyViolation 'Data' field

== Details ==

Currently, consumers of this library do not have access to the specific struct types that the 'Data' field of the PolicyViolation object can take on. This makes it near impossible (at least not without brittle reflection based techniques) to access inner fields of the struct in the 'Data' field. An example inner field is the 'ErrNum' field on an Xid PolicyViolation which is crucial for understanding the device health.

This change attempts to solve this by exporting the different struct types that the data field can take on. This allows consumers to cast the data field to a type which they will be able to more safely handle in their code.

== Testing ==

  • built package on g5 aws instance with base deep learning ubuntu AMI (preinstalled with dcgm). Validated tests passed:
ubuntu@<IP>:~/go-dcgm$ make
go build ./pkg/dcgm
cd samples/deviceInfo; go build
cd samples/dmon; go build
cd samples/health; go build
cd samples/hostengineStatus; go build
cd samples/policy; go build
cd samples/processInfo; go build
cd samples/restApi; go build
cd samples/topology; go build
go test -race ./tests
ok  	github.com/NVIDIA/go-dcgm/tests	(cached)
test $(gofmt -l . | tee /dev/stderr | wc -l) -eq 0

@sanjams2
Copy link
Contributor Author

@nvvfedorov do you mind a look at this one or assigning it to someone who can help review?

@nvvfedorov
Copy link
Collaborator

@sanjams2 , I have two requests:

  1. Please add an example of the library usage with a new change to illustrate the use typed PolicyViolation 'Data' field".
  2. Please read: https://github.com/NVIDIA/go-dcgm/blob/main/CONTRIBUTING.md and sign the change.

@sanjams2
Copy link
Contributor Author

sanjams2 commented Feb 25, 2024

Thanks @nvvfedorov

Here is an example:

ch, err := dcgm.ListenForPolicyViolations(ctx, dcgm.XidPolicy)
for notif := range ch {
    xidCondition := notif.Data.(dcgm.XidPolicyCondition) // <-- This casting is not possible without this change
    fmt.Printf("Received Xid violation with ErrNum: %d", xidCondition.ErrNum)
}

Commit has also been signed.

@rohit-arora-dev
Copy link
Collaborator

rohit-arora-dev commented Mar 4, 2024

@sanjams2

The current implementation does not allow insights into specific structures types that is because Data of the PolicyViolation struct is of type interface{} and different policy violation condition types are not exported.

An alternative to above proposed approach is to define a new type for Data called PolicyViolationCondition with a method that returns that condition in string format, e.g.

+type PolicyViolationData interface {
+       GetData() string
+}
+
 type PolicyViolation struct {
        Condition policyCondition
        Timestamp time.Time
-       Data      interface{}
+       Data      PolicyViolationData
+}

And, each of the condition can implement GetData() function e.g.:

type dbePolicyCondition struct {
	Location  string
	NumErrors uint
}
 
+func (p policyConditionParam) GetData() string {
+       return fmt.Sprintf("Location: %s, NumErrors: %d", p.Location, p.NumErrors)
+}
+

Now a caller now, need not re-cast types, e.g.:


ch, err := dcgm.ListenForPolicyViolations(ctx, dcgm.XidPolicy)
for notif := range ch {
    xidCondition := notif.GetData() // <-- No cast needed, and no need to export internal types.
    fmt.Printf("Received Xid violation: %s", xidCondition)
}

Let me know if this alternative approach helps you achieve your goal as it does not require end user to know policy violation condition type or its internal structure.

@sanjams2
Copy link
Contributor Author

sanjams2 commented Mar 4, 2024

@rohit-arora-dev thanks for proposing an alternative. I dont quite see how it solves the original problem though. For example, how do I get the ErrNum of a particular xidPolicyCondition?

@rohit-arora-dev
Copy link
Collaborator

Author

The xidPolicyCondition's GetData method will provide the ErrNum but in string format. e.g.

func (x xidPolicyCondition) GetData() string {
	return fmt.Sprintf("ErrNum: %d", x.ErrNum)
}

Do you have a use case where a direct access to the underlying struct attributes is required instead of getting them in string format?

@sanjams2
Copy link
Contributor Author

sanjams2 commented Mar 4, 2024

Do you have a use case where a direct access to the underlying struct attributes is required instead of getting them in string format?

Yes. In particular, different Xid error codes can have different remediation actions (i.e. restart process, terminate host, etc.). But beyond that, the data within these structs is only useful to have in the first place if you can access it directly.

@rohit-arora-dev
Copy link
Collaborator

@sanjams2

Thanks for the context, if the purpose is more than logging Xid error and use that information to perform other actions then access to Struct as well as its attributes is required.

Copy link
Collaborator

@rohit-arora-dev rohit-arora-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/LGTM

@rohit-arora-dev
Copy link
Collaborator

@sanjams2 Can you please sign your commit, see documentation here, then I will be able to merge your commit.

== Motivation ==

Allow package consumers to use typed PolicyViolation 'Data' field

== Details ==

Currently, consumers of this library do not have access to the
specific struct types that the 'Data' field of the PolicyViolation
object can take on. This makes it near impossible (at least
not without brittle reflection based techniques) to access
inner fields of the struct in the 'Data' field. An example
inner field is the 'ErrNum' field on an Xid PolicyViolation which
is crucial for understanding the device health.

This change attempts to solve this by exporting the different
struct types that the data field can take on. This allows
consumers to cast the data field to a type which they will
be able to more safely handle in their code.

== Testing ==

- built package

Signed-off-by: sanjams <[email protected]>
@sanjams2
Copy link
Contributor Author

signature was incorrect. Had to resign

@glowkey glowkey merged commit 150a87b into NVIDIA:main Mar 12, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants