Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api/view: improve the collect of the Schema1 events #1147

Draft
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

goneri
Copy link
Contributor

@goneri goneri commented Jun 19, 2024

Jira Issue: https://issues.redhat.com/browse/AAP-25231

Description

Harmonize how we collect we prepare the Schema1 events and how we
do the payload validation and the exception handling.

Changes:

  • Reduce the complexity of the middleware
  • Revert AAP-17357: multi-task suggestions should preserve the original task name #692, which will be replace by a better solution
  • No more event when a payload Validation fails. The current design (one single end-point can have several different structures per type of payload) is too complicated and we can replace this we an generic validationFailure event.

Testing

Steps to test

  1. Pull down the PR
  2. ...
  3. ...

Scenarios tested

Production deployment

  • This code change is ready for production on its own
  • This code change requires the following considerations before going to production:

@goneri goneri marked this pull request as draft June 19, 2024 22:50
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 2 times, most recently from 37d6c9d to 97ba1a3 Compare June 19, 2024 23:46
@goneri goneri marked this pull request as ready for review June 21, 2024 18:10
@goneri goneri marked this pull request as draft June 21, 2024 18:10
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch from 97ba1a3 to 4beada0 Compare June 25, 2024 12:14
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 2 times, most recently from 93d00e5 to 6396a9e Compare June 25, 2024 23:16
Copy link
Contributor

@manstis manstis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments...

It'll be an interesting exercise to update the Completions view too which buries the segment code in a class away from the View.

ansible_ai_connect/ai/api/views.py Show resolved Hide resolved
ansible_ai_connect/ai/api/views.py Show resolved Hide resolved
ansible_ai_connect/ai/api/views.py Outdated Show resolved Hide resolved
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 4 times, most recently from 501a8b4 to aff6816 Compare June 26, 2024 21:28
@goneri
Copy link
Contributor Author

goneri commented Jun 27, 2024

unit-tests -> FAILED (failures=54, errors=80, skipped=1)

@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 7 times, most recently from a3637b1 to 269f206 Compare June 27, 2024 23:40
@goneri
Copy link
Contributor Author

goneri commented Jun 28, 2024

unit-tests -> FAILED (failures=14, errors=18, skipped=1)

@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 2 times, most recently from 7327124 to da722c3 Compare June 28, 2024 18:43
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch 4 times, most recently from 0c55b44 to 7585c21 Compare June 28, 2024 23:46
Copy link
Contributor

@manstis manstis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments for consideration. There's only one strong comment about passing the Schema1 event to middleware as a global.


schema1_event = schema1.PostprocessLint()
schema1_event.postprocessed = postprocessed_yaml
schema1_event.problem = problem
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line needed? You do the same on L#98

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, thanks!

elif event_type == "ansible-lint":

schema1_event = schema1.PostprocessLint()
schema1_event.postprocessed = postprocessed_yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line needed? You do the same on L#97

@@ -63,6 +63,7 @@ def process(self, context: CompletionContext) -> None:
# Note: Currently we return an array of predictions, but there's only ever one.
# The tasks array added to the completion event is representative of the first (only)
# entry in the predictions array
# https://github.com/ansible/ansible-ai-connect-service/blob/0e083a83fab57e6567197697bad60d306c6e06eb/ansible_ai_connect/ai/api/pipelines/completion_stages/response.py#L64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment meant to be here?

ansible_ai_connect/ai/api/telemetry/schema1.py Outdated Show resolved Hide resolved
from ansible_ai_connect.ai.api.data.data_model import APIPayload
from ansible_ai_connect.ai.api.exceptions import (
FeedbackValidationException,
from ansible_ai_connect.ai.api.exceptions import ( # FeedbackValidationException,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't need this comment.



def send_schema1_event(event_obj) -> None:
print(f"SENDING SCHEMA1 EVENT (name={event_obj.event_name})\n{event_obj.as_dict()} ({type(event_obj)})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a logger vs print?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a debug trace will be removed later.

for k, v in self.validated_data.items():
if k in allowed and v:
ret[k] = v
elif isinstance(v, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to make handling of dict recursive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not too. It's just for the Feedback payloads, where we've a sublevel.

# everything.
import ansible_ai_connect.main.middleware

ansible_ai_connect.main.middleware.global_schema1_event = self.schema1_event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oeeeee... this smells bad, does it not!?!? Is this thread safe?

Copy link
Contributor Author

@goneri goneri Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is not thread-safe, but use one process per request. This being said, we don't really need to read the final request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now use a Semaphore to protect the global variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could/should we just stuff the schema1_event in the request object instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both strategy are ugly and can be avoided. At this stage we use the middleware only to get the final HTTP code/message, and those can easily be retrieved from the final exception that we return https://github.com/ansible/ansible-ai-connect-service/pull/1147/files#diff-ecfb6919dfd8379aafba96af7457b253e4dce528897dfe6bfc207ca2b3b2ada9R169

Django will return a 500 unless the exception has a specific code. There is still potential 500 if the exception happens outside of the view, and we can use the exception_handler() to catch them.

@@ -469,142 +457,91 @@ def perform_content_matching(
logger.exception(f"error serializing final response for suggestion {suggestion_id}")
raise InternalServerError

except ModelTimeoutError as e:
exception = e
except ModelTimeoutError as exc:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't all of these except's handled by OurAPIView (or at least is that not the intention)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can also remove some of them in #1168, but I was just waiting for the unit-tests to pass before starting more changes.


response = self.get_response(request)
if global_schema1_event:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment earlier about use of this and thread-safety.

goneri added 7 commits July 3, 2024 13:53
AAP-17357 is hard to maintain, only cover a subset of what AAP-24049 will do and finally complexify
the refactoring of the Schema1 event managment, See:
  #1147)
Harmonize how we collect we prepare the Schema1 events and how we
do the payload validation and the exception handling.
Ensure `test_completions_preprocessing_error_without_name_prompt` as
the segment key set to avoid the `segment write key not set` error.
@goneri goneri force-pushed the goneri/api-view-improve-the-collect-of-the-Schema1-events_2563 branch from 30c4206 to c6fdc78 Compare July 3, 2024 17:54
@goneri goneri removed the pr-deployment To trigger a deployment based on this PR's image label Jul 3, 2024
Copy link
Contributor

@romartin romartin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @goneri ,

Looks a great code improvement, thanks!!

I just did a quick code review, added a few comments only. When this is ready for testing , please let me konw and will give it a try!!

version_info = VersionInfo()


@define
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use @frozen instead, so the class object itself is immutable, same applies for other classes

@define
class SentimentFeedbackEvent(BaseFeedbackEvent):
event_name: str = "sentimentFeedback"
value: int = field(validator=validators.instance_of(int), converter=int, default=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it makes sense also to validate the range of integers from 0 to 2? So other values will raise an exception.


@define
class ResponsePayload:
exception: str = field(validator=validators.instance_of(str), converter=str, default="")
Copy link
Contributor

@romartin romartin Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another comment that applies to all fields that have a validator, not just this one. It could be the case once the object instance is created that some validator fails? If so, it will throw an exception, and I don't see any try/catch blocks around those instance creations around the code.

For schema2 I just create every instance by using a lambda function, so the instance builder method does internally the try/catch, and in case any validation error in some segment event, it sends another segment event (schema1) that describes the error

self.exception = exc

# Mapping between the internal exceptions and the API exceptions (with a message and a code)
mapping = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to also map WcaSuggestionIdCorrelationFailure/WcaSuggestionIdCorrelationFailureException mapping? These exception names are changed in #1177

What about ModelTimeoutError/ModelTimeoutException too? You don't have a re-try/timeout on Playbook gen/exp (yet) but I think it has been requested... so you'll need these exceptions in your refactoring too.

self.schema1_event = None
response = super().dispatch(request, *args, **kwargs)

if self.schema1_event:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the request fails before initial is called, this won't be set. One example where this occurs is if authentication fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an assumed decision. Those authentication errors should be handled differently IMO.

goneri added a commit that referenced this pull request Jul 10, 2024
AAP-17357 is hard to maintain, only cover a subset of what AAP-24049 will do and finally complexify
the refactoring of the Schema1 event managment, See:
  #1147)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants