-
Notifications
You must be signed in to change notification settings - Fork 270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(send_queue): Persist failed to send errors #4137
Conversation
390f170
to
4e43dc2
Compare
Changelog: We now persist the error that caused an event to fail to send. The error `QueueWedgeError` contains info that client can use to try to resolve the problem when the error is not automatically retriable. Some breaking changes in the ffi layer for `timeline::EventSendState`, `SendingFailed` now directly contains the wedge reason enum. Use it in place of the removed variant of EventSendState
4e43dc2
to
a5e5c02
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4137 +/- ##
==========================================
- Coverage 84.68% 84.56% -0.12%
==========================================
Files 269 269
Lines 28818 28845 +27
==========================================
- Hits 24405 24394 -11
- Misses 4413 4451 +38 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, but some clarification about the serialization/deserialization of the BLOB in the DB when there are new/removed enum cases would be good.
Also, since this is related to the send queue, maybe a 2nd pair of eyes from @bnjbvr wouldn't hurt either.
/// If the event couldn't be sent because of an API error, it's marked as | ||
/// wedged, and won't ever be peeked for sending. The only option is to | ||
/// remove it. | ||
pub is_wedged: bool, | ||
pub error: Option<QueueWedgeError>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we update this doc comment, since an identical one was added below to is_wedged()
?
-- New send queue events, now persists the type of error causing it to be wedged | ||
ALTER TABLE "send_queue_events" | ||
-- Used as a value, thus encrypted/decrypted | ||
ADD COLUMN "wedge_reason" BLOB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how this works in Rust, would this still work if QueueWedgeError
adds/removes some case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the value is serialized (I suspect to JSON bytes) before being put into the table, and it has to be a blob because of the encryption case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no need to annotate this to mark it nullable? and that the default value is null when not set? Maybe it's the default with sqlite, and I don't know, but might be worth indicating in the comment then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments. And yes the default in NULLABLE and NULL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job overall! I have issues with the first commit which loses us a bit of static typing and checkability (because of the use of stringly typed values and errors), so I'd like to take another look, otherwise the code looks great and most of my comments should be minor. Thanks!
MissingLocalEchoFailError, | ||
))), | ||
error: QueueWedgeError::GenericApiError { | ||
msg: MISSING_LOCAL_ECHO_FAIL_ERROR.into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for &str
to String
conversions, we prefer .to_owned()
.
/// the current encryption setting prohibit sharing with them. | ||
InsecureDevices { | ||
/// The insecure devices as a Map of userID to deviceID. | ||
user_device_map: HashMap<String, Vec<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside the FFI layer, we're not using stringly-typing, so this would become a HashMap<OwnedUserId, Vec<OwnedDeviceId>>
, and similarly below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see 3e65753
/// can be taken before retry sending. | ||
#[derive(Clone, Debug, Serialize, Deserialize)] | ||
#[cfg_attr(feature = "uniffi", derive(uniffi::Enum))] | ||
pub enum QueueWedgeError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is related to the send queue, can we put it in the send queue file? Or is there a reason I'm missing to put it into deserialized_responses?
Also, should this use our common pattern of deriving thiserror::Error
, to make it clear it's an error type, and would avoid the manual display impl? (uniffi also has an uniffi::Error
derive which could be convenient here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Send Queue is in matrix-sdk
and this error is persisted in the state storage matrix-sdk-base
.
So it cannot be in the send_queue.rs file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can we put those in matrix-sdk-base
next to the QueueEvent
type, then? (or in a new file send_queue.rs somewhere in that crate)
/// the current encryption setting prohibit sharing when it happens. | ||
IdentityViolations { | ||
/// The users that are expected to be verified but are not. | ||
users: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vec<OwnedUserId>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see here #4137 (comment)
-- New send queue events, now persists the type of error causing it to be wedged | ||
ALTER TABLE "send_queue_events" | ||
-- Used as a value, thus encrypted/decrypted | ||
ADD COLUMN "wedge_reason" BLOB; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no need to annotate this to mark it nullable? and that the default value is null when not set? Maybe it's the default with sqlite, and I don't know, but might be worth indicating in the comment then?
crates/matrix-sdk/src/send_queue.rs
Outdated
@@ -1171,8 +1175,8 @@ pub enum LocalEchoContent { | |||
/// A handle to manipulate the sending of the associated event. | |||
send_handle: SendHandle, | |||
/// Whether trying to send this local echo failed in the past with an | |||
/// unrecoverable error (see [`SendQueueRoomError::is_recoverable`]). | |||
is_wedged: bool, | |||
/// error (see [`SendQueueRoomError::is_recoverable`]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep unrecoverable
here. It has a precise meaning in the context of this file, i.e. retrying would lead to the same error again and again (vs network errors that may be transient).
Yeah, I started by doing this refactoring because persisting all the different kind of errors required to make a lot of things serializable, and at the end this is not used at all by the application layer (that was doing a mapping of the error to usable SendingFailed types). I could have done without the |
Good point about needing to ser/de all the error types. In that case, can we keep the String for the generic unrecoverable error case, and use proper types for the other error variants? |
Hello @bnjbvr, What do you think about the following changes:
So my idea is that we keep the global sdk See this commit for this change: e824fd2 As requested I also put back the ruma OwnedUserId/DeviveId inside the QueueWedgeError variant instead of raw string. I did that initially so that I can directly create ffi bindings. But I see the point, and now I separated 2 versions of the error with a string one for bindings. |
Sounds great, thanks, I'll take another look! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, this is on the right track. I might have misunderstood the intent to store the most generic error type when signalling the error to callers, and it seems we can achieve this; can you try it out please? :)
/// can be taken before retry sending. | ||
#[derive(Clone, Debug, Serialize, Deserialize)] | ||
#[cfg_attr(feature = "uniffi", derive(uniffi::Enum))] | ||
pub enum QueueWedgeError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can we put those in matrix-sdk-base
next to the QueueEvent
type, then? (or in a new file send_queue.rs somewhere in that crate)
pub enum QueueWedgeError { | ||
/// This error occurs when there are some insecure devices in the room, and | ||
/// the current encryption setting prohibit sharing with them. | ||
#[error("There are insecure devices in the room")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this error message, you won't see the user_device_map
in the display impl. Saying here in case it's not intended; to display them, you'd need to add something like : {user_device_map:?}
to the string above, and it will include the debug display of the btreemap.
@@ -577,6 +589,28 @@ impl RoomSendQueue { | |||
} | |||
} | |||
|
|||
impl From<&crate::Error> for QueueWedgeError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: using a From
impl makes it part of the public API of QueueWedgeError
, which we probably don't want to do, since it's an internal conversion. Can you use a plain function instead, please?
Also, take the crate::Error
by ownership (i.e. no ref &
), since this could avoid some cloning if I'm reading the code correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, take the crate::Error by ownership (i.e. no ref &), since this could avoid some cloning if I'm reading the code correctly.
Retracting this part: the error is used by ownership thereafter (by Arc
ing it). I'd still like to not use From
to avoid adding to the public API of QueueWedgeError
please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in the timeline controller to update the in memory local event EventSendState. Why is it a issue that it is a public API?
let msg = assert_matches!( | ||
error, | ||
QueueWedgeError::GenericApiError { msg } => msg | ||
); | ||
assert_eq!( | ||
msg, | ||
"the server returned an error: [413 / M_TOO_LARGE] Sounds like you have a lot to say!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(with respect to the previous comment, this would mean undoing this, since we could still pattern-match the right client API error here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this one see #4137 (comment)
EventSendState is using QueueWedgeError
and not the full crate::Error
. Notice that for this particular error we propbably will want to add a proper variant for QueueWedgeError
to have a proper media local echo feedback
So as to help landing this PR, I've pushed a small commit that keeps the whole |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Approving assuming no dramatic changes for complement crypto)
Thanks for the PR, notably good job getting through all the layers!
Modify the SendQueue in order to persist the error that cause the event to fail to send as a
QueueWedgeError
. TheQueueWedgeError
is not a 1:1 mapping for all kinds of errors, but holds variant and information that the client can react to in order to propose "quick fixes"/solution before retrying to send.Fixes #3973
Also fixes element-hq/element-x-ios#3287 because when a timeline reset occurs the fail to send reason is also lost.
This PR starts with a refactoring commit e769600 to introduce the new
QueueWedgedError
and move the logic that was in the ffi layer to convert api errors to SendState error variant. ThisQueueWedgedError
can be directly use in theSendingFailed
variant and expose to ffi.Second commit 109c133 adds the persistence,
QueuedEvent
now have an optional error field instead of ais_weged
boolean. Same for LocalEchoContent::Event.Adds also Migration for sqlite and indexeddb
Signed-off-by: