fix(legacy-refunds): support going the spending path on refund #2280

Open
wants to merge 23 commits into
base: dev

Conversation

mariocynicys
Collaborator

This PR attempts to follow a more robust recovery process by attempting both refunding OR spending if the swap goes the ugly way.
This is done by replacing the refund-only logic with refund-or-spend-logic (recovery, recover_funds).

recover_funds has also been adapted to return more structured error types that tell the caller whether a retry is worthwhile. Some pre-checks that made us fail early based on local data (e.g. is_swap_finished) are removed as well, since we should still query the RPC to make sure our recovery tx wasn't lost or something.

modularize these funcs and introduce a new error type to filter errors and possible retries.
tests are not yet adapted.
the tests checking not-finished and successful swaps are removed. these tests were used to check that we error on recover funds when:
- the swap is not finished yet: there is no reason not to still try to recover funds
- the swap was successful (determined by the existence of the taker/maker payment spend): we should still attempt to recover funds in this case, as the tx might be re-orged or some other weird thing happened. there is no downside to retrying.

also test_recover_funds_maker_swap_maker_payment_refunded was used to test that we check local data to determine that the swap failed. again, this isn't the approach followed here. even if we store the maker_payment_refund tx locally (so we should have sent it while the swap was running), we will still attempt to run recover funds and look for the tx on-chain.
this test could have been adapted, but then it would look exactly like test_recover_funds_maker_payment_refund_already_refunded, so there is no point in that.
namely, we do try to refund (maker payment) or spend (taker payment), whichever is possible.
we might want to rename refund_maker_payment to recover or something
same as what's done with the maker.
though much more useful here, because on the maker side we can't really miss spending the taker's payment while getting our own payment spent :p
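
The refund-or-spend flow described above can be sketched roughly as follows. This is only an illustration, not the PR's code: `RecoverSwapError::Irrecoverable` and `RecoveredSwapAction::SpentOtherPayment` appear in this PR's diff, while the `Temporary` and `RefundedMyPayment` variants, the `ChainView` struct and all of its fields are assumptions invented for the example.

```rust
/// What a successful recovery attempt ended up doing.
#[derive(Debug, PartialEq)]
enum RecoveredSwapAction {
    /// Assumed variant name: we took our own payment back.
    RefundedMyPayment,
    /// We spent the counterparty's payment instead of refunding.
    SpentOtherPayment,
}

/// Structured error that tells the caller whether retrying makes sense.
#[derive(Debug, PartialEq)]
enum RecoverSwapError {
    /// Assumed variant: a later attempt may succeed.
    Temporary(String),
    /// Retrying cannot help.
    Irrecoverable(String),
}

/// Hypothetical on-chain snapshot; real code would query the coin RPCs.
struct ChainView {
    rpc_ok: bool,
    my_payment_sent: bool,
    my_payment_spent_by_other: bool,
    my_payment_refunded: bool,
    locktime_passed: bool,
}

/// Decide between spending and refunding based on chain state only,
/// never on locally stored swap data.
fn recover_funds(view: &ChainView) -> Result<RecoveredSwapAction, RecoverSwapError> {
    if !view.rpc_ok {
        return Err(RecoverSwapError::Temporary("rpc request failed".into()));
    }
    if !view.my_payment_sent {
        return Err(RecoverSwapError::Irrecoverable("no payment to recover".into()));
    }
    if view.my_payment_spent_by_other {
        // The other side took our payment, so we go the spending path.
        return Ok(RecoveredSwapAction::SpentOtherPayment);
    }
    if view.my_payment_refunded || view.locktime_passed {
        return Ok(RecoveredSwapAction::RefundedMyPayment);
    }
    Err(RecoverSwapError::Temporary("locktime has not passed yet".into()))
}
```

A caller would keep retrying on `Temporary` and give up on `Irrecoverable`.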
Member

@borngraced borngraced left a comment

Thanks for the PR, looks cleaner. My only note is regarding error handling.

mm2src/mm2_main/src/lp_swap/maker_swap.rs (outdated, resolved)
mm2src/mm2_main/src/lp_swap/taker_swap.rs (outdated, resolved)
@laruh
Member

laruh commented Nov 25, 2024

@mariocynicys could you please fix PR lint

@mariocynicys mariocynicys changed the title optimization(legacy-swap): support going the spending path on refund fix(legacy-refunds): support going the spending path on refund Nov 25, 2024
Member

@laruh laruh left a comment

Thanks for this fix, here are my notes

mm2src/mm2_main/src/lp_swap/maker_swap.rs (outdated, resolved)
mm2src/mm2_main/src/lp_swap/maker_swap.rs (outdated, resolved)
mm2src/mm2_main/src/lp_swap/taker_swap.rs (outdated, resolved)
mm2src/mm2_main/src/lp_swap/taker_swap.rs (outdated, resolved)
mm2src/mm2_main/src/lp_swap/maker_swap.rs (outdated, resolved)
borngraced
borngraced previously approved these changes Dec 2, 2024
Collaborator

@shamardy shamardy left a comment

Thanks for the PR. First review iteration!

A few suggestions/discussion points:

  • Maybe we should add start/stop swap rpc for @cipig to use to stop endlessly failing swaps until we fix the bugs that lead to this.
  • We will need this for TPU, please add it to an issue checklist.

Comment on lines 1515 to 1518
.maker_coin
.can_refund_htlc(maker_payment_lock)
.await
.map_err(RecoverSwapError::Irrecoverable)?;
Collaborator

can_refund_htlc can sometimes return a recoverable error, please check utxo implementation for this ref.

let mtp = coin.get_current_mtp().await?;
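
One way to avoid mapping every `can_refund_htlc` failure to `Irrecoverable` is to classify the error first. The sketch below assumes a typed error, which is hypothetical: `CanRefundError`, its variants, the `Temporary` variant and `check_refund` are all invented for illustration, and the actual function may return a different error type.

```rust
/// Hypothetical typed error for a `can_refund_htlc`-like check.
#[derive(Debug)]
enum CanRefundError {
    /// e.g. fetching the current MTP failed on the RPC side.
    Transport(String),
    /// The locktime value itself is unusable.
    InvalidLocktime(u64),
}

#[derive(Debug, PartialEq)]
enum RecoverSwapError {
    /// Assumed variant: worth retrying later.
    Temporary(String),
    Irrecoverable(String),
}

impl From<CanRefundError> for RecoverSwapError {
    fn from(e: CanRefundError) -> Self {
        match e {
            // Network problems are transient, so keep the swap retryable.
            CanRefundError::Transport(msg) => RecoverSwapError::Temporary(msg),
            CanRefundError::InvalidLocktime(t) => {
                RecoverSwapError::Irrecoverable(format!("invalid locktime {t}"))
            },
        }
    }
}

/// `?` picks the right variant through the `From` impl above.
fn check_refund(result: Result<bool, CanRefundError>) -> Result<bool, RecoverSwapError> {
    Ok(result?)
}
```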

Collaborator Author

mm2src/mm2_main/src/lp_swap.rs (resolved)
// Roll back to confirming the maker payment spend.
RecoveredSwapAction::SpentOtherPayment => {
info!("Refund canceled. Maker payment spend tx {:02x}", tx_ident.tx_hash);
// TODO: We prepared for refund but didn't finalize refund. This must be breaking something for lightning.
Collaborator

Will we ever hit this code in lightning? I guess it can happen for taker swap code, but it's not a big problem anyway, since the finalize refund just allows the maker to get his payment back instantly instead of waiting for the timelock to expire, and if we spend it, it means it can't be failed backwards. For maker swap code it's a different case: the maker fails the htlc backwards before this step, so we will never hit this code.
Usually such things in code are a cue for a needed refactor, but not in this PR of course.

Collaborator Author

@mariocynicys mariocynicys Dec 16, 2024

looks like this is only relevant for lightning receiving maker. and yeah if the maker fails the HTLC then we can't hit SpentOtherPayment.
removed the todo 70afcde

Comment on lines 1967 to 1968
// FIXME: We should try to `recover_funds` again after locktime. The taker payment might have been
// spent by the maker at this point and we should go for spending the maker payment instead.
Collaborator

Well, wait_for_htlc_refund for lightning doesn't depend on locktime, but this is a general code and we should handle this case. For the lightning case, if we got payment successful from here

_ => Ready(MmError::err(RefundError::Internal(ERRL!(
"Payment {} has an invalid status of {} in the db",
payment_hex,
payment.status
)))),

It means that we should be spending the maker payment.

Collaborator Author

i think that makes sense in the general sense. if the wait_for_htlc_refund fails we can try to recover again.

Member

@cipig cipig Dec 5, 2024

there is one case that comes to my mind where it would be bad to retry endlessly: if the initial tx was reverted because of too low gas (so basically a misconfiguration from our side)... if you retry that tx, you will spend gas again, but the result will be the same if you haven't increased gas limits... so you lose the txfees every time you try
idk if we want/should account for this case, but thought i'd mention it

Collaborator Author

yeah that's a good reason to have a start/stop rpc.

Collaborator

A start/stop rpc doesn't solve this, as too much gas might have already been spent by the time the user or @cipig is aware of it. This could drain the user's wallet, which is really bad. We need to mark the swap as failed for this, and this is one of the situations where we will still require manual recover funds usage. @cipig are you aware of other situations like this one? We shouldn't release this PR/fix unless we are sure it's completely safe to retry forever and all edge cases are covered.

Member

there is this #1567, but there you don't lose any fees when you retry forever
i guess reverted "out of gas" txes on EVM chains are the only problem where you lose money if you retry forever
there may be other reasons why an EVM chain reverts your tx, but i haven't encountered any other that would always fail
so short answer is "no", just this case :-)

Collaborator Author

covered here: a28c768

for the gas issue with reverted txs: what about retrying the refund in an exponential backoff manner? this greatly reduces the amount of failed recoveries and doesn't hurt the perf/speed that much.
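
A rough sketch of what such a backoff schedule could look like. The function names, the base delay and the cap are illustrative assumptions and are not taken from this PR.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// The first `attempts` delays, handy for logging or testing the schedule.
fn retry_schedule(attempts: u32, base: Duration, max: Duration) -> Vec<Duration> {
    (0..attempts).map(|n| backoff_delay(n, base, max)).collect()
}
```

With a 10 second base and a one hour cap, the waits grow as 10s, 20s, 40s, 80s and so on until they reach the cap, so a permanently failing refund is attempted far less often than with a fixed interval.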

Collaborator

Strictly speaking, 'reverted' means that the tx was cancelled due to errors during the contract execution (e.g. the locktime has not passed yet), so the tx may still be retried.
A tx can also be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.
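
A minimal sketch of that receipt check, under stated assumptions: `TxReceipt`, `TxOutcome` and `classify_receipt` are hypothetical names, and treating "failed and consumed the whole gas limit" as out of gas is a heuristic rather than something the receipt states explicitly.

```rust
/// Hypothetical, trimmed-down view of an EVM transaction receipt.
struct TxReceipt {
    /// `true` when the receipt status reports success.
    succeeded: bool,
    gas_used: u64,
    /// Gas limit the transaction was sent with.
    gas_limit: u64,
}

#[derive(Debug, PartialEq)]
enum TxOutcome {
    Confirmed,
    /// Failed in contract logic (e.g. locktime not reached): may be retried.
    RevertedRetryable,
    /// Failed after consuming its whole gas limit: resending it unchanged
    /// would most likely burn the fee again.
    LikelyOutOfGas,
}

fn classify_receipt(r: &TxReceipt) -> TxOutcome {
    if r.succeeded {
        TxOutcome::Confirmed
    } else if r.gas_used >= r.gas_limit {
        TxOutcome::LikelyOutOfGas
    } else {
        TxOutcome::RevertedRetryable
    }
}
```

The swap code could then map `LikelyOutOfGas` to an unrecoverable error and keep retrying only on `RevertedRetryable`.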

Member

A tx can be cancelled due to out of gas. To prevent retries in this case, I think, we may analyse the tx receipt in eth code and return some unrecoverable error to the swap code.

gas spikes on Ethereum are typically temporary, no? and a transaction being "out of gas" is not necessarily a permanent failure.
I would suggest marking the swap as failed in this case and stopping the process, but allowing the user to retry the spend-or-refund process later by starting it manually, and also to stop it if it takes too long. If the retry faces "out of gas" again, then mark it as failed and stop the process.

@laruh
Member

laruh commented Dec 5, 2024

  • We will need this for TPU, please add it to an issue checklist.

I suppose you're referring to this issue #1895?

We have lots of lists, so I just want to clarify. I also added eth coin todos here.

this field was converted to a hashmap instead of a vector for easy access to the swap.
also we now manually delete the swap from running swaps when the swap is finished/interrupted (memory leak fix). as a consequence of manually deleting the swap from running_swaps, we can now store them as arcs instead of weakrefs, which simplifies away a lot of .upgrade calls.
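
The change described in this note can be sketched like this. It is a simplified stand-in, not the PR's code: the `u64` key replaces the real swap uuid type, and `DummySwap` plus the `insert`/`get`/`remove` helpers are assumptions added so the example is self-contained.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Minimal stand-in for the swap trait; only the id matters here.
trait AtomicSwap: Send + Sync {
    fn uuid(&self) -> u64;
}

/// Hypothetical swap used only for illustration.
struct DummySwap(u64);

impl AtomicSwap for DummySwap {
    fn uuid(&self) -> u64 { self.0 }
}

#[derive(Default)]
struct SwapsContext {
    /// Keyed by uuid for direct lookup; strong refs, so no `.upgrade()` calls.
    running_swaps: Mutex<HashMap<u64, Arc<dyn AtomicSwap>>>,
}

impl SwapsContext {
    fn insert(&self, swap: Arc<dyn AtomicSwap>) {
        self.running_swaps.lock().unwrap().insert(swap.uuid(), swap);
    }

    fn get(&self, uuid: u64) -> Option<Arc<dyn AtomicSwap>> {
        self.running_swaps.lock().unwrap().get(&uuid).cloned()
    }

    /// Must be called when the swap finishes or is interrupted,
    /// otherwise the strong reference keeps the swap alive forever.
    fn remove(&self, uuid: u64) -> bool {
        self.running_swaps.lock().unwrap().remove(&uuid).is_some()
    }
}
```

The trade-off is that the map now owns the swap, so forgetting the explicit `remove` would reintroduce the leak that weak references used to hide.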
)));
},
};
let taker_coin = match swap.taker_coin_ticker() {
Collaborator

Code repetition for finding maker_coin and taker_coin.
(BTW could we just use lp_coinfind_or_err in all such cases?)

Collaborator Author

thanks. i didn't like it either but needed some assertion xD

Collaborator Author

@@ -516,7 +532,7 @@ struct LockedAmountInfo {
}

struct SwapsContext {
-    running_swaps: Mutex<Vec<Weak<dyn AtomicSwap>>>,
+    running_swaps: Mutex<Vec<(Weak<dyn AtomicSwap>, AbortOnDropHandle)>>,
Member

Could you tell why we need this change?

Is it related to some review note? maybe I missed smth

Collaborator Author

@mariocynicys mariocynicys Dec 23, 2024

nah actually, not discussed in review. @shamardy just told me we need the swaps to be stoppable via rpc (since we now do a run forever recovery), that's why we record their abort handles to be able to stop them mid-recover (or even mid-swap).
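
To illustrate the idea of recording abort handles, here is a simplified sketch that uses a cooperative flag instead of a real abortable future. The `AbortOnDropHandle` below is a hand-rolled stand-in with the same name as the type in the diff, not the actual implementation, and `RunningSwaps`, `register`, `stop` and `recovery_rounds` are assumptions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for an abort-on-drop handle: dropping it flips the shared flag.
struct AbortOnDropHandle(Arc<AtomicBool>);

impl Drop for AbortOnDropHandle {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

#[derive(Default)]
struct RunningSwaps {
    handles: Mutex<HashMap<u64, AbortOnDropHandle>>,
}

impl RunningSwaps {
    /// Registers a swap and returns the flag its task should poll.
    fn register(&self, uuid: u64) -> Arc<AtomicBool> {
        let flag = Arc::new(AtomicBool::new(false));
        self.handles.lock().unwrap().insert(uuid, AbortOnDropHandle(flag.clone()));
        flag
    }

    /// What a stop RPC could call: removing the entry drops the handle,
    /// which in turn signals the swap to stop.
    fn stop(&self, uuid: u64) -> bool {
        self.handles.lock().unwrap().remove(&uuid).is_some()
    }
}

/// Simulated run-forever recovery: loops until aborted or `max_rounds` is hit.
fn recovery_rounds(aborted: &AtomicBool, max_rounds: u32) -> u32 {
    let mut rounds = 0;
    while rounds < max_rounds && !aborted.load(Ordering::SeqCst) {
        rounds += 1;
    }
    rounds
}
```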

@laruh laruh added the priority: medium Moderately important tasks that should be completed but are not urgent. label Jan 10, 2025
};
// Run the swap in an abortable task and wait for it to finish.
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
let abortable_swap = spawn_abortable(async move {
Collaborator

@dimxy dimxy Jan 14, 2025

Here we create an abortable future to run the swap loop (for using this in the new stop rpc I believe).
But run_taker_swap fn, where we are now, is also wrapped in an abortable future. Could we use this one in the stop rpc?

Collaborator Author

but how will we grab that abort handle and supply it to run_taker_swap in the first place?
also kickstart_thread_handler uses run_taker_swap inline (without spawning it), well eventually if u follow the usage ofc u will find kickstart_thread_handler is itself being spawned somewhere. but it's hard to propagate these handles all the way down to the swap runner methods.

Collaborator

I think it's a bit odd when we use different futures to stop the same swap in different use cases (abort_all and stop_swap_rpc). Maybe we could get the handle for a future from the abortable_system for stop_swap_rpc, so we would use same abort technique in both cases. Maybe some modification for AbortableQueue is needed for that

@dimxy
Collaborator

dimxy commented Jan 14, 2025

test_mm2_stops_immediately seems not working btw. I guess clear_running_swaps() is needed here too?

@mariocynicys
Collaborator Author

test_mm2_stops_immediately seems not working btw. I guess clear_running_swaps() is needed here too?

yup. I'll just wait till we finish reviewing the mem leak PR and merge it and then merge this with dev.

_touch = touch_loop => unreachable!("Touch loop can not stop!"),
};
// Run the swap in an abortable task and wait for it to finish.
let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
Collaborator

Could we simplify this code to eliminate oneshot::channel?

-    let (swap_ended_notifier, swap_ended_notification) = oneshot::channel();
-    let abortable_swap = spawn_abortable(async move {
+    let fut_with_touch = async move {
         select! {
             _swap = swap_fut => (), // swap finished normally
             _touch = touch_loop => unreachable!("Touch loop can not stop!"),
-        }
-        if swap_ended_notifier.send(()).is_err() {
-            error!("Swap listener stopped listening!");
-        }
-    });
-    swap_ctx.running_swaps.lock().unwrap().push((weak_ref, abortable_swap));
-    // Halt this function until the swap has finished (or interrupted, i.e. aborted/panic).
-    swap_ended_notification.await.error_log_with_msg("Swap interrupted!");
+        };
+    };
+    let (abortable, handle) = abortable(fut_with_touch);
+    swap_ctx.running_swaps.lock().unwrap().push((weak_ref, handle.into()));
+    if let Err(Aborted) = abortable.await {
+        info!("Swap uuid={} interrupted!", uuid);
+    }

@laruh laruh added priority: high Important tasks that need attention soon. and removed priority: medium Moderately important tasks that should be completed but are not urgent. labels Jan 16, 2025
@laruh
Member

laruh commented Jan 16, 2025

@mariocynicys pr started to have conflicts

Comment on lines +2171 to 2191
.fuse()
});

let (abortable, handle) = abortable(async move {
select! {
_swap = swap_fut => (), // swap finished normally
_touch = touch_loop => unreachable!("Touch loop can not stop!"),
}
});
let uuid = running_swap.uuid;
swap_ctx
.running_swaps
.lock()
.unwrap()
.insert(uuid, (running_swap, handle.into()));
// Wait until the swap has finished (or interrupted, i.e. aborted/panic).
if abortable.await.is_err() {
info!("Swap uuid={} interrupted!", uuid);
}
// Remove the swap from the running swaps map.
swap_ctx.running_swaps.lock().unwrap().remove(&uuid);
Collaborator Author

there is a side effect introduced in this PR that i don't like:
if the parent of this func call (run_maker_swap/run_taker_swap) aborts mid-way, the swap will not abort. one could still abort it ofc from the running_swaps map.
but i think it's more intuitive to have the swap aborted when run_maker_swap is aborted without the need to abort it via running_swaps map.

i think what i want is to be able to run the swap in the current abortable system while still being able to shut down the swap using the abort handle. so there is a 2-way shutdown mechanism or something like that.

cc/ @dimxy

Collaborator

Yeah, I think maybe we should create an abortable future (swap_fut + touch_loop) in run_taker_swap() and add its abort_handle to both the abortable_system and running_swaps.
Then spawn this abortable future and do not await it.
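
The two-way shutdown idea can be sketched with one shared flag that is registered in two places. This is a simplified model with a cooperative flag rather than a real abortable future, and `AbortFlag`, `AbortableSystem` and `spawn_swap` are hypothetical names chosen for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// One cancellation flag; every clone observes and controls the same state.
#[derive(Clone, Default)]
struct AbortFlag(Arc<AtomicBool>);

impl AbortFlag {
    fn abort(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_aborted(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical stand-in for the context-wide abortable system.
#[derive(Default)]
struct AbortableSystem {
    flags: Vec<AbortFlag>,
}

impl AbortableSystem {
    fn register(&mut self, flag: AbortFlag) {
        self.flags.push(flag);
    }

    /// Shutdown path: aborts every task spawned through the system.
    fn abort_all(&self) {
        for flag in &self.flags {
            flag.abort();
        }
    }
}

/// Creates the flag for one swap, registers it with the system and returns
/// a clone that the caller would store in the running swaps map.
fn spawn_swap(system: &mut AbortableSystem) -> AbortFlag {
    let flag = AbortFlag::default();
    system.register(flag.clone());
    flag
}
```

Aborting through either the stored clone (the stop RPC path) or `abort_all` (the shutdown path) then stops the same swap.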

Member

@borngraced borngraced left a comment

last notes from my side.

Comment on lines +2187 to +2189
if abortable.await.is_err() {
info!("Swap uuid={} interrupted!", uuid);
}
Member

why is the log level changed to info?

Collaborator Author

yeah that should be error, thanks.

Comment on lines +535 to +537
if abortable.await.is_err() {
info!("Swap uuid={} interrupted!", uuid);
}
Member

same as above

Comment on lines +589 to +599
let swap = match SavedSwap::load_my_swap_from_db(&ctx, req.uuid).await {
Ok(Some(s)) => s,
Ok(None) => {
return MmError::err(KickStartSwapErr::NotFound);
},
Err(e) => {
return MmError::err(KickStartSwapErr::Internal(format!(
"Error getting the swap from the DB: {e}"
)))
},
};
Member

Suggested change
-    let swap = match SavedSwap::load_my_swap_from_db(&ctx, req.uuid).await {
-        Ok(Some(s)) => s,
-        Ok(None) => {
-            return MmError::err(KickStartSwapErr::NotFound);
-        },
-        Err(e) => {
-            return MmError::err(KickStartSwapErr::Internal(format!(
-                "Error getting the swap from the DB: {e}"
-            )))
-        },
-    };
+    let swap = SavedSwap::load_my_swap_from_db(&ctx, req.uuid)
+        .await
+        .mm_err(|e| KickStartSwapErr::Internal(format!("Error getting the swap from the DB: {e}")))?
+        .or_mm_err(|| KickStartSwapErr::NotFound)?;
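
For readers unfamiliar with the `mm_err`/`or_mm_err` helpers, the same flattening of a `Result<Option<T>, E>` can be expressed with plain `std` combinators. This is an analogy only: `load_swap` and its `String` payload are made up for illustration and the `MmError` wrapper is left out.

```rust
#[derive(Debug, PartialEq)]
enum KickStartSwapErr {
    NotFound,
    Internal(String),
}

/// `db` mimics a DB lookup result: Ok(Some(swap)), Ok(None) or Err(db error).
fn load_swap(db: Result<Option<String>, String>) -> Result<String, KickStartSwapErr> {
    db
        // DB failure becomes an internal error (the role of `mm_err` above).
        .map_err(|e| KickStartSwapErr::Internal(format!("Error getting the swap from the DB: {e}")))?
        // A missing row becomes NotFound (the role of `or_mm_err` above).
        .ok_or(KickStartSwapErr::NotFound)
}
```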

Comment on lines +604 to +623
// Get the maker and taker coins.
let find_swap_coin = |ctx: MmArc, maybe_ticker: Result<String, String>, ticker_type: &'static str| async move {
match maybe_ticker {
Ok(coin) => match lp_coinfind(&ctx, &coin).await {
Ok(Some(coin)) => Ok(coin),
Ok(None) => MmError::err(KickStartSwapErr::NeedsCoinActivation(format!(
"{ticker_type} coin {} must be activated",
coin
))),
Err(e) => MmError::err(KickStartSwapErr::Internal(format!(
"Error trying to find {ticker_type} coin: {e}"
))),
},
Err(e) => MmError::err(KickStartSwapErr::Internal(format!(
"Error getting {ticker_type} ticker of swap: {e}"
))),
}
};
let maker_coin = find_swap_coin(ctx.clone(), swap.maker_coin_ticker(), "maker").await?;
let taker_coin = find_swap_coin(ctx.clone(), swap.taker_coin_ticker(), "taker").await?;
Member

I think this is better and this is too minimal to require cloning arc

Suggested change
-    // Get the maker and taker coins.
-    let find_swap_coin = |ctx: MmArc, maybe_ticker: Result<String, String>, ticker_type: &'static str| async move {
-        match maybe_ticker {
-            Ok(coin) => match lp_coinfind(&ctx, &coin).await {
-                Ok(Some(coin)) => Ok(coin),
-                Ok(None) => MmError::err(KickStartSwapErr::NeedsCoinActivation(format!(
-                    "{ticker_type} coin {} must be activated",
-                    coin
-                ))),
-                Err(e) => MmError::err(KickStartSwapErr::Internal(format!(
-                    "Error trying to find {ticker_type} coin: {e}"
-                ))),
-            },
-            Err(e) => MmError::err(KickStartSwapErr::Internal(format!(
-                "Error getting {ticker_type} ticker of swap: {e}"
-            ))),
-        }
-    };
-    let maker_coin = find_swap_coin(ctx.clone(), swap.maker_coin_ticker(), "maker").await?;
-    let taker_coin = find_swap_coin(ctx.clone(), swap.taker_coin_ticker(), "taker").await?;
+    let maker_coin = swap
+        .maker_coin_ticker()
+        .map_to_mm(|e| KickStartSwapErr::Internal(format!("Error getting maker ticker for swap: {e}")))?;
+    let maker_coin = lp_coinfind(&ctx, &maker_coin)
+        .await
+        .map_to_mm(|e| KickStartSwapErr::Internal(format!("Error trying to find maker coin: {e}")))?
+        .or_mm_err(|| KickStartSwapErr::NeedsCoinActivation(format!("maker coin {maker_coin} must be activated")))?;
+    let taker_coin = swap
+        .taker_coin_ticker()
+        .map_to_mm(|e| KickStartSwapErr::Internal(format!("Error getting taker ticker for swap: {e}")))?;
+    let taker_coin = lp_coinfind(&ctx, &taker_coin)
+        .await
+        .map_to_mm(|e| KickStartSwapErr::Internal(format!("Error trying to find taker coin: {e}")))?
+        .or_mm_err(|| KickStartSwapErr::NeedsCoinActivation(format!("taker coin {taker_coin} must be activated")))?;

Labels
2.4.0-beta priority: high Important tasks that need attention soon. status: pending review
Development

Successfully merging this pull request may close these issues.

MM2 doesn't retry querying electrums on failed RPC requests
7 participants