valentinewallace/om_probing.md

## om_probing.md

      
Introduction

This document outlines a new payment probing scheme based on onion messages.
For context, payment reliability in today’s Lightning depends heavily on having enough payment volume to infer the current liquidity balances of channels, which is how most big nodes can tell whether a given channel has enough liquidity to forward a given amount. A node that uses HTLC probing to achieve this volume sends a regular update_add_htlc message with a bogus payment hash, and the error that comes back tells the sender whether the payment reached the final recipient. Note that there is a tradeoff between always routing through nodes that are known to rebalance their channels and leaning on probing smaller nodes, “risking” payments through them based on what’s learned.
Today’s HTLC payment probing can work well, but risks channel liquidity being locked up if a peer along the route goes offline. On the upside, a just-in-time probe loosely “reserves” the channel liquidity along the route for the actual payment.
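To make the HTLC probing idea concrete, here is a minimal sketch of how a sender might interpret the error that comes back from a probe. The two failure names are the ones BOLT 4 defines; the `ProbeError` type and `interpret_probe` helper are hypothetical names made up for illustration, not part of any implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# BOLT 4 failure names. Only the recipient failure matters to this sketch;
# the channel failure is listed as the typical "not enough liquidity" error.
REACHED_RECIPIENT = "incorrect_or_unknown_payment_details"
LIQUIDITY_FAILURE = "temporary_channel_failure"


@dataclass
class ProbeError:
    """Hypothetical container for a decoded probe failure."""
    failing_hop: int  # index into the route of the node that returned the error
    failure: str      # BOLT 4 failure name


def interpret_probe(route: List[str], err: ProbeError) -> Tuple[bool, List[str]]:
    """Return (reached_recipient, nodes_that_forwarded).

    The probe uses a bogus payment hash, so the best possible outcome is the
    final node rejecting it as an unknown payment.
    """
    reached = (
        err.failing_hop == len(route) - 1 and err.failure == REACHED_RECIPIENT
    )
    # Every node before the erring one successfully forwarded the HTLC.
    forwarded_by = route[: err.failing_hop]
    return reached, forwarded_by
```

For example, an `incorrect_or_unknown_payment_details` error from the last node on the route means every hop before it was able to forward the probed amount, while a `temporary_channel_failure` from an intermediate node only vouches for the hops before that node.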
Onion messages (OMs) present a convenient way to probe without risking locked liquidity along the route.
Design Rationale

A naive approach could be to send onion message probes directly to individual nodes along the desired route, including those to which you don’t have a direct channel. However, this scheme is problematic because it would enable monitoring the payment flows of arbitrary nodes across the network without needing a channel path to them. This would be a significant degradation of privacy: by comparison, HTLC probing makes it quite difficult to probe the balances of far-off nodes. And if you can’t probe a node using HTLCs, you can’t send over it anyway, so direct probing offers little benefit at a significant privacy cost.
Therefore, it is probably best to design a scheme that probes along channel paths, like HTLC probing. Because OMs are stateless, this adds complexity to the case where an intermediate node cannot forward the desired amount, discussed further down.
Scheme

Happy Path

Let’s go through the happy path, where Alice is probing Alice > Bob > Carol > Dave for a 100k msat payment.
She’ll construct an onion message for Bob, the first hop, as such:

Bob receives this OM, sees that he’s able to forward 110k msats to his next-hop Carol, and forwards Carol’s onion packet to her. Carol sees she can forward 105k msats to Dave, and forwards his onion packet. Finally, Dave receives his onion packet, sees he can receive 100k msats from Carol, and uses the provided reply path to send a simple probe success onion message back to Alice:
onion_message_probe_result {
	data_tlv {
		type: 4242,
		probe_id: [u8; 32],
		can_forward_desired_amt: true,
	}
	.. regular OM TLVs
}

Note that Dave will use this same onion message if he can’t receive; he’ll just set can_forward_desired_amt to false.
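The happy path can be sketched as a small simulation. This is only an illustration under stated assumptions: the `Hop` type, the `sendable` and `receivable` liquidity tables, and the `run_probe` function are all hypothetical, and real onion encryption and the reply path are left out. Only `probe_id` and `can_forward_desired_amt` come from the message above.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProbeResult:
    probe_id: bytes                # 32 bytes, mirrors `probe_id: [u8; 32]`
    can_forward_desired_amt: bool


@dataclass
class Hop:
    node: str
    amt_msat: int  # amount this node must forward onward (or receive, if last)


Channel = Tuple[str, str]  # (from_node, to_node)


def run_probe(
    sender: str,
    hops: List[Hop],
    sendable: Dict[Channel, int],    # assumed: msat from_node can send to to_node
    receivable: Dict[Channel, int],  # assumed: msat to_node will accept from from_node
    probe_id: bytes,
) -> Optional[ProbeResult]:
    """Walk the probe hop by hop; return the final node's reply, if any."""
    prev = sender
    for i, hop in enumerate(hops):
        if i == len(hops) - 1:
            # Final node: reply over the reply path whether or not it can receive.
            can_receive = receivable[(prev, hop.node)] >= hop.amt_msat
            return ProbeResult(probe_id, can_receive)
        next_node = hops[i + 1].node
        if sendable[(hop.node, next_node)] < hop.amt_msat:
            return None  # intermediate failure: not modelled in this sketch
        prev = hop.node
    return None


# The amounts from the walkthrough: Bob 110k, Carol 105k, Dave 100k msat.
hops = [Hop("Bob", 110_000), Hop("Carol", 105_000), Hop("Dave", 100_000)]
sendable = {("Bob", "Carol"): 150_000, ("Carol", "Dave"): 120_000}
receivable = {("Carol", "Dave"): 120_000}
probe_id = os.urandom(32)
result = run_probe("Alice", hops, sendable, receivable, probe_id)
```

With these made-up liquidity figures every hop passes its check, so `result` carries `can_forward_desired_amt` set to true. Lowering Dave's receivable amount below 100k msat flips the flag to false while the message shape stays the same.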
Sad Path

As an example of the sad path for an intermediate node, if Carol can’t forward 105k msats to Dave, she’ll fail the probe back to Bob by sending this onion message:

This step justifies why we need to encode a failure onion for each intermediate hop of a probe. If we hadn’t done that, and Carol responded to Bob with an empty “failed” message, Bob would have no idea which peer to fail the probe back to, because OMs are stateless. Instead, Bob unwraps his error onion and sees that he needs to fail back to Alice with her provided error onion. Alice receives the failure onion and can see that Carol was not able to forward the desired amount corresponding to the probe id, thus completing the probe.
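The nesting of failure onions can be sketched as follows. This is a toy model under loud assumptions: `FailureReport`, `FailureOnion`, `build_failure_onion` and `fail_back` are hypothetical names, and the layers are plain objects rather than encrypted packets. It only shows why each relaying node needs nothing but the layer it was handed.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class FailureReport:
    """Innermost payload, meant to be readable only by the prober."""
    probe_id: bytes
    failed_node: str


@dataclass
class FailureOnion:
    deliver_to: str  # peer to hand the inner packet to
    inner: Union["FailureOnion", FailureReport]


def build_failure_onion(
    sender: str, path: List[str], probe_id: bytes
) -> FailureOnion:
    """Build the failure onion the prober would pre-encode for the last node in `path`.

    `path` runs from the first hop up to and including the node that may
    fail, e.g. ["Bob", "Carol"] for Carol's failure onion.
    """
    packet: Union[FailureOnion, FailureReport] = FailureReport(probe_id, path[-1])
    # Wrap from the inside out: the prober's layer first, then each earlier hop.
    for node in [sender] + path[:-1]:
        packet = FailureOnion(deliver_to=node, inner=packet)
    assert isinstance(packet, FailureOnion)
    return packet


def fail_back(onion: FailureOnion) -> Tuple[List[str], FailureReport]:
    """Relay the failure statelessly; return the delivery order and the report."""
    deliveries: List[str] = []
    packet: Union[FailureOnion, FailureReport] = onion
    while isinstance(packet, FailureOnion):
        # The current holder learns only the next peer, then passes on the inner layer.
        deliveries.append(packet.deliver_to)
        packet = packet.inner
    return deliveries, packet


probe_id = bytes(32)
carol_onion = build_failure_onion("Alice", ["Bob", "Carol"], probe_id)
deliveries, report = fail_back(carol_onion)
```

Here Carol hands the inner packet to Bob, Bob unwraps his layer and hands the next one to Alice, and Alice reads a report naming Carol as the node that could not forward. No node has to remember the original probe.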
Outro

While there is nothing stopping nodes from lying about their ability to forward, they may be negatively scored if they do so. Further, adopting a protocol like this could help routing nodes attract more forwarding traffic, which is a nice incentive.
I view this feature as a low-priority enhancement, given that there are many more pressing issues in LN, but I’m open to feedback. Mostly, I thought it could be useful to spark ideas and highlight the flexibility of OMs.