HildisviniOttar/thorchain_vulnerability_consensus.md

## thorchain_vulnerability_consensus.md

      
## Forging tx consensus

THORChain works by having each node's Bifrost observe external chains and post MsgObservedTxIn messages reporting what it thinks it saw. Once 2/3 of the active nodes have reported the same transaction, it is considered truth and the transaction is processed.
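As a rough illustration of the 2/3 rule, here is a minimal sketch of a supermajority test. The name hasSuperMajority and the "at least two thirds, rounded up" threshold are assumptions made for this example; this is not the thornode HasSuperMajority implementation.

```go
package main

import "fmt"

// hasSuperMajority reports whether signers is at least two thirds of total,
// with the threshold rounded up. Hypothetical helper for illustration only.
func hasSuperMajority(signers, total int) bool {
	if signers < 0 || total <= 0 || signers > total {
		return false
	}
	// ceil(2*total/3) using integer arithmetic
	threshold := (2*total + 2) / 3
	return signers >= threshold
}

func main() {
	for _, signers := range []int{1, 66, 67, 100} {
		fmt.Printf("%d of 100 -> %v\n", signers, hasSuperMajority(signers, 100))
	}
}
```

Under this assumed rule a single node's vote is nowhere near enough on its own, which is why a lone fake observation normally goes nowhere.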
Thornode keeps track of consensus via an ObservedTxVoter, uniquely identified by tx ID (a sha256 hash):
voter, err := h.mgr.Keeper().GetObservedTxInVoter(ctx, tx.Tx.ID)
MsgObservedTxIn messages can only be sent by active nodes:
if !isSignedByActiveNodeAccounts(ctx, h.mgr, msg.GetSigners()) {
	return cosmos.ErrUnauthorized(fmt.Sprintf("%+v are not authorized", msg.GetSigners()))
}
An evil node operator can front-run a legitimate tx and stuff the voter with a fake tx, which is then the one used when the real tx arrives and reaches consensus. This enables the attacker to steal an arbitrary amount of funds, bounded only by pool size.
## The Attack

First, an evil node operator creates a legitimate UTXO transaction (e.g. Litecoin, Bitcoin, ...) and posts it to the mempool. Example:
To: Asgard inbound vault
From: Attacker
Amount: 0.001 BTC
Memo: SWAP:ETH.ETH:<Attacker's ETH Address>

The attacker then obtains the tx ID of their unconfirmed tx. For example ABC123.
The attack requires a chain with a delay between broadcasting a tx and it being mined, and where the transaction ID is known before it is mined.
The attacker then crafts a fake MsgObservedTxIn with the same details as the real tx, except that the amount is forged (e.g. 1000 BTC). For example:
tx = { ABC123, BTC.BTC, From:Attacker, To:Asgard, Coins: 1000 BTC, Memo: SWAP:ETH.ETH:<Attacker> }

MsgObservedTxIn {
  [ ObservedTx{ tx, BlockHeight: X, ObservedPubKey: Asgard, FinaliseHeight: X, etc. } ],
  Signer: Attacker's node pub key
}

The attacker signs this message (with the node's private key) and posts it to Tendermint. All nodes accept and process it, creating a voter record for this fake tx.
Minutes later, the real tx is mined. Every node's Bifrost observes the legitimate tx (with the same hash), and consensus is reached on the first (fake) entry, resulting in a 1000 BTC swap to ETH paid out to the attacker.
## Funds at Risk

The size of the UTXO pools or the non-UTXO pools, whichever is smaller. Worst case (for THORChain) around $20-30m.

The outbound delayer and humans could pick up on the transactions and stop them, preventing this attack, in which case the stolen amount would be a lot lower.
At handler_observed_txin.go:914, vault.AddFunds(tx.Tx.Coins) uses the real tx coins for vault accounting, so the vault never becomes insolvent. The 1000 BTC double-swap to ETH, however, goes through pool accounting, so the pool becomes insolvent. The insolvency checker works by comparing the Asgard vault balance against real life, so it would not pick this up, because the vault accounting is correct. A further discussion to be had is whether a check such as abs(pool - all asgards - all yggdrasils) > small value is worthwhile to detect imbalances.
Both the human and automatic safety systems are very useful to prevent this attack from being successful.
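The abs(pool - all asgards - all yggdrasils) > small value idea could look roughly like the sketch below. The function name, the use of int64 satoshi amounts and the tolerance are all hypothetical, and the sketch ignores anything a real check might also need to net out (for example in-flight outbounds).

```go
package main

import "fmt"

// imbalanced reports whether the pool's recorded balance differs from the
// summed vault balances by more than tolerance. Amounts are in the asset's
// smallest unit (e.g. satoshis). Hypothetical helper for illustration only.
func imbalanced(pool int64, asgards, yggdrasils []int64, tolerance int64) bool {
	diff := pool
	for _, b := range asgards {
		diff -= b
	}
	for _, b := range yggdrasils {
		diff -= b
	}
	if diff < 0 {
		diff = -diff
	}
	return diff > tolerance
}

func main() {
	const tolerance = 1_000_000 // 0.01 BTC, an arbitrary example value

	// Before the attack: pool and vaults both account for 500 BTC.
	asgards := []int64{30_000_000_000, 15_000_000_000}
	yggdrasils := []int64{5_000_000_000}
	fmt.Println(imbalanced(50_000_000_000, asgards, yggdrasils, tolerance))

	// After the attack: the pool is credited with the forged 1000 BTC,
	// while the vault only receives the real 0.001 BTC.
	asgards[0] += 100_000
	fmt.Println(imbalanced(150_000_000_000, asgards, yggdrasils, tolerance))
}
```

In this toy scenario the first call reports no imbalance and the second one does, because pool accounting and vault accounting have diverged by almost 1000 BTC.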
## Difficulty

Moderate: An attacker requires 300,000 RUNE to bond in as an active validator, and needs to hand-craft a MsgObservedTxIn to post to Tendermint.

The attacker would also risk losing some of the value of their bond (~$3m at today's prices) if RUNE dropped in price due to reputational damage.
## Code Walkthrough

When the fake message is sent, all nodes pass ValidateCurrent (because it is signed by an active node). Normally a fake transaction would stop at the beginning of HandleCurrent because it never reaches consensus. But let's look at what happens behind the scenes.
In handler_observed_txin.go:878 the voter is created from tx.Tx.ID transaction hash.
voter, err := h.mgr.Keeper().GetObservedTxInVoter(ctx, tx.Tx.ID)
For the first observation, getObservedTxVoter() in keeper_txin.go returns a mostly empty ObservedTxVoter{TxID: hash}.
Next, preflightV1() is called with this (empty) voter, the fake tx and activeNodeAccounts. It returns ok = false, and our first run through stops here (no consensus, as designed).
However inside preflightV1(), let's see what happens with the fake tx:
if !voter.Add(tx, signer) {
	return voter, ok
}
Add() is called and, because the voter is empty, it creates the first entry in m.Txs (remember this for later) and returns true.
Next we call voter.HasFinalised(nas), which fails (no consensus), and the voter.HasConsensus(nas) check in the next block fails too.

The voter is saved:
h.mgr.Keeper().SetObservedTxInVoter(ctx, voter)
And we return ok false, which prevents HandleCurrent from progressing any further.
So for the fake tx, what we accomplished is stuffing voter.Txs with a fake tx matching the expected tx hash as our inbound tx.
## The real tx is mined

Moments later, the real tx is mined. Every Bifrost observes the real tx and sends in their own MsgObservedTxIn. Let's see what happens now:
voter, err := h.mgr.Keeper().GetObservedTxInVoter(ctx, tx.Tx.ID)
The same voter is pulled because the ID matches our fake one.
We call preflightV1 with the real tx and pre-stuffed voter.
if !voter.Add(tx, signer) {
	return voter, ok
}
The Add() here loops through m.Txs and hits continue, because the (fake) transaction does not equal the (real) observedTx. .Equals() matches on every field in the tx, so a single difference, such as Coins, makes it a totally different tx. This is by design: we don't want a fake tx achieving consensus by piggy-backing on a real one with different parameters.
Therefore we get to the bottom again:
observedTx.Signers = []string{signer.String()}  <-- Real TX
m.Txs = append(m.Txs, observedTx)               <-- Txs now has [fake, real]
return true
Once again, this has not reached consensus yet, so we return ok = false from preflightV1() and wait.
The 2nd through (consensus - 1)th observations call Add() and add their 'vote' to the correct (real) tx:
if votedIdx != -1 {
	return m.Txs[votedIdx].Sign(signer)  <-- Adding vote to real tx.
}
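To make the stuffing concrete, here is a stripped-down model of the voter and its Add() behaviour as described above. The types observedTx and voter and their fields are simplified stand-ins invented for this sketch, not the thornode types.

```go
package main

import "fmt"

// observedTx is a simplified stand-in for thornode's ObservedTx.
type observedTx struct {
	ID      string   // external chain tx hash
	Coins   string   // amount as text, e.g. "1000 BTC"
	Signers []string // nodes that reported exactly this tx
}

// equals compares the observed fields, ignoring who signed.
func (o observedTx) equals(other observedTx) bool {
	return o.ID == other.ID && o.Coins == other.Coins
}

// voter is a simplified stand-in for ObservedTxVoter.
type voter struct {
	TxID string
	Txs  []observedTx
}

// add records signer's observation and reports whether anything changed.
func (v *voter) add(tx observedTx, signer string) bool {
	for i := range v.Txs {
		if !v.Txs[i].equals(tx) {
			continue // different details: not the same observation
		}
		for _, s := range v.Txs[i].Signers {
			if s == signer {
				return false // duplicate vote
			}
		}
		v.Txs[i].Signers = append(v.Txs[i].Signers, signer)
		return true
	}
	// Nothing matched: start a new entry for this observation.
	tx.Signers = []string{signer}
	v.Txs = append(v.Txs, tx)
	return true
}

func main() {
	v := &voter{TxID: "ABC123"}
	fake := observedTx{ID: "ABC123", Coins: "1000 BTC"}
	genuine := observedTx{ID: "ABC123", Coins: "0.001 BTC"}

	v.add(fake, "attacker") // forged observation arrives first
	v.add(genuine, "node1") // real observations arrive once the tx is mined
	v.add(genuine, "node2")

	for i, tx := range v.Txs {
		fmt.Printf("Txs[%d]: %s signed by %v\n", i, tx.Coins, tx.Signers)
	}
}
```

The point to take away is the ordering: the forged entry sits at index 0 with one signer, and every honest vote accumulates on the entry at index 1.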
Eventually consensus is reached and voter.HasFinalised(nas) returns something different. Let's take a look (getting close to the bug):
// HasFinalised is to check whether the tx with finalise = true  reach super majority
func (m *ObservedTxVoter) HasFinalised(nodeAccounts NodeAccounts) bool {
	finalTx := m.GetTx(nodeAccounts)     <-- LETS LOOK IN HERE
	if finalTx.IsEmpty() {
		return false  <-- Normally we return here when lack of consensus because GetTx returns an empty ObservedTx{} when no consensus.
	}
	return finalTx.IsFinal()   <-- This is just ObservedTx.FinaliseHeight == ObservedTx.BlockHeight (attacker can control to make true).
}
Looking deeper into GetTx(nodeAccounts):
// GetTx return the tx that has super majority
func (m *ObservedTxVoter) GetTx(nodeAccounts NodeAccounts) ObservedTx {
	if !m.Tx.IsEmpty() && m.Tx.IsFinal() {      <-- m.Tx is not set yet, so this doesn't happen
		return m.Tx
	}
	finalTx := m.getConsensusTx(nodeAccounts, true)   <-- Let's look here. I wonder what evil tx this will return :)
	if !finalTx.IsEmpty() {
		m.Tx = finalTx            <-- Here we set m.Tx to what was returned above! Remember this.
	} else {
		discoverTx := m.getConsensusTx(nodeAccounts, false)
		if !discoverTx.IsEmpty() {
			m.Tx = discoverTx
		}
	}
	return m.Tx
}
As the notes above show, we get a finalTx from getConsensusTx(_, true) and, as long as it is not empty (its tx ID is not an empty string), it gets assigned to the voter's .Tx field.
Finally, the problem function: getConsensusTx. It should loop through the voter's .Txs and return the one that has consensus. If there is no consensus, it returns an empty ObservedTx, for which IsEmpty() is true upstream, so every caller returns false.
In our case though, we have stuffed Txs with [fake, real]. Let's take a look at the first loop iteration, with the fake tx:
func (m *ObservedTxVoter) getConsensusTx(accounts NodeAccounts, final bool) ObservedTx {
	var txFinal ObservedTx
	voters := make(map[string]bool)
	for _, txIn := range m.Txs {        <-- First run through, txIn is our fake tx (1000 BTC)
		if txIn.IsFinal() != final {      <-- We control .FinaliseHeight and .BlockHeight, so we can make IsFinal() true and not get skipped.
			continue
		}
		if txFinal.IsEmpty() {   <-- This is true first run through
			txFinal = txIn         <-- txFinal is now the evil tx
		}
		if !txFinal.Tx.ID.Equals(txIn.Tx.ID) {   <-- IDs are equal on the first pass (compared with itself), so no continue
			continue
		}

		for _, signer := range txIn.GetSigners() {    <-- Only one signer (attacker node) for this tx.
			_, exist := voters[signer.String()]
			if !exist && accounts.IsNodeKeys(signer) {
				voters[signer.String()] = true             <-- One vote 
			}
		}
	}
	if HasSuperMajority(len(voters), len(accounts)) {
		return txFinal
	}
	return ObservedTx{}
}
The second loop iteration is the interesting part (txFinal and voters carry over from the first iteration):
func (m *ObservedTxVoter) getConsensusTx(accounts NodeAccounts, final bool) ObservedTx {
	var txFinal ObservedTx
	voters := make(map[string]bool)
	for _, txIn := range m.Txs {        <-- txIn is the REAL tx (0.001 BTC)
		if txIn.IsFinal() != final {      <-- Passes for small tx
			continue
		}
		if txFinal.IsEmpty() {   <-- This is false: txFinal is our fake tx
			txFinal = txIn         <-- Doesn't get set
		}
		if !txFinal.Tx.ID.Equals(txIn.Tx.ID) {   <-- IDs are equal: our fake tx has the same tx hash, so no continue here either. :)
			continue
		}

		for _, signer := range txIn.GetSigners() {    <-- Every node has seen the real tx
			_, exist := voters[signer.String()]
			if !exist && accounts.IsNodeKeys(signer) {
				voters[signer.String()] = true             <-- Many votes added 
			}
		}
	}
	if HasSuperMajority(len(voters), len(accounts)) {  <-- Passes supermajority test because of our real tx
		return txFinal      <--  FUCK: WE JUST RETURNED EVIL TX !!!! 
	}
	return ObservedTx{}
}
OK, so our fake tx just got returned from getConsensusTx. This is passed back to the last call site as finalTx, which gets set to voter.Tx, and eventually everything returns true.
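To check the walkthrough above end to end, here is a small self-contained reproduction of the same control flow. The types, the buggyConsensusTx name, the four-node set and the supermajority rule (at least two thirds, rounded up) are simplified assumptions for this sketch; only the loop structure follows the getConsensusTx listing.

```go
package main

import "fmt"

// observedTx is a simplified stand-in for thornode's ObservedTx.
type observedTx struct {
	ID             string
	Coins          string
	BlockHeight    int64
	FinaliseHeight int64
	Signers        []string
}

func (o observedTx) isEmpty() bool { return o.ID == "" }
func (o observedTx) isFinal() bool { return o.FinaliseHeight == o.BlockHeight }

// hasSuperMajority uses an assumed rule: at least two thirds, rounded up.
func hasSuperMajority(signers, total int) bool {
	return total > 0 && signers <= total && signers >= (2*total+2)/3
}

// buggyConsensusTx follows the loop structure of getConsensusTx: votes from
// every entry that shares the first entry's ID are pooled together, and the
// first entry is what gets returned.
func buggyConsensusTx(txs []observedTx, active map[string]bool, final bool) observedTx {
	var txFinal observedTx
	voters := make(map[string]bool)
	for _, txIn := range txs {
		if txIn.isFinal() != final {
			continue
		}
		if txFinal.isEmpty() {
			txFinal = txIn
		}
		if txFinal.ID != txIn.ID {
			continue
		}
		for _, s := range txIn.Signers {
			if active[s] {
				voters[s] = true
			}
		}
	}
	if hasSuperMajority(len(voters), len(active)) {
		return txFinal
	}
	return observedTx{}
}

func main() {
	active := map[string]bool{"attacker": true, "node1": true, "node2": true, "node3": true}
	txs := []observedTx{
		{ID: "ABC123", Coins: "1000 BTC", BlockHeight: 10, FinaliseHeight: 10, Signers: []string{"attacker"}},
		{ID: "ABC123", Coins: "0.001 BTC", BlockHeight: 10, FinaliseHeight: 10, Signers: []string{"node1", "node2", "node3"}},
	}
	got := buggyConsensusTx(txs, active, true)
	fmt.Println(got.Coins) // prints "1000 BTC": the forged entry wins
}
```

With only the forged entry present the function returns an empty tx, as designed. It is the honest votes on the second entry that push the pooled count over the threshold while the first entry is returned.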
Back in preflightV1:
	if voter.HasFinalised(nas) {        <-- This just returned true and internally set voter.Tx = evil
		if voter.FinalisedHeight == 0 {   <-- This passes first time
			ok = true                       <-- Excellent
			voter.FinalisedHeight = common.BlockHeight(ctx)
			voter.Tx = voter.GetTx(nas)     <-- Here we also set voter.Tx = evil for the second time
At the end of preflightV1 we save voter and return voter, ok where ok is true. We just passed consensus.
## Back in handleCurrent

We reach the spot:
// all logic after this  is after consensus
Continuing execution, remember that tx this time is the real tx (with low coins value), and voter.Tx is the fake tx.
var txIn ObservedTx  <-- Starts Empty
if voter.HasFinalised(activeNodeAccounts) || voter.HasConsensus(activeNodeAccounts) { <-- Pass
	voter.Tx.Tx.Memo = tx.Tx.Memo  <-- Set Fake memo to equal real memo. This is okay: we control both so nothing happens here.
	txIn = voter.Tx                <-- txIn is now fake tx (<mr burns laugh>)
}
Skipping a few lines, we reach memo decode:
memo, _ := ParseMemoWithTHORNames(ctx, h.mgr.Keeper(), tx.Tx.Memo)
This creates a memo using real tx memo (still attacker controlled).
We're nearly done: we reach processOneTxIn:
m, txErr := processOneTxIn(ctx, h.mgr.GetVersion(), h.mgr.Keeper(), txIn, msg.Signer)
Here every node passes in the fake txIn, receives a valid m and executes it, sending the attacker as many coins as they want.
Victory.
## How to fix

The voter needs the ability to hold multiple Txs to stop other forms of this fraud, so the fix probably belongs in getConsensusTx, which returned the wrong tx.
Instead of aggregating votes across entries and returning the first one in the array, it should check each tx for consensus individually and return only a tx that reaches consensus on its own.
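One possible shape for that fix is sketched below: count votes per entry and return only an entry that clears the threshold by itself. This is an illustration of the idea under simplified, made-up types and an assumed supermajority rule (at least two thirds, rounded up); it is not necessarily the patch applied to thornode.

```go
package main

import "fmt"

// observedTx is a simplified stand-in for thornode's ObservedTx.
type observedTx struct {
	ID             string
	Coins          string
	BlockHeight    int64
	FinaliseHeight int64
	Signers        []string
}

func (o observedTx) isEmpty() bool { return o.ID == "" }
func (o observedTx) isFinal() bool { return o.FinaliseHeight == o.BlockHeight }

// hasSuperMajority uses an assumed rule: at least two thirds, rounded up.
func hasSuperMajority(signers, total int) bool {
	return total > 0 && signers <= total && signers >= (2*total+2)/3
}

// perTxConsensus counts votes separately for each entry, so signers of one
// observation can never lend their votes to a different one.
func perTxConsensus(txs []observedTx, active map[string]bool, final bool) observedTx {
	for _, txIn := range txs {
		if txIn.isFinal() != final {
			continue
		}
		voters := make(map[string]bool) // fresh tally for every entry
		for _, s := range txIn.Signers {
			if active[s] {
				voters[s] = true
			}
		}
		if hasSuperMajority(len(voters), len(active)) {
			return txIn
		}
	}
	return observedTx{}
}

func main() {
	active := map[string]bool{"attacker": true, "node1": true, "node2": true, "node3": true}
	fake := observedTx{ID: "ABC123", Coins: "1000 BTC", BlockHeight: 10, FinaliseHeight: 10, Signers: []string{"attacker"}}
	genuine := observedTx{ID: "ABC123", Coins: "0.001 BTC", BlockHeight: 10, FinaliseHeight: 10, Signers: []string{"node1", "node2", "node3"}}

	got := perTxConsensus([]observedTx{fake, genuine}, active, true)
	fmt.Println(got.Coins) // prints "0.001 BTC": only the real tx has the votes
}
```

Because the tally is reset for every entry, the order of m.Txs no longer matters and a pre-stuffed fake entry stays stuck at one vote.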
## Reported

To @heimdall 11am AEST 24 Aug 2021.
Special thanks and shared credit to Heimdall, who helped step through and come up with this exploit. Originally I had it the other way around (trying to send the fake tx after the real tx), which doesn't work due to the voter.FinalisedHeight checks. During the call with Heimdall, while stepping through the txID ideas, we realised the reverse would be possible, and we stepped through it together to confirm this vulnerability.
## Disclosure

Will be disclosed publicly once fixed.