
## BIP_segwit
<pre>
SEGWIT BIP...yada yada...
</pre>

==Abstract==

This BIP defines a new structure called a "witness" that is committed to blocks separately from the transaction merkle tree. This structure contains data required to check transaction validity but not required to determine transaction effects. In particular, scripts and signatures are moved into this new structure.

The witness is committed in a tree that is nested into the block's existing merkle root via the coinbase transaction for the purpose of making this BIP soft fork compatible. A future hard fork can place this tree in its own branch.

==Motivation==

The entirety of a transaction's effects is determined by output consumption (spends) and new output creation. Other transaction data, and signatures in particular, are only required to validate the blockchain state, not to determine it.

By removing this data from the transaction structure committed to the transaction merkle tree, several problems are fixed:

1) Unintentional malleability becomes impossible. Since signature data is no longer part of the transaction hash, changes to how the transaction was authorized are no longer relevant to transaction identification. This makes it possible to safely create unconfirmed transaction dependency chains.

2) Signature data becomes prunable.

3) Fraud proofs become possible.

4) Script can be changed with soft forks.

etc...

==Specification==

* CTransaction gets, in addition to vin and vout, a vwit, which
contains a CTxInWitness object for each input. A CTxInWitness contains a CScriptWitness object
and can potentially be extended to contain other kinds of witness data.
A CScriptWitness is a vector of byte vectors (nominally: the input stack to the program, no longer
encoded as a CScript, but just the resulting stack directly).

* A new serialization for CTransaction is defined: http://blockhawk.net/diagrams/witnesstx.png
(int32 nVersion,
0x00 marker, 0x01 flag, vector<CTxIn>, vector<CTxOut>,
vector<CTxInWitness>, int32 nLockTime) instead of (int32 nVersion,
vector<CTxIn>, vector<CTxOut>, int32 nLockTime). Read under the old
format, this will never parse as a valid transaction: the 0x00 marker
is read as an empty vin, so even if parsing succeeds the result is
a transaction with no inputs and 1 output. If all
witnesses are empty, the old serialization format is used.
 - Rationale for not having an independent CWitnessTransaction with
its own serialization: this would require separate "tx" and "block"
messages, and all RPC calls operating on raw transactions would need
to be duplicated, or need inefficient or nondeterministic guesswork
to know which type is to be used.
 - Rationale for not using just a single 0x00 byte as marker: that
would cause empty transactions (no inputs, no outputs, which are
used in some tests) to be interpreted as the new serialization.
 - Rationale for the 0x01 flag byte in between: this will allow us to
easily add more extra non-committed data to transactions (like txouts
being spent, ...). It can be interpreted as a bitvector.
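
The serialization rule above can be sketched as follows. This is a minimal illustration, not the reference implementation: the class names `Tx`, `TxIn`, `TxOut` and the helper names are hypothetical, and the byte-level encoding of vector<CTxInWitness> is an assumption (a count-prefixed vector of count-prefixed stacks of length-prefixed items), since the text does not spell it out. nLockTime is packed as an unsigned 32-bit value here.

```python
import struct
from dataclasses import dataclass
from typing import List


def compact_size(n: int) -> bytes:
    """Bitcoin's variable-length integer used as a vector/byte-string prefix."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def ser_bytes(b: bytes) -> bytes:
    """Length-prefixed byte string."""
    return compact_size(len(b)) + b


@dataclass
class TxIn:
    prev_txid: bytes          # 32 bytes
    prev_index: int
    script_sig: bytes = b""
    sequence: int = 0xffffffff

    def serialize(self) -> bytes:
        return (self.prev_txid + struct.pack("<I", self.prev_index)
                + ser_bytes(self.script_sig) + struct.pack("<I", self.sequence))


@dataclass
class TxOut:
    value: int
    script_pubkey: bytes

    def serialize(self) -> bytes:
        return struct.pack("<q", self.value) + ser_bytes(self.script_pubkey)


@dataclass
class Tx:
    version: int
    vin: List[TxIn]
    vout: List[TxOut]
    vwit: List[List[bytes]]   # one witness stack per input (assumed layout)
    lock_time: int = 0

    def has_witness(self) -> bool:
        return any(len(stack) > 0 for stack in self.vwit)

    def serialize(self, with_witness: bool = True) -> bytes:
        # Fall back to the old format when every witness is empty.
        use_wit = with_witness and self.has_witness()
        out = struct.pack("<i", self.version)
        if use_wit:
            out += b"\x00\x01"  # marker, flag
        out += compact_size(len(self.vin)) + b"".join(i.serialize() for i in self.vin)
        out += compact_size(len(self.vout)) + b"".join(o.serialize() for o in self.vout)
        if use_wit:
            out += compact_size(len(self.vwit))  # outer vector count (assumption)
            for stack in self.vwit:
                out += compact_size(len(stack)) + b"".join(ser_bytes(x) for x in stack)
        out += struct.pack("<I", self.lock_time)
        return out
```

Because a legacy parser reads the byte after nVersion as the input count, the 0x00 marker is what keeps the two formats from being confused for any transaction with at least one input.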

* A new message 'havewitness' is sent after receiving 'verack' to
indicate that a node can provide witness data if requested (similar to
'sendheaders').

* New inv types MSG_WITNESS_TX and MSG_WITNESS_BLOCK are added, only
for use in getdata. Inv itself still uses just MSG_TX and MSG_BLOCK,
similar to MSG_FILTERED_BLOCK.
 - Rationale for not advertising witnessness in invs: we don't always
use invs anymore (with 'sendheaders' BIP 130), plus it's not useful:
implicitly, every transaction and block has a witness, old ones just
have empty ones.

* Transactions' GetHash is always computed on the old non-witness
serialization. A new CTransaction::GetWitnessHash is added which is
computed from the witness-serialization (this means that transactions
with an empty witness have witness hash equal to normal hash).
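
The relationship between the two hashes can be sketched as below. The function names are hypothetical stand-ins for GetHash and GetWitnessHash, and the sketch assumes the witness hash uses the same double-SHA256 as the existing transaction hash, which the text implies but does not state outright. The caller passes the serialized bytes, and passes no witness serialization when all witnesses are empty.

```python
import hashlib
from typing import Optional


def double_sha256(data: bytes) -> bytes:
    """SHA256 applied twice, as used for Bitcoin transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def get_hash(legacy_serialization: bytes) -> bytes:
    """Transaction hash: always over the old, non-witness serialization."""
    return double_sha256(legacy_serialization)


def get_witness_hash(legacy_serialization: bytes,
                     witness_serialization: Optional[bytes] = None) -> bytes:
    """Witness hash: over the witness serialization.

    With an empty witness the old format is used, so the result
    is identical to get_hash().
    """
    if witness_serialization is None:
        return get_hash(legacy_serialization)
    return double_sha256(witness_serialization)
```

Changing only witness data therefore changes the witness hash but never the transaction hash, which is what removes unintentional malleability from transaction identification.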

* A new block rule is added which requires a commitment (a merkle root
computed similarly to the normal transaction one) to the witness
hashes to be present as the last 32 bytes of
block.vtx[0].vin[0].scriptSig (it doesn't need to be a push). This
hopefully does not conflict with any other existing commitment
schemes. To make it extensible, an extra merkle path can be provided
(in the coinbase's "witness" field) so that coinbase commitment can be
used for multiple things.
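
A rough sketch of the commitment check follows. It assumes "computed similarly to the normal transaction one" means the usual Bitcoin merkle construction (double-SHA256 of concatenated pairs, duplicating the last hash on odd levels). The function names are hypothetical, the optional extra merkle path is left out, and how the coinbase's own entry is handled is not covered by the text, so the list of witness hashes is taken as given by the caller.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(hashes: List[bytes]) -> bytes:
    """Bitcoin-style merkle root over a list of 32-byte hashes."""
    if not hashes:
        return b"\x00" * 32
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def check_witness_commitment(coinbase_script_sig: bytes,
                             witness_hashes: List[bytes]) -> bool:
    """True if the last 32 bytes of the coinbase scriptSig equal the root.

    The commitment is compared as raw trailing bytes; it does not
    need to be a separate push.
    """
    if len(coinbase_script_sig) < 32:
        return False
    return coinbase_script_sig[-32:] == merkle_root(witness_hashes)
```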

* A scriptPubKey (or redeemscript in case of P2SH) that is just a
single push of bytes gets a new meaning (the pushed data is called the
"witness program").
 - Rationale for using a template (a single data push) and not a new
opcode: it can't function as an opcode, because it needs to look at
the exact contents of the scriptSig (exactly empty or just exactly the
P2SH redeemscript push, if we want to avoid malleability), while
normal opcode operations can only look at the resulting stack.
 - Rationale for supporting witness programs inside P2SH: backward
compatibility with old sender software.
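
The template match described above can be sketched as a check that a script consists of exactly one data push. The function name is hypothetical, and the sketch makes two assumptions the text does not settle: only direct pushes (opcodes 0x01 to 0x4b) and OP_PUSHDATA1/2/4 count as a "push of bytes", and an empty push is rejected because it leaves no room for a version byte.

```python
from typing import Optional

OP_PUSHDATA1 = 0x4c
OP_PUSHDATA2 = 0x4d
OP_PUSHDATA4 = 0x4e


def extract_witness_program(script: bytes) -> Optional[bytes]:
    """Return the pushed data if `script` is exactly one data push, else None."""
    if not script:
        return None
    op = script[0]
    if 0x01 <= op <= 0x4b:
        size, start = op, 1                                   # direct push
    elif op == OP_PUSHDATA1 and len(script) >= 2:
        size, start = script[1], 2
    elif op == OP_PUSHDATA2 and len(script) >= 3:
        size, start = int.from_bytes(script[1:3], "little"), 3
    elif op == OP_PUSHDATA4 and len(script) >= 5:
        size, start = int.from_bytes(script[1:5], "little"), 5
    else:
        return None
    # The push must cover the rest of the script exactly, and be non-empty.
    if size == 0 or len(script) != start + size:
        return None
    return script[start:]
```

The same function would apply to a scriptPubKey directly or to a P2SH redeemscript taken from the scriptSig.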

* A witness program starts with a version byte. If that version byte
is unknown, the script is treated as anyone-can-spend. This allows
softforking _any_ script change later on, by adding new version bytes
(IIRC suggested by Matt).

* Two versions of witness programs are defined: v0 and v1. v0 programs
are 0x00 + redeemscript (and have their initial stack in the witness).
v1 programs are 0x01 + SHA256(redeemscript) (and have redeemscript +
initial stack in the witness).
 - Rationale for using SHA256 and not Hash160: P2SH-like scripts should
really be protected by a 256-bit hash, as there are possible collision
attacks (and 2^80 work is not infeasible anymore, see the Bitcoin
blockchain for an example...). Using SHA256 has the additional
advantage of being able to reuse the Hash160-based known script index
in keystore (because Hash160(x) = RIPEMD160(SHA256(x))).
 - Rationale for having 2 versions: we want to move the contents of
long redeemscripts to the witness (as they are of no use to non-validating
clients, so they belong there). However, if we did that for simple
scripts, we would end up with a 20-byte hash of a script in the scriptPubKey
(if P2SH), a 32-byte hash of an inner script in the scriptSig, and
then the full script in the witness, which seems to be a lot of
redundancy.
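
The version dispatch for v0 and v1 can be sketched as follows. The function name and return convention are hypothetical. One detail is an assumption: the text says a v1 witness holds "redeemscript + initial stack" without fixing the order, and this sketch takes the redeemscript to be the last witness item, by analogy with P2SH.

```python
import hashlib
from typing import List, Optional, Tuple


def resolve_witness_program(program: bytes, witness: List[bytes]
                            ) -> Optional[Tuple[bytes, List[bytes]]]:
    """Return (script, initial_stack) for a known version.

    Returns None for an unknown version byte (anyone-can-spend).
    Raises ValueError when a known version is malformed.
    """
    if not program:
        raise ValueError("empty witness program")
    version, payload = program[0], program[1:]
    if version == 0:
        # v0: the program carries the redeemscript; the witness is the stack.
        return payload, list(witness)
    if version == 1:
        # v1: the program carries SHA256(redeemscript); the script itself
        # is taken from the witness (assumed to be the last item).
        if len(payload) != 32 or not witness:
            raise ValueError("malformed v1 program or witness")
        script = witness[-1]
        if hashlib.sha256(script).digest() != payload:
            raise ValueError("redeemscript does not match committed hash")
        return script, list(witness[:-1])
    return None  # unknown version: left for future soft forks
```

Executing the returned script against the returned stack is outside the scope of this sketch.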

==Examples==

==Reference Implementation==