Warchant/smcts.md

## smcts.md

      
    Smart contract analysis

Introduction

A smart contract is a computer protocol intended to facilitate, verify, or enforce the negotiation or performance of a contract. Smart contracts were first proposed by Nick Szabo in 1996 [5].

A smart contract is a set of promises, specified in digital form, including protocols within which the parties perform on these promises. -- Nick Szabo [1]

A good analogy for understanding smart contracts:

One possible use is implementation of a vending machine. The vending machine is a contract with bearer: anybody with coins can participate in an exchange with the vendor. -- [6].

From an engineering standpoint, smart contracts are run-time definable business logic on top of a block chain.

Smart contract properties and requirements


observability [1] -- the ability of the principals to observe each others' performance of the contract, or to prove their performance to other principals.
The field of accounting is, roughly speaking, primarily concerned with making contracts an organization is involved in more observable.


verifiability [1] -- the ability of a principal to prove to an arbitrator that a contract has been performed or breached, or the ability of the arbitrator to find this out by other means.
The disciplines of auditing and investigation roughly correspond with verification of contract performance. Observability and verifiability can also include the ability to differentiate between intentional violations of the contract and good faith errors.


privity [1] -- only the parties involved in a contract should perform the contract.
The principle that knowledge and control over the contents and performance of a contract should be distributed among parties only as much as is necessary for the performance of that contract. This is a generalization of the common law principle of contract privity, which states that third parties, other than the designated arbitrators and intermediaries, should have no say in the enforcement of a contract. Generalized privity goes beyond this to formalize the common claim, "it's none of your business".


secrecy [2, 7.7] -- secret data in smart contracts must be kept secret.
It is possible to create default instruments to encrypt data in contracts.


security -- smart contract execution should be safe for those who execute it.
In other words, if a smart contract is a program, then the computer which executes this program should neither be damaged in any way, nor damage other computers. Also, the execution of any smart contract should be finite.


anonymity (confidentiality) [2, 8.4] -- it must be possible to hide certain properties of transactions, such as source, destination and (optionally) content.
Zerocoin and Zerocash use zero-knowledge proofs (a cryptographic mechanism for proving the possession of a specific bit of knowledge without revealing that bit of knowledge) to add anonymity to BTC transactions without adding any trusted parties. The algorithm allows users to mint a number of zero-coins into a pool and to later redeem different ones, but ensures that no user can redeem more coins than they minted, without revealing any links between the inputs and the outputs.


enforceability [1] -- it should be statistically impossible to break the contract. In other words, the contract will be executed exactly as it is written, no matter what.
At the same time, the need for enforcement should be minimized. Improved verifiability often also helps meet this objective. Reputation, built-in incentives, "self-enforcing" protocols, and verifiability can all play a strong part in meeting this objective.


reentrancy [2, 7.1] -- the correct behaviour of a function or method when it is called in the middle of its own execution. This may happen if one contract calls another contract.
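
To make the reentrancy property concrete, here is a minimal Python sketch that simulates it outside of any real block chain. The names `VulnerableVault`, `SafeVault` and `run_attack` are hypothetical and exist only for this illustration; the `on_receive` callback stands in for the code of another contract that is executed when it receives funds.

```python
class VulnerableVault:
    """Toy contract that pays out BEFORE clearing the caller's balance."""

    def __init__(self):
        self.balances = {}
        self.total = 0

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount
        self.total += amount

    def withdraw(self, account, on_receive):
        amount = self.balances.get(account, 0)
        if amount == 0 or self.total < amount:
            return
        self.total -= amount        # funds leave the vault
        on_receive(amount)          # external call: control passes to the receiver
        self.balances[account] = 0  # state update happens too late


class SafeVault(VulnerableVault):
    """Same contract, but state is updated before the external call."""

    def withdraw(self, account, on_receive):
        amount = self.balances.get(account, 0)
        if amount == 0 or self.total < amount:
            return
        self.balances[account] = 0  # clear the balance first
        self.total -= amount
        on_receive(amount)          # a re-entrant call now finds a zero balance


def run_attack(vault):
    """Return how much an attacker with a deposit of 10 manages to withdraw."""
    vault.deposit("honest", 90)
    vault.deposit("attacker", 10)
    received = []

    def malicious_receiver(amount):
        received.append(amount)
        # Call back into the vault in the middle of its own withdraw().
        vault.withdraw("attacker", malicious_receiver)

    vault.withdraw("attacker", malicious_receiver)
    return sum(received)


print(run_attack(VulnerableVault()))  # attacker drains the whole vault
print(run_attack(SafeVault()))        # attacker only gets the own deposit back
```

The only difference between the two vaults is the order of the state update and the external call, which is why a VM or language that makes this order explicit can help.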


runtime exceptions [2, 7.3] -- stack overflows, stack underflows, out-of-gas exceptions, runtime exceptions - if we want to help developers make their code predictable (and more secure) we must help them be aware of all kinds of exceptions that can occur during contract execution, both implicit exceptions and explicit exceptions. [3, 7.3.3] suggests methods to deal with exceptions.


reimbursement (refund) [2, 7.4] -- a possible issue is incomplete handling of preconditions. It is usually the case that developers focus on the normal execution flow, and when preconditions of a contract are not met, even if execution is aborted, the user may not be returned the money invested in the contract. Furthermore, even if the user checks that the preconditions are met before sending a transaction, it is possible that a race condition changes this fact.


unilateral abortion [2, 7.5] -- in complex multi-stage contracts it is possible for users to stop collaborating on the execution of contracts. Because of this, contracts must be designed with the possibility of eventual non-cooperation in mind for each step. The recent Parity vulnerability shows that this property is very important.


predictable state [2, 7.6] -- transactions are processed in blocks, and the order in which unprocessed transactions will be processed is not known in advance. The order is decided by the miner (the one who creates the block) that is successful at a certain point in time.

depending on the transaction processing order, the outcome may be different
miners are able to steal a secret from a transaction and create a new transaction with this secret
other properties may be manipulated (like timestamps, in Ethereum [2, 7.6])

In other words -- a transaction sender should always get a predictable state of a smart contract, no matter what.
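
The dependence on transaction order can be illustrated with a small Python sketch. `PuzzleContract` and `apply_in_order` are hypothetical names used only here: a contract pays a reward for a solution, and its owner may change the reward. Whoever orders the two pending transactions decides what the solver gets.

```python
class PuzzleContract:
    """Toy contract: pays `reward` to the first submitted solution."""

    def __init__(self, reward):
        self.reward = reward
        self.solved = False

    def set_reward(self, new_reward):
        # The owner may change the reward while the puzzle is unsolved.
        if not self.solved:
            self.reward = new_reward

    def submit_solution(self):
        if self.solved:
            return 0
        self.solved = True
        return self.reward


def apply_in_order(transactions, reward=100):
    """Apply pending transactions in the given order, return the solver's payout."""
    contract = PuzzleContract(reward)
    payout = 0
    for tx in transactions:
        if tx == "submit":
            payout += contract.submit_solution()
        elif tx == "lower_reward":
            contract.set_reward(1)
    return payout


# Both transactions are pending at the same time; the block creator picks the order.
print(apply_in_order(["submit", "lower_reward"]))  # solver is paid the original reward
print(apply_in_order(["lower_reward", "submit"]))  # solver is paid the lowered reward
```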


(im)mutable bugs [2, 7.8] -- it would be necessary to find a convenient and convincing way of deciding on changes, upgrades and bugfixes in contracts, and in general on changes to the rules of a decentralized crypto-system.


lost asset [2, 7.9] -- another consequence of immutability is currency sent to addresses whose key is not known (intentionally or by mistake). If standard cryptographic assumptions hold, this currency is statistically impossible to recover and is lost forever. In addition to the economic loss, this produces a burden on the block chain because the unspent currency cannot be forgotten, since there is no proof that it is actually unspendable. In order to reduce the likelihood of this happening, in the context of Bitcoin, Base58Check is often used, which adds redundancy to addresses so that a small typo cannot produce a different valid address.
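
As an illustration of how such redundancy catches typos, below is a minimal Python sketch of Base58Check as it is commonly described for Bitcoin addresses: a version byte plus payload, followed by the first four bytes of a double SHA-256 checksum, encoded in a 58-character alphabet. The function names are chosen for this example only, and the sketch is not meant as a replacement for a tested library.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(data: bytes) -> bytes:
    """First 4 bytes of SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def b58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    data += _checksum(data)
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the character '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def b58check_decode(text: str) -> bytes:
    """Return version byte + payload, or raise ValueError on a damaged address."""
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)  # ValueError on invalid char
    leading_ones = len(text) - len(text.lstrip("1"))
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    data = b"\x00" * leading_ones + body
    if len(data) < 5 or _checksum(data[:-4]) != data[-4:]:
        raise ValueError("checksum mismatch: address is mistyped or corrupted")
    return data[:-4]


address = b58check_encode(0, bytes(20))      # 20-byte payload of zeros
print(address)
print(b58check_decode(address) == bytes(21)) # round trip succeeds

typo = address[:-1] + ("3" if address[-1] != "3" else "4")
try:
    b58check_decode(typo)
except ValueError as error:
    print("rejected:", error)
```

A four-byte checksum does not make a wrong address impossible, only very unlikely: a random corruption passes the check with probability of about 1 in 2^32.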


randomness [2, 7.10] -- some smart contracts require a safe source of randomness during execution. Example - a fair lottery. A block hash looks like a convenient source of randomness; however, it opens the gate for potential manipulation by the miners.
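
The following Python sketch shows why a block hash is a weak source of randomness. It is a deliberately simplified model: `block_hash` is a stand-in that hashes a previous hash together with a nonce, and the proof-of-work difficulty target is ignored. All function names are hypothetical.

```python
import hashlib


def block_hash(prev_hash: bytes, nonce: int) -> bytes:
    """Simplified stand-in for a block hash; real block headers contain more fields."""
    return hashlib.sha256(prev_hash + nonce.to_bytes(8, "big")).digest()


def lottery_winner(hash_value: bytes, players: int) -> int:
    """Naive lottery: the winner index is derived from the block hash."""
    return int.from_bytes(hash_value, "big") % players


def grind_for_winner(prev_hash: bytes, players: int, wanted: int, max_tries: int = 10_000):
    """A miner who also plays can keep changing the nonce until the preferred index wins."""
    for nonce in range(max_tries):
        if lottery_winner(block_hash(prev_hash, nonce), players) == wanted:
            return nonce
    return None


prev = bytes(32)
nonce = grind_for_winner(prev, players=10, wanted=7)
print(nonce, lottery_winner(block_hash(prev, nonce), 10))
```

In a real proof-of-work chain every attempt costs mining effort, so the manipulation is not free, but the block creator still has influence that other lottery participants lack.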


turing-completeness [2, 7.11] -- a Turing-complete scripting language allows all computable functions to be scripted in the language. This means that the execution of some scripts will not terminate (for some inputs), and even for those that do terminate, execution times can be arbitrarily large.
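
A common way to live with a language that has jumps is to meter execution. The sketch below is a toy interpreter, not the instruction set of any real VM: the instructions `push`, `add` and `jump` and the `OutOfGas` exception are assumptions made for this illustration. Every executed instruction costs one unit, so even a program with an infinite loop stops after a bounded number of steps.

```python
class OutOfGas(Exception):
    """Raised when a program needs more steps than it was given."""


def run(program, gas_limit):
    """Execute a list of (op, arg) tuples; return (stack, steps used)."""
    stack = []
    pc = 0          # program counter
    gas_used = 0
    while pc < len(program):
        if gas_used >= gas_limit:
            raise OutOfGas(f"stopped after {gas_used} steps")
        gas_used += 1
        op, arg = program[pc]
        if op == "push":
            stack.append(arg)
            pc += 1
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == "jump":
            pc = arg  # jumps make loops, and therefore non-termination, possible
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack, gas_used


print(run([("push", 2), ("push", 3), ("add", None)], gas_limit=10))

try:
    run([("jump", 0)], gas_limit=1000)  # infinite loop
except OutOfGas as error:
    print("aborted:", error)
```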


verifiable computation [2, 8.1] -- a technique that allows the generation of proofs of computation (that is, proofs that the evaluation of a program has a specified result) that can be verified faster than the time it would take to do the actual computation. This allows the outsourcing of computation to untrusted parties. Example: hash calculation. This property allows miners to do the computations only once and generate a proof of computation, and verifiers would only need to verify that the proof is correct. This would allow for more expensive contracts to be run, since they would not increase the time required to validate the block chain proportionally.


verifiable bounds [2, 8.2] -- even if we cannot give a proof of computation, it would be useful to supply a proven upper bound for the amount of computation that a program will require (read - we can calculate an amount of gas that is guaranteed to be sufficient for execution).


reusable libraries [2, 8.2] -- contracts are usually designed once and reused many times with small changes to their parameters and inputs (example: ERC20 token). It should be possible to create reusable libraries and store them on the block chain.


multiparty computation [2, 8.3], [4, 1] -- it should be possible to implement new complex protocols which allow secure multiparty computation (MPC).
MPC allows a group of mutually distrusting parties to compute a joint function f on their private inputs. Example: the marriage problem - Bob and Alice each have a private answer from {no, yes}, and f is the conjunction function (logical AND). If the resulting answer is yes, then both of them want to marry; otherwise at least one of them does not. A third party cannot learn who exactly said no. If Bob said no, he cannot learn anything about Alice's answer. Use cases are:

online auctions
internet gambling (online casino plays role of trusted third party)
source of randomness (see the randomness property above)


logging functions [11] -- it should be possible to receive logs or events from the block chain (e.g. pubsub).
Solidity implements an event system for this. Events allow convenient use of the EVM logging facilities, which in turn can be used to "call" JavaScript callbacks in the user interface of a dapp (distributed app), which listen for these events.


Usage examples


identity and reputation systems [3] -- example: domain name registration system.
dapps (distributed apps) [12] -- apps that do not require servers, but require only block chain as backend:

A virtual organization where members vote on issues
A transparent association based on shareholder voting
Own country with an unchangeable constitution
A better delegative democracy


trustless crowdfunding [12]

A crowdfund to pre-sell a product
A crowdsale to sell virtual shares in a block chain organization
An auction of a limited number of items


escrow payments [7] -- a contractual arrangement in which a third party (in this case - smart contract) receives and disburses money or documents for the primary transacting parties, with the disbursement dependent on conditions agreed to by the transacting parties, or an account established by a broker for holding funds on behalf of the broker's principal or some other person until the consummation or termination of a transaction [9].
digital cash, cryptocurrencies [8] -- example: ERC20 tokens [10], which can be used to implement cryptocurrency on Ethereum.
digital protocols [8] -- an example is a pay-for-proof contract, which releases money after the user submits a proof for a computationally hard problem (see the verifiable computation property above).
storage for smart property [1][7] -- smart property is property that can be atomically traded and loaned via the block chain. Thus, transferring virtual property (that can be traded but not duplicated) becomes easy.
agents [7] -- are autonomous programs that maintain their own wallet. Example: election system pays 0.1 ETH for every vote. All votes are collected and counted. It is easy to calculate the winner.
distributed markets [7] -- p2p bond and stock trading markets.
accounting [6] -- it is easy to perform accounting, because of transparency.
electronic data interchange [6] -- is the computer-to-computer communication of standardized business transactions between organizations, in a standard format that permits the receiver to perform the intended transaction. Examples:

administrative (deals and promotions, catalogs, forecasts, statements)
prepurchasing (inventory inquiry/advice)
purchasing (order & acknowledgement, material release, point of sale/inventory on hand)
shipping and receiving (status, notifications, bills)
warehouse (receipt confirmation, shipment confirmation)
customs (declaration, release)
billing and paying (invoice, credit and debit, receipts)


Existing implementations

Bitcoin [2, 2]

The language used for creating scripts in Bitcoin is called Script. Script is a Forth-like bytecode stack-based language but, unlike Forth, Script is designed purposely so that its execution is guaranteed to terminate. Scripts consist of a sequence of instructions, and these are executed linearly, with no jumps. Hence, execution time is bounded above by the length of the script after the instruction pointer.
Total number of opcodes is around a hundred:

addition of constants to the stack
basic conditional flow control (non-looping, lazy)
stack manipulation
string manipulation
bitwise manipulation
32-bit arithmetic with overflow
basic cryptographic primitives for hashing and signature verification
two primitives for delaying and expiring uncommitted transactions, with respect to time or the current height of the block chain
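
The "linear, no jumps" execution model described above can be sketched in a few lines of Python. This is a toy interpreter written for illustration, not Bitcoin's actual Script engine: the operation names `DUP`, `SHA256`, `EQUAL` and `VERIFY` mirror the spirit of the corresponding opcodes but are simplified, and `run_script` is a hypothetical name. Because the script is consumed in a single forward pass, the number of executed steps can never exceed the script length.

```python
import hashlib


def run_script(script):
    """Run a linear stack script; return (success, number of steps executed)."""
    stack = []
    steps = 0
    for op in script:  # single forward pass: there is no way to jump backwards
        steps += 1
        if isinstance(op, bytes):
            stack.append(op)                     # constants are pushed onto the stack
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SHA256":
            stack.append(hashlib.sha256(stack.pop()).digest())
        elif op == "EQUAL":
            right, left = stack.pop(), stack.pop()
            stack.append(b"\x01" if left == right else b"")
        elif op == "VERIFY":
            if not stack.pop():
                return False, steps              # fail immediately
        else:
            raise ValueError(f"unknown operation: {op}")
    # The script succeeds if a non-empty (true) value is left on top of the stack.
    return bool(stack) and bool(stack[-1]), steps


# Hash-lock example: funds can be spent by whoever reveals the preimage of `digest`.
secret = b"open sesame"
digest = hashlib.sha256(secret).digest()

print(run_script([secret, "SHA256", digest, "EQUAL"]))        # correct secret
print(run_script([b"wrong guess", "SHA256", digest, "EQUAL"]))  # wrong secret
```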

Limitations


on the block size - 1 MB
on the size of scripts - a single stack element is at most 520 bytes, a script at most 10 KB
on the number of opcodes - at most 201 non-push opcodes per script
many disabled opcodes

Ethereum [2, 3]

Ethereum works as a decentralized computer. Programs that run on this computer are usually referred to as smart contracts and are automatically enforced through the block chain validation process that is carried out by all full nodes independently.
In addition to carrying out computations and transferring ether, it is also possible for transactions to create standalone contracts that are saved in the block chain and can:

store ether
store data
store executable code
communicate with other contracts
create new contracts

Basically contracts act as users, with the difference that they can not initiate transactions. This limitation was imposed in order to avoid DoS attacks against existing contracts. With this design, the gas required to execute the code triggered by a transaction must be initially paid by the user that issued the transaction in the first place.
In Ethereum every block contains a copy of both the transaction list and the most recent state [3].
EVM

The code of Ethereum smart contracts is written in bytecode and executed in a virtual machine called the EVM. It has a fixed word size of 32 bytes (256 bits) and is untyped for simplicity.
Stack of EVM is limited to 1024 elements, but it provides two additional types of storage:

temporary storage (memory, RAM) - byte array, deleted after execution of each transaction
permanent storage (disk) - word (32 bytes)-indexed, key-value dictionary, and is preserved on the block chain between executions, but which can be deallocated explicitly

EVM allows contracts to access several kinds of meta information about the block chain, about the contracts themselves, and about the code of other contracts. The EVM also provides functionality for logging. These logs are returned as the "receipt" that results from processing the transaction, but they are not explicitly stored in the block chain.

Nxt [2, 4]

The NRS (Nxt Reference Software) uses a client-server architecture. The NRS server is a Java application with two interfaces: one for communicating with other servers through the Internet (forming a network of nodes), and one for responding to requests through its API.
The programmability of the system is provided through a "fat" high-level API, which is accessible from Nxt clients through a REST interface. The whitepaper explicitly says that "the core software does not support any form of scripting language"; rather, users are expected to work with

the built-in transaction types and transactions that support some 250 primitive operations in a number of areas, including basic payments
an alias system - for strings that can be stored on the block chain, representing e.g. URIs
messaging - messages can be sent between accounts (JSON)
exchanging assets
buying and selling through a digital goods store.

Tezos [2, 5.3]

Tezos claims to be the first cryptocurrency that is democratically amendable. It provides an explicit mechanism for deciding on future modifications of its own protocol, and initially it considers a voting mechanism and a trial period.
The scripting language of Tezos is stack-based but includes high-level primitives like lambdas, sets, maps and context-specific tasks.
To solve the problem of script-based DoS, Tezos defines a fixed limit on the number of computation steps per program.
Unlike Bitcoin Script, the Tezos scripting language is statically typed.
Like any stack language, it is very difficult to use.

Hyperledger Fabric

Scripting in Fabric uses chaincode, which first had bindings to the Go language and several others. Chaincode is isolated in Docker containers, thus providing some guarantees automatically. Each chaincode instance can define persistent state variables that are stored on the block chain and are updated when the transaction is invoked [2, 6.3].
Chaincode in Fabric consists of several important functions, provided by SDK:
Bogdan: I mentioned in previous research that this approach is not secure and there are still unsolved problems. Mean execution time for one transaction is huge (seconds). Overhead (disk, RAM, CPU) by running multiple docker containers is huge.
Fabric documentation says:

You must install the chaincode on each endorsing peer node of a channel that will run your chaincode.

Which basically means that network operators must do it manually, for every peer!
Also, chaincode is very slow to deploy and initialize, because every chaincode instance runs permanently in a separate Docker container, as a service.

Proposal for Iroha

Current proposal includes a top-level view on the smart contract system and block chain organization itself.
Key concepts:

peer == client

we separate heavyweight clients and lightweight clients
lightweight client is a client, which does not have an opportunity or desire to download block chain
heavyweight client maintains full block chain
lightweight clients communicate to block chain through heavyweight clients (middleware)


heavyweight peer is "floating"

it is possible to run irohad (peer) without connection to any network

then, using iroha-cli, peer operator is able to create new network providing genesis state (which is parsed and then genesis block created) or join existing network


peer is able to join or leave any network any time

peer -> network request: network considers request and answers with approve/reject
network -> peer request: peer considers invite and answers with approve/reject
thus it is possible to implement both public and private networks


network == block chain == domain == ledger

every network can be identified. Proposal: sha3_256(sha3_256(genesis_block) + nonce). The nonce here is a random bytestring, which may contain the network name or any other meta info.
network consists of peers, who maintain this network. Since we use (some kind of) Proof-of-Authority, making the chain longer does not increase block chain security, as it does in Proof-of-Work. Instead, only increasing the number of peers who belong to the same network increases block chain security.
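
The proposed network identifier can be sketched with Python's standard `hashlib`. The sketch assumes that `+` in the formula means byte concatenation and that the genesis block is available as serialized bytes; the function name `network_id` and the sample genesis bytes are hypothetical.

```python
import hashlib
import os


def network_id(genesis_block: bytes, nonce: bytes) -> str:
    """sha3_256(sha3_256(genesis_block) + nonce), returned as a hex string."""
    inner = hashlib.sha3_256(genesis_block).digest()
    return hashlib.sha3_256(inner + nonce).hexdigest()


genesis = b"serialized genesis block (placeholder bytes)"

# The nonce may carry meta info such as a network name, plus random bytes.
nonce = b"test-network:" + os.urandom(16)

print(network_id(genesis, nonce))
```

With this construction two networks started from the same genesis state still get different identifiers as long as their nonces differ.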


irohad has smart contract engine implementation: VM (bytecode interpreter), high-level language, compiler (language -> bytecode).

contract looks like a class, which has:

statically typed language
init constructor(s)
destructor (contracts can be destroyed)
modifiers

access modifiers: public, private, protected
storage modifiers: storage (variable persisted on block chain), memory (variable stored in RAM)


inheritance -- rules are the same as in Java (implements, extends)
programming by contracts: preconditions, invariants, postconditions
exceptions and exception handling mechanism as in Java -- we force user to handle exceptions
member fields: structs, maps, arrays, variables
function annotations (replacement for Ethereum modifiers)
all contracts have access to block chain meta information

meta info of last 100 blocks is an example (hash, timestamps...)
info of currently executed transaction (hash, timestamp, sender...)


event system: it is possible to define an event, which can be published (and received by subscribers, pubsub model)


smart contracts have

default randomness engine (being investigated, best option is through special secure multiparty computation protocol. tbd).
default cryptography primitives:

to store data in a contract, encrypted with the user's private key (tbd)
hash primitives


the code of smart contracts can be persisted on ledger in special code registry, like npm (principles are the same)

this allows new users to easily re-use existing contracts (example is ERC20 token)
code can be versioned (todo: update policy? new versions may break interfaces, introduce bugs or backdoors)


explicit is better than implicit:

explicit modifiers
explicit exception handling
explicit types
explicit preconditions, invariants, postconditions


every deployed smart contract is a private KV-database (or SQL database, TBD)

all storage variables are mapped onto keys
this allows arbitrary data to be stored in a contract


block chain itself stores not only compiled code of smart contract, but the code itself (or reference to the code, if code is in code registry) and other meta information
any contract may communicate with any other contract, even in other networks. TBD.


every network (domain) has private smart contracts, which are accessible only by peers within the network. It adds flexible permissions without actually implementing anything related to permission model.

asset is great example of smart contract. This asset is accessible only within a network.


peer can be connected to as many networks as it wants. Synchronization will be trivial and fast, because most block chains will be small.

in this case heavyweight peer works as a router for incoming transactions from lightweight clients


to make anonymous transactions it is possible to use both UTXO + zkSNARKs, and account-based models (for public transactions).

In my opinion, this model implements all requirements we had.

Summary


| property | bitcoin | ethereum | fabric | iroha (proposal) |
|---|---|---|---|---|
| observability | public block chain (good) | public block chain, all peers execute transactions (good) | per-channel (medium) | per-ledger |
| verifiability | signatures on transactions | signatures on transactions | signatures on transactions | signatures on transactions |
| privity | only end users and miners need to execute scripts (medium) | all users execute all contracts (bad) | per-channel execution (good) | per-ledger, e.g. one ledger = domain |
| secrecy | no specific instruments (bad) | no specific instruments (bad) | ? | default instruments to encrypt, decrypt and securely store data on block chain |
| security | secure (good), VM is sandbox | secure (good), VM is sandbox | (medium), since inside VM attacker is able to run arbitrary programs, thus, hacking others from the inside of chaincode | VM is good sandbox |
| anonymity (confidentiality) | no (bad) | no (bad) | per-channel (medium) | UTXO and zkSNARKS, as in zcash |
| enforceability (tx can not be dropped?) | yes, POW (good) | yes, POW (good) | ? | ??? possibly, through K distributed independent ordering services |
| reentrancy | no, code is linear (good) | it is possible to make reentrant call (bad) https://ethereum.stackexchange.com/questions/30371/send-ether-reentrancy-attack-in-reality-how-could-fallback-function-make-a-mes | ??? possibly it is possible to make reentrant call (bad) | ??? possibly, advanced execution contexts in VM can prevent reentrancy |
| runtime exceptions | return non-zero code from VM without a reason (medium) | no, any exception consumes all gas and does not inform sender about exception (bad) | ? | built-in exception system as in Java (enforces users to handle all exceptions) |
| reimbursement | no, funds may be locked forever (bad) | no, funds may be locked forever (bad), except for contract destroy | ? | strict programming by contracts |
| unilateral abortion | ? | no (bad), funds may be locked forever | no, funds may be locked if chaincode is destroyed (bad) | alternative flow through exception mechanism |
| predictable state | miners can influence on the execution order (bad) | miners can influence the execution order (bad), may forge timestamps (bad) | ??? possibly yes -- because peers are authorized (good) | ??? possibly yes -- peers are authorized |
| (im)mutable bugs | it is possible to amend transaction with lock_time field (medium) | contract is deployed forever, no upgrade mechanism (bad) | it is possible to upgrade chaincode (good) | versioning system for smart contracts with notifications for users |
| lost asset (can you send assets to random address?) | no, because of base58check (good) | yes -- it is possible to make a mistake in address and lose assets (bad) | ? | account/contract address encoding with forward error correction codes and special encoding |
| randomness | no (bad) (same as eth) | no (bad), miners can influence the block hash for their needs | ? possibly no (bad) | yes, through secure multi-party computation (tbd) |
| turing-completeness | no recursion, no loops, no functions, only linear code. (bad) | no. calculations are limited by gas (medium) | yes (medium) -- it is possible to write arbitrary programs, even those which are executed in infinite loop | no, calculations will be limited by max number of instructions (alternative to gas) |
| verifiable computation | only in POW, to mine blocks (medium) | only in POW, to mine blocks (medium) | ? | tbd |
| verifiable bounds | execution time is proportional to script length (medium) | execution time can be roughly estimated (medium) | no, it is possible to create a chaincode with infinite loop (?) | tbd |
| reusable libraries | no (bad) | only full copy-paste of code, no (bad) | ? | yes, persisted on block chain repository of versioned libraries like npm |
| multiparty computation | yes ? | yes ? | ? | ? |
| logging functions | return code can be non-zero (medium) -- no knowledge about the reason | pubsub events (good) | ? | pubsub events |

[1] http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart_contracts_2.html
[2] https://soramitsu.atlassian.net/secure/attachment/12396/SmartContractOverview.pdf
[3] https://github.com/ethereum/wiki/wiki/White-Paper
[4] https://eprint.iacr.org/2013/784.pdf
[5] https://en.wikipedia.org/wiki/Smart_contract
[6] http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/formalize.html
[7] https://en.bitcoin.it/wiki/Contract
[8] http://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart.contracts.html
[9] https://en.wikipedia.org/wiki/Escrow
[10] https://theethereum.wiki/w/index.php/ERC20_Token_Standard
[11] http://solidity.readthedocs.io/en/develop/contracts.html#events
[12] https://www.ethereum.org/