@rHermes
Last active August 29, 2015 14:01
#!/usr/bin/env python2.7
# Written by rHermes
# Cross-compatibility: Python 3 has no xrange
try:
    xrange
except NameError:
    xrange = range
import struct
import os
import glob
import time
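# Convert a packed little-endian byte string to a big-endian hex string
# TODO: Optimize this
def struct_s_to_hex(s):
    # Zero-pad every byte to two hex digits; bytearray works on Python 2 and 3
    return "".join(['{:02x}'.format(b) for b in bytearray(s)][::-1])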
# Convert epoch to utc
def epoch_to_utc(s):
    return time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(s))
# Unpack from filehandle
def unpack_file(fh, fmt):
    return struct.unpack(fmt, fh.read(struct.calcsize(fmt)))
# Returns the value of a variable length integer
def get_var_length(fh):
    # Read in the first byte
    num = unpack_file(fh, '<B')[0]
    # Below 0xfd the first byte is the value itself; otherwise it only
    # says how many bytes the actual number takes
    if num < 0xfd:
        return num
    elif num == 0xfd:
        return unpack_file(fh, '<H')[0] # 2 bytes
    elif num == 0xfe:
        return unpack_file(fh, '<I')[0] # 4 bytes
    elif num == 0xff:
        return unpack_file(fh, '<Q')[0] # 8 bytes
# -- END UTILITY FUNCTIONS --
# Parse methods
def parse_header(fh):
    # TODO: Move Magic Id and Block Length out of this and use them for reading operations?
    # Could be used to speed up the code by eliminating frequent reads.
    # Magic Id, acts as a delimiter. In doge this is 0xC0C0C0C0
    ret_header = list(unpack_file(fh, '<I'))
    if ret_header[0] != 0xC0C0C0C0: # If this isn't 0xC0C0C0C0 it isn't a dogeblock.
        return 0
    # First fixed-length part, 84 bytes in total
    fmt_first = '<' # Little-endian byte encoding
    fmt_first += 'I' # Block Length, length of the rest of the block.
    fmt_first += 'I' # Version number, Must not change once set.
    fmt_first += '32s' # Previous block hash, double SHA256 hash of 80 bytes.
    fmt_first += '32s' # Merkle root hash.
    fmt_first += 'I' # Timestamp, stored in UNIX epoch time.
    fmt_first += 'I' # Bits, the target we have to be under
    fmt_first += 'I' # Nonce, random number generated during the mining process.
    ret_header.extend(unpack_file(fh, fmt_first))
    # Transaction count, Variable length
    ret_header.append(get_var_length(fh))
    return ret_header
# Parse a single input
def parse_input(fh):
    ret_input = []
    # Hash of input transaction, 32 bytes
    ret_input.extend(unpack_file(fh, '<32s'))
    # Input transaction index, 4 bytes
    # This is the index of the specific output in the referenced input transaction.
    # It starts from 0; 0xFFFFFFFF (-1) means there is no input transaction.
    ret_input.extend(unpack_file(fh, '<I'))
    # Script length, Variable length integer
    # Length of the script data that follows, in bytes.
    ret_input.append(get_var_length(fh))
    # Response script, Length depends on script length.
    # Proves that this transaction is allowed to use the input it references.
    ret_input.extend(unpack_file(fh, '<{}s'.format(ret_input[-1])))
    # Sequence number, 4 bytes, Not implemented, but could be used for
    # transaction replacement.
    ret_input.extend(unpack_file(fh, '<I'))
    # Return input
    return ret_input
# Parse a single output
def parse_output(fh):
    ret_output = []
    # Output value, 8 bytes.
    # How many base units are being sent.
    ret_output.extend(unpack_file(fh, '<Q'))
    # Challenge script length, Variable length integer.
    # Length of the script that follows, in bytes.
    ret_output.append(get_var_length(fh))
    # Challenge script, Length depends on script length.
    # This is what a later response script must satisfy to prove the right to spend this output.
    ret_output.extend(unpack_file(fh, '<{}s'.format(ret_output[-1])))
    # Return output
    return ret_output
# Parse a single transaction
def parse_transaction(fh):
    ret_tran = []
    # Transaction Version number, 4 bytes
    ret_tran.extend(unpack_file(fh, '<I'))
    # Input Count, Variable length Integer
    ret_tran.append(get_var_length(fh))
    # Parse all inputs
    inputs = []
    for i in xrange(ret_tran[-1]):
        inputs.append(parse_input(fh))
    ret_tran.append(inputs)
    # Output count, variable length integer
    ret_tran.append(get_var_length(fh))
    # Parse all outputs
    outputs = []
    for i in xrange(ret_tran[-1]):
        outputs.append(parse_output(fh))
    ret_tran.append(outputs)
    # Transaction Locktime, 4 bytes
    ret_tran.extend(unpack_file(fh, '<I'))
    # Return transaction
    return ret_tran
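# Parse every block in one open blk file
def main(fh):
    blocks = []
    # Peek one byte to detect end of file; read() returns an empty string there
    # (comparing against '' would never match the bytes returned on Python 3)
    while fh.read(1):
        fh.seek(-1, 1)
        block = parse_header(fh)
        if block == 0:
            break
        transactions = []
        # Now that we know how many transactions there are, we loop through them
        for i in xrange(block[-1]):
            transactions.append(parse_transaction(fh))
        block.append(transactions)
        blocks.append(block)
    return blocks
rawblockchain = sorted(glob.glob(os.path.expanduser('~/.dogecoin/blocks/blk*')))
for f in rawblockchain:
    # Close each file as soon as it has been parsed
    with open(f, 'rb') as fh:
        main(fh)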

Rawblock format

Split into 128MB chunks. Everything is little-endian.

When the chunks are split up, a block is never cut in half: the current block is always finished before a new file starts. No need to worry when switching to a new .dat file.

I don't know what is defined as the block "header", have to find out the hard way I guess.

Var length

The first byte is only used as the value if it's less than 0xFD. Otherwise it is discarded, as it just specifies the length of the number that follows: 0xFD means 2 bytes, 0xFE means 4 bytes and 0xFF means 8 bytes.
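
As a rough illustration of this rule, the sketch below decodes a variable length integer from an in-memory buffer. The name `read_varint` is made up for this example, and `io.BytesIO` stands in for an open block file.

```python
import io
import struct

def read_varint(fh):
    """Decode one variable length integer from a binary file-like object."""
    first = struct.unpack('<B', fh.read(1))[0]
    if first < 0xfd:
        return first  # the first byte is the value itself
    # Otherwise the first byte only selects the width of the number that follows
    fmt = {0xfd: '<H', 0xfe: '<I', 0xff: '<Q'}[first]
    return struct.unpack(fmt, fh.read(struct.calcsize(fmt)))[0]

small = read_varint(io.BytesIO(b'\x2a'))           # single byte, value 42
medium = read_varint(io.BytesIO(b'\xfd\x00\x01'))  # 0xfd prefix, then 2 bytes: 0x0100 = 256
```

A lookup table is used instead of an if/elif chain only to keep the sketch short; the behaviour is meant to match the rule described above.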

Magic network ID : 4 bytes [CONFIRMED]

c0 c0 c0 c0. This is not part of the block, it just acts as a delimiter.

Block length : 4 bytes

Block length in bytes. Not part of the block.
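
Since the magic ID and the block length frame every block, they could in principle be used to walk a file without parsing block contents. The following is a minimal sketch of that idea, not something the script does; `iter_raw_blocks` and the fake payloads are hypothetical and only exercise the framing.

```python
import io
import struct

MAGIC = 0xC0C0C0C0  # delimiter value taken from the notes above

def iter_raw_blocks(fh):
    """Yield the raw bytes of each block without parsing its contents."""
    while True:
        prefix = fh.read(8)
        if len(prefix) < 8:
            return  # end of file
        magic, length = struct.unpack('<II', prefix)
        if magic != MAGIC:
            return  # padding or something that is not a block
        yield fh.read(length)

# Two fake "blocks" of 3 and 1 bytes, just to exercise the framing
data = struct.pack('<II', MAGIC, 3) + b'abc' + struct.pack('<II', MAGIC, 1) + b'z'
raw_blocks = list(iter_raw_blocks(io.BytesIO(data)))
```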

Block format version: 4 bytes

Part of the block.

Hash of previous block: 32 bytes
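
The script's comments describe this field as a double SHA256 hash of 80 bytes, which would be the fields from the format version through the nonce (4 + 32 + 32 + 4 + 4 + 4). A minimal sketch of that computation follows. `block_hash_hex` is a made-up name, and showing the digest byte-reversed follows the usual Bitcoin display convention, which is assumed here to carry over to Dogecoin.

```python
import binascii
import hashlib

def block_hash_hex(header80):
    """Double SHA-256 of an 80-byte header, returned as byte-reversed hex."""
    if len(header80) != 80:
        raise ValueError('expected exactly 80 header bytes')
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    # Hashes are stored little-endian in the file; reverse the bytes for display
    return binascii.hexlify(digest[::-1]).decode('ascii')

example = block_hash_hex(b'\x00' * 80)  # dummy header, not a real block
```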

Merkle root: 32 bytes [UNCONFIRMED]

This didn't match, for some reason

Timestamp: 4 bytes [CONFIRMED]

Given in Unix epoch time
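
For reference, the conversion to a readable UTC time can be done with the standard library alone. This mirrors the `epoch_to_utc` helper in the script; the input value below is only an example.

```python
import time

def epoch_to_utc(seconds):
    """Format a Unix timestamp as a UTC date and time string."""
    return time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(seconds))

start_of_epoch = epoch_to_utc(0)  # '1970-01-01 00:00:00'
```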

Bits: 4 bytes [UNCONFIRMED]

The target, which the hash of the block header must not exceed for the block to be mined. May be particular to Bitcoin?
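
In Bitcoin, this field is a compact encoding of the target: the top byte is an exponent and the lower three bytes are a mantissa. The sketch below expands it to the full integer, assuming Dogecoin uses the same encoding (these notes do not confirm that). `bits_to_target` is a made-up name, and the sign bit of the mantissa is ignored.

```python
def bits_to_target(bits):
    """Expand a compact 'bits' value: target = mantissa * 256 ** (exponent - 3)."""
    exponent = bits >> 24
    mantissa = bits & 0x00ffffff  # sign bit (0x00800000) is not handled here
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))

target = bits_to_target(0x1d00ffff)  # example value: 0xffff followed by 26 zero bytes
```

A header hash, read as a 256-bit integer, would then have to be less than or equal to this number.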

Nonce: 4 bytes [CONFIRMED]

Random number generated during the mining process. To successfully mine a block, the header is hashed. If the resulting hash value is not less than or equal to the target, the nonce is incremented and the hash is computed again. This typically happens billions of times before a small enough hash is found.
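
Taken together, the six header fields from the format version through the nonce make up 80 bytes. The sketch below unpacks them into named fields rather than a positional list; `BlockHeader` and `unpack_header` are hypothetical names, and the sample values are made up.

```python
import struct
from collections import namedtuple

# version, previous block hash, merkle root, timestamp, bits, nonce
HEADER_FMT = '<I32s32sIII'
BlockHeader = namedtuple(
    'BlockHeader', 'version prev_hash merkle_root timestamp bits nonce')

def unpack_header(header80):
    """Split the 80 header bytes into named fields."""
    return BlockHeader(*struct.unpack(HEADER_FMT, header80))

# Made-up values, packed and unpacked again to show the round trip
sample = struct.pack(HEADER_FMT, 1, b'\x00' * 32, b'\x11' * 32,
                     1400000000, 0x1e0ffff0, 99)
header = unpack_header(sample)
```

Named fields make later code such as `header.nonce` easier to read than `block[7]`.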

Transaction count : Variable, max 8 bytes [CONFIRMED]

Not sure how to determine the length of this. Get back to it later.

Transaction version number : 4 bytes [CONFIRMED]

Cannot change, as with the block format version.

Count of inputs : Variable length integer - No maximum

Not sure how to determine this either :)

Determined as shown here: https://en.bitcoin.it/wiki/Protocol_specification#Variable_length_integer

And here, by searching for "readVariableLengthInteger": https://code.google.com/p/blockchain/source/browse/trunk/BlockChain.cpp

Input

There can be multiple inputs per transaction

Hash of the input transaction : 32 bytes [INDICATIONS]

This would be the hash of the transaction being referenced as an input, but for block reward transactions it is all zeros.

Input transaction index : 4 bytes [INDICATIONS]

As you know, transactions can have many outputs, and subsequent transactions can take none, one or many of those outputs as inputs. A zero here references the first output of the referenced transaction. For block reward transactions the value is 0xFFFFFFFF, a representation of -1: a kind of dummy value, because there is no input transaction.

It's interesting to note here that a prior transaction could in theory really have 0xffffffff + 1 (decimal 4,294,967,296) outputs, but since the value 0xffffffff is used as a sentinel, there would be no way to reference the 4,294,967,296th output and it would be lost. Fortunately this cannot happen in practice: the 4-byte block length field caps a block at 4 GiB, and every output takes at least 9 bytes (an 8-byte value plus a 1-byte script length).
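
To make the sentinel concrete, the sketch below shows how the same four bytes read as an unsigned and as a signed integer, and how a block reward input could be recognised from the two fields described above. `is_block_reward_input` is a made-up helper, not part of the script.

```python
import struct

NO_INPUT_INDEX = 0xFFFFFFFF  # the "-1" sentinel when read as an unsigned integer

def is_block_reward_input(prev_hash, index):
    """True if the input references no earlier transaction."""
    return prev_hash == b'\x00' * 32 and index == NO_INPUT_INDEX

raw_index = b'\xff\xff\xff\xff'
as_unsigned = struct.unpack('<I', raw_index)[0]  # 4294967295
as_signed = struct.unpack('<i', raw_index)[0]    # -1
```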

Response script length : Variable length integer - No maximum limit

This is the length of the script that follows.

Response script : Variable length [CONFIRMED]

This is the script that proves that this transaction is allowed to use the inputs it references. This is the first part of the script.

Reference for the script language: https://en.bitcoin.it/wiki/Script

Sequence number : 4 bytes [CONFIRMED]

Support for the transaction replacement feature. Since this feature isn't used in any clients yet, all transactions are locked by setting this to 0xFFFFFFFF.

Read more about this at the source.

Output

This is the output part.

Output count : Variable length [CONFIRMED]

This is the number of outputs.

Output value : 8 bytes [CONFIRMED]

Number of base units sent. One doge is 100,000,000 base units.
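
A small sketch of the conversion, using `Decimal` so that no precision is lost to floating point. `base_units_to_doge` is a made-up name for illustration.

```python
from decimal import Decimal

BASE_UNITS_PER_DOGE = 100000000  # from the note above

def base_units_to_doge(value):
    """Convert the 8-byte output value to whole doge, without rounding."""
    return Decimal(value) / BASE_UNITS_PER_DOGE

amount = base_units_to_doge(150000000)  # Decimal('1.5')
```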

Challenge script length : Variable length [CONFIRMED]

The length of the challenge script that follows.

Challenge script : Variable length

This is the second half of the script: the challenge that a later response script has to answer. Whoever can satisfy it proves that they have the right to spend this output.

This is of interest if I want to do more in-depth scanning of the blockchain.
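
As one possible starting point for such scanning, the sketch below recognises the common pay-to-pubkey-hash challenge script (OP_DUP OP_HASH160, a 20-byte push, OP_EQUALVERIFY OP_CHECKSIG) and extracts the public key hash. The opcode values come from the Bitcoin script reference; `p2pkh_hash160` is a hypothetical helper and every other script type is simply ignored.

```python
import binascii

def p2pkh_hash160(script):
    """Return the 20-byte public key hash as hex if `script` looks like a
    standard pay-to-pubkey-hash challenge script, else None."""
    data = bytearray(script)
    if (len(data) == 25
            and data[0] == 0x76      # OP_DUP
            and data[1] == 0xa9      # OP_HASH160
            and data[2] == 0x14      # push the next 20 bytes
            and data[23] == 0x88     # OP_EQUALVERIFY
            and data[24] == 0xac):   # OP_CHECKSIG
        return binascii.hexlify(bytes(data[3:23])).decode('ascii')
    return None

# Made-up script with a dummy hash of twenty 0x01 bytes
script = b'\x76\xa9\x14' + b'\x01' * 20 + b'\x88\xac'
pubkey_hash = p2pkh_hash160(script)
```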

Lock time : 4 bytes [CONFIRMED]

This is set to 0 as the feature is not currently implemented.

Sources

Basically better versions of this article:
