Irio/gist:7db611b0fcc67a3334249abdc10e6026

## gistfile1.txt
EyeQuant - marketing tool for understanding how pages work. heatmaps. how clean or cluttered your design seems. collects data about human reactions to try to predict the perception of a new design

PyData Berlin Conference is open. May 21
Frankfurter Tor
BERLIN.PYDATA discount code for the conference. 20% discount

## (Jose Quesada) Distributed processing of large graphs in Python

Director @ Data Science Retreat

5 people already attended DSR

need to know how to ask a good question
business case - someone needs to want to make a company out of this
data is available
technology exists
we need to be able to know when the solution works (or not)

DS is a creative activity
try to start even with a bad question. something will break on the way. you may need to start from scratch
sometimes the stakeholder will give the solution, not a question. try to work around that

when to tweet to influence the right account?
find the time when everyone in the chain, up to the intended viewer, will be online

algorithms: shortest-path and pagerank
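
a minimal sketch of the shortest-path idea, not the code from the talk. it assumes a tiny hand-made graph (`retweet_graph`) and made-up online hours (`online_hours`); both are hypothetical. it finds the shortest chain from you to the target with breadth-first search, then intersects the hours at which everyone in the chain is online.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search on an unweighted graph given as {node: [neighbours]}."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # target is not reachable from source


# Hypothetical data: an edge u -> v means "v sees what u tweets".
retweet_graph = {
    "me": ["ana", "bob"],
    "ana": ["carl"],
    "bob": ["carl"],
    "carl": ["target"],
}
# Hypothetical hours of the day (0-23) at which each account is usually online.
online_hours = {
    "me": {9, 10, 18},
    "ana": {10, 18, 19},
    "bob": {7},
    "carl": {18, 20},
    "target": {18, 21},
}

path = shortest_path(retweet_graph, "me", "target")
# Hours at which every account along the chain is online at the same time.
common_hours = set.intersection(*(online_hours[user] for user in path))
print(path, common_hours)
```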

personalized pagerank
being used by twitter (paper “the who to follow…”)
use it to find the most influential people around the user you want to influence
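
a rough sketch of personalized pagerank by power iteration, to show the idea only. it is not Twitter's implementation. assumptions: an edge u -> v means "u follows v", the damping factor is 0.85, and dangling nodes send their rank back to the seed. the `follows` graph is made up.

```python
def personalized_pagerank(graph, seed, damping=0.85, iterations=50):
    """Power iteration where every teleport jumps back to the seed node."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    rank = {node: (1.0 if node == seed else 0.0) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 0.0 for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's damped rank evenly among the accounts it follows.
                share = damping * rank[node] / len(targets)
                for followed in targets:
                    new_rank[followed] += share
            else:
                # Assumption: a node with no outgoing edges returns its rank to the seed.
                new_rank[seed] += damping * rank[node]
        # Total rank is 1, so the teleport mass going to the seed is (1 - damping).
        new_rank[seed] += 1.0 - damping
        rank = new_rank
    return rank


# Hypothetical follow graph around the account we want to influence.
follows = {
    "target": ["ana", "bob"],
    "ana": ["bob"],
    "bob": ["target"],
    "carl": ["ana"],
}
scores = personalized_pagerank(follows, seed="target")
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 3))
```

accounts with a high score are the ones the seed account is most likely to end up paying attention to, which is the "most influential people around the user" idea from the notes.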

igraph
jung
neo4j (created the Cypher language to query graphs)
dato
graphx
spark

running on multiple machines is much harder

graphframes wraps graphx
graphframes similar to pandas
graphx similar to spark(?)
uses Cypher language

graph was 25 GB. too large for 1 machine
3 machines in a cluster running on Amazon. no devops or docker. just gimme this cluster. using elastic mapreduce
pyspark couldn’t be used. too high level. used spark-shell
just 5 euros per day. in the old days, only google and twitter could do such a thing

finding a good question is half of the problem

ego graph - graph just around who you want to influence. complicated to generate. random walk it. an alternative that avoids needing 25 GB of memory. you could calculate smaller graphs separately and compare results: if they are the same, go with the ego graph
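
a small sketch of cutting an ego graph out of a bigger graph. the talk mentioned random walks; this version uses a plain breadth-first search with a fixed radius instead, because it is deterministic and easier to read. the `graph` data and the `radius` parameter are assumptions for illustration.

```python
from collections import deque


def ego_graph(graph, center, radius=2):
    """Return the subgraph of nodes within `radius` hops of `center`."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    # Keep only the edges whose two endpoints are both inside the ego graph.
    return {
        node: [n for n in graph.get(node, []) if n in distance]
        for node in distance
    }


# Hypothetical chain of accounts; only the first two hops should survive.
graph = {
    "target": ["ana"],
    "ana": ["bob"],
    "bob": ["carl"],
    "carl": ["dave"],
}
ego = ego_graph(graph, "target", radius=2)
print(ego)
```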

project done in 1 week(?)
nobody has the entire graph of Twitter

larger cluster for less time would cost less (comment from participant)

answer questions that can be answered with existing technology, otherwise it will take a lot of time. academia will solve your problem first. if you’re the only one with a huge problem and it is important for the business, it may be worth solving

###

Check Box2D Physics. Genetic algorithms

## (Sylvain Bellemare) Bitcoin: Some Nuts & Bolts

@sbellem
sylvain@ascribe.io

software engineer
work with django

SPOOL blockchain protocol and API
BigchainDB scalable blockchain database
2 papers

they use blockchain, not bitcoin

pros and cons
consensus is the coolest thing

nuts and bolts: many pieces. try to make something out of it
you don’t need to understand everything

babylonian math = understand everything starting from any piece. like developing software. no right place to start

paper by Satoshi Nakamoto defining Bitcoin
paper by Garay. Bitcoin Backbone Protocol

everyone can mine, be a node. requires lots of computing power solving a crypto puzzle
Davies-Meyer one-way compression function. no way to return

bitcoin address = public key, in the end
private key is for signing messages

pycoin is one library

bitcoin
mainnet = doing transactions
testnet = testing stuff
regtest = to run locally

in testnet you can add money to your wallet

transaction creation, transaction signature and transaction broadcast

     transactions.create => transaction id
     transactions.sign(id, secret_key)
     transactions.push

pushing sends to a node. may be yours or another through API

     transactions.decode # shows “json” of transaction based on crazy hash

can pay a greater fee to get the transaction confirmed sooner
you’re spending “remaining transactions” (unspent outputs of earlier transactions). the wallet itself holds no balance
transaction is related to a block
blockchain = group/aggregation of blocks

for validating you only need a part of the tree. anyone can verify using the history
the hash you calculate, the merkle root, is a hash over all the transactions in the block. changing any transaction in the block changes the merkle root

the first bitcoin block is called the "genesis block". it is pre-defined
the genesis block has just one transaction. that transaction's hash is the same as the block's merkle root
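
a toy merkle root calculation to make the two points above concrete. it follows the Bitcoin scheme in spirit (double SHA-256, last hash duplicated when a level has an odd count) but ignores Bitcoin's byte-order conventions, and the transactions are fake byte strings made up for the example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for transaction and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_hashes):
    """Pair up hashes level by level until a single root hash remains."""
    if not tx_hashes:
        raise ValueError("a block needs at least one transaction")
    level = list(tx_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: duplicate the last hash
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Fake transactions, for illustration only.
txs = [double_sha256(f"tx{i}".encode()) for i in range(3)]
root = merkle_root(txs)

# Changing a single transaction changes the root.
tampered = merkle_root([txs[0], double_sha256(b"evil"), txs[2]])
print(root.hex())
print(tampered.hex())

# With a single transaction (as in the genesis block) the root is that hash.
single = merkle_root([txs[0]])
print(single == txs[0])
```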

double spending attack = publishing fake blocks

people need distributed things

how to ensure people maintaining the blockchain won’t change everything? the longest chain is considered the valid one
people sending invalid hashes (almost) can’t be prevented, but because it’s a hash function they are cheap to reject: time-consuming to calculate, quick to verify
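
a toy proof-of-work to illustrate "time-consuming to calculate, quick to verify". this is a sketch, not Bitcoin's real header format: the header bytes, the 8-byte nonce encoding and the `difficulty_bits` parameter are all assumptions made for the example.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header plus nonce, read as a big integer."""
    data = header + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces one by one until the hash falls below the target (slow)."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A single hash is enough to check someone else's work (fast)."""
    return block_hash(header, nonce) < (1 << (256 - difficulty_bits))


header = b"toy block header"
difficulty_bits = 12  # on average about 2**12 = 4096 attempts
nonce = mine(header, difficulty_bits)
print(nonce, verify(header, nonce, difficulty_bits))
```

raising `difficulty_bits` by one doubles the expected mining work, while verification stays a single hash.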

miners are incentivized to behave well because they spend money on processing farms.
25 bitcoins for each block accepted in the network. they earn money for doing so

every node receives different transactions, depending on its position in the network. different people will build different blocks

Decentralization efforts

- bigchaindb (database)
- eris industries
- ethereum (processing)
- ipfs (storage)
- tendermint
- ascribe (applications)

"bitcointech" coursera course
in depth: "bitcoin backbone protocol: analysis and applications" paper

#####

after the meetup, there are cards where you can write what you want to hear about. or you can take one and choose to speak