@Irio
Created April 20, 2016 21:05
Meetup PyData Berlin 20/04 - April 2016 Meetup
EyeQuant - marketing tool for understanding how pages work. heatmaps. how clean or cluttered your design seems. collects data about human reactions to try to predict how a new design will be perceived
PyData Berlin Conference is open. May 21
Frankfurter Tor
BERLIN.PYDATA discount code for the conference. 20% discount
## (Jose Quesada) Distributed processing of large graphs in Python
Director @ Data Science Retreat
5 people already attended DSR
need to know how to ask a good question:
business case - someone needs to want to make a company out of this
data available
technology exists
we need to be able to know when the solution works (or not)
DS is a creative activity
try to start even with a bad question. something will break along the way. you may need to start from scratch
sometimes the stakeholder will give the solution, not a question. try to work around that
when to tweet to influence the right account?
find the time when everyone in the chain, up to the intended viewer, will be online
algorithms: shortest-path and pagerank
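The timing idea can be sketched in Python, assuming a toy directed graph where each account maps to the accounts that see its tweets, plus a set of hours (0-23) when each account is usually online. Both data structures and all function names are hypothetical; the talk didn't show code. BFS finds the shortest chain from you to the target, and the candidate posting hours are the intersection of everyone's online hours along that chain (propagation delay is ignored).

```python
from collections import deque

def shortest_path(graph, source, target):
    """BFS on an unweighted directed graph; returns the account chain or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk parent links back to the source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

def common_online_hours(path, online_hours):
    """Hours of the day in which every account on the path is online."""
    hours = set(range(24))
    for account in path:
        hours &= online_hours.get(account, set())
    return hours

# Hypothetical toy data: who sees whose tweets, and when each account is online.
followers = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["dave"],
    "carol": ["target"],
    "dave": ["carol"],
}
online = {
    "me": set(range(8, 23)),
    "alice": {9, 10, 11, 20, 21},
    "carol": {10, 11, 21, 22},
    "target": {11, 21},
}

path = shortest_path(followers, "me", "target")
print(path)                                       # ['me', 'alice', 'carol', 'target']
print(sorted(common_online_hours(path, online)))  # [11, 21]
```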
personalized pagerank
being used by twitter (paper “the who to follow…”)
use it to find the most influential people around the user you want to influence
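A minimal power-iteration sketch of personalized PageRank, assuming a small in-memory dict of who follows whom (hypothetical data; not the graph or code from the talk, and not Twitter's implementation). The only change from plain PageRank is that random jumps always go back to the seed account, so the scores rank accounts by how central they are from the target's point of view.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, max_iter=100, tol=1e-10):
    """PageRank whose random jumps always land back on `seeds`.

    graph maps each account to the accounts it follows. With probability
    alpha the walker follows an edge; otherwise it restarts at a seed.
    """
    nodes = set(graph) | set(seeds)
    for followed in graph.values():
        nodes.update(followed)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        new_rank = {n: (1 - alpha) * restart[n] for n in nodes}
        dangling_mass = 0.0
        for n in nodes:
            out = graph.get(n, ())
            if out:
                share = alpha * rank[n] / len(out)
                for m in out:
                    new_rank[m] += share
            else:
                dangling_mass += alpha * rank[n]   # accounts that follow nobody
        for n in nodes:
            new_rank[n] += dangling_mass * restart[n]
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

follows = {
    "target": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["target"],
}
scores = personalized_pagerank(follows, {"target"})
for account, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{account:7s} {score:.3f}")
```

Here dave scores zero: he follows the target, but the target's walk never reaches him, so he is not part of the target's neighbourhood.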
igraph
jung
neo4j (created Cypher language to query graphs)
dato
graphx
spark
running in multiple machines is much harder
graphframes wrap graphx
graphframes similar to pandas
graphx similar to spark(?)
uses a Cypher-like query language
graph was 25 GB. too large for 1 machine
3-machine cluster running on Amazon. no devops or docker, just "gimme this cluster". using Elastic MapReduce
pyspark couldn’t be used. too high level. used spark-shell
just 5 euros per day. in the old days, only Google and Twitter could do such a thing
finding a good question is half of the problem
ego graph - the graph just around the person you want to influence. complicated to generate: built by random walks. an alternative that doesn’t need 25 GB of memory. you could compute on smaller graphs separately and compare results; if they’re the same, go with the ego graph
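One way to read "random walking it": random walks with restart from the seed account, keeping the most visited nodes. A sketch under that assumption; `walks`, `walk_length`, `restart` and `top_k` are hypothetical parameters. The result is bounded by `top_k`, which is what makes it an alternative to holding the full 25 GB graph.

```python
import random
from collections import Counter

def ego_graph(graph, seed, walks=500, walk_length=8, restart=0.15, top_k=50, rng=None):
    """Approximate the neighbourhood of `seed` with random walks with restart.

    Returns the subgraph induced by the seed plus the top_k most visited nodes.
    """
    rng = rng or random.Random(42)
    visits = Counter()
    for _ in range(walks):
        node = seed
        for _ in range(walk_length):
            neighbours = graph.get(node)
            if not neighbours or rng.random() < restart:
                node = seed                  # jump back so walks stay local
                continue
            node = rng.choice(neighbours)
            visits[node] += 1
    keep = {seed} | {n for n, _ in visits.most_common(top_k)}
    # Keep only edges whose two endpoints were both kept.
    return {n: [m for m in graph.get(n, []) if m in keep] for n in keep}

toy = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["s"], "x": ["y"], "y": ["x"]}
print(ego_graph(toy, "s"))
```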
project done in 1 week(?)
nobody has the entire graph of Twitter
larger cluster for less time would cost less (comment from participant)
answer questions that can be answered with existing technology, otherwise it will take a lot of time and academia will solve your problem first. if you’re the only one with a huge problem and it’s important for the business, it may be worth solving
###
Check Box2D Physics. Genetic algorithms
## (Sylvain Bellemare) Bitcoin: Some Nuts & Bolts
@sbellem
sylvain@ascribe.io
software engineer
work with django
SPOOL blockchain protocol and API
BigchainDB scalable blockchain database
2 papers
they use blockchain, not bitcoin
pros and cons
consensus is the coolest thing
nuts and bolts: many pieces. try to make something out of it
don’t understand everything
babylonian math = understand everything starting from any piece. like developing software. no right place to start
paper by Satoshi Nakamoto defining Bitcoin
paper by Garay et al. Bitcoin Backbone Protocol
everyone can mine, be a node. requires lots of computing power solving a crypto puzzle
Davies-Meyer one-way compression function. no way to reverse it
bitcoin address = derived from the public key (a hash of it), in the end
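Davies-Meyer turns a block cipher E into a one-way compression function: H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}, with the message block m_i used as the key. SHA-256, which Bitcoin uses, builds its compression function this way around the SHACAL-2 cipher (with word-wise addition instead of XOR). The sketch below swaps in a toy 64-bit Feistel cipher (insecure, illustration only; all names are hypothetical) to show the structure: the cipher alone can be undone by anyone who knows the key, but the XOR feed-forward removes that shortcut.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF

def _round_function(key: bytes, r: int, half: int) -> int:
    # Toy keyed round function derived from SHA-256; a Feistel network accepts any function here.
    digest = hashlib.sha256(key + bytes([r]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def toy_encrypt(key: bytes, block: int, rounds: int = 8) -> int:
    """Toy 64-bit Feistel block cipher E_key(block). NOT secure."""
    left, right = block >> 32, block & MASK32
    for r in range(rounds):
        left, right = right, left ^ _round_function(key, r, right)
    return (left << 32) | right

def toy_decrypt(key: bytes, block: int, rounds: int = 8) -> int:
    """Inverse of toy_encrypt: with the key, the bare cipher is easy to undo."""
    left, right = block >> 32, block & MASK32
    for r in reversed(range(rounds)):
        left, right = right ^ _round_function(key, r, left), left
    return (left << 32) | right

def davies_meyer(h: int, message_block: bytes) -> int:
    """Compression step H_i = E_{m_i}(H_{i-1}) XOR H_{i-1}."""
    return toy_encrypt(message_block, h) ^ h

def toy_hash(message: bytes, iv: int = 0x0123456789ABCDEF) -> int:
    # Merkle-Damgard padding: 0x80, zero bytes, then the message length in bits (64-bit).
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % 8)
    padded += struct.pack(">Q", len(message) * 8)
    h = iv
    for i in range(0, len(padded), 8):
        h = davies_meyer(h, padded[i:i + 8])
    return h

print(hex(toy_hash(b"hello")))
```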
private key is for signing messages
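A standard (P2PKH) address is Base58Check(version byte || RIPEMD-160(SHA-256(public key))), with version byte 0x00 on mainnet. The sketch covers only the Base58Check step and takes the 20-byte hash as input, since RIPEMD-160 is not available in every Python `hashlib` build. The function names are hypothetical, not pycoin's API.

```python
import hashlib

# Bitcoin's Base58 alphabet: no 0, O, I or l, to avoid look-alike characters.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def _checksum(data: bytes) -> bytes:
    # First 4 bytes of double SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]

def base58check_encode(version: bytes, payload: bytes) -> str:
    data = version + payload
    data += _checksum(data)
    n = int.from_bytes(data, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(ALPHABET[rem])
    # Each leading zero byte is written as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))

def base58check_decode(address: str) -> bytes:
    """Returns version + payload, raising ValueError if the checksum does not match."""
    n = 0
    for ch in address:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(address) - len(address.lstrip("1"))
    data = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    body, checksum = data[:-4], data[-4:]
    if _checksum(body) != checksum:
        raise ValueError("bad checksum")
    return body

# 20 zero bytes stand in for a real RIPEMD-160(SHA-256(pubkey)) value.
address = base58check_encode(b"\x00", bytes(20))
print(address)
```

The 4-byte checksum is why a mistyped address is rejected instead of silently sending coins elsewhere.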
pycoin is one library
bitcoin
mainnet = doing transactions
testnet = testing stuff
regtest = to run locally
in testnet you can add money to your wallet
transaction creation, transaction signature and transaction broadcast
transactions.create => transaction id
transactions.sign(id, secret_key)
transactions.push
pushing sends to a node. may be yours or another through API
transactions.decode # shows a “json” view of the transaction from its raw, hash-like hex string
can pay a higher fee to get the transaction confirmed sooner
you’re spending unspent transaction outputs (“remaining transactions”). there’s no real balance stored in your wallet
each transaction is included in a block
blockchain = chain of blocks, each referencing the hash of the previous one
for validating, you only need a part of the tree. anyone can verify using the history
the hash you calculate, the merkle root, is a hash of all the transactions in the block. changing any transaction in the block will change the merkle root
the first bitcoin block is called the “genesis block”. it’s “pre-defined” (hard-coded)
genesis block has just one transaction, so that transaction’s hash is the same as the block’s merkle root
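The "no real value in your wallet" point, and the fee, can be sketched with a toy UTXO model. Everything here (the `Utxo` class, largest-first coin selection, the txids) is a simplified assumption, not how any particular wallet works: the balance is a sum of unspent outputs, whole outputs are spent, change goes back to you, and the fee is implicit as inputs minus outputs, so a higher fee just means a bigger gap.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Utxo:
    txid: str    # id of the transaction that created this output (toy values below)
    index: int   # position of the output within that transaction
    value: int   # amount in satoshis (1 BTC = 100_000_000 satoshis)

def balance(utxos):
    """A wallet 'balance' is just the sum of the unspent outputs it can sign for."""
    return sum(u.value for u in utxos)

def build_payment(utxos, amount, fee):
    """Largest-first coin selection: spend whole outputs, return the surplus as change."""
    selected, total = [], 0
    for utxo in sorted(utxos, key=lambda u: u.value, reverse=True):
        if total >= amount + fee:
            break
        selected.append(utxo)
        total += utxo.value
    if total < amount + fee:
        raise ValueError("insufficient funds")
    outputs = {"recipient": amount}
    if total - amount - fee:
        outputs["change"] = total - amount - fee
    # No output carries the fee: the miner takes whatever the inputs exceed the outputs by.
    return selected, outputs

wallet = [Utxo("tx_a", 0, 50_000), Utxo("tx_b", 1, 30_000), Utxo("tx_c", 0, 5_000)]
inputs, outputs = build_payment(wallet, amount=60_000, fee=1_000)
implied_fee = sum(u.value for u in inputs) - sum(outputs.values())
print(balance(wallet), [u.txid for u in inputs], outputs, implied_fee)
# 85000 ['tx_a', 'tx_b'] {'recipient': 60000, 'change': 19000} 1000
```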
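A toy sketch of the "chain" part, assuming JSON-encoded dict blocks instead of Bitcoin's binary 80-byte headers (function names are hypothetical): each block stores the hash of the previous one, so editing an old block breaks every link after it. The first block points at an all-zero hash, as Bitcoin's genesis block does.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Double SHA-256 of a canonical JSON encoding (toy stand-in for a binary header).
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(hashlib.sha256(encoded).digest()).hexdigest()

def append_block(chain: list, transactions: list) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})

def is_linked(chain: list) -> bool:
    """Every block must commit to the hash of the block before it."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, ["coinbase -> alice"])
append_block(chain, ["alice -> bob"])
append_block(chain, ["bob -> carol"])
print(is_linked(chain))                          # True
chain[1]["transactions"] = ["alice -> mallory"]  # try to rewrite history
print(is_linked(chain))                          # False
```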
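A sketch of the Bitcoin-style merkle root (double SHA-256 over pairs, duplicating the last hash when a level has an odd count) plus a merkle branch check, which is what lets someone validate a transaction with only part of the tree. Toy byte strings stand in for real transactions, the function names are hypothetical, and the byte-order reversal used when txids are displayed is left out.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # odd count: duplicate the last hash
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    """Sibling hashes from leaf `index` up to the root, with a flag for their side."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # True: sibling sits on the left
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_proof(leaf: bytes, proof: list, root: bytes) -> bool:
    h = leaf
    for sibling, sibling_on_left in proof:
        h = dsha256(sibling + h) if sibling_on_left else dsha256(h + sibling)
    return h == root

txids = [dsha256(tx) for tx in (b"tx-1", b"tx-2", b"tx-3")]
root = merkle_root(txids)
print(verify_proof(txids[2], merkle_proof(txids, 2), root))  # True
print(merkle_root(txids[:1]) == txids[0])  # True: one transaction, as in the genesis block
```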
double spending attack = spending the same coins twice, e.g. by publishing fake blocks
people need distributed things
how to ensure the people maintaining the blockchain won’t change everything? the longest chain (the one with the most accumulated work) is considered the valid one
people sending invalid hashes can (almost always) be caught: the proof-of-work hash is time-consuming to calculate but quick to verify
miners are incentivized to behave well because they spend money on their processing farms
25 bitcoins for each block accepted by the network. miners earn money for doing so
every node receives different transactions, depending on its position in the network. different people will build different blocks
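A toy proof-of-work to show the asymmetry: mining loops over nonces until the double SHA-256 falls below a target, while verifying is a single hash. The header contents, the 8-byte nonce and `difficulty_bits` are simplifications (Bitcoin uses an 80-byte header, a 4-byte nonce and a compact-encoded target). The "longest chain" is really the chain with the most accumulated work, roughly the sum of 2**difficulty_bits over its blocks.

```python
import hashlib

def pow_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header || nonce, read as a 256-bit little-endian integer."""
    data = header + nonce.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(hashlib.sha256(data).digest()).digest(), "little")

def target_for(difficulty_bits: int) -> int:
    # A hash below this target succeeds roughly once every 2**difficulty_bits tries.
    return 1 << (256 - difficulty_bits)

def mine(header: bytes, difficulty_bits: int) -> int:
    """Slow part: brute-force nonces until the hash is below the target."""
    nonce = 0
    while pow_hash(header, nonce) >= target_for(difficulty_bits):
        nonce += 1
    return nonce

def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Fast part: checking a claimed nonce costs one double hash."""
    return pow_hash(header, nonce) < target_for(difficulty_bits)

header = b"prev_hash|merkle_root|timestamp"   # toy stand-in for a real block header
nonce = mine(header, difficulty_bits=14)
print(nonce, verify(header, nonce, 14))
```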
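The 25 BTC figure follows Bitcoin's subsidy schedule: 50 BTC at launch, halved every 210,000 blocks. A sketch of that rule in satoshis, mirroring the integer right shift Bitcoin Core uses (the function name is hypothetical). At the time of this meetup the chain was in the 400,000-419,999 height range; the halving to 12.5 BTC came at block 420,000 in July 2016. Miners also collect the transaction fees on top of the subsidy.

```python
COIN = 100_000_000            # satoshis per bitcoin
HALVING_INTERVAL = 210_000    # blocks between halvings

def block_subsidy(height: int) -> int:
    """Newly created satoshis a miner may claim in the block at `height`."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:        # shifting a 64-bit amount this far leaves nothing
        return 0
    return (50 * COIN) >> halvings

print(block_subsidy(410_000) / COIN)   # 25.0
```

Because each era's total is halved and the shift rounds down, the sum of all subsidies stays just under 21 million BTC.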
Decentralization efforts
- bigchaindb (database)
- eris industries
- ethereum (processing)
- ipfs (storage)
- tendermint
- ascribe (applications)
"bitcointech" coursera course
in depth: "bitcoin backbone protocol: analysis and applications" paper
#####
after the meetup: cards where you can write what you want to hear about. or you can take one and choose to speak on it