Greg Leclercq ggreg

## celery_workflow_nested.py
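from celery import group, shared_task

# LogFetcher and the aggregate task are defined elsewhere in the project (not shown).


@shared_task
def fetch(pattern, src):
    """Fetch `src` with a LogFetcher for `pattern`, register the result, return its id."""
    fetcher = LogFetcher(pattern)
    return fetcher.register(src, fetcher.fetch(src)).id


@shared_task
def convert_tsv_blocks(tsv_id_list):
    """Fan out one aggregate subtask per TSV block; calling the group dispatches them."""
    return group([aggregate.s(tsv_id) for tsv_id in tsv_id_list])()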
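Calling a Celery `group` dispatches its signatures to the workers and returns a `GroupResult` right away. The following sketch shows the same fan-out/collect shape locally without a broker. It swaps in the standard library's `concurrent.futures`, which is a different mechanism: it blocks until every result is in. The `aggregate` function here is a hypothetical stand-in for the real task.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate(tsv_id):
    # Hypothetical stand-in for the real aggregate task: just tags the id.
    return f"aggregated-{tsv_id}"


def convert_tsv_blocks(tsv_id_list):
    # Fan out: one aggregate() call per TSV block, run on a thread pool.
    # pool.map yields results in input order, so output lines up with tsv_id_list.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(aggregate, tsv_id_list))


print(convert_tsv_blocks([1, 2, 3]))  # ['aggregated-1', 'aggregated-2', 'aggregated-3']
```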

## elevator.md

      
Elevator plan (created December 6, 2012 15:01, forked from oleiade/elevator.md)

First Article (Addressing the problem)

Rationale

Here at Botify:

- batch processing terabytes of web server logs involves storing temporary data;
- we bulk-write and bulk-read gigabyte-sized data loads that would not fit in a server's main memory (at first, we preferred to improve performance on a single host);
- we need persistence.


## frp_intro.hs
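-- Time is not defined in the original; modelled here as a real number (an assumption).
type Time = Double

-- A behavior is a value that varies over time.
type Behavior a = Time -> a

-- An event is a list of time-stamped occurrences.
type Event a = [(Time, a)]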
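These two type synonyms capture the classic FRP model: a behavior is a function of time, and an event is a time-ordered list of occurrences. The following Python sketch illustrates the same model. Several pieces are illustrative assumptions, not part of the original gist: `Time` as a float, the `map_event` helper, and a `stepper` combinator (a name used for this operation in FRP libraries) that turns an event into a behavior holding the latest value.

```python
from bisect import bisect_right
from typing import Callable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

Time = float
Behavior = Callable[[Time], A]  # a value defined at every instant
Event = List[Tuple[Time, A]]    # time-stamped occurrences, sorted by time


def map_event(f: Callable[[A], B], ev: Event[A]) -> Event[B]:
    """Apply f to each occurrence's value, keeping its timestamp."""
    return [(t, f(x)) for t, x in ev]


def stepper(initial: A, ev: Event[A]) -> Behavior[A]:
    """Behavior holding the value of the latest occurrence at or before t."""
    times = [t for t, _ in ev]

    def behavior(t: Time) -> A:
        i = bisect_right(times, t)  # number of occurrences with timestamp <= t
        return initial if i == 0 else ev[i - 1][1]

    return behavior


clock: Behavior[Time] = lambda t: t  # identity behavior: the current time
clicks: Event[str] = [(1.0, "a"), (2.5, "b")]
latest = stepper("none", clicks)
print(latest(0.5), latest(1.0), latest(3.0))  # none a b
```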

## main.yml
- name: provision
  local_action: ec2 key_name=greg group="{{default_security_group}}" instance_type="{{instance_type}}" image="{{image_id}}" instance_tags='{{instance_tags | to_json}}' monitoring=yes wait=yes
  register: ec2
- name: set_dns_record
  route53: >
    command=create
    zone=botify.com
    type=A
    value={{item.public_ip}}
  with_items: "{{ ec2.instances }}"

## ansible 2-infra layout
infra1/development
       staging
       production
       webservers.yml
       dbservers.yml
       backend_servers.yml
       group_vars/
infra2/development
       staging
       production

## slide24_numbers.txt
Numbers Everyone Should Know
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
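These figures are meant for back-of-the-envelope estimates. The following Python sketch uses only the numbers listed here to compute a few of them. The dictionary keys and the `ratio` helper are illustrative names.

```python
# Latencies from the list above, in nanoseconds.
LATENCY_NS = {
    "l1_cache_ref": 0.5,
    "branch_mispredict": 5,
    "l2_cache_ref": 7,
    "mutex_lock_unlock": 25,
    "main_memory_ref": 100,
    "compress_1kb_zippy": 3_000,
    "send_2kb_1gbps": 20_000,
    "read_1mb_seq_memory": 250_000,
    "datacenter_round_trip": 500_000,
}


def ratio(slow, fast):
    """How many `fast` operations fit in the time of one `slow` operation."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


# Main memory references that fit in one datacenter round trip.
print(ratio("datacenter_round_trip", "main_memory_ref"))  # 5000.0

# Sequential read of 1 GB (1024 MB) from memory, in milliseconds.
gb_read_ms = 1024 * LATENCY_NS["read_1mb_seq_memory"] / 1e6
print(gb_read_ms)  # 256.0
```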

## gist:7949052
If you want, I can try to help with pointers on how to improve your indexing speed. It's quite easy to increase it substantially by following some simple guidelines, for example:

- Use create in the index API (op_type=create), assuming you can.
- Relax the real-time aspect by raising the refresh interval from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog doesn't get really big, even though it's FS-based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the Elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and once the bulk loading is done, increase it to the value you want using the update_settings API. This helps because fewer shards may end up allocated to each machine.
- Increase the number of machines you have so
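The replica and refresh tips are both index settings that can be changed through the update-settings API. The following Python sketch builds, but does not send, the before and after requests. It makes several assumptions not in the original: a recent Elasticsearch release, where the refresh setting is `index.refresh_interval` rather than the older `index.engine.robin.refresh_interval` named above; a hypothetical node URL and index name; and illustrative values of `"30s"` and 1 replica.

```python
import json
import urllib.request

# Hypothetical node URL and index name; adjust for your cluster.
ES_URL = "http://localhost:9200"
INDEX = "logs"

# During bulk loading: no replicas and a relaxed refresh interval (assumed values).
BULK_LOAD_SETTINGS = {"index": {"number_of_replicas": 0, "refresh_interval": "30s"}}
# After loading: the replica count you actually want (1 is illustrative) and a 1s refresh.
POST_LOAD_SETTINGS = {"index": {"number_of_replicas": 1, "refresh_interval": "1s"}}


def settings_request(settings):
    """Build (but do not send) a PUT /<index>/_settings request."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


before = settings_request(BULK_LOAD_SETTINGS)
after = settings_request(POST_LOAD_SETTINGS)
# urllib.request.urlopen(before) would apply the settings on a live node.
print(before.get_method(), before.full_url)
```

Applying `before` ahead of the bulk load and `after` once it finishes follows the order the tips describe.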

## quadcopter.txt
- http://www.intorobotics.com/5-best-examples-of-how-to-build-a-diy-quadcopter/

- http://www.instructables.com/id/Sturdy-Quadcopter-Build/
- http://yameb.blogspot.ro/search/label/Quadrotors
- https://lalegiondesquadri.wordpress.com/2013/09/27/construire-un-multi-cest-facile/

- http://wiki.openpilot.org/display/Doc/Basic+QuadCopter
- http://smaccmpilot.org
- http://www.helimag.com/multirotors-complets/34639-turnigy-h-l-2.html

## test.md

      
test (created February 17, 2014 16:30)

TEST

  
## pickling_example.scala
import scala.pickling._
import json._

import org.apache.curator.framework.{ CuratorFrameworkFactory }
import org.apache.curator.retry.{ ExponentialBackoffRetry }
import org.apache.zookeeper.CreateMode

abstract class State
// A case class needs an explicit (possibly empty) parameter list.
case class Scheduled() extends State