petertc hrchu

## camlistore-server-vps-s3.md

      
edrex / camlistore-server-vps-s3.md (last active June 8, 2016 10:12)

Camlistore on a VPS with S3 blob storage

Let's set up Camlistore on a Linux server, with blobs stored in S3. This currently seems to be the best-supported option for "cloud" deployment.
This is meant as a supplement to the official server config doc. Read through both docs before you start.
http://camlistore.org/docs/server-config
This blog post is also recommended reading.
I've posted my config files for reference, but don't copy them: they are created the first time you run camlistored (for the server) and camput init (for the client).

  
## gist:4964818
server {
        listen 80;

        server_name     _;
        client_max_body_size 100m;

        location / {
                fastcgi_pass_header     Authorization;
                fastcgi_pass_request_headers on;

## gist:2049562
#!/usr/bin/perl

use ElasticSearch;
use Text::CSV_XS;

my $csv_file = 'output.csv';
open my $fh, '>:encoding(utf8)', $csv_file or die $!;


my $csv = Text::CSV_XS->new;

## gist:f617caa8ed671a0f960ead56556e0c5c

If your processing rate is high this might not be optimal, since you'll be committing for each message. Even with async commits there is some performance penalty.
Another alternative is to keep `enable.auto.commit` set to True (the default) but disable the automatic offset store.
So what is the offset store?
Each time a message is passed from the client to your application, its offset is stored for a future commit. The next interval-based commit then uses this stored offset. If the stored offset has not changed since the last commit, nothing happens.

So by setting `enable.auto.offset.store` to False you keep the convenient interval-based auto commit behaviour, but you control which offsets are actually eligible for commit.
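
A minimal sketch of that pattern, assuming the confluent-kafka Python client. The broker address, group id, topic name and the `handle` callback are placeholders and not part of the original text. The offset is stored only after the message has been handled, so the background auto commit never moves past unprocessed messages.

```python
# Assumed settings: broker address and group id are placeholders.
CONSUMER_CONFIG = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",
    "enable.auto.commit": True,         # keep the interval-based commit
    "enable.auto.offset.store": False,  # but store offsets by hand
}


def process_messages(consumer, handle, max_polls):
    """Poll up to max_polls times and store each offset after handling.

    `consumer` is expected to behave like confluent_kafka.Consumer.
    If `handle` raises, the offset is not stored, so the message stays
    eligible for redelivery after a restart.
    """
    processed = 0
    for _ in range(max_polls):
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        handle(msg)
        # Mark this offset as eligible for the next automatic commit.
        consumer.store_offsets(message=msg)
        processed += 1
    return processed


def run_against_broker(topic="example-topic"):
    """Wire the loop to a real consumer (needs confluent-kafka installed)."""
    from confluent_kafka import Consumer

    consumer = Consumer(CONSUMER_CONFIG)
    consumer.subscribe([topic])
    try:
        process_messages(consumer, lambda m: print(m.value()), max_polls=100)
    finally:
        consumer.close()
```

Calling `run_against_broker()` requires a reachable broker; `process_messages` itself only relies on `poll` and `store_offsets`.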

## shutdown_example.py
import daemon, lockfile, signal, logging

import eventlet
from eventlet import wsgi, timeout

worker_pool = eventlet.GreenPool(20)
sock = eventlet.listen(('', 8000))

def proper_shutdown():
    worker_pool.resize(0)

## gist:0cc5e783387f5453f528
# Follows the squid format in default:
# logformat squid %ts.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<a %mt
# http://www.squid-cache.org/Doc/config/logformat/

%{NUMBER:timestamp}\s+%{NUMBER:response_time} %{IPORHOST:src_ip} %{WORD:squid_request_status}/%{NUMBER:http_status_code} %{NUMBER:reply_size_include_header} %{WORD:http_method} %{URI:request_url} %{USERNAME:user} %{WORD:squid_hierarchy_status}/%{IPORHOST:server_ip_or_peer_name} (?<mime_content_type>\S+\/\S+)
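
For checking the field layout without a grok runtime, here is a rough Python approximation of the pattern above. It is a sketch only: the character classes are looser than the real grok definitions (for example `\S+` instead of `IPORHOST` and `URI`), and the sample line is made up to match the logformat, not taken from a real log.

```python
import re

# Loose stand-in for the grok pattern; group names follow the grok fields.
SQUID_LINE = re.compile(
    r"(?P<timestamp>\d+\.\d+)\s+"
    r"(?P<response_time>\d+) "
    r"(?P<src_ip>\S+) "
    r"(?P<squid_request_status>\w+)/(?P<http_status_code>\d+) "
    r"(?P<reply_size_include_header>\d+) "
    r"(?P<http_method>\w+) "
    r"(?P<request_url>\S+) "
    r"(?P<user>\S+) "
    r"(?P<squid_hierarchy_status>\w+)/(?P<server_ip_or_peer_name>\S+) "
    r"(?P<mime_content_type>\S+/\S+)"
)


def parse_squid_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = SQUID_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, constructed by hand for illustration.
SAMPLE = (
    "1466000000.123    456 192.0.2.10 TCP_MISS/200 1024 GET "
    "http://example.com/index.html - DIRECT/198.51.100.7 text/html"
)

if __name__ == "__main__":
    print(parse_squid_line(SAMPLE))
```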

## s3_multipart_upload.py
#!/usr/bin/env python
"""Split large file into multiple pieces for upload to S3.

S3 only supports 5GB files for uploading directly, so for larger CloudBioLinux
box images we need to use boto's multipart file support.

This parallelizes the task over available cores using multiprocessing.

Usage:
  s3_multipart_upload.py <file_to_transfer> <bucket_name> [<s3_key_name>]
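
The splitting step described in the docstring can be sketched independently of boto. This is not the script's own code, only an illustration of how a file size might be cut into numbered byte ranges that workers could then upload in parallel. `plan_parts` and the part size used in the example are hypothetical; part numbers start at 1, as S3 multipart uploads expect.

```python
def plan_parts(total_size, part_size):
    """Split total_size bytes into (part_number, offset, length) tuples.

    Every part has part_size bytes except possibly the last one.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size > 0")
    parts = []
    offset = 0
    part_number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


if __name__ == "__main__":
    # Example: a 12-byte "file" cut into 5-byte parts.
    for part in plan_parts(12, 5):
        print(part)
```

Each tuple is independent of the others, which is what makes it straightforward to hand the parts to a `multiprocessing` pool.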

## how-to.md

      
hrchu / how-to.md (last active January 24, 2020 12:47, forked from reywood/how-to.md)

How to get a stack trace from a stuck/hanging python script

How to get a stack trace for each thread in a running Python script

Sometimes a Python script will simply hang forever with no indication of where things went wrong. Perhaps it's polling a service that never returns a value that would let the program move forward. Here's a way to see where the program is currently stuck.
Install gdb and pyrasite

Install gdb.
# Redhat, CentOS, etc
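
For reference, the per-thread dump itself can be produced with the standard library alone, provided the code runs inside the target process. The sketch below is an in-process illustration of that idea and is not the gdb/pyrasite procedure from this how-to. The function name `format_all_thread_stacks` is made up for the example.

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a text report with the current stack of every thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    # sys._current_frames() maps thread ids to their topmost frame.
    for thread_id, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):" % (names.get(thread_id, "?"), thread_id))
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(format_all_thread_stacks())
```

If you control the script and can restart it, the standard `faulthandler` module offers a similar dump triggered by a signal, which avoids attaching a debugger at all.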

  
## install-scala-sbt-and-java-on-ubuntu.md

      
alexislucena / install-scala-sbt-and-java-on-ubuntu.md (created December 5, 2016 12:35)

Ubuntu: Install Scala, SBT and Java on Ubuntu 16.04

    Install Scala 2.11.8
$ sudo apt-get remove scala-library scala
$ sudo wget www.scala-lang.org/files/archive/scala-2.11.8.deb
$ sudo dpkg -i scala-2.11.8.deb

Check Scala version
$ scala -version


## streaming-tar.py
#!/usr/bin/env python
#
# Building a tar file chunk-by-chunk.
#
# This is a quick bit of sample code for streaming data to a tar file,
# building it piece-by-piece. The tarfile is built on-the-fly and streamed
# back out. This is useful for web applications that need to dynamically
# build a tar file without swamping the server.
import os
import sys
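
# The rest of the original script is not shown here. What follows is a
# separate, minimal sketch of the same idea using tarfile's stream mode
# ("w|"). ChunkBuffer and stream_tar are hypothetical names, and each member
# is passed as one bytes payload, which is simpler than feeding the archive
# piece by piece as the comment above describes.

```python
import io
import tarfile


class ChunkBuffer:
    """Write-only file object that collects bytes until drained."""

    def __init__(self):
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def drain(self):
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_tar(members):
    """Yield tar archive bytes for an iterable of (name, payload) pairs."""
    buf = ChunkBuffer()
    tar = tarfile.open(fileobj=buf, mode="w|")
    for name, payload in members:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
        # tarfile buffers internally, so there may be nothing to emit yet.
        chunk = buf.drain()
        if chunk:
            yield chunk
    tar.close()  # flushes the buffered blocks and the end-of-archive marker
    chunk = buf.drain()
    if chunk:
        yield chunk


if __name__ == "__main__":
    for piece in stream_tar([("hello.txt", b"hello\n")]):
        sys.stdout.buffer.write(piece)
```

A web handler could return the generator directly as a streaming response body, so the full archive never has to sit in memory at once.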