Antonio Piccolboni piccolbo

## hadoop-summit-2014.md

      
Created August 13, 2021 12:06, forked from lalyos/hadoop-summit-2014.md

Putting wings on the Elephant

[operating-hadoop]

HBase is used widely at Facebook, and one of its biggest use cases is Facebook Messages. With a billion users, there are many reliability and performance challenges for both HBase and HDFS. HDFS was originally designed for batch processing systems like MapReduce/Hive. A real-time use case like Facebook Messages, where the p99 latency can't exceed a couple hundred milliseconds, poses many challenges for HDFS. In this talk we will share the work the HDFS team at Facebook has done to support a real-time use case like Facebook Messages: (1) using system calls to tune performance; (2) inline checksums to reduce IOPS by 40%; (3) reducing the p99 for read and write latencies by about 10x; (4) tools used to determine the root cause of outliers. We will discuss the details of each technique, the challenges we faced, lessons learned, and results showing the impact of each improvement.

speaker: Pritam Damania
Real-Time Market Basket Analysis for Retail with


## pypi-release-checklist2.md

      
Last active February 23, 2022 17:41, forked from audreyfeldroy/pypi-release-checklist2.md

My PyPI Release Checklist 2 (now with bumpversion)

- merge any development branch you need to merge
- git checkout master
- run tests:

      make install-dev
      make test

- when tests pass, git push
- Update HISTORY.rst
- Check readthedocs to make sure docs are OK
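
The command-line steps of the checklist above can be strung together in a small script. This is only a sketch: the helper names `run` and `release_checks` and the `DRY_RUN` variable are hypothetical, and the `install-dev` and `test` make targets are assumed to exist as in the checklist. By default it prints each command instead of running it; the HISTORY.rst and readthedocs steps stay manual.

```shell
#!/bin/sh
# Hypothetical helper: print the command in dry-run mode (the default),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for the scripted checklist steps.
# The && chain stops at the first failing step, so nothing is pushed
# unless the tests pass.
release_checks() {
  run git checkout master &&
  run make install-dev &&
  run make test &&
  run git push
}

release_checks
```

Setting `DRY_RUN=0` in the environment makes the script run the commands for real.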


## emr_spark_thrift_on_yarn
	# on cluster
	thrift /spark/sbin/start-thriftserver.sh --master yarn-client
	# ssh tunnel: forward unused local port 8157 to port 10000 on the cluster
	ssh -i ~/caserta-1.pem -N -L 8157:ec2-54-221-27-21.compute-1.amazonaws.com:10000 hadoop@ec2-54-221-27-21.compute-1.amazonaws.com
	# see this for JDBC config on client: http://blogs.aws.amazon.com/bigdata/post/TxT7CJ0E7CRX88/Using-Amazon-EMR-with-SQL-Workbench-and-other-BI-Tools
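
With the tunnel open, a client on the local machine would point at the local end of the tunnel rather than at the cluster. The sketch below builds the JDBC URL and wraps a `beeline` call in a function. The Spark Thrift server speaks the HiveServer2 protocol, hence the `jdbc:hive2://` scheme. The function name `connect_thrift` is hypothetical, and using `hadoop` as the user name is an assumption carried over from the ssh login, not something the original states.

```shell
#!/bin/sh
# Assumes the ssh tunnel is already running in another terminal.
LOCAL_PORT=8157   # local end of the tunnel, from the ssh -L option
JDBC_URL="jdbc:hive2://localhost:${LOCAL_PORT}"

# Hypothetical helper: open a SQL session through the tunnel.
# beeline's -u takes the JDBC URL and -n the user name (assumed here).
connect_thrift() {
  beeline -u "$JDBC_URL" -n hadoop
}

echo "JDBC URL: $JDBC_URL"
```

Call `connect_thrift` once the tunnel is up. GUI tools such as SQL Workbench would take the same URL in their connection settings.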