@piccolbo
piccolbo / mistral-median.md
Last active March 19, 2024 05:33
Mistral AI answers and hallucinates about stats

(A dialog on statistical algorithms with mistral.ai. My questions are in italics. My comments are in bold.)

*Do you know the definition of median in statistics?*

Yes, in statistics, the median is a measure of central tendency that represents the middle value in a dataset when the data is arranged in ascending or descending order. If the dataset has an odd number of observations, the median is the middle value. If the dataset has an even number of observations, the median is the average of the two middle values. The median is a useful measure of central tendency when the data is skewed or has outliers, as it is not affected by extreme values like the mean is. For example, if you have a dataset of incomes for a group of people, the median income would be the income level that separates the higher half of incomes from the lower half.

**So far so good. Deleted an additional question here that was too open-ended to be clearly wrong or right.**
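
As an illustration of the definition above (a minimal sketch, not part of the original dialog), here is a by-hand Python implementation of the odd/even rule, checked against the standard library's `statistics.median`, plus a small example of the median's robustness to an outlier:

```python
from statistics import mean, median

def my_median(xs):
    """Middle value for an odd number of observations,
    average of the two middle values for an even number."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

odd = [7, 1, 3, 9, 5]    # sorted: 1 3 5 7 9
even = [4, 1, 3, 2]      # sorted: 1 2 3 4
print(my_median(odd), median(odd))    # 5 5
print(my_median(even), median(even))  # 2.5 2.5

# Unlike the mean, the median barely moves when an extreme value is added.
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [10_000]
print(mean(incomes), median(incomes))            # 40 40
print(mean(with_outlier), median(with_outlier))  # 1700 42.5
```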

*Can you show me how to find the median of a list in Python?*

Putting wings on the Elephant

[operating-hadoop]

HBase is used widely at Facebook, and one of the biggest use cases is Facebook Messages. With a billion users there are a lot of reliability and performance challenges on both HBase and HDFS. HDFS was originally designed for a batch processing system like MapReduce/Hive. A real-time use case like Facebook Messages, where the p99 latency can't be more than a couple hundred milliseconds, poses a lot of challenges for HDFS. In this talk we will share the work the HDFS team at Facebook has done to support a real-time use case like Facebook Messages: (1) using system calls to tune performance; (2) inline checksums to reduce iops by 40%; (3) reducing the p99 for read and write latencies by about 10x; (4) tools used to determine the root cause of outliers. We will discuss the details of each technique, the challenges we faced, lessons learned and results showing the impact of each improvement.

speaker: Pritam Damania

Real-Time Market Basket Analysis for Retail with

@piccolbo
piccolbo / My atom data science bundle.md
Last active August 24, 2019 04:35
A list of Atom extensions I use for data science.

Code execution

  • Hydrogen: lets you send any code selection to a kernel and visualize the results in-text, plus watches and more
  • hydrogen-launcher: launch a terminal or IPython

The git bundle

Some of this may be superseded by native git integration:

  • git-blame: find who wrote that cryptic code
@piccolbo
piccolbo / pypi-release-checklist2.md
Last active February 23, 2022 17:41 — forked from audreyfeldroy/pypi-release-checklist2.md
My PyPI Release Checklist 2 (now with bumpversion)
  • merge any development branch you need to merge
  • git checkout master
  • run the tests:
    make install-dev
    make test
  • when the tests pass, git push
  • Update HISTORY.rst
  • Check readthedocs to make sure docs are OK
@piccolbo
piccolbo / keybase.md
Created October 15, 2017 22:05
keybase identification

Keybase proof

I hereby claim:

  • I am piccolbo on github.
  • I am piccolbo (https://keybase.io/piccolbo) on keybase.
  • I have a public key ASATO-Kj3cWENOHAPB5OgNFMlc4xEUtScX1L0-Er8tYX-Ao

To claim this, I am signing this object:

@piccolbo
piccolbo / emr_spark_thrift_on_yarn
Created January 27, 2016 22:55 — forked from elliottcordo/emr_spark_thrift_on_yarn
EMR spark thrift server
#on cluster
/spark/sbin/start-thriftserver.sh --master yarn-client
#ssh tunnel, direct 10000 to unused 8157
ssh -i ~/caserta-1.pem -N -L 8157:ec2-54-221-27-21.compute-1.amazonaws.com:10000 hadoop@ec2-54-221-27-21.compute-1.amazonaws.com
#see this for JDBC config on client http://blogs.aws.amazon.com/bigdata/post/TxT7CJ0E7CRX88/Using-Amazon-EMR-with-SQL-Workbench-and-other-BI-Tools
@piccolbo
piccolbo / Fully-tele-businesses.md
Last active May 2, 2016 16:01
Fully remote business

Criteria: most employees remote all the time. No office space allocated or available. No relocation required. Low travel requirement.

| Company | Notes |
| --- | --- |
| Automattic | 4 weeks/year travel, all employees |
| Rocana | |
| RStudio | |
| Plex | |
| Open Knowledge Foundation | Time zone restricted -3:1, different contracts UK/non-UK |
@piccolbo
piccolbo / dplyr-backends.md
Last active June 23, 2018 03:58
Dplyr backends: the ultimate collection

Dplyr is a well-known R package for working with structured data, either in memory, in a database or, more recently, on a cluster. The in-memory implementations generally have capabilities that are not found in the others, so the notion of backend is used with a bit of poetic license. Even the different database and cluster backends differ in subtle ways. But it sure is better than writing SQL directly! Here I provide a list of backends, with links to the packages that implement them when necessary. I've done my best to provide links to active projects, but I am not endorsing any of them. Do your own testing. Enjoy, and please contribute any corrections or additions in the comments.

| Backend | Package |
| --- | --- |
| data.frame | builtin |
| data.table | builtin |
| arrays | builtin |
| SQLite | builtin |
| PostgreSQL/Redshift | builtin |
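
As a rough illustration of the backend idea (a minimal sketch, assuming the dplyr, dbplyr, DBI and RSQLite packages are installed), the same verbs can run on an in-memory data frame or be translated to SQL against a SQLite table:

```{r}
library(dplyr)

# In-memory backend: an ordinary data frame.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))

# SQLite backend: the same verbs are translated to SQL by dbplyr.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mtcars_db <- copy_to(con, mtcars, "mtcars")
mtcars_db %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg)) %>%
  collect()   # pull the result back into R
DBI::dbDisconnect(con)
```

collect() is what moves the result out of the database and back into an ordinary data frame.
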
```{r}
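# Point of this snippet: `names<-` rejects a function as a "non-vector",
# yet it works on a data frame, which is.vector() also reports as not a vector.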
ff = function(){}
names(ff) = "abc"
# Error in names(ff) = "abc" : names() applied to a non-vector
is.vector(mtcars)
#[1] FALSE
names(mtcars) = LETTERS[1:11]
names(mtcars)
# [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
```
  • merge into master
  • update version #
  • update date
  • update Rd help()
  • push master
  • Repeat until tests pass
    • test local and debug
    • test remote and debug
    • test additional platforms
  • apply necessary fixes