🏠
Working from home

Ranaivoson RanaivosonHerimanitra

  • Québec
@RanaivosonHerimanitra
RanaivosonHerimanitra / StratifiedRandomSampling.R
Last active January 3, 2016 13:29
Built with the 'data.table' package, this function determines the number of units to survey in each stratum using optimal stratified random sampling (Neyman allocation). Inputs include the raw dataset that contains the stratum information, the name of the stratum variable in this dataset, and another dataset containing the standard deviation of a …
require(data.table)
alloc_opti_data.table=function(n=1000
,dataset=ese_op2013_reste
,strate="strate2013"
,sd=ecart_type )
{
#count obs per strata to form Nh:
dataset=data.table(dataset)
output=dataset[,.N,by=strate]
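Neyman allocation gives stratum h a sample size proportional to N_h S_h, where N_h is the number of units in the stratum and S_h the standard deviation of the study variable there: n_h = n N_h S_h / Σ_k N_k S_k. A minimal Python sketch of that rule, assuming the per-stratum counts and standard deviations are already available as plain dictionaries; `neyman_allocation` is a hypothetical name, not the gist's R code:

```python
def neyman_allocation(n, counts, sds):
    """Split a total sample size n across strata by Neyman allocation.

    counts: dict stratum -> population size N_h
    sds:    dict stratum -> standard deviation S_h of the study variable
    Returns dict stratum -> unrounded n_h = n * N_h * S_h / sum_k(N_k * S_k).
    """
    # Weight of each stratum is N_h * S_h.
    weights = {h: counts[h] * sds[h] for h in counts}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all N_h * S_h products are zero")
    return {h: n * w / total for h, w in weights.items()}


alloc = neyman_allocation(1000, {"A": 600, "B": 400}, {"A": 2.0, "B": 6.0})
print({h: round(v) for h, v in alloc.items()})  # -> {'A': 333, 'B': 667}
```

The values are left unrounded. Rounding them (as the next gist does with `round(nh)`) can make the total drift from n by a unit, and a stratum with a very large S_h can be assigned more than N_h units, so a fuller version would cap and redistribute.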
RanaivosonHerimanitra / GetSampleO-SRS.R
Last active January 3, 2016 16:39
Once the sample size of each stratum has been determined, this function (also built with the 'data.table' package) draws the sample units.
require(data.table)
mysample=alloc_opti_data.table()
get_echantillon_data.table=function(
dataset=yourDataset,
alea=1435,
set.alea=TRUE,
strate="strate2013",
eff=mysample[,list(strate2013,nh=round(nh))])
{
if (set.alea==TRUE) {
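Drawing the units amounts to a simple random sample without replacement inside each stratum. A Python sketch, assuming the population is a list of dictionaries and the per-stratum sizes come from an allocation step such as the one described for the previous gist. The default seed 1435 and the `use_seed` flag only mirror the gist's `alea` and `set.alea` arguments; `stratified_sample` is a hypothetical name, and capping n_h at the stratum size is an assumption of this sketch:

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_key, sizes, seed=1435, use_seed=True):
    """Draw a simple random sample without replacement inside each stratum.

    rows:        list of dicts, one per population unit
    stratum_key: name of the field holding the stratum label
    sizes:       dict stratum -> number of units to draw (n_h)
    seed:        fixed seed for reproducible draws (used when use_seed is True)
    """
    rng = random.Random(seed) if use_seed else random.Random()

    # Group the population by stratum label.
    by_stratum = defaultdict(list)
    for row in rows:
        by_stratum[row[stratum_key]].append(row)

    sample = []
    for h, n_h in sizes.items():
        units = by_stratum.get(h, [])
        # Cap n_h at the stratum size so random.sample never raises.
        sample.extend(rng.sample(units, min(n_h, len(units))))
    return sample


population = [{"id": i, "strate2013": "A" if i < 60 else "B"} for i in range(100)]
s = stratified_sample(population, "strate2013", {"A": 5, "B": 3})
print(len(s))  # -> 8
```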
RanaivosonHerimanitra / impute.cpp
Last active January 4, 2016 02:29
Written with the 'Rcpp' package, this code replaces NA values in numeric/integer columns with the column mean. It is designed for large data frames with thousands of columns and rows.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List modiframe(DataFrame& df ) {
//nrow and ncol of the dataframe:
int nrow = df.nrows(), ncol= df.size() ;
double moy(0);
//define an empty list~dataframe
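Per column, mean imputation takes two passes: average the observed values, then write that average into every missing slot. A Python sketch of the same idea, assuming the data frame is represented as a dictionary of column lists with `None` or NaN marking a missing value (the gist itself works on an R `DataFrame` through Rcpp); `impute_column_means` is a hypothetical name:

```python
import math


def impute_column_means(columns):
    """Replace missing values (None or NaN) in each column by that column's mean.

    columns: dict name -> list of numbers
    Returns a new dict; a column with no observed values is copied unchanged.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    out = {}
    for name, values in columns.items():
        # First pass: mean of the observed values.
        observed = [v for v in values if not is_missing(v)]
        if not observed:
            out[name] = list(values)
            continue
        mean = sum(observed) / len(observed)
        # Second pass: fill every missing slot with the mean.
        out[name] = [mean if is_missing(v) else v for v in values]
    return out
```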
RanaivosonHerimanitra / impute2.cpp
Created January 22, 2014 17:54
An improvement on the previous "impute" code: instead of running the entire loop, the indices of the missing rows are recorded in advance...
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List modiframe2(DataFrame& df ) {
//nrow and ncol of the dataframe:
int nrow = df.nrows(), ncol= df.size() ;
double moy(0);
//define an empty list~dataframe
//List output(ncol)
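One reading of the improvement described above is that the missing positions are collected while the observed values are summed, so the fill step visits only those positions rather than the whole column. A Python sketch under that assumption (the gist's C++ is truncated, so this is not its actual code); `impute_with_missing_index` is a hypothetical name and the data frame is again assumed to be a dictionary of column lists:

```python
import math


def impute_with_missing_index(columns):
    """Mean-impute each column, touching only the rows recorded as missing.

    One scan per column records the indices of missing entries and
    accumulates the sum and count of observed values; the fill step then
    writes the mean into the recorded indices only.
    """
    out = {}
    for name, values in columns.items():
        filled = list(values)
        missing_idx = []
        total, count = 0.0, 0
        for i, v in enumerate(values):
            if v is None or (isinstance(v, float) and math.isnan(v)):
                missing_idx.append(i)
            else:
                total += v
                count += 1
        # Fill only when something is missing and at least one value was observed.
        if missing_idx and count:
            mean = total / count
            for i in missing_idx:
                filled[i] = mean
        out[name] = filled
    return out
```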
library(shiny)
# Define server logic for random distribution application
shinyServer(function(input, output,session) {
data <- reactive({
dist <- switch(input$dist,
norm = rnorm,
unif = runif,
lnorm = rlnorm,
exp = rexp,
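The `switch()` call maps the user's choice to an R random-number generator; the listing is cut off after `exp`, so the app may offer more options. A Python sketch of the same dispatch-table pattern, using standard-library samplers with R's default parameters (`rnorm` mean 0 and sd 1, `runif` on [0, 1], `rlnorm` meanlog 0 and sdlog 1, `rexp` rate 1); `SAMPLERS` and `draw` are hypothetical names:

```python
import random

# Distribution name (as in the Shiny input) -> sampler taking a Random instance.
SAMPLERS = {
    "norm": lambda rng: rng.gauss(0.0, 1.0),
    "unif": lambda rng: rng.uniform(0.0, 1.0),
    "lnorm": lambda rng: rng.lognormvariate(0.0, 1.0),
    "exp": lambda rng: rng.expovariate(1.0),
}


def draw(dist, n, seed=None):
    """Return n draws from the distribution named by dist."""
    try:
        sampler = SAMPLERS[dist]
    except KeyError:
        raise ValueError(f"unknown distribution: {dist!r}") from None
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(n)]
```

A dictionary keeps the name-to-generator mapping in one place, so adding a distribution means adding one entry, much as adding a case to the `switch()` does in R.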
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="shared/jquery.mobile-1.4.2.min.css" />
<script src="shared/jquery.js" type="text/javascript"></script>
<script src="shared/jquery.mobile-1.4.2.min.js" type="text/javascript" > </script>
<script src="shared/shiny.js" type="text/javascript"></script>
<script src="shared/highcharts.js" type="text/javascript"></script>
<script src="shared/data.js" type="text/javascript"></script>
import pandas
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.lib import grid
from rpy2.robjects.lib import ggplot2
## read in the distances to the railroad (computed beforehand)
neardist = pandas.read_csv('data/NearDistance.csv')
## convert to an R data frame via a Python dictionary
---
title: "Introduction to R"
author: "Herimanitra R."
date: "October 17, 2014"
output: html_document
---
<h1>Installation and overview of the interface</h1>
R is one of the most complete and comprehensive statistical programming environments available.
It provides thousands of libraries, or packages, that carry out specialized computational tasks in fields ranging from bioinformatics and statistics to econometrics, numerical analysis, cartography, and data mining.
library(shiny)
library(spdep)
library(leaflet)
library(RColorBrewer)
atx <- readRDS('travis.rds')
atx$id <- 1:nrow(atx)
atx2 <- atx[!is.na(atx$income), ]
RanaivosonHerimanitra / github-mining.sql
Created January 21, 2017 14:46 — forked from thomasdarimont/github-mining.sql
Sample query for the GitHub dataset in BigQuery
SELECT
cont.sample_repo_name,
repo.watch_count
FROM
[bigquery-public-data:github_repos.sample_contents] as cont
JOIN [bigquery-public-data:github_repos.sample_repos] as repo
ON cont.sample_repo_name = repo.repo_name
WHERE
cont.content CONTAINS 'findbugs-maven-plugin</artifactId>'
AND cont.sample_path LIKE 'pom.xml'
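The query is an inner join followed by two filters. A Python sketch of the same logic over in-memory rows (the list-of-dictionaries inputs and the name `repos_using_findbugs` are hypothetical) makes two details explicit: `CONTAINS` is a plain substring test, and `LIKE 'pom.xml'` with no `%` wildcard matches only the exact string `pom.xml`, so a nested path such as `module/pom.xml` would not match:

```python
def repos_using_findbugs(contents, repos):
    """Emulate the query: join file contents to repos on repository name,
    keep pom.xml files that mention the findbugs-maven-plugin artifactId,
    and return (repo name, watch count) pairs.

    contents: list of dicts with keys sample_repo_name, sample_path, content
    repos:    list of dicts with keys repo_name, watch_count
    """
    # Assumes repo_name is unique in repos.
    watch = {r["repo_name"]: r["watch_count"] for r in repos}
    needle = "findbugs-maven-plugin</artifactId>"
    return [
        (c["sample_repo_name"], watch[c["sample_repo_name"]])
        for c in contents
        if c["sample_repo_name"] in watch   # inner join: drop unmatched rows
        and needle in c["content"]          # CONTAINS: substring test
        and c["sample_path"] == "pom.xml"   # LIKE without wildcards: exact match
    ]
```

The dictionary lookup assumes each `repo_name` appears once in the repos table; a SQL join would emit one row per matching repo row if it did not.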