
rolandkofler

xrstf / setup.md
Last active October 3, 2022 13:30
Nutch 2.3 + ElasticSearch 1.4 + HBase 0.94 Setup

Info

This guide sets up a non-clustered Nutch crawler that stores its data in HBase. It does not cover setting up Hadoop et al.; it covers only the bare minimum needed to crawl and index websites on a single machine.

Terms

  • Nutch - the crawler (fetches and parses websites)
  • HBase - storage backend for Nutch (a column-oriented database from the Hadoop ecosystem)
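As a rough mental model of the two roles above, the toy sketch below walks Nutch's crawl cycle (inject, generate, fetch, parse, updatedb) on a single machine. It is not Nutch code and uses no Nutch or HBase API: the `webTable` Map is a hypothetical stand-in for the page table Nutch keeps in HBase, and `fakeWeb` is a hypothetical stand-in for real HTTP fetches.

```javascript
// Toy model of a Nutch-style crawl cycle (NOT Nutch's API).
// Hypothetical: fakeWeb replaces real HTTP; webTable (a Map) replaces HBase.
const fakeWeb = {
  "http://a.example/": ["http://b.example/", "http://c.example/"],
  "http://b.example/": ["http://a.example/", "http://d.example/"],
  "http://c.example/": [],
  "http://d.example/": ["http://c.example/"],
};

// inject: seed URLs enter the store as "unfetched" rows (existing rows are kept).
function inject(webTable, urls) {
  for (const url of urls) {
    if (!webTable.has(url)) webTable.set(url, { status: "unfetched", outlinks: [] });
  }
}

// generate: select up to topN unfetched URLs for this round.
function generate(webTable, topN) {
  const batch = [];
  for (const [url, row] of webTable) {
    if (row.status === "unfetched" && batch.length < topN) batch.push(url);
  }
  return batch;
}

// fetch + parse: "download" each page and record its outlinks.
function fetchAndParse(webTable, batch) {
  for (const url of batch) {
    const row = webTable.get(url);
    row.outlinks = fakeWeb[url] ?? [];
    row.status = "fetched";
  }
}

// updatedb: newly discovered outlinks become unfetched rows for later rounds.
function updateDb(webTable, batch) {
  for (const url of batch) inject(webTable, webTable.get(url).outlinks);
}

// Run a bounded number of rounds, stopping early when nothing is left to fetch.
function crawl(seeds, rounds, topN) {
  const webTable = new Map();
  inject(webTable, seeds);
  for (let i = 0; i < rounds; i++) {
    const batch = generate(webTable, topN);
    if (batch.length === 0) break;
    fetchAndParse(webTable, batch);
    updateDb(webTable, batch);
  }
  return webTable;
}

const table = crawl(["http://a.example/"], 3, 10);
for (const [url, row] of table) console.log(url, row.status);
```

Each round only touches URLs that earlier rounds discovered, which is why the real crawler is run repeatedly (or with a depth limit) rather than once.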
greenido / AppsScript - Monte Carlo simulation.js
Created October 14, 2013 06:56
This is a simple Monte Carlo simulation to see whether our salesperson should execute a strategy of 'many' big deals and 'few' small ones, or vice versa.
/***************************************************************************
*
* This is a simple Monte Carlo simulation to see whether our salesperson
* should execute a strategy of 'many' big deals and 'few' small ones
* or vice versa.
*
* Author: Ido Green | plus.google.com/+greenido
* Date: 16 July 2013
*
* *************************************************************************/
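A minimal plain-JavaScript sketch of this kind of simulation follows; it is not the author's Apps Script code. It assumes each deal closes independently with a fixed win probability, and every number in it (deal counts, deal values, win probabilities, the quota) is hypothetical. A seeded PRNG (mulberry32) keeps runs reproducible.

```javascript
// Seeded PRNG (mulberry32): returns a function yielding floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// One simulated period: each deal closes independently with its win probability.
function simulateOnce(strategy, rng) {
  let revenue = 0;
  for (const deal of [strategy.big, strategy.small]) {
    for (let i = 0; i < deal.count; i++) {
      if (rng() < deal.winProb) revenue += deal.value;
    }
  }
  return revenue;
}

// Repeat the period many times; report mean revenue and the share of runs hitting the quota.
function monteCarlo(strategy, trials, quota, rng) {
  let sum = 0;
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    const r = simulateOnce(strategy, rng);
    sum += r;
    if (r >= quota) hits++;
  }
  return { meanRevenue: sum / trials, pQuota: hits / trials };
}

// Hypothetical strategies: big deals are worth more but close less often.
const manyBig = {
  big: { count: 8, value: 100, winProb: 0.2 },
  small: { count: 2, value: 20, winProb: 0.6 },
};
const fewBig = {
  big: { count: 2, value: 100, winProb: 0.2 },
  small: { count: 8, value: 20, winProb: 0.6 },
};

const rng = mulberry32(42);
for (const [name, s] of [["many big", manyBig], ["few big", fewBig]]) {
  const { meanRevenue, pQuota } = monteCarlo(s, 10000, 150, rng);
  console.log(name, meanRevenue.toFixed(1), pQuota.toFixed(3));
}
```

The mean answers "which mix earns more on average", while `pQuota` adds a risk view that the mean alone hides; swapping in real win rates and deal sizes is where such a model becomes useful.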
magnetikonline / README.md
Last active March 14, 2024 22:48
IE 7/8/9/10/11 virtual machines from Microsoft - installation notes for Linux with VirtualBox.
mrdwab / Stratified.R
Created May 21, 2011 17:06
R stratified random sampling from a data frame
stratified = function(df, group, size) {
# USE: * Specify your data frame and grouping variable (as column
# number) as the first two arguments.
# * Decide on your sample size. For a sample proportional to the
# population, enter "size" as a decimal. For an equal number
# of samples from each group, enter "size" as a whole number.
#
# Example 1: Sample 10% of each group from a data frame named "z",
# where the grouping variable is the fourth variable, use:
#
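For readers outside R, here is a hypothetical JavaScript analogue of the `size` rule described in the comments above: a value below 1 is treated as a proportion of each group, anything else as a fixed count per group. It is a sketch, not a port of the gist's code: rows are plain objects grouped by a property name rather than a column number, and rounding proportions with `Math.round` (and capping counts at the group size) are assumptions that may differ from the R original.

```javascript
// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle(items, rng) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Hypothetical analogue of stratified(df, group, size) for arrays of objects.
function stratified(rows, groupKey, size, rng = Math.random) {
  // Bucket rows by the value of the grouping property.
  const groups = new Map();
  for (const row of rows) {
    const key = row[groupKey];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const sample = [];
  for (const members of groups.values()) {
    // size < 1: proportional share of the group; otherwise a fixed count, capped at group size.
    const n = size < 1
      ? Math.round(size * members.length)
      : Math.min(Math.floor(size), members.length);
    sample.push(...shuffle(members, rng).slice(0, n));
  }
  return sample;
}
```

Mirroring Example 1, `stratified(z, "group", 0.1)` would draw about 10% of each group's rows, where `z` is an array of objects and `"group"` is a hypothetical property name; `stratified(z, "group", 5)` would draw 5 rows per group.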