As a reminder, here are the components in play to run an application:
- The cluster:
  - Spark Master: coordinates the resources
  - Spark Workers: offer resources to run the applications
- The application:
#!/usr/bin/python
DOCUMENTATION = '''
---
module: copy_remotely
short_description: Copies a file from the remote server to the remote server.
description:
    - Copies a file but, unlike the M(file) module, the copy is performed on the
      remote server.
      The copy is only performed if the source and destination files are different
      (different MD5 sums) or if the destination file does not exist.
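The rule stated in the module documentation (copy only when the destination is missing or its MD5 sum differs) can be sketched in plain Python. This is a minimal sketch, assuming both paths live on the same machine, as they do when the module runs on the remote server. The names `md5sum` and `copy_if_changed` are hypothetical and are not the module's actual code.

```python
import hashlib
import os
import shutil
import tempfile


def md5sum(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_if_changed(src, dest):
    """Copy src to dest unless dest already exists with the same MD5 sum.

    Returns True when a copy was made (what Ansible would report as "changed").
    """
    if os.path.exists(dest) and md5sum(src) == md5sum(dest):
        return False
    shutil.copyfile(src, dest)
    return True


if __name__ == "__main__":
    # Hypothetical demo in a throwaway directory.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "source.txt")
    dest = os.path.join(workdir, "dest.txt")
    with open(src, "w") as f:
        f.write("hello")
    print(copy_if_changed(src, dest))  # True: dest did not exist yet
    print(copy_if_changed(src, dest))  # False: same MD5 sum, nothing to do
```

Comparing checksums rather than timestamps makes the operation idempotent: running it twice in a row reports a change only the first time.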
Basic file formats, such as CSV, JSON or other text formats, can be useful when exchanging data between applications. When it comes to storing intermediate data between steps of an application, Parquet can provide more advanced capabilities:
The tests here are performed with Spark 2.0.1 on a cluster with 3 workers (c4.4xlarge, 16 vCPUs and 30 GB of RAM each).
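To make the limitation of text formats concrete: a CSV round trip turns every value into a string, so the reader has to obtain the schema from somewhere else, whereas Parquet stores the schema in the file alongside the data. The sketch below uses only Python's `csv` module. The sample row and the `SCHEMA` mapping are hypothetical, chosen for illustration.

```python
import csv
import io


def csv_round_trip(rows, fieldnames):
    """Write dict rows to CSV text and read them back with DictReader."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    return list(csv.DictReader(buf))


def apply_schema(rows, schema):
    """Convert each string field back to its type using an externally supplied schema."""
    return [{name: schema[name](value) for name, value in row.items()} for row in rows]


# Hypothetical sample data and schema, for illustration only.
ROWS = [{"page": "Main_Page", "hour": 0, "views": 1523}]
SCHEMA = {"page": str, "hour": int, "views": int}

read_back = csv_round_trip(ROWS, ["page", "hour", "views"])
print(repr(read_back[0]["views"]))  # '1523': the integer came back as a string
print(apply_schema(read_back, SCHEMA) == ROWS)  # True, but only because SCHEMA was supplied by hand
```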
var fs = require('fs');
var data = fs.readFileSync(process.argv[2], {
    encoding: 'ascii'
});
var json = JSON.parse(data);
for (var list in json) {
    var devices = json[list];
    for (var i = 0; i < devices.length; i++) {
#!/usr/bin/python
import os
import sys
import requests
schema_registry_url = sys.argv[1]
topic = sys.argv[2]
schema_file = sys.argv[3]
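The script takes a Schema Registry URL, a topic and a schema file. The following is a minimal sketch of the registration request such a script presumably builds. It assumes Confluent Schema Registry's REST API and the default subject naming (`<topic>-value`). It uses `urllib.request` from the standard library instead of `requests` so that it runs without extra dependencies. `build_register_request` and the `localhost:8081` URL are hypothetical.

```python
import json
import urllib.request


def build_register_request(schema_registry_url, topic, schema_text):
    """Build (but do not send) a POST that registers schema_text for a topic's values."""
    # Assumption: default TopicNameStrategy, i.e. the subject is "<topic>-value".
    subject = f"{topic}-value"
    url = f"{schema_registry_url.rstrip('/')}/subjects/{subject}/versions"
    # The registry expects the schema as a JSON *string* inside a JSON object.
    body = json.dumps({"schema": schema_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical registry URL and topic; nothing is sent here.
    req = build_register_request("http://localhost:8081", "test-topic", '{"type": "string"}')
    print(req.get_method(), req.full_url)
```

Actually sending it would be `urllib.request.urlopen(req)`, and the registry answers with the id assigned to the schema. The schema itself is embedded as an escaped string, not as a nested JSON object, which is a common source of 422 errors.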
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
<!--
    Checkstyle configuration that checks the Google coding conventions from:
    - Google Java Style
region: Ile-de-France
departement: Paris
zipCode: 75011
city: Paris
name: Alexis S
email: alexis@xxx.com
phoneNumber: "0600000000"
hidePhoneNumber: false
password: xxxxxxxxx
PROMPT=$'%{$fg_bold[red]%}%D{%K:%M:%S}%{$reset_color%} %{$fg[cyan]%}%n%{$fg[grey]%}@%{$fg[green]%}%M%{$fg[grey]%}:%{$fg_bold[yellow]%}%d%{$fg[grey]%}$(git_prompt_info) $ %{$reset_color%}'
ZSH_THEME_GIT_PROMPT_PREFIX=" %{$fg_bold[white]%}git:("
ZSH_THEME_GIT_PROMPT_SUFFIX="%{$fg[white]%})%{$reset_color%}"
ZSH_THEME_GIT_PROMPT_DIRTY="%{$fg[red]%}*"
ZSH_THEME_GIT_PROMPT_CLEAN=""
#!/bin/bash -e
if [ ! -d data/wikipedia-pagecounts-hours ]; then
    mkdir -p data/wikipedia-pagecounts-hours
fi
cd data/wikipedia-pagecounts-hours
yyyy=2014
MM=06
dd=19
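The script prepares a directory and a date (2014-06-19) before fetching the hourly Wikipedia page-count files. Below is a sketch of the list of hourly files for that day. It assumes the `pagecounts-raw` layout on dumps.wikimedia.org (`pagecounts-YYYYMMDD-HH0000.gz` under `YYYY/YYYY-MM/`), and the exact file names should be checked against the directory listing. `pagecount_urls` and `download_missing` are hypothetical helpers, not part of the original script.

```python
import os
import urllib.request

# Assumed base URL of the raw page-count dumps.
BASE = "https://dumps.wikimedia.org/other/pagecounts-raw"


def pagecount_urls(yyyy, mm, dd):
    """Return the 24 assumed hourly file URLs for one day (date parts as zero-padded strings)."""
    folder = f"{BASE}/{yyyy}/{yyyy}-{mm}"
    return [f"{folder}/pagecounts-{yyyy}{mm}{dd}-{hh:02d}0000.gz" for hh in range(24)]


def download_missing(urls, dest_dir):
    """Download each URL into dest_dir, skipping files that are already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)


if __name__ == "__main__":
    # Same date as the shell script; only prints the URLs, downloads nothing.
    for url in pagecount_urls("2014", "06", "19"):
        print(url)
```

Keeping the date parts as strings mirrors the shell variables and preserves the leading zeros in `06`.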