@goungoun
goungoun / balance.scala
Created January 21, 2018 13:35
balance
/**
* Exercise 2
*/
def balance(chars: List[Char]): Boolean = {
  def num(x: Char): Int = x match {
    case ')' => 1
    case '(' => -1
    case _ => 0
  }
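The preview above is cut off after the `num` helper, so how the scores are combined is not shown. As a rough sketch in Python (an assumption, not the author's Scala solution), one way to use the same scoring is a running total that must never go positive and must end at zero:

```python
def balance(chars: str) -> bool:
    """Return True when every '(' is closed by a later ')'."""
    # Same scoring as num above: ')' is +1, '(' is -1, anything else is 0.
    weights = {")": 1, "(": -1}
    total = 0
    for ch in chars:
        total += weights.get(ch, 0)
        if total > 0:  # a ')' showed up before its matching '('
            return False
    return total == 0  # zero means nothing was left open


print(balance("(if (zero? x) max (/ 1 x))"))  # True
print(balance("())("))                        # False
```

The early return matters: `"())("` sums to zero overall, but it is still unbalanced because the total goes positive partway through.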
object regexpPractice {
  println("src/main/scala/org/graphframes/pattern/patterns.scala")
  //> src/main/scala/org/graphframes/pattern/patterns.scala
  val original = "[a-zA-Z0-9_]+".r //> original : scala.util.matching.Regex = [a-zA-Z0-9_]+
  val fix = "[a-zA-Z0-9_.:/]+".r //> fix : scala.util.matching.Regex = [a-zA-Z0-9_.:/]+
  "http://www.google.com" match {
    case original(_*) => "match!"
    case fix(_*) => "fix match!"
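A Scala regex used as a `case` pattern has to match the whole string, which is why the URL falls through `original` and lands on `fix`. A small Python sketch of the same check, using `re.fullmatch` for the whole-string behaviour; the `classify` name and the `"no match"` fallback are assumptions, since the original `match` block is truncated:

```python
import re

ORIGINAL = re.compile(r"[a-zA-Z0-9_]+")
FIX = re.compile(r"[a-zA-Z0-9_.:/]+")


def classify(text: str) -> str:
    """Try the patterns in the same order as the Scala match above."""
    if ORIGINAL.fullmatch(text):
        return "match!"
    if FIX.fullmatch(text):
        return "fix match!"
    return "no match"  # hypothetical fallback, not in the original


print(classify("http://www.google.com"))  # fix match!
print(classify("graph_frames"))           # match!
```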
goungoun / bashpath.sh
Created March 26, 2018 02:58 — forked from darrenderidder/bashpath.sh
Get path of running script in bash
#!/bin/bash
# A simple test script to demonstrate how to find the
# "absolute path" at which a script is running. Used
# to avoid some of the pitfalls of using 'pwd' or
# hard-coded paths when running scripts from cron or another
# directory.
#
# Try it out:
# run the script from the current directory, then
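The bash preview stops before the actual path lookup. For comparison, the same idea in Python could be sketched as below; `script_dir` is a hypothetical helper name, and the approach (resolving the script's own path instead of relying on the current working directory) is an assumption about what the script goes on to do:

```python
from pathlib import Path


def script_dir(script_path: str) -> Path:
    """Return the absolute directory that contains script_path.

    resolve() makes the path absolute and follows symlinks, so the result
    does not depend on where the caller happened to cd before running.
    """
    return Path(script_path).resolve().parent
```

Inside a script, call it as `script_dir(__file__)`; the result stays the same whether the script is started from cron or from another directory.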
The notes below are the results of testing 황장군's lecture materials on GCP.
# ETL
- fluentd (streaming)
- embulk (batch) http://www.embulk.org/docs/
Let's install embulk on Linux. Copying the jar is all it takes.
~~~bash
curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
~~~
goungoun / embulk.md
Created April 10, 2018 04:32
GCP with Embulk

# RULE
# It is a kind of column store
# Avoid using * (star) to return all columns; use the table preview instead
# Check how much data a query will process; the size changes with the query (500 MB, 1 TB, ...)
# Always use LIMIT
# FORMAT converts a number into a string (so the result cannot be added)
# A column alias (e.g. income) cannot be used in the WHERE clause
# StandardSQL or legacySQL https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
#standardSQL
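The rule about checking the processing size can also be followed from code. The sketch below assumes the `google-cloud-bigquery` package and working credentials; `human_size` and `estimate_bytes` are hypothetical helper names. A dry run asks BigQuery how many bytes a query would scan without actually running it:

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units (1 KB = 1024 bytes here)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


def estimate_bytes(sql: str) -> int:
    """Dry-run a query and return the bytes it would process."""
    # Imported lazily so human_size works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # nothing is executed or billed
    return job.total_bytes_processed
```

With credentials in place, `human_size(estimate_bytes(sql))` gives a quick readout before running the query for real.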
goungoun / gist:e290cd8b63eba7955006e12fdd346a76
Created May 1, 2018 14:25
gcloud dataproc clusters create
gcloud dataproc clusters create <NAME-OF-YOUR-CLUSTER> \
  --subnet default \
  --zone us-central1-b \
  --master-machine-type n1-standard-2 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-2 \
  --worker-boot-disk-size 500 \
  --project <YOUR-PROJECT-ID>
Compute Engine: https://cloud.google.com/compute/
Storage: https://cloud.google.com/storage/
Pricing: https://cloud.google.com/pricing/
Cloud Launcher: https://cloud.google.com/launcher/
Pricing Philosophy: https://cloud.google.com/pricing/philosophy/
datalab create mydatalabvm --zone us-central1-b
# gcloud beta pubsub topics create sandiego
# gcloud beta pubsub topics publish sandiego "hello"
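# Recent google-cloud-pubsub releases replace pubsub.Client() with PublisherClient.
from google.cloud import pubsub_v1
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("<YOUR-PROJECT-ID>", "sandiego")  # fill in your project ID
publisher.create_topic(name=topic_path)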