The material below is the result of testing 황장군's lecture materials on GCP.

# ETL
- fluentd (streaming)
- embulk (batch) http://www.embulk.org/docs/

Let's install embulk on Linux. Installation is just copying the jar and making it executable:

curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
chmod +x ~/.embulk/bin/embulk
echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
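Once installed, embulk runs batch loads from a YAML config. A minimal sketch of such a config (the `path_prefix` and column names below are hypothetical examples, not from the lecture):

```yaml
# config.yml — read local CSV files and print the rows to stdout
in:
  type: file
  path_prefix: ./csv/sample_   # hypothetical input path prefix
  parser:
    type: csv
    charset: UTF-8
    columns:
      - {name: id, type: long}      # hypothetical column
      - {name: name, type: string}  # hypothetical column
out:
  type: stdout
```

Run it with `embulk run config.yml`; `embulk guess` can generate the parser section from sample data.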
/**
 * Exercise 2
 */
def balance(chars: List[Char]): Boolean = {
  def num(x: Char): Int = x match {
    case ')' => 1
    case '(' => -1
    case _   => 0
  }
  // Keep a running sum of num; a positive prefix sum means a ')'
  // appeared before its matching '(', so we can fail early.
  def loop(rest: List[Char], acc: Int): Boolean = rest match {
    case Nil       => acc == 0
    case c :: tail => (acc + num(c)) <= 0 && loop(tail, acc + num(c))
  }
  loop(chars, 0)
}
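The same running-sum idea from the exercise above, sketched in Python for quick experimentation (a translation, not part of the lecture material; the function name mirrors the Scala one):

```python
def balance(chars: str) -> bool:
    """Return True when every ')' closes a matching '('."""
    acc = 0
    for c in chars:
        if c == '(':
            acc += 1
        elif c == ')':
            acc -= 1
        if acc < 0:  # a ')' appeared before its matching '('
            return False
    return acc == 0

print(balance("(if (zero? x) max (/ 1 x))"))  # → True
print(balance(":-)"))                          # → False
```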
object regexpPractice {
  println("src/main/scala/org/graphframes/pattern/patterns.scala")
                                   //> src/main/scala/org/graphframes/pattern/patterns.scala
  val original = "[a-zA-Z0-9_]+".r //> original : scala.util.matching.Regex = [a-zA-Z0-9_]+
  val fix = "[a-zA-Z0-9_.:/]+".r   //> fix : scala.util.matching.Regex = [a-zA-Z0-9_.:/]+

  // original lacks '.', ':' and '/', so it cannot match a full URL;
  // fix adds those characters to the class and does.
  "http://www.google.com" match {
    case original(_*) => "match!"
    case fix(_*)      => "fix match!"
    case _            => "no match"
  } // evaluates to "fix match!"
}
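The character-class fix can be verified quickly in Python as well (a translation for experimentation; `re.fullmatch` plays the role of matching the whole string, as the Scala pattern match does):

```python
import re

original = re.compile(r"[a-zA-Z0-9_]+")  # word characters only
fix = re.compile(r"[a-zA-Z0-9_.:/]+")    # also allows '.', ':' and '/'

url = "http://www.google.com"
print(bool(original.fullmatch(url)))  # → False: ':', '/' and '.' break the match
print(bool(fix.fullmatch(url)))       # → True
```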
#!/bin/bash
# A simple test script to demonstrate how to find the
# "absolute path" at which a script is running. Used
# to avoid some of the pitfalls of using 'pwd' or hard-
# coded paths when running scripts from cron or another
# directory.
#
# Try it out:
# run the script from the current directory, then
# run it again from another directory; the printed
# path stays the same.

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "This script is running from: ${SCRIPT_DIR}"
# RULE
# BigQuery is a kind of column store.
# Avoid using * (star) to return all columns; use the table preview instead.
# Check how much data a query will process as you change it (500 MB, 1 TB, ...).
# Always use LIMIT.
# FORMAT() converts a number into a string (the result can no longer be added).
# An aliased column (e.g. income) cannot be referenced in the WHERE clause.
# Standard SQL or legacy SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
#standardSQL
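As a sketch, a query that follows the rules above (it assumes the public `bigquery-public-data.samples.natality` table; the column choices are illustrative):

```sql
#standardSQL
SELECT
  state,
  COUNT(*) AS births,
  FORMAT("%d", COUNT(*)) AS births_str  -- a STRING: displayable, but no longer addable
FROM
  `bigquery-public-data.samples.natality`
WHERE
  year > 2000   -- filter on the raw column; the alias births is not usable here
GROUP BY
  state
ORDER BY
  births DESC
LIMIT 10        -- always cap the result set
```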
gcloud dataproc clusters create <NAME-OF-YOUR-CLUSTER> \
  --subnet default \
  --zone us-central1-b \
  --master-machine-type n1-standard-2 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-2 \
  --worker-boot-disk-size 500 \
  --project <YOUR-PROJECT-ID>
- Compute Engine: https://cloud.google.com/compute/
- Storage: https://cloud.google.com/storage/
- Pricing: https://cloud.google.com/pricing/
- Cloud Launcher: https://cloud.google.com/launcher/
- Pricing Philosophy: https://cloud.google.com/pricing/philosophy/
datalab create mydatalabvm --zone us-central1-b
# gcloud beta pubsub topics create sandiego
# gcloud beta pubsub topics publish sandiego "hello"
from google.cloud import pubsub

client = pubsub.Client()
topic = client.topic("sandiego")
topic.create()