The material below is the result of testing 황장군's lecture materials on GCP.

# ETL
- fluentd (streaming)
- embulk (batch) http://www.embulk.org/docs/

Let's install embulk on Linux. Installation is just copying the jar and making it executable:

curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
chmod +x ~/.embulk/bin/embulk
echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
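Once installed, embulk runs batch loads from a YAML config. A minimal sketch of such a config (the `path_prefix` and column names below are hypothetical examples, not from the lecture):

```yaml
# config.yml — read local CSV files and print the rows to stdout
in:
  type: file
  path_prefix: ./csv/sample_   # hypothetical input path prefix
  parser:
    type: csv
    charset: UTF-8
    columns:
      - {name: id, type: long}      # hypothetical column
      - {name: name, type: string}  # hypothetical column
out:
  type: stdout
```

Run it with `embulk run config.yml`; `embulk guess` can generate the parser section from sample data.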
/**
 * Exercise 2
 */
def balance(chars: List[Char]): Boolean = {
  def num(x: Char): Int = x match {
    case ')' => 1
    case '(' => -1
    case _   => 0
  }
  // Keep a running sum of num; a positive prefix sum means a ')'
  // appeared before its matching '(', so we can fail early.
  def loop(rest: List[Char], acc: Int): Boolean = rest match {
    case Nil       => acc == 0
    case c :: tail => (acc + num(c)) <= 0 && loop(tail, acc + num(c))
  }
  loop(chars, 0)
}
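The same running-sum idea from the exercise above, sketched in Python for quick experimentation (a translation, not part of the lecture material; the function name mirrors the Scala one):

```python
def balance(chars: str) -> bool:
    """Return True when every ')' closes a matching '('."""
    acc = 0
    for c in chars:
        if c == '(':
            acc += 1
        elif c == ')':
            acc -= 1
        if acc < 0:  # a ')' appeared before its matching '('
            return False
    return acc == 0

print(balance("(if (zero? x) max (/ 1 x))"))  # → True
print(balance(":-)"))                          # → False
```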
object regexpPractice {
  println("src/main/scala/org/graphframes/pattern/patterns.scala")
                                   //> src/main/scala/org/graphframes/pattern/patterns.scala
  val original = "[a-zA-Z0-9_]+".r //> original : scala.util.matching.Regex = [a-zA-Z0-9_]+
  val fix = "[a-zA-Z0-9_.:/]+".r   //> fix : scala.util.matching.Regex = [a-zA-Z0-9_.:/]+

  // original lacks '.', ':' and '/', so it cannot match a full URL;
  // fix adds those characters to the class and does.
  "http://www.google.com" match {
    case original(_*) => "match!"
    case fix(_*)      => "fix match!"
    case _            => "no match"
  } // evaluates to "fix match!"
}
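The character-class fix can be verified quickly in Python as well (a translation for experimentation; `re.fullmatch` plays the role of matching the whole string, as the Scala pattern match does):

```python
import re

original = re.compile(r"[a-zA-Z0-9_]+")  # word characters only
fix = re.compile(r"[a-zA-Z0-9_.:/]+")    # also allows '.', ':' and '/'

url = "http://www.google.com"
print(bool(original.fullmatch(url)))  # → False: ':', '/' and '.' break the match
print(bool(fix.fullmatch(url)))       # → True
```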
#!/bin/bash
# A simple test script to demonstrate how to find the
# "absolute path" at which a script is running. Used
# to avoid some of the pitfalls of using 'pwd' or hard-
# coded paths when running scripts from cron or another
# directory.
#
# Try it out:
# run the script from the current directory, then
# run it again from another directory; the printed
# path stays the same.

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "This script is running from: ${SCRIPT_DIR}"
# RULE
# BigQuery is a kind of column store.
# Avoid using * (star) to return all columns; use the table preview instead.
# Check how much data a query will process as you change it (500 MB, 1 TB, ...).
# Always use LIMIT.
# FORMAT() converts a number into a string (the result can no longer be added).
# An aliased column (e.g. income) cannot be referenced in the WHERE clause.
# Standard SQL or legacy SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
#standardSQL
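As a sketch, a query that follows the rules above (it assumes the public `bigquery-public-data.samples.natality` table; the column choices are illustrative):

```sql
#standardSQL
SELECT
  state,
  COUNT(*) AS births,
  FORMAT("%d", COUNT(*)) AS births_str  -- a STRING: displayable, but no longer addable
FROM
  `bigquery-public-data.samples.natality`
WHERE
  year > 2000   -- filter on the raw column; the alias births is not usable here
GROUP BY
  state
ORDER BY
  births DESC
LIMIT 10        -- always cap the result set
```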
gcloud dataproc clusters create <NAME-OF-YOUR-CLUSTER> \
  --subnet default \
  --zone us-central1-b \
  --master-machine-type n1-standard-2 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type n1-standard-2 \
  --worker-boot-disk-size 500 \
  --project <YOUR-PROJECT-ID>
- Compute Engine: https://cloud.google.com/compute/
- Storage: https://cloud.google.com/storage/
- Pricing: https://cloud.google.com/pricing/
- Cloud Launcher: https://cloud.google.com/launcher/
- Pricing Philosophy: https://cloud.google.com/pricing/philosophy/
datalab create mydatalabvm --zone us-central1-b
# gcloud beta pubsub topics create sandiego
# gcloud beta pubsub topics publish sandiego "hello"
from google.cloud import pubsub

client = pubsub.Client()
topic = client.topic("sandiego")
topic.create()