Sid Anand r39132

@r39132
r39132 / JsonToParquetConverter.java
Created May 23, 2023 01:16
ChatGPT Json-to-Parquet-Converter
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
@r39132
r39132 / gist:3ef24b3ffc4ebf7f0509fc635165f3af
Created April 6, 2018 06:34
Running "java -jar ../rat/apache-rat/target/apache-rat-0.13-SNAPSHOT.jar -E ./.rat-excludes -d . > rat_report.txt" on Apache Airflow
Ignored 2 lines in your exclusion files as comments or empty lines.
*****************************************************
Summary
-------
Generated at: 2018-04-05T23:32:08-07:00
Notes: 5
Binaries: 36
Archives: 0
PayPal currently supports 4+ generations of software stacks in production and runs 2k+ distinct microservices, which together provide customers with the fast and seamless user experience they expect. To maintain high quality while keeping developers happy and productive in such an environment, self-service tools with a high degree of automation under the hood are paramount. In this talk, I will tell the story of how PayPal moved from a "PayPal on a box" test environment, to VM-based environments, and finally to a delivery pipeline leveraging our container platform. I will describe how our pipeline delivers containers to fly-away test environments for automated integration testing and how that paradigm shift impacted our engineering teams and their workflows. We will share our insights on what worked well for us and how those lessons can be applied at other companies.
from airflow import DAG, utils
from airflow.operators.dummy_operator import DummyOperator
from datetime import date, datetime, time, timedelta
today = datetime.today()
# Round to align with the schedule interval
START_DATE = today.replace(minute=0, second=0, microsecond=0)
DAG_NAME = 'clear_task_bug_dag_1.0'
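The preview ends before the DAG itself is declared. As a minimal sketch of how the rounded START_DATE and the imported DummyOperator could be wired up, assuming an hourly schedule (the owner, retry settings, and task id below are illustrative, not the gist's originals):

default_args = {
    'owner': 'airflow',        # assumed owner
    'start_date': START_DATE,  # hour-aligned start computed above
    'retries': 1,              # assumed retry policy
}

dag = DAG(DAG_NAME, default_args=default_args, schedule_interval=timedelta(hours=1))

# A single placeholder task is enough to exercise the scheduling behaviour
noop = DummyOperator(task_id='noop', dag=dag)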
"""
### Example HTTP operator and sensor
"""
from airflow import DAG
from airflow.operators.http_operator import SimpleHttpOperator
from airflow.operators.sensors import HttpSensor
from datetime import datetime, timedelta
import json
seven_days_ago = datetime.combine(datetime.today() - timedelta(7), datetime.min.time())
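The rest of the example is not shown above. A hedged sketch of how the imported SimpleHttpOperator and HttpSensor are typically used together in Airflow 1.x follows; the dag id, endpoint, payload, and connection id are assumptions:

args = {'owner': 'airflow', 'start_date': seven_days_ago}

dag = DAG('example_http_operator', default_args=args, schedule_interval='@once')

# POST a small JSON payload via the http_default connection
post_op = SimpleHttpOperator(
    task_id='post_op',
    endpoint='api/v1.0/nodes',                  # assumed endpoint
    data=json.dumps({'priority': 5}),
    headers={'Content-Type': 'application/json'},
    dag=dag)

# Poke the endpoint until it answers successfully before posting
http_sensor = HttpSensor(
    task_id='http_sensor_check',
    http_conn_id='http_default',
    endpoint='',
    poke_interval=5,
    dag=dag)

post_op.set_upstream(http_sensor)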
@r39132
r39132 / gist:30cc62c74b3ba23039a622c31016766f
Created February 11, 2017 01:41
ElasticSearch 2.3 --> 5.1 Migration : new IP fields do not support ipv6
I recently migrated from AWS ES 2.3 to 5.1.
I followed the instructions at [http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-version-migration.html].
TL;DR: I snapshotted my 2.3 ES cluster to S3 and then restored the snapshot to the new 5.1 cluster.
However, I ran into a problem. I added an *ip* field to my indexes, which included indexes brought over from 2.3 as well as new indexes created on 5.1. Here's a sample mapping:
curl -XPUT "localhost:80/cars/_mapping/transactions" -d'
{
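The body of the mapping is cut off above. As a rough Python-client equivalent of that PUT, assuming a single field declared with the 5.x *ip* type (the index, type, and field names here are illustrative):

from elasticsearch import Elasticsearch

es = Elasticsearch(['localhost:80'])

# Hypothetical mapping: one field using the Elasticsearch 5.x "ip" type
mapping = {'properties': {'source_ip': {'type': 'ip'}}}

es.indices.put_mapping(index='cars', doc_type='transactions', body=mapping)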
sid-as-mbp:ep siddharth$ terraform plan --target=aws_kinesis_stream.scored_output
var.im_ami
Enter a value: 1
Refreshing Terraform state prior to plan...
aws_s3_bucket.agari_stage_ep_scored_output_firehose_bucket: Refreshing state... (ID: agari-stage-ep-scored-output-firehose)
aws_iam_role.firehose_role: Refreshing state... (ID: collector_ingest_firehose_role)
aws_kinesis_firehose_delivery_stream.scored_output_firehose: Refreshing state... (ID: arn:aws:firehose:us-west-2:118435376172:deliverystream/agari-stage-ep-scored-output-firehose)
aws_kinesis_stream.scored_output: Refreshing state... (ID: arn:aws:kinesis:us-west-2:118435376172:stream/agari-stage-ep-scored-output)
now = datetime.now()
now_to_the_hour = now.replace(minute=0, second=0, microsecond=0)
START_DATE = now_to_the_hour + timedelta(hours=-3)
DAG_NAME = 'ep_telemetry_v2'
ORG_IDS = get_active_org_ids_string()
default_args = {
    'owner': 'sanand',
    'depends_on_past': True,
    'pool': 'ep_data_pipeline',
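    # --- hedged continuation (assumed; not the gist's original values) ---
    'start_date': START_DATE,            # hour-aligned start computed above
    'retries': 2,
    'retry_delay': timedelta(minutes=5),
}

# Attach the args to an hourly DAG. DAG is assumed to be imported above this
# preview; the schedule_interval here is an assumption.
dag = DAG(DAG_NAME, default_args=default_args, schedule_interval=timedelta(hours=1))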
from datetime import datetime
from airflow.models import DAG
from airflow.operators import BashOperator, ShortCircuitOperator
import logging
def skip_to_current_job(ds, **kwargs):
    now = datetime.now()
    left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
    right_window = kwargs['dag'].following_schedule(left_window)
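    # --- hedged continuation (assumed; the original callable is truncated above) ---
    # Treat this run as "current" only if now falls inside (left_window, right_window];
    # older backfill runs short-circuit their downstream tasks.
    logging.info('checking window (%s, %s]', left_window, right_window)
    return left_window < now <= right_window

# Illustrative wiring of the callable to a ShortCircuitOperator; the dag id,
# schedule, and downstream task are assumptions, not the gist's originals.
dag = DAG('skip_to_current_job_example',
          start_date=datetime(2016, 1, 1),
          schedule_interval='@hourly')

check_current = ShortCircuitOperator(
    task_id='skip_to_current_job',
    python_callable=skip_to_current_job,
    provide_context=True,   # so execution_date and dag arrive in kwargs
    dag=dag)

do_work = BashOperator(task_id='do_work', bash_command='echo working', dag=dag)
do_work.set_upstream(check_current)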
check process airflow-webserver with pidfile /home/deploy/airflow/pids/airflow-webserver.pid
group airflow
start program "/bin/sh -c '( HISTTIMEFORMAT="%d/%m/%y %T " TMP=/data/tmp AIRFLOW_HOME=/home/deploy/airflow PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin airflow webserver -p 8080 2>&1 & echo $! > /home/deploy/airflow/pids/airflow-webserver.pid ) | logger -p local7.info'"
as uid deploy and gid deploy
stop program "/bin/sh -c 'PATH=/bin:/sbin:/usr/bin:/usr/sbin pkill -TERM -P `cat /home/deploy/airflow/pids/airflow-webserver.pid` && rm -f /home/deploy/airflow/pids/airflow-webserver.pid'"
as uid deploy and gid deploy