@denji
denji / README.md
Last active April 26, 2024 18:09 — forked from istepanov/gist:3950977
Remove/Backup – settings & cli for macOS (OS X) – DataGrip, AppCode, CLion, Gogland, IntelliJ, PhpStorm, PyCharm, Rider, RubyMine, WebStorm
@inchoate
inchoate / including_external_package_in_dataflow.md
Last active February 2, 2024 11:40
Adding an extra package to a Python Dataflow project to run on GCP

The Problem

The documentation for deploying a pipeline with extra, non-PyPI, pure-Python packages on GCP is missing some detail. This gist shows how to package and deploy an external pure-Python, non-PyPI dependency to a managed Dataflow pipeline on GCP.

TL;DR: Your external package needs to be a Python (source/binary) distribution, properly packaged and shipped alongside your pipeline. It is not enough to only specify a tar file with a setup.py.

Preparing the External Package

Your external package must have a proper setup.py. What follows is an example setup.py for our ETL package. It is used to package version 1.1.1 of the etl library. The library requires three PyPI packages to run, specified in the install_requires field. The package also ships with custom external JSON data, declared in the package_data section. Finally, the setuptools.find_packages function searches for all available packages and returns them as a list.
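A minimal sketch of such a setup.py, assuming the library is named etl and ships its JSON files in an etl/data/ directory (the dependency names below are placeholders, not the gist's actual requirements):

import setuptools

setuptools.setup(
    name='etl',
    version='1.1.1',
    # The library's three PyPI dependencies (placeholder names).
    install_requires=[
        'requests',
        'python-dateutil',
        'six',
    ],
    # Find every package under the project root.
    packages=setuptools.find_packages(),
    # Ship the custom JSON data files with the package.
    package_data={
        'etl': ['data/*.json'],
    },
)

Built with "python setup.py sdist", the resulting tarball can then be handed to the pipeline via the --extra_package option.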

@criccomini
criccomini / airflow-supervisord.conf
Created June 22, 2016 14:54
airflow-supervisord.conf
; Configuration for Airflow webserver and scheduler in Supervisor
[program:airflow]
command=/bin/airflow webserver
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-stdout.log
stderr_logfile=/var/log/airflow/airflow-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"
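The stanza above only covers the webserver; a matching scheduler program, under the same assumptions (paths and log names are illustrative, not from the original file), might look like:

[program:airflow-scheduler]
command=/bin/airflow scheduler
stopsignal=QUIT
stopasgroup=true
user=airflow
stdout_logfile=/var/log/airflow/airflow-scheduler-stdout.log
stderr_logfile=/var/log/airflow/airflow-scheduler-stderr.log
environment=HOME="/home/airflow",AIRFLOW_HOME="/etc/airflow",TMPDIR="/storage/airflow_tmp"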
@willprice
willprice / .travis.yml
Last active August 15, 2023 17:12
How to set up Travis CI for projects that push back to GitHub
# Ruby is our language as asciidoctor is a ruby gem.
language: ruby
before_install:
- sudo apt-get install pandoc
- gem install asciidoctor
script:
- make
after_success:
- .travis/push.sh
env:
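The file is truncated at env:; presumably this is where the encrypted GitHub credentials used by .travis/push.sh live, along these lines (an assumed shape, not the original contents):

  global:
    # Encrypted GH_TOKEN used by push.sh; generate with the travis encrypt CLI
    - secure: "..."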
@simonw
simonw / gist:104413
Created April 30, 2009 11:19
Turn a BeautifulSoup form into a dict of fields and default values - useful for screen-scraping forms and then resubmitting them
def extract_form_fields(self, soup):
    "Turn a BeautifulSoup form into a dict of fields and default values"
    fields = {}
    for input in soup.findAll('input'):
        # ignore submit/image with no name attribute
        if input['type'] in ('submit', 'image') and not input.has_key('name'):
            continue
        # single-element name/value fields
        if input['type'] in ('text', 'hidden', 'password', 'submit', 'image'):
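The gist is truncated at this point; in the original, this branch goes on to read the input's default value, roughly as follows (reconstructed to match the visible Python 2/BeautifulSoup 3 style, not quoted from the gist):

            value = ''
            if input.has_key('value'):
                value = input['value']
            fields[input['name']] = value
            continue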
@mathewbyrne
mathewbyrne / CsvResponse.php
Created March 21, 2013 22:54
A small Symfony 2 class for returning a response as a CSV file. Based on the Symfony JsonResponse class.
<?php

namespace Jb\AdminBundle\Http;

use Symfony\Component\HttpFoundation\Response;

class CsvResponse extends Response
{
    protected $data;
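The class is truncated here; in the full gist the data is converted to CSV when it is set. A sketch of that logic (assumed, not the original code):

    public function __construct($data = array(), $status = 200, $headers = array())
    {
        parent::__construct('', $status, $headers);
        $this->setData($data);
        $this->headers->set('Content-Type', 'text/csv');
    }

    public function setData(array $data)
    {
        // Write each row through fputcsv into an in-memory stream.
        $output = fopen('php://temp', 'r+');
        foreach ($data as $row) {
            fputcsv($output, $row);
        }
        rewind($output);
        $this->data = stream_get_contents($output);
        fclose($output);

        return $this->setContent($this->data);
    }
}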
@justinnaldzin
justinnaldzin / spark_dataframe_size_estimator.py
Created July 18, 2018 19:29
Estimate size of Spark DataFrame in bytes
from pyspark.serializers import AutoBatchedSerializer, PickleSerializer

# Function to convert Python objects to Java objects
def _to_java_object_rdd(rdd):
    """Return a JavaRDD of Object by unpickling.

    It will convert each Python object into a Java object via Pyrolite,
    whether or not the RDD is serialized in batch.
    """
    rdd = rdd._reserialize(AutoBatchedSerializer(PickleSerializer()))
    return rdd.ctx._jvm.org.apache.spark.mllib.api.python.SerDe.pythonToJava(rdd._jrdd, True)
# Convert DataFrame to an RDD
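The snippet is truncated here; the remaining lines feed the converted RDD to Spark's JVM-side SizeEstimator, roughly like this (assuming a DataFrame df and a SparkContext sc):

java_obj = _to_java_object_rdd(df.rdd)
size_bytes = sc._jvm.org.apache.spark.util.SizeEstimator.estimate(java_obj)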
@ololobus
ololobus / Spark+ipython_on_MacOS.md
Last active November 22, 2022 22:24
Apache Spark installation + ipython/jupyter notebook integration guide for macOS

Tested with Apache Spark 2.1.0, Python 2.7.13 and Java 1.8.0_112

For older versions of Spark and ipython, please see the previous revision of this text.

Install Java Development Kit

@shantanuo
shantanuo / mysql_to_big_query.sh
Last active September 14, 2022 07:12
Copy a MySQL table to BigQuery. If you need to copy all tables, use the loop given at the end. Exits with error code 3 if blob or text columns are found. The CSV files are first copied to Google Cloud Storage before being imported into BigQuery.
#!/bin/sh
# Usage: ./mysql_to_big_query.sh <schema> <table>
TABLE_SCHEMA=$1   # MySQL schema (database) name
TABLE_NAME=$2     # table to copy
mytime=`date '+%y%m%d%H%M'`          # timestamp to keep file/bucket names unique
hostname=`hostname | tr 'A-Z' 'a-z'` # lowercased hostname
file_prefix="trimax$TABLE_NAME$mytime$TABLE_SCHEMA"
bucket_name=$file_prefix             # GCS bucket named after the file prefix
splitat="4000000000"                 # split exported files at ~4 GB
bulkfiles=200
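The script is truncated here; the remainder follows the flow described above. A sketch of those steps (illustrative commands, not the original script):

# 1. Export the table to a local CSV file (the original exits with code 3
#    first if blob/text columns are found), then split it into ~4 GB chunks.
split -b $splitat "$file_prefix.csv" "$file_prefix.part."

# 2. Stage the chunks in a new Google Cloud Storage bucket.
gsutil mb gs://$bucket_name
gsutil -m cp $file_prefix.part.* gs://$bucket_name/

# 3. Load the staged files into BigQuery.
bq load --source_format=CSV --autodetect "$TABLE_SCHEMA.$TABLE_NAME" "gs://$bucket_name/$file_prefix.part.*"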
@elvismdev
elvismdev / jenkins_wpsvn_deploy.sh
Last active August 30, 2022 18:35 — forked from BFTrick/deploy.sh
Deploy from Jenkins to WordPress.org SVN
# In your Jenkins job configuration, select "Add build step > Execute shell", and paste this script contents.
# Replace `______your-plugin-name______`, `______your-wp-username______` and `______your-wp-password______` as needed.
# main config
WP_ORG_USER="______your-wp-username______" # your WordPress.org username
WP_ORG_PASS="______your-wp-password______" # your WordPress.org password
PLUGINSLUG="______your-plugin-name______"
CURRENTDIR=`pwd`
MAINFILE="______your-plugin-name______.php" # this should be the name of the main PHP file in your WordPress plugin
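The script continues beyond this point; the usual flow in these WordPress.org deploy scripts (assumed here, not quoted from the original) is to check out the plugin's SVN repository, sync the built files into trunk, and commit:

SVNPATH="/tmp/$PLUGINSLUG"                                 # local checkout path (illustrative)
SVNURL="https://plugins.svn.wordpress.org/$PLUGINSLUG/"    # the plugin's WordPress.org SVN repo

# Check out the SVN repo, copy the new code into trunk, and commit.
svn checkout $SVNURL $SVNPATH
rsync -r --delete --exclude='.svn' $CURRENTDIR/ $SVNPATH/trunk/
cd $SVNPATH/trunk
svn add --force .
svn commit --username=$WP_ORG_USER --password=$WP_ORG_PASS -m "Deploy from Jenkins"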