@shunsukeaihara
shunsukeaihara / pic.py
Created January 23, 2013 08:37
power iteration clustering
#!/opt/local/bin/python
# module power iteration clustering
import numpy as NP
from scipy.cluster.vq import kmeans2
def calcNorm1(v):
    return NP.sum(NP.fabs(v))
def calcDelta(v,v2):
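The preview above is cut off before the iteration itself, so the following is only a minimal, dependency-free sketch of the general power iteration clustering idea, not the gist's implementation. It assumes a symmetric affinity matrix with positive row sums. The names `pic_embedding` and `split_in_two` are hypothetical, the per-step change is measured as the largest absolute component difference (the gist's own `calcDelta` body is not visible), and a simple midpoint split stands in for the `kmeans2` call the gist imports.

```python
from typing import List, Sequence


def pic_embedding(affinity: Sequence[Sequence[float]],
                  max_iter: int = 50, tol: float = 1e-5) -> List[float]:
    """One-dimensional power-iteration embedding of an affinity matrix."""
    degrees = [sum(row) for row in affinity]
    # Row-normalise the affinity matrix: W = D^-1 A.
    w = [[a / d for a in row] for row, d in zip(affinity, degrees)]
    total = sum(degrees)
    v = [d / total for d in degrees]  # degree-based start vector, L1 norm 1
    prev_delta = None
    for _ in range(max_iter):
        wv = [sum(wij * vj for wij, vj in zip(row, v)) for row in w]
        norm1 = sum(abs(x) for x in wv)
        new_v = [x / norm1 for x in wv]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        # Stop early, once the per-step change has itself stopped changing.
        if prev_delta is not None and abs(prev_delta - delta) < tol:
            break
        prev_delta = delta
    return v


def split_in_two(v: Sequence[float]) -> List[int]:
    """Assumed stand-in for k-means with k=2: split at the midpoint of the range."""
    mid = (min(v) + max(v)) / 2.0
    return [0 if x < mid else 1 for x in v]


# Two tightly connected groups (nodes 0-2 and nodes 3-4), weakly linked to each other.
affinity = [
    [0.00, 1.00, 1.00, 0.01, 0.01],
    [1.00, 0.00, 1.00, 0.01, 0.01],
    [1.00, 1.00, 0.00, 0.01, 0.01],
    [0.01, 0.01, 0.01, 0.00, 1.00],
    [0.01, 0.01, 0.01, 1.00, 0.00],
]
embedding = pic_embedding(affinity)
labels = split_in_two(embedding)
print(labels)
```

The early stop matters: run to full convergence, the vector becomes constant and carries no cluster information, so the loop halts while the entries are still grouped by cluster.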
@rxaviers
rxaviers / gist:7360908
Last active July 25, 2024 19:00
Complete list of github markdown emoji markup

People

:bowtie: :bowtie: 😄 :smile: 😆 :laughing:
😊 :blush: 😃 :smiley: ☺️ :relaxed:
😏 :smirk: 😍 :heart_eyes: 😘 :kissing_heart:
😚 :kissing_closed_eyes: 😳 :flushed: 😌 :relieved:
😆 :satisfied: 😁 :grin: 😉 :wink:
😜 :stuck_out_tongue_winking_eye: 😝 :stuck_out_tongue_closed_eyes: 😀 :grinning:
😗 :kissing: 😙 :kissing_smiling_eyes: 😛 :stuck_out_tongue:
@vinovator
vinovator / pdfTextMiner.py
Last active April 20, 2023 03:47
A sample code which uses pdfminer module to extract text from pdf files
# pdfTextMiner.py
# Python 2.7.6
# For Python 3.x use pdfminer3k module
# This link has useful information on components of the program
# https://euske.github.io/pdfminer/programming.html
# http://denis.papathanasiou.org/posts/2010.08.04.post.html
''' Important classes to remember
PDFParser - fetches data from pdf file
@superjamie
superjamie / raspberry-pi-vpn-router.md
Last active July 2, 2024 07:48
Raspberry Pi VPN Router

Raspberry Pi VPN Router

This is a quick-and-dirty guide to setting up a Raspberry Pi as a "router on a stick" to PrivateInternetAccess VPN.

Requirements

Install Raspbian Jessie (2016-05-27-raspbian-jessie.img) to your Pi's sdcard.

Use the Raspberry Pi Configuration tool or sudo raspi-config to:

@pylover
pylover / inspections.txt
Last active July 25, 2024 15:18 — forked from ar45/inspections.txt
PyCharm inspections
# Extracted using: $ unzip -p lib/pycharm.jar com/jetbrains/python/PyBundle.properties | grep -B1 INSP.NAME | grep '^#' | sed 's|Inspection||g' | sed -e 's|#\s\{,1\}|# noinspection |'
# noinspection PyPep8
# noinspection PyPep8Naming
# noinspection PyTypeChecker
# noinspection PyAbstractClass
# noinspection PyArgumentEqualDefault
# noinspection PyArgumentList
# noinspection PyAssignmentToLoopOrWithParameter
# noinspection PyAttributeOutsideInit
@andrearota
andrearota / example.scala
Created October 18, 2016 08:40
Creating Spark UDF with extra parameters via currying
// Problem: creating a Spark UDF that takes extra parameters at invocation time.
// Solution: use currying.
// http://stackoverflow.com/questions/35546576/how-can-i-pass-extra-parameters-to-udfs-in-sparksql
// We want to create hideTabooValues, a Spark UDF that sets to -1 any field containing one of the given taboo values.
// E.g. forbiddenValues = [1, 2, 3]
// dataframe = [1, 2, 3, 4, 5, 6]
// dataframe.select(hideTabooValues(forbiddenValues)) :> [-1, -1, -1, 4, 5, 6]
//
// Implementing this in Spark, we find two major issues:
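Leaving Spark aside, the currying idea described in the comments above can be illustrated in plain Python. This is a sketch only: `hide_taboo_values` is a hypothetical Python counterpart of the `hideTabooValues` name used in the gist, and an ordinary list stands in for the dataframe column.

```python
from typing import Callable, Iterable, List


def hide_taboo_values(taboo_values: Iterable[int]) -> Callable[[int], int]:
    """Return a one-argument function with the taboo values baked in."""
    taboo = frozenset(taboo_values)  # fixed once, when the function is built

    def hide(value: int) -> int:
        # Replace a taboo value with -1; pass everything else through.
        return -1 if value in taboo else value

    return hide


forbidden_values = [1, 2, 3]
column = [1, 2, 3, 4, 5, 6]

hide = hide_taboo_values(forbidden_values)      # first call: supply the extra parameter
masked: List[int] = [hide(x) for x in column]   # second call: supply each field value
print(masked)  # prints [-1, -1, -1, 4, 5, 6]
```

The outer call fixes the extra parameter and returns a function of the field value alone, which is the shape a per-row UDF needs.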
@thvitt
thvitt / register-jupyter-env
Last active March 31, 2023 16:26
Register a jupyter kernel for the current pyenv.
#!/bin/sh
if [ -n "$PYENV_VERSION" ]
then
    name=$(pyenv version-name)
    python=$(pyenv which python)
else
    name=$(basename "$VIRTUAL_ENV")
    python="$VIRTUAL_ENV/bin/python"
fi
@bborgesr
bborgesr / reset-fileInput-and-data.R
Created March 20, 2017 16:45
How to "reset" a fileInput widget and the underlying data (must treat these as two different things)
library(shiny)
library(shinyjs)
ui <- fluidPage(
  useShinyjs(),
  fileInput('inFile', 'Choose file'),
  actionButton('reset', 'Reset'),
  tableOutput('tbl')
)
@claudinei-daitx
claudinei-daitx / SparkSessionS3.scala
Created December 15, 2017 13:02
Create a Spark session optimized to work with Amazon S3.
import org.apache.spark.sql.SparkSession
object SparkSessionS3 {
  // Create a Spark session with optimizations to work with Amazon S3.
  def getSparkSession: SparkSession = {
    val spark = SparkSession
      .builder
      .appName("my spark application name")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .config("spark.hadoop.fs.s3a.access.key", "my access key")
@bgweber
bgweber / pandasUDF.py
Last active July 14, 2024 14:13
Distributing Feature Generation with Pandas UDFs
import featuretools as ft
from pyspark.sql.functions import pandas_udf, PandasUDFType
@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def apply_feature_generation(pandasInputDF):
    # create Entity Set representation
    es = ft.EntitySet(id="events")
    es = es.entity_from_dataframe(entity_id="events", dataframe=pandasInputDF)
    es = es.normalize_entity(base_entity_id="events", new_entity_id="users", index="user_id")