Bryan Cutler (BryanCutler)

@BryanCutler
BryanCutler / start_jupyter_pyspark.sh
Last active July 29, 2022 01:06
How to start a Jupyter Notebook with PySpark Kernel
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
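The script preview above stops after the license header. A widely used way to open a notebook with a PySpark kernel is to have the `pyspark` launcher start Jupyter as the driver's Python. You do this with the `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` environment variables.

The Python sketch below only assembles that command and environment. It rests on a few assumptions:
- `pyspark` is on `PATH` or under `SPARK_HOME/bin`.
- The helper names are hypothetical.
- It is not necessarily what the original script does.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def jupyter_pyspark_command(
    base_env: Mapping[str, str],
    spark_home: Optional[str] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for launching PySpark inside Jupyter.

    Hypothetical helper: it starts nothing, it only prepares what
    subprocess.run would receive.
    """
    env = dict(base_env)  # copy so the caller's mapping is not modified
    # Ask bin/pyspark to run Jupyter as the driver's Python front end.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    if spark_home is not None:
        env["SPARK_HOME"] = spark_home
        launcher = os.path.join(spark_home, "bin", "pyspark")
    else:
        launcher = "pyspark"  # assumed to be on PATH
    return [launcher], env


def launch(spark_home: Optional[str] = None) -> int:
    """Start the notebook server; blocks until Jupyter exits."""
    argv, env = jupyter_pyspark_command(os.environ, spark_home)
    return subprocess.run(argv, env=env).returncode
```

For example, `launch("/opt/spark")` (the path is an assumption) would block until the notebook server exits. Setting `PYSPARK_DRIVER_PYTHON_OPTS` to `lab` instead should start JupyterLab, if it is installed.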
BryanCutler / PySpark_createDataFrame_with_Arrow.ipynb
Last active September 16, 2020 02:30
How to create a Spark DataFrame from Pandas or NumPy with Arrow
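`SparkSession.createDataFrame` can use Arrow for a pandas DataFrame input once the Arrow configuration flag is enabled. The flag name depends on the Spark version:
- Spark 2.3 and 2.4: `spark.sql.execution.arrow.enabled`
- Spark 3.0 and later: `spark.sql.execution.arrow.pyspark.enabled`

The sketch below picks the key from a version string. It then converts a NumPy array by wrapping it in a pandas DataFrame first. The function names are hypothetical, and `spark` is assumed to be an existing `SparkSession`. pandas is imported only when the conversion function runs.

```python
from typing import Any, Optional, Sequence


def arrow_conf_key(spark_version: str) -> str:
    """Return the Arrow on/off config key for a Spark version string."""
    major = int(spark_version.split(".")[0])
    if major >= 3:
        return "spark.sql.execution.arrow.pyspark.enabled"
    # Spark 2.3 and 2.4 used the original, now deprecated, key.
    return "spark.sql.execution.arrow.enabled"


def numpy_to_spark(spark: Any, array: Any,
                   columns: Optional[Sequence[str]] = None) -> Any:
    """Create a Spark DataFrame from a 2-D NumPy array via pandas and Arrow."""
    import pandas as pd  # deferred so this module imports without pandas

    spark.conf.set(arrow_conf_key(spark.version), "true")
    pdf = pd.DataFrame(array, columns=columns)
    # With the flag set, the pandas data is shipped to the JVM as Arrow batches.
    return spark.createDataFrame(pdf)
```

If the Arrow path fails, for example on an unsupported column type, Spark falls back to the slower non-Arrow conversion only when the fallback flag is on. In Spark 3.x that flag is `spark.sql.execution.arrow.pyspark.fallback.enabled`, and it defaults to true.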
BryanCutler / PySpark_Vectorized_UDFs.ipynb
Last active February 17, 2022 13:57
PySpark vectorized UDFs with Arrow
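Vectorized (pandas) UDFs save time mainly because Python is invoked once per Arrow batch of rows instead of once per row. The batch size is governed by `spark.sql.execution.arrow.maxRecordsPerBatch`, which defaults to 10,000.

The stdlib-only sketch below models just that call pattern, using hypothetical helpers `apply_row_at_a_time` and `apply_vectorized`. It is a conceptual model, not PySpark's implementation.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_row_at_a_time(values: Iterable[T], func: Callable[[T], R]) -> List[R]:
    """Model of a plain Python UDF: one function call per row."""
    return [func(v) for v in values]


def apply_vectorized(
    values: List[T],
    func: Callable[[List[T]], List[R]],
    batch_size: int = 10_000,
) -> List[R]:
    """Model of a vectorized UDF: one call per batch of up to batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    out: List[R] = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        result = func(batch)
        # A scalar vectorized UDF must return one output value per input row.
        if len(result) != len(batch):
            raise ValueError("UDF returned a different number of rows")
        out.extend(result)
    return out
```

In PySpark itself, the batch function would be declared with `pandas_udf` and receive a `pandas.Series` for each batch rather than a Python list.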
BryanCutler / PySpark_to_Pandas_with_Arrow.ipynb
Last active January 24, 2019 11:12
Spark to Pandas Conversion with Arrow Example
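When Arrow is enabled, `DataFrame.toPandas()` has the executors send Arrow record batches to the driver. The driver combines them into a single table before building the pandas DataFrame.

The stdlib sketch below models only that combining step, representing each batch as a dict of equal-length column lists. `concat_batches` is a hypothetical name, and the sketch does not use the Arrow library.

```python
from typing import Any, Dict, List, Sequence

Batch = Dict[str, List[Any]]


def concat_batches(batches: Sequence[Batch]) -> Batch:
    """Concatenate column-oriented batches that share one schema."""
    if not batches:
        return {}
    columns = list(batches[0].keys())
    combined: Batch = {name: [] for name in columns}
    for i, batch in enumerate(batches):
        if list(batch.keys()) != columns:
            raise ValueError(f"batch {i} has a different schema")
        if len({len(col) for col in batch.values()}) > 1:
            raise ValueError(f"batch {i} has columns of unequal length")
        for name in columns:
            # Append each column's values; no per-row objects are built.
            combined[name].extend(batch[name])
    return combined
```

With pandas available, `pandas.DataFrame(concat_batches(batches))` would produce the final frame from the combined columns.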
BryanCutler / pandas_rdd.py
Last active March 14, 2018 05:47
Vectorized UDFs in Python SPARK-21190
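class DataFrame(object):
    ...  # existing pyspark.sql.DataFrame members are elided in this excerpt

    def asPandas(self):
        # Wrap this DataFrame so group/window operations are applied with pandas
        return ArrowDataFrame(self)

class ArrowDataFrame(object):
    """
    Wraps a Python DataFrame to group/window, then apply using ``pandas.DataFrame``
    """

    def __init__(self, df):
        # Assumed constructor, not part of the excerpt: keep the wrapped DataFrame
        self._df = df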
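The `ArrowDataFrame` docstring describes a split-apply-combine pattern:
1. Group rows by a key.
2. Hand each group to a function as one pandas DataFrame.
3. Concatenate the results.

PySpark exposes this pattern as grouped-map pandas UDFs: `groupby(...).apply` in Spark 2.3 and `groupby(...).applyInPandas` in Spark 3.0.

The stdlib sketch below models the grouping case only, with lists of dict rows standing in for pandas DataFrames. `group_apply` and `subtract_mean` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List

Row = Dict[str, Any]


def group_apply(
    rows: List[Row],
    key: str,
    func: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Split rows by `key`, apply `func` to each group, and combine the output."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)  # split: groups keep first-seen key order
    out: List[Row] = []
    for group_rows in groups.values():
        out.extend(func(group_rows))  # apply, then combine by concatenation
    return out


def subtract_mean(group: List[Row]) -> List[Row]:
    """Example group function: center column "v" within its group."""
    mean = sum(r["v"] for r in group) / len(group)
    return [{**r, "v": r["v"] - mean} for r in group]
```

Each group function sees the whole group at once, which is what lets a pandas-based version use vectorized operations such as a column mean.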
import io.netty.buffer.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.file.ArrowWriter;
import org.apache.arrow.vector.schema.ArrowFieldNode;
import org.apache.arrow.vector.schema.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Field;