Richard (Rick) Zamora rjzamora

@rjzamora
rjzamora / multi-file-json.ipynb
Created May 30, 2024 18:24
Multi-file json read experiments
@rjzamora
rjzamora / ray_shuffle.ipynb
Last active April 18, 2024 16:02
Simple shuffling example with `cudf` and `ray`
@rjzamora
rjzamora / parquet_delayed_mapping.ipynb
Created December 20, 2023 18:01
Experimenting with simpler ``blocksize`` logic for ``read_parquet``
@rjzamora
rjzamora / ray_exploration.ipynb
Created November 21, 2023 17:56
Exploring Ray and Dask on Ray with GPUs
@rjzamora
rjzamora / dask_expr_dask-demo-day.ipynb
Last active May 18, 2023 14:58
Demo: Dask Expressions
@rjzamora
rjzamora / backend_dispatch_demo.ipynb
Last active February 16, 2023 16:42
High-level demo on backend-configuration dispatching in Dask
@rjzamora
rjzamora / remote_parquet_benchmark.py
Created March 21, 2022 18:15
Simple benchmark to measure the performance of fsspec.parquet.read_parquet_file for a single-column read.
import time
import argparse
try:
    import cudf
except ImportError:
    cudf = None
import pandas as pd
import numpy as np
import importlib
import time
import dask.dataframe as dd
from dask.distributed import LocalCluster, Client
try:
    from dask_cuda import LocalCUDACluster
except ImportError:
    dask_cuda = None
try: