Richard (Rick) Zamora rjzamora

@rjzamora
rjzamora / rearrange_cudf_test.ipynb
Last active July 30, 2019 21:01
Comparing cudf and pandas rearrange_by_divisions behavior in Dask
rjzamora / Categorical_Debugging.ipynb
Created September 25, 2019 14:15
Attempt to reproduce behavior described in cudf#2850 (instead reproduces behavior described in cudf#2862)
rjzamora / dask_cudf_merge_benchmark.ipynb
Last active December 9, 2019 22:29
dask_cudf + UCX merge benchmark
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
rjzamora / CPU_caching_experiment.ipynb
Created April 25, 2020 16:46
Experiments for caching cudf DataFrame objects that will be re-read many times
rjzamora / nvtabular-dask-criteo-tutorial.ipynb
Last active October 11, 2020 17:16
NVTabular-0.2-Dask-Blog-Tutorial
import argparse
import glob
import time

import numpy as np
import dask.dataframe as dd
from dask.datasets import timeseries
from dask.distributed import Client, LocalCluster, wait

def run(args):