


Jake Hemstad (jrhemstad)

@jrhemstad
jrhemstad / nvtxpp.md
Last active August 18, 2020 21:26
C++ wrappers for NVTX memory annotations
namespace nvtx3{

enum class heap_kind{
   LINEAR,
   THREE_D,  // `3D` is not a valid C++ identifier (identifiers cannot begin with a digit)
   ARRAY
};

struct heap_type{

namespace detail {
device_memory_resource* cuda_resource() {
  static cuda_memory_resource resource{};
  return &resource;
}
}  // namespace detail

std::atomic<device_memory_resource*>& get_default() {
  static std::atomic<device_memory_resource*> res{detail::cuda_resource()};
  return res;
}
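The snippet above keeps a lazily constructed default resource behind a function-local `static` and publishes the current resource through a `std::atomic` pointer. Below is a minimal host-only sketch of that pattern. It swaps in the standard `std::pmr::memory_resource` for the device resource types used above, and the names `get_current_resource` and `set_current_resource` are hypothetical.

```cpp
#include <atomic>
#include <memory_resource>

namespace detail {
// Initial resource used until another one is installed.
// std::pmr::new_delete_resource() returns a pointer to a program-wide singleton.
inline std::pmr::memory_resource* initial_resource() {
  return std::pmr::new_delete_resource();
}

// Function-local static: initialized once, on first call; since C++11 that
// initialization is thread-safe.
inline std::atomic<std::pmr::memory_resource*>& current_resource() {
  static std::atomic<std::pmr::memory_resource*> res{initial_resource()};
  return res;
}
}  // namespace detail

// Hypothetical accessor: read the currently installed resource.
inline std::pmr::memory_resource* get_current_resource() {
  return detail::current_resource().load();
}

// Hypothetical setter: install `new_res` (nullptr restores the initial
// resource) and return the previously installed one.
inline std::pmr::memory_resource* set_current_resource(std::pmr::memory_resource* new_res) {
  if (new_res == nullptr) { new_res = detail::initial_resource(); }
  return detail::current_resource().exchange(new_res);
}
```

Using `exchange` makes "install and return the previous resource" a single atomic step. Two threads installing resources concurrently therefore each get back a valid prior value.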
jrhemstad / string_columns.md
Last active August 27, 2019 16:17
Musings on STRING cudf::columns
  • String Column Factory

    • make_string_column(...)
    • What are the necessary inputs?
    • Do we need more than one factory for String columns for different inputs?
  • String column wrapper type

    • There should be a type, cudf::string_column, that is a thin wrapper around cudf::column and encodes behavior unique to string columns; e.g., it abstracts which children are offsets and which are characters.
    • Example usage would be something like:
      std::unique_ptr<cudf::column> col = make_string_column(...);
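The wrapper described above would mainly interpret the two children of a string column. Below is a minimal host-only sketch of that idea. It assumes the common offsets-plus-characters layout, in which string `i` occupies `chars[offsets[i], offsets[i + 1])`. `string_column_view_sketch` and its members are hypothetical and are not the libcudf API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical thin wrapper: owns nothing and only knows which child
// holds the offsets and which holds the characters.
class string_column_view_sketch {
 public:
  string_column_view_sketch(std::vector<std::int32_t> const& offsets,
                            std::vector<char> const& chars)
      : offsets_{offsets}, chars_{chars} {}

  // n strings are described by n + 1 offsets.
  std::size_t size() const { return offsets_.empty() ? 0 : offsets_.size() - 1; }

  // String i spans chars[offsets[i], offsets[i + 1]).
  std::string_view element(std::size_t i) const {
    auto begin = static_cast<std::size_t>(offsets_[i]);
    auto end = static_cast<std::size_t>(offsets_[i + 1]);
    return std::string_view{chars_.data() + begin, end - begin};
  }

 private:
  std::vector<std::int32_t> const& offsets_;
  std::vector<char> const& chars_;
};
```

For example, the strings "hello", "" and "gpu" would be stored as the characters `hellogpu` with offsets `{0, 5, 5, 8}`. An empty string is simply two equal consecutive offsets.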
      
jrhemstad / dispatcher_benchmark.md
Last active July 3, 2019 15:56
Description of type_dispatcher micro benchmark

We'd like to better understand whether, and when, libcudf's type_dispatcher adds overhead, both in host code and in device code.

In order to test this, we'd like to create a set of micro-benchmarks to profile the performance of operating on a set of n gdf_column objects.

To keep the benchmark simple, the work of each kernel will be simply applying some in-place transformation functor to every element of every column. For example:

template <template <typename> class UnaryFunctor>
void benchmark(cudf::table input){
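Below is a host-only sketch of the in-place transform driven by a runtime type tag, which shows the shape of the benchmark. The `type_id` enum, the `column_sketch` struct and the switch-based `dispatch` function are hypothetical simplifications. They stand in for `gdf_column` and libcudf's `type_dispatcher` and are not the real definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical runtime type tag and type-erased column.
enum class type_id { INT32, FLOAT64 };

struct column_sketch {
  type_id type;
  void* data;
  std::size_t size;
};

// Simplified dispatcher: maps the runtime tag to a compile-time type and
// calls f.operator()<T>(args...).
template <typename Functor, typename... Args>
void dispatch(type_id id, Functor f, Args&&... args) {
  switch (id) {
    case type_id::INT32:
      f.template operator()<std::int32_t>(std::forward<Args>(args)...);
      return;
    case type_id::FLOAT64:
      f.template operator()<double>(std::forward<Args>(args)...);
      return;
  }
  throw std::invalid_argument("unsupported type_id");
}

// Example unary functor, templated on the element type.
template <typename T>
struct times_two {
  T operator()(T x) const { return x * 2; }
};

// Applies UnaryFunctor<T> to every element of one column, in place.
template <template <typename> class UnaryFunctor>
struct apply_in_place {
  template <typename T>
  void operator()(void* data, std::size_t size) const {
    T* typed = static_cast<T*>(data);
    UnaryFunctor<T> op{};
    for (std::size_t i = 0; i < size; ++i) { typed[i] = op(typed[i]); }
  }
};

// One dispatch per column; the per-element work is fully typed.
template <template <typename> class UnaryFunctor>
void benchmark(std::vector<column_sketch>& columns) {
  for (auto& col : columns) {
    dispatch(col.type, apply_in_place<UnaryFunctor>{}, col.data, col.size);
  }
}
```

In this layout the dispatch cost is paid once per column, not once per element. A micro-benchmark would compare this against a version that dispatches inside the element loop, or inside a device kernel.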
jrhemstad / ninja_instructions.md
Last active May 27, 2024 07:34
How to build with Ninja

How to Use Ninja

  1. Install Ninja
sudo apt install ninja-build
  2. Configure CMake to create Ninja build files
mkdir build && cd build
namespace detail
{
inline memory_resource* default_resource(){
static memory_resource res{};  // braces, not parens: `res()` would declare a function (most vexing parse)
return &res;
}
inline std::atomic<memory_resource*>& get_global()
{
jrhemstad / AGGREGATIONS.md
Last active May 1, 2019 23:01
Nomenclature for the different kinds of groupby aggregations

In order to facilitate conversation about the different kinds of aggregations that can be performed via groupby, it is important to be clear on the following terms:

Distributive

An aggregate function is distributive when it can be computed in a distributed manner. For example, assume that a data set is partitioned into n subsets. Applying the aggregate function to each subset yields n partial aggregate values. If the aggregate of the whole data set can then be computed by applying an aggregate function to those n partial values, the aggregate function is distributive. Examples of distributive aggregate functions are sum(), count(), min(), and max(). The combining function is not always the original one: sum(), min(), and max() combine their partial values with themselves, but count() combines its partial counts with sum().
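The definition can be checked on a small example. The sketch below aggregates each partition separately and then combines the partial results. The names `partial`, `aggregate_partition` and `combine` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Partial aggregates computed independently on one partition.
// The initial min/max values are identities, so empty partitions are harmless.
struct partial {
  long sum = 0;
  std::size_t count = 0;
  int min = std::numeric_limits<int>::max();
  int max = std::numeric_limits<int>::lowest();
};

partial aggregate_partition(std::vector<int> const& part) {
  partial p;
  for (int v : part) {
    p.sum += v;
    p.count += 1;
    p.min = std::min(p.min, v);
    p.max = std::max(p.max, v);
  }
  return p;
}

// Combine step: sum, min, and max combine with themselves;
// count combines by summing the partial counts.
partial combine(std::vector<partial> const& partials) {
  partial total;
  for (auto const& p : partials) {
    total.sum += p.sum;
    total.count += p.count;
    total.min = std::min(total.min, p.min);
    total.max = std::max(total.max, p.max);
  }
  return total;
}
```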

jrhemstad / TRANSITIONGUIDE.md
Last active July 11, 2019 18:49
Guide for transitioning a C API to C++ API

Overview

As libcudf transitions from a C API to a C++ API, this guide lists what should be done in all new PRs against libcudf.

File Structure

In lieu of the monolithic functions.h, external function APIs should be grouped by functionality into appropriately titled header files in cudf/cpp/include/. For example, cudf/cpp/include/copying.hpp contains the APIs for functions that copy from one column to another. Note the .hpp file extension, which indicates a C++ header file.

jrhemstad / column.md
Last active July 23, 2019 08:37
Requirements and design for a better column abstraction in libcudf

Problem Statement:

We want to write an algorithm once and allow it to work on a variety of columns.

The kinds of columns include:

  • Elements are fixed-width types (int8, int32, float, double, date32, timestamp, etc.)
  • Elements are variable length (strings, …)
  • “Dictionary” columns
    • Elements in the column are indices into a dictionary
  • Elements are fixed-size structs
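One reading of the problem statement is an algorithm written once against a minimal interface, `size()` and `element(i)`, and reused across column kinds. The host-only sketch below assumes such an interface. `fixed_width_column`, `dictionary_column` and `count_equal` are hypothetical names, not a proposed libcudf design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-width column: elements are stored directly.
template <typename T>
struct fixed_width_column {
  std::vector<T> values;
  std::size_t size() const { return values.size(); }
  T element(std::size_t i) const { return values[i]; }
};

// Dictionary column: each element is an index into a dictionary of keys.
template <typename T>
struct dictionary_column {
  std::vector<T> keys;
  std::vector<std::int32_t> indices;
  std::size_t size() const { return indices.size(); }
  T element(std::size_t i) const { return keys[static_cast<std::size_t>(indices[i])]; }
};

// Written once; works with any column type exposing size() and element(i).
template <typename Column, typename T>
std::size_t count_equal(Column const& col, T const& value) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < col.size(); ++i) {
    if (col.element(i) == value) { ++n; }
  }
  return n;
}
```

The trade-off is that `dictionary_column::element` decodes through the index on every access. An algorithm specialized for dictionaries could instead look up the value among the keys once and then compare indices.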