Muhammad Osama neoblizz

## tcolorbox.tex
% Pick a nice color
\usepackage{xcolor}
\definecolor{pastelpurple}{RGB}{221,199,255}

% For Alert Boxes
% \usepackage{awesomebox}
\usepackage[most]{tcolorbox}

% textmarker style from the tcolorbox doc
\tcbset{textmarker/.style={%

## hipifyme.sh
find . -type f \( -iname \*.cu -o -iname \*.cuh -o -iname \*.cpp -o -iname \*.hpp -o -iname \*.hxx -o -iname \*.cxx \) -exec hipify-perl -inplace -experimental -print-stats {} \;

## doxygen-install.md

      
Created November 21, 2022 18:41

Install Latest Doxygen (UNIX)

I got Doxygen version 1.9.6 (6bc5f864d0c847a74944d6e9e4a42346e8c18b28) to build using the following instructions on Ubuntu 18.04.
sudo apt-get install graphviz bison flex
git clone https://github.com/doxygen/doxygen.git
cd doxygen
mkdir build
cd build
cmake -G "Unix Makefiles" ..
# Default installation at /usr/local/bin/doxygen

  
## launch.cuh
#pragma once

// Includes CUDA
#include <cuda_runtime.h>
#include <cooperative_groups.h>

#include <utility>

namespace cg = cooperative_groups;

## SoL-SpMM-GEMM.md

      
Last active March 26, 2022 02:01

Simple Speed-of-Light Analysis of SpMM and GEMM

Hardware Speed-of-Light Analysis

The following numbers are based on NVIDIA's Volta microarchitecture. To perform a similar analysis for a newer architecture, I recommend updating the numbers below using the output of the deviceQuery CUDA sample or the architecture's Wikipedia page.
CUDA Cores = SM * Cores per SM (SM = 80, Cores/SM = 64)
Maximum Clock Rate = Clock Rate (KHz) * 1e-6 (GHz)
Maximum Throughput (type == floats, doubles or half) =
    CUDA Cores * Maximum Clock Rate * Type Ratio (device properties) (GFLOP/s)

Maximum Memory Bandwidth = 


## sssp.cpp
#include <vector>
#include <algorithm>
#include <execution>
#include <mutex>
#include <utility>
#include <ranges>

struct frontier_t {
    // Underlying representation of frontier.
    std::vector<int> active_vertices;

## ignore_on_device.md

      
Last active January 17, 2022 19:45

Envisioning `__ignore__` support in NVCC with a simple example.

How crazy is it to imagine an NVCC-supported keyword, something like `__ignore__`, that, when placed in front of an expression (function, variable, object, etc.), causes it to be ignored on the device side (in `__device__` and `__global__` code)? This would solve the issue with complicated containers that support both host and device code: their constructors and destructors, which run on the host, would simply be ignored on the device when the container is passed as a member of a larger class or struct. For example:
__global__ void kernel(foo_t foo) {
  auto idx = threadIdx.x;
  auto ptr = foo.get_ptr();
  ptr[idx] = idx;
}

  
## inheritance.md

      
Last active December 7, 2021 19:33

Capturing conditional inheritance in CPP (with pixel shaders as a toy example)

The Problem

We have a top-level object that the user wants to interact with, such as a pixel on the screen. Depending on what that pixel contains, we may want to color or shade it differently. If the pixel represents cloth, it may have the texture and color of cloth; if it represents metal, it may be shiny and metal-like... you get the point.
To represent this object in C++, we have a number of options. The most obvious one is a single function that colors (or applies some sort of texture to) the pixel and contains the different specializations for each material or color.
Obvious approach

void apply_texture(pixel_t* p, texture_t t) {
  if(t == texture_t::cloth) {
    // apply cloth
  } else if (t == texture_t::skin) {

  
## sparsify.cu
#include <stdio.h>
#include <stdlib.h>
#include <ctime>
#include <random>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>

#include <thrust/iterator/counting_iterator.h>

## effective_modern_cmake.md

      
Created February 9, 2021 12:15, forked from mbinna/effective_modern_cmake.md

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.
After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft