Daisuke Miyamoto DaisukeMiyamoto

  • Tokyo, Japan
@sean-smith
sean-smith / slurm-mps-prolog.md
Last active May 15, 2022 18:33
Start CUDA MPS Server on each node

👾 Slurm CUDA MPS Prolog

The following Slurm Prolog starts the CUDA MPS server on each compute node before the job is started.

cat << EOF > /opt/slurm/etc/prolog.sh
#!/bin/sh

# start mps
nvidia-cuda-mps-control -d
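
The prolog preview above is cut off, so the following is only a rough sketch of how such a script could guard the MPS start-up, not the gist author's full script. The `start_mps` function and the `MPS_CONTROL` variable are hypothetical names introduced for illustration. It assumes that nodes without the NVIDIA tools should be skipped quietly, since a prolog that exits non-zero typically causes Slurm to drain the node.

```shell
#!/bin/sh
# Hypothetical sketch: start the CUDA MPS control daemon only when the binary exists.
# MPS_CONTROL and start_mps are illustrative names, not part of the original gist.
MPS_CONTROL="${MPS_CONTROL:-nvidia-cuda-mps-control}"

start_mps() {
    if command -v "$MPS_CONTROL" >/dev/null 2>&1; then
        # -d launches the MPS control daemon in background (daemon) mode
        "$MPS_CONTROL" -d
    else
        # Report the skip on stderr so it shows up in the prolog log
        echo "skip: $MPS_CONTROL not found" >&2
    fi
    # Assumption: a missing or failed MPS start should not fail the prolog
    return 0
}

start_mps
```

Setting `MPS_CONTROL` to a full path is one way to point the sketch at a non-default CUDA installation.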
@biochem-fan
biochem-fan / NOTES.md
Last active May 26, 2024 05:36
Warp-RELION4-M Protocol
@sean-smith
sean-smith / hpcg.md
Last active May 8, 2019 23:12
AWS ParallelCluster + AWS Batch

Today I'm going to demonstrate running the High Performance Conjugate Gradients (HPCG) benchmark as a containerized workload. This takes advantage of AWS ParallelCluster, AWS Batch, and OpenMPI.

First install aws-parallelcluster:

$ pip install aws-parallelcluster
---
AWSTemplateFormatVersion: '2010-09-09'
Description: Mobile App CICD Demo
Parameters:
  DeviceFarmProjectName:
    Type: String
    Default: demo-app-devicefarm
@eshelman
eshelman / latency.txt
Last active May 7, 2024 17:49 — forked from jboner/latency.txt
HPC-oriented Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference/hit                         1.5 ns    4 cycles
Floating-point add/mult/FMA operation          1.5 ns    4 cycles
L2 cache reference/hit                           5 ns    12 ~ 17 cycles
Branch mispredict                                6 ns    15 ~ 20 cycles
L3 cache hit (unshared cache line)              16 ns    42 cycles
L3 cache hit (shared line in another core)      25 ns    65 cycles
Mutex lock/unlock                               25 ns
L3 cache hit (modified in another core)         29 ns    75 cycles
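
The table gives each latency both in nanoseconds and in cycles, so dividing cycles by nanoseconds shows the clock frequency the figures assume. The small helper below is a sketch of that arithmetic; `implied_ghz` is a hypothetical name, and the inputs are taken from the rows above.

```shell
#!/bin/sh
# implied_ghz NS CYCLES
# Prints the clock frequency in GHz implied by a latency stated both ways
# (cycles per nanosecond equals GHz), rounded to one decimal place.
implied_ghz() {
    awk -v ns="$1" -v cycles="$2" 'BEGIN { printf "%.1f\n", cycles / ns }'
}

implied_ghz 1.5 4    # L1 cache reference/hit
implied_ghz 16 42    # L3 cache hit (unshared cache line)
```

Both rows work out to between 2.6 and 2.7 GHz, which suggests the cycle counts were taken on a core running at about that speed.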
@mirakui
mirakui / Gemfile
Created July 19, 2010 09:44
AWS S3 read/write Benchmark
source :gemcutter
gem 'pit'
gem 'sauberia-aws-s3'