Joy Xu joyxu

  • Hisilicon
@joyxu
joyxu / sve2.md
Created March 7, 2023 06:01 — forked from zingaburga/sve2.md
ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads


Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to their instruction set, which was announced back in 2016. A follow-up SVE2 extension was announced in 2019, designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/

# IDA (disassembler) and Hex-Rays (decompiler) plugin for Apple AMX
#
# WIP research. (This was edited to add more info after someone posted it to
# Hacker News. Click "Revisions" to see full changes.)
#
# Copyright (c) 2020 dougallj
# Based on Python port of VMX intrinsics plugin:
# Copyright (c) 2019 w4kfu - Synacktiv
joyxu / drm-prime-dumb-kms.c
Created July 29, 2022 08:58 — forked from Miouyouyou/drm-prime-dumb-kms.c
Simple example showing how to use DRM to allocate a dumb buffer on the GPU, use it as a framebuffer, show that framebuffer through the CRTC of the currently connected screen (expecting exactly 1 connected screen), export the buffer, reimport it implicitly with mmap, and write to it.
// This works on embedded GPUs whose drivers implement .gem_prime_mmap, such as Rockchip ones.
// This fails on most DRM drivers for GPUs with dedicated memory, as they tend NOT to implement .gem_prime_mmap.
#include <stdio.h>
#include <libdrm/drm.h>
#include <stdint.h>
#include <sys/mman.h>
#include <string.h>
joyxu / GPUOptimizationForGameDev.md
Created April 12, 2022 06:11 — forked from silvesthu/GPUOptimizationForGameDev.md
GPU Optimization for GameDev
joyxu / thread-local-storage-benchmark.diff
Created July 30, 2021 03:45 — forked from tim-janik/thread-local-storage-benchmark.diff
TLS - Thread Local Storage Benchmark Patch against Rapicorn
diff --git rcore/tests/multitest.cc rcore/tests/multitest.cc
index e8bed7a..0b985fd 100644
--- rcore/tests/multitest.cc
+++ rcore/tests/multitest.cc
@@ -15,6 +15,125 @@ using namespace Rapicorn;
#error RAPICORN_CHECK_VERSION() implementation is broken
#endif
+#define VOLATILE volatile
+
What Is OpenGL?
OpenGL is a graphics rendering API that is independent of both the operating system and the window system. It produces high-quality color images composed of geometric and image primitives.
OpenGL programs can use the following …
Gl
OpenGL API implementation (http://www.opengl.org)
Glu
OpenGL Utility
Glut
OpenGL Utility Toolkit (http://www.opengl.org/resources/libraries/glut/). GLUT is a portable windowing API and is not officially part of OpenGL.
joyxu / VFIO.md
Created May 6, 2021 06:15 — forked from k-amin07/VFIO.md
VFIO Guide for GPU Passthrough

Introduction:

This guide covers PCI passthrough with an Intel 7700K and an AMD RX 580. My host OS is Manjaro KDE edition, and the guest is Windows 10.

Hardware:

Device Type    Device
CPU            Intel Core i7-7700K
Motherboard    ASUS Prime Z270P
RAM            Corsair Vengeance (DDR4 3000 MHz)
GPU (Host)     Intel HD Graphics
joyxu / aarch64qemu.md
Created December 21, 2020 02:48 — forked from ecliptik/aarch64qemu.md
Ubuntu 14.04 arm64 Port QEMU Configuration

Setting up an Ubuntu 14.04 or Debian 8 (jessie) arm64 VM

This is mainly a notes dump and should be used for reference. This guide assumes:

  • Ubuntu 14.04 (or Debian 8) hypervisor/host with bridge networking
  • Knowledge of qemu
  • Knowledge of debootstrap

On an x86 host, the qemu-system-aarch64 emulator has two limitations: it can emulate only one CPU, and it has no KVM support.

joyxu / notes.md
Created December 3, 2020 02:15 — forked from congto/notes.md
Linux Performance Tuning

1.1 Linux process management

  • process scheduling
  • interrupt handling
  • signaling
  • process prioritization
  • process switching
  • process state
  • process memory
joyxu / latency.txt
Created October 20, 2020 04:02 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                       0.5 ns
Branch mispredict                          5 ns
L2 cache reference                         7 ns               14x L1 cache
Mutex lock/unlock                         25 ns
Main memory reference                    100 ns               20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy           3,000 ns      3 us
Send 1K bytes over 1 Gbps network     10,000 ns     10 us
Read 4K randomly from SSD*           150,000 ns    150 us     ~1GB/sec SSD
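
The ratio and unit columns in the rows above can be recomputed from the nanosecond figures. The following is a minimal sketch in Python, not part of the original gist: the names LATENCY_NS, ratio and to_us are hypothetical helpers introduced here for illustration, and the values are copied from the rows listed above. Worked from those figures, 100 ns / 7 ns comes out at roughly 14, so the "20x L2 cache" annotation is best read as a loose approximation.

```python
# Latency figures in nanoseconds, copied from the rows listed above.
# LATENCY_NS, ratio() and to_us() are illustrative names, not from the gist.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slower: str, faster: str) -> float:
    """Return how many times longer `slower` takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


def to_us(name: str) -> float:
    """Convert a latency from nanoseconds to microseconds."""
    return LATENCY_NS[name] / 1_000


if __name__ == "__main__":
    # Print every entry relative to an L1 cache reference.
    for name in LATENCY_NS:
        print(f"{name:<36}{ratio(name, 'L1 cache reference'):>12,.0f}x L1")
```

Expressing everything relative to one baseline (here the L1 cache reference) makes the orders of magnitude easier to compare than the raw nanosecond column.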