nihui

  • Shanghai
  • X @nihui
@nihui
nihui / ncnn-res.md
Last active June 18, 2022 04:51
ncnn memory usage
#include <malloc.h>

// call malloc_trim(0);
static double get_memory_resident()
{
    unsigned long size;
    unsigned long resident;
    unsigned long share;
    unsigned long text;
@nihui
nihui / mul.cpp
Last active May 10, 2022 13:33
int8 vector multiplication in loongson mmi and mips msa
// g++ mul.cpp -o mul -mmsa -mloongson-mmi -O3
// https://github.com/Tencent/ncnn/blob/master/src/layer/mips/loongson_mmi.h
// root@ls2k:~/ncnn/build# ./quant
// mul_s8x8 385.743
// mul_s8x8_mmi 611.364
// mul_s8x8_msa 173.241
// -66 2 0 4 10 18 28 40
@nihui
nihui / avx512_transpose.cpp
Last active May 2, 2024 07:24
avx512 16x24 16x16 16x12 16x8 16x4 16x2 8x24 8x16 8x12 8x8 8x4 8x2 matrix transpose
// g++ -mfma -mf16c -mavx512f -mavx512vnni -mavx512vl
#include <immintrin.h>
#include <stdio.h>
static void print(const __m512& _x)
{
__attribute__((aligned(64)))
float a[16];
@nihui
nihui / testpt.cpp
Created January 27, 2022 07:08
libtorch inference test
#include <torch/script.h>
int main()
{
c10::InferenceMode guard(true);
torch::NoGradGuard no_grad;
torch::jit::Module mod2 = torch::jit::load("stylegan.pt");
mod2.eval();
@nihui
nihui / ncnn-c_api-customop.c
Created January 21, 2022 14:34
register custom layer with ncnn c api
#include <stdio.h>
// ncnn
#include <c_api.h>
/** test.param content
7767517
3 3
Input in0 0 1 input0
@nihui
nihui / github-ssh-push.md
Last active October 1, 2021 07:14
github-ssh-push.md
$ git push
remote: Support for password authentication was removed on August 13, 2021. Please use a personal access token instead.
remote: Please see https://github.blog/2020-12-15-token-authentication-requirements-for-git-operations/ for more information.
fatal: Authentication failed for 'https://github.com/nihui/ncnn.git/'
$ git remote -v
origin https://github.com/nihui/ncnn.git (fetch)
@nihui
nihui / simple-https-server.py
Last active February 4, 2023 16:53
simple https server
from http.server import HTTPServer, SimpleHTTPRequestHandler
import ssl, os
os.system("openssl req -nodes -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -subj '/CN=mylocalhost'")
port = 8888
httpd = HTTPServer(('0.0.0.0', port), SimpleHTTPRequestHandler)
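# ssl.wrap_socket() was removed in Python 3.12; wrap the socket through an SSLContext instead
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="cert.pem", keyfile="key.pem")
httpd.socket = context.wrap_socket(httpd.socket, server_side=True)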
print(f"Server running on https://0.0.0.0:{port}")
httpd.serve_forever()
@nihui
nihui / github-pr.md
Last active August 23, 2023 04:11
github-pr.md

Detail behind NCNN's factory pattern

NCNN uses the factory pattern to create the layers of a neural network, as does the well-known library Caffe. The two differ in how the registry table is implemented. Caffe's registry is populated at runtime as a side effect of initializing global variables (a popular approach to library initialization). NCNN's registry, by contrast, is fixed at compile time: CMake generates the table, so it does not have to be written by hand. NCNN's approach provides several benefits over Caffe's.
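
To make the idea of a compile-time registry concrete, here is a minimal sketch. It is not NCNN's actual code: the `Layer` interface, the two layer classes, `RegistryEntry`, and `create_layer` are hypothetical names chosen for illustration, and the table that a build script would normally generate is written out by hand.

```cpp
#include <cstring>
#include <memory>

// Hypothetical minimal layer interface (an assumption, not NCNN's Layer class).
struct Layer {
    virtual ~Layer() {}
    virtual const char* name() const = 0;
};

struct Convolution : Layer {
    const char* name() const override { return "Convolution"; }
};

struct ReLU : Layer {
    const char* name() const override { return "ReLU"; }
};

// One creator function per layer type.
typedef Layer* (*layer_creator_func)();

static Layer* Convolution_creator() { return new Convolution; }
static Layer* ReLU_creator() { return new ReLU; }

struct RegistryEntry {
    const char* type;
    layer_creator_func creator;
};

// The registry is a plain constant array fixed at compile time.
// In a generated setup, the build system would emit these entries;
// here they are listed by hand for the sake of the example.
static const RegistryEntry layer_registry[] = {
    {"Convolution", Convolution_creator},
    {"ReLU", ReLU_creator},
};

static const int layer_registry_count =
    sizeof(layer_registry) / sizeof(layer_registry[0]);

// Factory: look the type name up in the table and call its creator.
// Returns an empty pointer when the type is not registered.
std::unique_ptr<Layer> create_layer(const char* type) {
    for (int i = 0; i < layer_registry_count; i++) {
        if (std::strcmp(layer_registry[i].type, type) == 0)
            return std::unique_ptr<Layer>(layer_registry[i].creator());
    }
    return nullptr;
}
```

Calling `create_layer("ReLU")` yields a `ReLU` instance, while an unregistered name yields an empty pointer. Because the table references each creator function directly, no global object with a side-effecting constructor is needed to populate it.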

First, it is suitable for building a static library. When building a static library, the linker strips any unused global variables to minimize the size of the library. This makes sense, but it also strips the global variables that must be initialized to insert the layer creators into the registry. Tricky linker flags and related instructions are required to work around this issue. By creating

@nihui
nihui / M1.GPU.md
Created November 22, 2020 01:42 — forked from BlueCocoa/M1.GPU.md
| Model | Image Size | Target Size | Block Size | Total Time (sec) | GPU Memory (MB) |
|---|---|---|---|---|---|
| models-cunet | 200x200 | 400x400 | 400/200/100 | 0.93/0.30/0.33 | 615/615/173 |
| models-cunet | 400x400 | 800x800 | 400/200/100 | 0.78/0.71/0.78 | 2408/615/174 |
| models-cunet | 1000x1000 | 2000x2000 | 400/200/100 | 3.16/3.21/3.53 | 2416/618/175 |
| models-cunet | 2000x2000 | 4000x4000 | 400/200/100 | 11.40/11.98/13.86 | 2420/669/193 |
| models-cunet | 4000x4000 | 8000x8000 | 400/200/100 | 44.33/47.15/54.76 | 2452/644/197 |
| models-upconv_7_anime_style_art_rgb | 200x200 | 400x400 | 400/200/100 | 0.16/0.16/0.15 | 459/459/119 |
| models-upconv_7_anime_style_art_rgb | 400x400 | 800x800 | 400/200/100 | 0.43/0.37/0.37 | 1741/460/119 |
| models-upconv_7_anime_style_art_rgb | 1000x1000 | 2000x2000 | 400/200/100 | 1.62/1.59/1.67 | 1764/462/120 |