Artem Belevich Artem-B

## gist:9d59658f4c64940c2da4d59fd14096f4
sccache stats: N/A No new compilation requests
+ PRESET=thrust-cpp20
+ test_preset Thrust thrust-cpp20
+ local BUILD_NAME=Thrust
+ local PRESET=thrust-cpp20
+ pushd ..
+ ctest --preset=thrust-cpp20
Test project /usr/local/google/home/tra/work/cccl/build/thrust-cpp20
        Start   1: thrust.cpp.cuda.cpp20.test.adjacent_difference
  1/362 Test   #1: thrust.cpp.cuda.cpp20.test.adjacent_difference ...........................***Failed    0.90 sec

## generate_texture_tests.py

from itertools import product, count
from string import Template
from typing import Sequence

from absl import app
types = [
    "char", "signed char", "char1", "char2", "char4", "unsigned char", "uchar1",
    "uchar2", "uchar4", "short", "short1", "short2", "short4", "ushort",

## ConvFwd_Add_Add_ReluFwd_eng15_k5=1_k6=0_k7=1_k10=1.log
Do cudnn execution plan with plan tag: ConvFwd_Add_Add_ReluFwd_eng15_k5=1_k6=0_k7=1_k10=1
Workspace size in bytes: 409856
VariantPack: CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR : has 5 data pointers
I0509 11:27:06.030540 1504356 cuda_dnn.cc:4319]
Tensor_x: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 120 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,7,9 ] Str [ 4032,63,9,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_y: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 121 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,3,4 ] Str [ 768,12,4,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_z: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_FLOAT Id: 122 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,64,3,4 ] Str [ 768,12,4,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
Tensor_w: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_

## ConvFwd_Add_Add_eng15_k5=1_k6=0_k7=1_k10=1.log
1687-Tag: ConvFwd_Add_Add_
1688-
1689-[cudnn_frontend] CUDNN_BACKEND_ENGINE_DESCRIPTOR : ID: 15 Has 4 knobs
1690-[cudnn_frontend] CUDNN_BACKEND_ENGINECFG_DESCRIPTOR : Number of knobs: 4
1691:[cudnn_frontend] CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR : ConvFwd_Add_Add_eng15_k5=1_k6=0_k7=1_k10=1, numeric_notes:[CUDNN_NUMERICAL_NOTE_WINOGRAD,CUDNN_NUMERICAL_NOTE_WINOGRAD_TILE_4x4,] behavior_notes:[] workSpaceSize: 895504
1692-[cudnn_frontend] CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR : has 5 data pointers
1693-[cudnn_frontend] CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_HALF Id: 120 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,96,44,60 ] Str [ 253440,2640,60,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
1694-[cudnn_frontend] CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: CUDNN_DATA_HALF Id: 121 Alignment: 32 nDims 4 VectorCount: 1 vectorDimension -1 Dim [ 1,32,44,60 ] Str [ 84480,2640,60,1 ] isVirtual: 0 isByValue: 0 reorder_type: CUDNN_TENSOR_REORDERING_NONE
1695-

## check_crash.sh
#! /usr/bin/bash

GOOD=0
BAD=0
N=100

for ((i=0; i<N; i++)); do
    glxinfo -B > /dev/null 2>/dev/null |:
    if [[ ${PIPESTATUS[0]} = 0 ]]; then
        ((GOOD++))
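
The preview of the script is cut off after the `GOOD` counter. As a rough illustration of the same idea (run a command repeatedly and tally exit statuses), here is a minimal Python sketch. The function name `count_outcomes` is hypothetical, and treating every non-zero status as "bad" is an assumption, since the rest of the original loop is not shown.

```python
import subprocess
from typing import Sequence, Tuple


def count_outcomes(cmd: Sequence[str], runs: int) -> Tuple[int, int]:
    """Run cmd `runs` times and return (good, bad) counts by exit status."""
    good = bad = 0
    for _ in range(runs):
        # Discard output, as the shell loop does with its /dev/null redirects.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # Assumption: exit status 0 is good, anything else is bad. On POSIX a
        # negative return code means the process was killed by a signal.
        if result.returncode == 0:
            good += 1
        else:
            bad += 1
    return good, bad
```

Calling `count_outcomes(["glxinfo", "-B"], 100)` would mirror the `N=100` loop above, provided `glxinfo` is installed.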

## query wrap test
test

## abi_break_workaround.cc
// ABI compatibility shims for CUDA-11.7.
// Patch affected libraries with:
// objcopy \
// --redefine-sym cudaCreateTextureObject=cudaCreateTextureObject_v115 \
// --redefine-sym cudaGetTextureObjectTextureDesc=cudaGetTextureObjectTextureDesc_v115 \
// --redefine-sym cublasGetVersion_v2=cublasGetVersion_v2_v115 \
// --redefine-sym cublasLtGetVersion=cublasLtGetVersion_v115 \
// libnvinfer_static.a libcudnn_static.a
//

## cudacreatetexture.cc
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

#include "cuda_runtime.h"
#include "helper_cuda.h" // from cuda_samples

// Copy of texture descriptor from CUDA-11.7, so we can build the sample
// in a way that simulates compilation with an older CUDA version.

## stutter.md

      
Windows audio/video stutter.md (last active December 11, 2021)

Now and then, video and sound stutter for a second or two. I have Hyper-V enabled and use WSL2 (and Docker, configured to use it), but in my case the issue also seems to happen when no WSL2 VMs are running, so it may be more of a Hyper-V issue than a WSL2 one.
I managed to capture it in an ETW trace. As far as I can tell, during a stutter everything stalls for about 50 ms, then resumes for 10 ms, and this cycle repeats.
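
A stall/resume pattern like this can be looked for mechanically once event timestamps are exported from a trace. The following is only a sketch: the function name `find_quiet_periods`, the 40 ms threshold, and the sample timestamps are all made up for illustration and do not come from the actual capture.

```python
from typing import List, Sequence, Tuple


def find_quiet_periods(
    timestamps_ms: Sequence[float], min_gap_ms: float
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where no event occurs for at least min_gap_ms."""
    ordered = sorted(timestamps_ms)
    gaps = []
    # Compare each event with the next one; a large difference is a quiet period.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap_ms:
            gaps.append((prev, cur))
    return gaps


# Synthetic timestamps in milliseconds (illustrative only): short bursts of
# activity separated by roughly 50 ms of silence.
sample = [0, 5, 10, 60, 65, 70, 120]
print(find_quiet_periods(sample, min_gap_ms=40))  # [(10, 60), (70, 120)]
```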

Absolutely no events get captured during the quiet periods. Here's one example:


## opt-O3-divergence-reproducer.ll
; Reproducer for a bad performance regression triggered by the switch to the new PM.
; `barney` ended up with its local variables not being optimized away, and that
; had a rather dramatic effect on some GPU code. See
; https://bugs.llvm.org/show_bug.cgi?id=52037 for the gory details.
;
; NOTE that opt -O3 produces different IR.
;
; RUN: opt -mtriple=nvptx64-nvidia-cuda -passes='default<O3>' -S %s -o - \
; RUN:  | llc -mtriple=nvptx64-nvidia-cuda -mcpu=sm_70 -O3  -o - \
; RUN:  | FileCheck %s