mingfeima / part_3_vectorization_techniques.md
Last active June 28, 2024 11:03
PyTorch CPU Performance Optimization Tutorial - Section III
mingfeima / pytorch_channels_last_perf_optimization.md
Last active September 1, 2023 03:02
PyTorch Channels Last memory format perf optimization and oneDNN integration plan.

PyTorch Channels Last Memory Format Performance Optimization on CPU Path

("mkldnn" has been renamed to "oneDNN", but exsiting PyTorch APIs still use "mkldnn", future work will align PyTorch user level APIs to "oneDNN")

Table of Contents

  • PyTorch Channels Last memory format introduction
  • oneDNN API for NHWC layout
  • Generic Channels Last memory format optimization with ATen native
  • oneDNN NHWC integration

NB: Memory format refers to the data representation that describes how a multidimensional (nD) array is stored in a linear (1D) memory address space. Memory format has the same semantics as layout in oneDNN. Layout in PyTorch has a different meaning: it describes whether a tensor is dense or sparse, via the attributes 'torch.strided' and 'torch.sparse_coo'.
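As a reference point (a minimal sketch, not taken from the gist), Channels Last in PyTorch keeps the logical NCHW shape and expresses the NHWC physical order purely through strides:

```python
import torch

x = torch.randn(1, 3, 224, 224)               # default: contiguous (NCHW)
y = x.to(memory_format=torch.channels_last)   # same shape, NHWC strides

print(x.stride())   # (150528, 50176, 224, 1)  -> NCHW physical order
print(y.stride())   # (150528, 1, 672, 3)      -> NHWC physical order
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```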

mingfeima / topk.md
Last active July 2, 2019 02:43
topk_optimization_backups

Backups for PR19736, a topk() performance optimization on the CPU path.


Description

Suppose the input tensor has shape [N, C]; measure the performance of input.topk(K, sorted=Sorted) for the following scenarios (see the benchmark sketch after this list):

  1. C = 10000, 40000, 320000
  2. K = 10, 50, 100, C/10, C/2, C-5
  3. Test with 20 threads and 1 thread
  4. Test with Sorted=True and Sorted=False
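A minimal benchmark sketch covering the scenarios above; the harness, the choice of N, and the iteration counts are assumptions for illustration, not the script used for the PR:

```python
import time
import torch

def bench_topk(N=64, C=10000, K=10, is_sorted=True, threads=1, iters=100):
    # Hypothetical harness: N and iters are illustrative choices.
    torch.set_num_threads(threads)
    x = torch.randn(N, C)
    for _ in range(10):                    # warm-up
        x.topk(K, sorted=is_sorted)
    start = time.time()
    for _ in range(iters):
        x.topk(K, sorted=is_sorted)
    return (time.time() - start) / iters * 1000  # ms per call

for C in (10000, 40000, 320000):
    for K in (10, 50, 100, C // 10, C // 2, C - 5):
        for threads in (20, 1):
            for is_sorted in (True, False):
                ms = bench_topk(C=C, K=K, is_sorted=is_sorted, threads=threads)
                print(f"C={C} K={K} threads={threads} sorted={is_sorted}: {ms:.3f} ms")
```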