AVX512 support in ATen (#56992)
AVX512 support in ATen
Files in aten/src/ATen/cpu/vec
vec/functional.h is common to both AVX2 and AVX512, and so are vec/functional_base.h and vec/bfloat16_functional.h. So, the latter two files were moved to aten/src/ATen/cpu/vec.
First, the files in vec/vec256 were modified to remove code pertaining to CPU_CAPABILITY_AVX. These files are -
vec/vec256/vec256.h
vec/vec256/vec256_base.h
vec/vec256/vec256_bfloat16.h
vec/vec256/vec256_complex_double.h
vec/vec256/vec256_complex_float.h
vec/vec256/vec256_double.h
vec/vec256/vec256_float.h
vec/vec256/vec256_int.h
vec/vec256/vec256_qint.h
vec/vec256/intrinsics.h
Then, their counterpart files in vec/vec512 were created -
vec/vec512/vec512.h
vec/vec512/vec512_base.h
vec/vec512/vec512_bfloat16.h
vec/vec512/vec512_complex_double.h
vec/vec512/vec512_complex_float.h
vec/vec512/vec512_double.h
vec/vec512/vec512_float.h
vec/vec512/vec512_int.h
vec/vec512/vec512_qint.h
vec/vec512/intrinsics.h
For reviewing the vec/vec512 files, I believe using a two-column GUI diff tool would help, as the files in vec/vec512 are based on the files in vec/vec256. The intrinsics are not exactly the same, as AVX512 has some intrinsics whose counterparts aren't present in AVX2, and vice versa, but most of them are similar.
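To make the vec256/vec512 correspondence concrete, here is a minimal, portable sketch. It is not the ATen code: `VecSketch`, `Vec256Sketch` and `Vec512Sketch` are hypothetical names, and a plain array stands in for the `__m256`/`__m512` registers so that the example compiles without any AVX flags. It only illustrates that the same interface carries over while the number of float lanes doubles from 8 to 16.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a vectorized float type; NOT the ATen class.
// RegisterBytes is 32 for a 256-bit (AVX2) register and 64 for a 512-bit
// (AVX512) register.
template <std::size_t RegisterBytes>
struct VecSketch {
  // Number of float lanes held by one register of this width.
  static constexpr std::size_t size() { return RegisterBytes / sizeof(float); }

  std::array<float, RegisterBytes / sizeof(float)> lanes{};

  // Load size() contiguous floats from (possibly unaligned) memory.
  static VecSketch loadu(const float* ptr) {
    VecSketch v;
    for (std::size_t i = 0; i < size(); ++i) v.lanes[i] = ptr[i];
    return v;
  }

  // Lane-wise addition. A real implementation would use one intrinsic here,
  // e.g. _mm256_add_ps for AVX2 or _mm512_add_ps for AVX512.
  VecSketch operator+(const VecSketch& other) const {
    VecSketch out;
    for (std::size_t i = 0; i < size(); ++i) {
      out.lanes[i] = lanes[i] + other.lanes[i];
    }
    return out;
  }
};

using Vec256Sketch = VecSketch<32>;  // 256-bit width
using Vec512Sketch = VecSketch<64>;  // 512-bit width

static_assert(Vec256Sketch::size() == 8, "8 floats fit in 256 bits");
static_assert(Vec512Sketch::size() == 16, "16 floats fit in 512 bits");
```

Code written against `size()` rather than a literal 8 works for both widths, which is why most of the vec512 files can mirror their vec256 counterparts.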
Files in aten/src/ATen/test/
aten/src/ATen/test/vec_test_all_types.cpp was modified because one test had been hardcoded for AVX2.
aten/src/ATen/test/vec_test_all_types.h was modified to remove CPU_CAPABILITY_AVX, add CPU_CAPABILITY_AVX512, and change the alignment size for CPU_CAPABILITY_AVX512.
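The alignment change for CPU_CAPABILITY_AVX512 can be pictured roughly as follows. This is a sketch under assumptions, not the code from vec_test_all_types.h: `kVecAlignment`, `AlignedFloats` and `is_aligned` are hypothetical names, and the 32-byte and 64-byte values are assumed from the register widths (256 bits and 512 bits). Only the `CPU_CAPABILITY_AVX512` macro comes from the PR itself.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed alignment values: one full register width in bytes.
#if defined(CPU_CAPABILITY_AVX512)
constexpr std::size_t kVecAlignment = 64;  // 512-bit registers
#else
constexpr std::size_t kVecAlignment = 32;  // 256-bit registers
#endif

// Hypothetical test buffer holding exactly one register's worth of floats,
// aligned so that aligned vector loads and stores would be valid.
struct alignas(kVecAlignment) AlignedFloats {
  float values[kVecAlignment / sizeof(float)];
};

// Returns true if ptr is a multiple of the given alignment.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

static_assert(alignof(AlignedFloats) == kVecAlignment,
              "buffer alignment must match the vector width");
```

Deriving both the buffer size and its alignment from one constant keeps a test from silently assuming the AVX2 width.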
Files in aten/src/ATen/cpu/
CMake files
cmake/Codegen.cmake was modified to build AVX512 kernels, and to use 32 ymm registers with AVX2 when AVX512VL is supported.
caffe2/CMakeLists.txt was modified to ensure that SIGILLs don't happen.
cmake/Modules/FindAVX.cmake was modified to add checks for AVX512 support.
aten/src/ATen/CMakeLists.txt was modified to add vec/vec512/*.h to a GLOB.
setup.py was also modified to add vec/vec512/*.h, as directed by aten/src/ATen/CMakeLists.txt.
A description of the ATEN_AVX512_256 environment variable was added. When TRUE, the build process compiles AVX2 kernels with 32 ymm registers instead of the default 16.
aten.bzl was modified to remove AVX. I haven't added AVX512 support to the Bazel build yet.
Files in aten/src/ATen/
aten/src/ATen/Version.cpp was modified to add the CPU capability version string for AVX512, and to remove that of AVX.
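As an illustration only, a capability-to-string lookup after this change might look like the sketch below. The enum `CapabilitySketch`, the function `capability_name` and the exact strings are all hypothetical; the actual identifiers and strings in Version.cpp may differ. The point is simply that there is no longer an AVX entry and that an AVX512 entry now exists.

```cpp
#include <string>

// Hypothetical capability levels; note there is no plain AVX level.
enum class CapabilitySketch { Default, AVX2, AVX512 };

// Maps a capability level to a human-readable string (strings are assumed).
inline std::string capability_name(CapabilitySketch cap) {
  switch (cap) {
    case CapabilitySketch::AVX2:
      return "AVX2";
    case CapabilitySketch::AVX512:
      return "AVX512";
    case CapabilitySketch::Default:
      return "default";
  }
  return "unknown";  // unreachable for valid enum values
}
```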
Files in aten/src/ATen/native/ & aten/src/ATen/native/cpu
aten/src/ATen/native/SegmentReduce.cpp - AVX dispatch was removed, and AVX512 dispatch was added.
aten/src/ATen/native/mkl/SpectralOps.cpp - AVX dispatch was removed, and AVX512 dispatch was added.
aten/src/ATen/native/cpu/SumKernel.cpp - AVX512 dispatch was removed for the sum kernel.
aten/src/ATen/native/cpu/SoftMaxKernel.cpp - A TORCH_CHECK was changed because it was hardcoded for AVX2.
aten/src/ATen/native/cpu/ReduceOpsKernel.cpp - AVX512 dispatch was removed for nansum.
aten/src/ATen/native/cpu/Reduce.h - Values hardcoded for AVX2 were conditionally hardcoded for AVX512 as well.
aten/src/ATen/native/cpu/README.md was modified.
Additional files that had to be modified in the PR -
test/quantization/bc/test_backward_compatibility.py - test_lstm was skipped for machines with AVX512_VNNI support.
torch/testing/_internal/common_utils.py - IS_AVX512_VNNI_SUPPORTED was added to skip the test mentioned in the file above.
test/cpp/api/dispatch.cpp - Tests CPU capability dispatch.
The files listed above should be in a single PR, but the following files can also be used to create 3 separate PRs -
aten/src/ATen/native/cpu/avx_mathfun.h & aten/src/ATen/native/cpu/DistributionTemplates.h - avx_mathfun.h was only being used with AVX2 for normal_fill. I removed the corresponding SSE code. The corresponding AVX512 version exposes flakiness in some tests, so CPU_CAPABILITY_AVX512 would also use AVX2 code.
aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp - AVX512 support was added for various kernels.
.github/scripts/generate_ci_workflows.py & .github/workflows/pytorch-linux-bionic-py3.8-gcc9-coverage.yml - These were modified/created to create a pytorch-linux-bionic-py3.8-gcc9-coverage workflow. However, it'd have to be removed from CircleCI as well.