@woachk
Created September 27, 2021 22:13
clpeak w/ MoltenVK and clspv on M1
% ./clpeak
[mvk-info] MoltenVK version 1.1.5, supporting Vulkan version 1.1.189.
The following 72 Vulkan extensions are supported:
VK_KHR_16bit_storage v1
VK_KHR_8bit_storage v1
VK_KHR_bind_memory2 v1
VK_KHR_create_renderpass2 v1
VK_KHR_dedicated_allocation v3
VK_KHR_depth_stencil_resolve v1
VK_KHR_descriptor_update_template v1
VK_KHR_device_group v4
VK_KHR_device_group_creation v1
VK_KHR_driver_properties v1
VK_KHR_external_fence v1
VK_KHR_external_fence_capabilities v1
VK_KHR_external_memory v1
VK_KHR_external_memory_capabilities v1
VK_KHR_external_semaphore v1
VK_KHR_external_semaphore_capabilities v1
VK_KHR_get_memory_requirements2 v1
VK_KHR_get_physical_device_properties2 v2
VK_KHR_get_surface_capabilities2 v1
VK_KHR_imageless_framebuffer v1
VK_KHR_image_format_list v1
VK_KHR_maintenance1 v2
VK_KHR_maintenance2 v1
VK_KHR_maintenance3 v1
VK_KHR_multiview v1
VK_KHR_portability_subset v1
VK_KHR_push_descriptor v2
VK_KHR_relaxed_block_layout v1
VK_KHR_sampler_mirror_clamp_to_edge v3
VK_KHR_sampler_ycbcr_conversion v14
VK_KHR_shader_draw_parameters v1
VK_KHR_shader_float16_int8 v1
VK_KHR_shader_subgroup_extended_types v1
VK_KHR_storage_buffer_storage_class v1
VK_KHR_surface v25
VK_KHR_swapchain v70
VK_KHR_swapchain_mutable_format v1
VK_KHR_timeline_semaphore v2
VK_KHR_uniform_buffer_standard_layout v1
VK_KHR_variable_pointers v1
VK_EXT_debug_marker v4
VK_EXT_debug_report v10
VK_EXT_debug_utils v2
VK_EXT_descriptor_indexing v2
VK_EXT_fragment_shader_interlock v1
VK_EXT_hdr_metadata v2
VK_EXT_host_query_reset v1
VK_EXT_image_robustness v1
VK_EXT_inline_uniform_block v1
VK_EXT_memory_budget v1
VK_EXT_metal_surface v1
VK_EXT_post_depth_coverage v1
VK_EXT_private_data v1
VK_EXT_robustness2 v1
VK_EXT_scalar_block_layout v1
VK_EXT_shader_stencil_export v1
VK_EXT_shader_viewport_index_layer v1
VK_EXT_subgroup_size_control v2
VK_EXT_swapchain_colorspace v4
VK_EXT_texel_buffer_alignment v1
VK_EXT_texture_compression_astc_hdr v1
VK_EXT_vertex_attribute_divisor v3
VK_AMD_gpu_shader_half_float v2
VK_AMD_negative_viewport_height v1
VK_AMD_shader_image_load_store_lod v1
VK_AMD_shader_trinary_minmax v1
VK_IMG_format_pvrtc v1
VK_INTEL_shader_integer_functions2 v1
VK_GOOGLE_display_timing v1
VK_MVK_macos_surface v3
VK_MVK_moltenvk v32
VK_NV_glsl_shader v1
[mvk-info] GPU device:
model: Apple M1
type: Integrated
vendorID: 0x106b
deviceID: 0xa140
pipelineCacheUUID: 00002779-0400-03EF-0000-000000000000
supports the following Metal Versions, GPU's and Feature Sets:
Metal Shading Language 2.3
GPU Family Apple 7
GPU Family Apple 6
GPU Family Apple 5
GPU Family Apple 4
GPU Family Apple 3
GPU Family Apple 2
GPU Family Apple 1
GPU Family Mac 2
GPU Family Mac 1
GPU Family Common 3
GPU Family Common 2
GPU Family Common 1
macOS GPU Family 2 v1
macOS GPU Family 1 v4
macOS GPU Family 1 v3
macOS GPU Family 1 v2
macOS GPU Family 1 v1
[mvk-info] Created VkInstance for Vulkan version 1.0.0, as requested by app, with the following 2 Vulkan extensions enabled:
VK_KHR_get_physical_device_properties2 v2
VK_EXT_debug_report v10
[mvk-info] Using MTLEvent for Vulkan semaphores.
[mvk-info] Created VkDevice to run on GPU Apple M1 with the following 6 Vulkan extensions enabled:
VK_KHR_16bit_storage v1
VK_KHR_8bit_storage v1
VK_KHR_driver_properties v1
VK_KHR_shader_float16_int8 v1
VK_KHR_uniform_buffer_standard_layout v1
VK_KHR_variable_pointers v1
Platform: clvk
Device: Apple M1
Driver version : 1.2 CLVK on Vulkan v1.1.189 driver 10105 (Macintosh)
Compute units : 1
Clock frequency : 0 MHz
Global memory bandwidth (GBPS)
float : 55.63
float2 : 57.66
float4 : 57.50
float8 : 57.88
float16 : 62.41
Single-precision compute (GFLOPS)
float : 1267.18
float2 : 1502.12
float4 : 1525.60
float8 : 892.10
float16 : 1470.11
Half-precision compute (GFLOPS)
half : 1333.62
half2 : 1508.59
half4 : 1591.51
half8 : 1525.85
half16 : 1423.71
No double precision support! Skipped
Integer compute (GIOPS)
int : 472.07
int2 : 467.69
int4 : 469.41
int8 : 476.42
int16 : 461.98
Integer compute Fast 24bit (GIOPS)
int : 480.22
int2 : 478.74
int4 : 437.78
int8 : 474.67
int16 : 475.28
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 28.62
enqueueReadBuffer : 28.93
enqueueWriteBuffer non-blocking : 28.99
enqueueReadBuffer non-blocking : 28.97
enqueueMapBuffer(for read) : 550636.56
memcpy from mapped ptr : 28.75
enqueueUnmap(after write) : 727960.19
memcpy to mapped ptr : 28.96
Kernel launch latency : 6.92 us