@ubombi
Created March 15, 2019 07:07
== clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP.internal (2814.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_object_metadata cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XTX [Radeon Vega Frontier Edition]
Device Topology: PCI[ B#9, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1600MHz
Address bits: 64
Max memory allocation: 14588628172
Image support: No
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 17163091968
Constant buffer size: 14588628172
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 1703726284
Max global variable size: 14588628172
Max global variable preferred total size: 17163091968
Max read/write image args: 0
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianness: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7fb8f190e4d0
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 2814.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
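
For reference, the attributes in the clinfo dump above can be re-queried programmatically. A minimal sketch, assuming the pyopencl package is installed (pyopencl is not part of this setup and is named here only for illustration):

# Sketch: re-query the platform/device attributes clinfo printed above.
# Assumes pyopencl is installed on top of the same OpenCL runtime.
import pyopencl as cl

for platform in cl.get_platforms():
    print('Platform:', platform.name, platform.version)
    for device in platform.get_devices():
        print('Device:', device.name)
        print('  Max compute units:', device.max_compute_units)
        print('  Max work group size:', device.max_work_group_size)
        print('  Global memory size:', device.global_mem_size)
        print('  Extensions:', device.extensions)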
HIP version : 1.5.0
== hipconfig
HIP_PATH : /opt/rocm
HIP_PLATFORM : hcc
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -I/opt/rocm/include -I/opt/rocm/hcc/include
== hcc
HSA_PATH : /opt/rocm/hsa
HCC_HOME : /opt/rocm/hcc
HCC clang version 9.0.0 (https://github.com/RadeonOpenCompute/hcc-clang-upgrade.git c792478f19beee13540053f188094898a008d245) (https://github.com/RadeonOpenCompute/llvm.git 68584f0b7bc07d43af64f90b3726988b5a513bf9) (based on HCC 1.3.19092-1dcecffc-c792478f19-68584f0b7bc )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
LLVM version 9.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: znver1
Registered Targets:
amdgcn - AMD GCN GPUs
r600 - AMD GPUs HD2XXX-HD6XXX
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags : -hc -std=c++amp -I/opt/rocm/hcc/include -I/opt/rocm/include
HCC-ldflags : -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive
=== Environment Variables
PATH=/opt/google-cloud-sdk/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/opt/cuda/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/home/ubombi/esp/xtensa-esp32-elf/bin/:/home/ubombi/esp/xtensa-lx106-elf/bin/
== Linux Kernel
Hostname : shreder
Linux shreder 5.0.0-mainline #1 SMP PREEMPT Wed Mar 6 21:12:30 EET 2019 x86_64 GNU/Linux
LSB Version: n/a
Distributor ID: ManjaroLinux
Description: Manjaro Linux
Release: 18.0.3
Codename: Illyria
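
The hipconfig report above and the rocminfo dump below can be regenerated in one go. A minimal sketch, assuming the stock /opt/rocm install paths shown in this report:

# Sketch: capture the same ROCm diagnostics programmatically.
# Assumes hipconfig and rocminfo live under /opt/rocm as reported above.
import subprocess

for cmd in (['/opt/rocm/bin/hipconfig', '--full'], ['/opt/rocm/bin/rocminfo']):
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)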
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 2700X Eight-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Frequency (MHz): 3700
BDFID: 0(0x0)
Compute Unit: 16(0x10)
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16418992(0xfa88b0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16418992(0xfa88b0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
N/A
*******
Agent 2
*******
Name: gfx900
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26723(0x6863)
Cacheline Size: 64(0x40)
Max Clock Frequency (MHz): 1600
BDFID: 2304(0x900)
Compute Unit: 64(0x40)
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max number of fbarriers per workgroup: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
import tensorflow as tf

# Minimal TF 1.x repro: a matmul pinned explicitly to the GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))  # same ihipException here
print(sess.run(c))
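
As a sanity check alongside the repro above, TensorFlow 1.x can list the devices it actually registered. A sketch using the TF 1.x device_lib API; if GPU initialization fails, the same ihipException would be expected here as well:

from tensorflow.python.client import device_lib

# Lists the devices TensorFlow registered; a working tensorflow-rocm build
# should report the gfx900 GPU as '/device:GPU:0'.
for dev in device_lib.list_local_devices():
    print(dev.name, dev.device_type)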