balidani/chat.log Secret

## chat.log
<plaintext> I'll explain the part where I create a new dummy too
<plaintext> because the original used to work and now it doesn't
<plaintext> Here is the OpenCL code for a kernel that produces a relatively long ISA
<plaintext> https://gist.github.com/balidani/a76cc41b1b041f02f980
<plaintext> this produces ~2.7K of ISA
<plaintext> I load this into sample.cl and then run
<plaintext> run/binary_gen 1 0 sample.bin sample.cl
<plaintext> "1 0" means device 0 of platform 1, which is the Tahiti device we need
<plaintext> sample.bin will be the generated binary
<plaintext> I find the code section and change everything to "00 00 80 bf" (= NOP)
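
The NOP-filling step above can be sketched in Python. This is only an illustration: the name `nop_fill` and the `start`/`length` parameters are hypothetical, and it assumes the offset and size of the code section have already been located by hand. The only value taken from the log is the NOP word "00 00 80 bf".

```python
import struct

# s_nop as a little-endian 32-bit word: bytes 00 00 80 bf
S_NOP = struct.pack("<I", 0xBF800000)


def nop_fill(blob: bytes, start: int, length: int) -> bytes:
    """Return a copy of blob with [start, start + length) overwritten by NOP words.

    start and length are assumed to describe the code section; finding them
    inside the binary is outside the scope of this sketch.
    """
    if length % 4 != 0:
        raise ValueError("code section length must be a multiple of 4 bytes")
    if start < 0 or start + length > len(blob):
        raise ValueError("range falls outside the binary")
    return blob[:start] + S_NOP * (length // 4) + blob[start + length:]
```

The total size of the binary is unchanged, so every other offset in the file stays valid.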
<plaintext> Then I create the ISA for a much simpler kernel:
<plaintext> https://gist.github.com/balidani/f5745068b7e140acec0f
<plaintext> Even though the AMD KernelAnalyzer can generate ISA, I used OpenCL itself to do it, because I trust it
<plaintext> so I create a sample2.cl, and load it again with the binary_gen program
<plaintext> this time I use the _temp_0_Tahiti_main.isa file that was generated due to the "-save-temps" option
<plaintext> I cut the ISA and modify the comment syntax, because gcnasm uses ";" instead of "//"
<plaintext> I also add a 0 to the last instruction, s_endpgm, because gcnasm can't handle instructions with no arguments yet
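
The two manual edits to the ISA text (swapping "//" for ";" and giving s_endpgm a 0 argument) could be automated roughly as follows. This is a sketch under assumptions: `to_gcnasm_syntax` is a hypothetical helper, and it assumes each line of the dump holds one instruction optionally followed by a "//" comment.

```python
def to_gcnasm_syntax(isa_text: str) -> str:
    """Rewrite a compiler ISA dump into the form described in the log.

    - "//" comments become ";" comments
    - a bare "s_endpgm" gets a dummy "0" argument
    """
    converted = []
    for line in isa_text.splitlines():
        # Split off the comment, if the line has one.
        code, sep, comment = line.partition("//")
        code = code.rstrip()
        if code.strip() == "s_endpgm":
            code += " 0"
        if sep:
            prefix = code + " " if code else ""
            converted.append(prefix + ";" + comment)
        else:
            converted.append(code)
    return "\n".join(converted)
```

The result can then be written to test.isa and passed to gcnasm.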
<plaintext> I save the ISA to a file (test.isa)
<plaintext> and then run the assembler script
<plaintext> I tried doing the same as the assembler script by hand. Here are the steps for that too:
<plaintext> I create the microcode for the ISA file using this command:
<plaintext> run/gcnasm test.isa test.bin
<plaintext> now I use the python patching script
<plaintext> python tools/dummy_elf_patcher/patch_dummy.py sample.bin test.bin output.bin
<plaintext> output.bin will be the patched ELF
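
For readers without the repository, the idea behind the patching step can be sketched like this. It is not the actual patch_dummy.py: `patch_code`, `code_offset` and `code_size` are hypothetical names, and the sketch assumes the location of the NOP-filled code region in the dummy is already known.

```python
S_NOP = bytes.fromhex("000080bf")  # the NOP word used to fill the dummy


def patch_code(dummy: bytes, microcode: bytes, code_offset: int, code_size: int) -> bytes:
    """Copy microcode into the dummy's code region and pad the rest with NOPs.

    code_offset and code_size are assumed to be known in advance.
    """
    if len(microcode) % 4 != 0 or code_size % 4 != 0:
        raise ValueError("sizes must be multiples of 4 bytes")
    if len(microcode) > code_size:
        raise ValueError("microcode does not fit into the dummy's code region")
    if code_offset < 0 or code_offset + code_size > len(dummy):
        raise ValueError("code region falls outside the dummy binary")
    padding = S_NOP * ((code_size - len(microcode)) // 4)
    return dummy[:code_offset] + microcode + padding + dummy[code_offset + code_size:]
```

This is why the dummy kernel needs a long ISA: the new microcode has to fit inside the existing code region, since the sketch never grows the file.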
<plaintext> now I load this into OpenCL using this command:
<plaintext> run/binary_gen 1 0 output.bin none.cl
<plaintext> "none.cl" is a file that doesn't exist; if output.bin is found, the source is not used
<plaintext> and the result I get is an "error": whatever I load as the ISA, I get the output from the last GPU execution
<plaintext> so if my small test contained out[gid] = 1337, and I change the ISA to "out[gid] = 777", I will still find 1337 in the output
<plaintext> I also tried to change the vgpr and sgpr count in the binary's ATI CAL comments but it didn't help
<plaintext> tell me if something is hard to understand, it was a bit rushed, sorry
<ukasz_> are you able to patch any binary at the moment?
<plaintext> no, it looks like I'm not