Create a gist now

Instantly share code, notes, and snippets.

@Orvid /t.md Secret
Last active Nov 14, 2017

What would you like to do?
Example

Whoops, sorry I failed to actually check back after saying this.

I actually manage to trigger both the first and 3rd cases in the same function, which I use for case-insensitive hashing of identifiers (in the context of a compiler and a C-like language):

size_t CaselessIdentifierHasher::doIdentifierHash(const char* s, size_t len) {
  const char* cStr = s;

  // We know the string is null-terminated, so we can align to 2.
  size_t lenLeft = ((len + 1) & 0xFFFFFFFFFFFFFFFEULL);
  size_t iterCount = lenLeft >> 2;
  uint32_t val = 0x84222325U;
  size_t i = iterCount;
  while (i)
    val = _mm_crc32_u32(val, ((uint32_t*)cStr)[--i] | 0x20202020);
  if (lenLeft & 2)
    val = _mm_crc32_u16(val, *(uint16_t*)(cStr + (iterCount * 4)) | (uint16_t)0x2020);
  return ((size_t)val << 32) | val;
}

That generates this with the latest pre-release available via NuGet (14.0.24120-Pre) compiled as 64-bit with /Ox /Ob2 /Oi /Ot /Oy:

; 141  :   const char* cStr = s;
; 142  : 
; 143  :   // We know the string is null-terminated, so we can align to 2.
; 144  :   size_t lenLeft = ((len + 1) & 0xFFFFFFFFFFFFFFFEULL);
  00000	4c 8d 52 01	 lea	 r10, QWORD PTR [rdx+1]
  00004	4c 8b c1	 mov	 r8, rcx
  00007	49 83 e2 fe	 and	 r10, -2
; 145  :   size_t iterCount = lenLeft >> 2;
; 146  :   uint32_t val = 0x84222325U;
  0000b	b8 25 23 22 84	 mov	 eax, -2078137563	; 84222325H
  00010	4d 8b ca	 mov	 r9, r10
  00013	49 c1 e9 02	 shr	 r9, 2
; 147  :   size_t i = iterCount;
  00017	49 8b d1	 mov	 rdx, r9
; 148  :   while (i)
  0001a	4d 85 c9	 test	 r9, r9
  0001d	74 19		 je	 SHORT $LN3@doIdentifi
  0001f	90		 npad	 1
$LL2@doIdentifi:
; 149  :     val = _mm_crc32_u32(val, ((uint32_t*)cStr)[--i] | 0x20202020);
  00020	41 8b 4c 90 fc	 mov	 ecx, DWORD PTR [r8+rdx*4-4]
  00025	48 ff ca	 dec	 rdx
  00028	81 c9 20 20 20
	20		 or	 ecx, 538976288		; 20202020H
  0002e	f2 0f 38 f1 c1	 crc32	 eax, ecx
  00033	48 85 d2	 test	 rdx, rdx
  00036	75 e8		 jne	 SHORT $LL2@doIdentifi
$LN3@doIdentifi:

; 150  :   if (lenLeft & 2)

  00038	41 f6 c2 02	 test	 r10b, 2
  0003c	74 13		 je	 SHORT $LN4@doIdentifi

; 151  :     val = _mm_crc32_u16(val, *(uint16_t*)(cStr + (iterCount * 4)) | (uint16_t)0x2020);

  0003e	43 0f b7 0c 88	 movzx	 ecx, WORD PTR [r8+r9*4]
  00043	ba 20 20 00 00	 mov	 edx, 8224		; 00002020H
  00048	66 0b ca	 or	 cx, dx
  0004b	66 f2 0f 38 f1
	c1		 crc32	 eax, cx
$LN4@doIdentifi:

; 152  :   return ((size_t)val << 32) | val;

  00051	8b c8		 mov	 ecx, eax
  00053	8b c0		 mov	 eax, eax
  00055	48 c1 e0 20	 shl	 rax, 32			; 00000020H
  00059	48 0b c1	 or	 rax, rcx

; 153  : }

  0005c	c3		 ret	 0

For issue 3, see the code attributed to return ((size_t)val << 32) | val; For issue 1, see the block immediately above that, in particular:

  00043	ba 20 20 00 00	 mov	 edx, 8224		; 00002020H
  00048	66 0b ca	 or	 cx, dx

There's also the issue of the dec for i being done with a separate test i, i later when it could be done by moving the dec to where the test is, where it would actually get fused as a single micro-op, but I haven't been able to find a way to structure that loop in such a way that any compiler would generate that :(

On a secondary side note:

size_t a = 0x84222325U;
a = _mm_crc32_u32(a, a);

generates:

00000	b8 25 23 22 84	 mov	 eax, -2078137563	; 84222325H
00005	f2 0f 38 f1 c0	 crc32	 eax, eax
0000a	8b c0		 mov	 eax, eax

As the compiler doesn't seem to be aware that the upper 32-bits are already zeroed by the crc32, probably because of the fact it's a builtin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment