@kellabyte
Last active December 14, 2015 20:49
// Assumed headers and namespaces (not in the original snippet): the identifiers
// used below match Boost.Iostreams and Boost.Timer.
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/timer/timer.hpp>
#include <iostream>

using namespace boost::iostreams;
using namespace boost::timer;
using namespace std;

void TestReadingMMap()
{
    double bytesRead = 0;
    double elapsed = 0;
    mapped_file_source file;
    __int64 checksum = 0;
    {
        auto_cpu_timer timer;
        // Backslashes in a string literal must be escaped.
        file.open("C:\\SomeBigFile.dat", 2147483647);
        stream<mapped_file_source> input(file);
        if (file.is_open())
        {
            const int segmentSize = 4096;
            const int size = file.size();
            const int remainder = file.size() % segmentSize;
            const int segmentLoops = file.size() / segmentSize;
            char bytes[segmentSize];
            for (int x = 0; x < segmentLoops; x++)
            {
                input.read(bytes, segmentSize);
                #pragma unroll 4096
                for (int i = 0; i < segmentSize; i++)
                {
                    // This block reduces IO rate from 1.6GB/s
                    // to 1GB/s, losing 600MB/s.
                    checksum += bytes[i];
                }
                bytesRead += segmentSize;
            }
            // Moving this out here rather than a condition in the loop above
            // reduced branch mispredictions.
            if (remainder > 0)
            {
                input.read(bytes, remainder);
                for (int i = 0; i < remainder; i++)
                {
                    checksum += bytes[i];
                }
                bytesRead += remainder;
            }
            input.close();
            file.close(); // Unmap the file.
            if (checksum == 43089565243)
            {
                cout << "Checksum passed" << endl;
            }
        }
        else
        {
            cout << "could not map the file" << endl;
        }
        elapsed = timer.elapsed().wall; // Wall-clock time in nanoseconds.
    }
    cout << bytesRead / 1048576 << "MB at " << bytesRead / 1048576 / (elapsed / 1000000000) << " MB/s" << endl;
    cout << "Checksum: " << checksum << endl;
}
@mshappe

mshappe commented Mar 12, 2013

Suggestion #1] I don't know about Windows, but in the Unix universe, the magic number for buffer sizes tends to be 8192.

Suggestion #2] This is what I was trying to get across on Twitter. If the checksum does not actually need to be a literal summation of the entire file, but just some sort of sanity check, try sampling every Nth byte instead of every byte, thus:

for (int i=0; i<segmentSize; i+=16)
{
checksum += bytes[i];
}
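
To make the trade-off concrete, here is a minimal sketch of the sampling idea as a standalone function. The name `sampledChecksum` and its `stride` parameter are hypothetical, not part of the gist. Note that the result is no longer comparable to a full byte-wise sum, so any expected value would have to be recomputed with the same stride.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sums every `stride`-th byte of a buffer.
// This samples the data as a sanity check; it is not a full checksum.
std::int64_t sampledChecksum(const char* bytes, std::size_t size, std::size_t stride)
{
    if (stride == 0)
    {
        return 0; // Guard against an infinite loop.
    }
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < size; i += stride)
    {
        sum += bytes[i];
    }
    return sum;
}
```

With `stride` set to 16 this touches one byte in every 16, as in the loop above; with `stride` set to 1 it degenerates to the original full sum.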

Suggestion #3] Try treating the array as an array of unsigned long. Warning: on ARM architectures this only works if the array is suitably aligned, although it should work fine on x86:

unsigned long *x = (unsigned long*)bytes;
for (int i = 0; i < segmentSize / sizeof(unsigned long); i++)
{
    checksum += x[i];
}
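
If alignment is a concern, one option is to load each word with `std::memcpy` instead of casting the pointer. The sketch below is an illustration under assumptions: it swaps in `std::uint64_t` for `unsigned long` (which is 4 bytes on Windows but 8 bytes on most 64-bit Unix systems), and the name `sumWords` is hypothetical. Summing whole words produces a different value than summing individual bytes, so the expected checksum would need to be recomputed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sums a buffer as 64-bit words.
// memcpy avoids the unaligned access that a pointer cast could cause.
std::uint64_t sumWords(const char* bytes, std::size_t size)
{
    std::uint64_t sum = 0;
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= size; i += sizeof(std::uint64_t))
    {
        std::uint64_t word;
        std::memcpy(&word, bytes + i, sizeof(word));
        sum += word;
    }
    // Add any trailing bytes that do not fill a whole word.
    for (; i < size; ++i)
    {
        sum += static_cast<unsigned char>(bytes[i]);
    }
    return sum;
}
```

Compilers commonly turn a fixed-size `memcpy` like this into a single load, but that is worth confirming for the target in question.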

@talisein

Try a more parallel summation. With your current code every addition has to wait for the previous addition to finish. Something like this should allow you to take advantage of more arithmetic units:

for (int i = 0; i < segmentSize; i += 8) {
  int a = x[i] + x[i+1];
  int b = x[i+2] + x[i+3];
  int c = x[i+4] + x[i+5];
  int d = x[i+6] + x[i+7];
  a += b;
  c += d;
  a += c;
  checksum += a;
}
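
A variation on the same idea is to keep several independent accumulators and combine them once at the end, so no addition in the loop depends on the one just before it in another lane. This is a minimal sketch under assumptions: the name `sumFourLanes` is hypothetical, and whether it beats the plain loop depends on the compiler and hardware, since an optimizer may already vectorize the simple version.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte-wise sum using four independent accumulators.
// The result equals a plain byte-by-byte sum of the same buffer.
std::int64_t sumFourLanes(const char* bytes, std::size_t size)
{
    std::int64_t a = 0, b = 0, c = 0, d = 0;
    std::size_t i = 0;
    for (; i + 4 <= size; i += 4)
    {
        a += bytes[i];
        b += bytes[i + 1];
        c += bytes[i + 2];
        d += bytes[i + 3];
    }
    std::int64_t sum = (a + b) + (c + d);
    // Handle the last few bytes when size is not a multiple of 4.
    for (; i < size; ++i)
    {
        sum += bytes[i];
    }
    return sum;
}
```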

@MorganPersson

I assume segmentSize is divisible by 4 :-)

for (int i=0; i<segmentSize; i+=4)
{
    checksum += bytes[i] + bytes[i+1] + bytes[i+2] + bytes[i+3];
}

@talisein

@MorganPersson Presumably that's what the #pragma unroll does.

@jrwren

jrwren commented Mar 13, 2013

I achieved a great speedup by using an array of 64-bit ints and doing 64-bit additions instead of eight times as many 8-bit additions.

    if(file.is_open()) 
    {
        const int segmentSize = 4096;
        const int size = file.size();
        const int remainder = file.size() % segmentSize;
        const int segmentLoops = file.size() / segmentSize;
        __int64 value = 0;

        __int64 bytes[segmentSize/sizeof(__int64)];
        for (int x=0; x<segmentLoops; x++)
        {
            input.read((char*)bytes, segmentSize);
            #pragma unroll 4096
            for (int i=0; i<segmentSize/sizeof(__int64); i++)
            {
                // This block reduces IO rate from 1.6GB/s 
                // to 1GB/s, losing 600MB/s.
                // JRW: original was 220MB/s to 80MB/s on my old system

                checksum += bytes[i];
                // instead of adding lots of 8bit numbers we add 64bit numbers and I keep my 220MB/s
            }
            bytesRead += segmentSize;
        }

        // Moving this out here rather than a condition in the loop above 
        // reduced branch mispredictions.
        if (remainder > 0)
        {
            input.read((char*)bytes, remainder);
            // Note: any trailing remainder % sizeof(__int64) bytes are not summed here.
            for (int i=0; i<remainder/sizeof(__int64); i++)
            {
                checksum += bytes[i];
            }
            bytesRead += remainder;
        }

        input.close();
        file.close(); // Unmap the file.

        if (checksum == 43089565243)
        {
            cout << "Checksum passed" << endl;
        }
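
One thing to keep in mind with this approach: adding 64-bit words does not give the same value as adding the individual bytes, so the comparison against the original expected checksum would generally no longer pass. If the byte-wise total matters, a possible middle ground is to read 64-bit words but fold each one into the sum of its eight bytes. The sketch below is an illustration under assumptions: the names `sumBytesInWord` and `byteSumViaWords` are hypothetical, bytes are treated as unsigned (the original sums plain `char`, which is signed on many compilers, so results differ for bytes of 0x80 and above), and it has not been benchmarked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: sum of the eight unsigned bytes packed in one word.
inline std::uint64_t sumBytesInWord(std::uint64_t w)
{
    const std::uint64_t m8  = 0x00FF00FF00FF00FFULL;
    const std::uint64_t m16 = 0x0000FFFF0000FFFFULL;
    w = (w & m8) + ((w >> 8) & m8);          // Four 16-bit partial sums.
    w = (w & m16) + ((w >> 16) & m16);       // Two 32-bit partial sums.
    return (w & 0xFFFFFFFFULL) + (w >> 32);  // Final total, at most 8 * 255.
}

// Hypothetical helper: unsigned byte-wise sum, reading 8 bytes at a time.
std::uint64_t byteSumViaWords(const char* bytes, std::size_t size)
{
    std::uint64_t sum = 0;
    std::size_t i = 0;
    for (; i + 8 <= size; i += 8)
    {
        std::uint64_t w;
        std::memcpy(&w, bytes + i, sizeof(w));
        sum += sumBytesInWord(w);
    }
    // Trailing bytes that do not fill a whole word.
    for (; i < size; ++i)
    {
        sum += static_cast<unsigned char>(bytes[i]);
    }
    return sum;
}
```

Because the fold adds all eight bytes regardless of their position in the word, the result does not depend on the machine's byte order.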
