Last active
November 29, 2018 13:37
Sort scalability on modern cpus
We don't need core counts of thirty
If we win by playing dirty!
Let's test Linux sort on a Coffee Lake generation processor:
System: Ubuntu 16, 6-core i5-8500, 16 GB of DDR4, Nvidia GTX 1080
Test file: 760 MB text file of tab-separated numeric and text fields from the TPC-H benchmark
The sort program is run repeatedly, so the source file is read from cache, and the results are written
to /dev/null, so we are comparing CPU sort performance and not disk performance.
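
The measurement described above could be scripted roughly as follows. This is only a sketch: the source does not give the exact command line or say how the core count was restricted, so the use of GNU sort's `--parallel` option, the helper names `build_sort_cmd` and `time_sort`, and the file name `data.tsv` are all assumptions made for illustration.

```python
import subprocess
import time


def build_sort_cmd(path, field, numeric=False, threads=1):
    """Build a GNU sort command that sorts a tab-separated file on one field.

    Assumption: the thread count is limited with --parallel; the original
    benchmark may have restricted cores some other way.
    """
    key = f"-k{field},{field}" + ("n" if numeric else "")
    return ["sort", f"--parallel={threads}", "-t", "\t", key, path]


def time_sort(path, field, numeric=False, threads=1, runs=3):
    """Run the sort several times and return the best wall-clock time in seconds.

    Output goes to /dev/null so disk writes are not measured; repeated runs
    let the input file be served from the page cache.
    """
    cmd = build_sort_cmd(path, field, numeric, threads)
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best
```

For example, `time_sort("data.tsv", 16, threads=6)` would time the text-field sort and `time_sort("data.tsv", 3, numeric=True, threads=6)` the numeric one, given a suitable input file.
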
Sorting on the 16th field (variable-length text):
Cores used             :    6     5     4     3     2     1
Running time (seconds) :  9.1   8.9   9.1  13.4  13.8  23.6
Sorting on the 3rd field (numeric):
Cores used             :    6     5     4     3     2     1
Running time (seconds) :  3.3   3.4   3.4   4.8   4.9   7.9
As we see, the sort doesn't scale very well: going from one core to six gives only about a 2.6x speed-up
on the text field and 2.4x on the numeric field, with essentially no gain beyond four cores.
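
To make the scaling concrete, the speed-up and parallel efficiency can be computed directly from the timings in the tables above. The helper name `scaling` is hypothetical; the numbers are the ones reported in this file.

```python
CORES = [6, 5, 4, 3, 2, 1]
TEXT_SECONDS = [9.1, 8.9, 9.1, 13.4, 13.8, 23.6]
NUMERIC_SECONDS = [3.3, 3.4, 3.4, 4.8, 4.9, 7.9]


def scaling(cores, seconds):
    """Map core count -> (speed-up over 1 core, efficiency = speed-up / cores)."""
    base = seconds[cores.index(1)]  # single-core running time
    return {c: (base / t, base / t / c) for c, t in zip(cores, seconds)}


if __name__ == "__main__":
    for name, secs in (("text", TEXT_SECONDS), ("numeric", NUMERIC_SECONDS)):
        for c, (speedup, eff) in sorted(scaling(CORES, secs).items()):
            print(f"{name:8s} cores={c} speed-up={speedup:.2f} efficiency={eff:.2f}")
```

Perfect scaling would give an efficiency of 1.0 at every core count; here it falls to roughly 0.4 at six cores for both sorts.
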
Let's see the numbers when using a GPU. Here is a simple program that reads the file into GPU memory,
sorts on the specified fields, copies the file back into main memory and writes it to disk.
For sorting, the program uses partitioning that exactly load-balances the work across GPU threads.
For this benchmark the results are not written to a file, to match the CPU tests.
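
The text above does not say which partitioning scheme nvSort uses. One well-known scheme with this "exactly balanced" property is merge path partitioning, sketched below in plain Python purely as an illustration of the idea; it is an assumption, not the author's code, and the names `merge_path_split` and `partitioned_merge` are hypothetical. When merging two sorted runs, a binary search along each diagonal finds where to cut both inputs so that every worker produces the same number of output elements.

```python
import heapq


def merge_path_split(a, b, diag):
    """Return i such that a[:i] and b[:diag - i] hold the first `diag` merged items."""
    lo = max(0, diag - len(b))
    hi = min(diag, len(a))
    while lo < hi:
        mid = (lo + hi) // 2
        # If a[mid] sorts no later than its counterpart in b, take more from a.
        if a[mid] <= b[diag - 1 - mid]:
            lo = mid + 1
        else:
            hi = mid
    return lo


def partitioned_merge(a, b, workers):
    """Merge sorted lists a and b in `workers` equal-sized, independent pieces.

    The loop runs sequentially here; on a GPU each iteration would be one
    thread (or thread block) working on its own slice.
    """
    total = len(a) + len(b)
    chunk = -(-total // workers)  # ceiling division
    merged, sizes = [], []
    for w in range(workers):
        d0 = min(w * chunk, total)
        d1 = min(d0 + chunk, total)
        i0, i1 = merge_path_split(a, b, d0), merge_path_split(a, b, d1)
        piece = list(heapq.merge(a[i0:i1], b[d0 - i0:d1 - i1]))
        merged.extend(piece)
        sizes.append(len(piece))
    return merged, sizes
```

Because the cut points depend only on output positions, no worker gets more than `chunk` elements regardless of how the keys are distributed between the two inputs.
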
Running time (seconds) for the text field sort: 1.5 (copying the data and sorting take 0.79 seconds;
the rest is GPU initialization time).
Running time (seconds) for the numeric field sort: 0.8 (copying the data and sorting take 0.4 seconds).
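
As a quick check on the figures above, the arithmetic below derives the time not spent on copying and sorting, and the speed-up over the best CPU time from the earlier tables. The helper name `gpu_summary` is hypothetical. For the numeric sort, the source gives only the copy-and-sort share, so treating the remainder as initialization is an assumption.

```python
CPU_TEXT_SECONDS = [9.1, 8.9, 9.1, 13.4, 13.8, 23.6]
CPU_NUMERIC_SECONDS = [3.3, 3.4, 3.4, 4.8, 4.9, 7.9]


def gpu_summary(gpu_total, gpu_copy_sort, cpu_seconds):
    """Return (time outside copy+sort, speed-up over the best CPU time)."""
    overhead = gpu_total - gpu_copy_sort
    speedup = min(cpu_seconds) / gpu_total
    return overhead, speedup


if __name__ == "__main__":
    for name, total, work, cpu in (
        ("text", 1.5, 0.79, CPU_TEXT_SECONDS),
        ("numeric", 0.8, 0.4, CPU_NUMERIC_SECONDS),
    ):
        overhead, speedup = gpu_summary(total, work, cpu)
        print(f"{name:8s} overhead={overhead:.2f}s speed-up vs best CPU={speedup:.1f}x")
```
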
My GPU sort program: https://github.com/antonmks/nvSort
nvSort is not ready for production; it was written just for the purpose of this benchmark
and has not been tested on any other files.