Looking for an efficient pure GO approach to copy repeating patterns into a slice, for a toy project, I ran a few tests and discovered a neat approach to significantly improve performance. For the toy project, I am using this to fill a background buffer with a specific RGB color pattern, so improving this performance significantly improved my acheivable framerate.
All the test were run with a buffer of 73437 bytes, allocated as follows
var bigSlice = make([]byte, 73437, 73437)
for j := 0; j < len(bigSlice); j++ {
bigSlice[j] = 65
}
Name | Executions | Time/Op | Bytes/Op | Allocs/Op |
---|---|---|---|---|
Benchmark_FillsliceIndex-4 | 24945 | 45540 ns/op | 0 B/op | 0 allocs/op |
for j := range bigSlice {
bigSlice[j] = 66
}
Name | Executions | Time/Op | Bytes/Op | Allocs/Op |
---|---|---|---|---|
Benchmark_FillsliceRange-4 | 35086 | 34316 ns/op | 0 B/op | 0 allocs/op |
Fill the slice using the builtin copy function to incrementally duplicate the data through the array.
// Preload the first value into the array/slice
bigSlice[0] = 67
// Incrementally duplicate the value into the rest of the container
for j := 1; j < len(bigSlice); j *= 2 {
copy(bigSlice[j:], bigSlice[:j])
}
Name | Executions | Time/Op | Bytes/Op | Allocs/Op |
---|---|---|---|---|
Benchmark_FillsliceCopyTrick-4 | 749976 | 1579 ns/op | 0 B/op | 0 allocs/op |
// Define the pattern
pattern := []byte{0xde, 0xad, 0xbe, 0xef}
// Copy the pattern into the start of the container
copy(bigSlice, pattern)
// Incrementally duplicate the pattern throughout the container
for j := len(pattern); j < len(bigSlice); j *= 2 {
copy(bigSlice[j:], bigSlice[:j])
}
Name | Executions | Time/Op | Bytes/Op | Allocs/Op |
---|---|---|---|---|
Benchmark_FillslicePatternCopyTrick-4 | 798944 | 1563 ns/op | 0 B/op | 0 allocs/op |
Name | Executions | Time/Op | Bytes/Op | Allocs/Op |
---|---|---|---|---|
Benchmark_FillsliceIndex-4 | 24945 | 45540 ns/op | 0 B/op | 0 allocs/op |
Benchmark_FillsliceRange-4 | 35086 | 34316 ns/op | 0 B/op | 0 allocs/op |
Benchmark_FillsliceCopyTrick-4 | 749976 | 1579 ns/op | 0 B/op | 0 allocs/op |
Benchmark_FillslicePatternCopyTrick-4 | 798944 | 1563 ns/op | 0 B/op | 0 allocs/op |
Using the builtin copy function avoids a lot of the overhead indexing and bounds checking each element of the container. Each call to copy will copy double the amount of data into the array, further amortizing the cost for each succesive copy. In addition, and I have not looked at the actual implementation, but presumably the builtin copy function will perform block copies for the large spans.
Since the copy function will will not copy beyond the capacity of the target array/slice, the code avoids doing that bounds check for the final copy and lets the copy function handle the edge case. Since we are just copying existing data, there are no over allocations etc.
Here is a crude visualization of what is happening
['A'] seed value loaded at index 0
Current Data | Action | New Data |
---|---|---|
['A'] | copy to index 0 to index 1 | ['A', 'A'] |
['A', 'A'] | copy the first 2 elements to index 2 | ['A', 'A', 'A', 'A'] |
['A', 'A', 'A', 'A'] | copy the first 4 elements to index 4 | ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'] |
Each iteration doubles the amount of work carried out per call to copy.
If i didn't get it wrong, what it does is basically "Ctrl-ACVVACVVACVVACVVACVVACVV..."
lmao