noureddin/performance-1d-vs-2d-arrays.md

## performance-1d-vs-2d-arrays.md

      
    Raw
  

              performance-1d-vs-2d-arrays.md
            
          
    Dynamically allocated arrays: 1D vs 2D

Summary: In C++, implementing a 2D array as a 1D array can provide about 125% to 130%  performance gain. It's not much if your application is not performance critical, but the handling code of 1D arrays is still far simpler than that of 2D arrays.
Overview

Let us write a small C++ class for matrices, with multiplying and printing functions. And let us benchmark it to see how can we make it most performant. Obviously we need to choose a performant algorithm; e.g., using Strassen algorithm is far better than multiplying matrices the naïve way. But I will do it the naïve way just to focus on one aspect only, namely, the performance impact of using 2D vs 1D dynamically allocated arrays.
I will compile using GCC (g++) with the highest level of optimization (-O3). I am using GCC 5.4.0.
I do the benchmarking using the function-like macro time_test from this header file, called timetest.h:
#include <stdio.h>    // printf
#include <sys/time.h> // everything else

// https://stackoverflow.com/a/1861493
typedef unsigned long long timestamp_t;
static timestamp_t get_timestamp() {
    struct timeval now;  gettimeofday (&now, NULL);
    return now.tv_usec + (timestamp_t)now.tv_sec * 1000000;
}

#define time_test(opname, n, ...) ({                \
        long double time = 0;                       \
        for (unsigned i = n; i; --i) {              \
            timestamp_t t0 = get_timestamp();       \
            __VA_ARGS__                             \
            timestamp_t t1 = get_timestamp();       \
            time += (t1 - t0) / 1000000.0L;         \
        }                                           \
        time /= n;                                  \
        fprintf(stderr, "%s, repeated %u times, "   \
                        "takes %Lf on average\n",   \
                opname, n, time);                   \
    })

A C++ matrix class with 1D dynamically allocated array

matrix1.hpp:
#ifndef _MATRIX1_HPP_
#define _MATRIX1_HPP_ 1
#include <cstdio>   // std::printf
#include <cstddef>  // size_t

class mat {
    public:
        mat(size_t nrow, size_t ncol, double fillvalue);
        mat();
        ~mat();
        mat& operator=(const mat& rval);
        void mul(const mat& a, const mat& b);
        void print() const;
        double& operator()(size_t i, size_t j);
        const double& operator()(size_t i, size_t j) const;
    private:
        size_t nrow, ncol;
        double* vals;
};

mat::mat(size_t nrow, size_t ncol, double fillvalue)
{
    this->nrow = nrow;
    this->ncol = ncol;
    this->vals = new double [nrow*ncol];
    for (size_t i = 0; i < nrow * ncol; ++i)
        this->vals[i] = fillvalue;
}

mat::mat()
{
    nrow = 0;
    ncol = 0;
    vals = 0;
}

mat::~mat()
{
    delete[] vals;
    vals = 0;
}

mat& mat::operator=(const mat& rval)
{
    delete[] vals;
    this->nrow = rval.nrow;
    this->ncol = rval.ncol;
    this->vals = new double [nrow*ncol];
    for (size_t i = 0; i < nrow * ncol; ++i)
        this->vals[i] = rval.vals[i];
    return *this;
}

void mat::mul(const mat& a, const mat& b)
{
    if (a.ncol != b.nrow) {
        *this = mat();
        return;
    }
    *this = mat(a.nrow, b.ncol, 0);
    for (size_t i = 0; i < this->nrow; ++i)
        for (size_t j = 0; j < this->ncol; ++j)
            for (size_t k = 0; k < a.ncol; ++k)
                (*this)(i, j) += a(i, k) * b(k, j);
    return;
}

void mat::print() const
{
    for (size_t i = 0; i < nrow; ++i, puts(""))
        for (size_t j = 0; j < ncol; ++j)
            std::printf("%g\t", (*this)(i,j));
}

double& mat::operator()(size_t i, size_t j)
{ return vals[j + i*ncol]; }

const double& mat::operator()(size_t i, size_t j) const
{ return vals[j + i*ncol]; }

#endif  // ! _MATRIX1_HPP_
main1.cpp:
#include "matrix1.hpp"
#include "timetest.h"

int main()
{
    time_test("1D dynamic arrays", 5,
        mat a(10, 1000000, 2);
        mat b(1000000, 10, 3);
        mat r; r.mul(a, b);
        r.print();  // prints 10x10 matrix of 6e6
    );
}
Now compile and run with the following command:
g++ -o 1 main1.cpp -O3 && ./1 > /dev/null
The result on my machine:
1D dynamic arrays, repeated 5 times, takes 0.870031 on average

A C++ matrix class with 2D dynamically allocated array

We'll create matrix2.h and main2.c as the following.
matrix2.hpp:
#ifndef _MATRIX2_HPP_
#define _MATRIX2_HPP_ 1
#include <cstdio>   // std::printf
#include <cstddef>  // size_t

class mat {
    public:
        mat(size_t nrow, size_t ncol, double fillvalue);
        mat();
        ~mat();
        mat& operator=(const mat& rval);
        void mul(const mat& a, const mat& b);
        void print() const;
        double& operator()(size_t i, size_t j);
        const double& operator()(size_t i, size_t j) const;
    private:
        size_t nrow, ncol;
        double** vals;
};

mat::mat(size_t nrow, size_t ncol, double fillvalue)
{
    this->nrow = nrow;
    this->ncol = ncol;
    this->vals = new double* [nrow];
    for (size_t i = 0; i < nrow; ++i) {
        this->vals[i] = new double[ncol];
        for (size_t j = 0; j < ncol; ++j)
            this->vals[i][j] = fillvalue;
    }
}

mat::mat()
{
    nrow = 0;
    ncol = 0;
    vals = 0;
}

mat::~mat()
{
    for (size_t i = 0; i < nrow; ++i)
        delete[] vals[i];
    delete[] vals;
    vals = 0;
}

mat& mat::operator=(const mat& rval)
{
    for (size_t i = 0; i < nrow; ++i)
        delete[] vals[i];
    delete[] vals;
    this->nrow = rval.nrow;
    this->ncol = rval.ncol;
    this->vals = new double* [nrow];
    for (size_t i = 0; i < nrow; ++i) {
        this->vals[i] = new double[ncol];
        for (size_t j = 0; j < nrow; ++j)
            this->vals[i][j] = rval.vals[i][j];
    }
    return *this;
}

void mat::mul(const mat& a, const mat& b)
{
    if (a.ncol != b.nrow) {
        *this = mat();
        return;
    }
    *this = mat(a.nrow, b.ncol, 0);
    for (size_t i = 0; i < this->nrow; ++i)
        for (size_t j = 0; j < this->ncol; ++j)
            for (size_t k = 0; k < a.ncol; ++k)
                (*this)(i, j) += a(i, k) * b(k, j);
    return;
}

void mat::print() const
{
    for (size_t i = 0; i < nrow; ++i, puts(""))
        for (size_t j = 0; j < ncol; ++j)
            std::printf("%g\t", (*this)(i,j));
}

double& mat::operator()(size_t i, size_t j)
{ return vals[i][j]; }

const double& mat::operator()(size_t i, size_t j) const
{ return vals[i][j]; }

#endif  // ! _MATRIX2_HPP_
main2.cpp:
#include "matrix2.hpp"
#include "timetest.h"

int main()
{
    time_test("2D dynamic arrays", 5,
        mat a(10, 1000000, 2);
        mat b(1000000, 10, 3);
        mat r; r.mul(a, b);
        r.print();  // prints 10x10 matrix of 6e6
    );
}
Now compile and run with the following command:
g++ -o 2 main2.cpp -O3 && ./2 > /dev/null
The result on my machine:
2D dynamic arrays, repeated 5 times, takes 1.169311 on average

More complicated code, with less performance!
Conclusion

If you don't have a really good reason to use a 2D array, prefer a 1D. Notice that it's all about the inner implementation; mat::mul() and mat::print() weren't affected by the change between the two; they still see the data as 2D.
See also: C++ Two Dimensional std::vector best practices.
See also: Performance of C functions: separate implementation and interface, or implementation in header?.