Skip to content

Instantly share code, notes, and snippets.

@bfraboni
Last active April 14, 2024 21:07
Show Gist options
  • Star 15 You must be signed in to star a gist
  • Fork 5 You must be signed in to fork a gist
  • Save bfraboni/946d9456b15cac3170514307cf032a27 to your computer and use it in GitHub Desktop.
Save bfraboni/946d9456b15cac3170514307cf032a27 to your computer and use it in GitHub Desktop.
C++ implementation of a fast Gaussian blur algorithm by Ivan Kutskir - Integer and Floating point version
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Floating point version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <chrono>
//!
//! \fn void std_to_box(int boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur(float * in, float * out, int w, int h, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void horizontal_blur(float * in, float * out, int w, int h, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w, li = ti, ri = ti+r;
float fv = in[ti], lv = in[ti+w-1], val = (r+1)*fv;
for(int j=0; j<r; j++) val += in[ti+j];
for(int j=0 ; j<=r ; j++) { val += in[ri++] - fv ; out[ti++] = val*iarr; }
for(int j=r+1; j<w-r; j++) { val += in[ri++] - in[li++]; out[ti++] = val*iarr; }
for(int j=w-r; j<w ; j++) { val += lv - in[li++]; out[ti++] = val*iarr; }
}
}
//!
//! \fn void total_blur(float * in, float * out, int w, int h, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void total_blur(float * in, float * out, int w, int h, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i, li = ti, ri = ti+r*w;
float fv = in[ti], lv = in[ti+w*(h-1)], val = (r+1)*fv;
for(int j=0; j<r; j++) val += in[ti+j*w];
for(int j=0 ; j<=r ; j++) { val += in[ri] - fv ; out[ti] = val*iarr; ri+=w; ti+=w; }
for(int j=r+1; j<h-r; j++) { val += in[ri] - in[li]; out[ti] = val*iarr; li+=w; ri+=w; ti+=w; }
for(int j=h-r; j<h ; j++) { val += lv - in[li]; out[ti] = val*iarr; li+=w; ti+=w; }
}
}
//!
//! \fn void box_blur(float * in, float * out, int w, int h, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void box_blur(float *& in, float *& out, int w, int h, int r)
{
std::swap(in, out);
horizontal_blur(out, in, w, h, r);
total_blur(in, out, w, h, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur(float * in, float * out, int w, int h, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void fast_gaussian_blur(float *& in, float *& out, int w, int h, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur(in, out, w, h, boxes[0]);
box_blur(out, in, w, h, boxes[1]);
box_blur(in, out, w, h, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// Image loading
int width, height, channels;
unsigned char * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height;
// output channels r,g,b
float * newb = new float[size];
float * newg = new float[size];
float * newr = new float[size];
// input channels r,g,b
float * oldb = new float[size];
float * oldg = new float[size];
float * oldr = new float[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
{
oldb[i] = image_data[channels * i + 0] / 255.f;
oldg[i] = image_data[channels * i + 1] / 255.f;
oldr[i] = image_data[channels * i + 2] / 255.f;
}
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur(oldb, newb, width, height, sigma);
fast_gaussian_blur(oldg, newg, width, height, sigma);
fast_gaussian_blur(oldr, newr, width, height, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
{
image_data[channels * i + 0] = (unsigned char) std::min(255.f, std::max(0.f, 255.f * newb[i]));
image_data[channels * i + 1] = (unsigned char) std::min(255.f, std::max(0.f, 255.f * newg[i]));
image_data[channels * i + 2] = (unsigned char) std::min(255.f, std::max(0.f, 255.f * newr[i]));
}
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] newr;
delete[] newb;
delete[] newg;
delete[] oldr;
delete[] oldb;
delete[] oldg;
image_data = stbi_load(argv[1], &width, &height, &channels, 0);
return 0;
}
//! \endcode
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Floating point single precision version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <chrono>
//!
//! \fn void std_to_box(int boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur_rgb(float * in, float * out, int w, int h, int c, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void horizontal_blur_rgb(float * in, float * out, int w, int h, int c, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w;
int li = ti;
int ri = ti+r;
float fv[3] = { in[ti*c+0], in[ti*c+1], in[ti*c+2] };
float lv[3] = { in[(ti+w-1)*c+0], in[(ti+w-1)*c+1], in[(ti+w-1)*c+2] };
float val[3] = { (r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j)*c+0];
val[1] += in[(ti+j)*c+1];
val[2] += in[(ti+j)*c+2];
}
for(int j=0; j<=r; j++, ri++, ti++)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=r+1; j<w-r; j++, ri++, ti++, li++)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=w-r; j<w; j++, ti++, li++)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
}
}
//!
//! \fn void total_blur_rgb(float * in, float * out, int w, int h, int c, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void total_blur_rgb(float * in, float * out, int w, int h, int c, int r)
{
// radius range on either side of a pixel + the pixel itself
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i;
int li = ti;
int ri = ti+r*w;
float fv[3] = {in[ti*c+0], in[ti*c+1], in[ti*c+2] };
float lv[3] = {in[(ti+w*(h-1))*c+0], in[(ti+w*(h-1))*c+1], in[(ti+w*(h-1))*c+2] };
float val[3] = {(r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j*w)*c+0];
val[1] += in[(ti+j*w)*c+1];
val[2] += in[(ti+j*w)*c+2];
}
for(int j=0; j<=r; j++, ri+=w, ti+=w)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=r+1; j<h-r; j++, ri+=w, ti+=w, li+=w)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=h-r; j<h; j++, ti+=w, li+=w)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
}
}
//!
//! \fn void box_blur_rgb(float * in, float * out, int w, int h, int c, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void box_blur_rgb(float *& in, float *& out, int w, int h, int c, int r)
{
std::swap(in, out);
horizontal_blur_rgb(out, in, w, h, c, r);
total_blur_rgb(in, out, w, h, c, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur_rgb(float * in, float * out, int w, int h, int c, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] sigma gaussian std dev
//!
void fast_gaussian_blur_rgb(float *& in, float *& out, int w, int h, int c, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur_rgb(in, out, w, h, c, boxes[0]);
box_blur_rgb(out, in, w, h, c, boxes[1]);
box_blur_rgb(in, out, w, h, c, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// image loading
int width, height, channels;
unsigned char * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height * channels;
// output channels r,g,b
float * new_image = new float[size];
float * old_image = new float[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
old_image[i] = image_data[i] / 255.f;
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur_rgb(old_image, new_image, width, height, channels, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
image_data[i] = (unsigned char) std::min(255.f, std::max(0.f, 255.f * new_image[i]));
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] new_image;
delete[] old_image;
return 0;
}
//! \endcode
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Integer version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <chrono>
//!
//! \fn void std_to_box(int boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur(int * in, int * out, int w,int h, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void horizontal_blur(int * in, int * out, int w, int h, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w, li = ti, ri = ti+r, fv = in[ti], lv = in[ti+w-1], val = (r+1)*fv;
for(int j=0; j<r; j++) { val += in[ti+j]; }
for(int j=0 ; j<=r ; j++) { val += in[ri++] - fv ; out[ti++] = std::round(val*iarr); }
for(int j=r+1; j<w-r; j++) { val += in[ri++] - in[li++]; out[ti++] = std::round(val*iarr); }
for(int j=w-r; j<w ; j++) { val += lv - in[li++]; out[ti++] = std::round(val*iarr); }
}
}
//!
//! \fn void total_blur(int * in, int * out, int w, int h, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void total_blur(int * in, int * out, int w, int h, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i, li = ti, ri = ti+r*w, fv = in[ti], lv = in[ti+w*(h-1)], val = (r+1)*fv;
for(int j=0; j<r; j++) { val += in[ti+j*w]; }
for(int j=0 ; j<=r ; j++) { val += in[ri] - fv ; out[ti] = std::round(val*iarr); ri+=w; ti+=w; }
for(int j=r+1; j<h-r; j++) { val += in[ri] - in[li]; out[ti] = std::round(val*iarr); li+=w; ri+=w; ti+=w; }
for(int j=h-r; j<h ; j++) { val += lv - in[li]; out[ti] = std::round(val*iarr); li+=w; ti+=w; }
}
}
//!
//! \fn void box_blur(int * in, int * out, int w, int h, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
void box_blur(int *& in, int *& out, int w, int h, int r)
{
std::swap(in, out);
horizontal_blur(out, in, w, h, r);
total_blur(in, out, w, h, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur(int * in, int * out, int w, int h, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] sigma gaussian std dev
//!
void fast_gaussian_blur(int *& in, int *& out, int w, int h, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur(in, out, w, h, boxes[0]);
box_blur(out, in, w, h, boxes[1]);
box_blur(in, out, w, h, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// image loading
int width, height, channels;
unsigned char * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height;
// output channels r,g,b
int * newb = new int[size];
int * newg = new int[size];
int * newr = new int[size];
// input channels r,g,b
int * oldb = new int[size];
int * oldg = new int[size];
int * oldr = new int[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
{
oldb[i] = image_data[channels * i + 0];
oldg[i] = image_data[channels * i + 1];
oldr[i] = image_data[channels * i + 2];
}
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur(oldb, newb, width, height, sigma);
fast_gaussian_blur(oldg, newg, width, height, sigma);
fast_gaussian_blur(oldr, newr, width, height, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
{
image_data[channels * i + 0] = (unsigned char) std::min(255, std::max(0, newb[i]));
image_data[channels * i + 1] = (unsigned char) std::min(255, std::max(0, newg[i]));
image_data[channels * i + 2] = (unsigned char) std::min(255, std::max(0, newr[i]));
}
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] newr;
delete[] newb;
delete[] newg;
delete[] oldr;
delete[] oldb;
delete[] oldg;
return 0;
}
//! \endcode
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Integer version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <chrono>
//!
//! \fn void std_to_box(float boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur_rgb(int * in, int * out, int w, int h, int c, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void horizontal_blur_rgb(int * in, int * out, int w, int h, int c, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w;
int li = ti;
int ri = ti+r;
int fv[3] = { in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = { in[(ti+w-1)*c+0], in[(ti+w-1)*c+1], in[(ti+w-1)*c+2] };
int val[3] = { (r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j)*c+0];
val[1] += in[(ti+j)*c+1];
val[2] += in[(ti+j)*c+2];
}
for(int j=0; j<=r; j++, ri++, ti++)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=r+1; j<w-r; j++, ri++, ti++, li++)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=w-r; j<w; j++, ti++, li++)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
}
}
//!
//! \fn void total_blur_rgb(int * in, int * out, int w, int h, int c, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void total_blur_rgb(int * in, int * out, int w, int h, int c, int r)
{
// radius range on either side of a pixel + the pixel itself
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i;
int li = ti;
int ri = ti+r*w;
int fv[3] = {in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = {in[(ti+w*(h-1))*c+0], in[(ti+w*(h-1))*c+1], in[(ti+w*(h-1))*c+2] };
int val[3] = {(r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j*w)*c+0];
val[1] += in[(ti+j*w)*c+1];
val[2] += in[(ti+j*w)*c+2];
}
for(int j=0; j<=r; j++, ri+=w, ti+=w)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=r+1; j<h-r; j++, ri+=w, ti+=w, li+=w)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=h-r; j<h; j++, ti+=w, li+=w)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
}
}
//!
//! \fn void box_blur_rgb(int * in, int * out, int w, int h, int c, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void box_blur_rgb(int *& in, int *& out, int w, int h, int c, int r)
{
std::swap(in, out);
horizontal_blur_rgb(out, in, w, h, c, r);
total_blur_rgb(in, out, w, h, c, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur_rgb(int * in, int * out, int w, int h, int c, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] sigma gaussian std dev
//!
void fast_gaussian_blur_rgb(int *& in, int *& out, int w, int h, int c, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur_rgb(in, out, w, h, c, boxes[0]);
box_blur_rgb(out, in, w, h, c, boxes[1]);
box_blur_rgb(in, out, w, h, c, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// image loading
int width, height, channels;
unsigned char * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height * channels;
// output channels r,g,b
int * new_image = new int[size];
int * old_image = new int[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
old_image[i] = image_data[i];
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur_rgb(old_image, new_image, width, height, channels, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
image_data[i] = (unsigned char) std::min(255, std::max(0, new_image[i]));
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] new_image;
delete[] old_image;
return 0;
}
//! \endcode
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Unsigned char version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <cstring>
#include <chrono>
typedef unsigned char uchar;
//!
//! \fn void std_to_box(float boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void horizontal_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w;
int li = ti;
int ri = ti+r;
int fv[3] = { in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = { in[(ti+w-1)*c+0], in[(ti+w-1)*c+1], in[(ti+w-1)*c+2] };
int val[3] = { (r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j)*c+0];
val[1] += in[(ti+j)*c+1];
val[2] += in[(ti+j)*c+2];
}
for(int j=0; j<=r; j++, ri++, ti++)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=r+1; j<w-r; j++, ri++, ti++, li++)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=w-r; j<w; j++, ti++, li++)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
}
}
//!
//! \fn void total_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void total_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
{
// radius range on either side of a pixel + the pixel itself
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i;
int li = ti;
int ri = ti+r*w;
int fv[3] = {in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = {in[(ti+w*(h-1))*c+0], in[(ti+w*(h-1))*c+1], in[(ti+w*(h-1))*c+2] };
int val[3] = {(r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j*w)*c+0];
val[1] += in[(ti+j*w)*c+1];
val[2] += in[(ti+j*w)*c+2];
}
for(int j=0; j<=r; j++, ri+=w, ti+=w)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=r+1; j<h-r; j++, ri+=w, ti+=w, li+=w)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
for(int j=h-r; j<h; j++, ti+=w, li+=w)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = std::round(val[0]*iarr);
out[ti*c+1] = std::round(val[1]*iarr);
out[ti*c+2] = std::round(val[2]*iarr);
}
}
}
//!
//! \fn void box_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void box_blur_rgb(uchar *& in, uchar *& out, int w, int h, int c, int r)
{
std::swap(in, out);
horizontal_blur_rgb(out, in, w, h, c, r);
total_blur_rgb(in, out, w, h, c, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur_rgb(uchar * in, uchar * out, int w, int h, int c, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] sigma gaussian std dev
//!
void fast_gaussian_blur_rgb(uchar *& in, uchar *& out, int w, int h, int c, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur_rgb(in, out, w, h, c, boxes[0]);
box_blur_rgb(out, in, w, h, c, boxes[1]);
box_blur_rgb(in, out, w, h, c, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// image loading
int width, height, channels;
uchar * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height * channels;
// output channels r,g,b
uchar * new_image = new uchar[size];
uchar * old_image = new uchar[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
old_image[i] = image_data[i];
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur_rgb(old_image, new_image, width, height, channels, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
image_data[i] = (uchar) std::min((uchar)255, std::max((uchar)0, new_image[i]));
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] new_image;
delete[] old_image;
return 0;
}
//! \endcode
// Copyright (C) 2017 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
//!
//! \file blur.cpp
//! \author Basile Fraboni
//! \date 2017
//!
//! \brief The software is a C++ implementation of a fast
//! Gaussian blur algorithm by Ivan Kutskir. For further details
//! please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! Unsigned char version
//!
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cmath>
#include <cstring>
#include <chrono>
typedef unsigned char uchar;
//!
//! \fn void std_to_box(float boxes[], float sigma, int n)
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into dimensions of boxes for box blur. For
//! further details please refer to :
//! https://www.peterkovesi.com/matlabfns/#integral
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes boxes dimensions
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of boxes
//!
void std_to_box(int boxes[], float sigma, int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = std::floor(wi);
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = std::round(mi);
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \fn void horizontal_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs the horizontal blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void horizontal_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w;
int li = ti;
int ri = ti+r;
int fv[3] = { in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = { in[(ti+w-1)*c+0], in[(ti+w-1)*c+1], in[(ti+w-1)*c+2] };
int val[3] = { (r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j)*c+0];
val[1] += in[(ti+j)*c+1];
val[2] += in[(ti+j)*c+2];
}
for(int j=0; j<=r; j++, ri++, ti++)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=r+1; j<w-r; j++, ri++, ti++, li++)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=w-r; j<w; j++, ti++, li++)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
}
}
//!
//! \fn void total_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs the total blur pass for box blur.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void total_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
{
// radius range on either side of a pixel + the pixel itself
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<w; i++)
{
int ti = i;
int li = ti;
int ri = ti+r*w;
int fv[3] = {in[ti*c+0], in[ti*c+1], in[ti*c+2] };
int lv[3] = {in[(ti+w*(h-1))*c+0], in[(ti+w*(h-1))*c+1], in[(ti+w*(h-1))*c+2] };
int val[3] = {(r+1)*fv[0], (r+1)*fv[1], (r+1)*fv[2] };
for(int j=0; j<r; j++)
{
val[0] += in[(ti+j*w)*c+0];
val[1] += in[(ti+j*w)*c+1];
val[2] += in[(ti+j*w)*c+2];
}
for(int j=0; j<=r; j++, ri+=w, ti+=w)
{
val[0] += in[ri*c+0] - fv[0];
val[1] += in[ri*c+1] - fv[1];
val[2] += in[ri*c+2] - fv[2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=r+1; j<h-r; j++, ri+=w, ti+=w, li+=w)
{
val[0] += in[ri*c+0] - in[li*c+0];
val[1] += in[ri*c+1] - in[li*c+1];
val[2] += in[ri*c+2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
for(int j=h-r; j<h; j++, ti+=w, li+=w)
{
val[0] += lv[0] - in[li*c+0];
val[1] += lv[1] - in[li*c+1];
val[2] += lv[2] - in[li*c+2];
out[ti*c+0] = val[0]*iarr;
out[ti*c+1] = val[1]*iarr;
out[ti*c+2] = val[2]*iarr;
}
}
}
//!
//! \fn void box_blur_rgb(uchar * in, uchar * out, int w, int h, int c, int r)
//!
//! \brief this function performs a box blur pass.
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
void box_blur_rgb(uchar *& in, uchar *& out, int w, int h, int c, int r)
{
std::swap(in, out);
horizontal_blur_rgb(out, in, w, h, c, r);
total_blur_rgb(in, out, w, h, c, r);
// Note to myself :
// here we could go anisotropic with different radiis rx,ry in HBlur and TBlur
}
//!
//! \fn void fast_gaussian_blur_rgb(uchar * in, uchar * out, int w, int h, int c, float sigma)
//!
//! \brief this function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. For further details please refer to :
//! http://blog.ivank.net/fastest-gaussian-blur.html
//!
//! \param[in,out] in source channel
//! \param[in,out] out target channel
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] sigma gaussian std dev
//!
void fast_gaussian_blur_rgb(uchar *& in, uchar *& out, int w, int h, int c, float sigma)
{
// sigma conversion to box dimensions
int boxes[3];
std_to_box(boxes, sigma, 3);
box_blur_rgb(in, out, w, h, c, boxes[0]);
box_blur_rgb(out, in, w, h, c, boxes[1]);
box_blur_rgb(in, out, w, h, c, boxes[2]);
}
//! \code{.cpp}
int main(int argc, char * argv[])
{
if( argc < 2 ) exit(1);
const char * image_file = argv[1];
const float sigma = argc > 2 ? std::atof(argv[2]) : 3.;
const char * output_file = argc > 3 ? argv[3] : "blur.png";
// image loading
int width, height, channels;
uchar * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
if(channels < 3)
{
std::cout<< "Input images must be RGB images."<<std::endl;
exit(1);
}
// copy data
int size = width * height * channels;
// output channels r,g,b
uchar * new_image = new uchar[size];
uchar * old_image = new uchar[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
old_image[i] = image_data[i];
// per channel filter
auto start = std::chrono::system_clock::now();
fast_gaussian_blur_rgb(old_image, new_image, width, height, channels, sigma);
auto end = std::chrono::system_clock::now();
// stats
float elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count();
std::cout << "time " << elapsed << "ms" << std::endl;
// channels copy r,g,b
for(int i = 0; i < size; ++i)
image_data[i] = (uchar) std::min((uchar)255, std::max((uchar)0, new_image[i]));
// save
std::string file(output_file);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(output_file, width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(output_file, width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
stbi_image_free(image_data);
// clean memory
delete[] new_image;
delete[] old_image;
return 0;
}
//! \endcode
// Copyright (C) 2017-2021 Basile Fraboni
// Copyright (C) 2014 Ivan Kutskir (for the original fast blur implmentation)
// All Rights Reserved
// You may use, distribute and modify this code under the
// terms of the MIT license. For further details please refer
// to : https://mit-license.org/
//
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"
#define STB_IMAGE_WRITE_IMPLEMENTATION
#include "stb_image_write.h"
#include <iostream>
#include <cstring>
#include <chrono>
#include <type_traits>
#include <algorithm>
#include <cmath>
#include <vector>
//!
//! \file fast_gaussian_blur_template.cpp
//! \author Basile Fraboni
//! \date 2021
//!
//! \brief This contains a C++ implementation of a fast Gaussian blur algorithm in linear time.
//! The image buffer is supposed to be of size w * h * c, with w its width, h its height, and c its number of channels.
//! The default implementation only supports up to 4 channels images, but one can easily add support for any number of channels
//! using either specific template cases or a generic function that takes the number of channels as an explicit parameter.
//! This implementation is focused on learning and readability more than on performance.
//! The fast blur algorithm is performed with several box blur passes over an image.
//! The filter converges towards a true Gaussian blur after several passes. In practice,
//! three passes are sufficient for good quality results.
//! For further details please refer to:
//!
//! http://blog.ivank.net/fastest-gaussian-blur.html
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//! https://github.com/bfraboni/FastGaussianBlur
//!
//!
//! \brief This function performs a single separable horizontal pass for box blur.
//! To complete a box blur pass we need to do this operation two times, one horizontally
//! and one vertically. Templated by buffer data type T, buffer number of channels C, and border policy P.
//! For a detailed description of border policies please refer to:
//! https://en.wikipedia.org/wiki/Kernel_(image_processing)#Edge_Handling
//!
//! \param[in] in source buffer
//! \param[in,out] out target buffer
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] r box dimension
//!
enum Policy {EXTEND, KERNEL_CROP};
template<typename T, int C, Policy P = KERNEL_CROP>
void horizontal_blur(const T * in, T * out, const int w, const int h, const int r)
{
float iarr = 1.f / (r+r+1);
#pragma omp parallel for
for(int i=0; i<h; i++)
{
int ti = i*w, li = ti, ri = ti+r;
float fv[C], lv[C], val[C];
for(int ch = 0; ch < C; ++ch)
{
fv[ch] = P == EXTEND ? in[ti*C+ch] : 0; // unused with kcrop policy
lv[ch] = P == EXTEND ? in[(ti+w-1)*C+ch] : 0; // unused with kcrop policy
val[ch] = P == EXTEND ? (r+1)*fv[ch] : 0;
}
// initial acucmulation
for(int j=0; j<r; j++)
for(int ch = 0; ch < C; ++ch)
{
val[ch] += in[(ti+j)*C+ch];
}
// left border - filter kernel is incomplete
for(int j=0; j<=r; j++, ri++, ti++)
for(int ch = 0; ch < C; ++ch)
{
val[ch] += P == EXTEND ? in[ri*C+ch] - fv[ch] : in[ri*C+ch];
out[ti*C+ch] = P == EXTEND ? val[ch]*iarr : val[ch]/(r+j+1);
}
// center of the image - filter kernel is complete
for(int j=r+1; j<w-r; j++, ri++, ti++, li++)
for(int ch = 0; ch < C; ++ch)
{
val[ch] += in[ri*C+ch] - in[li*C+ch];
out[ti*C+ch] = val[ch]*iarr;
}
// right border - filter kernel is incomplete
for(int j=w-r; j<w; j++, ti++, li++)
for(int ch = 0; ch < C; ++ch)
{
val[ch] += P == EXTEND ? lv[ch] - in[li*C+ch] : -in[li*C+ch];
out[ti*C+ch] = P == EXTEND ? val[ch]*iarr : val[ch]/(r+w-j);
}
}
}
//!
//! \brief Utility template dispatcher function for horizontal_blur. Templated by buffer data type T.
//!
//! \param[in] in source buffer
//! \param[in,out] out target buffer
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] r box dimension
//!
template<typename T>
void horizontal_blur(const T * in, T * out, const int w, const int h, const int c, const int r)
{
switch(c)
{
case 1: horizontal_blur<T,1>(in, out, w, h, r); break;
case 2: horizontal_blur<T,2>(in, out, w, h, r); break;
case 3: horizontal_blur<T,3>(in, out, w, h, r); break;
case 4: horizontal_blur<T,4>(in, out, w, h, r); break;
default: printf("%d channels is not supported yet. Add a specific case if possible or fall back to the generic version.", c); break;
// default: horizontal_blur<T>(in, out, w, h, c, r); break;
}
}
//!
//! \brief This function performs a 2D tranposition of an image. The transposition is done per
//! block to reduce the number of cache misses and improve cache coherency for large image buffers.
//! Templated by buffer data type T and buffer number of channels C.
//!
//! \param[in] in source buffer
//! \param[in,out] out target buffer
//! \param[in] w image width
//! \param[in] h image height
//!
template<typename T, int C>
void flip_block(const T * in, T * out, const int w, const int h)
{
constexpr int block = 256/C;
#pragma omp parallel for collapse(2)
for(int x= 0; x < w; x+= block)
for(int y= 0; y < h; y+= block)
{
const T * p = in + y*w*C + x*C;
T * q = out + y*C + x*h*C;
const int blockx= std::min(w, x+block) - x;
const int blocky= std::min(h, y+block) - y;
for(int xx= 0; xx < blockx; xx++)
{
for(int yy= 0; yy < blocky; yy++)
{
for(int k= 0; k < C; k++)
q[k]= p[k];
p+= w*C;
q+= C;
}
p+= -blocky*w*C + C;
q+= -blocky*C + h*C;
}
}
}
//!
//! \brief Utility template dispatcher function for flip_block. Templated by buffer data type T.
//!
//! \param[in] in source buffer
//! \param[in,out] out target buffer
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//!
template<typename T>
void flip_block(const T * in, T * out, const int w, const int h, const int c)
{
switch(c)
{
case 1: flip_block<T,1>(in, out, w, h); break;
case 2: flip_block<T,2>(in, out, w, h); break;
case 3: flip_block<T,3>(in, out, w, h); break;
case 4: flip_block<T,4>(in, out, w, h); break;
default: printf("%d channels is not supported yet. Add a specific case if possible or fall back to the generic version.", c); break;
// default: flip_block<T>(in, out, w, h, c); break;
}
}
//!
//! \brief this function converts the standard deviation of
//! Gaussian blur into a box radius for each box blur pass. For
//! further details please refer to :
//! https://www.peterkovesi.com/papers/FastGaussianSmoothing.pdf
//!
//! \param[out] boxes box radiis for kernel sizes of 2*boxes[i]+1
//! \param[in] sigma Gaussian standard deviation
//! \param[in] n number of box blur pass
//!
void sigma_to_box_radius(int boxes[], const float sigma, const int n)
{
// ideal filter width
float wi = std::sqrt((12*sigma*sigma/n)+1);
int wl = wi; // no need std::floor
if(wl%2==0) wl--;
int wu = wl+2;
float mi = (12*sigma*sigma - n*wl*wl - 4*n*wl - 3*n)/(-4*wl - 4);
int m = mi+0.5f; // avoid std::round by adding 0.5f and cast to integer type
for(int i=0; i<n; i++)
boxes[i] = ((i < m ? wl : wu) - 1) / 2;
}
//!
//! \brief This function performs a fast Gaussian blur. Applying several
//! times box blur tends towards a true Gaussian blur. Three passes are sufficient
//! for good results. Templated by buffer data type T. The input buffer is also used
//! as temporary and modified during the process hence it can not be constant.
//!
//! Normally the process should alternate between horizontal and vertical passes
//! as much times as we want box blur passes. However thanks to box blur properties
//! the separable passes can be performed in any order without changing the result.
//! Hence for performance purposes the algorithm is:
//! - apply N times horizontal blur (-> horizontal passes)
//! - flip the image buffer (transposition)
//! - apply N times horizontal blur (-> vertical passes)
//! - flip the image buffer (transposition)
//!
//! We provide two version of the function:
//! - 3 passes only
//! - N passes (in which more std:swaps are used)
//!
//! \param[in,out] in source buffer
//! \param[in,out] out target buffer
//! \param[in] w image width
//! \param[in] h image height
//! \param[in] c image channels
//! \param[in] sigma Gaussian standard deviation
//!
template<typename T>
void fast_gaussian_blur_3(T *& in, T *& out, const int w, const int h, const int c, const float sigma)
{
// compute box kernel sizes
int n = 3;
int boxes[3];
sigma_to_box_radius(boxes, sigma, n);
// perform 3 horizontal blur passes
horizontal_blur(in, out, w, h, c, boxes[0]);
horizontal_blur(out, in, w, h, c, boxes[1]);
horizontal_blur(in, out, w, h, c, boxes[2]);
// flip buffer
flip_block(out, in, w, h, c);
// perform 3 horizontal blur passes
horizontal_blur(in, out, h, w, c, boxes[0]);
horizontal_blur(out, in, h, w, c, boxes[1]);
horizontal_blur(in, out, h, w, c, boxes[2]);
// flip buffer
flip_block(out, in, h, w, c);
// swap pointers to get result in the ouput buffer
std::swap(in, out);
}
template<typename T>
void fast_gaussian_blur_N(T *& in, T *& out, const int w, const int h, const int c, const float sigma, const int n)
{
// compute box kernel sizes
std::vector<int> boxes(n);
sigma_to_box_radius(boxes.data(), sigma, n);
// perform N horizontal blur passes
for(int i = 0; i < n; ++i)
{
horizontal_blur(in, out, w, h, c, boxes[i]);
std::swap(in, out);
}
// flip buffer
flip_block(in, out, w, h, c);
std::swap(in, out);
// perform N horizontal blur passes
for(int i = 0; i < n; ++i)
{
horizontal_blur(in, out, h, w, c, boxes[i]);
std::swap(in, out);
}
// flip buffer
flip_block(in, out, h, w, c);
}
typedef unsigned char uchar;
int main(int argc, char * argv[])
{
// helper
if( argc < 4 )
{
printf("%s [input] [output] [sigma] [passes - optional]\n", argv[0]);
exit(1);
}
// load image
int width, height, channels;
uchar * image_data = stbi_load(argv[1], &width, &height, &channels, 0);
std::cout << "Source image: " << width<<"x" << height << " ("<<channels<<")" << std::endl;
// temporary data
std::size_t size = width * height * channels;
float * new_image = new float[size];
float * old_image = new float[size];
// channels copy r,g,b
for(int i = 0; i < size; ++i)
old_image[i] = (float)image_data[i] / 255.f;
// perform gaussian blur
float sigma = std::atof(argv[3]);
if( argc > 4 )
fast_gaussian_blur_N(old_image, new_image, width, height, channels, sigma, std::atoi(argv[4]));
else
fast_gaussian_blur_3(old_image, new_image, width, height, channels, sigma);
for(int i = 0; i < size; ++i)
image_data[i] = (uchar)(new_image[i] * 255.f);
// save image
std::string file(argv[2]);
std::string ext = file.substr(file.size()-3);
if( ext == "bmp" )
stbi_write_bmp(argv[2], width, height, channels, image_data);
else if( ext == "jpg" )
stbi_write_jpg(argv[2], width, height, channels, image_data, 90);
else
{
if( ext != "png" )
{
std::cout << "format '" << ext << "' not supported writing default .png" << std::endl;
file = file.substr(0, file.size()-4) + std::string(".png");
}
stbi_write_png(file.c_str(), width, height, channels, image_data, channels*width);
}
// clean memory
stbi_image_free(image_data);
delete[] new_image;
delete[] old_image;
return 0;
}
@LI7XI
Copy link

LI7XI commented Jan 24, 2021

I'm pretty new to c++ so i had few questions.

What's the difference between int and float versions?
Also will the blur function return image as a file or a pixels array or a buffer?
thanks in advanced!

@bfraboni
Copy link
Author

bfraboni commented Jan 24, 2021

I'm pretty new to c++ so i had few questions.

What's the difference between int and float versions?

There is not much difference between them. I've put both since I used it in two different codebases, one working with uchar 0 - 255 encoded images, and another one with floating point encoded HDR images.

Also will the blur function return image as a file or a pixels array or a buffer?

Here the Image structure /class is not defined, hence the blur function will not compile. This is only an example usage of the above code. The blur works independently per channel, whatever your Image structure is, you need to put all pixel values per channel in a buffer (here oldr, oldg, oldb) as input and provide the output buffer (here newr, newb, newg) to the function fast_gaussian_blur. Then you will have to copy back these buffers in you Image structure.

thanks in advanced!

Hope this helps. If needed I can show an example with a single header image library, such as stb_image, providing such an Image structure.

@LI7XI
Copy link

LI7XI commented Jan 27, 2021

Sorry for late reply, i didn't setup the notifications.

Here the Image structure /class is not defined, hence the blur function will not compile.

So the Image blur(const Image& source, double sigma) source parameter should be an image data?
Tbh i still don't understand how it works, maybe because i don't know enough about cpp (mostly i work with javascript).

Hope this helps. If needed I can show an example with a single header image library, such as stb_image, providing such an Image structure.

That would be really appreciated if you have a noob beginner friendly example ^^;
But if i still didn't understand then just tell me the thread head and i will research!

Another thing, can this be used in real-time? as example for a video after converting frames to buffer?

@bfraboni
Copy link
Author

bfraboni commented Jan 29, 2021

Hi, I was a bit busy but I spent some time rewriting more readable and usable examples. There are now four implementations : integer and floating point versions, and channel per channel or all channels at once for each.

I recommend you check the new repo it came with : https://github.com/bfraboni/FastGaussianBlur. The final code on the repo has almost no dependencies (only single header libs and std lib used) and can be compiled super easily with the provided Makefile.

The fastest version is the RGB floating point version (blur_float_rgb.cpp). It runs on a single core of the CPU and achieves ~3ms to blur 160k pixels and ~70ms for 1M pixels on a Ryzen 7 2700 X. Thus real time is possible but with reasonable image resolutions. However it could be made faster with a proper SIMD implementation, or a GPU implementation.

@LI7XI
Copy link

LI7XI commented Jan 29, 2021

Hi, I was a bit busy but I spent some time rewriting more readable and usable examples. There are now four implementations : integer and floating point versions, and channel per channel or all channels at once for each.

I recommend you check the new repo it came with : https://github.com/bfraboni/FastGaussianBlur. The final code on the repo has almost no dependencies (only single header libs and std lib used) and can be compiled super easily with the provided Makefile.

The fastest version is the RGB floating point version (blur_float_rgb.cpp). It runs on a single core of the CPU and achieves ~3ms to blur 160k pixels and ~70ms for 1M pixels on a Ryzen 7 2700 X. Thus real time is possible but with reasonable image resolutions. However it could be made faster with a proper SIMD implementation, or a GPU implementation.

I checked the repo and it looked pretty clean and straightforward! Definitely will play out with it in my spare time.
Edit: I tried it and i'm pretty impressed of its speed, compared to ImageMagick or GraphicMagick, it's performing at light speed!

It runs on a single core of the CPU

Can it be made multi-threaded?
Also these numbers are pretty good for real-time but only reasonable for kinda low res (160k pixels).
GPU implementation of this would add a huge boost i guess. I don't know the details but it's something worth experimenting.

Great work!

P.S. Do you have an idea of how windows achieving its acrylic blur? It's fast like real-time with high sigma factor and blurs anything behind the window or under the taskbar.

@bfraboni
Copy link
Author

bfraboni commented Jan 29, 2021

Can it be made multi-threaded?
Also these numbers are pretty good for real-time but only reasonable for kinda low res (160k pixels).
GPU implementation of this would add a huge boost i guess. I don't know the details but it's something worth experimenting.

Yes I think so:

  • OpenMP could make is faster by dividing the loops, but it is not very straightforward for me how to cleanly parallelize it with simple OpenMP now, worth a try someday.
  • SIMD optimizations could make it faster by doing multiple operation at the same time, but makes the code heavier to read, so I am not doing it for now.
  • GPU implem with CUDA (sadly hardware dependent), or OpenCL or Compute Shader could make it faster too. But right now I do not have time to experiment GPU toys.

Great work!

Thanks, I am glad you like it.

P.S. Do you have an idea of how windows achieving its acrylic blur? It's fast like real-time with high sigma factor and blurs anything behind the window or under the taskbar.

I mostly work on Linux so I don't see what Windows blur is, but the fast Gaussian blur approx is linear in time regarding the size of the input, but independent of sigma, hence sigma = 5 is equally fast as sigma = 50.

@bfraboni
Copy link
Author

Ok OpenMP support for multi threading was trivial to add so it is now in every version (cf . bfraboni/FastGaussianBlur@f061404). Another improvement was to use direct uchar buffers which is a lot faster now (cf. bfraboni/FastGaussianBlur@9b0eba1). The fastest version blurs 160k pixels in ~1ms, and 1000k pixels in ~7ms on all cores of a Ryzen 7 2700X CPU with OpenMP. So I think you can use it in real time applications now ;)

@LI7XI
Copy link

LI7XI commented Jan 29, 2021

SIMD optimizations could make it faster by doing multiple operation at the same time, but makes the code heavier to read, so I am not doing it for now.

SIMD optimization sound good but it's a bit complicated, i may check on it at later time.

GPU implem with CUDA (sadly hardware dependent), or OpenCL or Compute Shader could make it faster too. But right now I do not have time to experiment GPU toys.

That's sad :(
Since i have an igpu this wont work, but at sometime in future this might be used as an optional speedup.
Like if Cuda GPU detected then use Cuda optimizations, Else do it on CPU.

hence sigma = 5 is equally fast as sigma = 50.

Thats impressive, i thought the sigma makes Big difference.

Ok OpenMP support for multi threading was trivial to add so it is now in every version (cf . bfraboni/FastGaussianBlur@f061404). Another improvement was to use direct uchar buffers which is a lot faster now (cf. bfraboni/FastGaussianBlur@9b0eba1). The fastest version blurs 160k pixels in ~1ms, and 1000k pixels in ~7ms on all cores of a Ryzen 7 2700X CPU with OpenMP. So I think you can use it in real time applications now ;)

OMG you are a godsend! ヾ(≧ ▽ ≦)ゝ
I was about to say i'll see how OpenMP works and try to implement it, but you were fast. ^^
Also with these numbers finally my dream came true, i can blur videos in real-time!

I guess SIMD optimizations aren't needed for now right? Considering how fast it became..

I want to implement this in an app where it sets a video as desktop wallpaper, so i can blur it on the fly and do some fancy effects in real-time.
That's why i want it. :)

Huge thanks for implementing this in C++!

@bfraboni
Copy link
Author

Yes I am happy with the perfs now, I updloaded a graph on the github repo for comparison with naive Gaussian blur (convolution with a kernel). The actual code is not in the best shape but is ok for now, I will add a single C++ header grouping all implementations later and rework a bit all functions to improve readability. And also add the naive Gaussian blur implementation for comparison.

Well, your turn to code now ;)

Cheers,
B.

@zsln
Copy link

zsln commented Mar 2, 2021

Hey B,
thanks for providing this useful implementation. I've encountered some weird behavior using the blur_float.cpp .
Suppose following scenario:

// somewhere in the code
const auto res = 720; // px. or just any other
const auto val = 7.f; // just some value
std::vector<float> data(res*res, 0.f);
std::vector<float> result(res*res, 0.f);

data[0] = val; /// border/corner pixel
data[1080] = val; /// inside pixel

float* in = data.data();
float* out = result.data();

fast_gaussian_blur(in, out, res, res, 4);

This basically just inserts two values (one on the border/corner and one not) and blurrs away. I've saved the results into some grayscale images and I've also debugged both vectors to verify them. It seems the value result[0] is significantly larger than result[1080].
I assume code handles border/corner pixels in an unexpected way or there might be some out-of-bounds access at play? I've also attached two images which highlight this issue.

Input Image. You see two equally bright pixels:
input

Output Image. You see that the corner pixel is way brighter than the inside pixel.

output

Digging into your implementation would be very time consuming, so I'm writing in hope that you might find the time to check this issue.

Thanks alot!

@bfraboni
Copy link
Author

bfraboni commented Mar 2, 2021

Hi, thanks for your interest in this piece of code.

For the pixel issue this is an expected behavior of the algorithm since it performs border extrapolation of the image. It is a common solution to handle the border problem of image convolution with a kernel, where some data is missing under the extent of the filter kernel.

There is a discussion on that topic here:
https://dsp.stackexchange.com/questions/25889/gaussian-filter-close-to-image-border
Wikipedia has a section on this too with description of common policies:
https://en.wikipedia.org/wiki/Kernel_(image_processing)#Edge_Handling

What you were expecting is a "kernel crop" policy which would just skip the kernel positions outside the image, instead of extrapolating pixel values, hence resulting in a much darker pixel value in the top left border. Kernel crop and kernel re-normalization is usually avoided for performance purpose since it requires extra care near the borders.

Do you specifically require a kernel crop version or the current could be fine for your purposes ?
(If yes I may try it)

B-

@zsln
Copy link

zsln commented Mar 3, 2021

Hey, thanks for the fast reply.

This makes sense, I think I was expecting some sort of cropping behavior. Consequently it was quite the surprise to see ratios between values change after blurring, especially if some small border value would suddenly dominate all other previously largest values.

For my purpose I would require a version with a kernel crop, because it is quite important that largest values from the input would remain so after blurring. I would like to take upon and thank you for your offer. Feel free to take your time, as it is not an urgent matter.

Thank you very much!

@bfraboni
Copy link
Author

bfraboni commented Mar 4, 2021

I would like to take upon and thank you for your offer. Feel free to take your time, as it is not an urgent matter.

Did it, please check the new template version I uploaded (which also should be faster than any other) :
https://gist.github.com/bfraboni/946d9456b15cac3170514307cf032a27#file-fast_blur_template-cpp

I've added a template parameter on the horizontal_blur function that changes the border policy and defaulted as kernel crop.
See the example below for an idea of the difference between kernel crop and extend policy:
Extend
test-extend

Kernel Crop
test-kernel-crop

Difference 10 times magnified
test-diff-x10

Please give it a try with your example data and let me know if its ok now.

B-

Edit: I tried with HDR data and bright pixels as in your example and it looks ok now:
Extend
test-border-tone-000

Kernel Crop
test-border-kc-tone-001

@zsln
Copy link

zsln commented Mar 9, 2021

Hey B,

excuse the late reply. So I've played around with the old code and the new one.
I've took the old example, padded a black pixel border around it, blurred it, and clipped again in the hope to have a workaround. This managed to avoid the issues, however the blurring pattern was not very round as I would expect.

Tried the same with the new templated version. I've noticed the blurring pattern improved, it looks very round now.
I've attached some images which shows this very well:

Input:
pre

Old Example using 1px padding
blur

New Example without padding
pad

Currently the ratio of the maxima changes after blurring, but I think this is unavoidable as long as I don't pad with 50% of Sigma to avoid this completely. This however is very specific to my application so I think this should be fine.

Thanks alot!

@bfraboni
Copy link
Author

bfraboni commented Mar 9, 2021

Hi,
Indeed there is an issue in the blur passes (H or V I do not know) in your first example, I should look at the code...

For your second example with the new code (template version) this should not happen with the kernel crop policy.
Both pixels should have the same brightness, as in my example in the precedent comment.
I can send you the full example with bright pixels, or you can send me your main.cpp so I check where this comes from.

Basile

@zsln
Copy link

zsln commented Mar 9, 2021

Hey,

Here is the example snippet I use:

constexpr auto x_res = 50;
constexpr auto y_res = 50;
constexpr auto channels = 1;
constexpr auto sigma = 4;
constexpr auto iterations = 3;

std::vector<float> img;
img.resize(x_res * y_res);

img[0] = 1.f;
img[1275] = 1.f;

std::vector<float> copy(img.size());

float* in = img.data();
float* out = copy.data();
fast_gaussian_blur_N(in, out, x_res, y_res, channels, sigma, iterations);

printf("%f\n", out[0]); // or copy[...] for that matter...
printf("%f\n", out[1275]);

For me it prints

0.016883
0.009507

Thanks for the quick response

@bfraboni
Copy link
Author

bfraboni commented Mar 12, 2021

Ok I was wrong, indeed without the explicit padding the diffusion to the adjacent pixels is not taken into account in the next iteration. Hence this should be fine with padding, but half the kernel support is not enough for perfect accuracy, it should be the sum of half the kernel support at each iteration:

For N passes just sum up the content of the array boxes that is filled by the function sigma_to_box_radius to get the correct value.
B-

@zsln
Copy link

zsln commented Mar 12, 2021

I would consider putting a comment at the top of the files. I suppose not everyone might check the comments and will not be aware of this.

@bfraboni
Copy link
Author

Yes, I may add a comment on this indeed, this is something important considering that a multi-pass algorithm progressively diffuses signal rather going to the desired diffusion step in one pass (e.g. true Gaussian blur), and that the fix is to use an adequate padding for the diffusion process.
Thanks for your feedback.
B-

@bfraboni
Copy link
Author

Hello @Youana-Joseph ,

The latest version of this code is here:
https://github.com/bfraboni/FastGaussianBlur
So please check to one out instead, it should be faster.

The blur code itself should be pretty fast already without the allocations, make sure you compile with OpenMP linked, to get the best out of it.
Also I have no idea what your use case is, so it is difficult to be of any help..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment