Skip to content

Instantly share code, notes, and snippets.

@bobthecat
Created September 10, 2013 13:22
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bobthecat/6509321 to your computer and use it in GitHub Desktop.
Save bobthecat/6509321 to your computer and use it in GitHub Desktop.
Rcpp implementation of the Dice coefficient
---
title: Dice coefficient with RcppEigen
author: David Ruau
license: GPL (>= 2)
tags: Rcpp RcppEigen
summary: Compute the Dice coefficient (1945) between column of a matrix.
---
The Dice coefficient is a simple measure of similarity / dissimilarity (depending how you take it). It is intended to compare asymmetric binary vectors, meaning one of the combination (usually 0-0) is not important and agreement (1-1 pairs) have more weight than disagreement (1-0 or 0-1 pairs). Imagine the following contingency table:
```{}
1 0
1 a b
0 c d
```
The Dice coef is: `(2*a) / (2*a +b + c)`
The implementation of the Dice coefficient in the package `arules` relied to the algorithm from Leisch (2005) using the `tcrossproduct()` function in R. It was still too slow for me. So I wrote a Rcpp version of crossprod that is 2-3 time faster. This is based on the example code in the `RcppEigen` vignette.
```{}
library(Rcpp)
library(RcppEigen)
library(inline)
crossprodCpp <- '
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::Lower;
const Map<MatrixXi> A(as<Map<MatrixXi> >(AA));
const int m(A.rows()), n(A.cols());
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint()));
return wrap(AtA);
'
fcprd <- cxxfunction(signature(AA = "matrix"), crossprodCpp, "RcppEigen")
```
Using the following wrapper function in R we compute the Dice coefficient.
```{r}
diceR <- function(X){
a <- fcprd(X)
nx <- ncol(X)
rsx <- colSums(X)
c <- matrix(rsx, nrow = nx, ncol = nx) - a
b <- t(c)
m <- (2 * a) / (2*a + b + c)
return(m)
}
```
Let's do a little benchmark.
This new function is roughly 3 time faster than the one in arules.
```
library(microbenchmark)
library(arules)
# test matrix
x <- rbinom(1:200000, 1, 0.5)
X <- matrix(x, nrow = 200, ncol = 1000)
m <- microbenchmark(diceR(X), dissimilarity(t(X), method="dice"), times=100)
m
Unit: milliseconds
expr min lq median uq max neval
diceR(X) 77.12359 163.8737 168.0566 171.4417 185.0053 100
dissimilarity(t(X), method = "dice") 275.06503 365.3696 372.0783 453.1885 1118.4639 100
```
This implementation above use the `inline` library. Below is the pure C++ function. This is the form you want if you want to put this code in a R package as a source file.
```
#include <Rcpp.h>
#include <RcppEigen.h>
using namespace Rcpp;
SEXP crossprodCpp(SEXP binaryMat){
using Eigen::Map;
using Eigen::MatrixXi;
using Eigen::Lower;
const Map<MatrixXi> A(as<Map<MatrixXi> >(binaryMat));
const int m(A.rows()), n(A.cols());
MatrixXi AtA(MatrixXi(n, n).setZero().selfadjointView<Lower>().rankUpdate(A.adjoint()));
return wrap(AtA);
}
```
@eddelbuettel
Copy link

Nice. I think we need to a) standardize the tags (just 'eigen') and b) convert the code to use the newer Rcpp attributes (see the corresponding vignette). I can do that quickly if you;d want me to.

@bobthecat
Copy link
Author

Hi,
If you could do it that would be great. I have no idea about the new Rcpp attributes.

@eddelbuettel
Copy link

Will do, probably during the soccer game this evening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment