Introduction

Starting with version 1.2, cubature now uses Rcpp. Also, version 1.3 uses the newer version (1.0.2) of Steven G. Johnson’s hcubature and pcubature routines, including the vectorized interface.

Per the documentation, use of pcubature is advisable only for smooth integrands in dimesions up to three at most. In fact, the pcubature routines perform significantly worse than the vectorized hcubature in inappropriate cases. So when in doubt, you are better off using hcubature.

The main point of this note is to examine the difference vectorization makes. My recommendations are below in the summary section.

A Timing Harness

Our harness will provide timing results for hcubature, pcubature (where appropriate) and R2Cuba calls. We begin by creating a harness for these calls.

library(microbenchmark)
library(cubature)
library(R2Cuba)

harness <- function(which = NULL,
                    f, fv, lowerLimit, upperLimit, tol = 1e-3, times = 20, ...) {

    fns <- c(hc = "Non-vectorized Hcubature",
             hc.v = "Vectorized Hcubature",
             pc = "Non-vectorized Pcubature",
             pc.v = "Vectorized Pcubature",
             cc = "R2Cuba::cuhre")

    hc <- function() cubature::hcubature(f = f,
                                         lowerLimit = lowerLimit,
                                         upperLimit = upperLimit,
                                         tol = tol,
                                         ...)

    hc.v <- function() cubature::hcubature(f = fv,
                                           lowerLimit = lowerLimit,
                                           upperLimit = upperLimit,
                                           tol = tol,
                                           vectorInterface = TRUE,
                                           ...)

    pc <- function() cubature::pcubature(f = f,
                                         lowerLimit = lowerLimit,
                                         upperLimit = upperLimit,
                                         tol = tol,
                                         ...)

    pc.v <- function() cubature::pcubature(f = fv,
                                           lowerLimit = lowerLimit,
                                           upperLimit = upperLimit,
                                           tol = tol,
                                           vectorInterface = TRUE,
                                           ...)
    ndim = length(lowerLimit)

    cc <- function() R2Cuba::cuhre(ndim = ndim, ncomp = 1, integrand = f,
                                   lower = lowerLimit, upper = upperLimit,
                                   flags = list(verbose = 0, final = 1),
                                   rel.tol = tol,
                                   max.eval = 10^6,
                                   ...)
    if (is.null(which)) {
        fnIndices <- seq_along(fns)
    } else {
        fnIndices <- match(which, names(fns))
    }
    fnList <- lapply(names(fns)[fnIndices], function(x) call(x))
    argList <- c(fnList, unit = "ms", times = times)
    result <- do.call(microbenchmark, args = argList)
    d <- summary(result)
    d$expr <- fns[fnIndices]
    d
}

We reel off the timing runs.

Example 1.

func <- function(x) sin(x[1]) * cos(x[2]) * exp(x[3])
func.v <- function(x) {
    matrix(apply(x, 2, function(z) sin(z[1]) * cos(z[2]) * exp(z[3])), ncol = ncol(x))
}

d <- harness(f = func, fv = func.v,
             lowerLimit = rep(0, 3),
             upperLimit = rep(1, 3),
             tol = 1e-5,
             times = 100)
knitr::kable(d, digits = 3, row.names = FALSE)
expr min lq mean median uq max neval
Non-vectorized Hcubature 4.183 4.351 4.666 4.444 4.673 7.653 100
Vectorized Hcubature 0.753 0.800 0.890 0.832 0.867 2.393 100
Non-vectorized Pcubature 13.304 13.928 14.821 14.541 14.898 45.800 100
Vectorized Pcubature 2.190 2.320 2.524 2.380 2.472 4.009 100
R2Cuba::cuhre 0.605 0.688 0.784 0.718 0.765 2.334 100

Multivariate Normal

Using cubature, we evaluate \[ \int_R\phi(x)dx \] where \(\phi(x)\) is the three-dimensional multivariate normal density with mean 0, and variance \[ \Sigma = \left(\begin{array}{rrr} 1 &\frac{3}{5} &\frac{1}{3}\\ \frac{3}{5} &1 &\frac{11}{15}\\ \frac{1}{3} &\frac{11}{15} & 1 \end{array} \right) \] and \(R\) is \([-\frac{1}{2}, 1] \times [-\frac{1}{2}, 4] \times [-\frac{1}{2}, 2].\)

We construct a scalar function (my_dmvnorm) and a vector analog (my_dmvnorm_v). First the functions.

m <- 3
sigma <- diag(3)
sigma[2,1] <- sigma[1, 2] <- 3/5 ; sigma[3,1] <- sigma[1, 3] <- 1/3
sigma[3,2] <- sigma[2, 3] <- 11/15

my_dmvnorm <- function (x, mean, sigma) {
    x <- matrix(x, ncol = length(x))
    distval <- stats::mahalanobis(x, center = mean, cov = sigma)
    logdet <- sum(log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values))
    exp(-(ncol(x) * log(2 * pi) + logdet + distval)/2)
}

my_dmvnorm_v <- function (x, mean, sigma) {
    distval <- stats::mahalanobis(t(x), center = mean, cov = sigma)
    logdet <- sum(log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values))
    exp(matrix(-(nrow(x) * log(2 * pi) + logdet + distval)/2, ncol = ncol(x)))
}

Now the timing.

d <- harness(f = my_dmvnorm, fv = my_dmvnorm_v,
             lowerLimit = rep(-0.5, 3),
             upperLimit = c(1, 4, 2),
             tol = 1e-5,
             times = 10,
             mean = rep(0, m), sigma = sigma)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 1741.670 1771.437 1787.878 1780.758 1798.312 1862.847 10
Vectorized Hcubature 3.323 3.341 3.746 3.532 4.223 4.379 10
Non-vectorized Pcubature 757.702 761.811 778.558 765.213 786.515 868.152 10
Vectorized Pcubature 2.350 2.353 2.519 2.398 2.724 2.938 10
R2Cuba::cuhre 612.642 613.002 627.903 622.214 634.663 662.175 10

The effect of vectorization is huge. So it makes sense for users to vectorize the integrands as much as possible for efficiency.

Furthermore, for this particular example, we expect mvtnorm::pmvnorm to do pretty well since it is specialized for the multivariate normal. The good news is that the vectorized versions of hcubature and pcubature are quite competitive if you compare the table above to the one below.

library(mvtnorm)
g1 <- function() mvtnorm::pmvnorm(lower = rep(-0.5, m),
                                  upper = c(1, 4, 2), mean = rep(0, m), corr = sigma,
                                  alg = Miwa(), abseps = 1e-5, releps = 1e-5)
g2 <- function() mvtnorm::pmvnorm(lower = rep(-0.5, m),
                                  upper = c(1, 4, 2), mean = rep(0, m), corr = sigma,
                                  alg = GenzBretz(), abseps = 1e-5, releps = 1e-5)
g3 <- function() mvtnorm::pmvnorm(lower = rep(-0.5, m),
                                  upper = c(1, 4, 2), mean = rep(0, m), corr = sigma,
                                  alg = TVPACK(), abseps = 1e-5, releps = 1e-5)

knitr::kable(summary(microbenchmark(g1(), g2(), g3(), times = 20)),
             digits = 3, row.names = FALSE)
expr min lq mean median uq max neval
g1() 1.513 2.101 2.624 2.729 2.763 6.311 20
g2() 1.517 2.731 2.630 2.736 2.768 2.824 20
g3() 1.509 2.725 2.536 2.733 2.780 3.232 20

Product of cosines

testFn0 <- function(x) prod(cos(x))
testFn0_v <- function(x) matrix(apply(x, 2, function(z) prod(cos(z))), ncol = ncol(x))

d <- harness(f = testFn0, fv = testFn0_v,
             lowerLimit = rep(0, 2), upperLimit = rep(1, 2), times = 100)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 0.342 0.355 0.399 0.368 0.403 1.394 100
Vectorized Hcubature 0.134 0.139 0.152 0.142 0.151 0.271 100
Non-vectorized Pcubature 0.491 0.506 0.569 0.525 0.580 0.837 100
Vectorized Pcubature 0.255 0.265 0.291 0.272 0.293 0.521 100
R2Cuba::cuhre 0.287 0.308 0.365 0.320 0.372 1.456 100

Gaussian function

testFn1 <- function(x) {
    val <- sum(((1 - x) / x)^2)
    scale <- prod((2 / sqrt(pi)) / x^2)
    exp(-val) * scale
}

testFn1_v <- function(x) {
    val <- matrix(apply(x, 2, function(z) sum(((1 - z) / z)^2)), ncol(x))
    scale <- matrix(apply(x, 2, function(z) prod((2 / sqrt(pi)) / z^2)), ncol(x))
    exp(-val) * scale
}

d <- harness(f = testFn1, fv = testFn1_v,
             lowerLimit = rep(0, 3), upperLimit = rep(1, 3), times = 10)

knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 28.288 28.970 29.779 29.582 29.828 33.849 10
Vectorized Hcubature 8.032 8.206 8.679 8.378 9.170 9.775 10
Non-vectorized Pcubature 0.617 0.659 0.688 0.684 0.710 0.757 10
Vectorized Pcubature 0.301 0.331 0.388 0.384 0.395 0.625 10
R2Cuba::cuhre 15.262 15.665 16.299 16.084 16.522 18.961 10

Discontinuous function

testFn2 <- function(x) {
    radius <- 0.50124145262344534123412
    ifelse(sum(x * x) < radius * radius, 1, 0)
}

testFn2_v <- function(x) {
    radius <- 0.50124145262344534123412
    matrix(apply(x, 2, function(z) ifelse(sum(z * z) < radius * radius, 1, 0)), ncol = ncol(x))
}

d <- harness(which = c("hc", "hc.v", "cc"),
             f = testFn2, fv = testFn2_v,
             lowerLimit = rep(0, 2), upperLimit = rep(1, 2), times = 10)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 361.514 362.203 366.917 363.348 364.637 399.637 10
Vectorized Hcubature 81.704 82.573 84.016 83.350 84.300 90.672 10
R2Cuba::cuhre 969.891 980.440 987.537 982.629 990.671 1017.835 10

A Simple Polynomial (product of coordinates)

testFn3 <- function(x) prod(2 * x)
testFn3_v <- function(x) matrix(apply(x, 2, function(z) prod(2 * z)), ncol = ncol(x))

d <- harness(f = testFn3, fv = testFn3_v,
             lowerLimit = rep(0, 3), upperLimit = rep(1, 3), times = 20)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 0.635 0.679 0.747 0.706 0.737 1.655 20
Vectorized Hcubature 0.179 0.194 0.237 0.198 0.211 0.940 20
Non-vectorized Pcubature 0.533 0.587 0.611 0.606 0.633 0.696 20
Vectorized Pcubature 0.163 0.174 0.181 0.182 0.187 0.197 20
R2Cuba::cuhre 0.499 0.541 0.572 0.571 0.590 0.665 20

Gaussian centered at

testFn4 <- function(x) {
    a <- 0.1
    s <- sum((x - 0.5)^2)
    ((2 / sqrt(pi)) / (2. * a))^length(x) * exp (-s / (a * a))
}

testFn4_v <- function(x) {
    a <- 0.1
    r <- apply(x, 2, function(z) {
        s <- sum((z - 0.5)^2)
        ((2 / sqrt(pi)) / (2. * a))^length(z) * exp (-s / (a * a))
    })
    matrix(r, ncol = ncol(x))
}

d <- harness(f = testFn4, fv = testFn4_v,
             lowerLimit = rep(0, 2), upperLimit = rep(1, 2), times = 20)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 14.968 15.401 16.183 16.386 16.699 18.377 20
Vectorized Hcubature 3.441 3.550 3.665 3.605 3.701 4.627 20
Non-vectorized Pcubature 22.098 22.806 23.358 23.565 23.951 24.209 20
Vectorized Pcubature 4.967 4.997 5.542 5.065 5.242 12.123 20
R2Cuba::cuhre 4.170 4.275 4.529 4.395 4.489 5.482 20

Double Gaussian

testFn5 <- function(x) {
    a <- 0.1
    s1 <- sum((x - 1 / 3)^2)
    s2 <- sum((x - 2 / 3)^2)
    0.5 * ((2 / sqrt(pi)) / (2. * a))^length(x) * (exp(-s1 / (a * a)) + exp(-s2 / (a * a)))
}
testFn5_v <- function(x) {
    a <- 0.1
    r <- apply(x, 2, function(z) {
        s1 <- sum((z - 1 / 3)^2)
        s2 <- sum((z - 2 / 3)^2)
        0.5 * ((2 / sqrt(pi)) / (2. * a))^length(z) * (exp(-s1 / (a * a)) + exp(-s2 / (a * a)))
    })
    matrix(r, ncol = ncol(x))
}

d <- harness(f = testFn5, fv = testFn5_v,
             lowerLimit = rep(0, 2), upperLimit = rep(1, 2), times = 20)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 35.329 35.746 36.953 36.293 37.332 43.165 20
Vectorized Hcubature 9.072 9.339 9.980 9.709 10.853 11.253 20
Non-vectorized Pcubature 23.732 24.829 25.548 25.215 26.021 29.313 20
Vectorized Pcubature 6.498 6.641 7.060 6.935 7.222 8.212 20
R2Cuba::cuhre 10.698 11.124 13.378 12.019 12.214 45.406 20

Tsuda’s Example

testFn6 <- function(x) {
    a <- (1 + sqrt(10.0)) / 9.0
    prod( a / (a + 1) * ((a + 1) / (a + x))^2)
}

testFn6_v <- function(x) {
    a <- (1 + sqrt(10.0)) / 9.0
    r <- apply(x, 2, function(z) prod( a / (a + 1) * ((a + 1) / (a + z))^2))
    matrix(r, ncol = ncol(x))
}

d <- harness(f = testFn6, fv = testFn6_v,
             lowerLimit = rep(0, 3), upperLimit = rep(1, 3), times = 20)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 19.213 19.447 20.271 19.747 20.772 24.665 20
Vectorized Hcubature 3.556 3.680 3.836 3.705 3.779 4.897 20
Non-vectorized Pcubature 101.111 102.592 104.465 103.571 106.627 109.684 20
Vectorized Pcubature 16.325 16.860 17.508 17.172 17.883 21.528 20
R2Cuba::cuhre 5.206 5.533 6.134 5.760 6.783 8.529 20

Morokoff & Calflish Example

testFn7 <- function(x) {
    n <- length(x)
    p <- 1/n
    (1 + p)^n * prod(x^p)
}
testFn7_v <- function(x) {
    matrix(apply(x, 2, function(z) {
        n <- length(z)
        p <- 1/n
        (1 + p)^n * prod(z^p)
    }), ncol = ncol(x))
}

d <- harness(f = testFn7, fv = testFn7_v,
             lowerLimit = rep(0, 3), upperLimit = rep(1, 3), times = 20)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 39.321 39.587 40.155 40.310 40.490 40.774 20
Vectorized Hcubature 7.207 7.309 7.700 7.492 7.932 9.114 20
Non-vectorized Pcubature 96.106 98.310 101.345 99.295 100.605 131.865 20
Vectorized Pcubature 16.597 16.824 17.498 17.186 18.080 19.709 20
R2Cuba::cuhre 38.903 41.015 41.647 42.033 42.456 43.355 20

Wang-Landau Sampling 1d, 2d Examples

I.1d <- function(x) {
    sin(4 * x) *
        x * ((x * ( x * (x * x - 4) + 1) - 1))
}
I.1d_v <- function(x) {
    matrix(apply(x, 2, function(z)
        sin(4 * z) *
        z * ((z * ( z * (z * z - 4) + 1) - 1))),
        ncol = ncol(x))
}
d <- harness(f = I.1d, fv = I.1d_v,
             lowerLimit = -2, upperLimit = 2, times = 50)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 1.968 2.007 2.139 2.099 2.141 3.211 50
Vectorized Hcubature 0.491 0.508 0.550 0.524 0.556 1.239 50
Non-vectorized Pcubature 0.668 0.679 0.730 0.708 0.736 1.707 50
Vectorized Pcubature 0.394 0.412 0.558 0.421 0.439 6.045 50
R2Cuba::cuhre 0.398 0.419 0.450 0.442 0.470 0.540 50
I.2d <- function(x) {
    x1 <- x[1]; x2 <- x[2]
    sin(4 * x1 + 1) * cos(4 * x2) * x1 * (x1 * (x1 * x1)^2 - x2 * (x2 * x2 - x1) +2)
}
I.2d_v <- function(x) {
    matrix(apply(x, 2,
                 function(z) {
                     x1 <- z[1]; x2 <- z[2]
                     sin(4 * x1 + 1) * cos(4 * x2) * x1 * (x1 * (x1 * x1)^2 - x2 * (x2 * x2 - x1) +2)
                 }),
           ncol = ncol(x))
}
d <- harness(f = I.2d, fv = I.2d_v,
             lowerLimit = rep(-1, 2), upperLimit = rep(1, 2), times = 50)
knitr::kable(d, digits = 3)
expr min lq mean median uq max neval
Non-vectorized Hcubature 70.108 71.451 72.719 72.638 73.033 80.249 50
Vectorized Hcubature 13.800 14.258 14.999 14.871 15.400 19.573 50
Non-vectorized Pcubature 5.803 6.035 6.296 6.112 6.272 7.410 50
Vectorized Pcubature 1.589 1.651 1.740 1.690 1.734 2.816 50
R2Cuba::cuhre 2.023 2.239 3.038 2.303 2.381 35.928 50

An implementation note

About the only real modification we have made to the underlying cubature-1.0.2 library is that we use M = 16 rather than the default M = 19 suggested by the original author for pcubature. This allows us to comply with CRAN package size limits and seems to work reasonably well for the above tests. Future versions will allow for such customization on demand.

Apropos the Cuba library

The package R2Cuba provides a suite of cubature and other useful Monte Carlo integration routines linked against version 1.6 of the C library. The authors of R2Cuba have obviously put a lot of work has into it since it uses C-style R API. This approach also means that it is harder to keep the R package in sync with new versions of the underlying C library. In fact, the Cuba C library has marched on now to version 4.2.

In a matter of a couple of hours, I was able to link the latest version (4.2) of the Cuba libraries with R using Rcpp; you can see it on the Cuba branch of my Github repo. This branch package builds and installs in R on my Mac and Ubuntu machines and gives correct answers at least for cuhre. The 4.2 version of the Cuba library also has vectorized versions of the routines that can be gainfully exploited (not implemented in the branch). As of this writing, I have also not yet carefully considered the issue of parallel execution (via fork()) which might be problematic in the Windows version. In addition, my timing benchmarks showed very disappointing results.

For the above reasons, I decided not to bother with Cuba for now, but if there is enough interest, I might consider rolling Cuba-4.2+ into this cubature package in the future.

Summary

The following is therefore my recommendation.

  1. Vectorize your function. The time spent in so doing pays back enormously. This is easy to do and the examples above show how.

  2. Vectorized hcubature seems to be a good starting point.

  3. For smooth integrands in low dimensions (\(\leq 3\)), pcubature might be worth trying out. Experiment before using in a production package.