The proffer package profiles R code to find bottlenecks. Visit https://r-prof.github.io/proffer for documentation. https://r-prof.github.io/proffer/reference/index.html has a complete list of available functions in the package.

Why use a profiler?

This data processing code is slow.

system.time({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#>   user  system elapsed 
#> 82.060  28.440 110.582 

Why exactly does it take so long? Is it because for loops are slow as a general rule? Let us find out empirically.

library(proffer)
px <- pprof({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#> http://localhost:64610

When we navigate to http://localhost:64610 and look at the flame graph, we see [<-.data.frame() (i.e. x[i, ] <- x[i, ] + 1) is taking most of the runtime.
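
The same hotspot can be cross-checked in text form without the pprof server. The following is a minimal sketch that uses base R's Rprof() and summaryRprof() rather than proffer itself; the file name prof_file and the smaller n are illustrative choices, and the exact rows reported will vary by machine.

```r
# Sketch only: base R sampling profiler, not proffer's pprof() workflow.
prof_file <- tempfile(fileext = ".out")  # illustrative temporary file

Rprof(prof_file)  # start sampling the call stack
n <- 1e4          # smaller than the example above to keep the run short
x <- data.frame(x = rnorm(n), y = rnorm(n))
for (i in seq_len(n)) {
  x[i, ] <- x[i, ] + 1
}
Rprof(NULL)       # stop sampling

# Functions ranked by total time, including time spent in their callees.
top <- head(summaryRprof(prof_file)$by.total)
print(top)
```

If data frame assignment is the bottleneck, [<-.data.frame should appear near the top of this table.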

So we refactor the code to avoid data frame row assignment. Much faster, even with a for loop!

system.time({
  n <- 1e5
  x <- rnorm(n)
  y <- rnorm(n)
  for (i in seq_len(n)) {
    x[i] <- x[i] + 1
    y[i] <- y[i] + 1
  }
  x <- data.frame(x = x, y = y)
})
#>    user  system elapsed 
#>   0.045   0.001   0.047
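
As a further step, the loop can be dropped altogether because + is vectorized in R. This is a sketch of that idea, not part of the original benchmark; the names looped and vectorized are illustrative, and no timings are claimed here.

```r
set.seed(1)  # fixed seed so both versions start from the same data
n <- 1e4
x <- rnorm(n)
y <- rnorm(n)

# Element-by-element version, as in the refactored example.
loop_x <- x
loop_y <- y
for (i in seq_len(n)) {
  loop_x[i] <- loop_x[i] + 1
  loop_y[i] <- loop_y[i] + 1
}
looped <- data.frame(x = loop_x, y = loop_y)

# Vectorized version: one arithmetic call per column, no explicit loop.
vectorized <- data.frame(x = x + 1, y = y + 1)

# Both approaches should produce the same data frame.
same <- isTRUE(all.equal(looped, vectorized))
print(same)
```

Wrapping each version in system.time() or pprof() is the way to confirm whether the difference matters for your own data.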

Moral of the story: before you optimize, throw away your assumptions and run your code through a profiler. That way, you can spend your time optimizing where it counts!

Managing the pprof server

The pprof server runs as a background processx process, and you can manage it with the methods of the processx process object that pprof() returns, such as $is_alive() and $read_error(). Remember to terminate the process with $kill() when you are done with it.

# px is a process handler.
px <- pprof({
  n <- 1e4
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})
#> http://localhost:50195

# Summary of the background process.
px
#> PROCESS 'pprof', running, pid 10451.

px$is_alive()
#> [1] TRUE

# Error messages, some of which do not matter.
px$read_error()
#> [1] "Main binary filename not available.\n"

# Terminate the process when you are done.
px$kill()
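
To make the cleanup step harder to forget, the calls above can be wrapped in a small helper. This is a sketch under assumptions: stop_server() and its grace_ms argument are hypothetical names, not part of proffer, and the helper relies only on the $is_alive(), $kill(), and $wait() methods of a processx process object.

```r
# Hypothetical helper (not part of proffer): stop a background process
# handle such as the one returned by pprof().
stop_server <- function(px, grace_ms = 1000) {
  if (px$is_alive()) {
    px$kill()                    # ask processx to terminate the process
    px$wait(timeout = grace_ms)  # wait up to grace_ms milliseconds for exit
  }
  !px$is_alive()                 # TRUE if the process is no longer running
}
```

Inside a function that starts the server, on.exit(stop_server(px), add = TRUE) would run the cleanup even if a later step fails.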

Serving pprof remotely

As with Jupyter notebooks, you can serve pprof from one computer and use it from another computer on the same network. On the server, you must

  1. Find the server’s host name or IP address in advance.
  2. Supply "0.0.0.0" as the host argument.

system2("hostname")
#> mycomputer

px <- pprof({
  n <- 1e4
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
}, host = "0.0.0.0")
#> http://0.0.0.0:61072

Then, on the client machine, point a web browser at the server’s host name or IP address and use the port number printed above, e.g. http://mycomputer:61072.
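
The address for the client can be assembled on the server by swapping the 0.0.0.0 placeholder for the host name. The helper below is a sketch; client_url() is a hypothetical name, not a proffer function, and Sys.info()[["nodename"]] is used as a base R alternative to system2("hostname").

```r
# Hypothetical helper (not part of proffer): turn the URL printed on the
# server into one that a client on the same network can open.
client_url <- function(server_url, host = Sys.info()[["nodename"]]) {
  # fixed = TRUE treats "0.0.0.0" literally instead of as a regular expression.
  sub("0.0.0.0", host, server_url, fixed = TRUE)
}

# Example with the host name and port from the text above.
print(client_url("http://0.0.0.0:61072", host = "mycomputer"))
```

If the host name does not resolve on the client, pass the server's IP address as host instead.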

Installation

For old versions of proffer (0.0.2 and below) refer to these older installation instructions instead of the ones below.

The R package

The latest release of proffer is available on CRAN.

install.packages("proffer")

Alternatively, you can install the development version from GitHub.

# install.packages("remotes")
remotes::install_github("r-prof/proffer")

The proffer package requires the RProtoBuf package, which may require installation of additional system dependencies on Linux. See its installation instructions.

Non-R dependencies

proffer requires

  1. Go: https://golang.org/doc/install
  2. Graphviz: https://www.graphviz.org/download
  3. pprof: https://github.com/google/pprof

On Mac and Windows, you can find installers for Go and Graphviz at the links above. pprof should be installed automatically with Go (see the configuration section below). On Linux, you can install pprof and Go directly from R:

library(proffer)
install_go() # Also installs pprof if on Linux.

Configuration

First, run pprof_sitrep() to see if proffer can already find all the required non-R dependencies. Then, run test_pprof() to see if pprof actually works for you. If both checks pass, you are done with installation.

Otherwise, open your .Renviron file and define special environment variables that point to the system dependencies. The edit_r_environ() function in the usethis package can help you. Configuration varies according to your platform and installation method.
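
After you edit .Renviron and restart R, a quick check can confirm that the variables are visible to the session. The sketch below assumes the variable names that pprof_sitrep() reports (PROFFER_PPROF_BIN, PROFFER_GO_BIN, PROFFER_GRAPHVIZ_BIN); check_proffer_env() is a hypothetical helper, and the paths in the comments are examples only.

```r
# Example .Renviron entries (paths are illustrative; use your own):
#   PROFFER_GO_BIN=/usr/local/bin/go
#   PROFFER_GRAPHVIZ_BIN=/usr/local/bin/dot

# Hypothetical helper (not part of proffer): report which variables are
# set and whether each one points to an existing file.
check_proffer_env <- function() {
  vars <- c("PROFFER_PPROF_BIN", "PROFFER_GO_BIN", "PROFFER_GRAPHVIZ_BIN")
  paths <- unname(Sys.getenv(vars, unset = ""))
  data.frame(
    variable = vars,
    path = paths,
    set = nzchar(paths),                         # FALSE when unset or empty
    exists = nzchar(paths) & file.exists(paths), # FALSE for unset variables
    stringsAsFactors = FALSE
  )
}

print(check_proffer_env())
```

This only checks that the files exist; pprof_sitrep() and test_pprof() remain the authoritative checks.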

Verification

Run pprof_sitrep() again to verify that everything is installed and configured correctly.

library(proffer)
pprof_sitrep()
#> ● Call test_pprof() to test installation.
#> 
#> ── Requirements ──────────────────────────────
#> ✓ pprof /Users/c240390/go/bin/pprof
#> ✓ Go binary /usr/local/bin/go
#> ✓ Go folder /Users/c240390/go
#> ✓ Graphviz /usr/local/bin/dot
#> 
#> ── Custom ────────────────────────────────────
#> ℹ `PROFFER_PPROF_BIN` missing
#> ● Run `usethis::edit_r_environ()` to edit
#>   .Renviron file.
#> ● PROFFER_GO_BIN=/usr/local/bin/go
#> ℹ `PROFFER_GO_BIN` missing
#> ● Run `usethis::edit_r_environ()` to edit
#>   .Renviron file.
#> ● PROFFER_GO_BIN=/usr/local/bin/go
#> ℹ `PROFFER_GRAPHVIZ_BIN` missing
#> ● Run `usethis::edit_r_environ()` to edit
#>   .Renviron file.
#> ● PROFFER_GRAPHVIZ_BIN=/usr/local/bin/dot
#> 
#> ── System ────────────────────────────────────
#> ✓ pprof system path /Users/c240390/go/bin/pprof
#> ✓ Go binary system path /usr/local/bin/go
#> ✓ Graphviz system path /usr/local/bin/dot
#> 
#> ── Deprecated ────────────────────────────────
#> ✓ `pprof_path` env variable omitted.

If all dependencies are accounted for, proffer should work. Test it out with test_pprof(). On a local machine, it should launch a browser window showing an instance of pprof.

library(proffer)
test_pprof()

Contributing

We encourage participation through issues and pull requests. proffer has a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Similar work

profvis

The profvis package is much easier to install than proffer and equally easy to invoke.

library(profvis)
profvis({
  n <- 1e5
  x <- data.frame(x = rnorm(n), y = rnorm(n))
  for (i in seq_len(n)) {
    x[i, ] <- x[i, ] + 1
  }
  x
})

However, profvis-generated flame graphs can be difficult to read and slow to respond to mouse clicks.

proffer uses pprof to create friendlier, faster visualizations.