The three rules of software optimization are:

  1. Don’t.
  2. Don’t do it yet.
  3. Profile before optimizing.

A profiler is a tool that identifies which parts of your code take the most time. One way to do this is to run the code and halt execution every so often (by default 50 times per second), and record the call stack on each occurrence. The combined samples will likely show in which part of your code the most time is spent.

In R, a sampling profiler is available via Rprof(), but it provides detailed measurements only for R code. This vignette shows how to include detailed information on native code run times in the profile.

Prerequisite: Code to profile

For this vignette, we’ll be profiling R and C++ implementations of functions that computes the Fibonacci sequence using dynamic programming:

(A few tweaks are necessary to allow profiling the Rcpp-compiled function when run from the vignette. See the source for details.)

The implementations are supposed to behave identically.

Our test function test() calls both functions and compares their output in a loop:

A minimal example

Rprof() creates a text file that contains the stack trace for each interruption. The summaryRprof() function converts this to a data frame with one row per function and the corresponding run time estimates (self: code in that function only, total: including all descendant functions):

This is a nice overview, but for fun_cpp() the run time is hidden in the entry for .Call(): calls into native code are opaque and cannot be resolved further. Enter joint profiling:

library(jointprof)
out_file <- tempfile("jointprof", fileext = ".out")
start_profiler(out_file)
#> Temporary files: /tmp/RtmpfhFVeu/jointprof51e916b384d6.prof, /tmp/RtmpfhFVeu/jointprof51e916ad14f1.out
for (i in 1:10) test(700)
stop_profiler()

summary <- summaryRprof(out_file)
summary$by.self
#>                                                      self.time self.pct total.time total.pct
#> "fib_r"                                                   4.10    35.16       8.58     73.58
#> "c"                                                       3.00    25.73       3.50     30.02
#> "<GC>"                                                    1.54    13.21       1.54     13.21
#> "RealAnswer"                                              0.56     4.80       0.66      5.66
#> "-"                                                       0.30     2.57       0.36      3.09
#> "Rf_eval"                                                 0.22     1.89       0.28      2.40
#> "R_gc_internal"                                           0.20     1.72       0.34      2.92
#> "<="                                                      0.20     1.72       0.20      1.72
#> "_ZN4Rcpp8internal13r_init_vectorILi14EEEvP7SEXPREC"      0.18     1.54       0.18      1.54
#> "+"                                                       0.18     1.54       0.18      1.54
#> "<?>"                                                     0.16     1.37       0.30      2.57
#> "_Z7fib_cppi"                                             0.10     0.86       0.70      6.00
#> "stopifnot"                                               0.08     0.69      11.66    100.00
#> "CONS_NR"                                                 0.06     0.51       0.12      1.03
#> "identical"                                               0.04     0.34      11.40     97.77
#> "Rf_evalList"                                             0.04     0.34       0.14      1.20
#> "do_subset"                                               0.04     0.34       0.12      1.03
#> "Rf_begincontext"                                         0.04     0.34       0.08      0.69
#> "do_begin"                                                0.04     0.34       0.06      0.51
#> "R_FreeStringBufferL"                                     0.04     0.34       0.04      0.34
#> "Rf_cons"                                                 0.04     0.34       0.04      0.34
#> "Rf_endcontext"                                           0.04     0.34       0.04      0.34
#> "Rf_findVarInFrame3"                                      0.04     0.34       0.04      0.34
#> "Rf_mkPROMISE"                                            0.04     0.34       0.04      0.34
#> "sys.call"                                                0.04     0.34       0.04      0.34
#> "doTryCatch"                                              0.02     0.17      11.66    100.00
#> "tryCatch"                                                0.02     0.17      11.66    100.00
#> "tryCatchList"                                            0.02     0.17      11.66    100.00
#> "tryCatchOne"                                             0.02     0.17      11.66    100.00
#> "withCallingHandlers"                                     0.02     0.17      11.66    100.00
#> "do_c"                                                    0.02     0.17       1.00      8.58
#> "do_c_dflt"                                               0.02     0.17       0.98      8.40
#> "_init"                                                   0.02     0.17       0.04      0.34
#> "do_relop_dflt"                                           0.02     0.17       0.04      0.34
#> "do_if"                                                   0.02     0.17       0.02      0.17
#> "do_subset_dflt"                                          0.02     0.17       0.02      0.17
#> "length"                                                  0.02     0.17       0.02      0.17
#> "RecursiveRelease"                                        0.02     0.17       0.02      0.17
#> "Rf_findFun3"                                             0.02     0.17       0.02      0.17
#> "SET_FRAME"                                               0.02     0.17       0.02      0.17
#> "SETCDR"                                                  0.02     0.17       0.02      0.17
#> "sys.function"                                            0.02     0.17       0.02      0.17

The summary now also contains a breakdown for native functions below the R code that calls them.

Installation

Ubuntu

Other Linux distributions may work if you install the right system dependencies (let me know which!).

  1. Install system dependencies:

  2. Install pprof:

  3. Install the package:

Other OS

Windows and Solaris are not supported.

Usage

The start_profiler() and stop_profiler() functions are replacements for Rprof() and Rprof(NULL), respectively. The generated output file is compatible with that generated by Rprof(). This enables a variety of ways to analyze the results, see also the subsequent section.

Furthermore stop_profiler() returns an in-memory representation of the profiled data, invisibly. See ?profiler::validate_profile for a description of the data format.

out_file <- tempfile("jointprof", fileext = ".out")
start_profiler(out_file)
#> Temporary files: /tmp/RtmpfhFVeu/jointprof51e94fc0c688.prof, /tmp/RtmpfhFVeu/jointprof51e95592e06b.out
for (i in 1:10) test(700)
profile_data <- stop_profiler()

profile_data
#> Profile data: 680 samples

profile_data$samples
#> # A tibble: 680 x 4
#>    value locations          .pprof_locations  .rprof_locations  
#>    <int> <list>             <list>            <list>            
#>  1     1 <tibble [115 × 1]> <tibble [63 × 1]> <tibble [115 × 1]>
#>  2     1 <tibble [55 × 1]>  <tibble [63 × 1]> <tibble [55 × 1]> 
#>  3     1 <tibble [179 × 1]> <tibble [63 × 1]> <tibble [179 × 1]>
#>  4     1 <tibble [111 × 1]> <tibble [63 × 1]> <tibble [111 × 1]>
#>  5     1 <tibble [42 × 1]>  <tibble [63 × 1]> <tibble [41 × 1]> 
#>  6     1 <tibble [58 × 1]>  <tibble [63 × 1]> <tibble [58 × 1]> 
#>  7     1 <tibble [43 × 1]>  <tibble [63 × 1]> <tibble [42 × 1]> 
#>  8     1 <tibble [272 × 1]> <tibble [63 × 1]> <tibble [272 × 1]>
#>  9     1 <tibble [52 × 1]>  <tibble [63 × 1]> <tibble [52 × 1]> 
#> 10     1 <tibble [98 × 1]>  <tibble [63 × 1]> <tibble [98 × 1]> 
#> # ... with 670 more rows
profile_data$locations
#> # A tibble: 459 x 3
#>    location_id function_id  line
#>  *       <int>       <int> <int>
#>  1           1           1     0
#>  2           2           2     4
#>  3           3           3     0
#>  4           4           4     0
#>  5           5           5     0
#>  6           6           6     4
#>  7           7           7     0
#>  8           8           8     0
#>  9           9           9     0
#> 10          10          10     0
#> # ... with 449 more rows
profile_data$functions
#> # A tibble: 170 x 5
#>    function_id name        system_name filename start_line
#>          <int> <chr>       <chr>       <chr>         <int>
#>  1           1 -           -           ""                0
#>  2           2 -           -           <text>            4
#>  3           3 ___elt      ...elt      ""                0
#>  4           4 _Call       .Call       ""                0
#>  5           5 +           +           ""                0
#>  6           6 +           +           <text>            4
#>  7           7 <=          <=          ""                0
#>  8           8 <Anonymous> <Anonymous> ""                0
#>  9           9 gc          <GC>        ""                0
#> 10          10 block_exec  block_exec  ""                0
#> # ... with 160 more rows

The profile data can be edited, and written to a file via write_rprof() or (in pprof-compatible format) via profile::write_pprof(). The example below removes the first 15 entries from all samples, and writes the resulting profile data in pprof format:

We then use pprof to visualize the call graph. The only exported function in the pprof package, get_pprof_pkg_path(), tells us the path to the pprof executable.

Call graph (click to show image)

The call graph shows a unified view of run time costs for both R and native code. (Click on the image to see the SVG version.)

For interactive exploration of a call graph, try the built-in web browser view:

Other packages for analyzing profiler data

I’m aware of the following R packages that process profiler data, ordered by date of first CRAN release:

  • proftools: Mature, the paper’s includes facilities for summarizing results at the function, call, and source line level; for filtering to narrow the focus to functions of primary interest; and for visualizing profiling data. Also offers an export to callgrind format.

  • profvis: Interactive exploration of the flame graph and call tree in a web browser.

  • prof.tree: Shows a call tree on the console.

I’d also like to mention two other tools outside the R ecosystem:

  • pprof: A new tool written in Go, supports varios output formats (including callgrind) and also interactive exploration of the flame graph and call graph in a web browser.

  • KCacheGrind: KDE-based interactive exploration, expects callgrind input.

Caveats

A few major points have been left unresolved until a forthcoming release:

  • Nested calls back into R code are not fully supported yet. In situations where native code evaluates R expressions which then invoke native code again, the recorded stack trace will look wrong.

If your R code calls into native code, or if you want to try pprof with R profiling results, give the jointprof package a try! If you discover problems or have comments, file an issue. I’d be especially grateful for help with other Linux distributions or OS X.