To check what's slowing down your R code, just use the Rprof command: call Rprof("summ.prof") before the code you want to measure and Rprof(NULL) after it. I wrote the example for this post to be particularly slow due to its data.frame usage. You may view the results of the profiling by running R CMD Rprof summ.prof:
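A minimal sketch of such a profiling session, assuming a hypothetical data frame d with one numeric column x that is filled one cell at a time. The loop body and the size n are illustrative stand-ins, not the example that produced the report in this post:

```r
# Hypothetical slow example: fill a data.frame one cell at a time.
n <- 20000                         # illustrative size, an assumption
d <- data.frame(x = numeric(n))

Rprof("summ.prof")                 # start sampling the call stack
for (i in seq_len(n)) {
  d[i, "x"] <- cos(i)              # every assignment goes through `[<-.data.frame`
}
Rprof(NULL)                        # stop profiling

# In-session alternative to `R CMD Rprof summ.prof`.
# summaryRprof() fails when no samples were recorded, so guard the call.
prof <- tryCatch(summaryRprof("summ.prof"), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

The by.self and by.total components returned by summaryRprof correspond to the two tables of the command-line report. The report for the slow example in this post reads: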
    Each sample represents 0.02 seconds.
    Total run time: 2.98 seconds.

    Total seconds: time spent in function and callees.
    Self seconds: time spent in function alone.

    % total  total seconds  % self  self seconds  name
       81.2           2.42     1.3          0.04  "[<-"
       79.9           2.38    65.8          1.96  "[<-.data.frame"
       18.1           0.54    10.1          0.30  "[.data.frame"
       18.1           0.54     0.0          0.00  "["
       12.1           0.36     1.3          0.04  "%in%"
       11.4           0.34     9.4          0.28  "match"
        4.7           0.14     4.0          0.12  "anyDuplicated"
        2.0           0.06     2.0          0.06  "names"
        2.0           0.06     2.0          0.06  "sys.call"
        1.3           0.04     1.3          0.04  "=="
        0.7           0.02     0.7          0.02  ".row_names_info"
        0.7           0.02     0.7          0.02  "NROW"
        0.7           0.02     0.7          0.02  "anyDuplicated.default"
        0.7           0.02     0.7          0.02  "cos"

    % self  self seconds  % total  total seconds  name
      65.8          1.96     79.9           2.38  "[<-.data.frame"
      10.1          0.30     18.1           0.54  "[.data.frame"
       9.4          0.28     11.4           0.34  "match"
       4.0          0.12      4.7           0.14  "anyDuplicated"
       2.0          0.06      2.0           0.06  "names"
       2.0          0.06      2.0           0.06  "sys.call"
       1.3          0.04     81.2           2.42  "[<-"
       1.3          0.04     12.1           0.36  "%in%"
       1.3          0.04      1.3           0.04  "=="
       0.7          0.02      0.7           0.02  ".row_names_info"
       0.7          0.02      0.7           0.02  "NROW"
       0.7          0.02      0.7           0.02  "anyDuplicated.default"
       0.7          0.02      0.7           0.02  "cos"

As you can see, it's very slow: [<-.data.frame and its callees account for about 80% of the run time. After profiling your own code, if you find that the top calls are
[.data.frame or [<-.data.frame, then you have a data.frame problem. Here's how I solve this, in order of things I try:
- Avoid loops; use vectorized code (no for loops, no apply, no sapply). In the example, use d[,1] <- to assign an entire column.
- Use numeric indices when possible. In our example, that means using d[i, 1] instead of d[i, "x"].
- Get rid of the data.frame for heavy calculations by using the data.matrix command. In the example above, just use d.matrix <- data.matrix(d).
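The three fixes can be sketched side by side. This is a minimal illustration, assuming a hypothetical data frame with a single numeric column x; the names d_loop and d_back, the size n, and the use of cos are placeholders rather than the code profiled above:

```r
n <- 1000                          # illustrative size, an assumption

# 1. Vectorized: assign the entire column in one call, no loop.
d <- data.frame(x = numeric(n))
d[, 1] <- cos(seq_len(n))

# 2. If a loop is unavoidable, index the column by position, not by name.
d_loop <- data.frame(x = numeric(n))
for (i in seq_len(n)) {
  d_loop[i, 1] <- cos(i)           # d_loop[i, 1] instead of d_loop[i, "x"]
}

# 3. Convert to a plain numeric matrix for the heavy part, then convert back.
d.matrix <- data.matrix(data.frame(x = numeric(n)))
for (i in seq_len(n)) {
  d.matrix[i, 1] <- cos(i)         # plain matrix assignment, no data.frame method
}
d_back <- as.data.frame(d.matrix)

# All three approaches produce the same column.
stopifnot(isTRUE(all.equal(d$x, d_loop$x)),
          isTRUE(all.equal(d$x, d_back$x)))
```

Note that data.matrix converts every column to numeric, so the third option only suits data frames whose columns are meaningful as numbers.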
Rewriting the silly example above,
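A minimal sketch of what such a rewrite could look like, assuming the slow version filled a hypothetical column x one row at a time. The function names slow_fill and fast_fill and the size n are illustrative, and the timings on your machine will differ from those quoted in this post:

```r
n <- 20000                                   # illustrative size, an assumption

# Slow version: one data.frame cell assignment per iteration.
slow_fill <- function(n) {
  d <- data.frame(x = numeric(n))
  for (i in seq_len(n)) d[i, "x"] <- cos(i)
  d
}

# Rewritten version: a single vectorized column assignment.
fast_fill <- function(n) {
  d <- data.frame(x = numeric(n))
  d[, 1] <- cos(seq_len(n))
  d
}

# Time both versions; system.time() reports elapsed wall-clock seconds.
t_slow <- system.time(d_slow <- slow_fill(n))["elapsed"]
t_fast <- system.time(d_fast <- fast_fill(n))["elapsed"]

stopifnot(isTRUE(all.equal(d_slow, d_fast)))  # same result either way
cat("slow:", t_slow, "s  fast:", t_fast, "s\n")
```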
This time, it runs too quickly for the profiler to record any events. The execution time is 0.192 seconds, whereas the first version took 3.48 seconds: roughly an 18-fold speed-up.
 