Parallel Computing in R

August 16th, 2017 | Advanced Analytics, Data Science, Microsoft R Server

Historically, R has been a single-threaded programming language. With Microsoft R, however, many basic operations have been optimized to run in parallel on larger datasets. Additionally, Microsoft R enables developers to easily write parallel solutions using the RevoScaleR package. Please note that the code below was written and executed on my desktop, so execution times could be faster if run in a Microsoft R Server environment.

Running simulations in R has always been a major pain point for me, mostly because of the time it takes to run them. For example, assume we have three dice and want to calculate and test the probability that the sum of the dice is greater than 12 and less than 18. Of the 216 equally likely outcomes, 55 give such a sum, so the theoretical probability is 55/216, or about 0.255 (the end of this post covers how to calculate it).

To test this, however, we need to simulate rolling three dice thousands of times and calculate the sum each time. So we write a small R function that rolls three dice and returns their sum.
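A minimal sketch of such a function could look like the following. The name `rollDice` and the `nDice` argument are assumptions made for illustration and are not necessarily what the original listing used.

```r
# Hypothetical helper (name and argument are assumptions):
# roll nDice fair six-sided dice and return the sum of the faces.
rollDice <- function(nDice = 3) {
  rolls <- sample(1:6, size = nDice, replace = TRUE)
  sum(rolls)
}

# One simulated roll of three dice; the result is an integer from 3 to 18.
oneRoll <- rollDice()
print(oneRoll)
```

Sampling with `replace = TRUE` matters here, because each die is rolled independently and several dice can show the same face.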

Each time we call the R function above, it returns the sum of the dice. Now that we can simulate a roll, we will test the probability of getting a sum greater than 12 and less than 18. To do so, we roll the dice 20,000 times and record how often the sum falls in that range.
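One way to run that sequential test is sketched below. It assumes a hypothetical `rollDice` helper and a plain `for` loop, and the seed is arbitrary; timings depend heavily on how the loop is written and on the machine, so they will not necessarily match the figure quoted in this post.

```r
# Hypothetical helper (an assumption): sum of three fair dice.
rollDice <- function(nDice = 3) {
  sum(sample(1:6, size = nDice, replace = TRUE))
}

set.seed(42)                 # arbitrary seed so the run is repeatable
nRolls <- 20000
sums <- integer(nRolls)      # preallocate the result vector

# Time the sequential simulation, one roll per loop iteration.
elapsed <- system.time(
  for (i in seq_len(nRolls)) sums[i] <- rollDice()
)

# Share of rolls whose sum is greater than 12 and less than 18.
estimate <- mean(sums > 12 & sums < 18)

print(elapsed)
print(estimate)
```

The estimate should land close to the theoretical 0.255, with some run-to-run noise.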

The execution time of 20,000 rolls is about 27 seconds.

While 27 seconds is not too long to wait, imagine if the simulation took hours or days. That would be annoying and would force long iteration cycles during development. So we want to speed up execution using the rxExec function.

To prove a point, we will execute the same code 200,000 times instead of 20,000. Notice that it takes only 7 seconds to run 10 times as many roll simulations. A huge improvement!
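A hedged sketch of that parallel run is shown below. It assumes a hypothetical `rollDice` helper and a local parallel compute context (`RxLocalParallel`); the original post may have configured rxExec differently. The sequential fallback is only there so the sketch still runs where RevoScaleR is not installed.

```r
# Hypothetical helper (an assumption): sum of three fair dice.
rollDice <- function(nDice = 3) {
  sum(sample(1:6, size = nDice, replace = TRUE))
}

nRolls <- 200000

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  # Assumed setup: distribute the tasks across the local machine's cores.
  RevoScaleR::rxSetComputeContext(RevoScaleR::RxLocalParallel())
  elapsed <- system.time(
    # rxExec calls rollDice nRolls times and returns a list of results.
    sums <- unlist(RevoScaleR::rxExec(rollDice, timesToRun = nRolls))
  )
} else {
  # Fallback without RevoScaleR: same simulation, run sequentially.
  elapsed <- system.time(sums <- replicate(nRolls, rollDice()))
}

# Share of rolls whose sum is greater than 12 and less than 18.
estimate <- mean(sums > 12 & sums < 18)

print(elapsed)
print(estimate)
```

Because `rollDice` depends on nothing outside its own body, it can be shipped to the workers as-is; a function that relied on other packages or objects would need them made available to the workers through arguments such as `packagesToLoad` or `execObjects`.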

The focus of this post was not to test the probability of the dice sum, but if you are interested in doing so, you can calculate the theoretical probability directly in R, for example by enumerating all 216 possible outcomes.
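One way to do that calculation, sketched here with base R only, is to enumerate every outcome of three dice and count the ones whose sum falls in the range. The variable names are illustrative, and this is not necessarily the approach the original listing took.

```r
# Enumerate all 6^3 = 216 equally likely outcomes of three dice.
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:6)
totals <- rowSums(outcomes)

# Count outcomes whose sum is greater than 12 and less than 18.
favorable <- sum(totals > 12 & totals < 18)

# Exact probability: 55 favorable outcomes out of 216.
probability <- favorable / nrow(outcomes)
print(round(probability, 3))  # 0.255
```

Enumeration stays cheap for three dice; for many more dice the outcome table grows as 6^n, and a convolution of the single-die distribution would scale better.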

rxExec is a great way to write parallel R code. Check out this Microsoft page for a few more examples!
