I’ve been trying to find a way to introduce threads to R. There are many reasons to want them: simplified input/output logic, sending tasks to the background (e.g. building a model asynchronously), or running computation-intensive tasks in parallel (e.g. a parallel, chunk-wise var() on a large vector). Finally, it’s just a neat problem to look at 😉 I’m following an approach similar to Python’s global interpreter lock: only one thread runs the interpreter at any given time.
So far it seems that:
- one can re-enter the interpreter with R_tryEval, which internally calls R_ToplevelExec, which in turn intercepts all long jumps (e.g. errors); see the sketch below
- there are a few basic checks to verify whether the stack is in good shape, e.g. R_CStackStart, which checks stack frames, and R_PPStackTop, which checks objects under PROTECTion
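As a minimal sketch of that re-entry (this uses only the public API; error handling is simplified and the function name is mine):

```c
#include <Rinternals.h>

/* Evaluate an R expression inside the interpreter. R_tryEval goes
 * through R_ToplevelExec, so long jumps (errors, restarts) are
 * intercepted instead of unwinding past this frame. */
static SEXP eval_in_interpreter(SEXP expr, SEXP env)
{
    int error_occurred = 0;
    SEXP result = R_tryEval(expr, env, &error_occurred);
    return error_occurred ? R_NilValue : result;
}
```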
I think that one can run multiple threads in R and maintain a separate interpreter “instance” in each of them. The R interpreter uses the C stack for its bookkeeping, and each thread has its own stack. It also counts objects excluded from garbage collection with PROTECT. Thus, when coming back to a given R interpreter “instance” (after a thread-level context switch), one needs to re-set R_PPStackTop to whatever that thread left off with; a sketch of that bookkeeping follows.
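In C, that could look like this (a sketch under my assumptions: R_PPStackTop is an interpreter internal rather than a public API, and the thread_state struct and function names are mine):

```c
extern int R_PPStackTop;    /* interpreter internal, not a public API */

typedef struct {
    int pp_stack_top;       /* PROTECT stack position owned by this thread */
    /* ... other per-thread interpreter state ... */
} thread_state;

/* Called right before this thread yields the interpreter. */
static void save_interpreter_state(thread_state *state)
{
    state->pp_stack_top = R_PPStackTop;
}

/* Called right after this thread re-acquires the interpreter. */
static void restore_interpreter_state(const thread_state *state)
{
    R_PPStackTop = state->pp_stack_top;
}
```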
I have put these ideas together in the form of an R package, thread (GitHub). This is what it can do:
- start a new thread and execute an R function in its own interpreter
- switch between threads on specific function calls, e.g. thread_join(), thread_print(), thread_sleep()
- finish thread execution
- keep track of R_PPStackTop
- avoid SIGSEGV-faulting the R process 😉
Here’s an example where two functions are run in parallel R threads (it’s also available via thread::run_r_printing_example()):
```r
library(thread)

thread_runner <- function (data) {
  thread_print(paste("thread", data, "starting\n"))
  for (i in 1:10) {
    timeout <- as.integer(abs(rnorm(1, 500, 1000)))
    thread_print(paste("thread", data, "iteration", i, "sleeping for", timeout, "\n"))
    thread_sleep(timeout)
  }
  thread_print(paste("thread", data, "exiting\n"))
}

message("starting the first thread")
thread1 <- new_thread(thread_runner, 1)
print(ls(threads))

message("starting the second thread")
thread2 <- new_thread(thread_runner, 2)
print(ls(threads))

message("going to join() both threads")
thread_join(thread1)
thread_join(thread2)
```
And here’s the output from my Ubuntu 16.10 x64:
```
starting the first thread
[1] "thread_140737231587072"
starting the second thread
[1] "thread_140737223194368" "thread_140737231587072"
going to join() both threads
thread 1 starting
thread 1 iteration 1 sleeping for 144
thread 2 starting
thread 2 iteration 1 sleeping for 587
thread 1 iteration 2 sleeping for 761
thread 2 iteration 2 sleeping for 1327
thread 1 iteration 3 sleeping for 360
thread 1 iteration 4 sleeping for 1802
thread 2 iteration 3 sleeping for 704
thread 2 iteration 4 sleeping for 463
thread 1 iteration 5 sleeping for 368
thread 2 iteration 5 sleeping for 977
thread 1 iteration 6 sleeping for 261
thread 1 iteration 7 sleeping for 323
thread 1 iteration 8 sleeping for 571
thread 2 iteration 6 sleeping for 509
thread 2 iteration 7 sleeping for 2521
thread 1 iteration 9 sleeping for 298
thread 1 iteration 10 sleeping for 394
thread 1 exiting
thread 2 iteration 8 sleeping for 966
thread 2 iteration 9 sleeping for 533
thread 2 iteration 10 sleeping for 1795
thread 2 exiting
```
How far is this from real thread support in R? Well, there are three major challenges to overcome before this is really useful:
1. A context switch happens only when a function from this package is called explicitly.
2. Memory allocation needs to be synchronized.
3. Error handling runs into R_run_onexits, which in turn throws a very nasty error message; this suggests I haven’t covered all the interpreter features related to switching stacks.
Issues #1 and #2 are related: one cannot leave R (release the R interpreter lock) and enter an arbitrary C function, because it is legal to call allocVector() from any C/C++ code. Allocation, in turn, needs to happen synchronously: only one thread can execute allocVector() (or, more specifically, allocVector3()) at any given time. I think the best way to address this would be to patch R (src/main/memory.c) and introduce a pointer to allocVector3 (similar to ptr_R_WriteConsole). The thread package would then inject a decorator for allocVector3 with additional synchronization logic, roughly as sketched below.
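Something like this (hypothetical: R has no ptr_allocVector3 hook today; the name and mechanism just mirror the ptr_R_WriteConsole convention):

```c
#include <pthread.h>
#include <Rinternals.h>
#include <R_ext/Rallocators.h>

/* Hypothetical patch to src/main/memory.c: a replaceable function
 * pointer that allocVector3() would call when set, the same way
 * console output goes through ptr_R_WriteConsole. */
SEXP (*ptr_allocVector3)(SEXPTYPE, R_xlen_t, R_allocator_t *);

/* In the thread package: a decorator that serializes all allocation. */
static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER;
static SEXP (*original_allocVector3)(SEXPTYPE, R_xlen_t, R_allocator_t *);

static SEXP synchronized_allocVector3(SEXPTYPE type, R_xlen_t length,
                                      R_allocator_t *allocator)
{
    pthread_mutex_lock(&alloc_mutex);
    SEXP result = original_allocVector3(type, length, allocator);
    pthread_mutex_unlock(&alloc_mutex);
    return result;
}
```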
Issue #3 is not clear to me yet, but it also suggests that more attention needs to be paid to the specifics of R code execution.
I’ll be grateful for comments and suggestions. I think R could benefit from native thread support, if only to simplify program logic, but maybe also to run parts of computation-intensive code in a lightweight, parallel manner.
True multi-threading, in which the threads execute R code in parallel, seems way too difficult, without totally rewriting both the interpreter and numerous packages with C code. To cite just one difficulty that occurs to me, how will the garbage collector know what objects are in use? C code currently needs to use PROTECT only when there is the possibility of an allocation. In bits of code that don’t do allocation, pointers to objects can be kept in local variables, which may live in processor registers, where it seems difficult for the garbage collector to find them. And which thread is running the garbage collector, anyway?
If you want to use threads just to implement coroutines, with only one executing at a time, that might be more feasible, though I would expect problems, possibly intractable, with that as well.
In pqR’s version of the R interpreter (see pqR-project.org), multiple threads execute in parallel, but all but the “master” thread do only numerical computations, using storage that has already been allocated. This avoids the intractable issues.
Hi Radford,
I know of pqR, and it’s actually on my list to learn more about your patch, so I could even say I feel honored to see a comment from you. The only thing that has stopped me so far is its size, which I expect to be considerable, while I wanted to get my hands dirty first to get a feeling for the problem.
You confirm my intuitions, too. I thought enabling true parallelism in R would be too much of a task; that’s why I mention Python’s global interpreter lock as an inspiration. Initially I thought it would be possible to leave R (and enter C) and provide safe threading there. Now it seems that’s a variation of the same problem, as the interpreter needs to be present in exactly one thread because of memory allocation and garbage collection. One direction I am still going to investigate is identifying the set of memory-related APIs that need to be synchronized and guarding them with an exclusive mutex. That way only one thread at a time could change the shared resource: the interpreter’s managed heap.
It still means that old C code cannot use this feature, because allocation and protection from garbage collection are not atomic (which is a big problem). There could be a new call, e.g. allocVector4, with a flag to return an already-protected vector, but only new code would know about it.
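Such a call could look roughly like this (allocVector4 is entirely hypothetical, and alloc_mutex stands for whatever lock would guard the managed heap):

```c
#include <pthread.h>
#include <Rinternals.h>

extern pthread_mutex_t alloc_mutex;  /* the lock guarding the managed heap */

/* Hypothetical: allocate and (optionally) PROTECT in one atomic step,
 * so no other thread can trigger garbage collection in between. */
SEXP allocVector4(SEXPTYPE type, R_xlen_t length, int protect)
{
    pthread_mutex_lock(&alloc_mutex);
    SEXP result = allocVector(type, length);
    if (protect)
        PROTECT(result);             /* still under the same lock */
    pthread_mutex_unlock(&alloc_mutex);
    return result;
}
```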
Another way could be to provide synchronized, safe APIs through a package, and to allow R-level threads and C-level protected allocation only in new code that accesses R via this new library. It would mean that all standard calls (even those entering C) would take over the synchronized resource (the interpreter and the managed heap). Only threads that explicitly promise not to call the core R API directly would be allowed to run in parallel. Still, allocation via allocVector4 would take over the shared resource, so core R needs to be patched to respect that. (Alternatively, one could simply copy the garbage collector from R and maintain a separate, synchronized heap just for threads. Objects could be accessed from the main thread and vice versa, but would never be deallocated by the main garbage collector.)
A completely insane idea, if R cannot be patched, would be to replace entries in the process’s symbol table. That’s the last resort, however, as it’s neither portable nor simple.
Finally, there is the option of allocating all the memory ahead of time and then going to C with no need to touch the managed heap. Current core R does not do this (I checked cumsum): even where the low-level function runs on pre-allocated memory, it’s hidden behind a public API that allocates. So again, the answer is either to patch core R or to provide a C library with all the required APIs (plus a minimal patch to core R, which seems unavoidable). The sketch below shows the shape such code could take.
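For illustration, a cumsum written in that style (threaded_cumsum and cumsum_kernel are hypothetical names; the point is that the kernel never touches the managed heap, so it could run outside the interpreter lock):

```c
#include <Rinternals.h>

/* The numerical kernel: operates only on pre-allocated memory, so it
 * could run on a worker thread without entering the interpreter. */
static void cumsum_kernel(const double *x, double *out, R_xlen_t n)
{
    double sum = 0.0;
    for (R_xlen_t i = 0; i < n; i++) {
        sum += x[i];
        out[i] = sum;
    }
}

/* The entry point: all managed-heap interaction happens here,
 * before the kernel runs. */
SEXP threaded_cumsum(SEXP x)
{
    R_xlen_t n = XLENGTH(x);
    SEXP out = PROTECT(allocVector(REALSXP, n));   /* allocate up front */
    cumsum_kernel(REAL(x), REAL(out), n);          /* heap-free section */
    UNPROTECT(1);
    return out;
}
```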
I know the R Core team doesn’t want threads, and maybe they’re right. R is a tool for statisticians, and it needs a sound statistical toolset more than general programming facilities. But who knows where R could go with an approachable threading API. And it’s so much fun to even try 😉
I’d be curious how the state of threads has evolved in the last 20 years. Check out Duncan Temple Lang’s 1997 PhD thesis: http://dl.acm.org/citation.cfm?id=926841