julia-users › Clarification about @parallel (op) for loop
2 posts by 2 authors

Abhinav Deshpande — Jul 19

My problem involves a function applied to a huge number of integers and a reduction over them, either `max()` or `+`. I am finding this syntax pretty convenient:

```julia
... # Other definitions and setting up func()
@parallel (max) for i in 1:binomial(m + n - 1, n)
    func(i)
end
```

I am doing this on a cluster where `julia` is running on the submit node (users are discouraged from running intensive processes on this node), and I add processes with Slurm using the ClusterManagers.jl package. Since the workers are generally on different nodes, I want to minimise communication between them and the main process.

My specific question is this: in the above implementation, does every worker compute `func(i)` and return it to the calling process, which then reduces it on the fly? Or does every worker apply the reduction operator to its own chunk and then return the result to the calling process?

The documentation says that "In case of @parallel for, the final reduction is done on the calling process." Does this mean that `func(i)` for every `i` is returned to the calling process? On the other hand, from what I understand of this implementation of `preduce`, line 1723 seems to indicate that the reduction happens on the worker process itself, and only the final reduction is carried out on the calling process. I suspect it is the latter, but want to confirm that this is indeed the case.

Also, the documentation mentions: "In contrast, @parallel for can handle situations where each iteration is tiny, perhaps merely summing two numbers." However, my function `func()` is nowhere near this simple. Should I still be using `@parallel`? Or try to go for `pmap()` with my own written version of a `pmap_reduce()`?
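For concreteness, here is a runnable sketch of the pattern above. Note that on Julia 1.0 and later, `@parallel` was renamed `@distributed` and lives in the `Distributed` standard library; the body of `func` and the values of `m` and `n` below are toy stand-ins, not anything from the original post:

```julia
using Distributed

addprocs(4)  # on a cluster this would instead be ClusterManagers' addprocs_slurm

# The mapped function must be defined on every worker.
@everywhere func(i) = sin(i) + i % 7  # toy stand-in for the real func

m, n = 5, 4  # illustrative sizes

# Each worker reduces its own chunk of the range with max;
# the calling process only reduces the per-worker results.
result = @distributed (max) for i in 1:binomial(m + n - 1, n)
    func(i)
end
```

Because `max` is associative and commutative, the distributed result matches a serial `maximum` over the same range.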
Greg Plowman — Jul 20

> does every worker compute func(i) and return it to the calling process, which then reduces it on the fly?

No.

> Or does every worker apply the reduction operator for its chunk and then return it to the calling process?

Yes.

> The documentation says that "In case of @parallel for, the final reduction is done on the calling process." Does this mean that func(i) for every i is returned to the calling process?

No.

> On the other hand, from what I understand from this implementation of preduce, line 1723 seems to indicate that the reduction happens on the worker process itself, and only the final reduction is carried out on the calling process.

Yes. That is what is meant by "the final reduction is done on the calling process": it is the reduction of the single, already-reduced results from each worker.

> Should I still be using @parallel? Or try to go for pmap() with my own written version of a pmap_reduce()?

`@parallel` statically splits the work among workers in batches. Statically means each worker runs the same number of loop iterations; in batches means communication is minimised.

`pmap` dynamically splits the work among workers (dynamic load balancing): each worker is given the next task only when it has completed the previous one. `pmap` is useful for uneven workloads and/or uneven processors. It is less suited to small workloads because of the communication per iteration.

Of course, you could custom-code a sort of hybrid, where `pmap` dishes out the work to workers in batches.
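The hybrid described above could be sketched as follows. `pmap_reduce` is a hypothetical helper (not an existing API), and `func` and the batch size are illustrative: each batch is reduced locally on the worker that receives it, and only the per-batch results travel back for the final reduction on the caller.

```julia
using Distributed

addprocs(4)

@everywhere func(i) = sin(i) + i % 7  # toy stand-in for the real func

# Hypothetical helper: hand out work to workers in batches via pmap,
# reduce each batch locally with mapreduce, then reduce the
# already-reduced per-batch results on the calling process.
function pmap_reduce(op, f, itr; batchsize = 1000)
    chunks = collect(Iterators.partition(itr, batchsize))
    partials = pmap(chunk -> mapreduce(f, op, chunk), chunks)
    return reduce(op, partials)  # final reduction on the caller
end

result = pmap_reduce(max, func, 1:70; batchsize = 10)
```

Incidentally, current versions of `Distributed.pmap` accept a `batch_size` keyword that implements this same batching idea, though the per-batch reduction above still has to be written by hand.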