julia-users › Parallel computing: SharedArrays not updating on cluster
5 posts by 5 authors

PMab — Jun 21

Hi everyone,

I am using shared arrays and an @sync @parallel for loop to run computations on my university's cluster. The body of the loop takes a given line of a shared array, calls a function on it that uses a solver to return an array, and finally updates another shared array with the returned array. Here is a snippet of that code, which is part of a bigger function called from the main file. The functions and shared arrays are defined @everywhere.

```julia
function do_for_i(i::Int64)
    tmp_grid        = gridSt[i,:];
    tmp_resmat      = resmat[i,:];
    tmp_resmat_prev = resmat_prev[i,:];
    tmp_resmat_new, tmp_VF, tmp_TF, tmp_failed =
        solvePointList(mobj, tmp_grid, tmp_resmat, tmp_resmat_prev, printmode);
    return tmp_resmat_new, tmp_VF, tmp_TF, tmp_failed;
end

@sync @parallel for i in 1:NPT
    temp = do_for_i(i);
    # temp = solve_for_i(i, gridSt, resmat, resmat_prev, mobj, printmode)
    resmat[i,:] = temp[1];
    VFnext[i,:] = temp[2];
    TFnext[i,:] = temp[3];
end
```

The very puzzling thing is that when I run this code on a single Mac or PC with multiple workers (2 and 8, respectively), everything works fine and the shared arrays resmat, VFnext, and TFnext are updated. However, when I run it on a cluster (using the --machinefile option, and whatever the number of workers used), they are not updated. They seem to be updated only within the @sync @parallel for loop, but not in the body of the bigger function. Does anyone know what is going on? Is it possible that Julia's SharedArrays don't work on clusters?

Greg Plowman — Jun 21

Yes. AFAIK, shared arrays are shared across multiple processes on the same machine. Distributed arrays can be distributed across different machines.

Stefan Karpinski — Jun 22

That's right – shared-memory arrays cannot, by definition, be used on a non-shared-memory distributed system like a cluster. You may want DistributedArrays.
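For reference, a minimal single-machine sketch of the pattern in the original post, written against modern Julia names (where SharedArrays is a standard library and @parallel has become @distributed); the array sizes and the row-filling body are made up for illustration:

```julia
# Single-machine sketch: a SharedArray is visible to every worker started
# with addprocs(n) ON THIS HOST, but not to workers on other cluster nodes,
# which is exactly the failure described in this thread.
# (Illustrative sizes; in modern Julia, @parallel is spelled @distributed.)
using Distributed, SharedArrays

NPT, M = 4, 3
resmat = SharedArray{Float64}(NPT, M)   # backed by shared memory on this host

@sync @distributed for i in 1:NPT
    resmat[i, :] .= Float64(i)          # each iteration fills one row
end
```

After the loop, the master process sees the rows written by the workers, because all of them map the same shared-memory segment.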
Matthew Pearce — Jun 24

As the others have said, it won't work like that. I found a few options:

- DistributedArrays: message passing is handled in the background. Some limitations, but I haven't used it much.
- SharedArrays on each machine: you can share memory between all the pids on a single machine, and then pass messages between one process per machine to update.
- Regular Arrays on each machine: swap messages between all processes.

Which one works for you will depend on how big your arrays are and the access patterns of the code you're trying to run on them.

Kevin Keys — Jun 25

To clarify: shared-memory arrays cannot be used across multiple nodes of a compute cluster. If you schedule your code to run on only one node of a cluster, then your code should work fine. This is what I do on my university cluster; see here. If you need more parallel computing power than what is available on one cluster node, then, as others have said, you will need to appeal to a different array paradigm.

KLK
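The "regular arrays + message passing" option above can be sketched with pmap, which ships each index to a worker (on any machine) and returns the results to the master; no shared memory is needed, so it works across cluster nodes. Here `solve_row` is a hypothetical stand-in for the thread's solvePointList call, and the sizes are illustrative:

```julia
# Sketch: collect per-row results on the master with pmap, then write them
# into ordinary arrays. With real cluster workers, solve_row would need to
# be defined @everywhere; captured data like gridSt is serialized to the
# workers automatically as part of the closure.
using Distributed

solve_row(row) = (2 .* row, sum(row))   # hypothetical stand-in solver

NPT, M = 4, 3
gridSt = rand(NPT, M)

results = pmap(i -> solve_row(gridSt[i, :]), 1:NPT)  # results come back in order

resmat = zeros(NPT, M)
VFnext = zeros(NPT)
for i in 1:NPT
    resmat[i, :] = results[i][1]
    VFnext[i]    = results[i][2]
end
```

The trade-off versus SharedArrays is communication cost: every row and result crosses process boundaries, which is fine for coarse-grained work like the per-point solver in this thread.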