julia-users › Using pmap in julia
7 posts by 4 authors

Martha White, Jun 1

I am having difficulty understanding how to use pmap in Julia. I am a reasonably experienced Matlab and C programmer, but I am new to Julia and to parallel programming. I am running an experiment with nested for loops, benchmarking different algorithms. In the inner loop, I run the algorithms across multiple trials. I would like to parallelize this inner loop (the outer iterations I can easily run as separate jobs on a cluster). The code looks like:

    effNumCores = 3
    procids = addprocs(effNumCores)

    # This has to be added so that each worker has access to these
    # function definitions
    @everywhere include("experimentUtils.jl")

    # Initialize the array of RMSEs
    fill!(runErrors, 0.0)

    # Split the runs up across the number of cores
    outerloop = ceil(Int, numRuns / effNumCores)
    r = 1
    rend = effNumCores
    for i = 1:outerloop
        rend = min(r + effNumCores - 1, numRuns)
        # An empty RMSE array is passed, since it is created and returned
        # inside learningExperimentRun for the parallel version
        pmap_errors = pmap(r -> learningExperimentRun(mdp, hordeOfD, stepData,
                               alpha, lambda, beta, numAgents, numSteps,
                               Array{Float64}(0, 0), r), r:rend)
        for j = 1:(rend - r + 1)
            runErrors[:, :, MEAN_IND] += pmap_errors[j]
            runErrors[:, :, VAR_IND]  += pmap_errors[j].^2
        end
        r += effNumCores
    end
    rmprocs(procids)

The function called above is defined in a separate file, experimentUtils.jl, as:

    function learningExperimentRun(mdp::MDP, hordeOfD::horde, stepData::transData,
                                   alpha::Float64, lambda::Float64, beta::Float64,
                                   numAgents::Int64, numSteps::Int64,
                                   RMSE::Array{Float64,2}, runNum::Int64)
        # If RMSE is empty, initialize it; it is passed in empty for the
        # parallel version
        if isempty(RMSE)
            RMSE = zeros(Float64, numAgents, numSteps)
        else
            fill!(RMSE, 0.0)
        end
        srand(runNum)
        agentInit(hordeOfD, mdp, alpha, beta, lambda, BETA_ETD)
        getLearnerErrors(hordeOfD, mdp, RMSE, 1)
        mdpStart(mdp, stepData)
        for i = 2:numSteps
            mdpStep(mdp, stepData)
            updateLearners(stepData, mdp, hordeOfD)
            getLearnerErrors(hordeOfD, mdp, RMSE, i)
        end
        return RMSE
    end

When I try to run this, I get a large number of workers, and errors stating that I have too many files open. I believe I must be doing something seriously wrong. If anyone could help me parallelize this code in Julia, that would be fantastic. I am not tied to pmap, but after reading a bit it seemed to be the right function to use.

I should add that I have an extra loop splitting the runs over the cores, even though pmap could do that for me. I did this because pmap_errors would otherwise become an array of length numRuns (which could be in the hundreds); by splitting the runs into chunks, the returned pmap_errors is at most the number of cores long, and I am hoping that its memory gets re-used when starting the next pass over the cores. I first tried to avoid this by using a distributed array for runErrors, but that was not clearly documented, so I abandoned the approach.

Stefan Karpinski, Jun 1

Are you opening files via open or mmap in any of the functions that learningExperimentRun calls?

Martha White, Jun 1

Thank you for the prompt reply! No, I am not using either open or mmap.

Greg Plowman, Jun 1

You say you get a large number of workers. Without delving too deep, this seems pretty weird, regardless of the other code. Have you checked the number of workers (using nworkers()) after the call to addprocs()? If you are getting errors and re-running the script, is addprocs() just accumulating more workers? If so, perhaps try rmprocs(workers()) before addprocs().

Martha White, Jun 2

I was printing information from each worker and seeing the worker number increase, but when I actually check nworkers(), the number stays at 3. So I was wrong about the number of workers increasing: because I am adding and removing workers in the outer loop, the worker ids keep growing. However, I do still have issues with speed; it is slower to run in parallel with pmap than to run serially. I am not currently seeing the open-files issue, but am running again to see whether I can recreate that problem. In any case, for speed, it may be that too much memory is being copied over to each worker. Is there a way to restrict what is copied? For example, some values are const; can I somehow give this information to pmap?
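One commonly suggested way to cut down the per-call copying is to define the large read-only data once on every worker with @everywhere and reference it as a global from the mapped function, so pmap only ships the small per-run arguments. Below is a minimal sketch of that pattern; bigData and kernel are hypothetical stand-ins (not names from the thread), and it assumes the usual serialization behavior that a global referenced inside a function is looked up on the worker rather than shipped with each call.

    addprocs(3)

    # Define the large constant data once on every process. Using ones()
    # rather than rand() keeps the data identical on all workers;
    # @everywhere runs the expression separately on each process.
    @everywhere const bigData = ones(1000, 1000)

    # Per-run kernel, also defined on every worker. Only the integer r is
    # sent per call; bigData is resolved from the worker's own global.
    @everywhere kernel(r) = (srand(r); sum(bigData) * rand())

    results = pmap(kernel, 1:12)

With this arrangement the 1000x1000 array crosses the wire once per worker (at definition time) instead of once per pmap call.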
Martha White, Jun 8

I think I now realize why pmap is so slow: on each call, it copies a lot of information over to each worker. Because I have many nested outer loops, pmap ends up being called thousands of times. I would like to create variables and pass them to pmap without having them re-copied on every call. As an abstract example, imagine that I have 4 cores but want to make 4000 calls to a function called my_fun:

    total = zeros(Float64, 100)
    for i = 1:1000
        # pmap_answers is a length-4 array holding the vectors returned
        # by my_fun, each of length 100
        pmap_answers = pmap(index -> my_fun(i, index), 1:4)
        # Sum across the 4 parallel runs
        total += sum(pmap_answers)
    end

This allocates memory for pmap_answers 1000 times. But really, I could allocate that memory once outside the loop and let pmap re-use it: I could pass an array of size numCores x 100 into pmap. However, I know that pmap currently re-copies all variables that are passed to it. Is there a way to stop pmap from re-allocating memory and instead just use pre-allocated memory? Or any parallel functionality that allows this?

Tim Holy, Jun 10

Re: [julia-users] Re: Using pmap in julia

I haven't read this thread in detail, but are the answers from your calculation expressible as "bitstypes" (e.g., Float64, etc.) and/or arrays of bitstypes? If so, perhaps you could make your function deposit its results in a SharedArray, and then return nothing.

Best,
--Tim
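A minimal sketch of Tim's suggestion, applied to the abstract my_fun example above (my_fun here is a hypothetical stand-in, as in that example): allocate one SharedArray up front, have each mapped call deposit its row directly into it, and return nothing so that no result arrays are serialized back to the driver.

    addprocs(4)

    # Hypothetical stand-in for the real per-run computation.
    @everywhere my_fun(i, index) = fill(Float64(i + index), 100)

    # One allocation, visible to the driver and to all local workers.
    answers = SharedArray(Float64, (4, 100))
    total = zeros(Float64, 100)

    for i = 1:1000
        # Each call writes its row into shared memory and returns nothing,
        # so pmap ships back only `nothing` instead of a 100-element vector.
        pmap(index -> (answers[index, :] = my_fun(i, index); nothing), 1:4)
        # Sum across the 4 parallel runs on the driver.
        total += vec(sum(answers, 1))
    end

The closure still gets serialized on each call, but it captures only the integer i and a handle to the SharedArray, both of which are cheap to send. Note that a SharedArray shares memory only among processes on the same machine; across machines one would need a DistributedArray or explicit communication instead.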