julia-users › C/MPI/SLURM => Julia/?/SLURM?
6 posts by 3 authors

Gabriel Mihalache  11/23/14

Hello! I'm looking to use Julia for an upcoming project, and right now I'm trying to put together a simple example that includes everything I need. I managed to port the application and the results are correct; now I'm stuck trying to parallelize it. My experience is with MPI in C, where I use broadcast + allgather to have separate nodes compute and fill parts of a multi-dimensional array, with all nodes otherwise doing the same thing. I'm not sure how to translate that into Julia's way of doing things, so I'm trying distributed arrays. Another complication is that I need to use SLURM to schedule jobs.

So, my questions are:

1) Has anyone successfully run DArray code under SLURM?

2) Any idea why I get this output:

Iteration 1
(11,100), DArray{Float64,2,Array{Float64,2}}
(1:0,1:0)
elapsed time: 2.388602661 seconds
Err = 0.0

when I run this code:

using MAT;
using Grid;
using NLopt;

const alpha = 0.25;
const uBeta = 0.90;
const delta = 1.00;
const kGridMin = 0.01;
const kGridMax = 0.50;
const kGridSize = 100;
const vErrTol = 1e-8;
const maxIter = 200;

file = matopen("shock.mat")
yData = read(file, "y");
yGridSize = size(yData, 1);
y = linrange(yData[1], yData[end], yGridSize);
yPi = read(file, "yPi")
close(file)

function sgmVfi()
    addprocs(24);
    kGrid = linrange(kGridMin, kGridMax, kGridSize);
    V0 = zeros(yGridSize, kGridSize);
    V1 = zeros(size(V0));
    kPr = zeros(size(V0));
    gdp(yIx, kIx) = exp(y[yIx]) * kGrid[kIx].^alpha;
    iter = 0;
    err = 1.0;
    while iter < maxIter && err > vErrTol
        tic();
        iter += 1;
        println("Iteration ", iter);
        V0 = copy(V1);
        V0f = CoordInterpGrid((y, kGrid), V0, BCnearest, InterpQuadratic);
        DV1 = distribute(V1);
        DkPr = distribute(kPr);
        println(size(DV1), ", ", typeof(DV1));
        myIx = localindexes(DV1);
        println(myIx);
        for yIx = myIx[1]
            for kIx = myIx[2]
                println(yIx, " ", kIx, " ", myid());
                opt = Opt(:LN_BOBYQA, 1)
                lower_bounds!(opt, kGridMin)
                upper_bounds!(opt, min(kGridMax, gdp(yIx, kIx) + (1-delta) * kGrid[kIx] - 1e-5))
                myObj(kk) = log(gdp(yIx, kIx) - kk + (1-delta) * kGrid[kIx]) +
                    uBeta * sum([ yPi[yIx, yPrIx] * V0f[y[yPrIx], kk] for yPrIx = 1:yGridSize ]);
                max_objective!(opt, (x::Vector, grad::Vector) -> myObj(x[1]));
                (maxVal, maxK, ret) = optimize!(opt, [kGridMin]);
                localpart(DkPr)[yIx, kIx] = maxK[1];
                localpart(DV1)[yIx, kIx] = maxVal;
            end
        end
        V1 = convert(typeof(V1), DV1);
        kPr = convert(typeof(kPr), DkPr);
        err = maximum(abs(V1[:] - V0[:]));
        toc()
        println("Err = ", err)
    end
    return kGrid, V1, kPr;
end

kGrid, V, kPr = sgmVfi()

using this submission script:

#!/bin/bash
#SBATCH -J sgmVfi
#SBATCH -e elog_sgm
#SBATCH -o olog_sgm
#SBATCH -t 1:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH -p standard
#SBATCH --mail-type=fail

module load julia
srun julia -q sgm.jl

but the code runs fine (it iterates 157 times and converges) when I just call the function from Julia and don't ask for extra procs (i.e., without parallelism, but still using a DArray)?

3) Finally, if I call DV1 = distribute(V1) and DkPr = distribute(kPr), is there a way to make sure that localindexes(DV1) and localindexes(DkPr) return the same indices?

Any hint/lead is very much appreciated! Thank you for your time!
Gabriel
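On question 3, one option is to construct both arrays with an explicit layout instead of calling `distribute` twice. This is only a minimal sketch, assuming the 0.3-era `dzeros(dims, procs, dist)` constructor from Base and the grid sizes defined above:

# Build both distributed arrays with the same procs and dist arguments,
# which pins them to the same layout across workers.
p    = workers()
dist = [1, length(p)]                           # split along the k dimension only
DV1  = dzeros((yGridSize, kGridSize), p, dist)
DkPr = dzeros((yGridSize, kGridSize), p, dist)
# On any worker, localindexes(DV1) == localindexes(DkPr), so writes into the
# two local parts stay aligned.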
Gabriel Mihalache  11/23/14

I made some progress on getting the code to run correctly in parallel, using all the cores of one machine, by switching to pmap and running some prep code on all the workers. The next steps are to get this running on multiple machines (using addprocs_ssh?) and then benchmarking. Here's the state of my solution so far, for anyone interested in or stuck on this too.

The SLURM batch file:

#!/bin/bash
#SBATCH -J sgmVfi
#SBATCH -e elog_sgm
#SBATCH -o olog_sgm
#SBATCH -t 0:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH -p standard
#SBATCH --mail-type=fail

srun julia -q -p 24 -L loader.jl sgm.jl

the "loader" file, which defines the constants on all workers:

using MAT;
using Grid;
using NLopt;

const alpha = 0.25;
const uBeta = 0.90;
const delta = 1.00;
const kGridMin = 0.01;
const kGridMax = 0.50;
const kGridSize = 100;
const kGrid = linrange(kGridMin, kGridMax, kGridSize);
const vErrTol = 1e-8;
const maxIter = 200;

file = matopen("shock.mat")
yData = read(file, "y");
yGridSize = size(yData, 1);
const y = linrange(yData[1], yData[end], yGridSize);
const yPi = read(file, "yPi")
close(file)

gdp(yIx, kIx) = exp(y[yIx]) * kGrid[kIx].^alpha;

and finally, the code running on the master worker/driver:

function sgmVfi()
    V0 = zeros(yGridSize, kGridSize);
    V1 = zeros(size(V0));
    kPr = zeros(size(V0));
    iter = 0;
    err = 1.0;
    while iter < maxIter && err > vErrTol
        tic();
        iter += 1;
        println("Iteration ", iter);
        V0 = copy(V1);
        V0f = CoordInterpGrid((y, kGrid), V0, BCnearest, InterpQuadratic);
        function workAt(iix)
            yIx = iix[1];
            kIx = iix[2];
            opt = Opt(:LN_BOBYQA, 1)
            lower_bounds!(opt, kGridMin)
            upper_bounds!(opt, min(kGridMax, gdp(yIx, kIx) + (1-delta) * kGrid[kIx] - 1e-5))
            myObj(kk) = log(gdp(yIx, kIx) - kk + (1-delta) * kGrid[kIx]) +
                uBeta * sum([ yPi[yIx, yPrIx] * V0f[y[yPrIx], kk] for yPrIx = 1:yGridSize ]);
            max_objective!(opt, (x::Vector, grad::Vector) -> myObj(x[1]));
            (maxVal, maxK, ret) = optimize!(opt, [kGridMin]);
            return maxVal, maxK[1];
        end
        tmp = pmap(workAt, [[yIx, kIx] for yIx = 1:yGridSize, kIx = 1:kGridSize]);
        tmp = reshape(tmp, size(V1));
        for yIx = 1:yGridSize, kIx = 1:kGridSize
            V1[yIx, kIx] = tmp[yIx, kIx][1];
            kPr[yIx, kIx] = tmp[yIx, kIx][2];
        end
        err = maximum(abs(V1[:] - V0[:]));
        toc()
        println("Err = ", err)
    end
    return kGrid, V1, kPr;
end

kGrid, V, kPr = sgmVfi()

file = matopen("results.mat", "w")
write(file, "kPr", kPr)
write(file, "V", V)
write(file, "y", [y])
write(file, "yPi", yPi)
write(file, "kGrid", [kGrid])
close(file)

Miles Lubin  11/23/14

Two comments:

1) It's pretty easy to use MPI from Julia. For some use cases it may make more sense than Julia's built-in approach to parallelism, especially if you're already comfortable with MPI. Though if you're well served by pmap, that's much simpler.

2) It looks like the subproblem you're solving is a 1d optimization. I wouldn't be surprised if you could get a significant speedup by tweaking the optimization algorithm (e.g., using a derivative-based method) and making the function evaluations more efficient (e.g., avoiding temporary allocations).
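To make the second point concrete: the `sum([ ... ])` inside `myObj` builds a temporary array on every evaluation of the objective. A minimal sketch of an allocation-free version (the helper name `myObjLoop` is hypothetical; it assumes the definitions from loader.jl and takes the current iteration's `V0f` as an argument):

function myObjLoop(kk, yIx, kIx, V0f)
    # consumption implied by the savings choice kk
    c = gdp(yIx, kIx) - kk + (1 - delta) * kGrid[kIx]
    # accumulate the expected continuation value in a loop instead of
    # allocating a temporary array with a comprehension
    ev = 0.0
    for yPrIx = 1:yGridSize
        ev += yPi[yIx, yPrIx] * V0f[y[yPrIx], kk]
    end
    return log(c) + uBeta * ev
end

It could be wired into NLopt the same way as before, e.g. max_objective!(opt, (x, grad) -> myObjLoop(x[1], yIx, kIx, V0f)).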
Gabriel Mihalache  11/23/14

> 1) It's pretty easy to use MPI from Julia. For some use cases it may make
> more sense than Julia's built-in approach to parallelism, especially if
> you're already comfortable with MPI. Though if you're well served by pmap,
> that's much simpler.

Do you mean I should be able to broadcast and allgather memory managed by Julia? That sounds scary. Luckily for me, these models should be amenable to pmap. Is there an example of MPI-from-Julia, just in case?

> 2) It looks like the subproblem you're solving is a 1d optimization. I
> wouldn't be surprised if you could get a significant speedup by tweaking
> the optimization algorithm (e.g., using a derivative-based method) and
> making the function evaluations more efficient (e.g., avoiding temporary
> allocations).

Yes, thank you! This is a toy example for which we have an analytic solution to check against. The project I'm working on involves a few more discrete and continuous controls and states. I only hope that NLopt.jl will run in reasonable time compared to the C version. Thank you for your reply!
Gabriel

Erik Schnetter  11/24/14
Re: [julia-users] Re: C/MPI/SLURM => Julia/?/SLURM?

To use MPI from Julia, use the MPI.jl package (`Pkg.add("MPI")`). There are also examples in this package.

Similar to Python, this MPI wrapper has two API levels: a lower level that feels much like C, and a higher level where arbitrary Julia objects can be sent and received.

The MPI wrapper is not complete; if you need an MPI function that isn't wrapped, please speak up.

-erik

--
Erik Schnetter
http://www.perimeterinstitute.ca/personal/eschnetter/

Gabriel Mihalache  11/24/14
Re: [julia-users] Re: C/MPI/SLURM => Julia/?/SLURM?

On Mon Nov 24 2014 at 8:37:36 AM Erik Schnetter wrote:
> To use MPI from Julia, use the MPI.jl package (`Pkg.add("MPI")`). There are
> also examples in this package.

That's great! Somehow I missed the examples when I first saw that package. If I can't get pmap to work nicely across machines, this should be at least as good. Thank you for your work on this!
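Following the pointer to MPI.jl, here is a rough sketch of the broadcast + allgather pattern from the original C workflow. The call names (`MPI.Bcast!`, `MPI.Allgather`) are taken from MPI.jl, but the exact signatures have varied between package versions, so treat them as assumptions and check the examples shipped with the package:

using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

# Rank 0 fills a parameter vector; every rank receives the same values.
params = rank == 0 ? [0.25, 0.90, 1.00] : zeros(3)
MPI.Bcast!(params, 0, comm)

# Each rank computes an equally sized chunk of the overall array...
chunk = fill(float(rank), 4)
# ...and Allgather leaves every rank holding the concatenation of all chunks.
full = MPI.Allgather(chunk, comm)

rank == 0 && println(length(full), " entries gathered across ", nproc, " ranks")

MPI.Finalize()

Under SLURM this would be launched with one Julia process per MPI rank, e.g. srun -n 24 julia sketch.jl (filename hypothetical), rather than with -p 24.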