julia-users › Help with parallel computing for optimization
2 posts by 2 authors

Nsep (Apr 11)

Hello World,

I am relatively new to Julia. I wrote an optimization model (a MIP) that I need to run many, many times for sensitivity analysis. I am working on a cluster that uses SLURM, and I wrote my model as a Julia module. Basically, what I want to do is have a file with all the different cases and, using a for-loop, have each case solved by a different node (all cores in the node solving the same MIP case). I have tried two different ways:

1) Using the --machinefile option of Julia (see the .sh file below).

#!/bin/bash
#SBATCH --uid=nsep
#SBATCH --job-name=juliaTest
#SBATCH --partition=newnodes
#SBATCH --output=juliaTest.%j.%N.out
#SBATCH --error=juliaTest.%j.%N.err
#SBATCH --time=1:0:0
#SBATCH -N 3
#SBATCH -n 20
#SBATCH --export=ALL

export SLURM_NODEFILE=`generate_pbs_nodefile`
. /etc/profile.d/modules.sh
module add engaging/julia/0.4.3
module add engaging/gurobi/6.5.1

julia --machinefile $SLURM_NODEFILE Cases.jl

Using this method I get an error when loading MyModule (the model) everywhere:

push!(LOAD_PATH, "/home/nsep/Test")
@everywhere using MyModule
@everywhere using DataFrames

@everywhere inpath = "/home/nsep/Test/Input"
@everywhere outpath = "/home/nsep/Test/Results"
mysetup = Dict()                       # config. options for MyModule
@everywhere mysetup
casepath = "/home/nsep/Test"
@everywhere cases_in_data = readtable("$casepath/Cases_Control.csv", header=true)

@parallel for c in 1:size(cases_in_data, 1)
    # loading general inputs
    myinputs = Load_inputs(mysetup, inpath)
    # creating output directory
    mkdir("$outpath/Case$c")
    case_outpath = "$outpath/Case$c"
    # case-specific inputs
    myinputs["pMaxCO2"][1] = cases_in_data[:Emissions][c]
    myresults = solve_model(mysetup, myinputs)
    write_outputs(mysetup, case_outpath, myresults, myinputs)
end

The error that I get is:

WARNING: replacing module MyModule
WARNING: replacing module MyModule
WARNING: replacing module MyModule

signal (11): Segmentation fault
jl_module_using at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x2aaaaae0def9)
unknown function (ip: 0x2aaaaae0e1e5)
unknown function (ip: 0x2aaaaae0de3d)
unknown function (ip: 0x2aaaaae0e77c)
jl_load_file_string at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
include_string at loading.jl:266
jl_apply_generic at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
include_from_node1 at ./loading.jl:307
jl_apply_generic at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x2aaaaadf92a3)
unknown function (ip: 0x2aaaaadf8639)
unknown function (ip: 0x2aaaaae0daac)
jl_toplevel_eval_in at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
eval at ./sysimg.jl:14
jl_apply_generic at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:1364
jl_f_apply at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:910
run_work_thunk at multi.jl:651
run_work_thunk at multi.jl:660
jlcall_run_work_thunk_21367 at (unknown line)
jl_apply_generic at /cm/shared/engaging/julia/julia-a2f713dea5/bin/../lib/julia/libjulia.so (unknown line)
anonymous at task.jl:58
unknown function (ip: 0x2aaaaadff514)
unknown function (ip: (nil))
sh: line 1: 18358 Segmentation fault      /cm/shared/engaging/julia/julia-a2f713dea5/bin/julia --worker
Worker 2 terminated.
ERROR (unhandled task failure): EOFError: read end of file
 in read at stream.jl:911
 in message_handler_loop at multi.jl:868
 in process_tcp_streams at multi.jl:857
 in anonymous at task.jl:63
ERROR: LoadError: ProcessExitedException()
 in yieldto at ./task.jl:71
 in wait at ./task.jl:371
 in wait at ./task.jl:286
 in wait at ./channels.jl:63
 in take! at ./channels.jl:53
 in take! at ./multi.jl:809
 in remotecall_fetch at multi.jl:735
 in remotecall_fetch at multi.jl:740
 in anonymous at multi.jl:1386
...and 1 other exceptions.
 in sync_end at ./task.jl:413
 in anonymous at multi.jl:1395
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:304
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/nsep/Cases.jl, in expression starting on line 3

2) The other method I tried was using ClusterManagers.jl (.sh file below).

#!/bin/bash
#SBATCH --uid=nsep
#SBATCH --job-name=juliaTest
#SBATCH --partition=newnodes
#SBATCH --output=juliaTest.%j.%N.out
#SBATCH --error=juliaTest.%j.%N.err
#SBATCH --time=0:2:0
#SBATCH -N 4
#SBATCH --export=ALL

. /etc/profile.d/modules.sh
module add engaging/julia/0.4.3
module add engaging/gurobi/6.5.1

julia julia_cluster.jl

In the Julia code I then tried to run the SLURM example from the ClusterManagers page:

using ClusterManagers

# Arguments to the Slurm srun(1) command can be given as keyword arguments to addprocs.
# The argument name and value are translated to an srun(1) command line argument as follows:
# 1) If the length of the argument is 1, "-arg value", e.g. t="0:1:0" => "-t 0:1:0"
# 2) If the length of the argument is > 1, "--arg=value", e.g. time="0:1:0" => "--time=0:1:0"
# 3) If the value is the empty string, it becomes a flag value, e.g. exclusive="" => "--exclusive"
# 4) If the argument contains "_", they are replaced with "-", e.g. mem_per_cpu=100 => "--mem-per-cpu=100"
addprocs(SlurmManager(4), partition="newnodes", t="00:2:00")

hosts = []
pids = []
for i in workers()
    host, pid = fetch(@spawnat i (gethostname(), getpid()))
    push!(hosts, host)
    push!(pids, pid)
end

# The Slurm resource allocation is released when all the workers have exited
for i in workers()
    rmprocs(i)
end

But I get this error:

Error launching Slurm job: MethodError(length, (:all_to_all,))

If anyone could help me figure out 1) what is wrong in my code when passing MyModule, and 2) what I am doing wrong when trying ClusterManagers, that would be AWESOME!

Jiahao Chen (Apr 12)

Looks like the bug report ClusterManagers.jl #31. Can you try Pkg.checkout("ClusterManagers") and see if that works for you?
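For reference, a minimal sketch of what trying that suggestion might look like on Julia 0.4, assuming ClusterManagers was already installed with Pkg.add and reusing the partition name and time limit from the post; whether the development version actually fixes the launch error would need to be confirmed on the cluster itself:

# Sketch only: switch ClusterManagers to its master branch, then retry the
# SlurmManager launch from the post. Assumes Pkg.add("ClusterManagers") was
# run beforehand and that the "newnodes" partition exists.
Pkg.checkout("ClusterManagers")       # Julia 0.4 package manager: check out master

using ClusterManagers
addprocs(SlurmManager(4), partition="newnodes", t="00:2:00")
println(workers())                    # should list the 4 Slurm workers if the launch succeeded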
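On the first question, a pattern that was often suggested for Julia 0.4 is to load the module on the master process before loading it on the workers (which tends to avoid the repeated "WARNING: replacing module" messages), and to hand each independent case to a worker with pmap instead of @parallel for, since every iteration here is an expensive, side-effecting MIP solve. A rough sketch under those assumptions, reusing the poster's own names (MyModule, Load_inputs, solve_model, write_outputs, Cases_Control.csv) and making no claim that it resolves the segmentation fault on this particular cluster:

# Sketch only: assumes the workers were started with --machinefile as in the
# first script, that the paths are on a shared filesystem, and that mysetup
# holds whatever config. options MyModule expects.
push!(LOAD_PATH, "/home/nsep/Test")
using MyModule                               # load on the master first...
using DataFrames
@everywhere push!(LOAD_PATH, "/home/nsep/Test")
@everywhere using MyModule                   # ...then bring into scope on the workers
@everywhere using DataFrames

inpath  = "/home/nsep/Test/Input"
outpath = "/home/nsep/Test/Results"
mysetup = Dict()                             # config. options for MyModule
cases_in_data = readtable("/home/nsep/Test/Cases_Control.csv", header=true)

# One case per call; defined everywhere so any worker can run it.
@everywhere function run_case(c, mysetup, inpath, outpath, emissions)
    myinputs = Load_inputs(mysetup, inpath)  # general inputs
    case_outpath = joinpath(outpath, "Case$c")
    isdir(case_outpath) || mkdir(case_outpath)
    myinputs["pMaxCO2"][1] = emissions       # case-specific input
    myresults = solve_model(mysetup, myinputs)
    write_outputs(mysetup, case_outpath, myresults, myinputs)
end

# pmap hands each case to a free worker and returns once all cases are solved.
emissions = cases_in_data[:Emissions]
pmap(c -> run_case(c, mysetup, inpath, outpath, emissions[c]), 1:size(cases_in_data, 1))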