julia-users › Parallelizing Error on 0.5 on Ubuntu

ABB, Sep 30

On this Julia version:

                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
       _ _   _| |_  __ _   |  Type "?help" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
     _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
    |__/                   |  x86_64-pc-linux-gnu

running on: Ubuntu 14.04.3 LTS

I am trying to do a Monte Carlo simulation in parallel across 36 workers. I have two problems (at least).

1. Some of the workers terminate at the beginning of the simulation, but I don't understand the error message:

    Worker 5 terminated.ERROR (unhandled task failure): ProcessExitedException()
     in yieldto(::Task, ::ANY) at ./event.jl:136
     in wait() at ./event.jl:169
     in wait(::Condition) at ./event.jl:27
     in wait(::Channel{Any}) at ./channels.jl:92
     in take!(::Channel{Any}) at ./channels.jl:73
     in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1066
     in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
     in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998

This is not a huge problem as the rest of the workers keep going and can finish the simulation, but I would like to understand what is going on, if possible. (And maybe how to fix it so as to use those workers.)

2. The more important problem is that at the end of the simulation, I run into other errors and nothing is returned.
My (uninformed and probably wrong) guess is that there is something the program doesn't like about the fact that the different workers are finishing at different times? The errors I get are:

    ERROR (unhandled task failure): EOFError: read end of file
    Worker 16 terminated.ERROR (unhandled task failure): ProcessExitedException()
     in yieldto(::Task, ::ANY) at ./event.jl:136
     in wait() at ./event.jl:169
     in wait(::Condition) at ./event.jl:27
     in wait(::Channel{Any}) at ./channels.jl:92
     in take!(::Channel{Any}) at ./channels.jl:73
     in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1066
     in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
     in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998

And:

    ERROR: LoadError: ProcessExitedException()
     in wait(::Task) at ./task.jl:135
     in collect_to!(::Array{Array{Float64,2},1}, ::Base.Generator{Array{Task,1},Base.#wait}, ::Int64, ::Int64) at ./array.jl:340
     in collect(::Base.Generator{Array{Task,1},Base.#wait}) at ./array.jl:308
     in preduce(::Function, ::Function, ::UnitRange{Int64}) at ./multi.jl:2002
     in (::ProjectModule.##44#46{Int64,Array{Any,1},Array{Any,2},Int64})() at ./multi.jl:2011
     in macro expansion at ./task.jl:326 [inlined]
     in #OuterSim#43(::Int64, ::Int64, ::Int64, ::Array{Any,1}, ::Array{Any,2}, ::Function, ::Int64) at /home/ubuntu/dynhosp/DataStructs.jl:1321
     in (::ProjectModule.#kw##OuterSim)(::Array{Any,1}, ::ProjectModule.#OuterSim, ::Int64) at ./:0
     in include_from_node1(::String) at ./loading.jl:488
     in process_options(::Base.JLOptions) at ./client.jl:262
     in _start() at ./client.jl:318
    while loading
    /home/ubuntu/dynhosp/Run.jl, in expression starting on line 9

And finally:

    ERROR (unhandled task failure): On worker 9:
    ArgumentError: Dict(kv): kv needs to be an iterator of tuples or pairs
     in Type at ./dict.jl:388
     in CalcWTP at /home/ubuntu/dynhosp/DataStructs.jl:728
     in WTPMap at /home/ubuntu/dynhosp/DataStructs.jl:747
     in #PSim#32 at /home/ubuntu/dynhosp/DataStructs.jl:1024
     in #45 at ./multi.jl:2016
     in #625 at ./multi.jl:1421
     in run_work_thunk at ./multi.jl:1001
     in macro expansion at ./multi.jl:1421 [inlined]
     in #624 at ./event.jl:68
     in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1070
     in remotecall_fetch(::Function, ::Base.Worker, ::Function, ::Vararg{Any,N}) at ./multi.jl:1062
     in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in remotecall_fetch(::Function, ::Int64, ::Function, ::Vararg{Any,N}) at ./multi.jl:1080
     in (::Base.##667#668{Base.#+,ProjectModule.##45#47{Int64,Array{Any,1},Array{Any,2}},UnitRange{Int64},Array{UnitRange{Int64},1}})() at ./multi.jl:1998

The actual function I am calling is:

    function OuterSim(MCcount::Int; T1::Int64 = 3, dim1::Int64 = 290, dim2::Int64 = 67, fi = fips, da = data05)
      outp = @sync @parallel (+) for j = 1:MCcount
        Texas = MakeNew(fi, da);
        eq_patients = NewPatients()
        neq_patients = NewPatients()
        ResultsOut(NewSim(T1, Texas, eq_patients), PSim(T1, neq_patients); T = T1)
      end
      outp[:,1] = outp[:,1]/MCcount
      return outp
    end

I added the "@sync" following the suggestion of a colleague here - I am not sure it's necessary. (FWIW - I get the errors above on Ubuntu whether I include it or not.)

This code *does* run and terminate without error on my own home machine (running OS X, also v0.5), which has only four cores.

I would love your feedback!

Thanks - AB
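
For readers who want to try the pattern in isolation, here is a minimal sketch of the same `@parallel (+)` reduction used in OuterSim, written for Julia 0.5 as in this post. It is not the original simulation: `fake_replication`, `outer_sketch`, the worker count of 2, and the 3x2 result shape are all hypothetical stand-ins chosen for illustration. Because `@parallel` with a reducer waits for the reduction and returns its value, the sketch leaves out `@sync`. On Julia 0.7 and later the macro is called `@distributed` and requires `using Distributed`, so the code below would need those changes there.

```julia
# Julia 0.5 syntax. Every name below is a hypothetical stand-in, not code from the post.

addprocs(2)  # assumed worker count, only for this sketch

# Stand-in for one Monte Carlo replication. It must be defined on every
# worker, hence @everywhere.
@everywhere function fake_replication(j::Int)
    return fill(float(j), 3, 2)   # 3x2 matrix with every entry equal to j
end

function outer_sketch(MCcount::Int)
    # With a reducer, @parallel blocks until all chunks have been summed,
    # so outp is an ordinary Array when this statement returns.
    outp = @parallel (+) for j = 1:MCcount
        fake_replication(j)
    end
    outp[:, 1] = outp[:, 1] / MCcount   # average the first column, as in OuterSim
    return outp
end

res = outer_sketch(10)
```

If a worker process dies while its chunk is running, the same `ProcessExitedException` as in the traces above would be expected to surface from the `@parallel` call, since the reduction waits on every worker's task.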