Errors when loading parallel packages with ssh tunnel #16778

Closed. EthanAnderes opened this issue on Jun 5 · 4 comments

EthanAnderes commented on Jun 5

Ref: google groups

I seem to be encountering a problem using packages in parallel when using ssh tunneling to set up parallel workers on a remote server. Here is how I am setting up the workers.

                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
       _ _   _| |_  __ _   |  Type "?help" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.4.6-pre+37 (2016-05-27 22:56 UTC)
     _/ |\__'_|_|_|\__'_|  |  Commit 430601c (9 days old release-0.4)
    |__/                   |  x86_64-apple-darwin15.5.0

    julia> machines = ["anderes@xxx.xxx.edu", "anderes@xxx.xxx.edu"]
    2-element Array{ASCIIString,1}:
     "anderes@xxx.xxx.edu"
     "anderes@xxx.xxx.edu"

    julia> addprocs(
               machines,
               tunnel=true,
               dir="/home/anderes/",
               exename="/usr/local/bin/julia",
               topology=:master_slave,
           )
    2-element Array{Int64,1}:
     2
     3

After this, all of the following four code blocks fail (I'll show the errors for only the last one, for readability). Note that these commands work fine when I launch the workers on the same machine as the master node (without ssh tunneling).
    import Dierckx
    @everywhere using Dierckx
    @everywhere spl = Dierckx.Spline1D([1., 2., 3.], [1., 2., 3.], k=2)

    import Dierckx
    using Dierckx
    @everywhere spl = Dierckx.Spline1D([1., 2., 3.], [1., 2., 3.], k=2)

    using Dierckx
    @everywhere spl = Dierckx.Spline1D([1., 2., 3.], [1., 2., 3.], k=2)

    @everywhere using Dierckx
    @everywhere spl = Dierckx.Spline1D([1., 2., 3.], [1., 2., 3.], k=2)

The last gives the following errors:

    julia> @everywhere using Dierckx
    WARNING: node state is inconsistent: node 2 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Dierckx.ji
    WARNING: node state is inconsistent: node 3 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Dierckx.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Dierckx.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Dierckx.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    ERROR: On worker 2:
    LoadError: InitError: Dierckx not properly installed. Run Pkg.build("Dierckx")
     in __init__ at /Users/ethananderes/.julia/v0.4/Dierckx/src/Dierckx.jl:27
     in include_string at loading.jl:282
     in include_from_node1 at ./loading.jl:323
     in require at ./loading.jl:259
     in eval at ./sysimg.jl:14
     in anonymous at multi.jl:1394
     in anonymous at multi.jl:923
     in run_work_thunk at multi.jl:661
     [inlined code] from multi.jl:923
     in anonymous at task.jl:63
    during initialization of module Dierckx
    while loading /Users/ethananderes/.julia/v0.4/Dierckx/src/Dierckx.jl, in expression starting on line 714
     in remotecall_fetch at multi.jl:747
     in remotecall_fetch at multi.jl:750
     in anonymous at multi.jl:1396

    ...and 1 other exceptions.

     in sync_end at ./task.jl:413
     in anonymous at multi.jl:1405

    julia> @everywhere spl = Dierckx.Spline1D([1., 2., 3.], [1., 2., 3.], k=2)
    ERROR: On worker 2:
    error compiling __Spline1D#6__: could not load library "/Users/ethananderes/.julia/v0.4/Dierckx/src/../deps/src/ddierckx/libddierckx"
    /Users/ethananderes/.julia/v0.4/Dierckx/src/../deps/src/ddierckx/libddierckx: cannot open shared object file: No such file or directory
     in eval at ./sysimg.jl:14
     in anonymous at multi.jl:1394
     in anonymous at multi.jl:923
     in run_work_thunk at multi.jl:661
     [inlined code] from multi.jl:923
     in anonymous at task.jl:63
     in remotecall_fetch at multi.jl:747
     in remotecall_fetch at multi.jl:750
     in anonymous at multi.jl:1396

    ...and 1 other exceptions.

     in sync_end at ./task.jl:413
     in anonymous at multi.jl:1405

    julia>

The errors seem to be package dependent. Here are similar code blocks for the Distributions package. I only get an ERROR on the last one when using ssh tunneling, yet each one works fine when the workers are launched with addprocs(2).

    julia> import Distributions
    WARNING: node state is inconsistent: node 2 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: node state is inconsistent: node 3 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji

    julia> @everywhere using Distributions
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/PDMats.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/PDMats.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsFuns.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsFuns.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsBase.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsBase.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/ArrayViews.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/ArrayViews.ji

    julia> @everywhere spl = Distributions.Normal(0,1)

    julia>

    julia> @everywhere using Distributions
    WARNING: node state is inconsistent: node 2 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/PDMats.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: node state is inconsistent: node 3 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsFuns.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/PDMats.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/Compat.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsBase.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsFuns.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/ArrayViews.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/StatsBase.ji
    WARNING: deserialization checks failed while attempting to load cache from /Users/ethananderes/.julia/lib/v0.4/ArrayViews.ji

    julia> @everywhere spl = Distributions.Normal(0,1)

    julia>

    julia> using Distributions
    WARNING: node state is inconsistent: node 2 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji
    WARNING: node state is inconsistent: node 3 failed to load cache from /Users/ethananderes/.julia/lib/v0.4/Distributions.ji

    julia> @everywhere spl = Distributions.Normal(0,1)
    ERROR: On worker 2:
    UndefVarError: Distributions not defined
     in eval at ./sysimg.jl:14
     in anonymous at multi.jl:1394
     in anonymous at multi.jl:923
     in run_work_thunk at multi.jl:661
     [inlined code] from multi.jl:923
     in anonymous at task.jl:63
     in remotecall_fetch at multi.jl:747
     in remotecall_fetch at multi.jl:750
     in anonymous at multi.jl:1396

    ...and 1 other exceptions.

     in sync_end at ./task.jl:413
     in anonymous at multi.jl:1405

tkelman added the parallel label on Jun 6

RossBoylan referenced this issue on Jun 6: Errors when loading parallel packages on same machine #16788 (Closed)

RossBoylan commented on Jun 6

This may be related to #16788, which shows similar errors using parallelism on one machine.

vtjnash (The Julia Language member) commented on Jun 12

The binary build on each client needs to be the same, not just the source checkout.
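To illustrate the distinction drawn in that comment, here is a minimal sketch (not from the thread) that prints what each process reports about its own build and environment. It assumes a current Julia 1.x, where `Distributed` is a standard library, rather than the 0.4 release used in this issue; `build_summary` is a hypothetical helper name. Identical output from every process only shows that version, commit, and platform agree, which by the comment above is not sufficient on its own.

```julia
using Distributed  # assumption: Julia 1.x standard library, not the 0.4 API used in the thread

# Hypothetical helper: collect basic facts about the process it runs on.
@everywhere function build_summary()
    return (
        version = string(VERSION),                      # Julia version string
        commit  = Base.GIT_VERSION_INFO.commit_short,   # source commit of this build
        machine = Sys.MACHINE,                          # platform triple
        home    = homedir(),                            # home directory on this host
        depot   = join(Base.DEPOT_PATH, ":"),           # where packages and caches live
    )
end

# Ask the master and every worker for its summary and print one line per process.
for p in procs()
    info = remotecall_fetch(build_summary, p)
    println("process ", p, ": ", info)
end
```

A differing `machine` or `home` value between the master and a worker would point to paths or caches from one host being used on another, as with the /Users/... paths that appear in the worker errors above.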
vtjnash closed this on Jun 12

EthanAnderes commented on Jun 13

@vtjnash Maybe I don't understand what you mean by binary build, but after the source checkout I compiled julia. In particular, when launching julia on both machines it displays the exact same version number.

vtjnash (The Julia Language member) commented on Jun 13

The exact (bitwise) same usr/lib/julia/* binaries, which requires copying them to each machine.
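One way to check for a bitwise match is to hash the system image on every process and compare the digests. The sketch below is not from the thread. It assumes Julia 1.x (with `Distributed` and `SHA` available as standard libraries) and assumes that the file reported by the internal, undocumented `Base.JLOptions().image_file` field is the relevant one among the usr/lib/julia/* binaries mentioned above. `sysimg_fingerprint` is a hypothetical helper name.

```julia
using Distributed          # assumption: Julia 1.x standard library
@everywhere using SHA      # assumption: SHA is available on every process

# Hypothetical helper: path and SHA-256 digest of the system image this process loaded.
@everywhere function sysimg_fingerprint()
    # image_file is an internal field holding a C string with the sysimage path
    path = unsafe_string(Base.JLOptions().image_file)
    return (path, bytes2hex(open(sha256, path)))
end

# Collect one fingerprint per process (process 1 is the master).
prints = Dict(p => remotecall_fetch(sysimg_fingerprint, p) for p in procs())
reference = prints[1][2]

for p in sort(collect(keys(prints)))
    path, digest = prints[p]
    status = digest == reference ? "matches master" : "DIFFERS from master"
    println("process ", p, ": ", path, " ", status)
end
```

If a worker reports a different digest, its binaries were built separately from the master's even when both print the same version number, which is the situation described in the last two comments.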