"addprocs(1); using ModuleName" fails on OSX on current master #6556

Closed. rened opened this issue on Apr 17, 2014 · 18 comments

rened commented on Apr 17, 2014

The following hangs for about a minute, then throws an error:

    ./julia -e "addprocs(1); using HDF5"
    Worker 2 terminated.
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52

It is independent of the actual module, and everything works when no workers are added. A git bisect points to 588d6ff. The error occurs only on OSX, not on Linux.

JeffBezanson added the bug, regression and parallel labels on Apr 17, 2014

amitmurthy commented on Apr 17, 2014

cc @tanmaykm, could you take a look?

tanmaykm commented on Apr 17, 2014

I'm not able to replicate the issue. I get:

    >> julia -e "addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"
    2

I tried this both with and without an iface address, and with variations in the command. I'm on:

    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc* (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm

@rened could you give some more info on your environment?

rened commented on Apr 17, 2014

I just wiped and rebuilt julia again, and I still see the error.
I also tried as a different, freshly set-up user on the same machine (running stock bash and everything), with the same result:

    rene@cirdesk3 ~/l/devjulia (master) [130]> ./julia -e "versioninfo(); addprocs(1); using HDF5"
    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm
    Worker 2 terminated.
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52

Strangely, it works on my laptop:

    rene@lemonlab ~/l/devjulia (master)> ./julia -e "versioninfo(); addprocs(1); using HDF5"
    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm
    rene@lemonlab ~/l/devjulia (master)>

Is there any other information I could provide? Thanks for looking into this!

rened commented on Apr 17, 2014

One more thing: it works on the machine with a self-assigned hostname (lemonlab.local); the error occurs only on the one with a fully qualified domain name (cirdesk3.meduniwien.ac.at).

amitmurthy commented on Apr 17, 2014

Could you start a plain REPL and print the following?
    println(Base.LPROC.bind_addr)
    println(getipaddr())

amitmurthy commented on Apr 17, 2014

Also:

    println(getaddrinfo(string(getipaddr())))

rened commented on Apr 18, 2014

I get the following:

    (Julia REPL banner: Version 0.3.0-prerelease+2652 (2014-04-17 17:20 UTC), Commit 8a5a3fc (0 days old master), x86_64-apple-darwin13.1.0)
    julia> println(Base.LPROC.bind_addr)
    149.148.108.179
    julia> println(getipaddr())
    149.148.108.179
    julia> println(getaddrinfo(string(getipaddr())))
    149.148.108.179

I only realized now that the following even works:

    (Julia REPL banner: Version 0.3.0-prerelease+2652 (2014-04-17 17:20 UTC), Commit 8a5a3fc (0 days old master), x86_64-apple-darwin13.1.0)
    julia> addprocs(1)
    1-element Array{Any,1}:
     2
    julia> @fetchfrom 2 myid()
    2

So it is only related to using? Weird. I'll try to step through the code on both machines and see what the difference is.

amitmurthy commented on Apr 18, 2014

I suspect using will work on the REPL after an addprocs. It seems to be an issue only with the command-line option -e combined with a full domain name. Can you check whether julia -p 2 works?

amitmurthy commented on Apr 18, 2014

Also, on the REPL, try:

    eval(Main,parse_input_line("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))

rened commented on Apr 18, 2014

I get the following:

    rene@cirdesk3 ~/l/devjulia (master)> ./julia -p 2
    ...
    (Julia REPL banner: Version 0.3.0-prerelease+2652 (2014-04-17 17:20 UTC), Commit 8a5a3fc (0 days old master), x86_64-apple-darwin13.1.0)
    julia> workers()
    2-element Array{Int64,1}:
     2
     3
    julia> using JSON # this works!
    julia> using HDF5
    Worker 3 terminated.
    Worker 2 terminated.
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52
    julia> eval(Main,parse_input_line("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))
    ERROR: parse_input_line not defined
    julia> eval(Main,parse("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:686
    julia>

It seems to work for some modules but not for others. I am going through them now to see whether there is a pattern.

amitmurthy commented on Apr 18, 2014

parse_input_line should be Base.parse_input_line, but I don't think it will make any difference. One way HDF5 differs between OSX and Linux is its use of the Homebrew package, though I don't know why that should cause a problem with the changed addprocs and a full domain name!

rened commented on Apr 18, 2014

I removed all my packages and installed them again after a Pkg.update(). Then I ran (in the fish shell):

    rene@cirdesk3 ~/l/julia (master) [1]> for x in (ls ~/.julia/v0.3/)
        time ./julia -e "print(\"$x\"); addprocs(1); using $x; println(\" ok\")"
    end >> parallel.txt ^^ parallel.txt

I also did this without the addprocs, yielding nonparallel.txt. The gists are here:

nonparallel.txt: https://gist.github.com/f9e0ffbfa62867cde050
parallel.txt: https://gist.github.com/cd3213ccc671754d9848

Some packages fail; most work. Notably, in this experiment HDF5 did work, yet when I tried again immediately afterwards, it failed. I mostly see this error for these packages: Compose, Gadfly, HDF5, HttpServer, Images, MAT, ProfileView. But I can't find anything they have in common. Compose, for example, is very small: it has only 2 dependencies and no dependency on BinDeps or Homebrew.

It strikes me as odd that the timings go up so dramatically when a worker has been added. On my other machine the times double, but here they skyrocket.
I then profiled using HDF5 with a worker added, which again raised the ProcessExitedException:

    Profile.init(10^7, 0.01)
    addprocs(1)
    @profile eval(parse("using HDF5"))

Resulting gist: https://gist.github.com/4fd22407b3cf74657e0b

Most of the time is spent in inference.jl and reflection.jl, but why would compilation suddenly take so much longer? I'm without a clue. Is there anything else I should try and report?

rened changed the title from "addprocs(1); using ModuleName" fails on OSX on current master to "addprocs(1); using ModuleName" fails on OSX on prerelease+2652 (8a5a3fc*) on Apr 18, 2014

rened changed the title from "addprocs(1); using ModuleName" fails on OSX on prerelease+2652 (8a5a3fc*) to "addprocs(1); using ModuleName" fails on OSX on current master on Apr 18, 2014

amitmurthy commented on Apr 18, 2014

Yeah, the timings are definitely way off. I suspect a getaddrinfo call that I introduced is the cause. I will submit a patch to fix that. No idea about the using failures yet.

rened commented on Apr 18, 2014

Just to be completely paranoid, I created a new user on the system, checked out and built the latest master, and added HDF5 as the only package, to no avail: addprocs(1) followed by using HDF5 still results in the ProcessExitedException.

rened commented on Apr 18, 2014

OK, thanks, I'll try again with that patch then.

amitmurthy commented on Apr 18, 2014

The PR is here: #6572. It should work, but could you test it once before it is merged?

rened commented on Apr 18, 2014

Yes, this fixed it! Thank you so much for the quick response!

rened commented on Apr 18, 2014

Fixed by #6572

rened closed this on Apr 18, 2014