addprocs(1); using ModuleName fails on OSX on current master #6556

Closed. rened opened this issue on Apr 17, 2014 · 18 comments
Labels: bug, parallel, regression
Participants: rened, amitmurthy, tanmaykm, JeffBezanson

rened commented on Apr 17, 2014:

The following hangs for about a minute, then throws an error:

    ./julia -e "addprocs(1); using HDF5"
    Worker 2 terminated.ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52

It is independent of the actual module, and everything works when no workers are being added. A git bisect points to 588d6ff. The error occurs only on OSX, not on Linux.

JeffBezanson added the bug, regression, and parallel labels on Apr 17, 2014.

amitmurthy commented on Apr 17, 2014:

cc @tanmaykm could you take a look?

tanmaykm commented on Apr 17, 2014:

I'm not able to replicate the issue. I get:

    julia -e "addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"
    2

Tried this both with and without an iface address, and with variations in the command. And I'm on:

    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm

@rened could you give some more info on your environment?

rened commented on Apr 17, 2014:

I just wiped and rebuilt julia again, I still see the error. I also tried as a different, freshly set up user on the same machine (running stock bash and everything), same result:

    rene@cirdesk3 ~/l/devjulia master 130> ./julia -e "versioninfo(); addprocs(1); using HDF5"
    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm
    Worker 2 terminated.ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52

Strangely, it works on my laptop:

    rene@lemonlab ~/l/devjulia master> ./julia -e "versioninfo(); addprocs(1); using HDF5"
    Julia Version 0.3.0-prerelease+2652
    Commit 8a5a3fc (2014-04-17 17:20 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.1.0)
      CPU: Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
      LAPACK: libopenblas
      LIBM: libopenlibm
    rene@lemonlab ~/l/devjulia master>

Is there any other information I could provide? Thanks for looking into this!

rened commented on Apr 17, 2014:

One more thing: it works on the machine with a self-assigned hostname (lemonlab.local); the error occurs only on the one with a full name (cirdesk3.meduniwien.ac.at).

amitmurthy commented on Apr 17, 2014:

Could you start a plain REPL and print the following?
    println(Base.LPROC.bind_addr)
    println(getipaddr())

amitmurthy commented on Apr 17, 2014:

Also

    println(getaddrinfo(string(getipaddr())))

rened commented on Apr 18, 2014:

I get the following:

    Version 0.3.0-prerelease+2652 (2014-04-17 17:20 UTC)
    Commit 8a5a3fc (0 days old master)
    x86_64-apple-darwin13.1.0

    julia> println(Base.LPROC.bind_addr)
    149.148.108.179
    julia> println(getipaddr())
    149.148.108.179
    julia> println(getaddrinfo(string(getipaddr())))
    149.148.108.179

I realized only now that the following even works (same build, fresh REPL):

    julia> addprocs(1)
    1-element Array{Any,1}:
     2
    julia> @fetchfrom 2 myid()
    2

So it is only related to using? Weird. I'll try to step through the code on both machines and see what the difference is.

amitmurthy commented on Apr 18, 2014:

I suspect using will work on the REPL after an addprocs. Seems to be an issue only with the command line option -e with a full domain name. Can you see if julia -p 2 works?

amitmurthy commented on Apr 18, 2014:

Also on the REPL, try

    eval(Main, parse_input_line("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))

rened commented on Apr 18, 2014:

I get the following:

    rene@cirdesk3 ~/l/devjulia master> ./julia -p 2
    ...
    julia> workers()
    2-element Array{Int64,1}:
     2
     3
    julia> using JSON   # this works!
    julia> using HDF5
    Worker 3 terminated.Worker 2 terminated.
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:681
     in require at loading.jl:52
    julia> eval(Main, parse_input_line("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))
    ERROR: parse_input_line not defined
    julia> eval(Main, parse("addprocs(1); using HDF5; println(remotecall_fetch(2, myid))"))
    ERROR: ProcessExitedException()
     in remotecall_fetch at multi.jl:686
    julia>

It seems to work for some modules, not for others - I am just going through them to try to see whether there is a pattern.

amitmurthy commented on Apr 18, 2014:

parse_input_line should be Base.parse_input_line, but I don't think it will make any difference... One way HDF5 differs between OSX and Linux is its use of the Homebrew package, though I don't know why that should cause a problem with the changed addprocs and a full domain name!

rened commented on Apr 18, 2014:

I removed all packages I had and installed them again after a Pkg.update(). Then I ran

    rene@cirdesk3 ~/l/julia master 1> for x in (ls ~/.julia/v0.3); time ./julia -e "print(\"$x\"); addprocs(1); using $x; println(\" ok\")"; end > parallel.txt ^ parallel.txt

I also did this without the addprocs, yielding nonparallel.txt. The gists are here:

    nonparallel.txt: https://gist.github.com/f9e0ffbfa62867cde050
    parallel.txt: https://gist.github.com/cd3213ccc671754d9848

Some packages fail, most work. Notably, in this experiment HDF5 did work. When I tried again immediately afterwards, it failed. I mostly see this error for these packages: Compose, Gadfly, HDF5, HttpServer, Images, MAT, ProfileView. But I can't find a similarity between them. Compose, for example, is very small, has only 2 dependencies, and no dependency on BinDeps or Homebrew.

It strikes me as odd that the timings go up so dramatically when a worker is added. On my other machine the times double, but here they skyrocket.
I then profiled the parallel using HDF5, which returned the ProcessExitedException:

    Profile.init(10^7, 0.01)
    addprocs(1)
    @profile eval(parse("using HDF5"))

Resulting gist: https://gist.github.com/4fd22407b3cf74657e0b

Most time is spent in inference.jl and reflection.jl - but why would compilation suddenly take so much longer? I'm without a clue - anything else I should try and report?

rened changed the title from "addprocs(1); using ModuleName fails on OSX on current master" to "addprocs(1); using ModuleName fails on OSX on prerelease+2652 (8a5a3fc)" on Apr 18, 2014.

rened changed the title from "addprocs(1); using ModuleName fails on OSX on prerelease+2652 (8a5a3fc)" to "addprocs(1); using ModuleName fails on OSX on current master" on Apr 18, 2014.

amitmurthy commented on Apr 18, 2014:

Yeah, the timings are definitely way off. I suspect a getaddrinfo that I introduced to be the cause of that. Will submit a patch to fix that. No idea for the failures of using yet.

rened commented on Apr 18, 2014:

Just to be completely paranoid, I created a new user on the system, checked out and built the latest master, added HDF5 as the only package - to no avail. addprocs(1) followed by using HDF5 results in the ProcessExitedException.

rened commented on Apr 18, 2014:

ok, thanks, I'll try again then with that patch.

amitmurthy commented on Apr 18, 2014:

The PR is here: #6572 - it should work, but could you test once before a merge?

rened commented on Apr 18, 2014:

Yes, this fixed it! Thank you so much for the quick response!

rened commented on Apr 18, 2014:

Fixed by #6572

rened closed this on Apr 18, 2014.
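
For anyone retracing the per-package comparison described earlier in the thread (each package loaded once without and once with a worker), here is a minimal bash sketch of that procedure. It is not the command rened ran. The names JULIA, PKGDIR, run_case and compare_packages and the output layout are assumptions made for illustration, and the -e snippets use the 0.3-era syntax quoted in the thread (newer Julia versions would also need `using Distributed` before addprocs).

```shell
#!/usr/bin/env bash
# Sketch: load every installed package in a fresh Julia process, once serially
# and once after adding a worker, and print exit status and wall-clock seconds.
# JULIA and PKGDIR are hypothetical defaults; override them as needed.

JULIA="${JULIA:-./julia}"
PKGDIR="${PKGDIR:-$HOME/.julia/v0.3}"

# run_case PKG LABEL CODE
# Runs CODE via "$JULIA -e" and prints "PKG LABEL status=N secs=S".
run_case() {
    local pkg="$1" label="$2" code="$3" start finish rc
    start=$(date +%s)
    # Capture the exit status without aborting if the caller uses `set -e`.
    "$JULIA" -e "$code" >/dev/null 2>&1 && rc=0 || rc=$?
    finish=$(date +%s)
    echo "$pkg $label status=$rc secs=$((finish - start))"
}

# compare_packages
# Iterates over the package directories in PKGDIR and runs both cases for each.
compare_packages() {
    local dir pkg
    for dir in "$PKGDIR"/*/; do
        [ -d "$dir" ] || continue
        pkg=$(basename "$dir")
        # METADATA is the package registry checkout, not a loadable package.
        if [ "$pkg" = "METADATA" ]; then
            continue
        fi
        run_case "$pkg" serial   "using $pkg"
        run_case "$pkg" parallel "addprocs(1); using $pkg"
    done
}
```

With the defaults, `compare_packages > results.txt` writes two lines per package. Packages whose parallel line shows a non-zero status, or a secs value far above the serial one, are the candidates worth profiling as done above.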