Segfault for @everywhere importall/using #12558

Closed · rened opened this issue on Aug 11, 2015 · 18 comments

rened (The Julia Language member) commented on Aug 11, 2015:

When using the following code in a file funcd.jl:

    module funcd
    using Compat

    export len, range

    len(a) = length(a)
    len{T,N}(a::AbstractArray{T,N}) = size(a,N)

    import Base.range
    range(a) = 1:len(a)

    end

by running it with julia run.jl, with a run.jl file containing:

    addprocs(3)

    @everywhere importall funcd

results in a segfault:

    WARNING: replacing module funcd
    WARNING: replacing module funcd
    WARNING: replacing module funcd
    exception on exception on WARNING: Method definition range(Any) in module funcd at /Users/rene/BTSync/code/git/funcd/funcd.jl:10 overwritten in module funcd at /Users/rene/BTSync/code/git/funcd/funcd.jl:10.
    4: 3:
    signal (11): Segmentation fault: 11
    mtcache_hash_lookup at /Users/rene/local/devjulia/src/gf.c:152
    jl_apply_generic at /Users/rene/local/devjulia/src/gf.c:1630
    showerror at replutil.jl:72
    showerror at replutil.jl:83
    jlcall___showerror#160___21217 at (unknown line)
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    julia_showerror_21216 at (unknown line)
    jlcall_showerror_21216 at (unknown line)
    showerror at replutil.jl:91
    julia_showerror_21214 at (unknown line)
    jlcall_showerror_21214 at (unknown line)
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    anonymous at client.jl:88
    with_output_color at util.jl:330
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    display_error at client.jl:86
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    run_work_thunk at multi.jl:651
    run_work_thunk at multi.jl:657
    signal (11): Segmentation fault: 11
    jlcall_run_work_thunk_21149 at (unknown line)
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    anonymous at task.jl:11
    jl_apply at /Users/rene/local/devjulia/src/task.c:233
    mtcache_hash_lookup at /Users/rene/local/devjulia/src/gf.c:152
    jl_apply_generic at /Users/rene/local/devjulia/src/gf.c:1630
    showerror at replutil.jl:72
    showerror at replutil.jl:83
    jlcall___showerror#160___21221 at (unknown line)
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    julia_showerror_21220 at (unknown line)
    jlcall_showerror_21220 at (unknown line)
    showerror at replutil.jl:91
    julia_showerror_21218 at (unknown line)
    jlcall_showerror_21218 at (unknown line)
    jl_apply at /Users/rene/local/devjulia/src/gf.c:1658
    anonymous at client.jl:88
    with_output_color at util.jl:330

A git bisect points to 7207a8a:

    commit 7207a8a43e076576d6d6a6161ac75d2ae3391a6e
    Author: Amit Murthy <amit.murthy@gmail.com>
    Date:   Wed Jun 10 21:20:12 2015 +0530

        added support for different topologies

When executing the following code directly (i.e., in the REPL), the code passes:

    module funcd
    using Compat

    export len, range

    len(a) = length(a)
    len{T,N}(a::AbstractArray{T,N}) = size(a,N)

    import Base.range
    range(a) = 1:len(a)

    end

    addprocs(3)
    @everywhere importall funcd

cc @amitmurthy @jakebolewski

jakebolewski commented on Aug 11, 2015:

Isn't this a dup of #12381?

jakebolewski added the parallel label on Aug 11, 2015

rened commented on Aug 11, 2015:

I don't think so: the error occurs at a different place in the C code, and the workaround in #12381 (comment), adding sleep(0.5), does not help. This time the git bisect at least seems to point to a reasonable commit for causing this.

rened changed the title from "Segfault for @everywhere importall" to "Segfault for @everywhere importall/using" on Aug 11, 2015

rened commented on Aug 11, 2015:

PS: this also happens for using instead of importall.

rened commented on Aug 11, 2015:

This error occurs on OS X; I can't reproduce it on Linux.

rened commented on Aug 11, 2015:

One last comment: it seems that @everywhere is no longer necessary for imports anyway?
Everything works nicely when I omit @everywhere.

jakebolewski commented on Aug 11, 2015:

Sure, the code will get loaded on the master process, but none of the workers should load the code (this also assumes a shared file system).

parpwhick commented on Aug 11, 2015:

I get a similar error, but if I precompile the module with compilecache, then @everywhere using works correctly.

rened commented on Aug 11, 2015:

@jakebolewski I thought so too, hence the @everywhere. But this works (which I think did not work in the 0.3/early 0.4 days):

    julia> addprocs(3)
    3-element Array{Int64,1}:
     2
     3
     4

    julia> using JSON

    julia> @fetchfrom 2 json(1)
    "1"

So while defining a new function needs to look like @everywhere func() = "hi" (otherwise it is not visible on the workers), loading modules now seems to work across all processes. So basically everything is usable, but it would still be good not to crash on an @everywhere import statement, which is a no-op anyway.

amitmurthy commented on Aug 11, 2015:

I can see it on Linux in the current master.

    WARNING: replacing module funcd
    WARNING: replacing module funcd
    WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.
    WARNING: replacing module funcd
    WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.

    signal (11): Segmentation fault
    jl_object_id at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    unknown function (ip: 0x7f514d94cc78)
    unknown function (ip: 0x7f514d952bb5)
    unknown function (ip: 0x7f514d95b58c)
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    serialize at serialize.jl:414
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    serialize at serialize.jl:414
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    serialize at serialize.jl:414
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    serialize at serialize.jl:414
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    serialize at serialize.jl:414
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    send_msg_ at multi.jl:222
    send_msg_now at multi.jl:173
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    deliver_result at multi.jl:805
    jlcall_deliver_result_21311 at (unknown line)
    jl_apply_generic at /home/amitm/Work/julia/julia/usr/bin/../lib/libjulia.so (unknown line)
    anonymous at task.jl:890
    unknown function (ip: 0x7f514d9bf560)
    unknown function (ip: (nil))

I suspect it is the same as #12381, specifically #12381 (comment).

rened commented on Aug 11, 2015:

OK, true. So the only perhaps valuable info from this issue is that it does not occur before 7207a8a. But then again, perhaps this bisect is a red herring as well. Please feel free to close this issue when you think #12381 is enough for tracking this.

amitmurthy commented on Aug 11, 2015:

Replacing addprocs(3) with addprocs(2) results in the following error being printed (no segfault in this case):

    WARNING: replacing module funcd
    WARNING: replacing module funcd
    WARNING: Method definition range(Any) in module funcd at /tmp/funcd.jl:10 overwritten in module funcd at /tmp/funcd.jl:10.
    ERROR: LoadError: On worker 3:
    LoadError("/tmp/funcd.jl",7,TypeError(:getfield,"",DataType, [lowered code of serialize from serialize.jl, lines 400 to 416]))
     in include_string at loading.jl:225
     in include_from_node1 at ./loading.jl:266
     in require at ./loading.jl:202
     in eval at sysimg.jl:14
     in anonymous at multi.jl:1349
     in anonymous at multi.jl:889
     in run_work_thunk at multi.jl:642
     in anonymous at task.jl:889
     in remotecall_fetch at multi.jl:728
     in anonymous at task.jl:447
     in sync_end at ./task.jl:413
     in anonymous at multi.jl:422
     in include at ./boot.jl:254
     in include_from_node1 at ./loading.jl:263
     in process_options at ./client.jl:308
     in _start at ./client.jl:411
    while loading /tmp/run.jl, in expression starting on line 3

I don't know how to interpret it. Does it help in identifying the cause of the segfault?

amitmurthy commented on Aug 11, 2015:

FWIW, this is Linux on a MacBook Pro, so maybe the segfault has some relation to the hardware too?

    System: Linux (x86_64-linux-gnu)
    CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
    WORD_SIZE: 64
    BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
    LAPACK: libopenblas
    LIBM: libopenlibm
    LLVM: libLLVM-3.3

rened commented on Aug 11, 2015:

Mine is:

    System: Darwin (x86_64-apple-darwin13.4.0)
    CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
    WORD_SIZE: 64
    BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
    LAPACK: libopenblas
    LIBM: libopenlibm
    LLVM: libLLVM-3.3

amitmurthy referenced this issue on Aug 12, 2015: Closed: track already loaded modules #12581

amitmurthy commented on Aug 12, 2015:

This warning, "WARNING: replacing module funcd", means that the module is being loaded twice. That is probably a pointer to what is going wrong.

rened commented on Aug 12, 2015:

@amitmurthy I believe each import statement is executed on all workers? using X on master loads the package on all workers. The redundant @everywhere triggers loading once from each worker, each of which in turn actually loads the package on all other workers as well. So @everywhere seems to be completely redundant, and racy, for importing?

amitmurthy commented on Aug 12, 2015:

Ah! OK.

phrb referenced this issue in phrb/StochasticSearch.jl on Aug 14, 2015: Closed: Segfault with multiple processes on v0.4. #4

rened commented on Aug 17, 2015:

I can no longer reproduce this using current master (4d8ca6b), neither on OS X nor on Linux.

rened closed this on Aug 17, 2015

alyst commented on Aug 17, 2015:

It was never segfaulting for me, but with the very latest master (f3217a8) I still get similar exceptions when trying to do @everywhere on 12 workers:

    ERROR: On worker 5:
    LoadError("...",61,LoadError("...",4,LoadError("...",4,UndefVarError(:...))))
     in include_string at loading.jl:226
     in include_from_node1 at ./loading.jl:267
     in require at ./loading.jl:203
     in include_string at loading.jl:226
     in include_from_node1 at ./loading.jl:267
     in anonymous at no file:28
     in include_string at loading.jl:226
     in include_from_node1 at ./loading.jl:267
     in eval at ./sysimg.jl:14
     in anonymous at multi.jl:1348
     in anonymous at multi.jl:889
     in run_work_thunk at multi.jl:642
     in anonymous at task.jl:889
     in remotecall_fetch at multi.jl:728
     in remotecall_fetch at multi.jl:731
     in anonymous at multi.jl:1350