Segfault when deserializing on worker #11397

Closed · andreasnoack opened this issue on May 21, 2015 · 6 comments
Labels: bug, parallel · Milestone: none · Assignee: JeffBezanson
Participants: andreasnoack, yuyichao, JeffBezanson

andreasnoack (The Julia Language member) commented on May 21, 2015

The following function causes a segfault on the workers when it is called, mistakenly, without any trailing arguments (an empty `args`):

    function star(f1, f2, f3, args...)
        out1 = [remotecall(p, () -> f1(map(localpart, args)...)) for p = workers()]
        out2 = f2(map(fetch, out1), args...)
        out3 = RemoteRef[]
        for i = 1:length(workers())
            p = workers()[i]
            out2i = out2[i]
            push!(out3, remotecall(p, () -> f3(out2i, map(localpart, args)...)))
        end
        return DArray(out3)
    end

An example is

    julia> star((in1, in2, in3) -> dot(in1,in2), (out, in...) -> fill(sum(out), nworkers()), (out, in1, in2, in3) -> out*in3)

    signal (11): Segmentation fault: 11
    jl_cellref at /Users/andreasnoack/julia-dev/src/./julia.h:693
    jl_new_lambda_info at /Users/andreasnoack/julia-dev/src/alloc.c:333
    deserialize at serialize.jl:478
    handle_deserialize at serialize.jl:403
    deserialize at serialize.jl:462
    handle_deserialize at serialize.jl:403
    anonymous at task.jl:825
    jl_apply at /Users/andreasnoack/julia-dev/src/task.c:234
    Worker 2 terminated.ERROR: ProcessExitedException()
     in yieldto at ./task.jl:21
     in wait at ./task.jl:309
     in wait at ./task.jl:225
     in wait_full at ./multi.jl:572
     in remotecall_fetch at multi.jl:672
     in call_on_owner at ./multi.jl:719
     in fetch at multi.jl:727
     in map at ./essentials.jl:138
     in star at /Users/andreasnoack/Desktop/alan.jl:11

    ERROR (unhandled task failure): EOFError: read end of file
     in yieldto at ./task.jl:21
     in wait at ./task.jl:309
     in wait at ./task.jl:225
     in wait_full at ./multi.jl:572
     in remotecall_fetch at multi.jl:672
     in call_on_owner at ./multi.jl:719
     in fetch at multi.jl:727
     in map at ./essentials.jl:138
     in star at /Users/andreasnoack/Desktop/alan.jl:11

My system is

    julia> versioninfo()
    Julia Version 0.4.0-dev+4937
    Commit 9e8badc* (2015-05-21 21:16 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin14.3.0)
      CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

yuyichao referenced this issue on May 22, 2015: SegFault in `PyCall` #11395 (Closed)

yuyichao (The Julia Language member) commented on May 22, 2015

What's the definition of `localpart`?

andreasnoack (The Julia Language member) commented on May 23, 2015

It is defined here: https://github.com/JuliaParallel/DistributedArrays.jl/blob/71c2b1921a9c9eb97432f89d4149645c3bbe4fcb/src/DistributedArrays.jl#L223

You'll have to load DistributedArrays.jl. I also added the fallback method `localpart(x)=x`. Sorry for not including this information in the first post.

yuyichao (The Julia Language member) commented on May 23, 2015

The `ccall` that triggers the segfault is

    julia> ex = Expr(:lambda, Any[], Any[Any[], Any[], Any[Any[:f1, Function, 1], Any[:args, Tuple{1}, 0]], Expr(:body, Expr(:return, Expr(:call, TopNode(:_apply), :call, :f1, Expr(:call, :map, :localpart, :args))))], Any[])
    :($(Expr(:lambda, Any[], Any[Any[],Any[],Any[Any[:f1,Function,1],Any[:args,Tuple{1},0]],:(begin return (top(_apply))(call,f1,map(localpart,args)) end)], Any[])))

    julia> ccall(:jl_new_lambda_info, Any, (Any, Any), ex, false)

yuyichao (The Julia Language member) commented on May 23, 2015

From the source code, it seems that `deserialize` doesn't check the AST at all and passes it directly to `jl_new_lambda_info`. I don't know what the convention is here: is `deserialize` supposed to be robust (i.e. never crash) against arbitrary input, or should it only accept correct input?

I'm also not very familiar with the format of the AST, but the `Tuple{1}` there looks suspicious. (Edit: actually, changing `Tuple{1}` to `Tuple{Any}` still reproduces the segfault...)

yuyichao (The Julia Language member) commented on May 23, 2015

BTW, is there a way to intercept or record the communication between processes to make this easier to debug?

andreasnoack (The Julia Language member) commented on May 23, 2015

Thanks for looking into this. I don't know if it is possible to intercept the communication. It would be very useful.

JeffBezanson self-assigned this on May 27, 2015
JeffBezanson added the bug and parallel labels on May 27, 2015
JeffBezanson added a commit that closed this issue on May 27, 2015: fix #11397 (94d8164)
JeffBezanson closed this in 94d8164 on May 27, 2015
mbauman added a commit to mbauman/julia that referenced this issue on Jun 6, 2015: fix #11397 (59c0b11)
tkelman added a commit to tkelman/julia that referenced this issue on Jun 6, 2015: fix #11397 (1874ef5)
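
On the question of inspecting what is sent between processes: one rough approach is to serialize the closure into an in-memory buffer instead of sending it to a worker, then deserialize it in the same process. The sketch below assumes a current Julia 1.x, where `serialize` and `deserialize` live in the `Serialization` standard library (in the 0.4-dev build reported above they were in Base). `make_thunk` and the illustrative `f1` are hypothetical and not from the thread. This does not intercept real worker traffic, and it is not expected to reproduce the original segfault on a build that includes the fix.

```julia
using Serialization  # assumption: Julia 1.x standard library

# Fallback method mentioned in the thread.
localpart(x) = x

# Hypothetical helper that builds the same kind of zero-argument closure
# that `star` passes to `remotecall`, capturing `f1` and the varargs tuple.
make_thunk(f1, args...) = () -> f1(map(localpart, args)...)

# Illustrative `f1` that tolerates being called with no arguments.
count_args = (xs...) -> length(xs)

# The problematic case from the report: no trailing arguments, so `args == ()`.
thunk = make_thunk(count_args)

# Serialize into a buffer rather than a worker socket so the bytes can be inspected.
io = IOBuffer()
serialize(io, thunk)
bytes = take!(io)
println("serialized closure: ", length(bytes), " bytes")

# Deserialize in the same process, roughly what a worker does on receipt.
roundtripped = deserialize(IOBuffer(bytes))
println("result after round trip: ", roundtripped())
```

Keeping everything in one process means a failure in `deserialize` surfaces directly in the session being debugged rather than as a terminated worker and an `EOFError` on the master.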