fickle segmentation fault or bus error when using pmap #13806

Closed · omalled opened this issue on Oct 28, 2015 · 2 comments
Labels: parallel, regression

@omalled commented on Oct 28, 2015

I ran into some problems with Julia crashing during calls to pmap. I put together the smallest example I could come up with to reproduce the bug, but it isn't all that small. This issue seems pretty fickle.

First, a module is required (I put it in M.jl, in the directory where the test code will be run):

```julia
module M
function untransform(y::Vector, transformparams::Vector)
    return y
end
function transformfunction(f::Function, transformparams::Vector)
    function transformedf(y::Vector)
        x = untransform(y, transformparams)
        return f(x)
    end
    return transformedf
end
end
```

These functions need to be in a module, or the bug does not appear. Here's the code to reproduce the bug.
It needs to be run in parallel for the bug to appear (e.g., `julia -p 2`):

```julia
@everywhere push!(LOAD_PATH, "./")
import M
function makef()
    g(x) = x
    function thisf(p::Vector)
        println("a")
        result = g(1)
        println("b")
        return 1
    end
    return thisf
end
function callpmap2(h)
    pmap(h, fill(zeros(2), 2))
end
function callpmap1(h)
    h(zeros(2))
    pmap(h, fill(zeros(2), 2)) # all hell breaks loose if we call h before doing the pmap
end
f = makef()
f_trans = M.transformfunction(f, zeros(2))
callpmap2(f_trans) # works
callpmap1(f_trans) # bus error (Mac), segmentation fault (Ubuntu)
```

If I run it on Mac OS X 10.10.5 with Julia 0.4.0, I get:

```
From worker 2: a
From worker 2: b
From worker 3: a
From worker 3: b
a
b

signal (10): Bus error: 10

signal (10): Bus error: 10
_ZL17jl_add_linfo_rootP17_jl_lambda_info_tP11_jl_value_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:1704
_ZL9emit_exprP11_jl_value_tP12jl_codectx_tbb at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:3232
_ZL17jl_add_linfo_rootP17_jl_lambda_info_tP11_jl_value_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:1704
_ZL11emit_jlcallPN4llvm5ValueES1_PP11_jl_value_tmP12jl_codectx_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2519
_ZL9emit_exprP11_jl_value_tP12jl_codectx_tbb at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:3232
_ZL9emit_callPP11_jl_value_tmP12jl_codectx_tS0_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2679
_ZL11emit_jlcallPN4llvm5ValueES1_PP11_jl_value_tmP12jl_codectx_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2519
_ZL13emit_functionP17_jl_lambda_info_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:4802
_ZL9emit_callPP11_jl_value_tmP12jl_codectx_tS0_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2679
_Z19jl_eh_restore_stateP13_jl_handler_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1410
_ZL13emit_functionP17_jl_lambda_info_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:4802
jl_compile at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:808
jl_trampoline_compile_function at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/builtins.c:1025
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at /localpath/M.jl:8
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at multi.jl:892
run_work_thunk at multi.jl:645
jlcall_run_work_thunk_21375 at (unknown line)
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at multi.jl:892
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/task.c:241
_Z19jl_eh_restore_stateP13_jl_handler_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1410
jl_compile at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:808
jl_trampoline_compile_function at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/builtins.c:1025
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at /localpath/M.jl:8
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at multi.jl:892
run_work_thunk at multi.jl:645
jlcall_run_work_thunk_21342 at (unknown line)
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
anonymous at multi.jl:892
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/task.c:241
Worker 3 terminated.
ERROR (unhandled task failure): EOFError: read end of file
Worker 2 terminated.
```

If I run it on Ubuntu 14.04 with Julia 0.4.0, I get:

```
From worker 3: a
From worker 3: b
From worker 2: a
From worker 2: b
a
b

signal (11): Segmentation fault

signal (11): Segmentation fault
unknown function (ip: 0x7f6302a61b78)
unknown function (ip: 0x7f6302a81b33)
unknown function (ip: 0x7f6302a891b7)
unknown function (ip: 0x7f6302a897c2)
unknown function (ip: 0x7f6302a830f8)
unknown function (ip: 0x7f6302a7524b)
unknown function (ip: 0x7f6302a77861)
unknown function (ip: 0x7f6302a77a3c)
unknown function (ip: 0x7f0e4c9b1b78)
jl_trampoline at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f0e4c9d1b33)
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f0e4c9d91b7)
anonymous at /localpath/M.jl:8
unknown function (ip: 0x7f0e4c9d97c2)
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f0e4c9d30f8)
jl_f_apply at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:892
unknown function (ip: 0x7f0e4c9c524b)
run_work_thunk at multi.jl:645
unknown function (ip: 0x7f0e4c9c7861)
jlcall_run_work_thunk_21214 at (unknown line)
unknown function (ip: 0x7f0e4c9c7a3c)
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
jl_trampoline at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:892
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
unknown function (ip: 0x7f6302aaa6a1)
unknown function (ip: (nil))
anonymous at /localpath/M.jl:8
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
jl_f_apply at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:892
run_work_thunk at multi.jl:645
jlcall_run_work_thunk_21214 at (unknown line)
jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
anonymous at multi.jl:892
unknown function (ip: 0x7f0e4c9fa6a1)
unknown function (ip: (nil))
Worker 3 terminated.
ArgumentError: stream is closed or unusable
```

@omalled commented on Oct 28, 2015

One more note: this code works on the Mac with Julia 0.3.11. I don't have a 0.3.11 binary for Ubuntu left around, so I couldn't try it on Ubuntu, but I know the code this example is derived from worked on Ubuntu with 0.3.11.

@malmaud added the parallel label on Oct 28, 2015

@vtjnash (The Julia Language member) commented on Oct 28, 2015

I added some typeassertion code to catch it early: https://github.com/JuliaLang/julia/compare/jn/worker_stderr?expand=1

Now the backtrace shows the roots array getting deserialized as an Expr instead of the Vector{Any} that was sent. I suspect an error in the deserialize_cycles code:

```
julia> callpmap1(f_trans)#bus error (Mac), segmentation fault (Ubuntu)
a
b
fatal error on 2: ERROR: TypeError: deserialize: in typeassert, expected Array{Any,1}, got Expr
 [inlined code] from essentials.jl:58
 in deserialize at serialize.jl:557
 in handle_deserialize at serialize.jl:477
 [inlined code] from essentials.jl:58
...
```

@JeffBezanson added the regression and backport pending 0.4 labels on Oct 29, 2015

@vtjnash closed this in 843ab66 ("fix serialization typo") on Nov 1, 2015

@vtjnash referenced this issue in commit ded1380 ("fix serialization typo") on Nov 1, 2015

@tkelman removed the backport pending 0.4 label on Nov 9, 2015
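Since the diagnosis in this thread points at the serializer rather than at pmap itself, it may help to see the step pmap depends on in isolation: the closure `f_trans` (a closure from module M wrapping another function) has to be serialized and rebuilt on each worker. The sketch below does that round trip inside a single process. It assumes Julia 1.x, where `serialize` and `deserialize` live in the Serialization standard library (in 0.4 they were in Base); the helper `roundtrip` and the wrapped function `inner` are hypothetical names for illustration. It is not a reproducer for the original crash, which was fixed in 843ab66.

```julia
using Serialization  # Julia 1.x stdlib; in Julia 0.4 serialize/deserialize were in Base

# Same shape as the reporter's M.jl: a closure that wraps another function.
module M
untransform(y::Vector, transformparams::Vector) = y
function transformfunction(f::Function, transformparams::Vector)
    function transformedf(y::Vector)
        x = untransform(y, transformparams)
        return f(x)
    end
    return transformedf
end
end

# Hypothetical helper: write a value to an in-memory buffer and read it back,
# roughly the serialize/deserialize step pmap performs when shipping work to a worker.
function roundtrip(x)
    io = IOBuffer()
    serialize(io, x)
    seekstart(io)
    return deserialize(io)
end

inner(x) = sum(x)  # hypothetical stand-in for the reporter's thisf
f_trans = M.transformfunction(inner, zeros(2))
f_copy = roundtrip(f_trans)

# The rebuilt closure should behave exactly like the original.
println(f_copy(ones(2)))
```

With a working serializer the deserialized closure and the original return the same value; the bug discussed above was a case where part of the serialized payload came back as the wrong type (an `Expr` instead of a `Vector{Any}`).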