fickle segmentation fault or bus error when using pmap #13806

Closed. omalled opened this issue on Oct 28, 2015 · 2 comments
Labels: parallel, regression

omalled commented on Oct 28, 2015

I ran into some problems with julia crashing during calls to pmap. I put together the smallest example I could come up with to reproduce the bug, but it isn't all that small. This issue seems pretty fickle.

First, a module is required (I put it in M.jl in the directory where the test code will be run):

    module M

    function untransform(y::Vector, transformparams::Vector)
        return y
    end

    function transformfunction(f::Function, transformparams::Vector)
        function transformedf(y::Vector)
            x = untransform(y, transformparams)
            return f(x)
        end
        return transformedf
    end

    end

These functions need to be in a module, or the bug does not appear.

Here's the code to reproduce the bug. It needs to be run in parallel for the bug to appear (e.g., julia -p 2):

    @everywhere push!(LOAD_PATH, ".")
    import M

    function makef()
        g(x) = x
        function thisf(p::Vector)
            println("a")
            result = g(1)
            println("b")
            return 1
        end
        return thisf
    end

    function callpmap2(h)
        pmap(h, fill(zeros(2), 2))
    end

    function callpmap1(h)
        h(zeros(2))
        pmap(h, fill(zeros(2), 2)) # all hell breaks loose if we call h before doing the pmap
    end

    f = makef()
    f_trans = M.transformfunction(f, zeros(2))
    callpmap2(f_trans) # works
    callpmap1(f_trans) # bus error (Mac), segmentation fault (Ubuntu)

If I run it on Mac OS X 10.10.5 with julia 0.4.0, I get

    From worker 2: a
    From worker 2: b
    From worker 3: a
    From worker 3: b
    a
    b
    signal (10): Bus error: 10
    signal (10): Bus error: 10
    _ZL17jl_add_linfo_rootP17_jl_lambda_info_tP11_jl_value_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:1704
    _ZL9emit_exprP11_jl_value_tP12jl_codectx_tbb at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:3232
    _ZL17jl_add_linfo_rootP17_jl_lambda_info_tP11_jl_value_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:1704
    _ZL11emit_jlcallPN4llvm5ValueES1_PP11_jl_value_tmP12jl_codectx_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2519
    _ZL9emit_exprP11_jl_value_tP12jl_codectx_tbb at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:3232
    _ZL9emit_callPP11_jl_value_tmP12jl_codectx_tS0_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2679
    _ZL11emit_jlcallPN4llvm5ValueES1_PP11_jl_value_tmP12jl_codectx_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2519
    _ZL13emit_functionP17_jl_lambda_info_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:4802
    _ZL9emit_callPP11_jl_value_tmP12jl_codectx_tS0_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:2679
    _Z19jl_eh_restore_stateP13_jl_handler_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1410
    _ZL13emit_functionP17_jl_lambda_info_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:4802
    jl_compile at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:808
    jl_trampoline_compile_function at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/builtins.c:1025
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at /localpath/M.jl:8
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at multi.jl:892
    run_work_thunk at multi.jl:645
    jlcall_run_work_thunk_21375 at (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at multi.jl:892
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/task.c:241
    _Z19jl_eh_restore_stateP13_jl_handler_t at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1410
    jl_compile at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/codegen.cpp:808
    jl_trampoline_compile_function at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/builtins.c:1025
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at /localpath/M.jl:8
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at multi.jl:892
    run_work_thunk at multi.jl:645
    jlcall_run_work_thunk_21342 at (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at multi.jl:892
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/task.c:241
    Worker 3 terminated.
    ERROR (unhandled task failure): EOFError: read end of file
    Worker 2 terminated.

If I run it on Ubuntu 14.04 with julia 0.4.0, I get

    From worker 3: a
    From worker 3: b
    From worker 2: a
    From worker 2: b
    a
    b
    signal (11): Segmentation fault
    signal (11): Segmentation fault
    unknown function (ip: 0x7f6302a61b78)
    unknown function (ip: 0x7f6302a81b33)
    unknown function (ip: 0x7f6302a891b7)
    unknown function (ip: 0x7f6302a897c2)
    unknown function (ip: 0x7f6302a830f8)
    unknown function (ip: 0x7f6302a7524b)
    unknown function (ip: 0x7f6302a77861)
    unknown function (ip: 0x7f6302a77a3c)
    unknown function (ip: 0x7f0e4c9b1b78)
    jl_trampoline at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    unknown function (ip: 0x7f0e4c9d1b33)
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    unknown function (ip: 0x7f0e4c9d91b7)
    anonymous at /localpath/M.jl:8
    unknown function (ip: 0x7f0e4c9d97c2)
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    unknown function (ip: 0x7f0e4c9d30f8)
    jl_f_apply at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    anonymous at multi.jl:892
    unknown function (ip: 0x7f0e4c9c524b)
    run_work_thunk at multi.jl:645
    unknown function (ip: 0x7f0e4c9c7861)
    jlcall_run_work_thunk_21214 at (unknown line)
    unknown function (ip: 0x7f0e4c9c7a3c)
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    jl_trampoline at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    anonymous at multi.jl:892
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    unknown function (ip: 0x7f6302aaa6a1)
    unknown function (ip: (nil))
    anonymous at /localpath/M.jl:8
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    jl_f_apply at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    anonymous at multi.jl:892
    run_work_thunk at multi.jl:645
    jlcall_run_work_thunk_21214 at (unknown line)
    jl_apply_generic at /pathtojulia/julia-0ff703b40a/bin/../lib/julia/libjulia.so (unknown line)
    anonymous at multi.jl:892
    unknown function (ip: 0x7f0e4c9fa6a1)
    unknown function (ip: (nil))
    Worker 3 terminated.ArgumentError: stream is closed or unusable

omalled commented on Oct 28, 2015

One more note: this code works on the Mac with julia 0.3.11. I don't have a 0.3.11 binary for Ubuntu lying around, so I couldn't try it there. I do know that the code this is derived from worked on Ubuntu with 0.3.11, though.

malmaud added the parallel label on Oct 28, 2015

vtjnash (The Julia Language member) commented on Oct 28, 2015

I added some typeassertion code to catch it early: https://github.com/JuliaLang/julia/compare/jn/worker_stderr?expand=1

Now the backtrace shows that the roots array is getting deserialized as an Expr instead of the Vector{Any} that was sent. I suspect an error in the deserialize_cycles code:

    julia> callpmap1(f_trans) # bus error (Mac), segmentation fault (Ubuntu)
    a
    b
    fatal error on 2: ERROR: TypeError: deserialize: in typeassert, expected Array{Any,1}, got Expr
     [inlined code] from essentials.jl:58
     in deserialize at serialize.jl:557
     in handle_deserialize at serialize.jl:477
     [inlined code] from essentials.jl:58
     ...

JeffBezanson added the regression and backport pending 0.4 labels on Oct 29, 2015

vtjnash added a commit that closed this issue on Nov 1, 2015: fix serialization typo (843ab66)

vtjnash closed this in 843ab66 on Nov 1, 2015

vtjnash added a commit that referenced this issue on Nov 1, 2015: fix serialization typo (ded1380)

tkelman removed the backport pending 0.4 label on Nov 9, 2015
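
For readers following up on vtjnash's diagnostic: the idea is to assert the expected type right where a value is deserialized, so a mismatch surfaces as a TypeError at the source instead of as a crash in codegen later. The following is only a minimal sketch of that idea on a current Julia 1.x release, not the code from the linked branch. `roundtrip_checked` is a hypothetical helper introduced for illustration, and `Serialization` is the Julia 1.x standard-library module (in 0.4, `serialize` and `deserialize` lived in Base).

```julia
using Serialization  # stdlib in Julia 1.x; assumption: not running on 0.4

# Hypothetical helper: write `value` to an in-memory buffer, read it back,
# and assert that the deserialized object has the expected type T.
function roundtrip_checked(value, ::Type{T}) where {T}
    io = IOBuffer()
    serialize(io, value)
    seekstart(io)               # rewind so deserialize reads from the start
    return deserialize(io)::T   # throws TypeError if the type does not match
end

# A stand-in for the "roots" array mentioned in the thread (contents are made up).
roots = Any[:a, 1, "two"]
result = roundtrip_checked(roots, Vector{Any})
```

Calling `roundtrip_checked(roots, Expr)` instead would raise a TypeError from the assertion, which is the kind of early failure the comment above describes.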