Segfault when using SharedArray on OS X #14295

Closed. jdrugo opened this issue on Dec 6, 2015 · 7 comments
Labels: mac, parallel

@jdrugo commented on Dec 6, 2015

When working on a parallel implementation of a particle filter, Julia started segfaulting under heavy workload. A minimal example to reproduce this behavior is

    @everywhere begin
        type A
            x::SharedArray{Float64,1}
            A(N) = new(SharedArray(Float64, N))
        end

        localf(x::SharedArray) = nothing
        function f(a::A)
            map(fetch, Any[(@spawnat i localf(a.x)) for i in workers()])
        end
    end

    a = A(1000)
    for n = 1:10^8
        f(a)
    end

On my MacBook Pro under OS X El Capitan, this results in

    Jans-MacBook-Pro:~ jdrugo$ /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia -p 7 ./crash_example.jl

    signal (11): Segmentation fault: 11
    __pool_alloc at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gc.c:1053
    _new_array_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/array.c:84
    _new_array at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/array.c:333
    call at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    def_rv_channel at multi.jl:619
    jlcall_def_rv_channel_21329 at (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    lookup_ref at multi.jl:513
    remotecall_fetch at multi.jl:727
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    call_on_owner at multi.jl:778
    fetch at multi.jl:796
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    map at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    f at /Users/jdrugo/crash_example.jl:9
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at /Users/jdrugo/crash_example.jl:15
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_parse_eval_all at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/toplevel.c:577
    jl_load at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/toplevel.c:620
    include at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    include_from_node1 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    process_options at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    _start at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jlcall__start_18614 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    true_main at /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia (unknown line)
    main at /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia (unknown line)
    Segmentation fault: 11

Julia version:

    julia> versioninfo()
    Julia Version 0.4.1
    Commit cbe1bee* (2015-11-08 10:33 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.4.0)
      CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

@sbromberger commented on Dec 6, 2015

I can reproduce this on my system as well:

    julia> versioninfo()
    Julia Version 0.4.2-pre+18
    Commit eb31eef (2015-11-26 08:03 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin15.0.0)
      CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-svn

@kshyatt added the parallel and mac labels on Dec 7, 2015

@amitmurthy (The Julia Language member) commented on Dec 17, 2015

On 0.4.2 without any workers:

    ERROR: UndefRefError: access to undefined reference
     in ht_keyindex2 at dict.jl:602
     in setindex! at dict.jl:643
     in schedule_call at multi.jl:660
     in remotecall at multi.jl:703
     in f at none:8
     [inlined code] from none:2
     in anonymous at no file:0

On 0.4.2 with 2 workers:

    fatal error on 2: ERROR: BoundsError: attempt to access 0-element Array{Any,1}
      at index [2]
     in notify at /Volumes/Julia/Julia-0.4.2.app/Contents/Resources/julia/lib/julia/sys.dylib
     in __notify#32__ at /Volumes/Julia/Julia-0.4.2.app/Contents/Resources/julia/lib/julia/sys.dylib
     in send_add_client at multi.jl:592
     in serialize at serialize.jl:185
     in serialize at sharedarray.jl:269
     in serialize_any at serialize.jl:422
     in serialize at serialize.jl:405
     in serialize at serialize.jl:127
     in serialize at serialize.jl:310
     in serialize_any at serialize.jl:422
     in send_msg_ at multi.jl:222
     in remotecall at multi.jl:710
     in f at none:8
     [inlined code] from none:2
     in anonymous at no file:0

Works fine on 0.5 with 2 workers.

On 0.5 with no workers I see a series of

    error in running finalizer: UndefRefError()
    error in running finalizer: UndefRefError()
    error in running finalizer: UndefRefError()

I suspect memory corruption in the shared memory code. It could possibly be the same issue as #14186 (comment).

@amitmurthy (The Julia Language member) commented on Dec 17, 2015

Reduced case with no workers on 0.5:

    for n = 1:10^8
        map(fetch, Any[(@spawnat i i) for i in workers()])
    end

No error when run with workers added. Errors out on 0.4.
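For readers on a current Julia release, the reduced case can be sketched as follows. This assumes Julia 1.x, where the parallel primitives live in the `Distributed` standard library. The function names `spawn_fetch_round` and `stress` and the `rounds` parameter are hypothetical and not part of the original report. The errors in this thread were reported against 0.4 and 0.5, so treat this as an illustration of the spawn-and-fetch pattern rather than a confirmed reproducer.

```julia
using Distributed

# One round of the pattern from the reduced case: spawn a trivial task on
# every process returned by workers() and fetch each result.
# With no workers added, workers() is [1], so everything runs on the
# master process.
function spawn_fetch_round()
    return map(fetch, Any[(@spawnat i i) for i in workers()])
end

# `rounds` is a hypothetical knob; the thread looped 10^8 times.
function stress(rounds::Integer)
    for _ in 1:rounds
        spawn_fetch_round()
    end
    return nothing
end

stress(1_000)
```

Each `@spawnat` creates a remote reference that is later cleaned up by a finalizer, which is why a tight loop like this exercises the garbage collector and the reference bookkeeping so heavily.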
This was referenced on Dec 19, 2015:
  Closed: UndefRefError with Dicts and finalizers #14445
  Merged: workaround for dict access issue in a finalizer #14456
  Merged: workaround for a dict access from a finalizer bug for 0.4 #14457

@tkelman (The Julia Language member) commented on Dec 22, 2015

Is this closed on master by #14456?

@amitmurthy (The Julia Language member) commented on Dec 22, 2015

Yes.

@tkelman closed this on Dec 22, 2015

@amitmurthy (The Julia Language member) commented on Dec 22, 2015

Why did you close it? The bug reported was for 0.4 and that will be open till 0.4.3 is released.

@tkelman (The Julia Language member) commented on Dec 22, 2015

Generally issues should be closed by fixes getting merged to master, unless it's an issue that was never a problem for master.

Commits that referenced this issue:
  vtjnash, May 4: don't corrupt the identity of AbstractRemoteRef in their finalizers (8d1970e)
  vtjnash, Jul 25: make WeakKeyDict finalizer usage gc-safe (6478db7)
  vtjnash, Jul 26: make WeakKeyDict finalizer usage gc-safe (7f1cad4)
  vtjnash, Aug 4: make WeakKeyDict finalizer usage gc-safe (30a6f8b)
  vtjnash, Aug 5: make WeakKeyDict finalizer usage gc-safe (cd8be65)
  tkelman, Aug 11 (authored by vtjnash): make WeakKeyDict finalizer usage gc-safe (b673748)
  mfasi, Sep 5, to mfasi/julia (authored by vtjnash): make WeakKeyDict finalizer usage gc-safe
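The titles of the linked fixes point at a general hazard: a finalizer can run at any allocation point, including while the same dictionary it wants to modify is in the middle of an update. The sketch below shows one common way to guard against that in Julia 1.x: the finalizer re-registers itself when it cannot take the lock. It is not the code from #14456 or the later commits. `REGISTRY`, `REGISTRY_LOCK`, `Handle`, `release`, and `make_handle` are all hypothetical names chosen for illustration.

```julia
# Hypothetical registry guarded by a lock; not Julia's internal client_refs.
const REGISTRY = Dict{UInt64,String}()
const REGISTRY_LOCK = ReentrantLock()

# Finalizers can only be attached to mutable objects.
mutable struct Handle
    id::UInt64
end

# Finalizer: remove the handle's entry, but only if the lock is free.
function release(h::Handle)
    # If the lock is held, a Dict operation may be in progress. Do not touch
    # the Dict; re-register the finalizer so a later GC pass retries.
    if islocked(REGISTRY_LOCK) || !trylock(REGISTRY_LOCK)
        finalizer(release, h)
        return nothing
    end
    try
        delete!(REGISTRY, h.id)
    finally
        unlock(REGISTRY_LOCK)
    end
    return nothing
end

# Create a handle, record it under the lock, and attach the finalizer.
function make_handle(id::Integer, label::String)
    h = Handle(UInt64(id))
    lock(REGISTRY_LOCK) do
        REGISTRY[h.id] = label
    end
    finalizer(release, h)
    return h
end

h = make_handle(1, "example")
finalize(h)  # run the finalizer now instead of waiting for the GC
```

The design choice here is that the finalizer never blocks and never mutates shared state unless it owns the lock, so it cannot observe the dictionary in a half-updated state.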