Segfault when using SharedArray on OS X #14295

Closed · jdrugo opened this issue on Dec 6, 2015 · 7 comments
Labels: mac, parallel

jdrugo commented on Dec 6, 2015

When working on a parallel implementation of a particle filter, Julia started segfaulting under heavy workload. A minimal example to reproduce this behavior is

    @everywhere begin
        type A
            x::SharedArray{Float64,1}
            A(N) = new(SharedArray(Float64, N))
        end
        localf(x::SharedArray) = nothing
    end
    function f(a::A)
        map(fetch, Any[@spawnat i localf(a.x) for i in workers()])
    end
    a = A(1000)
    for n = 1:10^8
        f(a)
    end

On my MacBook Pro under OS X El Capitan, this results in

    Jans-MacBook-Pro: jdrugo$ /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia -p 7 ./crash_example.jl
    signal (11): Segmentation fault: 11
    __pool_alloc at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gc.c:1053
    _new_array_ at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/array.c:84
    _new_array at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/array.c:333
    call at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    def_rv_channel at multi.jl:619
    jlcall_def_rv_channel_21329 at (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    lookup_ref at multi.jl:513
    remotecall_fetch at multi.jl:727
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    call_on_owner at multi.jl:778
    fetch at multi.jl:796
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    map at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    f at /Users/jdrugo/crash_example.jl:9
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    anonymous at /Users/jdrugo/crash_example.jl:15
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    jl_parse_eval_all at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/toplevel.c:577
    jl_load at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/toplevel.c:620
    include at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    include_from_node1 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    process_options at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    _start at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jlcall__start_18614 at /Applications/Julia-0.4.1.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
    jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1325
    true_main at /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia (unknown line)
    main at /Applications/Julia-0.4.1.app/Contents/Resources/julia/bin/julia (unknown line)
    Segmentation fault: 11

Julia version:

    julia> versioninfo()
    Julia Version 0.4.1
    Commit cbe1bee (2015-11-08 10:33 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin13.4.0)
      CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

sbromberger commented on Dec 6, 2015

I can reproduce this on my system as well:

    julia> versioninfo()
    Julia Version 0.4.2-pre+18
    Commit eb31eef (2015-11-26 08:03 UTC)
    Platform Info:
      System: Darwin (x86_64-apple-darwin15.0.0)
      CPU: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-svn

kshyatt added the parallel and mac labels on Dec 7, 2015

amitmurthy (The Julia Language member) commented on Dec 17, 2015

On 0.4.2 (without any workers):

    ERROR: UndefRefError: access to undefined reference
     in ht_keyindex2 at dict.jl:602
     in setindex! at dict.jl:643
     in schedule_call at multi.jl:660
     in remotecall at multi.jl:703
     in f at none:8
     [inlined code] from none:2
     in anonymous at no file:0

On 0.4.2 (with 2 workers):

    fatal error on 2: ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [2]
     in notify at /Volumes/Julia/Julia-0.4.2.app/Contents/Resources/julia/lib/julia/sys.dylib
     in __notify#32__ at /Volumes/Julia/Julia-0.4.2.app/Contents/Resources/julia/lib/julia/sys.dylib
     in send_add_client at multi.jl:592
     in serialize at serialize.jl:185
     in serialize at sharedarray.jl:269
     in serialize_any at serialize.jl:422
     in serialize at serialize.jl:405
     in serialize at serialize.jl:127
     in serialize at serialize.jl:310
     in serialize_any at serialize.jl:422
     in send_msg_ at multi.jl:222
     in remotecall at multi.jl:710
     in f at none:8
     [inlined code] from none:2
     in anonymous at no file:0

Works fine on 0.5 with 2 workers. On 0.5 with no workers I see a series of

    error in running finalizer: UndefRefError
    error in running finalizer: UndefRefError
    error in running finalizer: UndefRefError

I suspect memory corruption in the shared memory code. This could possibly be the same issue as #14186 (comment).

amitmurthy (The Julia Language member) commented on Dec 17, 2015

Reduced case with no workers on 0.5:

    for n = 1:10^8
        map(fetch, Any[@spawnat i i for i in workers()])
    end

No error when run with workers added. Errors out on 0.4.

This was referenced on Dec 19, 2015:
- Closed: UndefRefError with Dicts and finalizers #14445
- Merged: workaround for dict access issue in a finalizer #14456
- Merged: workaround for a dict access from a finalizer bug for 0.4 #14457

tkelman (The Julia Language member) commented on Dec 22, 2015

Is this closed on master by #14456?

amitmurthy (The Julia Language member) commented on Dec 22, 2015

Yes.

tkelman closed this on Dec 22, 2015

amitmurthy (The Julia Language member) commented on Dec 22, 2015

Why did you close it? The bug reported was for 0.4 and that will be open till 0.4.3 is released.

tkelman (The Julia Language member) commented on Dec 22, 2015

Generally issues should be closed by fixes getting merged to master, unless it's an issue that was never a problem for master.

Commits that referenced this issue:
- vtjnash, May 4: don't corrupt the identity of AbstractRemoteRef in their finalizers (8d1970e)
- vtjnash, Jul 25: make WeakKeyDict finalizer usage gc-safe (6478db7)
- vtjnash, Jul 26: make WeakKeyDict finalizer usage gc-safe (7f1cad4)
- vtjnash, Aug 4: make WeakKeyDict finalizer usage gc-safe (30a6f8b)
- vtjnash, Aug 5: make WeakKeyDict finalizer usage gc-safe (cd8be65)
- tkelman, Aug 11: make WeakKeyDict finalizer usage gc-safe, authored by vtjnash (b673748)
- mfasi, Sep 5, to mfasi/julia: make WeakKeyDict finalizer usage gc-safe, authored by vtjnash