julia-users › SharedArray fails to gc when called within a sequence of functions? 0.5.0-rc3
3 posts by 3 authors

Rafael Menegassi  Aug 31

Dear all,

I'm quite new to Julia, so sorry if I've done something wrong. I reduced the case to the simplest one possible, using a SharedArray within a sequence of functions:

    addprocs(4)

    function chisq(n::Integer)
        A = SharedArray(Float64, n)
        @sync @parallel for i in 1:n
            A[i] = (rand() - rand())^2
        end
        sumsq = sum(A)
    end

    function calculate(n::Integer)
        b = 0.0
        for j in 1:n
            b += chisq(n)
        end
        return b
    end

    chisq(500^2)    # ok, no failure
    calculate(500)  # fails

Calculating the same number of evaluations (500 x 500) in a single call does not fail, whereas calculate crashes before chisq has been called 500 times. The failure is:

    ERROR: SystemError: shm_open() failed for /jl005889eze42OrPYHS9RKjHZihQ: Too many open files
     in uv_error at ./libuv.jl:68 [inlined]
     in _link_pipe(::Ptr{Void}, ::Ptr{Void}) at ./stream.jl:596
     in link_pipe(::Base.PipeEndpoint, ::Bool, ::Base.PipeEndpoint, ::Bool) at ./stream.jl:652
     in setup_stdio(::Pipe, ::Bool) at ./process.jl:419
     in setup_stdio(::Base.##412#413{Cmd,Ptr{Void},Base.Process}, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}) at ./process.jl:464
     in #spawn#411(::Nullable{Base.ProcessChain}, ::Function, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at ./process.jl:477
     in (::Base.#kw##spawn)(::Array{Any,1}, ::Base.#spawn, ::Cmd, ::Tuple{Base.DevNullStream,Pipe,Base.TTY}, ::Bool, ::Bool) at ./<missing>:0
     in open(::Cmd, ::String, ::Base.DevNullStream) at ./process.jl:539
     in read(::Cmd, ::Base.DevNullStream) at ./process.jl:574
     in readstring at ./process.jl:581 [inlined] (repeats 2 times)
     in print_shmem_limits(::Int64) at ./sharedarray.jl:488
     in shm_mmap_array(::Type{T}, ::Tuple{Int64}, ::String, ::UInt16) at ./sharedarray.jl:515
     in #SharedArray#786(::Bool, ::Array{Int64,1}, ::Type{T}, ::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:70
     in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:57
     in #SharedArray#793(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, ::Vararg{Int64,N}) at ./sharedarray.jl:113
     in chisq(::Int64) at ./REPL[2]:2
     in calculate(::Int64) at ./REPL[3]:4

It also happens on 0.4.6, with a slightly different error:

    ERROR: On worker 3:
    SystemError: shm_open() failed for /jl006428a6fpOftDBFr087xQnY6F: Too many open files
     in remotecall_fetch at multi.jl:747
     in remotecall_fetch at multi.jl:750
     in call_on_owner at multi.jl:793
     in wait at multi.jl:808
     in __SharedArray#138__ at sharedarray.jl:74
     in SharedArray at sharedarray.jl:117
     in chisq at none:2
     in calculate at none:4

In fact, it still crashes even without the @sync @parallel on the for loop in chisq, and even without addprocs. If @everywhere gc() is called in the second function after each call to chisq, it doesn't crash, but the gc time is long.

Is garbage collection failing to notice that a function creating SharedArrays is being called many times, so that the system's limit on open files is hit? This might be a common case, for example when fitting parameters by optimizing a chi-square function: each simulation runs in parallel, while the optimization method calls the chi-square function many times.

Or did I do something wrong?

Best regards,
Rafael

P.S.: I could also reproduce this on JuliaBox (0.5.0-dev, shown below, and 0.4.6), but not on a 32-bit Julia 0.4.5 system:

    In [4]: calculate(500)

    LoadError: On worker 2:
    SystemError: shm_open() failed for /jl000034opVp2HcAjt3ix2bbeW5A: Too many open files
     in _jl_spawn at ./process.jl:321
     in #293 at ./process.jl:474 [inlined]
     in setup_stdio at ./process.jl:462
     in #spawn#292 at ./process.jl:473
     in spawn at ./<missing>:0
     in ip:0x7f5f467573de at /opt/julia-0.5.0-dev/lib/julia/sys.so:? (repeats 2 times)
     in readstring at ./process.jl:577 [inlined] (repeats 2 times)
     in print_shmem_limits at ./sharedarray.jl:488
     in shm_mmap_array at ./sharedarray.jl:515
     in #657 at ./sharedarray.jl:80
     in #494 at ./multi.jl:1189
     in run_work_thunk at ./multi.jl:844
     in run_work_thunk at ./multi.jl:853 [inlined]
     in #474 at ./task.jl:54
    while loading In[4], in expression starting on line 1

     in #remotecall_fetch#482(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:904
     in remotecall_fetch(::Function, ::Base.Worker, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:898
     in #remotecall_fetch#483(::Array{Any,1}, ::Function, ::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:907
     in remotecall_fetch(::Function, ::Int64, ::Base.RRID, ::Vararg{Any,N}) at ./multi.jl:907
     in call_on_owner(::Function, ::Future, ::Int64, ::Vararg{Int64,N}) at ./multi.jl:950
     in wait(::Future) at ./multi.jl:965
     in #SharedArray#654(::Bool, ::Array{Int64,1}, ::Type{T}, ::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:89
     in SharedArray{T,N}(::Type{Float64}, ::Tuple{Int64}) at ./sharedarray.jl:57
     in #SharedArray#661(::Array{Any,1}, ::Type{T}, ::Type{T}, ::Int64, ::Vararg{Int64,N}) at ./sharedarray.jl:113
     in chisq(::Int64) at ./In[2]:4
     in calculate(::Int64) at ./In[2]:14
     in execute_request(::ZMQ.Socket, ::IJulia.Msg) at /opt/julia_packages/.julia/v0.5/IJulia/src/execute_request.jl:164
     in eventloop(::ZMQ.Socket) at /opt/julia_packages/.julia/v0.5/IJulia/src/IJulia.jl:138
     in (::IJulia.##25#31)() at ./task.jl:309

    ERROR (unhandled task failure): EOFError: read end of file

Eduardo Lenz  Aug 31

It may be related to #15467.

jean-pierre both  Sep 2

It seems to me that your code is correct, BUT allocating a SharedArray is a bit expensive and should be done only once. The following modification runs OK:

    function chisq(A::SharedArray{Float64})
        n = length(A)
        @sync @parallel for i in 1:n
            A[i] = (rand() - rand())^2
        end
        sumsq = sum(A)
    end

    function calculate(n::Integer)
        A = SharedArray(Float64, n)   # allocated once, reused on every call
        b = 0.0
        for j in 1:n
            b += chisq(A)
        end
        return b
    end

    chisq(SharedArray(Float64, 500^2))  # ok, no failure
    calculate(500)                      # no longer fails
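
When each call really does need its own temporary SharedArray, another option is to release the array explicitly rather than wait for the garbage collector. The Julia manual's section on shared arrays and distributed garbage collection recommends calling `finalize` on short-lived SharedArrays so that the memory and file handles backing the shared segment are released sooner. The following is a minimal sketch of that idea, under two assumptions: it uses Julia 1.x names (`Distributed`, `SharedArrays`, `@distributed`, `SharedArray{Float64}(n)`) instead of the 0.5-era syntax used in the thread, and it has not been checked against the 0.5.0-rc3 failure reported above.

```julia
using Distributed, SharedArrays

# Allocate a temporary SharedArray, fill it in parallel, and release it
# explicitly instead of leaving the cleanup to the garbage collector.
function chisq(n::Integer)
    A = SharedArray{Float64}(n)
    try
        @sync @distributed for i in 1:n
            A[i] = (rand() - rand())^2
        end
        return sum(A)
    finally
        # Runs whether or not the loop threw; A must not be used afterwards.
        finalize(A)
    end
end

# Same driver as in the thread: many calls, each with its own array.
function calculate(n::Integer)
    b = 0.0
    for j in 1:n
        b += chisq(n)
    end
    return b
end

total = calculate(20)
```

Reusing a single preallocated array, as suggested in the reply above, avoids the allocation cost entirely; explicit finalization is only worth considering when the array size changes between calls or the array cannot be shared across them.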