Segmentation fault while fetching for product of sparse qrfact and dense matrix #14134

Closed. aueelis opened this issue on Nov 25, 2015 · 2 comments
Labels: linear algebra, parallel

aueelis commented on Nov 25, 2015

I was comparing LU and QR factorization and wanted to run them in parallel. I noticed that fetching the QR-factorized matrix `fa` at the same time as the identity matrix `b` results in a segmentation fault, while the same code works fine with `lufact`. The serial code also works.

    julia> addprocs(2)
    2-element Array{Int64,1}:
     2
     3

    julia> @everywhere n = 5

    julia> a = @spawn sprand(n,n,0.99)
    RemoteRef{Channel{Any}}(2,1,5)

    julia> b = @spawn eye(n)
    RemoteRef{Channel{Any}}(3,1,6)

    julia> fa = @spawn qrfact(fetch(a))
    RemoteRef{Channel{Any}}(2,1,7)

    julia> c = @spawn fetch(fa) \ fetch(b)
    RemoteRef{Channel{Any}}(3,1,8)

    Error: signal (11): Segmentation fault
    size at abstractarray.jl:53
    Worker 3 terminated.
    ERROR (unhandled task failure): EOFError: read end of file
     in read at stream.jl:911
     in message_handler_loop at multi.jl:863
     in process_tcp_streams at multi.jl:852
     in anonymous at task.jl:63

    julia> versioninfo()
    Julia Version 0.4.1
    Commit cbe1bee (2015-11-08 10:33 UTC)
    Platform Info:
      System: Linux (x86_64-unknown-linux-gnu)
      CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
      WORD_SIZE: 64
      BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Nehalem)
      LAPACK: libopenblas
      LIBM: libm
      LLVM: libLLVM-3.3

ViralBShah added the parallel label on Nov 25, 2015

andreasnoack (The Julia Language member) commented on Nov 25, 2015

I think there are two issues here. The issue that triggers the segfault is that `fa` is moved from one process to another, and this won't work because a sparse QR object is just a pointer to a C struct handled by SPQR.
When using `@spawn`, there is an ambiguity about which process should execute `fetch(fa) \ fetch(b)` when `fa` and `b` are on two different processes. From the RemoteRefs, you can see that `fa` is on process 2 and that `b` and `c` are on process 3. If you create `c` with

    julia> c = @spawnat fa.where fetch(fa) \ fetch(b)
    Future(2,1,10,Nullable{Any}())

    julia> fetch(c)
    5x5 Array{Float64,2}:
      5.74502  -8.21385  -2.35193    1.60267  -1.30705
      4.95325  -4.9708   -1.94713   -1.55292   1.59012
     -5.9495    7.69385   3.70573   -1.63633   1.22123
     -1.39242   1.21979  -0.656823   1.70116  -0.257719
     -2.34509   4.46599   0.60584    0.78298  -1.03139

it works. This is tricky to fix because the `@spawn` macro doesn't know whether an object can be moved or not. However, you should have received a normal error instead of a segfault. The question is then why it segfaults instead of giving an error. This also happens on 0.5. @amitmurthy @yuyichao any ideas?

kshyatt added the linear algebra label on Nov 25, 2015

andreasnoack (The Julia Language member) commented on Nov 25, 2015

I've figured out what is happening here. The segfault comes from calling `\` on the SPQR object after its pointer has been zeroed by the serialization. We'll probably have to check the pointer on entry for all exported functions in SuiteSparse.

andreasnoack added a commit that referenced this issue on Nov 25, 2015
    Check that pointers to SuiteSparse object haven't been zeroed before … ff2a705

This was referenced on Nov 25, 2015
    Merged: Check that pointers to SuiteSparse objects haven't been zeroed before calling SuiteSparse routines. #14149
    Closed: Reuse of CHOLMOD.Factor #14155

andreasnoack added a commit that referenced this issue on Nov 30, 2015
    Check that pointers to SuiteSparse object haven't been zeroed before … dc00b89

andreasnoack added a commit that referenced this issue on Nov 30, 2015
    Check that pointers to SuiteSparse object haven't been zeroed before … 899552f

andreasnoack closed this in #14149 on Dec 1, 2015

andreasnoack added a commit that referenced this issue on Jan 9
    Check that pointers to SuiteSparse object haven't been zeroed before …
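
The diagnosis in the thread (a pointer zeroed by serialization, then passed on to C code) and the proposed remedy (check the pointer on entry) can be illustrated with a minimal sketch. It is not the code from the referenced commits: `FakeFactor`, `checked_pointer`, and `roundtrip` are hypothetical names invented for this example, the non-null address is a dummy that is never dereferenced, and the sketch uses current Julia syntax with the `Serialization` standard library instead of the 0.4 API shown in the report. It assumes, as the thread describes, that the serializer writes a `Ptr` as null.

```julia
using Serialization

# Hypothetical stand-in for a factorization object that is only a handle
# to a C struct (as the thread describes for sparse QR objects).
struct FakeFactor
    p::Ptr{Cvoid}
end

# Guard in the spirit of the proposed fix: raise an ordinary exception
# when the pointer is null instead of handing it to C code.
function checked_pointer(F::FakeFactor)
    if F.p == C_NULL
        throw(ArgumentError("pointer to the C struct is NULL; the object may have been serialized"))
    end
    return F.p
end

# Serialize and deserialize in memory, similar to what happens when an
# object is moved between processes.
function roundtrip(x)
    io = IOBuffer()
    serialize(io, x)
    seekstart(io)
    return deserialize(io)
end

# Dummy non-null address (assumption for illustration; never dereferenced).
F = FakeFactor(Ptr{Cvoid}(UInt(0x1000)))
G = roundtrip(F)

println("original pointer is null:   ", F.p == C_NULL)
println("round-trip pointer is null: ", G.p == C_NULL)

try
    checked_pointer(G)
catch err
    println("caught: ", sprint(showerror, err))
end
```

With a guard of this kind, moving such an object to another worker would surface as a catchable error on that worker instead of terminating it; running the solve on the process that owns the object, as in the `@spawnat fa.where` example above, avoids the move altogether.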