Unexpected behaviour of `SharedArray` in single core usage #10773

Closed · nilshg opened this issue on Apr 8, 2015 · 21 comments

Labels: parallel, windows

nilshg commented on Apr 8, 2015

See this discussion in the Julia users group: When a `@sync @parallel` loop that writes its results into several different `SharedArray`s is run on just one core, some of the returned arrays contain values that were assigned to other arrays. This does not happen when the code is run on multiple cores. I'm copying my original example from the users group below; in this example the returned array `r2` will contain the results of `r3`, while the three arrays calculated in parallel contain the expected results:

```julia
x1 = linspace(1, 3, 3)
x2 = linspace(1, 3, 3)
x3 = linspace(1, 3, 3)

function getresults(x1::Array, x2::Array, x3::Array)
  result1 = SharedArray(Float64, (3,3,3))
  result2 = similar(result1)
  result3 = similar(result1)
  @sync @parallel for a=1:3
    for b=1:3
      for c=1:3
        result1[a,b,c] = x1[a]*x2[b]*x3[c]
        result2[a,b,c] = sqrt(x1[a]*x2[b]*x3[c])
        result3[a,b,c] = (x1[a]*x2[b]*x3[c])^2
      end
    end
  end
  return sdata(result1), sdata(result2), sdata(result3)
end

# Compute function using 1 core
(r1,r2,r3) = getresults(x1, x2, x3)

# Add remaining cores as workers, compute again
nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
(r1_par,r2_par,r3_par) = getresults(x1, x2, x3)
```

nilshg commented on Apr 8, 2015

Just to add, one could "fix" the example above by initializing the result arrays as

```julia
nprocs() > 1 ? result1 = SharedArray(Float64, (3,3,3)) : result1 = Array(Float64, (3,3,3))
```

If this isn't actually a bug but expected behaviour for `SharedArray`, I would at least vote for mentioning it in the docs, as I for one spent half a day trying to figure out why my results changed so dramatically before realizing I had just forgotten to add workers...

simonster (The Julia Language member) commented on Apr 8, 2015

@nilshg I tried with two systems and wasn't able to reproduce this. Can you give the output of `versioninfo()`?

nilshg commented on Apr 9, 2015

Versioninfo:

```
Julia Version 0.3.7
Commit cb9bcae* (2015-03-23 21:36 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

On this system, I'm getting the following:

```julia
sum(abs(r1-r1_par)) # 0.0
sum(abs(r2-r2_par)) # 2672.719
sum(abs(r3-r3_par)) # 0.0
sum(abs(r2-r3_par)) # 0.0
```

The problem does not occur on the same machine using Julia Version 0.4.0-dev+4157 though.
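For anyone who wants to rerun the comparison on a current Julia release, the reproducer above can be approximated as follows. This is only a sketch under assumptions: it targets Julia 1.x with the `Distributed` and `SharedArrays` standard libraries (the thread itself uses 0.3/0.4 syntax), it replaces `@parallel` with `@distributed` and `linspace` with `range`, and it constructs each array explicitly rather than via `similar`. It is not expected to show the bug, only to give a baseline for what the three results should look like.

```julia
using Distributed, SharedArrays

# Port of the thread's getresults to Julia 1.x syntax (an assumption of this
# sketch; the original issue predates these standard libraries).
function getresults(x1, x2, x3)
    dims = (length(x1), length(x2), length(x3))
    result1 = SharedArray{Float64}(dims)
    result2 = SharedArray{Float64}(dims)
    result3 = SharedArray{Float64}(dims)
    @sync @distributed for a in 1:dims[1]
        for b in 1:dims[2], c in 1:dims[3]
            p = x1[a] * x2[b] * x3[c]
            result1[a, b, c] = p
            result2[a, b, c] = sqrt(p)
            result3[a, b, c] = p^2
        end
    end
    # sdata returns the local Array backing each SharedArray
    return sdata(result1), sdata(result2), sdata(result3)
end

x = collect(range(1, 3, length=3))
r1, r2, r3 = getresults(x, x, x)
```

If the three arrays are independent, `r2` should match `sqrt.(r1)` and `r3` should match `r1 .^ 2`, whether or not workers have been added.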
ihnorton added the parallel label on Apr 10, 2015

timholy (The Julia Language member) commented on Apr 11, 2015

Works for me (`sum(abs(r2-r2_par)) == 0`) on

```
julia> versioninfo()
Julia Version 0.3.7-pre+1
Commit d15f183* (2015-02-17 22:12 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

tkelman (The Julia Language member) commented on Apr 11, 2015

I can reproduce the problem with

```
Julia Version 0.3.6-pre+76
Commit 79846f8 (2015-02-17 00:52 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

so it's likely a Windows-specific quirk in the SharedArray implementation. I think @twadleigh wrote that code?

tkelman added the windows label on Apr 11, 2015

twadleigh commented on Apr 14, 2015

I did write the code for the windows implementation. I didn't, however, do any testing beyond what was already in the testbed for the POSIX implementation.

tkelman (The Julia Language member) commented on Apr 14, 2015

Thanks Tracy. Would be helpful if someone who has a Windows machine and a bit of time can try tracking down the OS API calls that underlie the SharedArray operations and figure out more precisely what causes this.

twadleigh commented on Apr 14, 2015

I just noticed that @nilshg says it is working on 0.4, which makes me scratch my head a bit.

tkelman (The Julia Language member) commented on Apr 14, 2015

We seem to be getting more and more "fixed on master but don't know by what" bugs. Unless we can find some obviously related bugfix that would be simple to backport, trying to bisect this on Windows could be a lot of work and might point to some major restructuring of internals that can't be backported.

nilshg commented on Apr 14, 2015

Apologies, I might have been a little quick in saying that it works on 0.4; just went back to double check and now I'm getting the same (wrong) results as on 0.3.7. Maybe others who are running both versions could quickly verify this?

twadleigh commented on Apr 14, 2015

I think I just found the bug, and it is probably only windows-specific by accident. Check out:

https://github.com/JuliaLang/julia/blob/d534b0029fc06cfc230e4ad0d1a7818295c441ad/base/sharedarray.jl#L52

The shared segment name is generated, in part, using system time. If you create shared arrays in succession too quickly (as in this example), you will get non-unique segment names. Is the time returned from `time()` lower res on windows? If so, that could be why the problem is only noticeable there. Anyway, the fix should be simple.

twadleigh commented on Apr 14, 2015

Another reason why this may work on POSIX vs. Windows: there is no analog of `shm_unlink` for windows. It is a no-op there. Still, the fix is to uniquify the segment name.

tkelman (The Julia Language member) commented on Apr 14, 2015

Good catch! I would not be at all surprised if `time()` were lower-resolution on Windows.

timholy (The Julia Language member) commented on Apr 14, 2015

That's indeed really good debugging, @twadleigh. What about using `tempname`?

tkelman (The Julia Language member) commented on Apr 14, 2015

There are some still-unresolved platform discrepancies regarding `tempname` - #9053

ViralBShah (The Julia Language member) commented on Apr 14, 2015

Cc @amitmurthy

twadleigh commented on Apr 14, 2015

Would pid plus a sufficiently long `randstring` be sufficiently safe?
Or maybe pid plus a munged stringification of a `gensym`?

mbauman (The Julia Language member) commented on Apr 14, 2015

Maybe try `time_ns()` instead of `time()`? That uses a different C call that should have higher precision.

ihnorton (The Julia Language member) commented on Apr 16, 2015

Rather than time, this could be done with `Base.Random.uuid4`. Or on Windows there is also `CoCreateGuid` (I don't know how the strength compares).

twadleigh commented on Apr 18, 2015

I'm going to put together a PR with a name made from some digits of the pid, some digits of time, and padded with `randstring` characters.

twadleigh added a commit to twadleigh/julia that referenced this issue on Apr 18, 2015
@twadleigh Randomize segment name generated for `SharedArray`. e225a1c

twadleigh referenced this issue on Apr 18, 2015
Merged: Randomize segment name generated for `SharedArray`. #10877

twadleigh commented on Apr 18, 2015

Went with 6 digits of pid with a long `randstring`.

twadleigh added a commit to twadleigh/julia that referenced this issue on Apr 19, 2015
@twadleigh Randomize segment name generated for `SharedArray`. 3dbc6cc

amitmurthy closed this on Apr 19, 2015

mbauman added a commit to mbauman/julia that referenced this issue on Jun 6, 2015
@twadleigh Randomize segment name generated for `SharedArray`.
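The failure mode and the fix discussed in this thread can be illustrated with a small sketch. It is written for Julia 1.x and is not the code from `base/sharedarray.jl` or from #10877: `time_based_name` and `random_name` are hypothetical helpers, the `/jl` prefix and the 20-character suffix length are assumptions, and the clock value is passed in as an argument so that a coarse timer can be simulated deterministically.

```julia
using Random  # provides randstring

# Hypothetical time-based scheme: pid plus a timestamp truncated to
# milliseconds. Two calls within the same clock tick get the same name.
time_based_name(pid::Integer, t::Real) = string("/jl", pid, floor(Int, t * 1000))

# Hypothetical randomized scheme in the spirit of the fix: 6 digits of the
# pid followed by a random alphanumeric suffix (length 20 is an assumption).
function random_name(pid::Integer)
    pid_digits = lpad(string(pid % 10^6), 6, '0')
    return string("/jl", pid_digits, randstring(20))
end

# Simulate two arrays created within one tick of a low-resolution clock.
t = 1428500000.123
collides = time_based_name(1234, t) == time_based_name(1234, t + 0.0001)

# The randomized names differ even though the pid is identical.
n1 = random_name(1234)
n2 = random_name(1234)
```

With the time-based helper, both calls map to the same segment name, so two arrays would end up backed by the same memory, which matches the symptom of `r2` holding the values of `r3`. The randomized helper does not depend on clock resolution at all.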