Unexpected behaviour of `SharedArray` in single core usage #10773

Closed · nilshg opened this issue on Apr 8, 2015 · 21 comments
Labels: parallel, windows
Participants: nilshg, simonster, timholy, tkelman, twadleigh, ViralBShah, mbauman, ihnorton, amitmurthy

nilshg commented on Apr 8, 2015:

See this discussion in the Julia users group.

When running a `@sync @parallel` loop that writes its results into different `SharedArray`s on just one core, some of the returned arrays contain values that were assigned to other arrays. This does not happen when the code is run on multiple cores. I'm copying my original example from the users group below. In this example the returned array `r2` contains the results of `r3`, while the three arrays calculated in parallel contain the expected results:

    x1 = linspace(1, 3, 3)
    x2 = linspace(1, 3, 3)
    x3 = linspace(1, 3, 3)

    function getresults(x1::Array, x2::Array, x3::Array)
      result1 = SharedArray(Float64, (3,3,3))
      result2 = similar(result1)
      result3 = similar(result1)
      @sync @parallel for a=1:3
        for b=1:3
          for c=1:3
            result1[a,b,c] = x1[a]*x2[b]*x3[c]
            result2[a,b,c] = sqrt(x1[a]*x2[b]*x3[c])
            result3[a,b,c] = (x1[a]*x2[b]*x3[c])^2
          end
        end
      end
      return sdata(result1), sdata(result2), sdata(result3)
    end

    # Compute function using 1 core
    r1,r2,r3 = getresults(x1, x2, x3)

    # Add remaining cores as workers, compute again
    nprocs()==CPU_CORES || addprocs(CPU_CORES-1)
    r1_par,r2_par,r3_par = getresults(x1, x2, x3)

nilshg commented on Apr 8, 2015:

Just to add, one could fix the example above by initializing the result arrays as

    nprocs() > 1 ? result1 = SharedArray(Float64, (3,3,3)) : result1 = Array(Float64, (3,3,3))

in case this isn't actually a bug but expected behaviour for `SharedArray`. In that case I would at least vote for mentioning this in the docs, as I for one spent half a day trying to figure out why my results changed so dramatically before realizing I had just forgotten to add workers...
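The workaround nilshg describes can be sketched as a small allocation helper. This is only an illustration: it assumes current Julia 1.x syntax (`Distributed` and `SharedArrays` standard libraries) rather than the 0.3 syntax used in the thread, and the name `alloc_result` is hypothetical.

```julia
using Distributed, SharedArrays

# Hypothetical helper (not from the thread): choose the array type depending
# on whether any worker processes have been added.
function alloc_result(dims::Dims)
    if nprocs() > 1
        return SharedArray{Float64}(dims)     # shared between workers
    else
        return Array{Float64}(undef, dims)    # plain local array, uninitialized
    end
end

result1 = alloc_result((3, 3, 3))
println(typeof(result1))
```

This sidesteps the symptom on a single process but does not address why distinct `SharedArray`s ended up sharing data in the first place.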
simonster commented on Apr 8, 2015:

@nilshg I tried with two systems and wasn't able to reproduce this. Can you give the output of `versioninfo()`?

nilshg commented on Apr 9, 2015:

Versioninfo:

    Julia Version 0.3.7
    Commit cb9bcae (2015-03-23 21:36 UTC)
    Platform Info:
      System: Windows (x86_64-w64-mingw32)
      CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
      LAPACK: libopenblas
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

On this system, I'm getting the following:

    sum(abs(r1-r1_par))   # 0.0
    sum(abs(r2-r2_par))   # 2672.719
    sum(abs(r3-r3_par))   # 0.0
    sum(abs(r2-r3_par))   # 0.0

The problem does not occur on the same machine using Julia Version 0.4.0-dev+4157 though.

ihnorton added the parallel label on Apr 10, 2015

timholy commented on Apr 11, 2015:

Works for me (`sum(abs(r2-r2_par))` is 0) on

    julia> versioninfo()
    Julia Version 0.3.7-pre+1
    Commit d15f183 (2015-02-17 22:12 UTC)
    Platform Info:
      System: Linux (x86_64-linux-gnu)
      CPU: Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
      LAPACK: libopenblas
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

tkelman commented on Apr 11, 2015:

I can reproduce the problem with

    Julia Version 0.3.6-pre+76
    Commit 79846f8 (2015-02-17 00:52 UTC)
    Platform Info:
      System: Windows (x86_64-w64-mingw32)
      CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
      LAPACK: libopenblas
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

so it's likely a Windows-specific quirk in the `SharedArray` implementation. I think @twadleigh wrote that code?

tkelman added the windows label on Apr 11, 2015

twadleigh commented on Apr 14, 2015:

I did write the code for the Windows implementation. I didn't, however, do any testing beyond what was already in the testbed for the POSIX implementation.
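The figures nilshg reported are consistent with `r2` holding exactly the values meant for `r3`. A minimal serial check, assuming current Julia 1.x syntax and no `SharedArray` at all, recomputes the expected results and the difference one would see if the squared products had overwritten the square roots (the variable names here are illustrative):

```julia
# Same grid as linspace(1, 3, 3) in the thread, in Julia 1.x syntax.
x = range(1, stop=3, length=3)

# Products x1[a]*x2[b]*x3[c] for every index combination (3x3x3 array).
p = [a * b * c for a in x, b in x, c in x]

expected_r2 = sqrt.(p)   # what result2 should contain
expected_r3 = p .^ 2     # what result3 should contain

# Discrepancy that appears if r2 actually holds r3's values.
discrepancy = sum(abs.(expected_r3 .- expected_r2))
println(round(discrepancy, digits=3))
```

The result is approximately 2672.719, matching the reported `sum(abs(r2-r2_par))`, which supports the reading that two of the arrays ended up backed by the same memory.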
tkelman commented on Apr 14, 2015:

Thanks Tracy. It would be helpful if someone who has a Windows machine and a bit of time could try tracking down the OS API calls that underlie the `SharedArray` operations and figure out more precisely what causes this.

twadleigh commented on Apr 14, 2015:

I just noticed that @nilshg says it is working on 0.4, which makes me scratch my head a bit.

tkelman commented on Apr 14, 2015:

We seem to be getting more and more "fixed on master, but don't know by what" bugs. Unless we can find some obviously related bugfix that would be simple to backport, trying to bisect this on Windows could be a lot of work and might point to some major restructuring of internals that can't be backported.

nilshg commented on Apr 14, 2015:

Apologies, I might have been a little quick in saying that it works on 0.4; I just went back to double check and now I'm getting the same wrong results as on 0.3.7. Maybe others who are running both versions could quickly verify this?

twadleigh commented on Apr 14, 2015:

I think I just found the bug, and it is probably only Windows-specific by accident. Check out:

https://github.com/JuliaLang/julia/blob/d534b0029fc06cfc230e4ad0d1a7818295c441ad/base/sharedarray.jl#L52

The shared segment name is generated, in part, using system time. If you create shared arrays in succession too quickly (as in this example), you will get non-unique segment names. Is the time returned from `time()` lower-res on Windows? If so, that could be why the problem is only noticeable there. Anyway, the fix should be simple.

twadleigh commented on Apr 14, 2015:

Another reason why this may work on POSIX vs. Windows: there is no analog of `shm_unlink` for Windows. It is a no-op there. Still, the fix is to uniquify the segment name.

tkelman commented on Apr 14, 2015:

Good catch!
I would not be at all surprised if `time` were lower-resolution on Windows.

timholy commented on Apr 14, 2015:

That's indeed really good debugging, @twadleigh. What about using `tempname`?

tkelman commented on Apr 14, 2015:

There are some still-unresolved platform discrepancies regarding `tempname`: #9053

ViralBShah commented on Apr 14, 2015:

Cc @amitmurthy

twadleigh commented on Apr 14, 2015:

Would pid plus a sufficiently long `randstring` be sufficiently safe? Or maybe pid plus a munged stringification of a `gensym`?

mbauman commented on Apr 14, 2015:

Maybe try `time_ns` instead of `time`? That uses a different C-language call that should have higher precision.

ihnorton commented on Apr 16, 2015:

Rather than time, this could be done with `Base.Random.uuid4`. Or on Windows there is also `CoCreateGuid` (I don't know how the strength compares).

twadleigh commented on Apr 18, 2015:

I'm going to put together a PR with a name made from some digits of the pid, some digits of time, and padded with `randstring` characters.

twadleigh added a commit to twadleigh/julia that referenced this issue on Apr 18, 2015: Randomize segment name generated for `SharedArray`. (e225a1c)

twadleigh referenced this issue on Apr 18, 2015: Merged: Randomize segment name generated for `SharedArray`. #10877

twadleigh commented on Apr 18, 2015:

Went with 6 digits of pid with a long `randstring`.

twadleigh added a commit to twadleigh/julia that referenced this issue on Apr 19, 2015: Randomize segment name generated for `SharedArray`. (3dbc6cc)

amitmurthy closed this on Apr 19, 2015

mbauman added a commit to mbauman/julia that referenced this issue on Jun 6, 2015: twadleigh: Randomize segment name generated for `SharedArray`.
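The naming scheme twadleigh settled on (digits of the pid plus a long random string) can be sketched as follows. This is not the code from the merged PR: it assumes current Julia 1.x syntax, and the helper names, the `/jl` prefix, and the 20-character random suffix are all hypothetical choices made for illustration. The timestamp-based function is likewise only a stand-in for "a name derived from a coarse clock", not the original line in `sharedarray.jl`.

```julia
using Random

# Hypothetical helper: segment name built from the last `pid_digits` digits
# of the process id plus random characters, so that names created in quick
# succession within one process still differ.
function unique_segment_name(; prefix="/jl", pid_digits=6, rand_len=20)
    pid_part = lpad(string(getpid() % 10^pid_digits), pid_digits, '0')
    return string(prefix, pid_part, randstring(rand_len))
end

# Illustrative contrast: a name that depends only on the pid and a timestamp
# truncated to whole seconds will usually repeat when called back to back.
coarse_time_name() = string("/jl", getpid(), "_", floor(Int, time()))

names_time = [coarse_time_name() for _ in 1:3]
names_rand = [unique_segment_name() for _ in 1:3]

println(names_time)
println(names_rand)
```

With a clock-derived name, two arrays allocated within the same clock tick map to the same shared segment, which is the aliasing seen in the original report; the random suffix makes such a collision vanishingly unlikely regardless of clock resolution.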