Slowdown of a function when executed on worker 6686 Closed carlobaldassi opened this Issue on Apr 29, 2014 ยท 2 comments Projects None yet Labels parallel performance Milestone No milestone Assignees No one assigned 2 participants carlobaldassi malmaud Notifications carlobaldassi The Julia Language member carlobaldassi commented on Apr 29, 2014 I have a case in which I observe a slowdown of a function when it's executed on a worker rather than in the main process. What I mean is: start with julia -p 1, then write a function like: function inner args... here, `args` do not contain distributed arrays, remote refs or else, just plain Vectors and composite types time begin code end end then call it either from process 1 or 2: time remotecall_fetch 1, inner, args... time remotecall_fetch 2, inner, args... Expected result: the outer timings are different because of data movement, but the inner timings are about the same. Actual result: the inner timings are different, and can get 2x-4x slower on the worker. I have a reduced test case with timing results here, this is an excerpt showing the issue: inner on local process ---------------------- elapsed time: 0.589470459 seconds 0 bytes allocated elapsed time: 0.589635775 seconds 424 bytes allocated inner on worker --------------- From worker 2: elapsed time: 1.615534087 seconds 0 bytes allocated elapsed time: 2.12067934 seconds 38298756 bytes allocated The code in the gist also shows the only solution I found so far to recover the lost performance, which was to manually open up the BitArrays and fetch their chunks before the tight-loop computations inside the inner function. The effect may be related to the depth of the nesting in the arguments I'm passing one of which is a Vector of composite types, each holding a Vector BitVector . Is it possible that serialization only goes down to a certain depth? When profiling the guts of inner, I also saw some weird calls to functions in multi-jl occurring when the function was executed by the worker. carlobaldassi carlobaldassi added performance parallel labels on Apr 29, 2014 carlobaldassi The Julia Language member carlobaldassi commented on Apr 29, 2014 One additional detail I forgot to mention about the different versions of the inner functions in the gist I linked above: one may suspect that the performance on the remote worker can be recovered just by manually inlining the code for the dot function, but that's not the case: I really need to fetch all BitVector chunks beforehand outside of the outermost loop . Also, versioninfo: Julia Version 0.3.0-prerelease+2816 Commit d3650a2 2014-04-29 09:10 UTC Platform Info: System: Linux x86_64-linux-gnu CPU: Intel R Core TM i5 CPU M 430 2.27GHz WORD_SIZE: 64 BLAS: libopenblas USE64BITINT DYNAMIC_ARCH NO_AFFINITY LAPACK: libopenblas LIBM: libopenlibm malmaud malmaud commented on Oct 14, 2015 I can't reproduce this on master b6c0d95 - the worker is reporting identical timing to the master. I did decorate inner and type T with everywhere declarations, which your gist doesn't have. At least with the current way remotecall_fetch works, those declarations are necessary - not sure if that's what's making difference. Anyways, please reopen if this is still an issue. Here is the revised benchmark I tried: addprocs 1 everywhere type W J::Vector BitVector W N::Int, K::Int new bitrand N for k 1:K end function inner ws::Vector W , X::Vector BitVector r 0 time begin for x in X, w in ws J w.J s 0 for j in J s + dot j,x 120 end r+ s 50 end end return r end everywhere function inner_evenslower ws::Vector W , X::Vector BitVector r 0 time begin Js w.J for w in ws for x in X, J in Js s 0 for j in J s + dot j, x 120 end r + s 50 end end return r end function outer ws::Vector W , X::Vector BitVector remotecall_fetch inner_evenslower,1,ws,X remotecall_fetch inner_evenslower,2,ws,X remotecall_fetch inner_evenslower,1,ws,X remotecall_fetch inner_evenslower,2,ws,X end function test srand 1 N 500 K1 100 K2 200 M 1_000 ws W N,K1 for i 1:K2 X bitrand N for i 1:M outer ws,X end test malmaud malmaud closed this on Oct 14, 2015