julia-users › why's my julia code running slower than matlab, despite performance tips
26 posts by 9 authors

feza May 8
I have read the performance section and believe I have followed all the suggested guidelines. The same matlab script takes less than 3 seconds; julia 0.4.5 takes 9.7 seconds (julia 0.5 is even worse...).

feza May 8
https://gist.github.com/musmo/27436a340b41c01d51d557a655276783

michae...@gmail.com May 8
I see that c is a constant array of Ints, and its elements multiply ux, uy and uz in a loop, where ux, uy and uz are arrays of floats, so there's a type stability problem.

feza May 8
Good catch, although this still doesn't explain away the difference. @code_warntype shows me feq, f, \rho, ux, uy, uz are red for some reason even though I have explicitly stated their types...

STAR0SS May 8
You are using a lot of vectorized operations and Julia isn't as good as matlab is with those. The usual solution is to devectorize your code and to use loops (except for matrix multiplication if you have large matrices).

Patrick Kofod Mogensen May 8
As for the v0.5 performance (which is horrible), I think it's the boxing issue with closures https://github.com/JuliaLang/julia/issues/15276 . Right?

Patrick Kofod Mogensen May 8
For what it's worth, it runs in about 3-4 seconds on my computer on the latest v0.4.
CPU: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz

feza May 8
That's no surprise, your CPU is better :)
Regarding devectorization, this:

    for l in 1:q
        for k in 1:nz
            for j in 1:ny
                for i in 1:nx
                    u = ux[i,j,k]
                    v = uy[i,j,k]
                    w = uz[i,j,k]
                    cu = c[k,1]*u + c[k,2]*v + c[k,3]*w
                    u2 = u*u + v*v + w*w
                    feq[i,j,k,l] = weights[k]*ρ[i,j,k]*(1 + 3*cu + 9/2*(cu*cu) - 3/2*u2)
                    f[i,j,k,l] = f[i,j,k,l]*(1-ω) + ω*feq[i,j,k,l]
                end
            end
        end
    end

actually makes the code a lot slower....

Tim Holy May 8
Re: [julia-users] Re: why's my julia code running slower than matlab, despite performance tips
One of the really cool features of julia is that functions are allowed to have more than 0 arguments. It's even considered good style, and I highly recommend making use of this awesome feature in your code! :-)

In other words: try passing all variables as arguments to the functions. Even though you're wrapping everything in a function, performance-wise you're running up against an inference problem (https://github.com/JuliaLang/julia/issues/15276). In terms of coding style, you're still essentially using global variables. Honestly, these make your life harder in the end (http://c2.com/cgi/wiki?GlobalVariablesAreBad). It's not a bad thing that julia provides gentle encouragement to avoid using them, and you're losing out on opportunities by trying to sidestep that encouragement.

Best,
--Tim

feza May 8
Thanks for the tip (initially I just translated the matlab verbatim).
Now I have made all the changes: in-place operations and direct function calls. Despite these changes, matlab takes 3.6 seconds and the new Julia code 7.6 seconds.
TBH the results of this experiment are frustrating; I was hoping Julia was going to provide a huge speedup (on the level of C).
Am I still missing anything in the Julia code that is crucial to speed?
@code_warntype looks ok sans a few red unions, which I don't think are in my control.

feza May 8
Milan, the script is here: https://gist.github.com/musmo/27436a340b41c01d51d557a655276783

STAR0SS May 8
Try changing the order of your loops:

    for i in 1:nx, j in 1:ny, k in 1:nz
    ->
    @inbounds for k in 1:nz, j in 1:ny, i in 1:nx

(@inbounds disables bounds checking for arrays; it usually makes a small improvement.)

feza May 8
Wow, thank you guys. I totally thought for i in 1:nx, j in 1:ny, k in 1:nz ran the i index first and then j and then k !!!!! This has been a great learning experience. Much appreciated, now the julia code is about twice as fast!

On Sunday, May 8, 2016 at 1:12:30 PM UTC-4, Tk wrote:
> Also try:
>     julia -O --check-bounds=no yourcode.jl

David Gold May 8
So, the issue here was the indexing clashing up against the column-major storage of multi-dimensional arrays?

On Sunday, May 8, 2016 at 10:10:54 AM UTC-7, Tk wrote:
> Could you try replacing
>     for i in 1:nx, j in 1:ny, k in 1:nz
> with
>     for k in 1:nz, j in 1:ny, i in 1:nx
> because your arrays are defined like a[i,j,k]?
> Another question is, how many cores is your Matlab code using?

feza May 8
Well, the first problem was that the vectorized version of my code was very slow.
Then I devectorized and it was still slow, because of the indexing clashing with the column-major storage. I wrongly assumed that for i=1:10, j=1:10, k=1:10 does the index i first, then j, then k...

feza May 8
With all that done, the julia code runs about the same as, if not better than, matlab (using 4 threads).

Patrick Kofod Mogensen May 8
Out of curiosity, what about v0.5?

feza May 8
Roughly the same speed.

Patrick Kofod Mogensen May 8
Same as v0.4, or same as before you changed the code?

feza May 8
I mean the revised script runs just as fast, if not a tad faster, with the latest master as it does on 0.4.5 :)

Christian Peel May 9
> The usual solution is to devectorize your code and to use loops (except for matrix multiplication if you have large matrices).

I am hopeful that ParallelAccelerator.jl [1][2] or similar projects can enable fast vectorized Julia code.

[1] https://github.com/IntelLabs/ParallelAccelerator.jl
[2] http://julialang.org/blog/2016/03/parallelaccelerator

--
chris...@ieee.org

Ford O. May 9
I have checked the link and read the article.
Am I right that the parallel accelerator basically uses C code instead of julia to do the computation? That would be kind of a shame, don't you think?

Yichao Yu May 9
On Mon, May 9, 2016 at 1:15 AM, Ford Ox wrote:
> I have checked the link and read the article. Am I right that the parallel
> accelerator basically uses C code instead of julia to do the computation?
> That would be kind of a shame, don't you think?

No, I don't think so.

IIUC it uses C for the threading API; it even has a backend using the julia threading API. (And the julia threading API is very incomplete and experimental.)
And in general this is not so different from julia generating LLVM IR (especially since LLVM has a C backend). Generating C is just usually not as efficient as generating LLVM IR, since you'll have parser overhead and it is much less flexible and expressive, unless, as in this case, the function/API is in C.

Yichao Yu May 9
On Mon, May 9, 2016 at 2:04 AM, Yichao Yu wrote:
> Generating C is just usually not as efficient as generating LLVM IR, since
> you'll have parser overhead and it is much less flexible and expressive,
> unless, as in this case, the function/API is in C.

Or, in other words, it is at most a shame for LLVM IR not to have a threading construct (which, admittedly, is a very hard problem, but people are working on it).
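
For reference, the two fixes that resolved the thread (passing the arrays as function arguments instead of relying on captured or global variables, and ordering the loops so that the first array index varies fastest) can be sketched as below. This is a minimal sketch, not the code from the gist: the function name collide!, the argument list, the array shapes, the size checks and the small test sizes are all assumptions made for illustration. It also indexes c and weights with the direction index l, which differs from the loop as posted above (which uses k there).

```julia
# Sketch only: collide!, its argument list and the array shapes are assumed,
# not taken from the gist discussed in the thread.
function collide!(f, feq, ρ, ux, uy, uz, c, weights, ω)
    nx, ny, nz, q = size(f)

    # Check shapes up front so that @inbounds below cannot read out of bounds.
    size(feq) == size(f) || throw(DimensionMismatch("feq must match f"))
    all(size(a) == (nx, ny, nz) for a in (ρ, ux, uy, uz)) ||
        throw(DimensionMismatch("ρ, ux, uy, uz must be nx×ny×nz"))
    (size(c) == (q, 3) && length(weights) == q) ||
        throw(DimensionMismatch("c must be q×3 and weights must have length q"))

    # Julia arrays are column-major: the first index is contiguous in memory.
    # In a comma-separated `for`, the rightmost range is the innermost loop,
    # so `i` is written last and `l` first.
    @inbounds for l in 1:q, k in 1:nz, j in 1:ny, i in 1:nx
        u = ux[i, j, k]
        v = uy[i, j, k]
        w = uz[i, j, k]
        cu = c[l, 1] * u + c[l, 2] * v + c[l, 3] * w
        u2 = u * u + v * v + w * w
        feq[i, j, k, l] = weights[l] * ρ[i, j, k] *
                          (1 + 3 * cu + 9 / 2 * (cu * cu) - 3 / 2 * u2)
        f[i, j, k, l] = f[i, j, k, l] * (1 - ω) + ω * feq[i, j, k, l]
    end
    return f
end

# Tiny usage example with made-up sizes and values.
nx, ny, nz, q = 4, 3, 2, 5
f = ones(nx, ny, nz, q)
feq = zeros(nx, ny, nz, q)
ρ = ones(nx, ny, nz)
ux = zeros(nx, ny, nz)
uy = zeros(nx, ny, nz)
uz = zeros(nx, ny, nz)
c = zeros(q, 3)             # Float64, so no Int/Float mixing in the loop
weights = fill(1 / q, q)

# With zero velocity and ω = 1, every entry of f is replaced by weights[l] = 0.2.
collide!(f, feq, ρ, ux, uy, uz, c, weights, 1.0)
```

Because everything the loop touches is an argument, the compiler can specialize collide! on the concrete array types, which is the point of Tim Holy's suggestion; the loop order addresses the column-major issue raised by STAR0SS and Tk.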