julia-users › how long until vectorized code runs fast?
19 posts by 10 authors

esproff May 12

This remains one of the main drawbacks of Julia, and the Devectorize package is basically useless, as it doesn't support some really crucial vectorized operations. I'd really prefer not to rewrite all my vectorized code into nested loops if at all possible, but I really need more speed. Can anyone tell me the timeline and future plans for making vectorized code run at C speed?

Kristoffer Carlsson May 12

It is always easier to discuss if there is a piece of code to look at. Could you perhaps post a few code examples that do not run as fast as you want?

Also, make sure to look at https://github.com/IntelLabs/ParallelAccelerator.jl. They have a quite sophisticated compiler that does loop fusion, parallelization, and other cool stuff.

Keno Fischer May 12
Re: [julia-users] Re: how long until vectorized code runs fast?

There seems to be a myth going around that vectorized code in Julia is slow. That's not really the case. Often it's just that devectorized code is faster because one can manually perform operations such as loop fusion, which the compiler cannot currently reason about (and most C compilers can't either). In some other languages those benefits get drowned out by language overhead, but in Julia those kinds of constructs are generally fast.
The cases where Julia can be slower are those with excessive memory allocation in a tight inner loop, but those cases can usually be rewritten fairly easily without losing the vectorized look of the code.

esproff May 12

In response to both Kristoffer's and Keno's timely responses:

Originally I just did a simple @time test of the form

Matrix .* horizontal vector

and then tested the same thing with for loops, and the for loops were way faster (and used way less memory).

However, I just devectorized one of my algorithms and ran an @time comparison, and the vectorized version was actually twice as fast as the devectorized version, although the vectorized version used way more memory. Clearly I don't really understand the specifics of what makes code slow, and in particular how vectorized code compares to devectorized code. Vectorized code does seem to use a lot more memory, but clearly for my algorithm it nevertheless runs faster than the devectorized version. Is there a reference I could look at that explains this to someone with a background in math but not much knowledge of computer architecture?

Milan Bouchet-Valat May 12

Some major improvements are coming in 0.5, and more are currently being worked on/discussed. See https://github.com/JuliaLang/julia/issues/16285

Regards

Milan Bouchet-Valat May 12
On Wednesday, May 11, 2016 at 23:03 -0700, Anonymous wrote:
> In response to both Kristoffer and Keno's timely responses,
>
> Originally I just did a simple @time test of the form
> Matrix .* horizontal vector
>
> and then tested the same thing with for loops, and the for loops were
> way faster (and used way less memory)
>
> However I just devectorized one of my algorithms and ran an @time
> comparison and the vectorized version was actually twice as fast as
> the devectorized version, however the vectorized version used way
> more memory. Clearly I don't really understand the specifics of what
> makes code slow, and in particular how vectorized code compares to
> devectorized code. Vectorized code does seem to use a lot more
> memory, but clearly for my algorithm it nevertheless runs faster than
> the devectorized version. Is there a reference I could look at that
> explains this to someone with a background in math but not much
> knowledge of computer architecture?

I don't know about a reference, but I suspect this is due to BLAS. Vectorized versions of linear algebra operations like matrix multiplication are highly optimized and run several threads in parallel. By contrast, your devectorized code isn't carefully tuned for a specific processor model and uses a single CPU core (soon Julia will support using several threads; see also [1]). So depending on the particular operations you're running, the vectorized form can be faster even though it allocates more memory.

In general, it will likely be faster to use BLAS for expensive operations on large matrices. On the other hand, it's better to devectorize code if you successively perform several simple operations on an array, because each operation currently allocates a copy of the array (this may well change with [2]).
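The trade-off described above can be sketched roughly as follows. This is an illustration only, written for Julia 1.x syntax rather than the 0.4/0.5 releases current at the time of this thread; all function names are hypothetical. Note that since Julia 0.6 a single dotted expression is fused into one loop, which removes the intermediate copies for that common case.

```julia
# Hypothetical helper names; written for Julia 1.x.

# Step-by-step vectorized style: every statement allocates a new array.
function vectorized_steps(x::Vector{Float64}, y::Vector{Float64})
    t1 = x .* 2.0        # temporary array 1
    t2 = t1 .+ y         # temporary array 2
    return sqrt.(t2)     # result array
end

# Devectorized: one pass over the data and a single output allocation.
function devectorized(x::Vector{Float64}, y::Vector{Float64})
    out = similar(x)
    @inbounds for i in eachindex(x, y)
        out[i] = sqrt(2.0 * x[i] + y[i])
    end
    return out
end

# Julia 0.6 and later: the dots in one expression fuse into a single loop.
fused(x, y) = sqrt.(2.0 .* x .+ y)

# Expensive linear algebra is a different case. For large Float64 matrices,
# `A * B` is handed to an optimized BLAS routine, and a naive triple loop
# like this one is unlikely to match it.
function naive_matmul(A::Matrix{Float64}, B::Matrix{Float64})
    m, k = size(A)
    k2, n = size(B)
    k == k2 || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(m, n)
    for j in 1:n, p in 1:k, i in 1:m   # i is the innermost loop (column-major)
        C[i, j] += A[i, p] * B[p, j]
    end
    return C
end
```

Timing each variant with @time (after a warm-up call, so compilation is excluded) should show the allocation difference between the first two functions; the actual speed ratio depends on array size and hardware.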
Regards

1: http://julialang.org/blog/2016/03/parallelaccelerator
2: https://github.com/JuliaLang/julia/issues/16285

esproff May 12

Are operations such as

[1 2; 3 4] .* [1 2]

or

[1,2] .^ [1,2]

part of BLAS? The latter is covered by Devectorize.jl; however, my understanding is that the former falls between the cracks, covered neither by Devectorize.jl nor by BLAS.

Stefan Karpinski May 12

On Thu, May 12, 2016 at 7:41 AM, Keno Fischer wrote:
> There seems to be a myth going around that vectorized code in Julia is slow. That's not really the case. Often it's just that devectorized code is faster because one can manually perform operations such as loop fusion, which the compiler cannot currently reason about (and most C compilers can't either). In some other languages those benefits get drowned out by language overhead, but in Julia those kinds of constructs are generally fast. The cases where Julia can be slower are those with excessive memory allocation in a tight inner loop, but those cases can usually be rewritten fairly easily without losing the vectorized look of the code.

This. JMW's blog post on the subject is as relevant as when he wrote it:

http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/

Conclusion:
Julia's vectorized code is 2x faster than R's vectorized code
Julia's devectorized code is 140x faster than R's vectorized code
Julia's devectorized code is 1350x faster than R's devectorized code

Julia's vectorized code is not slow; it's faster than other languages. It's just that Julia allows you to write even faster code when it matters.

Miguel Bazdresch May 12
The easiest way to write slow for loops is to make them row-major instead of column-major.

-- mb

On Thu, May 12, 2016 at 8:46 AM, Anonymous wrote:
> So I guess the consensus is not that Julia's devectorized code is so much faster than its vectorized code (in fact I keep getting slowdowns when I test out different devectorizations of my algorithms), but that R's devectorized code just sucks; either that or I really suck at writing for loops.
>
> Honestly, I've been testing out different devectorizations of my algorithms and I keep getting slower results, not faster, so either I really suck at writing for loops or Julia is doing a good job with my vectorized code.

Tim Holy May 12

Did you run it twice? Remember that memory is allocated during JIT compilation, so the amount of memory on the first call is completely meaningless.

--Tim

esproff May 12

I did run it multiple times, yes. I've tried a couple of different devectorizations of my algorithms and none result in speed-ups; most result in slightly slower run times. I find it a bit strange, because memory allocation and garbage collection are far lower when I devectorize, but that doesn't translate into performance improvements.

Also, like I said before, I'm most curious about the current status of operations of the form:

[1 2; 3 4] .* [1 2]

Is such an operation covered by BLAS?

Steven G. Johnson May 12

On Thursday, May 12, 2016 at 8:51:44 AM UTC-4, Miguel Bazdresch wrote:
> honestly I've been testing out different devectorizations of my algorithms and I keep getting slower results, not faster, so either I really suck at writing for loops or Julia is doing a good job with my vectorized code.
Make sure your loops are in a function; don't benchmark in global scope (see the performance tips section of the manual).

Try running your function through @code_warntype myfunction(args...) and see if it marks any variables as type "ANY" (which indicates a type instability in your code; see the performance tips).

Also, if you do "@time myfunc(args...)" and it indicates that you did a huge number of allocations, you could either have a type instability or be allocating new arrays in your inner loops (it is always better to allocate arrays once outside your inner loops and then update them in-place as needed).

Tom Breloff May 12

Also, is it possible that your vectorized versions are being passed to multithreaded routines? The setup might require more memory, but the execution would run in parallel.

esproff May 12

Yes, the algorithm I'm testing this on is fairly polished at this point; all variables are within a type and they all have strict type declarations. The memory allocations are very low compared to the vectorized code, so memory-wise the loops are doing their job, but this doesn't translate into speed-ups.

Ford O. May 12

Why don't you just post your code here?

Stefan Karpinski May 12

I also have to ask... you're not working with global variables, right?

Tim Holy May 12

On Thursday, May 12, 2016 06:44:16 AM Anonymous wrote:
> Also like I said before, I'm most curious about the current status of
> operations of the form:
>
> [1 2; 3 4] .* [1 2]
>
> is such an operation covered by BLAS?
No, among other reasons because BLAS only handles floating-point numbers. That specific operation is handled by broadcasting.

Best,
--Tim
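For readers unfamiliar with broadcasting, here is a minimal sketch of what the operation in question does, assuming Julia 1.x syntax. The variable names and the scale_columns helper are illustrative, not part of any library.

```julia
# Written for Julia 1.x; names are illustrative.

A = [1 2; 3 4]      # 2×2 Matrix{Int}
r = [1 2]           # 1×2 row matrix

# Broadcasting virtually repeats the single row of r down the rows of A.
# No BLAS call is involved, and integer element types work fine.
B = A .* r          # [1 4; 3 8]

# A roughly equivalent explicit loop. The inner loop runs over rows, so
# memory is traversed in column-major order.
function scale_columns(A::AbstractMatrix, r::AbstractMatrix)
    size(r) == (1, size(A, 2)) || throw(DimensionMismatch("r must be 1×size(A, 2)"))
    out = similar(A, promote_type(eltype(A), eltype(r)))
    for j in axes(A, 2), i in axes(A, 1)
        out[i, j] = A[i, j] * r[1, j]
    end
    return out
end

# Elementwise power on two vectors is plain broadcasting as well.
p = [1, 2] .^ [1, 2]   # [1, 4]
```

Here scale_columns(A, r) returns the same matrix as A .* r. Swapping the loop order so that j is innermost would give the row-major traversal that Miguel warns about above.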