julia-users › R code to Julia Performance issues with pmap
6 posts by 3 authors

jasc...@gmail.com Jul 5

I am a complete newcomer to Julia and am trying to port some of my R code to it. Basically, I have rewritten the following R code in Julia:

library(parallel)

eps_1 <- rnorm(1000000)
eps_2 <- rnorm(1000000)

large_matrix <- ifelse(cbind(eps_1, eps_2) > 0, 1, 0)
matrix_to_compare <- expand.grid(c(0, 1), c(0, 1))
indices <- seq(1, 1000000, 4)
large_matrix <- lapply(indices, function(i) large_matrix[i:(i + 3), ])

function_compare <- function(x) {
  which((rowSums(x == matrix_to_compare) == 2) %in% TRUE)
}

system.time(lapply(large_matrix, function_compare))
   user  system elapsed
 38.812   0.024  38.828

system.time(mclapply(large_matrix, function_compare, mc.cores = 11))
   user  system elapsed
 63.128   1.648   6.108

As one can see, I am getting a significant speed-up when going from one core to 11. Now I am trying to do the same in Julia:

using Distributions;
@everywhere using Iterators;

d = Normal();
eps_1 = rand(d, 1000000);
eps_2 = rand(d, 1000000);

# Define cluster:
addprocs(11);

# Create a large matrix:
large_matrix = hcat(eps_1, eps_2) .> 0;
indices = collect(1:4:1000000)

# Split large matrix:
large_matrix = [large_matrix[i:(i+3), :] for i in indices];

# Define the function to apply:
@everywhere function function_split(x)
    matrix_to_compare = transpose(reinterpret(Int, collect(product([0,1], [0,1])), (2,4)));
    matrix_to_compare = matrix_to_compare .> 0;
    find(sum(x .== matrix_to_compare, 2) .== 2)
end

@time map(function_split, large_matrix)
@time pmap(function_split, large_matrix)

  5.167820 seconds (22.00 M allocations: 2.899 GB, 12.83% gc time)
 18.569198 seconds (40.34 M allocations: 2.082 GB, 5.71% gc time)

I do not understand why pmap does not speed things up for me; it is actually slower than map. Maybe somebody can point me to a correct solution.

Stefan Karpinski Jul 5

Similar question and answer: http://stackoverflow.com/questions/38075163/julia-uses-only-20-30-of-my-cpu-what-should-i-do/38075939

jasc...@gmail.com Jul 6

Yes, but I am not using BLAS or FFT transforms, so it is a bit surprising that I am not getting any speed improvement.

Michael Borregaard Jul 6

I am not seeing your speed-up in R? elapsed is less, but user is significantly more, and it is the sum that counts. When executing in parallel, the language needs to copy the data to the workers. If the matrices are large, that copying takes longer than the parallel execution saves. See what happens with a smaller matrix and then repeating the operation on the workers.

jasc...@gmail.com Jul 6

This is not entirely true in R:

Details: ‘proc.time’ returns five elements for backwards compatibility, but its ‘print’ method prints a named vector of length 3. The first two entries are the total user and system CPU times of the current R process and any child processes on which it has waited, and the third entry is the ‘real’ elapsed time since the process was started.

On Wednesday, July 6, 2016 at 5:11:04 PM UTC+2, Michael Borregaard wrote:
> I am not seeing your speed-up in R? elapsed is less, but user is significantly more, and it is the sum that counts. When executing in parallel, the language needs to copy the data to the workers. If the matrices are large, that copying takes longer than the parallel execution saves. See what happens with a smaller matrix and then repeating the operation on the workers.

Michael Borregaard Jul 6

Ah, I forgot that system.time is a bit tricky when running on parallel cores.
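The communication cost Michael describes can be made visible with a small experiment. This is a minimal sketch of one common remedy, not a fix proposed in the thread, and it assumes Julia 1.x with the Distributed standard library. Because the thread predates Julia 1.x, `findall`, `dims=` and the `batch_size` keyword of `pmap` stand in for the 2016-era calls, and the comparison matrix is written out by hand instead of being built with Iterators.product. The sketch times map, pmap with its default of one remote call per 4x2 block, and pmap with many blocks batched into each message. The worker count, problem size and batch size are arbitrary choices for illustration, and the timings will vary by machine.

```julia
using Distributed, Random

# Assumption: Julia 1.x with the Distributed stdlib. Workers are added
# before @everywhere so that the definitions below are sent to them.
nprocs() == 1 && addprocs(2)   # 2 workers is an arbitrary choice

@everywhere begin
    # Rows (0,0), (1,0), (0,1), (1,1): the same order as expand.grid(c(0,1), c(0,1)).
    const MATRIX_TO_COMPARE = Bool[0 0; 1 0; 0 1; 1 1]

    # Indices of the rows of a 4x2 Bool block that equal the corresponding row
    # of MATRIX_TO_COMPARE (both columns agree, so the row sum is 2).
    function_split(x) = findall(vec(sum(x .== MATRIX_TO_COMPARE; dims = 2)) .== 2)
end

Random.seed!(1)
n = 40_000                                        # must be a multiple of 4
large_matrix = hcat(randn(n), randn(n)) .> 0      # n x 2 BitMatrix
blocks = [large_matrix[i:i+3, :] for i in 1:4:n]  # 10_000 tiny 4x2 blocks

# Warm up so that compilation on the master and the workers is not timed.
map(function_split, blocks[1:10])
pmap(function_split, blocks[1:10])
pmap(function_split, blocks[1:10]; batch_size = 5)

t_map = @elapsed serial = map(function_split, blocks)

# Default batch_size = 1: one message round trip per 4x2 block.
t_pmap = @elapsed per_item = pmap(function_split, blocks)

# Many blocks per message, so the messaging cost is paid far less often.
t_batched = @elapsed batched = pmap(function_split, blocks; batch_size = 1_000)

println("map:                   ", round(t_map; digits = 3), " s")
println("pmap, batch_size=1:    ", round(t_pmap; digits = 3), " s")
println("pmap, batch_size=1000: ", round(t_batched; digits = 3), " s")
```

Each block here needs only a few microseconds of work, so with batch_size = 1 almost all of pmap's time goes to sending blocks and results between processes. Splitting the blocks into one chunk per worker and calling map inside each chunk is an equivalent manual alternative to the batch_size keyword.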