julia-users › Getting the best performance in Julia - Bayesian Inference with Julia
2 posts by 2 authors

Adham Beyki, 9/9/15

Well, Julia newbie here! I intend to implement a number of Bayesian hierarchical clustering models (more specifically, topic models) in Julia, and here is my implementation of Latent Dirichlet Allocation as a gist: https: odinay 3e49d50ba580a9bff8e3

I should say that my Julia implementation is almost 100 times faster than my Python (NumPy) implementation. For instance, for a simulated dataset from 5 clusters with 1000 observations, each containing 100 points:

    true_kk = 5
    n_groups = 1000
    n_group_j = 100 * ones(Int64, n_groups)

Julia spends nearly 0.1 sec on each LDA Gibbs sampling iteration, while the same iteration takes almost 9.5 sec in Python on my machine.

But the code is still slow for real datasets. I know that Gibbs inference for these models is expensive by nature, but how can I make sure I have optimised the performance of my code as far as possible? For example, for a slightly bigger dataset such as

    true_kk = 20
    n_groups = 1000
    n_group_j = 1000 * ones(Int64, n_groups)

the output is:

    iteration: 98, number of components: 20, elapsed time: 3.209459973
    iteration: 99, number of components: 20, elapsed time: 3.265090272
    iteration: 100, number of components: 20, elapsed time: 3.204902689
    elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)

As I move to more complex models, getting the most out of the code becomes a bigger concern. How can I make sure that, without changing the algorithm (I don't want to use other Bayesian approaches such as variational methods), this is the best performance I can get? Parallelization is not the answer either: although efficient parallel Gibbs sampling for LDA has been proposed (e.g. here), that is not the case for more complex statistical models. So I want to know whether I am writing the loops and passing variables and types correctly, or whether it can be done more efficiently.
What made me unsure of my work is the huge amount of memory that is allocated, almost 20 GB. I am aware that since numbers are immutable types, Julia has to copy them for manipulation and calculations. But considering the complexity of my problem (3 nested loops) and the size of my data, maybe based on your experience you can tell whether moving around 20 GB is normal or whether I am doing something wrong?

Best,
Adham

    julia> versioninfo()
    Julia Version 0.3.11
    Commit 483dbf5 (2015-07-27 06:18 UTC)
    Platform Info:
      System: Windows (x86_64-w64-mingw32)
      CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
      LAPACK: libopenblas
      LIBM: libopenlibm
      LLVM: libLLVM-3.3

Cedric St-Jean, 9/9/15

Cool to see more Bayesian inference in Julia! These are the generic tips, in case you haven't gone through them: http: en latest manual performance-tips

I particularly recommend profiling your code with

    Profile.clear()
    @profile ...some_function_call...
    ProfileView.view()    # You'll have to Pkg.add it

The red boxes will show memory allocation. Also, if you @time your code, it'll tell you what fraction of the time is spent in GC (most likely a lot if it's 20 GB).

That's quite a bit of code; if you can tell us which part is the bottleneck, it'll be easier to help out.

Best,
Cédric
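To make the suggested workflow concrete (time the code, look at how much it allocates, then profile), here is a minimal sketch. It assumes a current Julia 1.x release rather than the 0.3.11 used in the thread, and it prints a text profile with `Profile.print` instead of opening ProfileView so that it needs no extra packages. The functions `probs_alloc`, `probs_inplace!`, `sweep_alloc` and `sweep_inplace` are hypothetical stand-ins for the kind of inner loop a Gibbs sampler might have; they are not taken from the gist.

```julia
# Sketch for Julia 1.x (assumption; the thread used 0.3.11).
# All function names below are hypothetical, not from the original gist.
using Profile

# Builds a normalised weight vector, allocating two new arrays per call.
function probs_alloc(counts::Vector{Int}, alpha::Float64)
    w = counts .+ alpha        # first temporary array
    return w ./ sum(w)         # second temporary array
end

# Same computation, but written into a caller-supplied buffer.
function probs_inplace!(buf::Vector{Float64}, counts::Vector{Int}, alpha::Float64)
    total = 0.0
    @inbounds for k in eachindex(counts)
        buf[k] = counts[k] + alpha
        total += buf[k]
    end
    @inbounds for k in eachindex(buf)
        buf[k] /= total
    end
    return buf
end

# Inner loop that allocates on every iteration.
function sweep_alloc(counts, alpha, n)
    acc = 0.0
    for _ in 1:n
        acc += probs_alloc(counts, alpha)[1]
    end
    return acc
end

# Inner loop that reuses one buffer allocated up front.
function sweep_inplace(counts, alpha, n)
    buf = Vector{Float64}(undef, length(counts))
    acc = 0.0
    for _ in 1:n
        acc += probs_inplace!(buf, counts, alpha)[1]
    end
    return acc
end

counts = collect(1:20)
alpha = 0.5

# Warm-up calls so that compilation is not measured.
sweep_alloc(counts, alpha, 1)
sweep_inplace(counts, alpha, 1)

# @time reports elapsed time, bytes allocated and (if any) the GC share.
@time sweep_alloc(counts, alpha, 1_000_000)
@time sweep_inplace(counts, alpha, 1_000_000)

# @allocated returns only the number of bytes allocated by the call.
bytes_alloc = @allocated sweep_alloc(counts, alpha, 10_000)
bytes_inplace = @allocated sweep_inplace(counts, alpha, 10_000)
println("allocating version: ", bytes_alloc, " bytes")
println("in-place version:   ", bytes_inplace, " bytes")

# Sampling profile of the allocating version, printed as a text tree.
Profile.clear()
@profile sweep_alloc(counts, alpha, 1_000_000)
Profile.print(maxdepth = 12)
```

If the in-place version reports far fewer bytes, the per-iteration temporaries are a likely source of the allocation total; whether that is what happens in the actual LDA code can only be settled by profiling the gist itself.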