julia-users › Architecture for solving largish LASSO/elasticnet problem
4 posts by 3 authors

Matthew Pearce  Jun 24

Hello,

I'm trying to solve a largish elasticnet-type problem (convex optimisation). The LARS.jl package produces Out of Memory errors for a test (1000, 262144) problem. /proc/meminfo suggests I have 17x this array size free, so I'm not sure what's going on there. I have access to multiple GPUs and nodes, and I would potentially need to solve problems of this sort of size or bigger (10k, 200k) many, many times.

I'm looking for thoughts on the appropriate way to go about tackling this:

1. Rewrap an existing glmnet library for Julia (e.g. this CUDA-enabled one, https://github.com/jeffwong/cudaglmnet, or http://www-hsc.usc.edu/~garykche/gpulasso.pdf).
2. Go back to basics and use an optimisation package on the objective function (https://github.com/JuliaOpt), but which one? Would this be inefficient compared to specific glmnet solvers, which do some kind of coordinate descent?
3. Rewrite some CUDA library from scratch (OK, probably a bad idea).

Thoughts on the back of a postcard would be gratefully received.

Cheers,
Matthew

Tom Breloff  Jun 24

You could consider streaming data to multiple instances of OnlineStats.jl in parallel. There should be no problem with memory usage as long as you don't explicitly load your whole data set at once.

Josh Day  Jun 24

I'm working on https://github.com/joshday/SparseRegression.jl for penalized regression problems. I'm still optimizing the code, but a test set of that size is not a problem.
julia> n, p = 1000, 262144; x = randn(n, p); y = x*randn(p) + randn(n);

julia> @time o = SparseReg(x, y, ElasticNetPenalty(.1), Fista(tol = 1e-4, step = .1), lambda = [.5])
 22.356062 seconds (1.69 k allocations: 408.851 MB, 0.16% gc time)
■ SparseReg
> Model:     SparseRegression.LinearRegression()
> Penalty:   ElasticNetPenalty (α = 0.1)
> Intercept: true
> nλ:        1
> Algorithm: Fista

Matthew Pearce  Jun 24

Thanks for the suggestions so far, I'll be investigating these options :)
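[Editorial note: for the "back to basics" option Matthew raises, a minimal proximal-gradient (ISTA) sketch of the elastic net objective in base Julia might look like the following. All names here (`elasticnet_ista`, `soft_threshold`) and the fixed `step` size are illustrative assumptions, not taken from LARS.jl or SparseRegression.jl; the step must be small enough for the gradient of the smooth part (as usual for ISTA).]

```julia
# Sketch of proximal gradient descent (ISTA) for the elastic net objective
#   (1/2n) * ||y - X*beta||^2 + lambda * (alpha*||beta||_1 + ((1-alpha)/2)*||beta||^2)
# Hypothetical helper names; assumes a step size compatible with the Lipschitz
# constant of the smooth part of the objective.

soft_threshold(z, t) = sign(z) * max(abs(z) - t, 0.0)

function elasticnet_ista(X, y; lambda=0.1, alpha=0.5, step=0.1,
                         maxiter=1000, tol=1e-6)
    n, p = size(X)
    beta = zeros(p)
    for _ in 1:maxiter
        # gradient of the smooth part: squared-error loss plus ridge penalty
        grad = X' * (X * beta - y) ./ n .+ lambda * (1 - alpha) .* beta
        # proximal step: soft-thresholding handles the l1 part of the penalty
        beta_new = soft_threshold.(beta .- step .* grad, step * lambda * alpha)
        maximum(abs.(beta_new .- beta)) < tol && return beta_new
        beta = beta_new
    end
    return beta
end
```

FISTA, as used in Josh's `SparseReg` run above, adds a momentum term to this same prox step; specialised coordinate-descent glmnet solvers exploit the separability of the penalty instead, which is one reason they can be faster in practice.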