julia-users › Loading files and scoping of variables in parallel code
11 posts by 3 authors

Christopher Fisher 10/9/15

Hi all-

I am trying to load a file of functions on a cluster of computers. In the past, I used require() (now deprecated) and the sendto() function described here to make a data variable available on all workers. (Note that I cannot simply load the data upon initializing the program because the data will change outside of the module, eventually receiving a stream of data from another program. So speed and flexibility are imperative.)

As recommended here, I defined a module containing the functions and used "using MyModule" to send it to the available workers. It seems that the major limitation of this approach is that data is not available to the functions within the module when using sendto(). I suspect this is because modules are encapsulated from other variables and functions. Bearing that in mind:

1. Is there a way around this problem using the module method?
2. Alternatively, is there a way I can make the functions and packages available to the workers without using modules? Perhaps something akin to the old require method?
3. Or is there a way to send the data via map() along with my function and distributed array?

Essentially, my code loads stored inputs for numerous kernel density functions and converts them to a distributed array of arrays. For example:

    map(EvalKDFs,MyDistArray)

Each time the above function is called, "MyData" needs to be available to the function EvalKDFs. However,

    map(EvalKDFs,MyDistArray,MyData)

does not work because there is one array of data and many arrays within MyDistArray.

I might be able to post a stripped down version of my code if my description does not suffice.

Any help would be greatly appreciated.

Sara Freeman 10/10/15

I've encountered a similar problem, but do not have a solution to report.

I'm not sure why require was deprecated. It worked quite well.

Tim Holy 10/11/15

I'm not certain I understand the interaction between your functions and your data. Let me make a guess: you're basically wanting to supply some parameters as defaults? Then the strategy I'd recommend is the following:

    module MyModule

    export foo

    foo(x, p) = println("Got ", x, " and parameter ", p)

    end

    julia> addprocs(1)
    1-element Array{Int64,1}:
     2

    julia> push!(LOAD_PATH, pwd());

    julia> using MyModule

    julia> @everywhere using MyModule

    julia> @everywhere foo1 = x -> foo(x, "hello")

    julia> remotecall_fetch(2, foo1, 3)
            From worker 2:  Got 3 and parameter hello

Let me comment that I'm not sure it makes sense that one should have to say @everywhere using MyModule. An alternative is to define the anonymous function as

    @everywhere foo1 = x -> MyModule.foo(x, "hello")

but, with this route, one wonders if this module-scoping could be added automatically. If you do neither one of these things, you get the following error:

    julia> remotecall_fetch(2, foo1, 3)
    ERROR: On worker 2:
    UndefVarError: foo not defined
     in anonymous at none:1
     in anonymous at multi.jl:892
     in run_work_thunk at multi.jl:645
     [inlined code] from multi.jl:892
     in anonymous at task.jl:63
     in remotecall_fetch at multi.jl:731
     in remotecall_fetch at multi.jl:734

which is quite confusing because this works:

    julia> remotecall_fetch(2, foo, 3, "world")
            From worker 2:  Got 3 and parameter world

These seem like unnecessary bumps (it certainly cost me several minutes to figure out), and anyone who ironed those out would be making a great contribution! I just opened https://github.com/JuliaLang/julia/issues/13548 and marked it up-for-grabs.

Best,
--Tim

Tim Holy 10/11/15
Re: [julia-users] Re: Loading files and scoping of variables in parallel code

IIUC, there were two reasons for deprecating require:

- many people complained about the slew of related concepts (include, require, reload, using, import). require seems like the easiest of these to eliminate.
- Package precompilation. It was quite ambiguous whether require(filename) should defer to the precompiled file or the source code. However, now that we have timestamp checks to ensure these are in sync with each other, I'm not sure that argument is relevant anymore.

However, modules really should be "static" (containers of code, not of data), so perhaps use of modules indicates that the deprecation is a good idea on its own merits.

Does my reply to Christopher about the anonymous functions help with your use case?

Best,
--Tim

Christopher Fisher 10/11/15

Thanks for your comments, Tim. Unfortunately, the issues I have still persist. I made a stripped down version of my code (I cannot post my model code right now) that has all of the relevant operations intact. So hopefully having the code will make the issue easier to resolve.

In short, the code evaluates the likelihood of "MyData" using precomputed kernel density functions (this is done because the model is very slow to simulate and a likelihood function is not mathematically tractable). The inputs to the kernel density function are put into a distributed array and then evaluated with map(). This worked fine in 0.3. The challenge in 0.4 has been getting MyData to the KDFeval function without the require() function.

I have two versions of the code: one that uses a module and "using", and one that uses include() with the macro @everywhere on each function. Strangely, after removing my model from the code, the sendto() function does not load properly in either method.

Regarding the slew of similar functions, I agree that it is somewhat confusing. However, I do agree with Sara that require() was very convenient for distributing the code to the workers: simply create a .jl file with the functions and a call to the packages you need. I tried to recreate what require used to do, but I could not make any sense of it based on the source code.

Thanks again for any help you can offer.

Attachments (1): Example.zip (3 MB)

Christopher Fisher 10/11/15

Sorry for the confusion. I realized the problem with sendto() was that the argument workers() was missing and there was a minor indexing problem in MainScript.jl. Please see the updated code attached. I also included code that works in Julia 0.3 with the require method, just to show that the code is functioning.

The include method is beginning to work, but it results in a conflict of pdf between Distributions and KernelDensity:

    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using Distributions.pdf in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.pdf in module Main conflicts with an existing identifier.

    LoadError: On worker 4:
    UndefVarError: pdf not defined
     in KDFeval at /Users/chrisfisher/Desktop/Example Updated/Include Method/KDFs.jl:10
     in map at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
     in anonymous at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:494
     in anonymous at multi.jl:889
     in run_work_thunk at multi.jl:645
     in run_work_thunk at multi.jl:654
     in anonymous at task.jl:58
     in remotecall_fetch at multi.jl:731
     in call_on_owner at multi.jl:776
     in fetch at multi.jl:784
     in chunk at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:257
     in anonymous at task.jl:447
    while loading In[4], in expression starting on line 28

     in sync_end at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
     [inlined code] from task.jl:422
     in convert at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:344
     in convert at abstractarray.jl:421
     [inlined code] from In[4]:35
     in anonymous at no file:34

Attachments (1): Example Updated.zip (5 MB)

Sara Freeman 10/11/15

I get the same error that Christopher is getting when I run the include method in 0.4. However, it does work in 0.3 without conflicts. The include method seems to work with my code, but I do not call any packages. So I am wondering if there is an error in one or several of the packages or if the behavior in 0.4 changed.

Tim Holy 10/11/15

Two main problems:

1. module KDFs doesn't know anything about variables stored in Main. (This has nothing to do with parallel code, this is just a basic scoping issue.) It should be

    function KDFeval(KDFinputs, data)
        ...
        L = pdf(f, data)
        ...
    end

and then call it from MainScript as

    output = map(x->KDFeval(x, MyData), KDFargs)

2. Add

    @everywhere using KDFs

after you say `using KDFs`. See https://github.com/JuliaLang/julia/issues/9245.

I think those are the only changes I had to make.

--Tim

Christopher Fisher 10/11/15

Thank you, Tim. That was very helpful. The solution was somewhat counterintuitive to me. The code now works in the for loop and produces the correct output. Strangely, when I run map() outside of the for loop I get this problem:

In [2]:

    # This is a simple example of approximating maximum likelihood estimation from precomputed
    # kernel density functions. This particular example uses a Gaussian model. In practice, this
    # code would use a much slower simulation model for which an analytical likelihood function is
    # not tractable.

    # Load KDF inputs and parameter list
    ParmList = readcsv("ParmList.csv")
    ranges = readcsv("RangeVar.csv")
    densities = readcsv("DensityVar.csv")

    # Initialize local workers
    Nprocs = 4
    addprocs(Nprocs-1)

    using Distributions

    # Add path
    push!(LOAD_PATH,"/Users/chrisfisher/Desktop/Example/Module Method")
    # Load functions and packages to workers
    using KDFs
    @everywhere using KDFs
    # Distributed array containing KDF inputs
    KDFargs = KDFDistArray(densities,ranges)

    # Number of datasets to which the model will be fit
    Nsub = 10
    MLEs = zeros(Nsub,2)

    # for sub = 1:Nsub
    #     # Generate simulated data
    #     MyData = rand(Normal(0,1),50)
    #     # Evaluate KDFs
    #     output = map(x->KDFeval(x,MyData),KDFargs)
    #     output = convert(Array,output)
    #     output = vcat(output...)
    #     index = findmax(output)[2]
    #     # Record approximate MLE
    #     MLEs[sub,:] = ParmList[index,:]
    # end

    MyData = rand(Normal(0,1),50)
    # Evaluate KDFs
    output = map(x->KDFeval(x,MyData),KDFargs)

Out[2]:

    100-element DistributedArrays.DArray{Any,1,RemoteException}:
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     ⋮
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef

Sometimes it will say MyData is not defined on a particular worker instead. Do you have any idea why it works in one context but not another? I'm perplexed. (code attached)

Thanks again,
Chris

Attachments (1): Example.zip (1 MB)

Christopher Fisher 10/11/15

My apologies if the formatting was not readable. Essentially, I replaced the for loop in MainScript.jl with:

    MyData = rand(Normal(0,1),50)
    output = map(x->KDFeval(x,MyData),KDFargs)

and the output was:

    100-element DistributedArrays.DArray{Any,1,RemoteException}:
     #undef
     #undef
     #undef...

Sara Freeman 10/12/15

Thanks for your help so far, Tim. The behavior of

    output = map(x->KDFeval(x, MyData), KDFargs)

is indeed strange and it happens with my code too. Do you have any idea why it works within a for loop but not outside of it?

~Sara
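
One plausible explanation for the loop-versus-top-level difference, offered as a sketch and not confirmed anywhere in this thread: in Julia 0.4, a variable first assigned inside a top-level for loop is local to the loop, so the anonymous function x->KDFeval(x, MyData) captures its value and that value travels with the function to each worker. At top level, MyData is a global in Main, which an anonymous function does not capture, so each worker looks up its own MyData and may find none. Under that assumption, wrapping the call in a function makes the data a local again. In the code below, evaluate_all and score are hypothetical names: score is a stand-in for KDFeval, and map over a plain Array stands in for map over the distributed array.

```julia
# Hypothetical stand-in for KDFeval: scores one input against the data.
score(x, data) = x * sum(data)

# `data` is a function argument, hence a local variable. The anonymous
# function below captures its value instead of referring to a global.
function evaluate_all(args, data)
    return map(x -> score(x, data), args)
end

MyData = [0.5, 0.5]
output = evaluate_all([1, 2, 3], MyData)
```

With the code from this thread, the equivalent would be a function that takes KDFargs and MyData as arguments and calls KDFeval in place of score. Whether that removes the RemoteException seen here is untested.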