Loading files and scoping of variables in parallel code
julia-users, 11 posts by 3 authors

Christopher Fisher, 10/9/15

Hi all-

I am trying to load a file of functions on a cluster of computers. In the past, I used require (now deprecated) and the sendto function described here (http://stackoverflow.com/questions/27677399/julia-how-to-copy-data-to-another-processor-in-julia) to make a data variable available on all workers. Note that I cannot simply load the data upon initializing the program because the data will change outside of the module, eventually receiving a stream of data from another program. So speed and flexibility are imperative.

As recommended here (https://groups.google.com/forum/#!searchin/julia-users/$20require/julia-users/6zBKw4nd20I/5JLt7Ded0zkJ), I defined a module containing the functions and used `using MyModule` to send it to the available workers. It seems that the major limitation of this approach is that data is not available to the functions within the module when using sendto. I suspect this is because modules are encapsulated from other variables and functions. Bearing that in mind:

1. Is there a way around this problem using the module method?
2. Alternatively, is there a way I can make the functions and packages available to the workers without using modules? Perhaps something akin to the old require method?
3. Or is there a way to send the data via map along with my function and distributed array?

Essentially, my code loads stored inputs for numerous kernel density functions and converts them to a distributed array of arrays. For example:

    map(EvalKDFs, MyDistArray)

Each time the above function is called, MyData needs to be available to the function EvalKDFs. However,

    map(EvalKDFs, MyDistArray, MyData)

does not work because there is one array of data and many arrays within MyDistArray. I might be able to post a stripped down version of my code if my description does not suffice.

Any help would be greatly appreciated.

Sara Freeman, 10/10/15

I've encountered a similar problem, but do not have a solution to report. I'm not sure why require was deprecated. It worked quite well.
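Christopher's suspicion that a module cannot see variables defined outside it can be checked on a single process, without any workers. The following is a minimal sketch, assuming current Julia syntax; `Demo`, `total_from_global`, and `total_from_arg` are hypothetical names chosen for illustration and do not come from the poster's code:

```julia
module Demo

# `data` is looked up inside Demo, not in Main, so this call
# fails unless Demo itself defines a global named `data`.
total_from_global() = sum(data)

# Passing the data explicitly removes any dependence on globals.
total_from_arg(data) = sum(data)

end # module

data = [1, 2, 3]  # lives in Main; Demo's functions cannot see it

# The explicit-argument version works.
ok = Demo.total_from_arg(data)

# The global-lookup version throws; keep the exception for inspection.
err = try
    Demo.total_from_global()
    nothing
catch e
    e
end
```

If the sketch reflects the situation in the question, `ok` holds the sum of the array and `err` holds an `UndefVarError`, which suggests that the module boundary, rather than anything specific to parallel execution, is what hides the data.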

Tim Holy, 10/11/15

I'm not certain I understand the interaction between your functions and your data. Let me make a guess: you're basically wanting to supply some parameters as defaults? Then the strategy I'd recommend is the following:

    module MyModule

    export foo

    foo(x, p) = println("Got ", x, " and parameter ", p)

    end

    julia> addprocs(1)
    1-element Array{Int64,1}:
     2

    julia> push!(LOAD_PATH, pwd());

    julia> using MyModule

    julia> @everywhere using MyModule

    julia> @everywhere foo1 = x -> foo(x, "hello")

    julia> remotecall_fetch(2, foo1, 3)
        From worker 2:  Got 3 and parameter hello

Let me comment that I'm not sure it makes sense that one should have to say `@everywhere using MyModule`. An alternative is to define the anonymous function as

    @everywhere foo1 = x -> MyModule.foo(x, "hello")

but, with this route, one wonders if this module-scoping could be added automatically.

If you do neither one of these things, you get the following error:

    julia> remotecall_fetch(2, foo1, 3)
    ERROR: On worker 2:
    UndefVarError: foo not defined
     in anonymous at none:1
     in anonymous at multi.jl:892
     in run_work_thunk at multi.jl:645
     [inlined code] from multi.jl:892
     in anonymous at task.jl:63
     in remotecall_fetch at multi.jl:731
     in remotecall_fetch at multi.jl:734

which is quite confusing because this works:

    julia> remotecall_fetch(2, foo, 3, "world")
        From worker 2:  Got 3 and parameter world

These seem like unnecessary bumps (it certainly cost me several minutes to figure out), and anyone who ironed those out would be making a great contribution! I just opened https://github.com/JuliaLang/julia/issues/13548 and marked it up-for-grabs.

Best,
--Tim

On Friday, October 09, 2015 07:35:20 AM Christopher Fisher wrote:
> Hi all- I am trying to load a file of functions on a cluster of computers.
> In the past, I used require (now deprecated) and the sendto function
> described here
> http://stackoverflow.com/questions/27677399/julia-how-to-copy-data-to-another-processor-in-julia
> to make a data variable available on all workers. Note that I cannot simply
> load the data upon initializing the program because the data will change
> outside of the module, eventually receiving a stream of data from another
> program. So speed and flexibility are imperative. As recommended here
> https://groups.google.com/forum/#!searchin/julia-users/$20require/julia-users/6zBKw4nd20I/5JLt7Ded0zkJ
> , I defined a module containing the functions [...]

Tim Holy, 10/11/15

IIUC, there were two reasons for deprecating require:

- Many people complained about the slew of related concepts (include, require, reload, using, import). require seems like the easiest of these to eliminate.
- Package precompilation. It was quite ambiguous whether require(filename) should defer to the precompiled file or the source code. However, now that we have timestamp checks to ensure these are in sync with each other, I'm not sure that argument is relevant anymore.

However, modules really should be static containers of code, not of data, so perhaps use of modules indicates that the deprecation is a good idea on its own merits.

Does my reply to Christopher about the anonymous functions help with your use case?

Best,
--Tim

Christopher Fisher, 10/11/15

Thanks for your comments, Tim. Unfortunately, the issues I have still persist. I made a stripped down version of my code (I cannot post my model code right now) that has all of the relevant operations intact. So hopefully having the code will make the issue easier to resolve.

In short, the code evaluates the likelihood of MyData using precomputed kernel density functions (this is done because the model is very slow to simulate and a likelihood function is not mathematically tractable). The inputs to the kernel density function are put into a distributed array and then evaluated with map. This worked fine in 0.3. The challenge in 0.4 has been getting MyData to the KDFeval function without the require function.

I have two versions of the code: one that uses a module and using, and one that uses include with the macro @everywhere on each function. Strangely, after removing my model from the code, the sendto function does not load properly in either method.

Regarding the slew of similar functions, I agree that it is somewhat confusing. However, I do agree with Sara that require was very convenient for distributing the code to the workers: simply create a .jl file with the functions and a call to the packages you need. I tried to recreate what require used to do, but I could not make any sense of it based on the source code.

Thanks again for any help you can offer.

Attachment: Example.zip (3 MB)

Christopher Fisher, 10/11/15

Sorry for the confusion. I realized the problem with sendto was that the argument workers() was missing and there was a minor indexing problem in MainScript.jl. Please see the updated code attached.
I also included code that works in Julia 0.3 with the require method, just to show that the code is functioning. The include method is beginning to work, but it results in a conflict over pdf between Distributions and KernelDensity:

    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using Distributions.pdf in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.UnivariateKDE in module Main conflicts with an existing identifier.
    WARNING: using KernelDensity.pdf in module Main conflicts with an existing identifier.

    LoadError: On worker 4:
    UndefVarError: pdf not defined
     in KDFeval at /Users/chrisfisher/Desktop/Example Updated/Include Method/KDFs.jl:10
     in map at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
     in anonymous at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:494
     in anonymous at multi.jl:889
     in run_work_thunk at multi.jl:645
     in run_work_thunk at multi.jl:654
     in anonymous at task.jl:58
     in remotecall_fetch at multi.jl:731
     in call_on_owner at multi.jl:776
     in fetch at multi.jl:784
     in chunk at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:257
     in anonymous at task.jl:447
    while loading In[4], in expression starting on line 28

     in sync_end at /Applications/Julia-0.4.0.app/Contents/Resources/julia/lib/julia/sys.dylib
     [inlined code] from task.jl:422
     in convert at /Users/chrisfisher/.julia/v0.4/DistributedArrays/src/DistributedArrays.jl:344
     in convert at abstractarray.jl:421
     [inlined code] from In[4]:35
     in anonymous at no file:34

Attachment: Example Updated.zip (5 MB)

Sara Freeman, 10/11/15

I get the same error that Christopher is getting when I run the include method in 0.4. However, it does work in 0.3 without conflicts.
The include method seems to work with my code, but I do not call any packages. So I am wondering if there is an error in one or several of the packages or if the behavior in 0.4 changed.

Tim Holy, 10/11/15

Two main problems:

1. module KDFs doesn't know anything about variables stored in Main. This has nothing to do with parallel code; this is just a basic scoping issue. It should be

       function KDFeval(KDFinputs, data)
           ...
           L = pdf(f, data)
           ...
       end

   and then call it from MainScript as

       output = map(x->KDFeval(x, MyData), KDFargs)

2. Add `@everywhere using KDFs` after you say `using KDFs`. See https://github.com/JuliaLang/julia/issues/9245.

I think those are the only changes I had to make.

--Tim

Christopher Fisher, 10/11/15

Thank you, Tim. That was very helpful. The solution was somewhat counterintuitive to me. The code now works in the for loop and produces the correct output. Strangely, when I run map outside of the for loop I get this problem:

    In [2]:
    # This is a simple example of approximating maximum likelihood estimation
    # from precomputed kernel density functions. This particular example uses
    # a Gaussian model. In practice, this code would use a much slower
    # simulation model for which an analytical likelihood function is not
    # tractable.

    # Load KDF inputs and Parameter List
    ParmList = readcsv("ParmList.csv")
    ranges = readcsv("RangeVar.csv")
    densities = readcsv("DensityVar.csv")

    # Initialize local workers
    Nprocs = 4
    addprocs(Nprocs-1)

    using Distributions

    # add path
    push!(LOAD_PATH, "/Users/chrisfisher/Desktop/Example/Module Method")
    # load functions and packages to workers
    using KDFs
    @everywhere using KDFs
    # Distributed Array containing KDF inputs
    KDFargs = KDFDistArray(densities,ranges)

    # Number of datasets to which the model will be fit
    Nsub = 10
    MLEs = zeros(Nsub,2)

    for sub = 1:Nsub
        # Generate simulated data
        MyData = rand(Normal(0,1),50)
        # Evaluate KDFs
        output = map(x->KDFeval(x,MyData),KDFargs)
        output = convert(Array,output)
        output = vcat(output...)
        index = findmax(output)[2]
        # Record approximate MLE
        MLEs[sub,:] = ParmList[index,:]
    end

    MyData = rand(Normal(0,1),50)
    # Evaluate KDFs
    output = map(x->KDFeval(x,MyData),KDFargs)

    Out[2]:
    100-element DistributedArrays.DArray{Any,1,RemoteException}:
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     ⋮
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef
     #undef

Sometimes it will say MyData is not defined on a particular worker instead. Do you have any idea why it works in one context but not another? I'm perplexed. (Code attached.)

Thanks again,
Chris

Attachment: Example.zip (1 MB)

Christopher Fisher, 10/11/15

My apologies if the formatting was not readable. Essentially, I replaced the for loop in MainScript.jl with:

    MyData = rand(Normal(0,1),50)
    output = map(x->KDFeval(x,MyData),KDFargs)

and the output was:

    100-element DistributedArrays.DArray{Any,1,RemoteException}:
     #undef
     #undef
     #undef...

Sara Freeman, 10/12/15

Thanks for your help so far, Tim. The behavior of

    output = map(x->KDFeval(x, MyData), KDFargs)

is indeed strange, and it happens with my code too. Do you have any idea why it works within a for loop but not outside of it?

Sara
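
One possible explanation for the difference between the loop and the top level, offered as an assumption since nothing in this thread confirms it: inside the for loop MyData is a local variable, so its value is captured by the anonymous function and travels to the workers with it, whereas at top level MyData is a global in Main that the workers never receive. That would be consistent with the occasional "MyData is not defined" error reported above. If so, making the data local with a let block or a function argument should behave the same way in both contexts. The sketch below shows only the scoping pattern on a single process, in current Julia syntax; `kdfeval`, `kdfargs`, and `evaluate_all` are hypothetical stand-ins, not the real KDFeval or the distributed array:

```julia
# Hypothetical stand-in for KDFeval(KDFinputs, data); not the poster's function.
kdfeval(inputs, data) = sum(inputs) + sum(data)

# Hypothetical stand-in for the distributed array KDFargs.
kdfargs = [[1.0, 2.0], [3.0, 4.0]]

# Option 1: a let block makes `data` a local variable, so the anonymous
# function captures its value instead of referring to a global name.
out_let = let data = [0.5, 0.5]
    map(x -> kdfeval(x, data), kdfargs)
end

# Option 2: a function argument is local as well, so the closure
# built inside the function captures the value in the same way.
evaluate_all(args, data) = map(x -> kdfeval(x, data), args)
out_fun = evaluate_all(kdfargs, [0.5, 0.5])
```

Both options produce the same result here. Whether either one removes the RemoteException in the 0.4 setup from this thread would need to be tested against the attached code.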