segfault when calling rmprocs after parallel for loop #7376

Closed · felixjung opened this issue on Jun 23, 2014 · 5 comments
Labels: parallel

felixjung commented on Jun 23, 2014

Hi,

I've noticed that Julia 0.3.0-prerelease+3789 segfaults when I try to remove workers after running the following parallelised code:

    # Add worker cores
    addprocs(10)

    # Create distributed array
    d = dzeros((100, 100), workers(), [1, nworkers()])

    # Fill the DArray in a parallel loop
    @parallel for w = 1:length(d.chunks)
        # Obtain local part of array on worker w
        dl = localpart(d)

        # Fill localpart
        for i = 1:size(dl, 1)
            for j = 1:size(dl, 2)
                dl[i, j] = randn()
            end
        end
    end

    # Remove the worker cores
    rmprocs(workers()) # This will cause julia to segfault

The code creates a DArray whose second dimension is split between the workers. During the parallel for loop, each worker fills its local part of the DArray with random numbers by iterating through all array indices. Finally, I try to remove the worker nodes using rmprocs. The exact error I get is:

    julia> rmprocs(workers())
    :ok

    julia>  From worker 2:  1:37,1:251,1:100
            From worker 3:  1:37,1:251,101:200
            From worker 13: 1:37,1:251,1101:1200
            From worker 10: 1:37,1:251,801:900
            From worker 7:  1:37,1:251,501:600
            From worker 11: 1:37,1:251,901:1000
            From worker 19: 1:37,1:251,1701:1800
            From worker 20: 1:37,1:251,1801:1900
            From worker 6:  1:37,1:251,401:500
            From worker 16: 1:37,1:251,1401:1500
            From worker 5:  1:37,1:251,301:400
            From worker 15: 1:37,1:251,1301:1400
    [1]    51187 segmentation fault  julia

I've also checked whether this problem occurs when just summing up random numbers in a parallel for loop, i.e.:

    addprocs(10)
    foo = @parallel (+) for i = 1:100
        randn()
    end
    rmprocs(workers())

However, I don't get a segfault in this case. It seems the segmentation fault has something to do with the way I write to the DArray, or with the DArray itself. The example code runs fine in Julia 0.2.1.
I've installed Julia through Homebrew using brew install --HEAD julia --64bit. This is my hardware setup:

[screenshot: 2014-06-23 at 09:41:53]

Any ideas?

JeffBezanson (The Julia Language member) commented on Jun 23, 2014

Why is there text output from the workers? The code doesn't seem to do any printing.

jiahao (The Julia Language member) commented on Jun 23, 2014

I can reproduce the segfault, but not with julia-debug, lldb julia or gdb julia.

felixjung commented on Jun 24, 2014

@JeffBezanson Not sure. However, I just realised that the text output is the local indices (myindexes) of each worker in the computation where I initially noticed the problem. I don't know whether I accidentally pasted the wrong output, or whether the bug is actually related to workers already holding local parts of another DArray.

I also just noticed that my example code no longer produces a segfault on my machine. Not sure why, though.

JeffBezanson (The Julia Language member) commented on Jul 27, 2014

Reopen if this turns out to be reproducible.

JeffBezanson closed this on Jul 27, 2014

ViralBShah added the parallel label on Jul 27, 2014

ViralBShah (The Julia Language member) commented on Jul 27, 2014

Cc: @amitmurthy