julia-users › pmap - intermingled output from workers on v0.4
5 posts by 2 authors

Greg Plowman 11/23/15

Has output from parallel workers changed in Julia v0.4 from v0.3?

I guess that running parallel processes might lead to intermingled output. However, I have (more or less) the same parallel simulation code using pmap running on v0.3 and v0.4. On v0.3 the output from workers is always orderly. On v0.4 it's often intermingled between workers. Moreover, the output sometimes seems delayed, as if it's being buffered and not flushed straight away.

Is there a way I can get the output from workers written immediately?

Greg Plowman 11/23/15

I should add that this problem occurs only when using remote workers (in my case, ssh on Windows).

The following code produces intermingled output with multiple workers on multiple machines (Julia v0.4). Output is orderly when using Julia v0.3, or with v0.4 when workers are on the local machine only.

    function Launch()
        @everywhere function sim(trial, numIterations)
            println("Starting trial $trial")
            s = 0.0
            for i = 1:numIterations
                s += sum(sqrt(rand(10^6)))
            end
            println("Finished trial $trial")
            s
        end

        numTrials = 100
        numIterations = 100
        println("Running random simulation: $numTrials trials of $numIterations iterations ... ")
        results = pmap(sim, 1:numTrials, fill(numIterations, numTrials))
    end

bernhard 11/24/15

In my view it is natural that the order of the "output" (print statements) is intermingled, as the code runs in parallel. To my knowledge this was the same in 0.3. Is it possible that you had no workers at all (i.e. nprocs() evaluates to 1)?

Also, I cannot see any noticeable delay...

Greg Plowman 11/25/15

Thanks for your reply.

> In my view it is natural, that the order of the "output" (print statements) is intermingled, as the code runs in parallel.

Yes, I agree. But I'd like to make sure we're talking about the same level of intermingledness (is this a new word?)

Firstly, I don't really understand parallel processing, output streams, switching etc. But when I first started using Julia for parallel sims (Julia v0.3) I was initially surprised that output from each worker was NOT intermingled, in the sense that each print statement from a worker was delivered to the master process console "atomically", i.e. there were discrete lines on the console, each wholly from a single worker. Sure, the order of the lines depended on the speed of the processor, the amount of work to do etc.

After a while, I just assumed this was either magic, or there was some kind of queuing system with locking or similar. In any case, I didn't really think about it until I started using Julia v0.4, where output lines are sometimes not discrete and sometimes delayed.

Here's an example of output:

    ...
    From worker 3: Completed random trial 69
    From worker 3: Starting random trial 86 with 1000000 games
    From worker 5: Starting random trial 87 with 1000000 games
    From worker 2: Completed random trial 70
    From worker 2: Starting random trial 88 with 1000000 games
    From worker 27: Starting random trial 89 with 1000000 games
    From worker 21: Completed random trial
    From worker 22: Starting random trial 90 with 1000000 games
    From worker 23: Starting random trial 93 with 1000000 games
    From worker 21: 81
    From worker 19: Starting random trial 91 with 1000000 games
    From worker 14: Starting random trial 96 with 1000000 games
    From worker 4: Completed random trial 82
    From worker 4: Starting random trial 98 with 1000000 games
    From worker 24: Completed random trial
    From worker 26: Completed random trial 76
    From worker 25: Completed random trial 80
    From worker 24: 85
    From worker 22: Completed random trial 90
    From worker 3: Completed random trial 86
    From worker 8: Completed random trial
    From worker 9: Starting random trial 94 with 1000000 games
    From worker 8: 78
    From worker 3: Starting random trial 99 with 1000000 games
    From worker 27: Completed random trial
    From worker 29: Starting random trial 92 with 1000000 games
    From worker 28: Starting random trial 95 with 1000000 games
    From worker 27: 89
    From worker 2: Completed random trial 88
    From worker 2: Starting random trial 100 with 1000000 games
    From worker 23: Completed random trial 93
    From worker 29: Completed random trial 92
    From worker 28: Completed random trial 95
    From worker 14: Completed random trial
    From worker 16: Completed random trial 72
    From worker 15: Completed random trial 75
    From worker 20: Completed random trial 79
    From worker 17: Completed random trial 83
    From worker 18: Completed random trial 84
    From worker 19: Completed random trial 91
    From worker 14: 96
    From worker 4: Completed random trial 98
    From worker 9: Completed random trial 94
    From worker 3: Completed random trial 99
    From worker 10: Completed random trial
    From worker 11: Completed random trial 65
    From worker 12: Completed random trial 66
    From worker 13: Completed random trial 71
    From worker 10: 77
    From worker 11: Starting random trial 97 with 1000000 games
    From worker 10:
    From worker 2: Completed random trial 100
    From worker 5: Completed random trial
    From worker 6: Completed random trial 73
    From worker 7: Completed random trial 74
    From worker 5: 87
    From worker 11: Completed random trial 97

Again, I have no idea how these things work, but here's the code from Julia v0.3 (multi.jl):

    if isa(stream, AsyncStream)
        let wrker = w
            # redirect console output from workers to the client's stdout:
            @async begin
                while !eof(stream)
                    line = readline(stream)
                    print("\tFrom worker $(wrker.id):\t$line")
                end
            end
        end
    end

And the equivalent code from Julia v0.4:

    function redirect_worker_output(ident, stream)
        @schedule while !eof(stream)
            line = readline(stream)
            if startswith(line, "\tFrom worker ")
                # STDOUT's of "additional" workers started from an initial worker on a host are not available
                # on the master directly - they are routed via the initial worker's STDOUT.
                print(line)
            else
                print("\tFrom worker $(ident):\t$line")
            end
        end
    end

It seems we've gone from @async to @schedule. Would this make a difference?

Greg Plowman 11/26/15

OK, I've done a little more digging. It seems that in v0.4, remote workers are started differently. This is my understanding:

Only one worker for each host is started directly from the master process. Additional workers on each host are started from the first worker on that host. Thus output from these additional workers is routed via the first worker on the host (rather than directly to the master process). Somehow this causes the intermingled output.

To overcome this, I can start all workers directly from the master process, and output is orderly again (as for v0.3).

Presumably, the new v0.4 indirect method was meant to speed up adding remote workers. Clearly, I don't really understand much of this, and I'm not sure how connecting all workers directly to the master process affects performance or scalability. Intuitively, it doesn't sound good, but for my purpose it does give more readable output.

To help speed up the startup of workers, I can start workers on different hosts in parallel (but each worker on a host is started serially and directly from the master process):

    # machines: collection of (host, nworkers) pairs
    @sync begin
        for (host, nworkers) in machines
            # one task per host, so hosts are set up in parallel
            @async begin
                # workers on the same host are added one at a time,
                # each launched directly from the master process
                for i = 1:nworkers
                    addprocs([(host, 1)])
                end
            end
        end
    end
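
A different way to get whole lines on the console, independent of how the workers were launched, is to stop printing on the workers and send complete messages to the master instead. The following is only a sketch under stated assumptions: it targets current Julia (1.x) with the Distributed standard library rather than the v0.4 API discussed in this thread, the names sim_reporting and launch_reporting and the "DONE" sentinel are made up for illustration, and the workload is shrunk so it runs quickly. It has not been tried on the ssh-on-Windows setup described above.

```julia
using Distributed             # Julia 1.x; assumption, not the v0.4 API from the thread
@everywhere using Distributed # make myid() and put! available wherever sim_reporting runs

# Assumption: workers were already added with addprocs(...).
# With no workers, pmap simply runs every trial on the master process.

# Hypothetical variant of sim: it reports progress through a channel
# instead of calling println on the worker.
@everywhere function sim_reporting(trial, numIterations, progress)
    put!(progress, "worker $(myid()): starting trial $trial")
    s = 0.0
    for i in 1:numIterations
        s += sum(sqrt.(rand(10^4)))
    end
    put!(progress, "worker $(myid()): finished trial $trial")
    return s
end

function launch_reporting(numTrials, numIterations)
    # The channel lives on the master; workers receive a remote handle to it.
    progress = RemoteChannel(() -> Channel{String}(64))
    lines = String[]

    # A single task on the master prints every message, one whole line at a time.
    printer = @async while true
        msg = take!(progress)
        msg == "DONE" && break
        push!(lines, msg)
        println(msg)
    end

    results = pmap(sim_reporting, 1:numTrials,
                   fill(numIterations, numTrials), fill(progress, numTrials))

    put!(progress, "DONE")    # sentinel telling the printer task to stop
    wait(printer)
    return results, lines
end

results, lines = launch_reporting(8, 5)
```

Because only one task on the master writes to the console, and each message arrives as one complete string, a line cannot be split between workers. The order of lines still depends on how fast each worker runs, as bernhard points out.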