Xml has: id route tag dir tag lat lon secsSinceReport predictable heading scrapetimetable.py: Get route data, including lat and lon for stops Get timetable eg schedule for stops findspacings.py: Actually calculate the spacing between arrivals at a stop Also calculates arrival time at each stop Needs the h5 files to already be generated Saved in data.mat analyze_all_named_stops.py: uses findspacings.py generates spacings.mat has: data_map = { 'stop_idxs': [x[0] for x in all_data], 'schedule_codes': [x[1] for x in all_data], 'chunk_idxs': [x[2] for x in all_data], 'times' : [x[3] for x in all_data], 'spacings': [x[4] for x in all_data], 'spacings_expected': [x[5] for x in all_data], } plot.py: Uses spacing.mat to generate plots Just uses stop ids and spacings stuff = scipy.io.loadmat('spacings.mat') stop_idxs = stuff['stop_idxs'].transpose() all_gaps = stuff['spacings'].transpose() dependencies: gcc hdf5 scipy tables bs4 I already have times, but I need expected times, not just expected spacings. I can get expected time for a stop if it is on the schedule, but what about stops which do not have a schedule? What to do then? plots.jl work on finding dwell time finite accel/veloc/etc refine gps locations instantaneous speed as proxy for traffic julia con intro to julia david sanders https://www.cs.ubc.ca/~murphyk/MLbook/ principal components analysis springerlink 9.520, 6.860 9.6: generalized linear mixed models