6.888 Parallel and Heterogeneous Computer Architecture

Each meeting consists of a short lecture (20-40 minutes) and a class discussion of the assigned readings. We will meet once a week during the last weeks of the course to leave more time for the project. This syllabus is still subject to minor changes.

Part 1: Parallel Architectures and Programming Models

Date	Topic	Readings	Notes
Wed Feb 6	Introduction and course overview	None
Mon Feb 11	Instruction, data and thread-level parallelism in modern multicores	The task of the referee, Smith, TC90; Roofline: An insightful visual performance model for multicore architectures, Williams et al., CACM09; Niagara: A 32-way multithreaded SPARC processor, Kongetira et al., Micro05. Additional: The Landscape of Parallel Computing Research, Limits of Instruction-Level Parallelism, The MIPS R10000 Superscalar Processor
Wed Feb 13	Challenges	Is Dark Silicon Useful?, Taylor, DAC12; Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, Chung et al., MICRO10; Dark Silicon and the End of Multicore Scaling, Esmaeilzadeh et al., ISCA11. Additional: Amdahl's Law in the multicore era, Advancing systems without technology progress
Mon Feb 18	President's Day - No meeting
Wed Feb 20	Evaluating parallel systems: Principles, tools, and experiment design	Memory system characterization of commercial workloads, Barroso et al., ISCA98; Emulation track: RAMP Blue; FPGA-accelerated simulation track: HASim; Software simulation track: gem5. Additional: A Characterization of Processor Performance in the VAX-11/780, IPC Considered Harmful for Multiprocessor Workloads, CACTI; Emulation: ProtoFlex, Leon; FPGA-accelerated: FAST, RAMP Gold; Software sim: ASim, Graphite, WWT	HW1 posted
Mon Feb 25	Communication models: Shared memory and message passing	The SGI Origin: A ccNUMA Highly Scalable Server, Laudon et al., ISCA97; Web search for a planet: The Google cluster architecture, Barroso et al., Micro03. Additional: Hydra, SeaMicro SM10000-64
Wed Feb 27, Mon Mar 4	High-level parallel programming models	Task-parallel track: Cilk, X10; Data-parallel track: CUDA, MapReduce; Pipeline-parallel track: StreamIt, CnC; Implicit/domain-specific track: Delite, MATCH

Part 2: Communication, Synchronization, and the Memory Hierarchy

Date	Topic	Readings	Notes
Wed Mar 6	Cache coherence	Cohesion, Tagless. Additional: Token Coherence, Virtual Tree Coherence, Atomic Coherence, SCD, Relaxed Scoreboard
Mon Mar 11	Consistency models	Shared Memory Consistency Models: A Tutorial, BulkSC. Background: Is SC + ILP = RC? Additional: InvisiFence, DeNovo, Radish	HW1 due
Wed Mar 13	Advanced multicore caching	R-NUCA, SHiP. Additional: NUCA, Managing Wire Delay, ASR, Cooperative Caching, DSR; UCP, Vantage; DIP, TA-DIP, RRIP; Feedback-directed prefetching, Friendly fire
Mon Mar 18	Main memory	TCM, PCM. Additional: Virtual Write Queue, Fundamental Latency Trade-offs in Architecting DRAM Caches, Fairness via Source Throttling	Project proposal due
Wed Mar 20	Fine-grain communication and synchronization	Synchronization and Communication in the T3E Multiprocessor, Scaling to the End of Silicon with EDGE Architectures. Additional: UDM, ECMon, RAW, TRIPS Evaluation, Mapping Dataflow
Mon Mar 25	Spring break - No meeting
Wed Mar 27	Spring break - No meeting
Mon Apr 1	Thread-Level Speculation and Transactional Memory	LogTM: Log-based Transactional Memory, Tradeoffs in Transactional Memory Virtualization. Additional: Speculative Lock Elision, Speculative Synchronization, TCC, Bulk, ScalableBulk, TSX

Part 3: Specialized and Heterogeneous Computing

Date	Topic	Readings	Notes
Wed Apr 3	Introduction to heterogeneous computing	Understanding sources of inefficiency in general-purpose chips
Mon Apr 8	Vector processors and GPUs	GPUs: Warp scheduling/RF, Dynamic Warp Formation. Background: How GPUs work. Additional: ViRAM, Vector-Thread architecture, Maven
Wed Apr 10	Specialized compute units	QsCores. Additional: GreenDroid, Conservation Cores, Single-ISA heterogeneous architectures
Wed Apr 17	Fine-grain reconfigurable computing: FPGAs	BORPH, Latency-insensitive multi-FPGA design. Background: Virtex-5. Additional: Co-RAM, Tabula	Project progress report due
Mon Apr 22	No meeting
Wed Apr 24	No meeting
Mon Apr 29	Coarse-grain reconfigurable computing	DySER, Triggered Instructions. Additional: Garp, Chimæra
Wed May 1	Domain-specific and single-purpose architectures	Anton. Additional: Sonic Millip3De

Part 4: Cross-cutting Issues and Project Presentations

Date	Topic	Readings	Notes
Mon May 6	Reliability	Razor, Redundant multithreading alternatives
Wed May 8	VLSI trends: 3D integration, post-CMOS, nanophotonics	Corona, Benchmarking Beyond-CMOS Devices. Additional: Emerging Memories
Mon May 13	Conference-style Project Presentations - Part 1
Wed May 15	Conference-style Project Presentations - Part 2		Project final report due