6.888 Parallel and Heterogeneous Computer Architecture

Course syllabus

Each meeting consists of both a short lecture (20-40 minutes) and a class discussion of the assigned readings. We will meet once a week during the last weeks of the course to leave more time for the project. This syllabus is still subject to minor changes.

Part 1: Parallel Architectures and Programming Models
Date Topic Readings Notes
Wed Feb 6 Introduction and course overview None
Mon Feb 11 Instruction, data and thread-level parallelism in modern multicores The task of the referee, Smith, TC90
Roofline: An insightful visual performance model for multicore architectures, Williams et al., CACM09
Niagara: A 32-way multithreaded SPARC processor, Kongetira et al., Micro05
Additional: The Landscape of Parallel Computing Research, Limits of Instruction-Level Parallelism, The MIPS R10000 Superscalar Processor
Wed Feb 13 Challenges Is Dark Silicon Useful?, Taylor, DAC12
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, Chung et al., MICRO10
Dark Silicon and the End of Multicore Scaling, Esmaeilzadeh et al., ISCA11
Additional: Amdahl's Law in the multicore era, Advancing systems without technology progress
Mon Feb 18 President's Day - No meeting
Wed Feb 20 Evaluating parallel systems: Principles, tools, and experiment design Memory system characterization of commercial workloads, Barroso et al., ISCA98
Emulation track: RAMP Blue
FPGA-accelerated simulation track: HASim
Software simulation track: gem5
Additional: A Characterization of Processor Performance in the VAX-11/780, IPC Considered Harmful for Multiprocessor Workloads, CACTI; Emulation: ProtoFlex, Leon; FPGA-accelerated: FAST, RAMP Gold; Software sim: ASim, Graphite, WWT
HW1 posted
Mon Feb 25 Communication models: Shared memory and message passing The SGI Origin: A ccNUMA Highly Scalable Server, Laudon et al., ISCA97
Web Search for a Planet: The Google Cluster Architecture, Barroso et al., Micro03
Additional: Hydra, SeaMicro SM10000-64
Wed Feb 27, Mon Mar 4 High-level parallel programming models Task-parallel track: Cilk, X10
Data-parallel track: CUDA, MapReduce
Pipeline-parallel track: StreamIt, CnC
Implicit/domain-specific track: Delite, MATCH

Part 2: Communication, Synchronization, and the Memory Hierarchy
Date Topic Readings Notes
Wed Mar 6 Cache coherence Cohesion, Tagless
Additional: Token Coherence, Virtual Tree Coherence, Atomic Coherence, SCD, Relaxed Scoreboard
Mon Mar 11 Consistency models Shared Memory Consistency Models: A Tutorial, BulkSC
Background: Is SC + ILP = RC?
Additional: InvisiFence, DeNovo, Radish
HW1 due
Wed Mar 13 Advanced multicore caching R-NUCA, SHiP
Additional: NUCA, Managing Wire Delay, ASR, Cooperative Caching, DSR; UCP, Vantage; DIP, TA-DIP, RRIP; Feedback-directed prefetching, Friendly fire
Mon Mar 18 Main memory TCM, PCM
Additional: Virtual Write Queue, Fundamental Latency Trade-offs in Architecting DRAM Caches, Fairness via Source Throttling
Project proposal due
Wed Mar 20 Fine-grain communication and synchronization Synchronization and Communication in the T3E Multiprocessor, Scaling to the End of Silicon with EDGE Architectures
Additional: UDM, ECMon, RAW, TRIPS Evaluation, Mapping Dataflow
Mon Mar 25 Spring break - No meeting
Wed Mar 27 Spring break - No meeting
Mon Apr 1 Thread-Level Speculation and Transactional Memory LogTM: Log-based Transactional Memory, Tradeoffs in Transactional Memory Virtualization
Additional: Speculative Lock Elision, Speculative Synchronization, TCC, Bulk, ScalableBulk, TSX

Part 3: Specialized and Heterogeneous Computing
Date Topic Readings Notes
Wed Apr 3 Introduction to heterogeneous computing Understanding sources of inefficiency in general-purpose chips
Mon Apr 8 Vector processors and GPUs GPUs: Warp scheduling/RF, Dynamic Warp Formation
Background: How GPUs work
Additional: ViRAM, Vector-Thread architecture, Maven
Wed Apr 10 Specialized compute units QsCores
Additional: GreenDroid, Conservation Cores, Single-ISA heterogeneous architectures
Wed Apr 17 Fine-grain reconfigurable computing: FPGAs BORPH, Latency-insensitive multi-FPGA design
Background: Virtex-5
Additional: Co-RAM, Tabula
Project progress report due
Mon Apr 22 No meeting
Wed Apr 24 No meeting
Mon Apr 29 Coarse-grain reconfigurable computing DySER, Triggered Instructions
Additional: Garp, Chimæra
Wed May 1 Domain-specific and single-purpose architectures Anton
Additional: Sonic Millip3De

Part 4: Cross-cutting Issues and Project Presentations
Date Topic Readings Notes
Mon May 6 Reliability Razor, Redundant multithreading alternatives
Wed May 8 VLSI trends: 3D integration, post-CMOS, nanophotonics Corona
Benchmarking Beyond-CMOS Devices
Additional: Emerging Memories
Mon May 13 Conference-style Project Presentations - Part 1
Wed May 15 Conference-style Project Presentations - Part 2 Project final report due