# 6.111 Lecture 13

#### Today: <u>Arithmetic: Multiplication</u>

Simple multiplication
 Twos complement mult.
 Speed: CSA & Pipelining
 Booth recoding



#### **5.Behavioral transformations:** Fixed-coef. mult., Canonical Signed Digits, Retiming

Acknowledgements:

- R. Katz, "Contemporary Logic Design", Addison Wesley Publishing Company, Reading, MA, 1993. (Chapter 5)
- J. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits: A Design Perspective" Prentice Hall, 2003.
- Kevin Atkinson, Alice Wang, Rex Min



Easy part: forming partial products (just an AND gate since  $B_I$  is either 0 or 1) Hard part: adding M N-bit partial products

#### Sequential Multiplier

Assume the multiplicand (A) has N bits and the multiplier (B) has M bits. If we only want to invest in a single N-bit adder, we can build a sequential circuit that processes a single partial product at a time and then cycle the circuit M times:



#### **Combinational Multiplier**



# 2's Complement Multiplication (Baugh-Wooley)

Step 1: two's complement operands so high order bit is  $-2^{N-1}$ . Must sign extend partial products and subtract the last one

|   |        |              |              | •            | X3<br>* Y3   |                      | X1<br>Y1 | X0<br>Y0 |
|---|--------|--------------|--------------|--------------|--------------|----------------------|----------|----------|
| + |        | X3Y1<br>X3Y2 | X3Y1<br>X3Y2 | X3Y1<br>X2Y2 | X2Y1<br>X1Y2 | X2Y0<br>X1Y1<br>X0Y2 |          | X0Y0     |
|   | <br>Z7 | Z6           | <br>Z5       | z4           | z3           | <br>Z2               | z1       | z0       |

Step 2: don't want all those extra additions, so add a carefully chosen constant, remembering to subtract it at the end. Convert subtraction into add of (complement + 1).



Step 3: add the ones to the partial products and propagate the carries. All the sign extension bits go away!

|   |      |              |              | <b>X3Y</b> 0 | X2Y0 | X1Y0 | X0Y0 |
|---|------|--------------|--------------|--------------|------|------|------|
| + |      |              | <b>X3Y1</b>  | X2Y1         | X1Y1 | X0Y1 |      |
| + |      | <b>X2Y2</b>  | X1Y2         | X0Y2         |      |      |      |
| + | X3Y3 | <u>x2</u> y3 | <u>x1</u> y3 | <u>x0</u> y3 |      |      |      |
| + |      |              |              | 1            |      |      |      |
| - | 1    | 1            | 1            | 1            |      |      |      |

Step 4: finish computing the constants...

|   |      |              |              | <u>x3</u> 20 | X2Y0 | X1Y0 | X0Y0 |
|---|------|--------------|--------------|--------------|------|------|------|
| + |      |              | <u>x3</u> 1  | X2Y1         | X1Y1 | X0Y1 |      |
| + |      | <u>x2</u> 22 | X1Y2         | X0Y2         |      |      |      |
| + | X3Y3 | <u>x2</u> y3 | <u>x1</u> y3 | <u>x0</u> y3 |      |      |      |
| + | 1    |              | 1            |              |      |      |      |

**Result: multiplying 2's complement operands** takes just about same amount of hardware as multiplying unsigned operands!

6.111 Fall 2006

#### 2's Complement Multiplication



## **Multiplication in Verilog**

You can use the "\*" operator to multiply two numbers:

```
wire [9:0] a,b;
wire [19:0] result = a*b; // unsigned multiplication!
```

If you want Verilog to treat your operands as signed two's complement numbers, add the keyword signed to your wire or reg declaration:

```
wire signed [9:0] a,b;
wire signed [19:0] result = a*b; // signed multiplication!
```

Remember: unlike addition and subtraction, you need different circuitry if your multiplication operands are signed vs. unsigned. Same is true of the >>> (arithmetic right shift) operator. To get signed operations all operands must be signed.

To make a signed constant: 10'sh37C

#### Multipliers in the Virtex II

The Virtex FGPA has hardware multiplier circuits:



Combinatorial and Registered Multiplier Primitives

Note that the operands are signed 18-bit numbers.

The ISE tools will often use these hardware multipliers when you use the "\*" operator in Verilog. Or can you instantiate them directly yourself:

```
wire signed [17:0] a,b;
wire signed [35:0] result;
```

```
MULT18X18 mymult(.A(a),.B(b),.P(result));
```

#### 3. Faster Multipliers: Carry-Save Adder



6.111 Fall 2006

## Increasing Throughput: Pipelining



Throughput = 1 result per clock cycle (period is now  $4*t_{PD,FA}$  instead of  $8*t_{PD,FA}$ )

## Wallace Tree Multiplier

This is called a 3:2 counter by multiplier hackers: counts number of 1's on the 3 inputs, outputs 2bit result.

#### Wallace Tree: Combine groups of three bits at a time

Higher fan-in adders can be used to further reduce delays for large M.

> 4:2 compressors and 5:3 counters are popular building blocks.

CSA



 $O(\log_{1.5}M)$ 

#### 4. Booth Recoding: Higher-radix mult.

Idea: If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and halve the latency of the multiplier!



## Booth recoding



## 5. Behavioral Transformations

- There are a large number of implementations of the same functionality
- These implementations present a different point in the area-time-power design space
- Behavioral transformations allow exploring the design space a high-level

#### **Optimization metrics:**

- 1. Area of the design
- 2. Throughput or sample time  $T_s$
- 3. Latency: clock cycles between the input and associated output change
- 4. Power consumption
- 5. Energy of executing a task



6. ...

#### **Fixed-Coefficient Multiplication**





# Transform: Canonical Signed Digits (CSD)

Canonical signed digit representation is used to increase the number of zeros. It uses digits {-1, 0, 1} instead of only {0, 1}.



Worst case CSD has 50% non zero bits



#### Algebraic Transformations



**Transforms for Efficient Resource Utilization** 



## Retiming: A very useful transform

Retiming is the action of moving delay around in the systems
Delays have to be moved from ALL inputs to ALL outputs or vice versa



Cutset retiming: A cutset intersects the edges, such that this would result in two disjoint partitions of these edges being cut. To retime, delays are moved from the ingoing to the outgoing edges or vice versa.





Retiming Synchronous Circuitry Charles E. Leiserson and James B. Saxe

#### 6.111 Fall 2006

#### Pipelining, Just Another Transformation (Pipelining = Adding Delays + Retiming)



## The Power of Transforms: Lookahead



6.111 Fall 2006

#### Booth recoding:

6.111 Fall 2006

- O(N) delay

Simple multiplication:

Faster multipliers:

– Add using 2 bits at a time

- Behavioral Transformations:
  - Faster circuits using pipelining and algebraic properties

- Wallace Tree O(log N)

- Twos complement easily handled (Baugh-Wooley)





