

## Arithmetic Circuits

- Number representations
- Addition, subtraction
- Performance issues
-- ripple carry
-- carry bypass
-- carry skip
-- carry lookahead
Lab \#3 due Thursday, report next Tuesday, no LPSets this week


## Encoding numbers

It is straightforward to encode positive integers as a sequence of bits.
Each bit is assigned a weight. Ordered from right to left, these weights are increasing powers of 2 . The value of an $n$-bit number encoded in this fashion is given by the following formula:

Oftentimes we will
find it convenient to cluster groups of bits together for a more compact notation. Two popular groupings are clusters of 3 bits and 4 bits.


## Binary Representation of Numbers

## How to represent negative numbers?

- Three common schemes:
- sign-magnitude, ones complement, twos complement
- Sign-magnitude: $M S B=0$ for positive, 1 for negative
- Range: - $\left(2^{\mathrm{N}-1}-1\right)$ to $+\left(2^{\mathrm{N}-1}-1\right)$
- Two representations for zero: 0000... \& 1000...
- Simple multiplication but complicated addition/subtraction
- Ones complement: if $N$ is positive then its negative is $\bar{N}$
- Example: $0111=7,1000=-7$
- Range: -( $\left.2^{\mathrm{N}-1}-1\right)$ to +(2 $\left.2^{\mathrm{N}-1}-1\right)$
- Two representations for zero: 0000... \& 1111...
- Subtraction is addition followed by ones complement


## Representing negative integers

To keep our arithmetic circuits simple, we'd like to find a representation for negative numbers so that we can use a single operation (binary addition) when we wish to find the sum of two integers, independent of whether they are positive are negative.

We certainly want $A+(-A)=0$. Consider the following 8-bit binary addition where we only keep 8 bits of the result:

$$
\begin{array}{r}
11111111 \\
+\quad 00000001 \\
\hline 00000000
\end{array}
$$

which implies that the 8 -bit representation of -1 is 11111111 . More generally


## Signed integers: 2's complement



8-bit 2 's complement example:

$$
11010110=-2^{7}+2^{6}+2^{4}+2^{2}+2^{1}=-128+64+16+4+2=-42
$$

If we use a two's complement representation for signed integers, the same binary addition mod $2^{n}$ procedure will work for adding positive and negative numbers (don't need separate subtraction rules). The same procedure will also handle unsigned numbers!

By moving the implicit location of "decimal" point, we can represent fractions too:

$$
1101.0110=-2^{3}+2^{2}+2^{0}+2^{-2}+2^{-3}=-8+4+1+0.25+0.125=-2.625
$$

## Sign extension

Consider the 8-bit 2's complement representation of:

$$
\begin{aligned}
42=00101010-5 & =\sim 00000101+1 \\
& =11111010+1 \\
& =11111011
\end{aligned}
$$

What is their 16-bit 2's complement representation?

$$
\begin{aligned}
& 42=0000000000101010
\end{aligned}
$$

## Adder: a circuit that does addition

Here's an example of binary addition as one might do it by "hand":

Adding two N -bit

we can quickly build a circuit two add two 4-bit numbers...


## "Full Adder" building block

The "half adder"

circuit has only the $A$ and $B$ inputs


| A | B | C | S | CO |
| :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

$S=A \oplus B \oplus C$
$C O=\bar{A} B C+A \bar{B} C+A B \bar{C}+A B C$
$=(\bar{A}+A) B C+\overline{(B}+B) A C+A B \bar{C}+C)$
$=B C+A C+A B$

## Subtraction: $A-B=A+(-B)$

Using 2's complement representation: $-B=\sim B+1$

$$
\sim=\text { bit-wise complement }
$$



So let's build an arithmetic unit that does both addition and subtraction. Operation selected by control input:


## Condition Codes

Besides the sum, one often wants four other bits of information from an arithmetic unit:

Z (zero): result is $=0$
$N$ (negative): result is < 0
$C$ (carry): indicates an add in the most significant position produced a carry, e.g., $1111+0001 \quad$ from last FA
$\checkmark$ (overflow): indicates that the answer has too many bits to be represented correctly by the result width, e.g., $0111+0111$

$$
\begin{aligned}
& V=A_{N^{-1}} B_{N^{-1}} \overline{S_{N^{-1}}}+\overline{A_{N^{-}}} \overline{B_{N^{-}}} S_{N^{-1}} \\
& V=\operatorname{COUT}_{N-1} \oplus \text { CIN }_{N-1}
\end{aligned}
$$

To compare $A$ and $B$, perform $A-B$ and use condition codes:

Signed comparison:

| LT | $\mathrm{N} \oplus \mathrm{V}$ |
| :--- | :--- |
| LE | $\mathrm{Z}+(\mathrm{N} \oplus \mathrm{V})$ |
| EQ | Z |
| NE | $\sim \mathrm{Z}$ |
| GE | $\sim(\mathrm{N} \oplus \mathrm{V})$ |
| GT | $\sim(\mathrm{Z}+(\mathrm{N} \oplus \mathrm{V}))$ |

Unsigned comparison:
LTU ~C

LEU ~C+Z
GEU C
GTU $\sim(\sim \mathrm{C}+\mathrm{Z})$

## Condition Codes in Verilog

$Z$ (zero): result is $=0$
$N$ (negative): result is < 0
$C$ (carry): indicates an add in the most significant position produced a carry, e.g., $1111+0001$

V (overflow): indicates that the answer has too many bits to be represented correctly by the result $\dagger$ width, e.g., 0111 + 0111

```
wire [31:0] a,b,s;
wire z,n,v,c;
assign {c,s} = a + b;
assign z = ~\s;
assign n = s[31];
assign v = a[31]^b[31]^s[31]^c;
```

Might be better to use sum-ofproducts formula for $V$ from previous slide if using LUT implementation (only 3 variables instead of 4).

## Modular Arithmetic

The Verilog arithmetic operators (,,+- ) all produce full-precision results, e.g., adding two 8-bit numbers produces a 9 -bit result.

In many designs one chooses a "word size" (many computers use 32 or 64 bits) and all arithmetic results are truncated to that number of bits, i.e., arithmetic is performed modulo $2^{\text {word size }}$.

Using a fixed word size can lead to overflow, e.g., when the operation produces a result that's too large to fit in the word size. One can

- Avoid overflow: choose a sufficiently large word size
- Detect overflow: have the hardware remember if an operation produced an overflow - trap or check status at end
- Embrace overflow: sometimes this is exactly what you want, e.g., when doing index arithmetic for circular buffers of size $2^{\mathrm{N}}$.
- "Correct" overflow: replace result with most positive or most negative number as appropriate, aka saturating arithmetic. Good for digital signal processing.


## Speed: $\dagger_{\text {PD }}$ of Ripple-carry Adder



Worse-case path: carry propagation from LSB to MSB, e.g., when adding 11... 111 to 00... 001 .

$$
t_{P D}=(N-1)^{\star} \underbrace{t_{P D, O R}+t_{P D, A N D}}_{C I \text { to } C O})+\underbrace{t_{P D, X O R}}_{C I_{N-1} \text { to } S_{N-1}} \approx \Theta(N)
$$

$$
\mathbf{t}_{\text {adder }}=(\mathbf{N}-1) \mathbf{t}_{\text {carry }}+\mathbf{t}_{\text {sum }}
$$

$\Theta(N)$ is read "order N": means that the latency of our adder grows at worst in proportion to the number of bits in the operands.

## How about the $t_{P D}$ of this circuit?



Is the $t_{P D}$ of this circuit $=2 * t_{P D, N-B I T ~ R I P P L E}$ ?

Nope! $\dagger_{\text {PD }}$ of this circuit $=\dagger_{\text {PD,N-BIT RIPPLE }}+\dagger_{\text {PD,FA }}$ !!!

## Faster carry logic

Let's see if we can improve the speed by rewriting the equations for $C_{\text {OUT }}$ :


$$
\begin{aligned}
& C_{\text {OUT }}=A B+A C_{\text {IN }}+B C_{\text {IN }} \\
&=A B+(A+B) C_{\text {IN }} \\
&=G+P C_{\text {IN }} \\
& \text { generate propagate }
\end{aligned}
$$

```
module fa(input \(a, b, c i n\), output \(s, c o u t)\);
    wire \(g=a \& b ;\)
    wire \(p=a \wedge b ;\)
    assign s = p ^ cin;
    assign cout \(=g \mid(p \& c i n) ;\)
endmodule
```


where $G=A B$ $P=A+B$

Actually, $P$ is usually defined as $P=A^{\wedge} B$ which won't change $C_{\text {OUT }}$ but will allow us to express $S$ as a simple function:
$S=P^{\wedge} C_{\text {IN }}$

## Virtex II Adder Implementation



## Virtex II Carry Chain



1 CLB $=4$ Slices $=2,4$-bit adders 64-bit Adder: 16 CLBs



CLBs must be in same column

## Carry Bypass Adder



Key Idea: if $\left(P_{0} P_{1} P_{2} P_{3}\right)$ then $C_{0,3}=C_{i, 0}$

## 16-bit Carry Bypass Adder



> What is the worst case propagation delay for the 16-bit adder?

Assume the following for delay each gate:
$P, G$ from $A, B: 1$ delay unit
$P, G, C_{i}$ to $C_{0}$ or Sum for a $C / S: 1$ delay unit
2:1 mux delay: 1 delay unit

## Critical Path Analysis



For the second stage, is the critical path:

$$
B P 2=0 \text { or } B P 2=1 ?
$$

Message: Timing analysis is very tricky Must carefully consider data dependencies for false paths

## Carry Bypass vs Ripple Carry

Ripple Carry: $\quad \dagger_{\text {adder }}=(N-1) \dagger_{\text {carry }}+\dagger_{\text {sum }}$
Carry Bypass: $\dagger_{\text {adder }}=2(M-1) \dagger_{\text {carry }}+\dagger_{\text {sum }}+(N / M-1) \dagger_{\text {bypass }}$


## Carry Lookahead Adder (CLA)

- Recall that $\quad C_{\text {OUT }}=G+P C_{\text {IN }} \quad$ where $G=A \& B$ and $P=A^{\wedge} B$
- For adding two N -bit numbers:

$$
\begin{aligned}
& C_{N}=G_{N-1}+P_{N-1} C_{N-1} \\
&=G_{N-1}+P_{N-1} G_{N-2}+P_{N-1} P_{N-2} C_{N-2} \\
&=\underbrace{}_{\begin{array}{r}
C_{N-1}+\text { in only } 3 \text { gate delays* : } \\
\\
\end{array} \begin{array}{r}
\text { for } P / G \text { generation, } 1 \text { for ANDs, } 1 \text { for final OR } \\
\text { *assuming gates with } N \text { inputs }
\end{array}} \begin{aligned}
G_{N-2}+P_{N-1} P_{N-2} G_{N-3}+\ldots+P_{N-1} \ldots P_{0} C_{I N}
\end{aligned} \\
&
\end{aligned}
$$

- Idea: pre-compute all carry bits as $f\left(G s, P s, C_{\text {IN }}\right)$


## Carry Lookahead Circuits



## The 74182 Carry Lookahead Unit

74182 carry lookahead unit


Active low example:

$$
\begin{aligned}
\mathrm{C}_{\mathrm{n}+\mathrm{x}}= & \overline{\overline{\mathrm{G} 0} \cdot \overline{\mathrm{P} 0}+\overline{\mathrm{G} 0} \cdot \overline{\mathrm{C}_{\mathrm{n}}}} \\
= & \overline{\overline{\mathrm{G} 0} \cdot \overline{\mathrm{P0}}} \cdot \overline{\overline{\mathrm{G} 0} \cdot} \cdot \overline{\mathrm{C}_{\mathrm{n}}} \\
= & (\mathrm{G} 0+\mathrm{P} 0) \cdot\left(\mathrm{G} 0+\mathrm{C}_{\mathrm{n}}\right)=\mathrm{G} 0+\mathrm{P} 0 \mathrm{C}_{\mathrm{n}} \\
>\mathrm{C}_{4}= & \mathrm{G}_{3: 0}+\mathrm{P}_{3: 0} \mathrm{C}_{\mathrm{n}} \\
\mathrm{C}_{\mathrm{n}+\mathrm{y}}= & \mathrm{C}_{8}=\mathrm{G}_{7: 4}+\mathrm{P}_{7: 4} \mathrm{G}_{3: 0}+\mathrm{P}_{7: 4} \mathrm{P}_{3: 0} \mathrm{C}_{\mathrm{i}, 0}=\mathrm{G}_{7: 0}+\mathrm{P}_{7: 0} \mathrm{C}_{\mathrm{n}} \\
\mathrm{C}_{\mathrm{n}+\mathrm{z}}= & \mathrm{C}_{12}=\mathrm{G}_{11: 8}+\mathrm{P}_{11: 8} \mathrm{G}_{7: 4}+\mathrm{P}_{11: 8} \mathrm{P}_{7: 4} \mathrm{G}_{3: 0}+\mathrm{P}_{11: 8} \mathrm{P}_{7: 4} \mathrm{P}_{3: 0} \mathrm{C}_{\mathrm{n}} \\
& =\mathrm{G}_{11: 0}+\mathrm{P}_{1: 0} \mathrm{C}_{\mathrm{n}}
\end{aligned}
$$

## Block Generate and Propagate

$G$ and $P$ can be computed for groups of bits (instead of just for individual bits). This allows us to choose the maximum fan-in we want for our logic gates and then build a hierarchical carry chain using these equations:

$$
\begin{aligned}
& C_{\mathrm{J}+1}=G_{\mathrm{IJ}}+P_{\mathrm{IJ}} C_{\mathrm{I}} \\
& G_{\mathrm{IK}}=G_{\mathrm{J}+1, \mathrm{~K}}+\mathrm{P}_{\mathrm{J}+1, \mathrm{~K}} G_{\mathrm{IJ}} \\
& P_{\mathrm{IK}}=P_{\mathrm{IJ}} P_{\mathrm{J}+1, \mathrm{~K}}
\end{aligned}
$$


where $I<J$ and $J+1<K$


Hierarchical building block

## 8-bit CLA (P/G generation)



## 8-bit CLA (carry generation)



## 8-bit CLA (complete)



