__ __ __ Y = S0*S1*D0 + S0*S1*D1 + S1*D2Use a dataflow style for your code (ie, use "assign" to set the values of your signals).
module mux31_1(d0,d1,d2,s0,s1,y); input d0,d1,d2,s0,s1; output y; assign y = (~s0 & ~s1 & d0) | (s0 & ~s1 & d1) | (s1 & d2); endmodule
module mux31_2(d0,d1,d2,s0,s1,y); input d0,d1,d2,s0,s1; output y; assign y = s1 ? d2 : (s0 ? d1 : d0); endmodule
module mux31_3(d0,d1,d2,s0,s1,y);
  input d0,d1,d2,s0,s1;
  output y;
  // add reg declarations for variables assigned inside of
  // the behavioral block below
  reg y;
  always @ (d0 or d1 or d2 or s0 or s1)
  begin
    case ({s1,s0})
     2'b00: y <= d0;
     2'b01: y <= d1;
     2'b10, 2'b11: y <= d2;
    endcase
  end
endmodule
module fulladder(a,b,cin,sum,cout);
  input a,b,cin;
  output sum,cout;
  reg sum,cout;
  always @ (a or b or cin)
  begin
    sum <= a ^ b ^ cin;
    cout <= (a & b) | (a & cin) | (b & cin);
  end
endmodule
When writing behavioral implementations of combinational logic
all the signals appearing on the right-hand side of assignment
statements (ie, signals whose values are used in computations)
should appear in the sensitivity list.
    The omission of a variable from the sensitivity list doesn't usually affect what logic is synthesized for the ALWAYS block. Of course, the logic gates will propagate changes in cin to sum and cout.
What's troublesome is that the simulation results are different from the results produced by the actual logic. This isn't so bad when the simulation results are discovered to be wrong and the source code gets fixed. It's much worse when the simulation gives the desired answer, but the synthesized logic (which may have already been delivered to a customer) does the wrong thing, ie, the incorrect simulation masked a bug in logic equations. Nowadays, many synthesizers give warnings when synthesizing combinational logic using values not declared in the sensitivity list of an ALWYS block.
module parity(in,p); parameter N = 8; // width of vector IN, default value is arbitrary input [N-1:0] in; output p; assign p = ^in; // using XOR as a reduction operator endmodule
parity #(16) xxx(DATA[15:0],p); // xxx is the name of this instance
Some Verilog synthesizers (eg, MAX+plus II, the one we use) don't allow the syntax used above for setting module parameters. One can also write
parity xxx(DATA[15:0],p); defparam xxx.N=16; // in MAX+plus II, this has to follow the instantiation of xxx

module adder8_1(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  // Verilog knows that adding two N-bit numbers produces a (N+1)-bit
  // result.  By concatenating cout and sum to form a 9-bit destination
  // for the add, we'll capture that high-order result bit.
  assign {cout,sum} = a + b + cin;
endmodule
P = A ^ B (^ is the xor operator) G = A*B SUM = P ^ CIN COUT = G + P*CINThe advantage of this formulation is that P and G can be computed in parallel for all the bits of the adder since they don't depend on CIN. This will shorten the delay between CIN and COUT. Write a Verilog module for the 8-bit adder that uses the P and G to compute SUM and COUT. Again use a dataflow style for your code.
module adder8_2(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  wire [7:0] p,g,c;   // P, G and COUT for each bit
  // all the lines below involve 8-bit wide operations
  assign p = a ^ b;
  assign g = a & b;
  assign c = g | (p & {c[6:0],cin});  // ripple carry!
  assign sum = p ^ {c[6:0],cin};
  assign cout = c[7];
endmodule
module adder8_3(a,b,cin,sum,cout);
  input [7:0] a,b;
  input cin;
  output [7:0] sum;
  output cout;
  // provide reg declarations for signals we'll be assigning
  // inside of the ALWAYS block below
  reg [7:0] sum;
  reg cout;
  reg c,p,g;
  integer i;    // index for for loop
  always @ (a or b or cin)     // we're sensitive to input changes
  begin
    c = cin;                   // carry-in to first stage
    // even though we appear to be reusing the c, p and g signals
    // each time through the loop, the synthesizer will realize it
    // needs to make local copies for each loop iteration.
    for (i = 0; i < 8; i = i + 1)
      begin
        p = a[i] ^ b[i];
        g = a[i] & b[i];
        sum[i] = p ^ c;
        c = g | (p & c);       // carry-in for next stage
      end
    cout = c;                  // final carry out
  end
endmodule
We can improve tPD by reworking the carry logic to use a "lookahead" strategy -- we'll use the P and G signals for individual nodes to compute P and G signals for pairs of nodes. We can apply this strategy again, computing P and G for groups of 4 nodes, and so on. We're building a tree of these lookahead modules; each node in the tree has the same logic. Once CIN for the adder is available it is fed to the top node of the lookahead tree, where it's combined with P and G of the children to produce carry-ins for the children nodes. This process continues until carry-in arrives at the leaf nodes where it can be used to compute SUM. For an N-bit adder, the depth of the lookahead tree is log2(N). It takes two traversals of the lookahead tree to (Ps and Gs up the tree, CINs down the tree) in order to finish the computation of SUM -- a tPD that's proportional to log2(N).
The following figure (from Hennessy and Patterson) shows what we have in mind.
 
    P = A ^ B G = A * B SUM = P ^ CINWrite a Verilog implementation of the A module.
module A(a,b,cin,p,g,sum); input a,b,cin; output p,g,sum; assign p = a ^ b; assign g = a & b; assign sum = p ^ cin; endmodule
POUT = P1*P2 GOUT = G2 + P2*G1 C1 = CIN C2 = G1 + P1*CINWrite a Verilog implementation of the B module.
module B(p1,g1,p2,g2,cin,pout,gout,c1,c2);
  input p1,g1,p2,g2,cin;
  output pout,gout,c1,c2;
  // just for variety try a behavioral description
  reg pout,gout,c1,c2;
  always @ (p1 or g1 or p2 or g2 or cin)
  begin
    pout <= p1 & p2;
    gout <= g2 | (p2 & g1);
    c1 <= cin;
    c2 <= g1 | (p1 & cin);
  end
endmodule
module adder8_4(a,b,cin,sum,cout); input [7:0] a,b; input cin; output [7:0] sum; output cout; // 8 A modules wire [7:0] p,g,c; A a0(a[0],b[0],c[0],p[0],g[0],sum[0]); A a1(a[1],b[1],c[1],p[1],g[1],sum[1]); A a2(a[2],b[2],c[2],p[2],g[2],sum[2]); A a3(a[3],b[3],c[3],p[3],g[3],sum[3]); A a4(a[4],b[4],c[4],p[4],g[4],sum[4]); A a5(a[5],b[5],c[5],p[5],g[5],sum[5]); A a6(a[6],b[6],c[6],p[6],g[6],sum[6]); A a7(a[7],b[7],c[7],p[7],g[7],sum[7]); // first layer of B modules wire [3:0] pp,gg,cc; B b_0(p[0],g[0],p[1],g[1],cc[0],pp[0],gg[0],c[0],c[1]); B b_1(p[2],g[2],p[3],g[3],cc[1],pp[1],gg[1],c[2],c[3]); B b_2(p[4],g[4],p[5],g[5],cc[2],pp[2],gg[2],c[4],c[5]); B b_3(p[6],g[6],p[7],g[7],cc[3],pp[3],gg[3],c[6],c[7]); // second layer of B modules wire [1:0] ppp,ggg,ccc; B bb_0(pp[0],gg[0],pp[1],gg[1],ccc[0],ppp[0],ggg[0],cc[0],cc[1]); B bb_1(pp[2],gg[2],pp[3],gg[3],ccc[1],ppp[1],ggg[1],cc[2],cc[3]); // final layer of B modules wire pppp,gggg; B bbb(ppp[0],ggg[0],ppp[1],ggg[1],cin,pppp,gggg,ccc[0],ccc[1]); // compute final carry out assign cout = gggg + (pppp & cin); endmodule