\documentclass[12pt,twoside]{article}
\usepackage{amsmath}
\usepackage{latexsym}
\usepackage{times,url}

\newcommand{\profs}{Professors Ron Rivest and Srini Devadas}
\newcommand{\subj}{6.006}

\newlength{\toppush}
\setlength{\toppush}{2\headheight}
\addtolength{\toppush}{\headsep}

\newcommand{\htitle}[3]{\noindent\vspace*{-\toppush}\newline\parbox{6.5in}
{\textit{Introduction to Algorithms: 6.006}\hfill\name\newline
Massachusetts Institute of Technology \hfill #3\newline
\profs\hfill Handout #1\vspace*{-.5ex}\newline
\mbox{}\hrulefill\mbox{}}\vspace*{1ex}\mbox{}\newline
\begin{center}{\Large\bf #2}\end{center}}

\newcommand{\handout}[3]{\thispagestyle{empty}
 \markboth{Handout #1: #2}{Handout #1: #2}
 \pagestyle{myheadings}\htitle{#1}{#2}{#3}}

\setlength{\oddsidemargin}{0pt}
\setlength{\evensidemargin}{0pt}
\setlength{\textwidth}{6.5in}
\setlength{\topmargin}{0in}
\setlength{\textheight}{8.5in}

\newcommand{\name}{INSERT NAME HERE}


\begin{document}


\handout{4}{Problem Set 2}{September 25, 2007}
\setlength{\parindent}{0pt}

\newcommand{\solution}{
  \medskip
  {\bf Solution:}
}

\newcommand{\collaborators}[1]{
  \qquad {\bf Collaborators:} #1
}

This problem set is due {\bf October 11} at {\bf 11:59PM}. 

Solutions should be turned in through the course website in PS or PDF
form using LaTeX. The course website has links to a number of editors
that are useful for writing in LaTeX. 

It is recommended that you download the LaTeX source for this problem
set which includes placeholders for solutions.

\begin{enumerate}

\item Rotating Binary Search Trees
  \collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.

  In this problem we'll explore the rotation operation on binary search
  trees. As discussed in class, this operation changes the structure of
  a binary search tree without affecting the inorder of the underlying
  nodes.  See CLR section 14.2 (page 265-7) for a description of the
  operation.
  
  \begin{enumerate}
  \item Let $l(v)$ denote the number of nodes in $v$'s left subtree.
    Let $L(T)$ be the sum of $l(v)$ over all nodes of the tree $T$. Show
    that a
    right rotation decreases $L(T)$. Deduce that it is impossible to do
    more than $O(n^2)$ consecutive right rotations in an $n$ node tree, 
    i.e., with NO left rotations mixed in.
    
    \solution INSERT ANSWER HERE
    
  \item Show that on an $n$ node left path (a tree where all children
    are left children) it is possible to do $\Omega(n^2)$ consecutive right
    rotations. (Again, no left rotations allowed.)
    
    \solution INSERT ANSWER HERE
    
  \end{enumerate}
  
\item How Good is this Hash?
  \collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.

  The following hash functions are used to hash strings of
  characters. Describe at least one strength and one weakness of each
  of the hash functions. (Don't try to do a rigorous mathematical
  analysis of the properties of the functions.)
  
  \begin{enumerate}

  \item Direct addressing, using the first two bytes of the hash key as
    an address into a 65,536-entry hashtable.  (Assume all keys are at
    least two bytes long.)

    \solution INSERT ANSWER HERE
    
  \item Add the values of all bytes in the hash key into a single-byte
    counter, ignoring overflow.  Use the result as an index into a
    256-element hash table.

    \solution INSERT ANSWER HERE
    
  \item Start with a fixed prime number constant $p_0$ and a second
    small number $s$.  Set $p := p_0$.  For each byte $b_i$ in the hash
    key, set $p := p + (p << s) + b_i$\,, where $p << s$ means ``shift $p$
    left by $s$ bit positions''.  Mod the final value of $p$ by a prime
    number to produce the hash code.  (Just describe what could be good or
    bad about this scheme depending on the values of $p_0$ and $s$.)

    \solution INSERT ANSWER HERE
    
  \end{enumerate}

\item Longest Common Substring
  \collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.
  
  Humans have 23 pairs of chromosomes, while other primates like
  chimpanzees have 24 pairs. Biologists claim that human chromosome
  \#2 is a fusion of two primate chromosomes that they call 2a and
  2b. We wish to verify this claim by locating long nucleotide chains
  shared between the human and primate chromosomes.

  We define the \emph{longest common substring} of two strings to be
  the longest contiguous string that is a substring of both strings
  e.g. the longest common substring of DEADBEEF and EA7BEEF is
  BEEF. \footnote{Do not confuse this with the \emph{longest common
  subsequence}, in which the characters do not need to be
  contiguous. The longest common subsequence of DEADBEEF and EA7BEEF
  is EABEEF.} If there is a tie for longest common substring, we just
  want to find one of them.
  
  \begin{enumerate}
  \item Ben Bitdiddle wrote a python program to find the longest
    common substring of two strings. What is the asymptotic running time
    of his code? Assume $|s| = |t| = n$.

    \begin{verbatim}
def longest_substring(s, t):
  "Finds the longest substring that occurs in both s and t"
  best = ''
  for s_start in range(0, len(s)+1):
    for s_end in range(s_start, len(s)+1):
      for t_start in range(0, len(t)+1):
        for t_end in range(t_start, len(t)+1):
          if(s[s_start:s_end] == t[t_start:t_end]):
            current = s[s_start:s_end]
            if(len(current) > len(best)):
              best = current
  return best
    \end{verbatim}

    \solution INSERT ANSWER HERE
    
  \item Describe a simple algorithm that finds the longest common
    substring in $O(n^3)$ time.
    
    \solution INSERT ANSWER HERE
    
  \item Describe a simple algorithm that finds the longest common
    substring in $O(n^2 \log n)$ time.
    
    \solution INSERT ANSWER HERE
    
  \item Describe an algorithm that finds the longest common
    substring in $O(n \log n)$ time.
    
    \solution INSERT ANSWER HERE
    
  \item Implement your algorithm from part (d). Be sure to thoroughly
    comment your code so the course staff can read it. Some simple test
    cases are available for download on the class website.
    
    The human chromosome 2 and the chimp chromosomes 2a and 2b are quite
    large (over 100,000,000 nucleotides each) so we took the first and
    last million nucleotides of each chromosome and put them in separate
    files. These files are available on the class website, and also in
    /mit/6.006/dna/
    
    \texttt{chr2\_first\_1000000} contains the first million nucleotides
    of human chromosome 2, and \texttt{chr2a\_first\_1000000} contains the
    first million nucleotides of chimpanzee chromosome 2a. Note: these
    files contain both uppercase and lowercase letters that are used by
    biologists to distinguish between parts of the chromosomes called
    introns and extrons.

    \begin{enumerate}
    \item How long is the longest common substring of \\
      \texttt{chr2\_first\_1000000} and \texttt{chr2a\_first\_1000000}?
      
      \solution INSERT ANSWER HERE
    
    \item How long is the longest common substring of \\
      \texttt{chr2\_first\_1000000} and \texttt{chr2b\_first\_1000000}?
      
      \solution INSERT ANSWER HERE
    
    \item How long is the longest common substring of \\
      \texttt{chr2\_last\_1000000} and \texttt{chr2a\_last\_1000000}?
      
      \solution INSERT ANSWER HERE
    
    \item How long is the longest common substring of \\
      \texttt{chr2\_last\_1000000} and \texttt{chr2b\_last\_1000000}?

      \solution INSERT ANSWER HERE
    
    \end{enumerate}
  
    When you are finished, submit your code through the class website.
    
  \item {\bf Optional:} Make your algorithm run in $O(n \log k)$ time, where $k$ is the
    length of the longest common substring. 

  \item {\bf Optional:} Run your algorithm on the entire chromosomes. They
    are available in /mit/6.006/dna in compressed form.

  \end{enumerate}

\end{enumerate}

\end{document}
