\documentclass[12pt,twoside]{article}
\usepackage{amsmath}
\usepackage{latexsym}
\usepackage{times,url}
\newcommand{\profs}{Professors Ron Rivest and Srini Devadas}
\newcommand{\subj}{6.006}
\newlength{\toppush}
\setlength{\toppush}{2\headheight}
\addtolength{\toppush}{\headsep}
\newcommand{\htitle}[3]{\noindent\vspace*{-\toppush}\newline\parbox{6.5in}
{\textit{Introduction to Algorithms: 6.006}\hfill\name\newline
Massachusetts Institute of Technology \hfill #3\newline
\profs\hfill Handout #1\vspace*{-.5ex}\newline
\mbox{}\hrulefill\mbox{}}\vspace*{1ex}\mbox{}\newline
\begin{center}{\Large\bf #2}\end{center}}
\newcommand{\handout}[3]{\thispagestyle{empty}
\markboth{Handout #1: #2}{Handout #1: #2}
\pagestyle{myheadings}\htitle{#1}{#2}{#3}}
\setlength{\oddsidemargin}{0pt}
\setlength{\evensidemargin}{0pt}
\setlength{\textwidth}{6.5in}
\setlength{\topmargin}{0in}
\setlength{\textheight}{8.5in}
\newcommand{\name}{INSERT NAME HERE}
\begin{document}
\handout{4}{Problem Set 2}{September 25, 2007}
\setlength{\parindent}{0pt}
\newcommand{\solution}{
\medskip
{\bf Solution:}
}
\newcommand{\collaborators}[1]{
\qquad {\bf Collaborators:} #1
}
This problem set is due {\bf October 11} at {\bf 11:59PM}.
Solutions should be turned in through the course website in PS or PDF
form using LaTeX. The course website has links to a number of editors
that are useful for writing in LaTeX.
It is recommended that you download the LaTeX source for this problem
set which includes placeholders for solutions.
\begin{enumerate}
\item Rotating Binary Search Trees
\collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.
In this problem we'll explore the rotation operation on binary search
trees. As discussed in class, this operation changes the structure of
a binary search tree without affecting the inorder of the underlying
nodes. See CLR section 14.2 (page 265-7) for a description of the
operation.
\begin{enumerate}
\item Let $l(v)$ denote the number of nodes in $v$'s left subtree.
Let $L(T)$ be the sum of $l(v)$ over all nodes of the tree $T$. Show
that a
right rotation decreases $L(T)$. Deduce that it is impossible to do
more than $O(n^2)$ consecutive right rotations in an $n$ node tree,
i.e., with NO left rotations mixed in.
\solution INSERT ANSWER HERE
\item Show that on an $n$ node left path (a tree where all children
are left children) it is possible to do $\Omega(n^2)$ consecutive right
rotations. (Again, no left rotations allowed.)
\solution INSERT ANSWER HERE
\end{enumerate}
\item How Good is this Hash?
\collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.
The following hash functions are used to hash strings of
characters. Describe at least one strength and one weakness of each
of the hash functions. (Don't try to do a rigorous mathematical
analysis of the properties of the functions.)
\begin{enumerate}
\item Direct addressing, using the first two bytes of the hash key as
an address into a 65,536-entry hashtable. (Assume all keys are at
least two bytes long.)
\solution INSERT ANSWER HERE
\item Add the values of all bytes in the hash key into a single-byte
counter, ignoring overflow. Use the result as an index into a
256-element hash table.
\solution INSERT ANSWER HERE
\item Start with a fixed prime number constant $p_0$ and a second
small number $s$. Set $p := p_0$. For each byte $b_i$ in the hash
key, set $p := p + (p << s) + b_i$\,, where $p << s$ means ``shift $p$
left by $s$ bit positions''. Mod the final value of $p$ by a prime
number to produce the hash code. (Just describe what could be good or
bad about this scheme depending on the values of $p_0$ and $s$.)
\solution INSERT ANSWER HERE
\end{enumerate}
\item Longest Common Substring
\collaborators{LIST COLLABORATORS HERE} %Write `none' if solved alone.
Humans have 23 pairs of chromosomes, while other primates like
chimpanzees have 24 pairs. Biologists claim that human chromosome
\#2 is a fusion of two primate chromosomes that they call 2a and
2b. We wish to verify this claim by locating long nucleotide chains
shared between the human and primate chromosomes.
We define the \emph{longest common substring} of two strings to be
the longest contiguous string that is a substring of both strings
e.g. the longest common substring of DEADBEEF and EA7BEEF is
BEEF. \footnote{Do not confuse this with the \emph{longest common
subsequence}, in which the characters do not need to be
contiguous. The longest common subsequence of DEADBEEF and EA7BEEF
is EABEEF.} If there is a tie for longest common substring, we just
want to find one of them.
\begin{enumerate}
\item Ben Bitdiddle wrote a python program to find the longest
common substring of two strings. What is the asymptotic running time
of his code? Assume $|s| = |t| = n$.
\begin{verbatim}
def longest_substring(s, t):
"Finds the longest substring that occurs in both s and t"
best = ''
for s_start in range(0, len(s)+1):
for s_end in range(s_start, len(s)+1):
for t_start in range(0, len(t)+1):
for t_end in range(t_start, len(t)+1):
if(s[s_start:s_end] == t[t_start:t_end]):
current = s[s_start:s_end]
if(len(current) > len(best)):
best = current
return best
\end{verbatim}
\solution INSERT ANSWER HERE
\item Describe a simple algorithm that finds the longest common
substring in $O(n^3)$ time.
\solution INSERT ANSWER HERE
\item Describe a simple algorithm that finds the longest common
substring in $O(n^2 \log n)$ time.
\solution INSERT ANSWER HERE
\item Describe an algorithm that finds the longest common
substring in $O(n \log n)$ time.
\solution INSERT ANSWER HERE
\item Implement your algorithm from part (d). Be sure to thoroughly
comment your code so the course staff can read it. Some simple test
cases are available for download on the class website.
The human chromosome 2 and the chimp chromosomes 2a and 2b are quite
large (over 100,000,000 nucleotides each) so we took the first and
last million nucleotides of each chromosome and put them in separate
files. These files are available on the class website, and also in
/mit/6.006/dna/
\texttt{chr2\_first\_1000000} contains the first million nucleotides
of human chromosome 2, and \texttt{chr2a\_first\_1000000} contains the
first million nucleotides of chimpanzee chromosome 2a. Note: these
files contain both uppercase and lowercase letters that are used by
biologists to distinguish between parts of the chromosomes called
introns and extrons.
\begin{enumerate}
\item How long is the longest common substring of \\
\texttt{chr2\_first\_1000000} and \texttt{chr2a\_first\_1000000}?
\solution INSERT ANSWER HERE
\item How long is the longest common substring of \\
\texttt{chr2\_first\_1000000} and \texttt{chr2b\_first\_1000000}?
\solution INSERT ANSWER HERE
\item How long is the longest common substring of \\
\texttt{chr2\_last\_1000000} and \texttt{chr2a\_last\_1000000}?
\solution INSERT ANSWER HERE
\item How long is the longest common substring of \\
\texttt{chr2\_last\_1000000} and \texttt{chr2b\_last\_1000000}?
\solution INSERT ANSWER HERE
\end{enumerate}
When you are finished, submit your code through the class website.
\item {\bf Optional:} Make your algorithm run in $O(n \log k)$ time, where $k$ is the
length of the longest common substring.
\item {\bf Optional:} Run your algorithm on the entire chromosomes. They
are available in /mit/6.006/dna in compressed form.
\end{enumerate}
\end{enumerate}
\end{document}