6.046 Lecture #20 and #21
-------------------
Today we switch to the new topic of TRACTABLE vs INTRACTABLE problems.
The complexity of all problems studied thus far was polynomial time.
Namely, all problems had algorithms which ran in worst case time T(n)
= O(n^k) for some fixed k>0, where n is as usual the size of the
input. In fact, none of our running times was worse than O(n^3). We
call problems with running time O(n^k) TRACTABLE, or efficiently
solvable, problems.
It may seem to you that when k=100, say, this hardly constitutes an
efficient solution. That may be so, but certainly a problem whose
running time cannot be bounded by O(n^k) for ANY k is
INTRACTABLE.
A natural question is whether all problems are TRACTABLE. The answer
is NO. A famous intractable problem is the HALTING problem: given a
program P and an input x to it, does P(x) ever halt? It turns out we
cannot design an algorithm which, on input P and x, answers this
question. Not in polynomial time, not in exponential time, not in
principle. Why? Go to 6.045.
In this course, we will be interested in problems which are solvable
in principle but may take exponential time O(2^{n^k}) for some k>0
rather than polynomial time O(n^k).
We will restrict our attention to DECISION PROBLEMS. These are
problems for which the output is always either YES or NO.
Formally, a decision problem D is a function from the set of all
strings (all possible inputs) to the set {1,0}. We shall call an
input x to D a yes-input if and only if D(x) = 1, and otherwise we
shall call x a no-input.
Examples of Decision Problems.
PATH PROBLEM
------------
INPUT: Graph G=(V,E), and vertices s,t in V.
QUESTION: Is there a path from s to t in G such that the length of the
path is <= |V| (the number of vertices)?
OUTPUT: YES if such a path exists, and NO otherwise.
Notation: PATH(G,s,t) = 1 if and only if there is a path from s to t
in G whose length is <= |V|.
Clearly, the shortest path algorithms we have learned recently (for
example, breadth-first search) can answer this question quickly, in
O(m+n) time, where n = |V| and m is the number of edges: a shortest
s-t path never repeats a vertex, so if any s-t path exists, one of
length at most |V|-1 exists. Thus, this problem is tractable.
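A minimal sketch of the breadth-first search decider for PATH, assuming vertices are numbered 0..n-1 and edges are given as a list of (u, v) pairs. The function name `path` and this graph encoding are illustrative assumptions, not part of the notes; the graph is treated as undirected unless `directed=True`.

```python
from collections import deque

def path(n, edges, s, t, directed=False):
    """Decide PATH(G, s, t) by breadth-first search in O(m + n) time.

    Hypothetical encoding: vertices are 0..n-1, edges is a list of (u, v).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    # dist[v] = number of edges on a shortest s-v path, or None if unseen
    dist = [None] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    # A shortest path is simple, so it has at most n - 1 edges; the
    # bound "length <= |V|" is therefore met whenever t is reachable.
    return dist[t] is not None and dist[t] <= n
```

For example, `path(4, [(0, 1), (1, 2)], 0, 2)` returns `True`, while `path(4, [(0, 1), (1, 2)], 0, 3)` returns `False` because vertex 3 is isolated.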
More generally, we let
P = {Decision problems D for which there exists a polynomial time
algorithm A such that A(x) = YES if and only if D(x) = 1}
CLIQUE PROBLEM
---------------
INPUT: Graph G=(V,E) and bound B.
QUESTION: Is there a subset C of the vertices V such that |C| >= B and
(u,v) is in E for all distinct u,v in C? Such a set C is called a
clique of size |C| in G.
OUTPUT: YES if such a large clique exists, and NO otherwise.
Notation: CLIQUE(G,B) = 1 if and only if there is a clique of size at
least B in G.
TRAVELING SALESMAN PROBLEM(TSP)
--------------------------
INPUT: Weighted graph G=(V,E), and bound B.
QUESTION: Does there exist a tour through the graph that visits all
vertices exactly once, starts and ends at the same vertex, and has
cost <= B?
OUTPUT: YES if such a cheap tour exists, and NO otherwise.
Notation: TSP(G,B) = 1 if and only if there is a tour of cost <= B.
3SAT
----
INPUT: Formula f(x1,...,xn) over n Boolean variables, in the form of a
conjunction (AND) of clauses, each being a disjunction (OR) of 3
literals, where a literal is a variable appearing either in negated or
non-negated form.
e.g. f(x1,x2,x3) = (x1 OR not x2 OR x3) AND (x1 OR x2 OR not x3)
QUESTION: Is there a Boolean assignment to x1,...,xn that makes
f(x1,...,xn) = TRUE? (Such an assignment is called a satisfying
assignment.)
OUTPUT: YES if a satisfying assignment exists, and NO otherwise.
Notation: 3SAT(f) = 1 if and only if there is a satisfying assignment
to f.
Whereas PATH was easy to solve, we know of no efficient
polynomial-time solution for CLIQUE, TSP, and 3SAT. In fact (looking
ahead) CLIQUE, TSP, and 3SAT are all NP-complete problems.
NP-complete problems are a collection of problems from areas as varied
as graph theory, logic, number theory, algebra, and combinatorics,
which all share the following properties:
-The best known algorithms to solve any known NP-complete problem
take O(2^{n^k}) time for some fixed k>0.
-If a polynomial time algorithm is ever found for any NP-complete
problem, then all NP-complete problems could be solved in
polynomial time.
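To make the exponential running time concrete, here is a minimal sketch of the naive exhaustive-search decider for CLIQUE (deliberately not the best known algorithm). It assumes the graph is undirected and given as a list of distinct vertex names plus a list of (u, v) edge pairs; the name `clique_brute_force` is hypothetical.

```python
from itertools import combinations

def clique_brute_force(vertices, edges, B):
    """Decide CLIQUE(G, B) by naive exhaustive search (illustrative sketch).

    There are C(n, B) <= 2^n candidate subsets, each checked in O(B^2)
    time, so the worst case is exponential in n = |V|.
    """
    edge_set = {frozenset(e) for e in edges}
    if B <= 0:
        return True  # the empty set is trivially a clique
    # Any clique of size >= B contains a clique of size exactly B,
    # so it suffices to examine the subsets of size exactly B.
    for candidate in combinations(vertices, B):
        if all(frozenset((u, v)) in edge_set
               for u, v in combinations(candidate, 2)):
            return True
    return False  # no B-subset is a clique (this also covers B > |V|)
```

For example, on the triangle {1,2,3} with a pendant vertex 4 attached to 3, the call with B=3 returns `True` and the call with B=4 returns `False`. The loop over all subsets is exactly what makes this approach infeasible for large n.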
The prevailing belief, after 30 years of work, is that NP-complete
problems do not have polynomial time algorithms and thus are not in
P.
So, why do we study them in this course which is dedicated to the
design of algorithms?
Good question...A few answers...
1. It is good to be able to recognize an NP-complete problem when you
encounter one. This way you won't waste time trying to design a
polynomial time algorithm for it.
2. We shall learn about designing approximation algorithms for
NP-complete problems which run in polynomial time. An
approximation algorithm gives an answer which is an approximation
of the true answer. This does not make much sense for decision
problems but will make sense when we consider the search problem
versions of the decision problems (see below).
3. Change your problem formulation to put it in P rather than leave it
NP-complete. For example, restrict the set of inputs on which the
problem should be solved. For TSP, one changed formulation restricts
the input graphs to planar graphs. Unfortunately, planar TSP is still
NP-complete, but we shall see nice approximation algorithms for it
which run in polynomial time.
4. Fame and ``Greed''. The question of whether NP-complete problems
are in P or not is the most famous problem in computer science. There
is a nice prize for whoever resolves it. See
www.claymath.org/prizeproblem (which also contains an exposition of
this problem).
5. This is the only theory course you are REQUIRED to take, although I
encourage you to take more. As an educated computer scientist, you
must know about classifying computational problems by their
complexity.
Note that so far we have not formally defined NP-complete problems.
To do this, we shall first define NP and the notion of `reductions'.
Intuitively, we say that a decision problem D is in NP if (regardless
of whether it is easy or hard to solve) it has the following property:
For every yes-input x to D, there is a polynomial size piece of
evidence_x which can be checked in polynomial time to confirm that x
is indeed a yes-input. This `evidence' (sometimes called a
`certificate', `witness', or `proof') may be very hard to come up
with, but is easy to check.
For example,
CLIQUE is in NP. Why? Let G=(V,E). When CLIQUE(G,B) = 1, the subset S
of V which is a clique of size greater than or equal to B is itself the
`evidence' that CLIQUE(G,B) = 1. The `evidence' S is of size <= |V| and
can be checked in O(|V|^2) time.
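A minimal sketch of the polynomial time check described above, assuming the same hypothetical encoding as before (a vertex list and a list of undirected (u, v) edges). `verify_clique` is an illustrative name; the certificate S is any claimed list of vertices.

```python
from itertools import combinations

def verify_clique(vertices, edges, B, S):
    """Check a claimed clique S as evidence that CLIQUE(G, B) = 1.

    Returns True iff S consists of at least B distinct vertices of G that
    are pairwise adjacent. With a hash set of edges this takes O(|V|^2)
    expected time, since S has at most |V| vertices.
    """
    members = list(S)
    if len(set(members)) != len(members):
        return False  # repeated vertices: S is not a set
    if not set(members) <= set(vertices):
        return False  # S must be a subset of V
    if len(members) < B:
        return False  # S is too small
    edge_set = {frozenset(e) for e in edges}
    # Every pair of distinct members must be joined by an edge.
    return all(frozenset((u, v)) in edge_set
               for u, v in combinations(members, 2))
```

Note that the verifier never searches for a clique; it only checks the one it is handed, which is what separates "easy to check" from "easy to find".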
TSP is in NP. Why? Let G=(V,E). When TSP(G,B) = 1, the order of the
vertices in which a tour of cost <= B proceeds is the `evidence' that
TSP(G,B) = 1. The tour has size |V|, and checking that it visits every
vertex exactly once and computing its cost takes O(|V|) time.
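A minimal sketch of the TSP check, assuming (hypothetically) that the weighted undirected graph is given as a dict mapping `frozenset({u, v})` to the cost of edge (u, v), and that the certificate lists each vertex once, with the return edge to the start left implicit. `verify_tsp` is an illustrative name.

```python
def verify_tsp(vertices, weights, B, tour):
    """Check the vertex order `tour` as evidence that TSP(G, B) = 1.

    weights: hypothetical encoding, frozenset({u, v}) -> edge cost.
    Runs in O(|V|) expected time with hashing.
    """
    # The tour must visit every vertex exactly once.
    if len(tour) != len(vertices) or set(tour) != set(vertices):
        return False
    cost = 0
    for i in range(len(tour)):
        # (i + 1) % len(tour) wraps around, closing the tour at tour[0].
        edge = frozenset((tour[i], tour[(i + 1) % len(tour)]))
        if edge not in weights:
            return False  # consecutive vertices must be adjacent
        cost += weights[edge]
    return cost <= B
```

For example, on the 4-cycle 1-2-3-4-1 with unit edge costs (plus a chord 1-3 of cost 5), the tour [1, 2, 3, 4] has cost 4, so it is valid evidence for B=4 but not for B=3.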
3SAT is in NP. Why? When 3SAT(f) = 1, the assignment to the variables
x1,...,xn which makes f true is the `evidence' that 3SAT(f) = 1. The
size of the assignment is n, and evaluating the formula takes time
linear in its size.
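A minimal sketch of evaluating f on a proposed assignment, in time linear in the size of f. It assumes a DIMACS-style clause encoding (an illustrative choice, not something the notes prescribe): each clause is a tuple of nonzero integers where k stands for xk and -k for not xk, and the assignment maps each index k to True or False.

```python
def verify_3sat(clauses, assignment):
    """Check `assignment` as evidence that 3SAT(f) = 1.

    Hypothetical encoding: literal k means x_k, literal -k means NOT x_k;
    assignment maps each variable index k to True or False.
    """
    for clause in clauses:
        # A clause (an OR of literals) holds if some literal is TRUE:
        # a positive literal needs its variable TRUE, a negative one FALSE.
        if not any(assignment[abs(lit)] == (lit > 0) for lit in clause):
            return False
    # f is the AND of its clauses, so every clause must hold.
    return True

# The example formula from the notes:
# f(x1,x2,x3) = (x1 OR not x2 OR x3) AND (x1 OR x2 OR not x3)
f = [(1, -2, 3), (1, 2, -3)]
```

Here `verify_3sat(f, {1: True, 2: False, 3: False})` returns `True`, while the assignment `{1: False, 2: True, 3: False}` falsifies the first clause and is rejected.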
Formally,
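NP = { Decision problems D such that there exist an algorithm A and
polynomials p,q such that for every input x with |x|=n, D(x) = 1 if
and only if there exists y in {0,1}^* with |y| <= p(n) such that
A(x,y) = 1, and A(x,y) runs in time at most q(n) }.
... {0,1}^* such that A(x) = 1 if and only if B(R(x)) = 1.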
CLAIM: if A