6.046 Lecture #20 and #21
-------------------------

Today we switch to the new topic of TRACTABLE vs INTRACTABLE problems.

The complexity of all problems studied thus far was polynomial time. Namely, all problems had algorithms which ran in worst-case time T(n) = O(n^k) for some fixed k>0, where n is as usual the size of the input. In fact, none of our running times were worse than O(n^3). We call problems with such running time O(n^k) TRACTABLE, or efficiently solvable, problems.

It may seem to you that when k=100, say, this hardly constitutes an efficient solution. That may be so, but certainly a problem whose running time cannot even be bounded by O(n^k) for ANY k is INTRACTABLE.

A natural question is whether all problems are TRACTABLE. The answer is no. A famous intractable problem is the HALTING problem: given a program P and an input x to it, does P(x) ever halt? It turns out that we cannot design an algorithm which on input P and x will answer this question. Not in polynomial time, not in exponential time, not in principle. Why? Go to 6.045.

In this course, we will be interested in problems which are solvable in principle but may take exponential time O(2^{n^k}) for some k>0 rather than polynomial time O(n^k).

We will restrict our attention to DECISION PROBLEMS. These are problems for which the output is always either YES or NO. Formally, a decision problem D is a function from the set of all strings (all possible inputs) to the set {1,0}. We shall call an input x to D a yes-input if and only if D(x) = 1, and otherwise we shall call x a no-input.

Examples of Decision Problems.

PATH PROBLEM
------------
INPUT: Graph G=(V,E), and vertices s,t in V.
QUESTION: Is there a path from s to t in G whose length is <= |V| (the number of vertices)?
OUTPUT: YES if such a path exists, and NO otherwise.
Notation: PATH(G,s,t) = 1 if and only if there is a path from s to t in G where the length of the path is <= |V|, the number of vertices in the graph.
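To make the notion of a decision problem as a function into {1,0} concrete, here is a minimal Python sketch of PATH. The adjacency-list dictionary encoding and the function name `path` are assumptions made for illustration; the notes do not fix an input encoding. Breadth-first search is used here as one possible polynomial-time method. Note that if any path from s to t exists, then a simple one exists with fewer than |V| edges, so plain reachability already answers the question.

```python
from collections import deque


def path(graph, s, t):
    """Return 1 if t is reachable from s in graph, else 0.

    graph: dict mapping each vertex to an iterable of its neighbours
    (a hypothetical adjacency-list encoding, not fixed by the notes).
    """
    seen = {s}            # vertices already discovered
    queue = deque([s])    # frontier of the breadth-first search
    while queue:
        u = queue.popleft()
        if u == t:
            return 1      # yes-input: t was reached
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return 0              # no-input: search exhausted without reaching t
```

Each vertex enters the queue at most once and each edge is examined at most once per endpoint, so the running time is linear in the size of the graph.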
Clearly, the shortest-path algorithms we have learned recently can be used to answer this question quickly, in O(m+n) time, where m is the number of edges and n is the number of vertices. Thus, this problem is tractable.

More generally, we let

P = { Decision problems D for which there exists a polynomial-time algorithm A such that A(x) = YES if and only if D(x) = 1 }

CLIQUE PROBLEM
--------------
INPUT: Graph G=(V,E) and bound B.
QUESTION: Is there a subset C of the vertices V such that |C| >= B and for all distinct u,v in C, (u,v) is in E? Such a set is called a clique of size at least B in G.
OUTPUT: YES if such a large clique exists, and NO otherwise.
Notation: CLIQUE(G,B) = 1 if and only if there is a clique of size at least B in G.

TRAVELING SALESMAN PROBLEM (TSP)
--------------------------------
INPUT: Weighted graph G=(V,E), and bound B.
QUESTION: Does there exist a tour through the graph that visits all vertices exactly once, starts and ends at the same vertex, and is of cost <= B?
OUTPUT: YES if such a cheap tour exists, and NO otherwise.
Notation: TSP(G,B) = 1 if and only if there is a tour of cost <= B.

3SAT
----
INPUT: Formula f(x1,...,xn) of n Boolean variables which is in the form of a conjunction of clauses, each clause being a disjunction of 3 literals, where a literal is a variable appearing in either negated or non-negated form.
e.g. f(x1,x2,x3) = (x1 OR not x2 OR x3) AND (x1 OR x2 OR not x3)
QUESTION: Is there a Boolean assignment to x1,...,xn that makes f(x1,...,xn) = TRUE? (This is called a satisfying assignment.)
OUTPUT: YES if a satisfying assignment exists, else NO.
Notation: 3SAT(f) = 1 if and only if there is a satisfying assignment to f.

Whereas PATH was easy to solve, we know of no polynomial-time solution for CLIQUE, TSP, or 3SAT. In fact (looking ahead) CLIQUE, TSP, and 3SAT are all NP-complete problems.
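To illustrate why these problems look harder than PATH, here is a minimal Python sketch of the obvious exhaustive-search deciders for CLIQUE and 3SAT. The function names, the edge-list encoding of G, and the signed-integer encoding of literals (i for x_i, -i for NOT x_i) are assumptions made for illustration only. These are not the best known algorithms, just the most direct ones.

```python
from itertools import combinations, product


def clique(vertices, edges, b):
    """Return 1 if the undirected graph has a clique with at least b vertices.

    vertices: list of vertex names; edges: list of 2-tuples (assumed encoding).
    """
    edge_set = {frozenset(e) for e in edges}
    # Any clique with >= b vertices contains one with exactly b vertices,
    # so it suffices to try every subset of size b.
    for cand in combinations(vertices, b):
        if all(frozenset((u, v)) in edge_set for u, v in combinations(cand, 2)):
            return 1
    return 0


def three_sat(n, clauses):
    """Return 1 if some assignment to x1..xn satisfies every clause.

    clauses: list of tuples of nonzero ints; i means x_i, -i means NOT x_i
    (assumed encoding).
    """
    # Try all 2^n truth assignments.
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return 1
    return 0
```

For the example formula above, `three_sat(3, [(1, -2, 3), (1, 2, -3)])` tries assignments until it finds one with x1 = TRUE. The CLIQUE search examines up to (|V| choose B) subsets, which is exponential in |V| when B grows with |V| (for example B = |V|/2), and the 3SAT search examines up to 2^n assignments.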
NP-complete problems are a collection of problems from areas as varied as graph theory, logic, number theory, algebra, and combinatorics, which all share the following properties:

- The best known algorithms to solve any known NP-complete problem take O(2^{n^k}) time for some fixed k>0.
- If a polynomial-time algorithm is found for any NP-complete problem, then all NP-complete problems can be solved in polynomial time.

The prevailing belief, after 30 years of work, is that NP-complete problems do not have polynomial-time algorithms, and thus are not in P.

So, why do we study them in this course, which is dedicated to the design of algorithms? Good question... A few answers...

1. It is good to be able to recognize an NP-complete problem when you encounter one. This way you won't waste time trying to design a polynomial-time algorithm for it.

2. We shall learn about designing approximation algorithms for NP-complete problems which run in polynomial time. An approximation algorithm gives an answer which is an approximation of the true answer. This does not make much sense for decision problems, but it will make sense when we consider the search-problem versions of the decision problems (see below).

3. Change your problem formulation to put it in P rather than leave it NP-complete. For example, restrict the set of inputs for which the problem should be solved. For TSP, a changed formulation restricts the input graphs to planar graphs. Unfortunately, planar-TSP is still NP-complete, but we shall see nice approximation algorithms for it which run in polynomial time.

4. Fame and ``Greed''. The question of whether NP-complete problems are in P or not is the most famous problem in computer science. There is a nice prize for whoever resolves it. See www.claymath.org/prizeproblem (which also contains an exposition of this problem).

5. This is the only theory course you are REQUIRED to take, although I encourage you to take more.
You must know about the issue of classifying computational problems by their complexity as an educated computer scientist.

Note that so far we have not formally defined NP-complete problems. To do this, we shall first define NP and the notion of `reductions'.

Intuitively, we say that a decision problem D is in NP if (regardless of whether it is easy or hard to solve) it has the following property: for every yes-input x to D, there is a polynomial-size piece of evidence_x which can be checked in polynomial time to confirm that x is indeed a yes-input. This `evidence' (sometimes called a `certificate', `witness', or `proof') may be very hard to come up with, but is easy to check.

For example, CLIQUE is in NP. Why? Let G=(V,E). When CLIQUE(G,B) = 1, the subset S of V which is a clique of size greater than or equal to B is itself the `evidence' that CLIQUE(G,B) = 1. The `evidence' S is of size <= |V| and can be checked in O(|V|^2) time.

TSP is in NP. Why? Let G=(V,E). When TSP(G,B) = 1, the order of vertices by which the tour of cost <= B proceeds is the `evidence' that TSP(G,B) = 1. The tour is of size |V| and the cost of the tour can be computed in O(|V|) time.

3SAT is in NP. Why? When 3SAT(f) = 1, the assignment to the variables x1,...,xn which makes f true is the `evidence' that 3SAT(f) = 1. The size of the assignment is n, and evaluating the formula is linear in its size.

Formally, NP = { Decision problems D such that there exists A and polynomials p,q such that for |x|=n, D(x) = 1 if and only if there exists y such that |y| {0,1}^* such that A(x) = 1 if and only if B(R(x))=1. CLAIM: if A