What do you do when a problem is NP-complete?
or, when the “polynomial time solution” is impractically slow?
consider special cases
assume input is random, do “expected performance.” E.g., Hamiltonian path in random graphs. Problem: agreeing on a good distribution.
run a nonpolynomial (hopefully only slightly) algorithm such as branch and bound. Usually no proven bound on runtime, but sometimes there is.
settle for heuristic, but prove it does well enough (our focus)
Definitions:
optimization problem, instances \(I\), solutions \(S(I)\) with values \(f:S(I) \rightarrow R\)
maximization/minimization: find solution in \(S(I)\) maximizing/minimizing \(f\)
called OPT\((I)\)
eg bin-packing. instance is a set of sizes \(s_i \in (0,1]\); partition into as few subsets (bins) as possible so that no subset’s total exceeds 1
Technical assumptions we’ll often make:
assumption: all inputs and range of \(f\) are integers/rationals (can’t represent reals, and allows, eg, LP, binary search).
assumption \(f(\sigma)\) is a polynomial size (num bits) number (else output takes too long)
look for polytime in bit complexity
NP-hardness
optimization NP-hard if can reduce an NP-hard decision problem to it
(eg, problem of “is opt solution to this instance \(\le k\)?”)
but use the more general notion of Turing reducibility (Garey & Johnson).
Approximation algorithm:
any algorithm that gives a feasible answer
eg, each item in own bin.
of course, want good algorithm. How measure?
Definition: \(k\)-abs approx if on any \(I\), have \(|A(I)-OPT(I)| \le k\)
Example: planar graph coloring.
NP-complete to decide if 3 colorable
know 4-colorable
easy to 5-color
Ditto for edge coloring: Vizing’s theorem says opt is \(\Delta\) or (constructive) \(\Delta+1\)
Absolute approximations are known only for such trivial cases, where OPT is bounded by a constant.
Often, can show impossible by “scaling” the problem.
EG knapsack.
define profits \(p_i\), sizes \(s_i\), sack \(B\)
wlog, integers.
suppose have \(k\)-absolute
multiply all \(p_i\) by \(k+1\), solve, scale down; an additive error of \(k\) is less than one original unit, so the result is exactly optimal (contradiction unless P=NP).
EG independent set (clique)
take \(k+1\) disjoint copies of \(G\); a \(k\)-absolute solution must then be optimal in at least one copy.
Definitions:
An \(\alpha\)-approximate solution has value at most \(\alpha\) times optimum for minimization, at least \(1/\alpha\) times optimum for maximization.
an algorithm has approximation ratio \(\alpha\) if on any input, it outputs an \(\alpha\)-approximate feasible solution.
called an \(\alpha\)-approximation algorithm
How do we prove algorithms have relative approximations?
Can’t describe opt, so can’t compare to it
instead, comparison to computable lower bounds.
Do obvious thing at each step.
Hard part is proving it works.
Usually, by attention to right upper/lower bound.
Max cut
Upper bound trivial: OPT cuts at most all \(|E|\) edges
Max-cut greedy: place each vertex on the side that cuts more of its already-placed edges; at least half of all edges get cut, so a 2-approximation (sketch below).
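A minimal sketch of this greedy in Python (the function name and edge-list format are illustrative, not from the notes):

```python
# Greedy max cut (unweighted): place each vertex on the side that cuts
# more of its already-placed edges. Each edge is "charged" when its second
# endpoint is placed, and at least half of those charges become cut edges,
# so the cut has at least |E|/2 >= OPT/2 edges.
def greedy_max_cut(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n                       # None = not yet placed
    for v in range(n):
        on0 = sum(1 for u in adj[v] if side[u] == 0)
        on1 = sum(1 for u in adj[v] if side[u] == 1)
        side[v] = 1 if on0 >= on1 else 0    # face the larger placed group
    cut = sum(1 for u, v in edges if side[u] != side[v])
    return side, cut
```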
Set cover
\(n\) items
\({\mbox{OPT}}=k\)
At each step, can still cover remainder with \(k\) sets
So the chosen set covers at least a \(1/k\) fraction of the remainder; after \(O(k\log n)\) steps nothing is left, giving an \(O(\log n)\)-approximation (sketch below)
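A hedged sketch of the greedy (names and the set-list input format are my own illustration):

```python
# Greedy set cover: repeatedly pick the set covering the most uncovered
# items. If OPT = k, each step covers >= 1/k of what remains, so about
# k ln n steps suffice: an O(log n) approximation.
def greedy_set_cover(universe, sets):
    uncovered = set(universe)
    picked = []
    while uncovered:
        i = max(range(len(sets)), key=lambda s: len(sets[s] & uncovered))
        if not sets[i] & uncovered:
            raise ValueError("instance is not coverable")
        picked.append(i)
        uncovered -= sets[i]
    return picked
```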
Vertex cover:
define problem
suppose we run greedy set cover (repeatedly pick the vertex covering the most uncovered edges): only \(O(\log n)\) from the set cover analysis
suppose we pick an uncovered edge and cover both sides: the picked edges form a matching, and OPT must take one endpoint of each, so a 2-approx (sketch below)
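A minimal sketch of the matching-based 2-approx (function name is illustrative):

```python
# Vertex cover 2-approx: take both endpoints of any still-uncovered edge.
# The picked edges form a matching; any cover needs one endpoint of each
# matched edge, so |cover| = 2 * |matching| <= 2 * OPT.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```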
sometimes, need to be extra greedy
Explicit attention to where lower bound is coming from—lower bound informs algorithm.
Graham’s rule for \(P||C_{\max}\) is a \(2-\frac1m\) approximation algorithm
explain problem: \(m\) machines, \(n\) jobs with processing times \(p_j\); minimize the makespan (maximum machine load).
can also think of minimizing max load of continuously running jobs
use greedy list scheduling: assign each job in turn to the currently least-loaded machine (sketch after the notes below)
proof by comparison to lower bounds
first lower bound: average load: \({\mbox{OPT}}\ge \frac1m\sum p_j\)
second lower bound: \({\mbox{OPT}}\ge \max p_j\)
consider max-load machine
load before adding its last job was at most the average load, so at most OPT
then add one job, of length at most OPT
so the final load is at most 2 OPT
Tighter: suppose the last-finishing job \(j\) was added to a machine that had load \(L\)
Then the average load is at least \(L+p_j/m\), so \(L+p_j/m \le {\mbox{OPT}}\)
We achieve \[\begin{aligned} L+p_j &= (L+p_j/m) + p_j(1-1/m)\\ &\le {\mbox{OPT}}+(1-1/m){\mbox{OPT}}\\ &=(2-1/m){\mbox{OPT}}\end{aligned}\]
Notice:
this algorithm is online, competitive ratio \(2-\frac1m\)
we have no idea of optimum schedule; just used lower bounds.
we used a greedy strategy
tight bound: consider \(m(m-1)\) size-1 jobs, then one size-\(m\) job; greedy gets makespan \(2m-1\) while OPT is \(m\)
where was problem? Last job might be big
LPT achieves 4/3, but not online
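A hedged sketch of list scheduling (a heap tracks the least-loaded machine; names are my own):

```python
import heapq

# Graham's list scheduling for P||Cmax: assign each job, in the given
# order, to the currently least-loaded machine. Makespan <= (2 - 1/m) OPT.
def list_schedule(m, jobs):
    loads = [(0.0, i) for i in range(m)]   # (load, machine id) min-heap
    heapq.heapify(loads)
    assignment = []
    for p in jobs:
        load, i = heapq.heappop(loads)
        assignment.append(i)
        heapq.heappush(loads, (load + p, i))
    return max(load for load, _ in loads), assignment

# LPT is the same algorithm on jobs sorted in decreasing size:
#   list_schedule(m, sorted(jobs, reverse=True))
```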
General notation: machines \(\mid\) constraints/features \(\mid\) objective
\(m\) machines \(i\) (and \(n\) jobs \(j\))
\(1\): \(1\) machine
\(P\): \(m\) identical machines
\(Q\): \(m\) machines of different speeds, but all related
\(R\): \(m\) unrelated machines, may have completely different runtimes on same jobs
\(F\): Flow shop (job is specific sequence of tasks on many machines)
\(O\): Open shop (job is specific set of tasks on many machines)
Objectives
\(\sum C_j\) average (total) completion time
\(\max C_j\) time to complete whole set
\(\sum U_j\) number of “late” jobs (post deadline)
Constraints
\(p_j=1\) unit time tasks
\(r_j\) release dates before which jobs cannot run
\(d_j\) deadlines to complete
pmtn jobs can be preempted/interrupted
prec precedence constraints on jobs
So far, we’ve seen various constant-factor approximations.
What is best constant achievable?
defer APX-hardness discussion until later
An approximation scheme is a family of algorithms \(A_\epsilon\) such that
each algorithm runs in polynomial time
\(A_\epsilon\) achieves a \(1+\epsilon\) approximation
But note: runtime might be awful as function of \(\epsilon\)
Knapsack example.
Dynamic program for bounded integer profits
\(B(j,p)=\) smallest (total size) subset of jobs \(1,\ldots,j\) of total profit \(\ge p\).
because if can achieve it, can achieve it at minimum size
and now we have sub-problem optimality for dynamic program
\(B(j+1,p) = \min \left( B(j,p),\quad s_{j+1}+B(j,p-p_{j+1}) \right)\)
For each \(p\), solve all \(n\) values of \(j\)
Gives runtime \(nP_{{\mbox{OPT}}}\)
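A minimal sketch of this dynamic program (the name `knapsack_by_profit` is my own; for simplicity the table runs up to the total profit rather than OPT):

```python
# Knapsack DP over profits: B[p] = smallest total size of a subset of the
# items seen so far with total profit >= p. Runtime O(n * sum(profits)).
def knapsack_by_profit(profits, sizes, capacity):
    P = sum(profits)
    INF = float("inf")
    B = [0] + [INF] * P                    # profit 0 needs no items
    for p_j, s_j in zip(profits, sizes):
        for p in range(P, 0, -1):          # backwards: use each item once
            B[p] = min(B[p], s_j + B[max(p - p_j, 0)])
    return max(p for p in range(P + 1) if B[p] <= capacity)
```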
did this prove \(P=NP\)?
No
recall pseudopoly algorithms \(O(mU)\)
contrast to weakly polynomial \(O(m\log U)\)
rounding
Let opt be \(P\).
Scale prices to \(\lfloor (n/\epsilon P) p_i \rfloor\)
New opt is at least \(n/\epsilon-n = (1-\epsilon)n/\epsilon\)
so seek solution of this value
Table size/runtime \(n^2/\epsilon\) (\(n\) items, range \(n/\epsilon\))
Gives solution within \(1-\epsilon\) of original opt
Wait, how do we know \(P\)?
simple option: use a lower bound on \(P\), namely \(\max p_i\)
then opt scales to at most \((n/(\epsilon \max p_i))\sum p_i \le n^2/\epsilon\)
just slows algorithm to \(n^3/\epsilon\)
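A hedged sketch of the resulting FPAS using the \(\max p_i\) lower bound (reusing the `knapsack_by_profit` sketch above; a full version would also recover the chosen item set):

```python
# Knapsack FPAS via profit scaling, with P = max(p_i) as a lower bound on
# OPT. Scaled profits sum to at most n^2/eps, so the DP table stays small;
# the value found is at least (1 - eps) * OPT.
def knapsack_fptas(profits, sizes, capacity, eps):
    n = len(profits)
    P = max(profits)                            # lower bound on OPT
    scale = n / (eps * P)
    scaled = [int(p * scale) for p in profits]  # floor((n / eps P) * p_i)
    best_scaled = knapsack_by_profit(scaled, sizes, capacity)
    return best_scaled / scale                  # estimate, >= (1-eps) OPT
```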
Faster: suppose we guess \(P>{\mbox{OPT}}/(1-\epsilon)\)
OPT scales to at most \((n/\epsilon P){\mbox{OPT}} < n(1-\epsilon)/\epsilon = n/\epsilon-n\)
scaled solution of desired value \(n/\epsilon-n\) doesn’t exist
so won’t find
Suppose we guess \(P<{\mbox{OPT}}\)?
OPT will scale to at least \(n/\epsilon-n\)
so will find solution of this value
possibly much less than \({\mbox{OPT}}\), but that’s ok
Of two cases, outcome tells us whether we guessed \(P\) too high or low
so, can use binary search to find \(P\)
note once we are close—\(P<OPT<P/(1-\epsilon)\)—unclear which outcome happens
but that’s OK, because we are close
quickly get within \((1-\epsilon)\) of \(P\)
adds another \((1-\epsilon)\) factor to approximation ratio
but that gives \((1-\epsilon)^2 \approx 1-2\epsilon\)
so just changes the constant.
wait, need initial low and high values?
lower bound? \(\max p_i\)
upper bound? \(\sum p_i\)
ratio of \(n\)
so, look at powers of \((1+\epsilon)\) in that range
\(\log_{1+\epsilon}n \approx (\log n)/\epsilon\) of them
gives a solution within \((1+\epsilon)\)
pseudopoly gives FPAS; converse almost true
Knapsack is only weakly NP-hard
strong NP-hardness (define) means no pseudo-poly
From FPAS to pseudo poly:
Suppose input instance has integers bounded by \(t\)
Solution value is \(O(nt)\)
Find \(\epsilon\)-approx with \(\epsilon=1/(nt+1)\)
Error is then less than 1; the solution value is an integer, so it equals the optimum. Runtime is polynomial in \(n\) and \(t\), i.e., pseudopolynomial.
More powerful idea: \(k\)-enumeration
Return to \(P || C_{\max}\)
Schedule \(k\) largest jobs optimally
schedule the remainder greedily
analysis: claim \(A(I) \le OPT(I)+p_{k+1}\)
Consider job with max \(c_j\)
If one of \(k\) largest, done and at opt
Else, was assigned to min load machine, so \(c_j \le OPT+p_j \le OPT+p_{k+1}\)
so done if \(p_{k+1}\) small
but note \(OPT(I) \ge (k/m)p_{k+1}\), since some machine must receive at least \((k+1)/m\) of the \(k+1\) jobs of size \(\ge p_{k+1}\)
deduce \(A(I) \le (1+m/k)OPT(I)\).
So, for fixed \(m\), can get any desired approximation ratio
i.e., PAS
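A hedged sketch of \(k\)-enumeration for \(P||C_{\max}\) (names are my own; trying all \(m^k\) placements of the \(k\) largest jobs subsumes the optimal one):

```python
from itertools import product

# k-enumeration: try every placement of the k largest jobs, finish each
# with greedy list scheduling, keep the best makespan. Runtime ~ m^k * nm;
# guarantees makespan <= (1 + m/k) * OPT.
def k_enum_makespan(m, jobs, k):
    jobs = sorted(jobs, reverse=True)
    big, rest = jobs[:k], jobs[k:]
    best = float("inf")
    for placement in product(range(m), repeat=len(big)):
        loads = [0.0] * m
        for p, i in zip(big, placement):
            loads[i] += p
        for p in rest:                     # greedy for the remaining jobs
            loads[loads.index(min(loads))] += p
        best = min(best, max(loads))
    return best
```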
Scheduling any number of machines
As with Knapsack, focus on decision problem
Can I complete all in time \(T\)?
If can answer this, binary search will optimize \(T\)
Again, need \(T_{\min}\) and \(T_{\max}\) but these are easy
Combine enumeration and rounding
Suppose only \(k\) job sizes
Vector of “number of each type” on a given machine—gives “machine type”
Only \(n^k\) distinct vectors/machine types
Similarly, only \(n^k\) problem instances
find out which instances are feasible with \(m\) machines
By deciding how many of each machine type.
Use dynamic program:
enumerate all input instances that can be completed by \(j\) machines in time \(T\)
Feasible instance is sum of feasible \(j-1\) machine instance and 1-machine instance (machine type)
Works because only poly many instances and types.
Use rounding to make few important job types
Guess OPT \(T\) to within \(\epsilon\) (by binary search)
All jobs of size exceeding \(\epsilon T\) are “large”
Round each up to next power of \((1+\epsilon)\) in range \(\epsilon T \ldots T\)
Only \(\log_{1+\epsilon}1/\epsilon = O(1/\epsilon \ln 1/\epsilon)\) large types
Solve optimally
Greedy schedule remainder
If the last job to finish is large, the schedule is optimal for the rounded problem, so within a \(1+\epsilon\) factor of opt
If the last job is small (size \(\le \epsilon T\)), the greedy analysis shows we are within \(\epsilon T\) of opt.
Is there always a PAS?
MAX-SNP hard: some unbeatable constant
Numerous problems in class (vertex cover, independent set, etc)
Amplification arguments can prove that some problems (e.g., clique) admit no constant factor.
So far we’ve seen two techniques:
Greedy: do the obvious
Dynamic Programming: try everything
Can we be more clever?
TSP
Requiring an exact tour in a general graph: no approximation possible (deciding Hamiltonian cycle is NP-hard)
Undirected metric: MST relaxation gives a 2-approx (sketch below); Christofides gives 3/2
recent progress has gotten below 3/2
Directed: Cycle cover relaxation (HW) gives \(O(\log n)\)
recent progress: \(O(\log n / \log\log n)\)
Also, Euclidean TSP and planar graph TSP have a PAS
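A minimal sketch of the MST-based 2-approx for the undirected metric case (Prim plus a shortcut preorder walk; `dist` is a full distance matrix, names my own):

```python
# Metric TSP 2-approx: an MST costs at most OPT (drop one tour edge to get
# a spanning tree), and shortcutting a preorder walk of the MST at most
# doubles its cost, by the triangle inequality.
def tsp_mst_2approx(dist):
    n = len(dist)
    in_tree, parent = [False] * n, [0] * n
    mindist = [float("inf")] * n
    mindist[0] = 0.0
    for _ in range(n):                     # Prim's MST
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: mindist[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < mindist[v]:
                mindist[v], parent[v] = dist[u][v], u
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    tour, stack = [], [0]                  # preorder walk = shortcut tour
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, cost
```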
intuition: SPT (shortest processing time first) for \(1||\sum C_j\)
suppose process longer before shorter
then swap improves
note haven’t shown local opt implies global opt
rather, have relied on global opt being local opt
\(1 | r_j | \sum C_j\)
NP-hard
relaxation: allow preemption
optimum: SRPT (shortest remaining processing time first)
assume no \(r_j\): claim SPT optimal
proof: local optimality argument
consider swapping two pieces of two jobs
suppose currently processing \(k\) instead of SRPT \(j\)
remaining times \(p_j\) and \(p_k\)
total \(p_j+p_k\) time
use first \(p_j\) to process \(j\), then do \(k\)
some job must finish at \(p_j+p_k\)
and no job can finish before \(p_j\)
so this is locally optimal (for these 2 jobs)
rounding: schedule jobs in preemptive completion order
take preemptive schedule and insert \(p_j\) time at \(C_j\)
now room to schedule nonpreemptively
how much does this slow down \(j\)?
\(p_k\) space inserted before \(j\) for every job completing before \(j\) in preemptive schedule
in other words, at most \(C_j\) time is inserted before \(j\) completes (the inserted jobs all finish within the first \(C_j\) preemptive time, so their sizes total at most \(C_j\))
so \(j\) completes in \(2C_j\) time
so 2-approx
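A hedged sketch combining both steps (simulate preemptive SRPT, then schedule nonpreemptively in preemptive completion order; the job format and names are my own):

```python
import heapq

# 2-approx for 1|r_j|sum C_j: simulate preemptive SRPT to get the order of
# preemptive completions, then run the jobs nonpreemptively in that order.
def srpt_2approx(jobs):                    # jobs: list of (release, size)
    by_release = sorted(range(len(jobs)), key=lambda j: jobs[j][0])
    heap, t, i, finish_order = [], 0.0, 0, []
    while heap or i < len(jobs):
        if not heap:                       # idle until the next release
            t = max(t, jobs[by_release[i]][0])
        while i < len(jobs) and jobs[by_release[i]][0] <= t:
            j = by_release[i]
            heapq.heappush(heap, (jobs[j][1], j))   # (remaining, job)
            i += 1
        rem, j = heapq.heappop(heap)
        nxt = jobs[by_release[i]][0] if i < len(jobs) else float("inf")
        run = min(rem, nxt - t)            # run until done or next release
        t += run
        if run < rem:
            heapq.heappush(heap, (rem - run, j))
        else:
            finish_order.append(j)
    t, total = 0.0, 0.0                    # nonpreemptive pass
    for j in finish_order:
        t = max(t, jobs[j][0]) + jobs[j][1]
        total += t
    return total
```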
More recently: rounding, enumeration gives PAS
Three steps
write integer linear program
relax
round
Example: vertex cover. \[\begin{aligned} &\min \sum x_i\\ x_i+x_j &\ge 1& (i,j)\in E\\ x_i&\in\{0,1\}\end{aligned}\]
Rounding
At least one end of edge \(\ge 1/2\)
round up to 1
others to 0
each vertex at most doubles
so 2-approx
this is tight (Nemhauser-Trotter)
Works even for the weighted version (unlike the matching-based greedy); sketch below.
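A hedged sketch of LP rounding for weighted vertex cover, assuming scipy's `linprog` solves the relaxation (the function name and I/O format are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Solve min w.x s.t. x_u + x_v >= 1 on every edge, 0 <= x <= 1, then keep
# every vertex with x_v >= 1/2. Rounding at most doubles the LP optimum,
# which lower-bounds OPT: a 2-approximation.
def vc_lp_round(n, edges, weights):
    A = np.zeros((len(edges), n))
    for r, (u, v) in enumerate(edges):
        A[r, u] = A[r, v] = -1.0   # x_u + x_v >= 1  ->  -x_u - x_v <= -1
    res = linprog(c=weights, A_ub=A, b_ub=-np.ones(len(edges)),
                  bounds=[(0, 1)] * n)
    return {v for v in range(n) if res.x[v] >= 0.5}
```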
Approximation hardness:
7/6 Hastad 2001
\(10\sqrt{5} -21 \approx 1.3606\ldots\) Dinur Safra 2002
\(\sqrt{2}\) Khot Minzer Safra 2017
\(2-\epsilon\) for every \(\epsilon>0\), if the unique games conjecture is true
\(2-O(1/\sqrt{\log n})\) achieved (and useful).
Facility location. Metric version, with triangle inequality.
The general (non-metric) version is as hard as set cover.
ILP \[\begin{aligned} &\min \sum_i f_iy_i + \sum_{i,j} c_{ij}x_{ij}\\ x_{ij} &\le y_i\\ \sum_i x_{ij} &\ge 1 & \forall j\\ x_{ij}, y_i &\in \{0,1\}\end{aligned}\]
Step 1: filtering
Want to assign \(j\) to one of its “partial” assignments \(x_{ij}>0\)
\(C_j = \sum_i x_{ij}c_{ij}\) is “average” assignment cost
and is amount accounted for in fractional optimum
but some \(x_{ij}>0\) may have huge \(c_{ij}\)
which wouldn’t be accounted for
rely on “can’t have everything above average”
claim at most \(1/\rho\) fraction of assignment can have \(c_{ij}>\rho C_j\)
if more, average exceeds \((1/\rho)(\rho C_j)=C_j\), impossible
so, zero out every \(x_{ij}\) with \(c_{ij} \ge \rho C_j\)
and compensate by scaling up other \(x_{ij}\) by \(1/(1-1/\rho)\) factor
Also have to increase \(y_i\) by \(1/(1-1/\rho)\) factor to “make room”
New feasible solution to LP
no longer necessarily optimal
Now, assignment of client \(j\) to any nonzero \(x_{ij}\) costs at most \(\rho C_j\)
So, total assignment cost at most \(\rho\sum C_j\)
Step 2: Facility opening: intuition
To assign, need to open facilities
If \(y_i\) small, opening facility isn’t paid for
So, find a cluster of facilities of total \(y_i > 1\)
Open minimum cost facility
Cost \(f_{\min}\le f_{\min}\sum y_i = \sum f_{\min}y_i \le \sum f_i y_i\) so LP upper bounds cost
Everything in cluster nearby, so using opened facility as “proxy” for all others without adding much cost
Step 2: Facility opening: details
Choose client with minimum \(C_j\)
Take all of its “available” facilities (\(c_{ij} < \rho C_j\))
Open the cheapest and zero the others
So cost at most \(\sum f_i y_i\) over \(i\) in cluster
assign every client with a nonzero \(x_{i'j'}\) for some \(i'\) in the cluster to the opened facility
cost of assigning \(j'\)
\(\le \rho C_{j'}\) to reach its nonzero \(x_{i'j'}\) in cluster
then distance from \(i'\) to \(i\) is at most \(2\rho C_j \le 2\rho C_{j'}\) by choice of \(j'\)
so total \(3\rho C_{j'}\) by the metric property
Now iterate
Combine:
multiplied \(y_i\) by \(1/(1-1/\rho)=\rho/(\rho-1)\)
multiplied assignment costs by \(3\rho\)
note the \(x_{ij}\) are used only to identify “feasible” facilities to connect to
so assignment cost not affected by scaling up \(x_{ij}\)
for balance, set \(\rho=4/3\): then \(3\rho=\rho/(\rho-1)=4\), giving a 4-approx
other settings of \(\rho\) yield bicriteria approximation trading facility and connection approximation costs
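A hedged sketch of filtering plus clustering, assuming a fractional LP solution is already in hand (all names, the input format, and the in-place mutation of `x` are my own illustration):

```python
# Filtering + clustering for metric facility location, given a fractional
# solution x[i][j], y[i] to the LP relaxation. Total cost is within
# max(3*rho, rho/(rho-1)) of the LP optimum; rho = 4/3 balances both at 4.
def facility_location_round(f, c, x, rho=4.0 / 3.0):
    m, n = len(f), len(c[0])
    C = [sum(x[i][j] * c[i][j] for i in range(m)) for j in range(n)]
    for j in range(n):                     # filtering step
        for i in range(m):
            if c[i][j] > rho * C[j]:       # too expensive: zero it out
                x[i][j] = 0.0
        s = sum(x[i][j] for i in range(m))  # >= 1 - 1/rho survives
        for i in range(m):
            x[i][j] /= s                   # rescale to a feasible solution
    open_fac, assign = set(), [None] * n
    for j in sorted(range(n), key=lambda j: C[j]):  # min C_j first
        if assign[j] is not None:
            continue
        cluster = [i for i in range(m) if x[i][j] > 0]
        best = min(cluster, key=lambda i: f[i])     # open the cheapest
        open_fac.add(best)
        for j2 in range(n):                # everyone touching the cluster
            if assign[j2] is None and any(x[i][j2] > 0 for i in cluster):
                assign[j2] = best          # costs at most 3 * rho * C_{j2}
    return open_fac, assign
```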
Further research progress has yielded
1.5-approximation
1.463-hardness result.
Other algorithms based on greedy and local search.
MAX SAT. Define:
literals
clauses
NP-complete
random setting: set each variable true independently with probability \(1/2\)
achieves \(1-2^{-k}\) (\(k\) = number of literals per clause), since that is each clause’s satisfaction probability (sketch below)
very nice for large \(k\), but only \(1/2\) for \(k=1\)
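A minimal sketch of the random assignment (the \(\pm v\) clause encoding is my own convention):

```python
import random

# Random MAX SAT: set each variable true with probability 1/2. A clause
# with k distinct literals fails only if all k literals are false, so it
# is satisfied with probability 1 - 2^{-k}.
def random_maxsat(num_vars, clauses):      # clause: list of +v / -v ints
    assign = [random.random() < 0.5 for _ in range(num_vars + 1)]
    sat = sum(any(assign[abs(l)] == (l > 0) for l in clause)
              for clause in clauses)
    return assign, sat
```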
LP \[\begin{aligned} \max &\sum z_j\\ \sum_{i \in C_j^+} y_i + \sum_{i \in C_j^-} (1-y_i) &\ge z_j & \forall j\\ 0 \le y_i, z_j &\le 1\end{aligned}\]
Analysis
\(\beta_k = 1-(1-1/k)^k\). values \(1,3/4,.704,\ldots\)
Randomized rounding: set each \(x_i\) true with probability \(\hat y_i\) (the LP optimum)
Lemma: \(k\)-literal clause sat w/pr at least \(\beta_k {\hat z}_j\).
proof:
assume all positive literals.
prob \(1-\prod(1-y_i)\)
failure probability \(\prod(1-y_i)\) is maximized when all \(y_i={\hat z}_j/k\).
Show \(1-(1-{\hat z}/k)^k \ge \beta_k {\hat z}\).
at \(z=0,1\) these two sides are equal
in between, right hand side is linear
first deriv of LHS is \((1-z/k)^{k-1}\), equal to \(1\) at \(z=0\); second deriv is \(-(1-1/k)(1-z/k)^{k-2}<0\)
so the LHS is concave; it meets the linear RHS at both endpoints, hence lies above it in between
Result: \((1-1/e)\) approximation (since \((1-1/k)^k \to 1/e\))
much better for small \(k\): i.e. \(1\)-approx for \(k=1\)
LP good for small clauses, random for large.
Better: try both methods.
\(n_1,n_2\) number in both methods
Show \((n_1+n_2)/2 \ge (3/4)\sum {\hat z}_j\)
\(n_1 \ge \sum_{C_j \in S^k} (1-2^{-k}){\hat z}_j\)
\(n_2 \ge \sum \beta_k {\hat z}_j\)
\(n_1+n_2 \ge \sum (1-2^{-k}+\beta_k) {\hat z}_j \ge \sum \frac32{\hat z}_j\), since \(1-2^{-k}+\beta_k = 3/2\) at \(k=1,2\) and is larger for \(k\ge 3\); so the better of the two methods satisfies at least \(\frac34\sum \hat z_j \ge \frac34{\mbox{OPT}}\) clauses