10/20/2004
6.854 Advanced Algorithms
Professor David Karger
Sergiy Sidenko

Since absolute approximation algorithms are known to exist for so few optimization problems, a better class of approximation algorithms to consider are relative approximation algorithms. Because they are so commonplace, we will refer to them simply as approximation algorithms.

**An \(\alpha\)-approximation algorithm** finds a solution of value at most \(\alpha \cdot
OPT(I)\) for a minimization problem and at least \(OPT(I) / \alpha\) for a maximization problem (\(\alpha \ge 1\)).

Note that although \(\alpha\) can vary with the size of the input, we will only consider those cases in which it is a constant. To illustrate the design and analysis of \(\alpha\)-approximation algorithm, let us consider the Parallel Machine Scheduling problem, a generic form of load balancing.

- Parallel Machine Scheduling:
Given \(m\) machines \(m_i\) and \(n\) jobs with processing times \(p_j\), assign the jobs to the machines so as to minimize the load

\[\max\limits_{i} \sum\limits_{j \in J_i} p_j,\]

the time required for all machines to complete their assigned jobs; here \(J_i\) denotes the set of jobs assigned to machine \(m_i\). In scheduling notation, this problem is described as \(\mathrm {P \parallel C_{\max}}\).

A natural way to solve this problem is to use a *greedy algorithm* called **list scheduling**.

A **list scheduling** algorithm processes the jobs in some order, assigning each job to the currently least loaded machine.

Note that the order in which the jobs are processed is not specified.
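The greedy rule can be sketched in a few lines of Python; this is an illustrative sketch (not from the lecture), keeping the machine loads in a heap so the least loaded machine is always at the root:

```python
import heapq

def list_schedule(jobs, m):
    """Greedy list scheduling: assign each job (in the given order)
    to the currently least loaded of the m machines.
    Returns the resulting maximum machine load."""
    # Heap of (current load, machine index); the root is the least loaded machine.
    loads = [(0, i) for i in range(m)]
    heapq.heapify(loads)
    for p in jobs:
        load, i = heapq.heappop(loads)
        heapq.heappush(loads, (load + p, i))
    return max(load for load, _ in loads)

# Example: 5 jobs on 2 machines.
print(list_schedule([2, 3, 4, 6, 2], 2))  # → 9
```

Each of the \(n\) jobs costs \(O(\log m)\) heap operations, so the sketch runs in \(O(n \log m)\) time.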

To analyze the performance of list scheduling, we must somehow compare its solution for each instance \(I\) (call this solution \(A(I)\)) to the optimum \(OPT(I)\). But we do not know how to obtain an analytical expression for \(OPT(I)\). Nonetheless, if we can find a meaningful lower bound \(LB(I)\) for \(OPT(I)\) and can prove that \(A(I) \le \alpha \cdot LB(I)\) for some \(\alpha\), we then have

\[\begin{array}{lcl} A(I) & \le & \alpha \cdot LB(I) \\ & \le & \alpha \cdot OPT(I). \end{array}\]

Using the idea of lower-bounding \(OPT(I)\), we can now determine the performance of list scheduling.

List scheduling is a \((2-1/m)\)-approximation algorithm for Parallel Machine Scheduling.

Consider the following two lower bounds for the optimum load \(OPT(I)\):

the maximum processing time \(p = \max_j p_j,\)

the average load \(L = \sum_j p_j/m.\)

The maximum processing time \(p\) is clearly a lower bound, as the machine to which the corresponding job is assigned requires at least time \(p\) to complete its tasks. To see that the average load is a lower bound, note that if all of the machines could complete their assigned tasks in less than time \(L\), the maximum load would be less than the average, which is a contradiction. Now suppose machine \(m_i\) has the maximum load \(c_{\max}\), and let job \(j\) be the last job that was assigned to \(m_i\). At the time job \(j\) was assigned, \(m_i\) must have had the minimum load, namely \(L_i = c_{\max} - p_j\), since list scheduling assigns each job to the least loaded machine. Every machine therefore finishes with load at least \(L_i\), so the total processing time satisfies

\[\begin{array}{lcl} \sum\limits_{k} p_k & \ge & m L_i + p_j \\ & = & m (c_{\max} - p_j) + p_j. \end{array}\]

Therefore,

\[\begin{array}{lcl} OPT(I) & \ge & L \;=\; \frac{1}{m} \sum\limits_{k} p_k \;\ge\; \frac{1}{m} \left( m (c_{\max} - p_j) + p_j \right) \\ & = & c_{\max} - (1 - 1/m)\, p_j, \end{array}\]

and rearranging, using \(p_j \le p \le OPT(I)\),

\[\begin{array}{lcl} c_{\max} & \le & OPT(I) + (1 - 1/m)\, p_j \\ & \le & OPT(I) + (1 - 1/m)\, OPT(I) \\ & = & (2 - 1/m)\, OPT(I). \end{array}\]

The solution returned by list scheduling is \(c_{\max}\), and thus list scheduling is a \((2-1/m)\)-approximation algorithm for Parallel Machine Scheduling.

The example of \(m(m-1)\) jobs of size \(1\) followed by one job of size \(m\) on \(m\) machines shows that this analysis is tight: list scheduling may produce a schedule of load \((2-1/m) \cdot OPT(I)\).
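We can check this tight instance numerically; a small self-contained sketch (the choice \(m = 4\) and the job order, unit jobs first, are illustrative assumptions):

```python
import heapq

def makespan(jobs, m):
    # List scheduling (always the least loaded machine), returning the maximum load.
    loads = [0] * m
    heapq.heapify(loads)
    for p in jobs:
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

m = 4
# m(m-1) unit jobs followed by one job of size m.
bad = [1] * (m * (m - 1)) + [m]
print(makespan(bad, m))  # list scheduling: each machine first gets m-1 unit
                         # jobs, then the big job lands on top: 2m - 1 = 7
```

The optimum is \(m\): put the big job alone on one machine and \(m\) unit jobs on each of the others. Feeding the big job first instead recovers that optimum, which is why the bad order matters for the lower bound.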

The obvious question to now ask is how good an \(\alpha\) we can obtain.

A **polynomial approximation scheme (PAS)** is a set of algorithms \(\{ A_\varepsilon \}\) for which each \(A_\varepsilon\) is a polynomial-time \((1+\varepsilon)\)-approximation algorithm.

Thus, given any \(\varepsilon > 0\), a PAS provides an algorithm that achieves a \((1+\varepsilon)\)-approximation. In order to devise a PAS, we can use a method called \(k\)-enumeration.

An approximation algorithm using **\(k\)-enumeration** finds an optimal solution for the \(k\) most important elements of the problem and then uses an approximate polynomial-time method to solve the remainder of the problem.

We can do the following:

Enumerate all possible assignments of the \(k\) largest jobs.

For each of these partial assignments, list schedule the remaining jobs.

Return as the solution the assignment with the minimum load.
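The three steps can be sketched as follows; the function names and the use of a heap are illustrative choices, not part of the lecture:

```python
import heapq
from itertools import product

def list_schedule_from(loads, jobs):
    # List schedule jobs on top of the given starting machine loads.
    loads = list(loads)
    heapq.heapify(loads)
    for p in jobs:
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

def k_enumeration(jobs, m, k):
    """Enumerate every assignment of the k largest jobs to the m machines,
    list schedule the remaining jobs on top of each partial assignment,
    and return the minimum load found."""
    jobs = sorted(jobs, reverse=True)
    big, rest = jobs[:k], jobs[k:]
    best = float('inf')
    # m^k partial assignments of the k largest jobs.
    for assignment in product(range(m), repeat=len(big)):
        loads = [0] * m
        for p, i in zip(big, assignment):
            loads[i] += p
        best = min(best, list_schedule_from(loads, rest))
    return best
```

Because every assignment of the large jobs is tried, the optimal placement of those jobs is always among the candidates.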

Note that in enumerating all possible assignments of the \(k\) largest jobs, the algorithm will always find the optimal assignment for these jobs. The following claim demonstrates that this algorithm provides us with a PAS.

For any fixed \(m\), \(k\)-enumeration yields a polynomial approximation scheme for Parallel Machine Scheduling.

Let us consider the machine \(m_i\) with the maximum load \(c_{\max}\), and the last job assigned to \(m_i\).

If this job is among the \(k\) largest, then it is scheduled optimally, and \(c_{\max}\) equals \(OPT(I)\).

If this job is not among the \(k\) largest, its processing time is at most \(p_{k+1}\), the \((k+1)\)-th largest processing time. As in the analysis of list scheduling, the machine's load before this job was assigned is at most the average load, and hence at most \(OPT(I)\). Therefore, \[A(I) \le OPT(I) + p_{k+1}.\]

However, \(OPT(I)\) can be bounded from below in the following way: \[OPT(I) \ge \frac{k p_k}{m},\]

because the \(k\) largest jobs each have processing time at least \(p_k\), so even scheduling only these jobs forces an average load, and hence an optimal load, of at least \(k p_k / m\).

Now we have: \[\begin{array} {lcl} A(I) & \le & OPT(I) + p_{k+1} \\ & \le & OPT(I) + OPT(I) \frac{m}{k} \\ & = & OPT(I) \left( 1+\frac{m}{k} \right). \end{array}\]

Given \(\varepsilon > 0\), if we let \(k = \lceil m/\varepsilon \rceil\), then \(m/k \le \varepsilon\) and we get \[c_{\max} \le (1+\varepsilon) OPT(I).\]

Finally, to determine the running time of the algorithm, note that because each of the \(k\) largest jobs can be assigned to any of the \(m\) machines, there are \(m^k = m^{m/\varepsilon}\) possible assignments of these jobs. Since the list scheduling performed for each of these assignments takes \(O(n)\) time, the total running time is \(O(n m^{m/\varepsilon})\), which is polynomial because \(m\) is fixed. Thus, given an \(\varepsilon >0\), the algorithm is a \((1+\varepsilon)\)-approximation, and so we have a polynomial approximation scheme.

Consider the PAS in the previous section for \({\rm P \parallel C_{\max}}\). The running time for the algorithm is prohibitive even for moderate values of \(\varepsilon\). The next level of improvement, therefore, would be approximation algorithms that run in time polynomial in \(1/ \varepsilon\), leading to the definition below.

A **fully polynomial approximation scheme (FPAS)** is a set of approximation algorithms \(\{ A_\varepsilon \}\) such that each \(A_\varepsilon\) runs in time polynomial in the size of the input as well as in \(1/\varepsilon\).

Few \(NP\)-complete problems allow for an FPAS. Below we discuss an FPAS for the Knapsack problem.

The Knapsack problem receives as input an instance \(I\) of \(n\) items with profits \(p_i\), sizes \(s_i\), and a knapsack size (or capacity) \(B\). The output is a subset \(S\) of items of total size at most \(B\) that maximizes the total profit:

\[\max \sum\limits_{i \in S} p_i.\]

Suppose now that the profits are integers. Let \(M(r,p)\) denote the minimum total size of a subset of the items \(1, \, 2, \, \ldots, \, r\) with profit exactly \(p\); then we can write a dynamic programming (\(DP\)) algorithm using the recurrence

\[M(r,p) = \min \left\{ M(r-1,p), M(r-1, p-p_r) + s_r \right\}.\]

The corresponding table of values can be filled in time \(O\left(n \sum_i p_i \right)\). Note that this is not an FPAS in itself, since \(\sum_i p_i\) can be exponential in the input size.
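A sketch of this DP in Python, assuming integer profits; the two-dimensional table \(M(r,p)\) is collapsed into a single array updated in decreasing \(p\):

```python
def knapsack_max_profit(profits, sizes, B):
    """DP over achievable profits: M[p] = minimum total size of a subset
    of the items seen so far whose profit is exactly p.
    Profits must be integers."""
    INF = float('inf')
    P = sum(profits)
    M = [INF] * (P + 1)
    M[0] = 0
    for p_r, s_r in zip(profits, sizes):
        # Iterate downwards so each item is used at most once.
        for p in range(P, p_r - 1, -1):
            M[p] = min(M[p], M[p - p_r] + s_r)
    # Largest profit whose minimum size fits in the knapsack.
    return max(p for p in range(P + 1) if M[p] <= B)
```

The two loops visit each (item, profit) pair once, matching the \(O\left(n \sum_i p_i\right)\) bound.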

Now, we consider the general case, where the profits are not assumed to be integers. Once again we use a rounding technique, one that can be considered a generic approach for developing an FPAS for other \(NP\)-complete problems that allow for one. Suppose we multiplied all profits \(p_i\) by \(n/(\varepsilon \cdot OPT)\); the new optimal objective value is then \(n/\varepsilon\). Now round the profits down to the nearest integer: each profit decreases by less than \(1\), so the optimal objective value decreases by at most \(n\) in the scaled units; expressed in the original units, the decrease is at most \(\varepsilon \cdot OPT\). Using the \(DP\) algorithm above on the rounded instance, whose optimal value is at most \(n/\varepsilon\), we can find its optimal solution in \(O(n^2/\varepsilon)\) time, thus providing us with an FPAS.
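A sketch of the whole scheme in Python. Since \(OPT\) is not known in advance, the sketch substitutes the computable lower bound \(P_{\max} = \max_i p_i\) (over items that fit) for \(OPT\) in the scaling factor; this is a standard stand-in, not the exact scaling used above, and the \((1-\varepsilon)\) guarantee still follows because \(OPT \ge P_{\max}\):

```python
import math

def _min_size_table(profits, sizes):
    """Exact DP for integer profits: M[r][p] = minimum total size of a
    subset of items 0..r-1 with profit exactly p (inf if unachievable)."""
    n, P = len(profits), sum(profits)
    INF = float('inf')
    M = [[INF] * (P + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for r in range(1, n + 1):
        p_r, s_r = profits[r - 1], sizes[r - 1]
        for p in range(P + 1):
            M[r][p] = M[r - 1][p]
            if p >= p_r and M[r - 1][p - p_r] + s_r < M[r][p]:
                M[r][p] = M[r - 1][p - p_r] + s_r
    return M

def knapsack_fpas(profits, sizes, B, eps):
    """Scale profits so the optimum becomes about n/eps, round down,
    solve the rounded instance exactly, and return the chosen subset's
    true (unrounded) profit."""
    n = len(profits)
    # Computable lower bound on OPT (assumes every listed item may fit alone).
    P_max = max(p for p, s in zip(profits, sizes) if s <= B)
    scale = n / (eps * P_max)
    rp = [math.floor(p * scale) for p in profits]
    M = _min_size_table(rp, sizes)
    P = sum(rp)
    best_p = max(p for p in range(P + 1) if M[n][p] <= B)
    # Traceback to recover the chosen items.
    items, p = [], best_p
    for r in range(n, 0, -1):
        if M[r][p] != M[r - 1][p]:  # item r-1 was taken
            items.append(r - 1)
            p -= rp[r - 1]
    return sum(profits[i] for i in items)
```

For illustration, on the instance with profits \((60, 100, 120)\), sizes \((10, 20, 30)\), and \(B = 50\), the scheme with \(\varepsilon = 0.1\) recovers the exact optimum \(220\).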