Lecture 14: Algorithms for Solving LPs
LP Solving Algorithms
So far we learned a lot about the structure of LPs. But how do we actually solve them? Can we do it faster than by just enumerating all basic feasible solutions (BFS)?
 Simplex  1940s.
 Ellipsoid method  1970s.
 Interior point method  1980s.
Simplex
 Recall that we know that for an LP (in standard form) opt always corresponds to a vertex of a polytope (a BFS).
 Imagine we are given some BFS, how to check if it is optimal?
 Naive approach: check all other BFS' and compare their costs.
$\Rightarrow$ Problem: there can be $m \choose n$ of them!
 But do we really need to look at all the other BFS'?
 No! Considering only “adjacent' BFS, i.e., BFS that differ only on one tight constraint, suffices. (Suggestive picture.)
$\Rightarrow$ Need to consider only at most $m$ possibilities.
 Greedy approach: Check if the current BFS is optimal (by checking the neighbors), if not, move to any of these BFS' that has better objective value.
 Potential problems:
 Will it terminate?
 How to start the initial BFS? Isn't it as hard as optimization in the first place?
 Time?
 Would hope that you don't need to visit too many of the BFS. Not true! KleeMinty cube (a slightly twisted hypercube)  best pivot rule (improving the max possible) visits! Similar examples for other pivot choice rules (but some of them are “only” subexponential).
 We can't prove though that there is no pivot choice rule that leads to always fast simplex.
 Hirch conjecture (1957): diameter/number of pivots needed is at most $m+n$ (very optimistic!) Disproved by Santos in 2010 but only barely so: diameter $\geq (1+\eps)(m+n)$.
 Still possible that, e.g., diameter $\leq poly(m,n)$.
 Best upper bounds $m^{O(\log n)}$, $2^nm$.
 There is a linear time simplex for fixed $n$!
 Note small diameter does not guarantee fast simplex pivot rule  as some of the pivots might require increasing the value of the objective from time to time!
 In practice: works really really well (unless it doesn't). (Needs some linear algebra tricks to make pivoting fast.)
Ellipsoid Method
Consider the linear program $$ \begin{eqnarray*} &&\min \sum_j x_j \\ \sum_j a_{ij} x_j &\geq &1 \;\;\; \forall i\\ x_j &\geq &0 \;\;\; \forall j \end{eqnarray*} $$ and its dual $$ \begin{eqnarray*} &&\max \sum_i y_i \\ \sum_i a_{ij} y_i &\leq &1\\ x_{i}&\geq &0. \end{eqnarray*} $$ Assume that $A=[a_{ij}]$ is $m\times n$ and has only {\it nonnegative} entries.
In this problem, you'll have to show that a continuous algorithm solves (almost miraculously) the above pair of dual linear programs. We shall define a series of functions whose argument is the “time” and you'll show that some of these functions tend to the optimal solution as time goes to infinity. (For simplicity of notation, we drop the dependence on the time.)
 Initially, we let $s_j=0$ for $j=1,\ldots, n$ and $LB=0$. The vector $s$ will (sort of) play the role of primal solution, and $LB$ the role of a lower bound on the objective function.
 At any time, let $$t_i=e^{\sum_j a_{ij} s_j}$$ for $i=1,\ldots, m$. Also, let $d_j=\sum_i a_{ij} t_i$ for $j=1,\ldots,n$, $D=\max_j d_j$ and $k$ be an index $j$ attaining the maximum in the definition of $D$. The algorithm continuously increases $s_k$.

Let $\alpha=\min_i(\sum_j a_{ij} s_j)$. Let $x_j=s_j/\alpha$ for $j=1,\ldots,n$, $y_i=t_i/D$ for $i=1,\ldots,m$ and $LB=\max(LB,\frac{\sum t_i}{D})$. Show that $x$ is primal feasible, $y$ is dual feasible and $LB$ is a lower bound on the optimal value of both primal and dual.

Prove that $$ \sum_{i=1}^m t_i \leq m e^{\sum_{j=1}^n s_j/LB}.$$
Hint: Show that initially the inequality holds and that it is also maintained whenever we have equality.
 Deduce from (b) that $\sum_i t_i$ tends to 0 as time goes to infinity.
 Using (b), give an upper bound on the value of the primal solution $x$, and using (c), show that this upper bound tends to $LB$ as time goes to infinity. This shows that as time goes to infinity, both $x$ and $y$ tend to primal and dual optimal solutions!
Reduce optimization to feasibility checking. (Remind how it is with polytope $P$.)
Lion hunting in the desert.
 bolzano wierstrauss theoremproves certain sequence has a subsequence with a limit by repeated subdividing of intervals to get a point in the subinterval.
 The BolzanoWeierstrass method: Divide the desert by a line running from north to south. The lion is then either in the eastern or in the western part. Let's assume it is in the eastern part. Divide this part by a line running from east to west. The lion is either in the northern or in the southern part. Let's assume it is in the northern part. We can continue this process arbitrarily and thereby constructing with each step an increasingly narrow fence around the selected area. The diameter of the chosen partitions converges to zero so that the lion is caged into a fence of arbitrarily small diameter.
 Analogue of a fence: separation oracle. Given point $x$ that is not in $P$ come up with a separating hyperplane, i.e., $u$ such that $u^T x \leq 0$ but $y^Tx' > 0$ for all $x'\in P$.
 We want to use ellipsoid instead of boxes.
 Ellipsoid $E(D,z):=\{x \mid (xz)^TD^{1}(xz) \leq 1 \}$, where $D=BB^T$ for some invertible matrix $B$.
 Note: $E(r\cdot I,z)$ is just an radius $r$ sphere centered at $z$.
 In general: $E(D,z)$ is a mapping of a unit sphere via an affine change of coordinates $x\rightarrow Bx +z$.
 Goal: start with a big ellipsoid that encompasses everything (for sure).
 In each step, query the center of the ellipsoid $z$. If $z\in P$ done. Otherwise, there is a separating hyperplane $u$. Shift it so it passes through the center and draw a smaller ellipsoid that encompasses the half in which the feasible point might lie.
 How to measure progress? Keep track of the volume of the ellipsoid. Want it to shrink significantly in each step.
 Pictorial proof (displacement $\epsilon$).
 Constant factor reduction $(1\eps)$ vs. $1/\sqrt{1\epsilon^2}^n\approx 1/(1+\frac{1}{2}\epsilon^2)^{n1}$ when $\epsilon=\frac{1}{n}$.
 Is it enough? Small and big ellipsoid.
 Running time polynomial (but high)
 strongly polynomial?
 Separation oracle!
vertices in standard form/bases:

Without loss of generality make $A$ have full row rank (define):
 find basis in rows of $A$, say $a_1,\ldots,a_k$
 any other $a_\ell$ is linear combo of those.
 so $a_\ell x = \sum \lambda_i a_i x $
 so better have $b_l = \sum \lambda_i a_i$ if any solution.
 if so, anything feasible for $a_1,\ldots,a_\ell$ feasible for all.
 $m$ constraints $Ax=b$ all tight/active
 given this, need $nm$ of the $x_i \ge 0$ constraints
 also, need them to form a basis with the $a_i$.
 write matrix of tight constraints, first $m$ rows then identity matrix adjacent to zero matrix \[ \begin{array}{l} \\ \\ m\text{ constraint rows} \\ \\ \\ \\ \\ \\ n\text{ row identity} \\ \\ \\ \end{array} \left( \begin{array}{lccccccc} &\multicolumn{4}{c}{nm\text{ columns}}&\multicolumn{3}{c}{m\text{ columns}}\\ &\multicolumn{4}{c}{\overbrace{\qquad\qquad\qquad}}&\multicolumn{3}{c}{\overbrace{\qquad\qquad}}\\ \\ &\multicolumn{3}{c}{\cdots}&A&\multicolumn{3}{c}{\cdots}\\ \\ &1&\\ &&1&&&&0\\ &&&1\\ &&&&1\\ &&&&&1\\ &&0&&&&1\\ &&&&&&&1\\ \end{array} \right)x \begin{array}{c} \\ \\ = \\ \\ \\ \\ \ge \\ \\ \\ = \\ \end{array} \left( \begin{array}{c} \\ \\ \\ b \\ \\ \\ \\ 0 \\ \\ \\ 0 \\ \\ \end{array} \right) \]
 zero matrix corresponds to slack (nonzero) $x_i$
 need linearly independent rows
 equiv, need linearly independent columns
 but columns are linearly independent iff $m$ columns of $A$
including all that correspond to nonzero $x_i$ are linearly
independent
 because columns of identity matrix are clearly independent
 but columns of zero matrix depend entirely on $A$ for independence
 gives other way to define a vertex: $x$ is vertex if
 $Ax=b$
 $m$ linearly independent columns of $A$ include all $x_j \ne 0$
 $x_j$ of columns called basic set $B$, others nonbasic set $N$
 given bases, can compute $x$:
 $A_B$ is basis columns, $m \times m$ and full rank.
 solve $A_B x_B = b$, set other $x_N=0$.
 note can have many bases (column sets) for same vertex (choice of 0 $x_j$)
 note algebra is $m$dimensional, so really depends only on number of constraints not variables
Summary: $x$ is vertex of $P$ if for some basis $B$,
 $x_N=0$
 $A_B$ nonsingular
 $A_B^{1} b \ge 0$
Simplex method:
 start with a basic feasible soluion
 try to improve it
 picture: move on improving edge.
 math: work relative to current $x$
 rewrite LP: $\min c_Bx_B + c_N x_N$, $A_Bx_B+A_N x_N=b$, $x \ge 0$
 true for all $x$, not just current
 $B$ is basis for bfs
 since $A_Bx_B = bA_Nx_N$, so $x_B = A_B^{1}(bA_Nx_N)$, know that $$ \begin{eqnarray*} cx &= &c_Bx_B+c_Nx_N\\ &= & c_B A_B^{1}(bA_Nx_N) + c_N x_N\\ &= & c_B A_B^{1} b + (c_Nc_BA_B^{1}A_N)x_N\\ \end{eqnarray*} $$
 reduced cost $\tilde c_N = c_Nc_BA_B^{1}A_N$
 if no $\tilde c_j < 0$, then increasing any $x_j \in N$ increases (nondecreases) cost (may violate feasiblity for $x_B$, but who cares?), so are at optimum!
 if some $\tilde c_j < 0$, can increase $x_j$ to decrease cost
 but since $x_B$ is func of $x_N$, will have to stop when $x_B$
hits a constraint.
 this happens when some $x_i$, $i \in B$ hits 0.
 we bring $j$ into basis, take $i$ out of basis.
 we've moved to an adjacent basis.
 called a pivot
 show picture
Notes:
 Need initial vertex. How find? HW.
 maybe some $x_i \in B$ already 0, so can't increase $x_j$, just pivot to same obj value.
 could lead to cycle in pivoting, infinite loop.
 can prove exist noncycling pivots (eg, lexicographically first $j$ and $i$)
 no known pivot better than exponential time
Paths length in polytopes
 note traverse path of edges over polytope. Unknown what shortest such path is
 Hirsh conjecture (1957): path of $mn$ pivots exists.
 true for $n< 4$
 Disproved by Santos 2010: $(1+\epsilon)(mn)$
 KalaiKleitmen 1992: $m^{\log n}$ bound on path length, also $2^n m$.
 Also can do simplex in linear time for fixed $n$
 even if true, simplex might be bad because path might not be monotone in objective function.
 Upper bounds hold for highly abstracted version:
 Eisenbrand, Hahnle, Razborov, Rothvoss
 vertices are $n$subsets of $m$
 edges must ensure path from $u$ to $v$ whose vertices all contain $u \cap v$
 Even in this abstracted version, only quadratic lower bound
Simplex and Duality
 defined reduced costs of nonbasic vars $N$ by \[ {\tilde c}_N = c_Nc_BA^{1}_BA_N \] and argued that when all ${\tilde c}_N \ge 0$, had optimum.
 Define $y=c_BA^{1}_B$ (so of course $c_B=yA_B$)
 nonegative reduced costs means $c_N \ge yA_N$
 put together with $c_B=yA_B$, see $yA \le c$ so $y$ is dual feasible
 but, $yb = c_B A^{1}_B b = c_Bx_B=cx$ (since $x_N=0$)
 so $y$ is dual optimum.
 simplex finds primal and dual optima simultaneously
 more generally, $ybcx$ measures duality gap for current solution!
 another way to prove duality theorem: prove there is a terminating (non cycling) simplex algorithm.
Polynomial Time Bounds
We know a lot about structure. And we've seen how to verify optimality in polynomial time. Now turn to question: can we solve in polynomial time?
Yes, sort of (Khachiyan 1979):
 polynomial algorithms exist
 strongly polynomial unknown.
Ellipsoid
Lion hunting in the desert.
 bolzano wierstrauss theoremproves certain sequence has a subsequence with a limit by repeated subdividing of intervals to get a point in the subinterval.
 The BolzanoWeierstrass method: Divide the desert by a line running from north to south. The lion is then either in the eastern or in the western part. Let's assume it is in the eastern part. Divide this part by a line running from east to west. The lion is either in the northern or in the southern part. Let's assume it is in the northern part. We can continue this process arbitrarily and thereby constructing with each step an increasingly narrow fence around the selected area. The diameter of the chosen partitions converges to zero so that the lion is caged into a fence of arbitrarily small diameter.
Define an ellipsoid
 generalizes ellipse
 write some $D=BB^T$ “radius”
 center $z$
 point set $\set{(xz)^TD^{1}(xz) \le 1}$
 note this is just a basis change of the unit sphere $x^2 \le 1$.
 under transform $x \rightarrow Bx+z$
Outline of algorithm:
 solve feasibility, which we know is equivalent to optimizing
 goal: find a feasible point for $P=\set{Ax \le b}$
 start with ellipse containing $P$, center $z$
 check if $z \in P$
 if not, use separating hyperplane to get 1/2 of ellipse containing $P$
 find a smaller ellipse containing this 1/2 of original ellipse
 until center of ellipse is in $P$.
Consider sphere case, separating hyperplane $x_1=0$
 try center at $(\epsilon,0,0,\ldots)$
 Draw picture to see constraints
 requirements:
 $d_1^{1}(x_1\epsilon )^2+\sum_{i > 1}d_i^{1}x_i^2 \le 1$
 constraint at $(1,0,0)$: $d_1^{1}(x\epsilon )^2 = 1$ so $d_1 = (1\epsilon )^2$
 constraint at $(0,1,0)$: $\epsilon ^2/(1\epsilon )^2+d_2^{1} = 1$ so $d_2^{1} = 1\epsilon ^2/(1\epsilon )^2\approx 1\epsilon ^2$
 What is volume? about $(1\epsilon )/(1\epsilon ^2)^{n/2}$
 set $\epsilon $ about $1/n$, get $(11/n)$ volume ratio.
Formalize shrinking lemma:
 Let $E=(z,D)$ define an $n$dimensional ellipsoid
 consider separating hyperplane $ax \le az$
 Define $E'=(z',D')$ ellipsoid: $$ \begin{eqnarray*} z' &= &z\frac{1}{n+1}\frac{Da^T}{\sqrt{aDa^T}}\\ D' &= & \frac{n^2}{n^21}(D\frac{2}{n+1}\frac{Da^TaD}{aDa^T}) \end{eqnarray*} $$
 then $$ \begin{eqnarray*} E \cap \set{x \mid ax \le ez} &\subseteq &E'\\ \vol(E') &\le &e^{1/(2n+1)}\vol(E) \end{eqnarray*} $$
 for proof, first show works with $D=I$ and $z=0$. new ellipse: $$ \begin{eqnarray*} z' &=&\frac1{n+1}\\ D' &= & \frac{n^2}{n^21}(I\frac{2}{n+1} I_{11}) \end{eqnarray*} $$ and volume ratio easy to compute directly.
 for general case, transform to coordinates where $D=I$ (using new basis $B$), get new ellipse, transform back to old coordinates, get $(z',D')$ (note transformation don't affect volume ratios.
So ellipsoid shrinks. Now prove 2 things:
 needn't start infinitely large
 can't get infinitely small
Starting size:
 recall bounds on size of vertices (polynomial)
 so coords of vertices are exponential but no larger
 so can start with sphere with radius exceeding this exponential bound
 this only uses polynomial values in $D$ matrix.
 if unbounded, no vertices of $P$, will get vertex of box.
Ending size:
 suppose polytope full dimensional
 if so, it has $n+1$ affinely indpendent vertices
 all the vertices have poly size coordinates
 so they contain a box whose volume is a polysize number (computable as determinant of vertex coordinates)
Put together:
 starting volume $2^{n^{O(1)}}$
 ending volume $2^{n^{O(1)}}$
 each iteration reduces volume by $e^{1/(2n+1)}$ factor
 so $2n+1$ iters reduce by $e$
 so $n^{O(1)}$ reduce by $e^{n^{O(1)}}$
 at which point, ellipse doesn't contain $P$, contra
 must have hit a point in $P$ before.
Justifying full dimensional:
 feasible points form a “lattice” of grid points with bounded number of bits
 take $\set{Ax \le b}$, replace with $P'=\set{Ax \le b+\epsilon}$ for tiny $\epsilon$
 any point of $P$ is an interior of $P'$, so $P'$ full dimensional (only have interior for full dimensional objects)
 $P$ empty (of lattice points) iff $P'$ is (because $\epsilon$ so small)
 can “round” a point of $P'$ to $P$.
Infinite precision:
 built a new ellipsoid each time.
 maybe its bits got big?
 no.
Optimization
 Could use optimization to feasibility transform
 But an easier approach is binary search on value of optimum
 Add constraint that optimum must exceed proposed value
 Vary value
Need to find vertex?
 use rounding technique
Feasibility vs. Separation vs. Optimization
Notice in ellipsoid, were only using one constraint at a time.
 didn't matter how many there were.
 didn't need to see all of them at once.
 just needed each to be represented in polynomial size.
 so ellipsoid works, even if huge number of constraints, so long as have separation oracle: given point not in $P$, find separating hyperplane.
 of course, feasibility is same as optimize, so can optimize with sep oracle too.
 this is on a polytope by polytope basis. If can separate a particular polytope, can optimize over that polytope.
 Another good reason to optimize by binary search on objective:
 just need feasibility oracle/separator
 Might not have separator for combinared primaldual formulation
 especially since dual can have exponentially many variables!
This is very useful in many applications. e.g. network design.
Interior Point
Ellipsoid has problems in practice ($O(n^6)$ for one). So people developed a different approach that has been extremely successful.
What goes wrong with simplex?
 follows edges of polytope
 complex stucture there, run into walls, etc
 interior point algorithms stay away from the walls, where structure simpler.
 Karmarkar did the first one (1984)
 huge media hype not accurate at the time
 now interior point is competitive with Simplex and often better
 arguably known from 1960s as “barrier methods”
 but visibility from Karmarkar set off huge progress
PrimalDual Method
 Solve primal and dual together
 not by combining into one LP
 use current dual to tell you how to improve primal
 and vice versa
 (This idea has become very powerful in nonLP combinatorial optimization also)
Potential Reduction
Potential function:
 Idea: use a (nonlinear) potential function that is minimized at opt but also enforces feasibility
 use gradient descent to optimize the potential function.
 argue make “sufficient progress per step” to finish in poly steps
 Use logarithmic barrier function \[ G(x) = cx\mu(\sum\ln x_j) \] and try to minimize it
 first term aims for optimal objective
 second enforces positivity
 note barrier prevents from ever hitting optimum since $G \rightarrow \infty$ at boundary
 but as discussed above ok to just get close.
 then “round” to a better vertex, which will be opt
Early methods used gradient descent:
 immediately take $\mu$ small so objective dominates
 then use gradient descent to minimize $G_\mu(x)$
More recent, arguably cleaner approach is central path
 for each $\mu$ there is an optimum point
 varying $\mu$ traces out the central path
 Start with $\mu$ huge, objective function term negligible
 Equivalent to “start with a feasible point”
 And requires just as much thinking as simplex, ellipsoid
 but same kind of tricks work
 shrink $\mu$ towards 0
 eventually objective dominates as $\mu \rightarrow 0$
 so we reach opt
 in the continuous domain unconcerned with runtime, this “obviously” works
 we need a discrete version
Characterizing the central path
 fix $\mu$
 central path minimizes $G_\mu(x)$
 imposes condition on gradient $\nabla G(x) = \langle c_i\mu/x_i \rangle$
 but gradient may not be zero
 because we are constrained by $Ax=b$
 so real condition is that gradient is perpendicular to feasible set
 i.e. is a linear combination of rows of $A$
 (because these are the constraint hyperplanes' normal vectors)
 i.e. $yA = \langle c_i\mu/x_i \rangle$
 write $yA+s=c$ where $s_i =\mu/x_i > 0$ are slack variables
 so $y,s$ is dual feasible
 duality gap is $cxyb = (yA+s)x  yb = sx = n\mu$
 conversely, if have feasible $x_i, s_i$ with $x_is_i=\mu$, then we know we are on the central path at parameter $\mu$
 this is just saying each constraint is violated “equally” under proper scaling
How discretize?
 “predictorcorrector method”
 use linear approximation
 compute tangent to central path
 move along it as far as approximation holds (“predictor”)
 conclude $\mu$ has improved on tangent and thus on corresponding central path point
 “correct” back to central path
 each step improves $\mu$
 stop when $\mu$ “close enough” to 0
 I will show predictor step (moving towards better $\mu$)
 corrector step is similar but easier
 just moving back to an optimum of the potential for new $\mu$
Termination
 as with ellipsoid, analyze time needed to get from start to “sufficiently good” $\mu$
 once duality gap smaller than bitprecision of vertices, done
 as we saw before, can “round” to a vertex
 since are closer to opt than secondbest vertex, our starting objective is better than second best
 so we can only round to opt
 thus, sufficient to get $\mu = 2^{O(L)}$
 as with simplex, there's the coldstart problem
 using similar method (designing related LP with easy starting point and opt at real desired starting point) can achieve $\mu=2^L$
 so solving only requires $O(L)$ iterations of improving $\mu$ by a constant
What is the improvement direction?
 From current $x,s, \mu$, update to $x+dx$, $s+ds$ that are “on central path” for new, smaller $\mu$
 central path needs $(s_i+ds_i)(x_i+dx_i) = (\mu+d\mu)$ (same constant $\forall i$)
 i.e. $s_i d(x_i) + d(s_i)x_i = d\mu +$ low order terms
 but also need $A(dx)=0$ (to stay primal feasible)
 and $(dy)A+ds=0$ (to stay dual feasible)
 solve this linear system to get directions $ds$ and $dx$
 these are the tangent for central path
Can we solve this?
 yes: if rescale, can see this as a projection problem
 rescale $A$ so all $x_i=1$, meaning all $s_i = \mu$
 equations now require $$ \begin{eqnarray*} \mu dx + ds & = &\mathbf{1}d\mu\\ Adx &= &0\\ (dy)A+ds &= &0 \end{eqnarray*} $$
 here $\mathbf{1}$ is the allones vector
 in other words, $\mu dx$ in nullspace of $A$ and $ds$ in span of $A$
 but they sum to $\mathbf{1}d\mu$
 so $\mu dx$ and $ds$ are just decomposition of $\mathbf{1}d\mu$ onto $A$ and $A^{\perp}$
 note this defines directions because both $dx$ and $ds$ (and $d\mu$ and $dy$) can be multiplied by same arbitrary scalar
 move in this direction until approximation breaks
How far?
 Simplify
 recall rescaled coordinates until all $x_i=1$
 so all $s_i = \mu$
 and duality gap is $\sum s_i = n\mu$
 rescale $dx_i$ and $ds_i$ so “predicted” change in $\mu$ is $d\mu = \mu$
 i.e. linear approximation predicts each $x_is_i$ decreases by $\mu$
 predicts we close duality gap by moving $dx,ds$
 how far in that direction can we moveparameter $\theta$before approximation breaks?
 we want to be on central path, i.e. $(x_i+\theta dx_i)(s_i+\theta ds_i) = \mu+d\mu$
 actual change is $\theta (x_ids_i + s_idx_i) +\theta^2 ds_idx_i$
 our linear system says these quantities equal for all $i$ to first order (ignoring $dx_ids_i$ term)
 so until our linear approximation breaks down, all $s_ix_i$ are changing by about same amount
 i.e., we are near central path
 approximation breaks when second order term is comparable to first order
 so want $\theta^2dx_ids_i \ll \theta (x_ids_i + s_idx_i)$.
 i.e. $\theta \ll x_i/dx_i + s_i/ds_i$
 we rescaled $x_i=1$ and $s_i=\mu$, so need $\theta \ll \mu/ds_i + 1/dx_i$.
 so how big can $dx_i$ be?
 remember we solved $\mu dx + ds =$ orthogonal decomposition of $d\mu$
 and rescaled so length of $d\mu$ is $\mu$
 i.e. $\mu dx$ is projection of $\mathbf{1}\mu$, so $dx$ is projection of $\mathbf{1}$
 all one's vector has length $\sqrt{n}$
 so any $dx_i \le \sqrt{n}$
 similarly, any $ds_i \le \mu\sqrt{n}$
 so requirement that $\theta \ll \mu/ds_i+1/dx_i$ met by $\theta \ll \mu/(\mu \sqrt{n}) + 1/\sqrt{n} = O(1/\sqrt{n})$
 so we can set $\theta \approx 1/\sqrt{n}$ and have secondorder term negligible
 How much improvement?
 we said if $\theta=1$ we close duality gap to zero (in linear approximation)
 since we only move $1/\sqrt{n}$ we shave a $1/\sqrt{n}$ factor.
 but that means we reduce $\mu$ by a constant in $\sqrt{n}$ steps
 which means we finish in poly$(n)$ steps!
Interior point has recently been revolutionizing maxflow.