Fingerprints by Polynomials
Good for fingerprinting “composable” data objects.
- check if $P(x)Q(x)=R(x)$
- $P$ and $Q$ of degree $n$ (means $R$ of degree at most $2n$)
- mult in $O(n\log n)$ using FFT
- evaluation at fixed point in $O(n)$ time
- Random test:
- $S \subseteq F$
- pick random $r \in S$
- evaluate $P(r)Q(r)-R(r)$
- suppose this poly not 0
- then degree $2n$, so at most $2n$ roots
- thus, prob (discover root) $\le 2n/|S|$
- so, eg, sufficient to pick random int in $[0,4n]$
- Note: no prime needed (but needed for $Z_p$ sometimes)
- Again, major benefit if polynomial implicitly specified.
Small problem:
- degree $n$ poly can generate huge values from small inputs.
- Solution 1:
- If poly is over $Z_p$, can do all math mod $p$
- Need $p$ exceeding coefficients, degree
- doesn't change number of roots
- $p$ need not be random---pick once in advance
- Solution 2:
- Work in $Z$, deduce nonzero value from schwartz-zippel
- deduce nonzero mod random $q$ (as in string matching)
- so do all computation mod random $q$
- $q$ range must exceed bits (not value) of coeff.
- Again, major benefit if polynomial implicitly specified.
String checksum:
- treat as degree $n$ polynomial
- eval a random $O(\log n)$ bit input,
- prob. get 0 small
Multivariate:
- $n$ variables
- degree of term: sum of vars' degrees
- total degree $d$: max degree of term.
- Schwartz-Zippel: fix $S \subseteq F$ and let each $r_i$ random in $S$ \[ \Pr[Q(r_i)=0 \mid Q \ne 0] \le d/|S| \] Note: no dependence on number of vars!
- induction. Base done.
- $Q \ne 0$. So pick some (say) $x_1$ that affects $Q$
- write $Q=\sum_{i \le k} x_1^iQ_i(x_2,\ldots,x_n)$ with $Q_k() \ne 0$ by choice of $k$
- $Q_k$ has total degree at most $d-k$
- By induction, prob $Q_k$ evals to 0 is at most $(d-k)/|S|$
- suppose it didn't. Then $q(x)=\sum x_1^i Q(r_2,\ldots,r_n)$ is a nonzero univariate poly.
- by base, prob. eval to 0 is $k/|S|$
- add: get $d/|S|$
- why can we add? $$ \begin{eqnarray*} \Pr[E_1] &= &\Pr[E_1 \cap \overline{E_2}]+\Pr[E_1 \cap E_2]\\ &\le & \Pr[E_1 \mid \overline{E_2}] + \Pr[E_2] \end{eqnarray*} $$
Perfect matching
- Define
- by max-flow techniques, get $m\sqrt{n} \le n^{2.5}$
- Edmonds matrix: variable $x_{ij}$ if edge $(u_i,v_j)$
- determinant nonzero if PM
- poly nonzero symbolically.
- so apply Schwartz-Zippel
- Degree is $n$
- So number $r\in(1,\ldots,n^2)$ yields 0 with prob. $1/n$
- Determinant can be computed with $n^{2.376...}$ ops.
Wait, det may be huge!
- Shwartz-Zippel applies in any field
- So pick work mod $p$, with $p>n^2$ to keep failure prob. small
- Note all coefficients of determinant polynomial are 1
- So determinant polynomial is still (symbolically) nonzero mod $p$
Finding the Matching
We now have a way to decide if a matching exists.
- But can we find it?
Self reducibility
- General technique for solving construction/optimization via decision
- Use repeated calls to decision to “steer towards” output solution
For perfect matching
- Remove an edge
- Check if still have perfect matching.
- If not, put it back
- When done, all that's left is perfect matching.
- time is $m$ determinants
- not great time, but really simple algorithm!
Network Coding
A communication task:
- $2n$-vertex bipartite graph with source and sink vertices
- source has $n$ messages to send to $t$
- each edge can carry one message in one time step
- one solution: send along perfect matching
- requires computing perfect matching
Coding solution
- think of messages as integers mod $p$
- $s$ sends one message to each LHS
- each LHS sends its message to all RHS neighbors
- each RHS neighber sends $t$ a random linear combination of what it got, mod $p$
- can $t$ “decode” the message?
- yes (assuming it knows the random coefficients).
- What is the linear operator mapping $n$ inputs (from $s$) to $n$ outputs (at $t$)?
- it's the Edmonds matrix of the bipartite graph, with random coefficients in it
- we proved this has nonzero determinant whp
- which means it is invertible
- so $t$ can invert the linear function to reconstruct input.
Multiple receivers
- same bipartite core
- $s$ has $k< n$ messages
- multiple $t_i$ each connected to $k$ distinct RHS vertices
- can send to any one $t_i$ over a perfect matching to that $t_i$
- but coding argument above says RLC also sends to $t_i$ ($k\times k$ transfer matrix)
- but this is true for all receivers simultaneously
- since each can receive, all can receive.
This is the key idea of network coding. It generalizes to arbitrary communication graphs.