Monday, April 26, 2010

Tight bounds for Clock Synchronization [Lenzen, Locher, Wattenhofer PODC '09]

A distributed system consists of autonomous nodes (usually each with its own clock) communicating through a network. Such systems need a common notion of time to run most tasks. For instance, a multiplayer video game on a network needs a way to order actions on different computers that closely matches their actual order. Similar requirements arise in, say, a global network of sensors, or in a system time-sharing a common resource (like TDMA networks). This would be easy if all the nodes of the system shared an identical copy of a clock that always kept perfect time, but real clocks are not perfect and tend to drift. If left uncorrected, the skew (difference) between the clocks of these nodes can grow intolerably large.

One way to improve this situation is to let the nodes maintain a logical clock on top of their hardware clocks. They could communicate their logical clock values and adjust them based on the values of other nodes in the system (according to some algorithm). The main practical challenge is that the hardware clock drift rates and the latency of communication between nodes are variable and unknown to the nodes. At best, the nodes may be aware of some upper bounds on the hardware clock drift rate and on the message latency ($\tau$). Further, an algorithm that adjusts logical clock values can be expected to have certain reasonable properties (the sketch after the list makes the first two concrete):

  1. The largest skew between any two logical clocks in the network (global skew) is small.
  2. The largest skew between the logical clocks of any pair of adjacent nodes in the network (local skew) is small.
  3. The bound on the skew between any pair of nodes grows smoothly as a function of the distance between the nodes (gradient property).
  4. The rate of progress of each logical clock is neither too small nor too large. This ensures that the spacing between the logical times assigned to events at one node compares well with their real-time spacing.
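To make the first two properties concrete, here is a minimal sketch (my own toy code, not from the paper): given the current logical clock values on a network, compute the global and local skew.

    def global_skew(clocks):
        """Largest difference between any two logical clocks (property 1)."""
        return max(clocks.values()) - min(clocks.values())

    def local_skew(clocks, edges):
        """Largest difference across an edge of the network (property 2)."""
        return max(abs(clocks[u] - clocks[v]) for u, v in edges)

    # A path network 0 - 1 - 2 - 3 (diameter D = 3) with sample clock values.
    edges = [(0, 1), (1, 2), (2, 3)]
    clocks = {0: 10.0, 1: 10.5, 2: 10.25, 3: 11.0}
    print(global_skew(clocks))        # 1.0
    print(local_skew(clocks, edges))  # 0.75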

Evidently, the skew depends on the size of the network, and more specifically, on the diameter $D$ of the network. It has been shown that no algorithm can achieve a better bound on the global skew than $D/2$ (assuming $\tau = 1$) [Biaz, Welch '01]. More surprisingly, [Fan, Lynch '04] showed that no algorithm can achieve a local skew better than $\Omega(\log D/ \log\log D)$. While algorithms that match the bound on the global skew were known, matching the local skew bound has been elusive. The paper we will be discussing [LLW-PODC'09, J.ACM'10] solves this problem, and also presents an improved lower bound for the local skew.

In the discussion, we will see a more formal statement of the model. We will discuss why certain simple algorithms, like setting a node's logical clock to the average or the maximum of its neighbors' values, fail. We will then see how ideas from these failed algorithms can be combined into a simple algorithm (the proofs, however, are a bit long) that has optimal global and local skews and a good gradient property. We will also talk about lower bounds on the global and (time permitting) local skews.
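For instance, here is a toy simulation of the "maximum" rule under assumptions made purely for this sketch (synchronous rounds, unit message latency, fixed hardware rates); the paper's model is more general.

    def simulate_max_rule(neighbors, rates, rounds):
        """Each round, every hardware clock advances at its own rate; then each
        node sets its logical clock to the max of its own value and the
        (one round old) values reported by its neighbours."""
        logical = {v: 0.0 for v in neighbors}
        for _ in range(rounds):
            for v in logical:
                logical[v] += rates[v]
            snapshot = dict(logical)
            for v in logical:
                logical[v] = max(snapshot[v],
                                 max(snapshot[u] for u in neighbors[v]))
        return logical

    # Path 0 - 1 - 2; node 0 runs fast.
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    print(simulate_max_rule(neighbors, {0: 1.5, 1: 1.0, 2: 0.5}, rounds=5))

The jumps in this tiny example are mild, but the standard objection to the max rule is that over a network of large diameter, with uncertain latencies, a node can be forced to jump by an amount comparable to the accumulated skew the moment a much larger value reaches it, which is essentially why its local-skew and gradient behaviour is poor.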

Note: In case you want to read the paper, [LLW-PODC'09] contains only the statements of the results. Also, I found the two associated tech reports difficult to follow. Their recent journal article in J.ACM-Jan'10 is much more readable and contains proofs in good detail. Locher and Wattenhofer's earlier paper in DISC'06 is good background reading and helps you appreciate the algorithm better.

Thursday, April 22, 2010

Forgotten hoodie

If you left a dark-coloured striped hoodie with yellow trim in class, I picked it up. You can find it in my office.

- Ryan

Tuesday, April 20, 2010

The intersection of two halfspaces has high threshold degree

Consider a Boolean function $f: \{-1,+1\}^n \to \{-1,+1\}$. It is well known, and also easy to show, that $f$ can be represented exactly by a real polynomial of degree at most $n$. In other words, there exists a real polynomial $p(x)$ such that $p(x) = f(x), \,\, \forall x \in \{-1,+1\}^n$. One could also ask the following question: how well can a function be approximated by a polynomial? There are multiple notions of approximation, and the one we will look at in this talk is called sign representation. We say that a polynomial $p(x)$ sign-represents a Boolean function $f$ if $sgn(p(x)) = f(x), \,\, \forall x \in \{-1,+1\}^n$. The sign degree of a function $f$ is defined as the least degree of a real polynomial $p$ that sign-represents $f$.
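To make the definition concrete, here is a minimal brute-force check (my own toy example, not from the papers under discussion): enumerate all of $\{-1,+1\}^n$ and test whether $sgn(p(x)) = f(x)$ everywhere.

    from itertools import product

    def sign(t):
        return 1 if t > 0 else -1

    def sign_represents(p, f, n):
        """True iff p(x) != 0 and sgn(p(x)) == f(x) for every x in {-1,+1}^n."""
        return all(p(x) != 0 and sign(p(x)) == f(x)
                   for x in product((-1, 1), repeat=n))

    # OR of 3 bits (with +1 = true) is a halfspace, hence has sign degree 1.
    f = lambda x: 1 if any(b == 1 for b in x) else -1
    p = lambda x: sum(x) + 2          # a degree-1 sign representation
    print(sign_represents(p, f, 3))   # True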

Such questions concerning the approximation of functions by polynomials have been studied by mathematics and CS researchers since the 1960s. Some of the earliest papers on these topics are by Newman, and by Minsky and Papert. Although interesting in its own right, this line of work has found numerous applications in areas of CS theory such as communication complexity, quantum computing, and complexity theory; for more details, see the excellent tutorial by Scott Aaronson. Sign representation by polynomials also has a very nice implication for machine learning. Roughly speaking, if a function has sign degree $d$, then it can be learned in time $n^{O(d)}$ under a standard model of learning (the idea is to treat the $n^{O(d)}$ monomials of degree at most $d$ as features, over which the function becomes a halfspace). This connection has been exploited heavily in recent years to obtain learning algorithms for many non-trivial problems. Such algorithms are sometimes called ``perceptron-based algorithms''.

However, one notorious class of functions for which no efficient learning algorithm is known to date is the class of intersections of two halfspaces. A halfspace over $\{-1,+1\}^n$ is a Boolean function of the form $f(x) = sgn(a_0 + \sum_i a_ix_i)$. The intersection of two halfspaces is a function $h(x,y) = f(x) \land g(y)$, where both $f$ and $g$ are halfspaces over $\{-1,+1\}^n$. Clearly, by definition, the sign degree of a halfspace is $1$. Minsky and Papert showed in 1969 that the sign degree of the intersection of two halfspaces is $\omega(1)$. In their paper they also introduced a very important technique, called ``symmetrization'', which can be used to lower-bound the sign degree of certain functions. The next non-trivial improvement came in 2003, when O'Donnell and Servedio showed that the sign degree of the intersection of two halfspaces is $\Omega(\frac{\log n}{\log \log n})$. Although a massive improvement, this still does not rule out sub-exponential time ``perceptron-based'' algorithms for learning the intersection of two halfspaces.
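To see the $\omega(1)$ phenomenon at a toy scale, note that "is the sign degree of $f$ at most $d$?" is a linear-programming feasibility question over the coefficients of $p$. The sketch below is my own illustration (it assumes scipy is available, and the small majority instance is made up); the margin of $1$ is a harmless normalization, since the coefficients can be scaled.

    from itertools import combinations, product
    import numpy as np
    from scipy.optimize import linprog

    def sign_degree_at_most(f, m, d):
        """Feasibility LP: do coefficients c exist with
        f(z) * sum_S c_S * prod_{i in S} z_i >= 1 for all z in {-1,+1}^m?"""
        monomials = [S for k in range(d + 1) for S in combinations(range(m), k)]
        rows, rhs = [], []
        for z in product((-1, 1), repeat=m):
            feats = [int(np.prod([z[i] for i in S])) for S in monomials]
            rows.append([-f(z) * t for t in feats])  # -f(z) * feats . c <= -1
            rhs.append(-1.0)
        res = linprog(np.zeros(len(monomials)), A_ub=rows, b_ub=rhs,
                      bounds=[(None, None)] * len(monomials), method="highs")
        return res.status == 0                       # feasible <=> degree <= d

    maj = lambda x: 1 if sum(x) > 0 else -1          # majority of an odd number of bits
    h = lambda z: 1 if maj(z[:3]) == 1 and maj(z[3:]) == 1 else -1
    print(sign_degree_at_most(h, 6, 1))              # False: MAJ_3 AND MAJ_3 is not a halfspace
    print(sign_degree_at_most(h, 6, 2))              # True at this toy size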

Last year, in an award-winning paper presented at FOCS, Alexander Sherstov showed that this bound is $\Omega(\sqrt{n})$. In a very recent follow-up work, he has also managed to show that the sign degree of the intersection of two halfspaces is in fact $\Theta(n)$. Hence, perceptron-based approaches cannot be used to get any non-trivial algorithm for this class of functions, which in turn means that some new ideas will be required to make progress on the problem of learning the intersection of two halfspaces. On Thursday, we will prove Sherstov's result from FOCS'09, which shows that the sign degree of the intersection of two halfspaces is $\Omega(\sqrt{n})$. The talk will be self-contained.

Tuesday, April 13, 2010

This week, I will present the paper "Regularity lemmas and combinatorial algorithms" by N. Bansal and R. Williams. The paper introduces a new approach to combinatorial algorithms for fundamental problems, exploiting regularity concepts from graph theory and additive combinatorics.
Boolean matrix multiplication (BMM) is a key primitive used in shortest-path problems, transitive closure, and CFG parsing. For n x n Boolean matrices A and B, the problem is to compute the matrix product D over the 'Boolean semiring': D[i,j] = OR_{1<=k<=n} (A[i,k] AND B[k,j]). This problem easily reduces to "algebraic matrix multiplication" (AMM), i.e., the standard matrix product over the reals; in fact, this reduction yields the best known algorithms for BMM! Beginning with Strassen's algorithm (O*(n^{2.8}) time, suppressing polylog factors), AMM has attracted much theoretical attention, since we now have the additional power of subtraction (cancellation). The state of the art along this line is the O*(n^{2.38})-time algorithm of Coppersmith and Winograd (1987). However, there is an alternative line of attack based on "combinatorial algorithms" that treat Boolean matrices as graphs; many of these are based on the "Four Russians" algorithm. This approach has yielded only modest results: the best running time for dense matrices is O(n^3/log^2 n).
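Here are the definition and the easy reduction spelled out as a sketch (my own toy code; numpy is used only for the integer matrix product):

    import numpy as np

    def bmm_definition(A, B):
        """D[i,j] = OR_{1<=k<=n} (A[i,k] AND B[k,j]), straight from the definition."""
        n = len(A)
        return [[int(any(A[i][k] and B[k][j] for k in range(n)))
                 for j in range(n)] for i in range(n)]

    def bmm_via_integers(A, B):
        """The easy reduction to algebraic matrix multiplication: the integer
        product counts the witnesses k, and the OR is just 'count > 0'."""
        return ((np.array(A) @ np.array(B)) > 0).astype(int)

    A = [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
    B = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(bmm_definition(A, B))    # [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
    print(bmm_via_integers(A, B))  # same matrix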
In this talk, I will present a new combinatorial algorithm that asymptotically beats the Four Russians. For this, we will need a strong hammer from graph theory: regularity (and triangle-removal) concepts. More precisely, the algorithm is based on (algorithmic versions of) Szemeredi's regularity lemma, the triangle removal lemma, and Frieze and Kannan's pseudoregularity (weak regularity) lemma. I will first explain these results. Then, I will present two algorithms for BMM. The first is essentially a natural reduction to the triangle removal lemma. The second builds on the first, but exploits weak regularity instead. We will achieve a running time of nearly O*(n^3/(log n)^{2.25}) (suppressing a poly(log log n) factor), which is currently the best known for any combinatorial algorithm. Finally, I will present a subcubic algorithm for the related problem of detecting whether a given graph has any triangles, again using weak regularity. (All these algorithms are randomized and work w.h.p.) The main open questions in this approach are to derandomize the above algorithms and to obtain a truly subcubic time algorithm.
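For contrast with the combinatorial route, here is the standard algebraic test for triangle detection (a folklore reduction, not the paper's regularity-based algorithm): a graph has a triangle iff some edge {i,j} has a common neighbour, i.e., A[i,j] = 1 and (A^2)[i,j] > 0, so fast matrix multiplication makes the test subcubic.

    import numpy as np

    def has_triangle(adj):
        """True iff some edge {i,j} has A[i,j] = 1 and (A^2)[i,j] > 0."""
        A = np.array(adj)
        return bool(((A @ A > 0) & (A == 1)).any())

    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # K_3
    path     = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # P_3, triangle-free
    print(has_triangle(triangle), has_triangle(path))  # True False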
This talk will be mostly self-contained. You may wish to review some basic models of computation (the pointer machine and the word RAM), which are relevant for the problem; due to time constraints, however, I suspect I will mostly gloss over the model-dependent aspects of the algorithm.

Tuesday, April 6, 2010

Poly-logarithmic independence fools $AC^0$ circuits

In the coming Thursday seminar, we will go through the paper ``Poly-logarithmic independence fools $AC^0$ circuits''.

An $AC^0$ circuit is a Boolean circuit made of NOT gates and AND, OR gates of unbounded fan-in. The \emph{size} of the circuit is the number of AND and OR gates, and its \emph{depth} is the maximum number of AND and OR gates on any path from an input to the output.

We say a distribution $\mu$ $\epsilon$-fools a circuit $f$ if $|\mathbf{E}_{x \sim \mu}[f(x)] - \mathbf{E}_{x}[f(x)]| < \epsilon$, where the second expectation is over the uniform distribution. That is, the expected output of $f$ on $\mu$ is ``roughly'' the same as on the uniform distribution.

The problem considered in this paper concerns the power of $r$-wise independence to fool $AC^0$ circuits. Given a constant $\epsilon$, how large does $r$ have to be so that every $r$-wise independent distribution fools all $AC^0$ circuits of size $m$ and constant depth $d$? The main theorem of the paper shows that $r$ poly-logarithmic in $m$ suffices: the author proves that $r = (\log (m / \epsilon))^{O(d^2)}$ is enough. This resolves, up to the gap between $O(d)$ and $O(d^2)$ in the exponent, a conjecture of [Linial-Nisan'90], which asserted that $r = O(\log m)^{d - 1}$ suffices. The only prior progress on this problem was by [Bazzi'07], who showed that $O(\log^2 m)$-wise independent distributions fool depth-2 circuits (DNFs/CNFs).
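To make the two definitions concrete, here is a brute-force toy (the 3-bit distribution and the little circuit below are made up for illustration). In this setting, $\pm 1$-valued bits with uniform marginals are $r$-wise independent exactly when every monomial on at most $r$ coordinates has expectation zero.

    from itertools import combinations, product

    def mono(x, S):
        out = 1
        for i in S:
            out *= x[i]
        return out

    def is_r_wise_independent(points, r):
        """mu is uniform over the listed +-1 points; check that every monomial
        on at most r coordinates averages to 0 over the support."""
        n = len(points[0])
        return all(abs(sum(mono(x, S) for x in points)) < 1e-9
                   for k in range(1, r + 1)
                   for S in combinations(range(n), k))

    def bias(f, points):
        """|E_mu[f] - E[f]| for a 0/1-valued f, mu uniform over points."""
        n = len(points[0])
        e_mu = sum(f(x) for x in points) / len(points)
        e_uni = sum(f(x) for x in product((-1, 1), repeat=n)) / 2 ** n
        return abs(e_mu - e_uni)

    # (x1, x2, x1*x2): pairwise independent, but not 3-wise independent.
    mu = [(a, b, a * b) for a in (-1, 1) for b in (-1, 1)]
    f = lambda x: 1 if (x[0] == 1 or x[1] == 1) and x[2] == 1 else 0
    print(is_r_wise_independent(mu, 2), is_r_wise_independent(mu, 3))  # True False
    print(bias(f, mu))  # 0.125 for this tiny OR-AND circuit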

The (very rough) proof strategy is to find a low-degree polynomial $g$ which approximates $f$ under both the uniform distribution and the given $r$-wise independent distribution. Such a $g$ sets up a bridge from $\mathbf{E}_{\mu}[f]$ to $\mathbf{E}[f]$ via $\mathbf{E}[f] \approx \mathbf{E}[g] = \mathbf{E}_{\mu}[g] \approx \mathbf{E}_{\mu}[f]$; the middle equality holds because an $r$-wise independent distribution agrees with the uniform distribution on the expectation of every monomial of degree at most $r$, hence on every polynomial of degree at most $r$. But since such a $g$ seems hard to find directly (perhaps because it would have to be tailored to the specific distribution), the proof in the paper instead approximates $f$ by low-degree functions $g$ in the following two senses.
(1) [Beigel-Reingold-Spielman'91, Tarui'93] Find a low-degree $g$ (which may take unboundedly large values) such that $\mathbf{Pr}[f \neq g] < \epsilon$ and $\mathbf{Pr}_{\mu}[f \neq g] < \epsilon$.
(2) [Linial-Mansour-Nisan'93] Find a low-degree $g$ such that $\| f - g \|_2^2 < \epsilon$ (a toy illustration of this notion follows below).
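Here is the promised toy illustration of sense (2) (my own example, not the LMN proof): compute the Fourier coefficients of a small function by brute force; by Parseval, the error of the degree-$\le k$ truncation is exactly the Fourier weight above degree $k$.

    from itertools import combinations, product

    def mono(x, S):
        out = 1
        for i in S:
            out *= x[i]
        return out

    def fourier_coeffs(f, n):
        """hat{f}(S) = E_x[f(x) * prod_{i in S} x_i], x uniform on {-1,+1}^n."""
        cube = list(product((-1, 1), repeat=n))
        return {S: sum(f(x) * mono(x, S) for x in cube) / len(cube)
                for k in range(n + 1) for S in combinations(range(n), k)}

    def truncation_error(f, n, k):
        """By Parseval, || f - f^{<=k} ||_2^2 = sum_{|S| > k} hat{f}(S)^2."""
        return sum(c * c for S, c in fourier_coeffs(f, n).items() if len(S) > k)

    and3 = lambda x: 1 if all(b == 1 for b in x) else -1  # AND of 3 bits
    for k in range(4):
        print(k, truncation_error(and3, 3, k))  # 0.4375, 0.25, 0.0625, 0.0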

The proof uses the two results above as building blocks, together with some other clever observations, all of which we will go through in class.

The proof of the [Linial-Mansour-Nisan'93] theorem also uses Fourier-analytic tools, but no background on these will be needed -- I will introduce them in class.