Carnegie Mellon 15-859U: Theory's Greatest Hits, 2009

Tight bounds for Clock Synchronization [Lenzen, Locher, Wattenhofer PODC '09]

2010-04-26T09:17:00.010-04:00

A distributed system consists of autonomous nodes (usually with an individual clock) communicating through a network. Such systems need a common notion of time to run most tasks. For instance, a multi-person video game on a network needs a way to order actions on different computers that is very close to their actual order. Similar requirements might be needed of, say, a global network of sensors or for a system time-sharing a common resource (like TDMA networks). While this may be easy if all the nodes of the system shared an identical copy of a clock that always kept time perfectly, real clocks are not perfect and tend to drift. If left alone uncorrected, the skew (difference) between clocks in these nodes can become too large and intolerable.

One way to improve this situation is to let the nodes have a logical clock on top of their hardware clocks. They could communicate their logical clock values and adjust them based on the values of other nodes in the system (according to some algorithm). The main practical challenges here are that the hardware clock drift rate and the latency of the communications between nodes is variable and unknown to these nodes. At best, the nodes may be aware of some upper bounds on the hardware clock drift rate and message latencies ($\tau$). Further, an algorithm that adjusts logical clock values can be expected to have certain reasonable properties:

The largest skew between any two logical clocks in the network (global skew) is small.
The largest skew between the logical clocks of any pair of adjacent nodes in the network (local skew) is small.
The maximum possible skew between any pair of nodes degrades smoothly as a function of the distance between the nodes (gradient property).
The rate of progress of logical clock is neither too small nor too large. This is to ensure that the spacing between logical times given to events at one node compares well with their real times.

Evidently, the skew depends on the size of network, and more specifically, on the diameter $D$ of the network. It has been shown that no algorithm can achieve a better bound on global skew than $D/2$ (Assume $\tau = 1$) [Biaz, Welch '01]. More surprisingly, [Fan, Lynch '04] showed that no algorithm could achieve a local skew better than $\Omega(\log D/ \log\log D)$. While algorithms that match the bound on global skews were known, matching the local skew bound has been elusive. The paper we will be discussing [LLW-PODC'09, J.ACM'10] solves this problem, and also presents an improved lower bound for local skew.

In the discussion, we will see a more formal statement of the model. We will discuss why certain simple algorithms like setting a node's logical clock to the average or maximum of its neighbors fail. We will then see how ideas from these failed algorithms can be put together to construct a simple algorithm (the proofs, however, are a bit long) that has optimal global and local skews and a good gradient property. We will also talk about lower bounds on global and (time permitting) local skews.

Note: In case you want to read the paper, [LLW-PODC'09] contains only the statements of the results. Also, I found the two associated tech reports diffcult to follow. Their recent journal article in J.ACM-Jan'10 is much more readable and contains proofs in good detail. Locher and Wattenhofer's earlier paper in DISC'06 is a good background reading and helps you appreciate the algorithm better.

Forgotten hoodie

2010-04-22T16:08:00.001-04:00

If you left a dark-coloured striped hoodie in class, with yellow trim, I picked it up. You can find it in my office.

- Ryan

The intersection of two halfspaces has high threshold degree

2010-04-20T01:43:00.006-04:00

Consider a Boolean function $f: \{-1,1\}^n \mapsto \{-1,+1\}$. It is well known and also easy to show that $f$ can be represented exactly by a real polynomial of degree atmost $n$. In other words, there exists a real polynomial $p(x)$, such that $p(x) = f(x), \,\, \forall x \in \{-1,1\}^n$. One could also ask the following question: how well can a function be approximated by a polynomial? There are multiple notions of approximation and the one we will look at in this talk is called sign representation. We say that a polynomial $p(x)$ sign represents a Boolean function $f$, if $sgn(p(x)) = f(x), \,\, \forall x \in \{-1,1\}^n$. The sign degree of a function $f$ is defined as the least degree of a real polynomial $p$, such that $p$ sign represents $f$.

Such questions concerning approximation of functions by polynomials have been studied by Math and CS researchers since the $1960$'s. Some of the earliest papers on these topics are by Newman and Minsky and Papert. Although interesting in its own right, this line of work has found numerous applications in various areas of CS theory such as communication complexity, quantum computing, complexity theory etc. For more details, see this excellent tutorial by Scott Aaronson. Sign representation by polynomials also has a very nice implication for machine learning. Roughly speaking, if a function has sign degree $d$, then it can be learned in time $n^{O(d)}$ under a standard model of learning. This connection has been exploited heavily in recent years to come up with learning algorithms for many non-trivial problems. Such algorithms are sometimes also called ``perceptron based algorithms''.

However, one notorious class of functions for which no efficient learning algorithm is known till date is the class of intersection of two halfspaces. A halfspace over $\{-1,1\}^n$ is a Boolean function defined as $f(x) = sgn(a_0 + \sum_i a_ix_i)$. The intersection of two halfspaces is a function $h(x,y) = f(x) \land g(y)$, where both $f$ and $g$ are halfspaces over $\{-1,1\}^n$. Clearly, by definition the sign degree of a halfspace is $1$. Minsky and Papert in 1969, showed that the sign degree of the intersection of two halfspace is $\omega(1)$. In their paper they also introduced a very important technique, called ``symmetrization'', which can be used to lower bound the sign degree of certain functions. The next non-trivial improvement came in 2003, when O'Donnell and Servedio showed that the sign degree of the intersection of two halfspaces is $\Omega(\frac {\log n} {\log \log n})$. Although a massive improvement, this still does not rule out sub-exponential time ``perceptron based'' algorithms for learning the intersection of two halfspaces.

Last year in an award winning paper presented at FOCS, Alexander Sherstov showed that this bound is $\Omega(\sqrt{n})$. Also, in a very recent follow-up work he has managed to show that the sign degree of the intersection of two halfspaces is in fact $\Theta(n)$. Hence, perceptron based approaches cannot be used to get any non-trivial algorithm for this class of functions. This in turn means that some new ideas will be required to make progress on the problem of learning the intersection of two halfspaces. On Thursday, we will prove Sherstov's result from FOCS'09 which shows that the sign degree of the intersection of two halfspaces is $\Omega(\sqrt{n})$. The talk will be self-contained.

2010-04-13T11:38:00.003-04:00

This week, I will present the paper "Regularity lemmas and combinatorial algorithms" by N. Bansal and R. Williams. This introduces a new approach for combinatorial algorithms for fundamental problems exploiting regularity concepts from graph theory and additive combinatorics.

Boolean matrix multiplication (BMM) is a key primitive used in shortest path problems, transitive closures and CFG parsing. For nxn boolean matrices A and B, the problem is to compute the matrix product D over the 'boolean semiring': D[i,j] = OR_{1<=k<=n} (A[i,k] AND B[k,j]). It is easy to see that this problem easily reduces to the "Algebraic matrix multiplication" (AMM) (the standard matrix product over the reals, for instance); in fact, this provides the best known algorithms for BMM! Beginning with Strassen's algorithm (O*(n^{2.8}) time, suppressing a polylog factor), AMM has attracted much theoretical attention, since we now have the additional power of subtraction(cancellation). The state of the art using this approach is a O*(n^{2.38}) algorithm by Coppersmith and Winograd (1987). However, there is an alternative line of attack based on "combinatorial algorithms" treating boolean matrices as graphs; many of them are based on the "Four Russians" algorithm. This approach has yielded only modest results: the best running time for dense matrices is O(n^3/log^2 n).

In this talk, I will present a new combinatorial algorithm that asympototically beats the Four Russians. For this, we will need a strong hammer from graph theory: regularity (and triangle removal) concepts. More precisely, the algorithm will be based on the (algorithmic versions of) Szemeredi's regularity theorem, the triangle removal lemma and Frieze and Kannan's pseudoregularity (weak regularity) theorem. I will first explain these theorems. Then, I will present two algorithms for BMM. The first is essentially a natural reduction to the triangle removal lemma. The second algorithm builds on the first one, but exploits weak regularity instead. We will achieve a running time of nearly O*(n^3/ (log n)^{2.25}) (suppressing a poly(log log n) factor), which is the currently best known for any combinatorial algorithm. Finally, I will present a subcubic algorithm for the related problem of detecting if a given graph has any triangles, again using weak regularity. (All these algorithms will be randomized, and will work w.h.p.) The main open questions in this approach are to derandomize the above algorithms, and to obtain a truly subcubic time algorithm.

This talk will be mostly self-contained. (You may wish to review the details of some basic models of computation (pointer machine model and the word RAM), which are relevant for the problem, but, due to time constraints, I suspect I will mostly gloss over the model-dependent aspects of the algorithm.)

Poly-logarithmic independence fools $AC^0$ circuits

2010-04-06T11:35:00.008-04:00

In the coming Thursday seminar, we will go through the paper Poly-logarithmic independence fools $AC^0$ circuits.

An $AC^0$ circuit is a Boolean circuit containing NOT, AND, OR gates. We count the number of AND, OR gates as the \emph{size} of the circuit. While the \emph{depth} of a circuit is defined as the maximum number of AND, OR gates on any path from an input to the output.

We say a distribution $\mu$ fools a circuit $f$, if $|\mathbf{E}_{x \sim \mu}[f(x)] - \mathbf{E}_{x}[f(x)]| < \epsilon$. I.e., the expected output of $f$ on $\mu$ is ``roughly'' the same as that on uniform distribution.

The problem considered in this paper is on the power of $r$-independence to fool $AC^0$ circuits. Given $\epsilon$ as a constant, how large has $r$ to be so that any $r$-wise independent distribution fools all the $AC^0$ circuit of size $m$ and a constant depth $d$? The main theorem in the paper shows that $r$ is enough to be poly-logarithmic with $m$. In fact, the author proves that $r = (\log (m / \epsilon))^{O(d^2)}$ suffices. And this resolves a conjecture by [Linial-Nisan'90], which conjectured $r = O((\log m))^{d - 1}$ is enough, up to a gap between $O(d)$ and $O(d^2)$ in the exponent. The only prior progress on this problem was by [Bazzi'07], which showed that $O(\log^2 m)$-wise independent distributions fool depth-2 circuits (DNF/CNF).

The (very rough) approach to prove is to find a low degree polynomial $g$ which approximates the expectation of $f$ under both uniform distribution and a given $r$-wise independent distribution. This $g$ would set up a bridge from $\mathbf{E}_{\mu}[f]$ to $\mathbf{E}[f]$ by $\mathbf{E}[f] \approx \mathbf{E}[g] \approx \mathbf{E}_{\mu}[g] \approx \mathbf{E}_{\mu}[f]$. But since this $g$ seems hard to find (maybe it's hard because it needs to be specific for a specific distribution), the proof in the paper actually approximates $f$ by low-degree functions $g$ in the following two senses.
(1) [Beigel-Reingold-Spielman'91, Tarui'93] Find low-degree (but unbounded) $g$ so that $\mathbf{Pr}[f \neq g] < \epsilon, \mathbf{Pr} _{\mu}[f \neq g] < \epsilon$.
(2) [Linial-Mansour-Nisan'93] Find low-degree $g$ so that $\| f - g \|_2^2 < \epsilon$.

The proof succeeds by the two results above as building blocks and some other clever observations, all of which we will go through in the class.

The proof of [Linial-Mansour-Nisan'93] theorem also uses Fourier analytic tools. But no background on these will be needed -- I will introduce them in the class.

Fully Homomorphic Encryption Scheme

2010-03-29T13:39:00.044-04:00

The paper this week is about how to construct "fully homomorphic" encryption scheme.

Homomorphic encryption is a form of encryption where a specific algebraic operation performed on the ciphertext has a regular algebraic effect on the plain text. Let us take a look at the RSA system. Its public key consists a pair of integers $(n,e)$. The encryption function of a message $m$ is by calculating $m^e (mod n)$. Suppose that for message $m_1$ and $m_2$, we have their encryption $c_1,c_2$. We know that $(m_1*m_2)^e = (m_1)^e * (m_2)^e = c1*c2 (mod n)$. Therefore, the product of two ciphertexts is corresponding to the encryption of the product of the plaintexts. RSA is therefore known to be homomorphic with respect to the multiplication operation. However, the homomorphism is not preserved for addition: adding $c_1$ and $c_2$ is not the encryption of $m_1+m_2$ (or other algebraic operations on $m_1$ and $m_2$).

Since basically every computation on the integer ring can be done by a circuit with addition and multiplication gate, people would like to see an encryption scheme that is homomorphic for both addition and multiplication (on the plain text). This allows us to compute the encryption of arbitrary function on the plain texts by only looking at the ciphertexts. A encryption scheme that is homomorphic with respect to both addition and multiplication is called "fully homomorphic". Prior to this work, it is only known how to construct the encryption scheme that is homomorphic for a single operation (either addition or multiplication) or for "simple" circuits of both addition and multiplication gates. One of the famous early work is due to Boneh-Goh-Nissim which supports function of unlimited addition but only a single multiplication.

In his breakthrough paper in STOC 2010, Gentry constructed a encryption scheme that is fully homomorphic. Its security assumption is based on solving certain lattice problem under worst case. There are mostly two key ideas for constructing such a homomorphic encryption scheme. The first idea is that if the encryption scheme can ensure that it is homomorphic with respect to its own decryption circuit, then we can somehow bootstrap the encryption scheme to make it fully homomorphic. Given the first idea, it remains to construct such a encryption that is bootstrapable. This will require the decryption function to be simple enough. Gentry has some smart ideas of "squashing" the circuit complexity of the decryption function.

As the encryption scheme of the original paper is based on ideal lattice which is quite complex algebraic object, we will instead present a followup work (by Gentry along with Marten van Dijk, Shai Halevi, and Vinod Vaikuntanathan) which simplify the first paper. The link of the new paper is
http://eprint.iacr.org/2009/616.
In this new paper, all the encryption and decryption is implemented by addition and multiplication over integers (instead of using the ideal of the lattice). The two key ideas (bootstrapping and squashing) are still preserved in the new paper.

There are many applications for homomorphic encryption schemes. For example, we can use such an encryption scheme for voting without releasing the privacy of each voter's preference. Basically, every voter encrypts their preference and the voting system executes the voting rules (for e.g., evaluate the majority function) only on the ciphertext. Homomorphic encryption scheme is also believed to have wide applications in the context of cloud computing.

However, the current scheme is still quite impractical. To encrypt one bit, it requires $\eta^8$ bits (I think) where $\eta$ is the security parameter. If we take $\eta$ to be 1000, then it means that to encrypt one bit, we need $10^{24}$ bits. Also, there is a huge loss of efficiency for converting a general algorithm into a circuit of the multiplication and addition gates.

As for the class, I suggest you have a look at the paper at:
http://eprint.iacr.org/2009/616
No other crypto knowledge is needed for this talk.

Intrinstic Robustness of the Price of Anarchy

2010-03-22T13:25:00.013-04:00

Everybody loves cookies. Even us, the students in 15-589U class, love cookies. Thus, for Thursday's class, Ryan, who wishes us all to attend the talk, will bring cookies. Alas, we are 12 students in the class, and Ryan only has 11 cookies. So Ryan is willing to give us the cookies if we all reach a consensus on who gets a cookie. Splitting a cookie is unheard of, and neither is sharing a cookie. (Yes, we all love cookies that much!)

So essentially, each of us has two options: "ask for a cookie" or "not to ask for a cookie". If everybody asks for a cookie, no one gets a cookie, as there as only 11 cookies and 12 students. If 11 or fewer people ask for a cookie, each gets 1 cookie. If you don't ask for a cookie, you always get no cookies. So, how will we reach a consensus?

Well, that depends on how collegial we are. If we care about getting as many cookies as possible as a class, someone steps down, and the rest get 11 cookies. But what if each of us cares only about how many cookies (s)he gets? It is clear that no matter what other students do, each of us is always better off by asking for a cookie! Therefore, we all ask for cookies, and no one gets any! So, even though as a collective, we could gain 11 cookies, we end up getting 0, and everybody's passion for cookies is left unsatisfied.

Let's contemplate a bit about this example. Here is a game, where acting selfishly is globally worse than playing for overall good. In fact, we get a ratio of 11/0, which is infinitely worse. What about other games? Can we measure how much worse will we do by acting selfishly in a given game G? In a set G of games?

This is precisely the question that Koutsoupias and Papadimitriu raised in STOC'99. Suppose we model the game by costs rather than payoffs, so everybody wishes to minimize their costs. How much more do we (altogether) pay for playing selfishly, in comparison to playing for the overall good? In the words of Koutsoupias and Papadimitriu, who considered routing games, how much faster would the internet be if it had a manager? This is the Price of Anarchy, the ratio between the worst cost of a Nash-Equilibrium in the game, to the cost of the social optimum.

Ever since Koutsoupias and Papadimitriu raised the question, there have been numerous attempts at bounding the POA of various classes of games. The question was even extended to other models of players (such as regret-minimizing, myopic best response players etc.). In STOC 2009, Tim Roughgarden introduced the notion of smoothness in games, and showed the following:

(i) Smoothness gives an immediate (and very elegant) upper bound on the POA of a game.
(ii) The same bound applies to other models of players.
(iii) This bound gives the exact POA for the class of congestion games.

In Thursday's talk, we will survey the definition of POA and the various models of players considered in the past. We will then go on to stating in exact and proving all the above 3 results.

As a side note: the first two results are pretty straight-forward, so even people who have no background in Game Theory are welcome to attend the talk.

And just to clarify: Ryan will NOT actually bring cookies to Thursday's class.

2010-03-15T18:41:00.028-04:00

This week I will present a paper from FOCS'09 by Dinur and Harsha on Probabilistically-Checkable Proofs (PCP), "Composition of low-error 2-query PCPs using decodable PCPs" (link). PCP Theorem [Arora-Safra'98] shows that any mathematical proof can be checked probabilistically by a verifier using only small number of queries. If the theorem is true, then there should exists a proof which the verifier accepts always (completeness). If the theorem is false, verifier should accept a proof with at most bounded probability (soundness).

The ground breaking result from [Arora-Safra'98, and Arora-Lund-Motwani+'98] states that any language in NP has a PCP with constant number of queries for which the soundness is bounded by a constant. To appreciate this result, notice that the proof length is essentially unbounded. In this regard, the proof can be seen as a code with very large distance -- one that can detect very small number of changes. A very good survey on the history of PCP theorem and a combinatorial proof of the original result, due to Dinur, can be found in [link].

Formally, PCP verifier V for a language L is a polynomial time algorithm that, given input string x \in {0,1}^n, and oracle access to a proof pi \in Sigma^m(n), reads x, and using r many random bits, computes q indices to query (local window) Ir = {I1, ..., Iq} and a local predicate fr:Sigma^q to {0,1}. Then it accepts iff fr(pi(Ir)) = 1.

(Completeness) If x \in L, then there exists a proof pi which is accepted with probability 1.
(Soundness) If x \notin L, then for any proof pi, Pr[fr(pi(Ir))=1] <= epsilon.

We denote the languages accepted by such PCP verifiers with PCP{1, epsilon} [r,q]_Sigma. Notice that (trivially) PCP{1,epsilon}[O(log n), q]_Sigma \subset NP. However the existence of proof systems with NP \subset PCP made it possible to get hardness of approximation results, which started with the inapproximability of clique/independent set. Even problems with unusual approximation ratios (Asymmetric k-Center problem with O(log*n)-approximation) were shown to be optimal, in other words any slight improvement in the approximation ratio will lead to NP=P.

A central problem in this area is the construction of PCPs with smaller parameters, most notably sub-constant soundness with 2-queries and small alphabet. Although interesting in its own right, the existence of such PCPs provide even stronger inapproximability results. For example, Hastad'01 showed that Max-3SAT problem is NP-hard to approximate beyond 7/8 + epsilon, for any constant epsilon > 0. However this result does not rule out, say, the possibility of having an approximation ratio of 7/8 + 1/loglog n.

It has been known for quite some time how to get sub-constant soundness [Raz-Safra'97, Arora-Sudan'03, Goldreich-Sudan'06] with exactly 2-queries. However these constructions tended to yield proofs with exponentially large alphabets in terms of soundness. Although this may not sound too bad (after all alphabet sizes are still at most polynomial in n), the hardness results take time exponential (or doubly exponential) in alphabet size. Thus these PCPs make it impossible to get hardness results based on P != NP (at best one gets DTIME(n^log n) != NP).

A recent paper by Moshkowitz and Raz in FOCS'08 demonstrated that indeed it is possible to get sub-constant soundness with small alphabet (and proof). A major open question in their paper was:

Does there exist a shorter/easier proof for this result? (As the paper happened to be 100+ pages.)

The paper we will discuss answers this question affirmatively. To give an overview: The technique involves reducing the alphabet size considerably through a technique called proof composition using list-decodable proofs [otherwise alphabet reductions with unique decoding are not possible]. The decoding part is performed by another PCP called [no surpise there] decodable PCP, D. Most important part of this paper is a way to compose V and D without increasing number of queries. After composition step, the resulting proof is made nicer by applying local transformations like degree reduction using expanders on the corresponding Label-Cover problem, which is defined as:

An instance of Label-Cover problem is specified with: a bipartite graph G=(L \cup R, E), alphabets SigmaL, SigmaR, set of functions (projections) for each edge F{uv} : SigmaL to SigmaR with u\in L, v\in R. A labelling function over nodes, G:L\cup R \to SigmaL \cup SigmaR is said to satisfy an edge (u,v)\in E if F{uv}(G(u)) = G(v).

----

Although none of the proofs I will present are complicated or long, the presentation will be a little bit dense as we will constantly switch back and forth between different definitions. Hence I strongly suggest looking at (assuming non-familiarity):

Expanders (definition and expander mixing lemma)
Basic definitions from coding theory (relative distance, rate)
Label Cover, PCPs (basically becoming familiar with the definitions)

No class today

2010-03-04T08:10:00.001-05:00

Just a reminder: no class today, as I am out of town for a workshop.

An O(log n / loglog n)-approximation algorithm for the Asymmetric Travelling Salesperson problem

2010-02-22T13:37:00.002-05:00

In this Thursday's seminar, we will go over the recent SODA 2010 paper which gives an O(log n/loglog n)-approximation algorithm for the Asymmetric Travelling Salesperson (ATSP) problem (paper link), improving over the 1982 result of Frieze et al, which showed an O(log n)-approximation for the problem. In contrast, the symmetric version has seen much better approximations, with the best known being a factor 3/2 algorithm due to Christofedes.

In the ATSP problem, we are given a directed graph with cost on the arcs, and these costs satisfy the triangle inequality, i.e c(u,w) <= c(u,v) + c(v,u). Note that the arc costs need not be symmetric, i.e c(u,v) != c(v,u). The goal is to find a minimum cost cycle which visits all the vertices of the graph.

Since 1928 (until this paper), the best known approximation for this problem has been due to Frieze et al, who give a simple combinatorial algorithm and neat analysis to prove a logarithmic approximability bound. They also bound the integrality gap of the natural LP relaxation for the problem.

In the recent paper, Asadpour et al. consider the same LP and give an improved algorithm (based on rounding the LP) which betters the approximation factor by a factor of loglog n. At a high level, their algorithm does the following:

(i) solve the LP, (ii) ignore arc directions for the time being, and get a fractional solution in the spanning tree polytope, (iii) randomly round this fractional solution to get a random spanning tree which satisfies some thin-ness properties, (iv) re-direct these edges appropriately, (v) augment some extra edges to get back euler tour along the directed arcs.

In the lecture, we will first go over the Christofedes algorithm for STSP and then the Frieze et al. algorithm for ATSP. Then we will present the LP and explain its structural properties. Before presenting the rounding process, we will take a detour into the world of matroids and their associated polytopes and present some recent work of Chekuri et al. on how to round a random point in a matroid polytope into a good vertex solution that satisfies some nice properties. Subsequently, we will use these techniques and give the complete details of their approximation algorithm. We will conclude by explaining the connections with steps (iv) and (v) with the algorithm of Christofedes for the STSP.

2010-02-15T01:05:00.004-05:00

For a graph $G = (V,E)$ with $|V|=n$, the Laplacian of $G$,

\[L = L_G = D_G - A_G,\]

is an $n \times n$ matrix where $A_G$ is the adjacency matrix of $G$ and $D_G$ is a diagonal matrix consisting of the degrees of vertices of $G$. $L$ is real, symmetric and positive semi-definite and has $n$ real eigenvalues usually enumerated as

\[ 0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n.\]

The spectrum of $L$, in particular $\lambda_2$, also called the Fiedler value, is related to the connectedness of $G$ and spectral methods are heavily used in heuristics for graph partitioning and its variants. For this reason, $\lambda_2$ has been the focus of much theoretical work.

Most results on $\lambda_2$ were obtained by working with some embedding of a graph onto a surface and exploiting connections between the Fiedler value and geometric embeddings. As such, these techniques were specialized to graphs endowed with some structure which allow one to find an external geometric representation. See, for example, Kelner .

In 2008, Biswal, Lee, and Rao presented a method for bounding $\lambda_2$ using an intrinsic deformation of the graph. This deformation is achieved by assigning weights to vertices and equipping $V$ with a shortest path metric. A parameter associated with the metric (essentially, the average pairwise distance) is used to upper bound the Fiedler value and this parameter is optimized over all weight functions. Interestingly, this optimization problem has a dual formulation in terms of all pairs multi-commodity flows, where the optimal weights on vertices correspond directly to a measure of congestion in the optimal flow. The results are obtained by studying the structure of optimal flows in various classes of graphs.

The paper I will be discussing, ``Higher Eigenvalues of Graphs'' by Kelner, Lee, Price, and Teng, extends the above result to obtain bounds on higher eigenvalues. Additional conditions in the optimization problem lead to restrictions in the dual formulation and the notion of subset multi-commodity flows is introduced. Working under this setup, the authors show that $\lambda_k$ is $O(k/n)$ for planar graphs, $O(g k / n)$ for graphs of genus $g$, and $O(h^2 \log (h) k / n)$ for graphs with excluded $K_h$-minors.

Twice Ramanujan Sparsifiers

2010-02-08T15:47:00.011-05:00

Given a weighted graph G, a natural question to ask is whether there exist another graph H on the same vertex set such that any cut has similar weights in G and H. As the weight of a cut can be described by $\sum_{uv \in E(G)}w_{uv}(x_u-x_v)^2$ where x is a 0-1 vector, one could ask the same question where x can take any real values. A spectral sparsifier for a graph G is H such that the above functions on the two graphs are close for any assignment of x and H is sparse.

Algorithmically, graph sparsifiers were first studied as a subroutine for near-linear time Laplacian Solvers by Spielman and Teng. Their algorithm partitions the graph into expanders by finding approximately sparse cuts, then randomly sampling each expander.

Follow up work has been done on other methods for finding sparsifiers that give better guarantees. One approach is to sample the edges by effective resistance, but it relies on the near-linear time solver to achieve good performance. This work uses a similar approach, but with a different goal in mind. It aims to find the best approximation rate possible with a d-regular subgraph, and achieves twice the ratio given by Ramanujan graphs , which are regular graphs whose spectral gap are as large as possible.

I will use the first half of the talk to cover background topics such as condition number and spectral decompositions, then show the approaches used in recent work on sparsifiers. If time permit, I will try to give a sketch of why one couldn't do better than the ratios given by Ramanujan graphs. The second half of the talk will be used to go through the proof of the key theorem of this paper, which basically states the following:
Given a set of rank one matrices that sum to identity, one could pick a small subset of them such that the ratio between the largest and smallest eigenvalues of their sum is small.

The Role of Randomization in Mechanism Design

2010-02-02T18:04:00.002-05:00

On Thursday I will talk about the FOCS 2009 paper of Dobzinksi and Dughmi, "On The Power of Randomization in Algorithmic Mechanism Design" (link). This blog post serves as a teaser.

First of all, what is mechanism design? Mechanism design is just combinatorial optimization, for which the input to the problem instance is controlled by a set of self-interested parties who might lie about their data if lying proves to be advantageous. The challenge of mechanism design is therefore not only to design an algorithm that finds a good solution to its given input, but also to do so in such a way so that all of the self-interested parties are incentivized to report their true data to the mechanism.

Auctions represent the canonical mechanism design problem: The auctioneer has a set of goods to distribute among n bidders, and the bidders each have a valuation function that determines how much utility they would receive from each subset of the goods. One common objective for a mechanism designer is to distribute the items so as to maximize social welfare: the sum utilities of all of the bidders. In addition to being able to distribute the goods, the mechanism is given the power to charge bidders some price, up to their reported valuation for the set of goods they ultimately receive.

When do there exist mechanisms to optimize social welfare? The Vickrey Auction, introduced in the early 1960s, and later generalized into the VCG Mechanism by Clarke and Groves in the 1970s gives the answer: almost always. Under quite weak conditions, the VCG mechanism defines prices which incentivize agents to report truthfully when the mechanism computes the optimal allocation given the reports. This is great news, and Vickrey won the Nobel Prize for this in 1996. But it doesn't quite close the book on social welfare maximization. The VCG mechanism requires the auctioneer to actually calculate the optimal allocation to maximize social welfare, and so when the welfare maximization problem is itself computationally difficult, it is not possible to implement the VCG mechanism. Unfortunately, the incentive properties of the VCG mechanism depend on the output allocation actually being optimal: this means that approximation algorithms for combinatorial optimization problems don't easily translate into truthful mechanisms which approximately maximize social welfare.

This brings us to one of the motivating questions for algorithmic mechanism design: When exact optimization of social welfare is a computationally difficult problem, when do there exist computationally efficient truthful algorithms which well approximate optimal social welfare? This paper considers the role that randomization plays in the answer to this question.

Lets revisit what truthful means. For a deterministic mechanism to be truthful, no matter what the reported data of the other agents, no agent i should ever be able to manipulate the mechanism by lying about his own data so as to cause the mechanism to select an outcome he prefers. There are two ways that we can generalize this to randomized mechanisms:

A universally truthful randomized mechanism is simply a probability distribution over deterministic truthful mechanisms. The mechanism gets to flip some coins, but no matter what the outcome of the coins, no agent could have benefited by lying, even in hindsight.
A truthful in expectation mechanism only guarantees that no agent can benefit in expectation by lying, before the mechanism's coins are flipped. For some outcomes of the coins in hindsight, agents could possibly have benefited by lying.

Universally truthful mechanisms give the stronger guarantee: they do not require that we assume that agents are risk neutral and only wish to maximize their expected welfare, independent of risk. But when we do have risk neutral bidders, can we provide better approximations for some problems using truthful in expectation mechanisms?

This paper proves a separation which answers the question: yes. Dobzinski and Dughmi demonstrate an auction problem with m items for which they prove:

For any epsilon > 0, there exists a truthful in expectation mechanism which achieves a (1+epsilon) approximation to social welfare and runs in time poly(1/epsilon, log m).
For any constant epsilon > 0 any universally truthful mechanism that achieves a (2-epsilon) approximation to social welfare must run in time Omega(m).

The negative result is a communication complexity lower bound, and so holds independently of computational constraints.

The auction for which the separation is proved is somewhat artificial, but nevertheless shows that the class of efficient truthful in expectation mechanisms is strictly broader than the class of efficient universally truthful mechanisms. This should be viewed analogously to a complexity-class separation in how it shapes our understanding of the power of efficient truthful mechanisms.

So. Come to class on Thursday to hear more!

Recent constructive results for the Lovász Local Lemma

2010-01-24T14:53:00.010-05:00

The Lovász Local Lemma is a workhorse in probabilistic combinatorics which has found applications to packet routing, perfect hashing, graph coloring, scheduling, statistical mechanics, etc. It is typically used in conjunction with the probabilistic method to prove the existence of solutions. Efficiently finding these solutions has been the pursuit of a good deal of research, which chipped away at the problem through special cases.

At STOC 2009, Robin Moser resolved an important special case - bounded occurrence k-SAT - with a ZPP algorithm and a weak derandomization. This was an important result in of itself, earning the Best Paper award, but what really tickled the theory community was the marvelous proof strategy he came up with for his talk; "one of the best STOC talks ever", says Uncle Lance. A few months later, he and Gábor Tardos generalized the result to cover (essentially) the entire LLL. Karthekeyan Chandrasekaran, Navin Goyal, and Bernhard Haeupler strengthened the derandomization under a mild "ϵ-slack" condition, thus placing more non-constructive solutions squarely in P. A couple weeks ago, Bernhard Haeupler, Barna Saha, and Aravind Srinivasan showed that the Moser-Tardos algorithm retains a beneficial property of the naive rejection sampling algorithm, which can lead to polynomial running time even with superpolynomial events.

In my talk, we'll touch on each of these results, though much more emphasis will be placed on those by Moser and Tardos. Here's an outline.

(5-10 min) Introduction.
(10-15 min) Proof of symmetric k-SAT LLL from this survey article.
(10 min) A motivational application to hypergraph coloring from Rajmohan Rajaraman's notes.
(5 min) A brief history of results inching towards a constructive proof.
(25-35 min) The Moser-Tardos algorithm for symmetric, bounded occurrence k-SAT and the slick STOC proof as explicated by Fortnow and Tao. (The latter post is excellent.)
(25-30 min) The Moser-Tardos algorithm for symmetric LLL and its proof from Joel Spencer's notes. This proof is more pedestrian but also more general, and it sets up the rest of the talk.
(10 min) Break.
(15-20 min) Derandomization results (briefly).
(10-15 min) Conditional LLL distributions (briefly).

You may wish to review Shannon entropy and derandomization by conditional probability.

Read this blog

2010-01-12T14:46:00.007-05:00

This is the class blog for CMU's Spring 2010 graduate CS course 15-859U, "Theoretical Computer Science's Greatest Hits, 2009".

This is a seminar course. The instructor is Ryan O'Donnell. The course homepage is http://www.cs.cmu.edu/~odonnell/hits09/, but that page will be mostly static. Please read this blog regularly, or add it to your RSS feed.

We will use MathML in these posts; if it is working correctly for you, the following statements should look basically the same:

A set system L is "laminar" if for all A, B ∈ L, either A∩B = Ø, A ⊆ B, or B ⊆ A.

A set system $L$ is "laminar" if for all $A$, $B$ $\in$ $L$, either $A \cap B = \emptyset$, $A \subseteq B$, or $B \subseteq A$.

To get this to work, on IE I believe you download an ActiveX control here:
http://www.dessci.com/en/products/mathplayer/download.htm

If you use Firefox, try going here:
http://www.mozilla.org/projects/mathml/fonts

For Chrome and Safari, I'm not sure; if someone wants to comment on whether or not it works for them, please do.