## Sublinear Time Algorithms

We have long considered showing the existence of a linear time algorithm for a problem to be the gold standard of achievement. Indeed, it is hard to imagine doing much better than that, since for any nontrivial problem, it would seem that an algorithm must consider all of the input in order to make a decision.

~Ronitt Rubinfield

Any algorithm operating on a dataset and deciding whether all members of the dataset follow a particular property must operate in $\Omega(n)$ time because we must look at every datapoint in the dataset. Imagine that you want to check if any of the $30$ students in your class failed an exam. In order to check this, you have to look at each student’s exam. But you are only dealing with $30$ students. What if your class consisted of $1000$ students? How long would it take to look at each student’s exam?

Some datasets contain millions or billions of entries. We now have to ask the question whether it is more important to know whether there is one outlier in the dataset or whether the dataset is generally uniform. Going back to our previous example, is it more beneficial to know exactly how many students failed the exam or know with high confidence that $95\%$ of the students did not fail. What if I tell you that the second question results in a much smaller runtime than the first?

And this is really the appeal of sublinear time algorithms. Rather than looking at all the inputs in a dataset, we can look at a few select inputs to discern a pattern. But this also means that we have to refine some of the definitions we are operating under, and, perhaps, lower our expectations of what we want to find in a dataset.

To start, we can no longer define whether a dataset is or is not a certain quality. Instead we need to discern whether the dataset is probably pretty close to having the quality. (Um, that’s a lot of wishy-washy modifiers!) Defined formally, this concept is called property testing.

Definition 1: An input to a function, $x$, is $\epsilon$-close to satisfying a property $P$ if there is some $y$ that satisfies $P$, and $x$ and $y$ differ in at most $\epsilon n$ places (where $D$ is the number of elements in $x$ and $y$). Otherwise, $x$ is said to be $\epsilon$-far from satisfying $P$. [Rubinfeld 2006]

Definition 2: A property testing algorithm for property $P$ given input $x$ with $n$ elements and $\epsilon$ must satisfy:

1. If $x$ satisfies $P$, then the algorithm accepts $x$ (outputs “PASS”) with probability greater than or equal to $\frac{2}{3}$.
2. If $x$ is $\epsilon$-far from satisfying $P$, then the algorithm rejects $x$ (outputs “FAIL”) with probability greater than or equal to $\frac{2}{3}$.

There are two things to note about the definitions. First, property testing does not specify what happens when an input is $\epsilon$-close to a property. The algorithm can either pass or fail with arbitrary probability in this case. Second, the probability $\frac{2}{3}$ is arbitrarily set since lower errors may be found by running the algorithm repeatedly and taking the majority answer.

You can view a comprehensive list of why property testing is useful here [Ron 2001], but the key idea is that property testing eliminates all the “bad” inputs so you don’t have to waste valuable computational time and resources on these inputs. After narrowing down your inputs to a select few which pass the test, then you can run your full-blown (linear time!) algorithm on those inputs.

To see an example of property testing, let’s start with graph bipartiteness. Testing whether a graph is bipartite takes $O(n^2)$ runtime in the worst case (where $n$ is the number of vertices in the graph) because it requires looking at all the vertices and edges in the graph using either depth first search or breadth first search. In order to use the notion of property testing, we can transform the problem to the following:

Definition 3: A graph, $G=(V, E)$, is $\epsilon$-close to bipartite if a bipartite graph can be created by deleting at most $\epsilon n^2$ edges in $G$. Otherwise, it is $\epsilon$-far from bipartite.

Definition 4: A graph, $G=(V, E)$, is $\epsilon$-far from bipartite if $\forall$ partitions $(V_1, V_2)$ of $V$, the are $\geq \epsilon n^2$ edges $(u,v)$ such that $u, v \in V_1$ or $u,v \in V_2$.

TEST-BIPARTITE algorithm:

1. Choose a random sample, $S$, of $|S| = \Theta(\frac{log(1/\epsilon)}{\epsilon^2})$ vertices.
2. Find the induced subgraph of $S$ by querying if $(u,v) \in E$ for every pair of vertices, $u$ and $v$, in $S$.
3. Perform a DFS or BFS of $S$ to determine if it is bipartite. If it is bipartite, output “PASS.” Otherwise, output “FAIL.”

We can see that this algorithm actually has a one-sided error rather than the two-sided error defined above. If a graph is bipartite, then all subgraphs of the given graph are also bipartite. Thus, TEST-BIPARTITE outputs “PASS” with probability $1$ when the input graph is bipartite. All we have left to show is that if the input graph is $\epsilon$-far from bipartite, then the algorithm will output “FAIL” with probability at least $\frac{2}{3}$.

To prove this, we assume that the input graph to TEST-BIPARTITE is $\epsilon$-far from bipartite. Using Definition 4, we can introduce two different subsets of vertices that will help us prove the above property:

1. Choose a random sample of vertices, $U$, where $|U| = O(\frac{1}{\epsilon}\log{\frac{1}{\epsilon}})$.
2. Choose a random sample of pairs of vertices, $(u_i, v_i)$ for $W$, where $|W| = O(\frac{1}{\epsilon^2}\log{\frac{1}{\epsilon}})$.

We can consider $U$ and $V$ to be subsets of $S$ since the sizes of both are $O(\frac{1}{\epsilon^2}\log{\frac{1}{\epsilon}})$.

Let’s first consider the case when all nodes in $V$ are connected to all nodes in $U$ (i.e. $(w, u_i), (w, v_i) \in E$ given $w \in U$ and $(u_i, v_i) \in W$). If we look at a bipartition of $U$ into $(U_1, U_2)$, then we can induce a bipartition of $V$ if the graph is bipartite: $\forall v \in V \ (U_1 \cup U_2)$, if $v$ shares an edge with a vertex in $U_1$, place $v$ in $U_2$ and vice versa. If $v$ has an edge with a vertex in both $U_1$ and $U_2$, then arbitrarily choose to place in either $U_1$ or $U_2$. If for every partition of $U$, we cannot find a partition $(W_1, W_2)$ of $W$ s.t. $(W_1 \cup U_1), (W_2 \cup U_2)$ is a bipartition, then we know that the original graph, $G$, is not bipartite and TEST-BIPARTITE outputs “FAIL.”

Given our assumption that every node in $V$ is connected to a node in $U$, we can compute the probability that we will catch a vertex, $w \in V$, that does not satisfy a partition where $u_1$ is a vertex in $U_1$ and $u_2$ is a vertex in $U_2$:

$Pr[(w, u_1) \in E \cap (w, u_2) \in E] \geq \frac{\epsilon n^2}{n^2} = \epsilon$

Consequently, the probability that we have no violating edges is:

$Pr[(w, u_1) \not\in E \cup (w, u_2) \not\in E] = (1-\epsilon)^{|W|} \leq e^{-\epsilon |W|}$

From our definition of $U$ and $W$, we know that $|W| = \Theta(\frac{|U|}{\epsilon})$. If we let $|W| = \frac{3|U|}{\epsilon}$ and the total number of partitions of $U$ is $2^{|U|}$, then, the probability that one (or more) of the partitions is a bipartite partition is (excluding the trivial case when $|U|=0$):

$Pr[bipartite] \leq e^{-3|U|} \times 2^{|U|} \leq \frac{1}{4}$

Therefore, the probability when $G$ is $\epsilon$-far from bipartite that TEST-BIPARTITE outputs “FAIL” is $\geq 3/4$. But this is only for the case when all nodes in $V$ are connected to all nodes in $U$. If this is not the case, we need to show that $U$ is adjacent to at least $\Omega(\epsilon n^2)$ “bad” edges for every partition with high probability in order for the proof to hold. In fact, we can show that $U$ is adjacent to at least $\frac{\epsilon n^2}{2}$ “bad” edges.

Definition 5: $v \in V$ is a “high degree” vertex if $deg(v) \geq \frac{\epsilon n}{4}$ else $v$ is a low-degree vertex.

If we choose $|U| = \frac{4}{\epsilon}\log{\frac{16}{\epsilon}}$, then the probability that a “high degree” node in $V$ does not have an edge to any node in $U$ is:

$Pr[\forall u \in U, (v, u) \not\in E] \leq (1-\frac{\epsilon}{4})^{|U|} \leq \frac{\epsilon}{16}$

Let $H$ represent the set of “high degree” nodes in $V$. Then the expected number of “high degree” nodes in $V$ that does not have an edge to nodes in $U$ is:

$E[\forall v \in V \cap \in H \cap (v, u) \not\in E] \leq \frac{\epsilon n}{16}$

By the Markov inequality, the probability that number of “high degree” nodes in $V$ that do not have any edges to nodes in $U$ is greater than $\frac{\epsilon n}{4}$ is given by:

$Pr[\#\,\,nodes \in H \cap (u, v) \not\in E \geq \frac{\epsilon n}{4}] \leq \frac{1}{4}$

So, with probability $\geq \frac{3}{4}$ we have $\frac{\epsilon n}{4} \times n = \frac{\epsilon n^2}{4}$ edges adjacent to “high degree” in $V$ that have no edges to nodes in $U$ and $\frac{\epsilon n^2}{4}$ edges that are adjacent to “low degree” edges (regardless of whether or not they are in $V$ and are adjacent to nodes in $U$). Therefore, because we have a total number of $\epsilon n^2$ “bad” edges, we have $\geq \epsilon n^2 - \frac{\epsilon n^2}{2} = \frac{\epsilon n^2}{2}$ edges adjacent to nodes in $U$.

Therefore, we have shown that $U$ is adjacent to $\Omega(\epsilon n^2)$ edges with probability $\geq \frac{3}{4}$ and our proof from before holds.

What we’ve managed to show through all the math is that the TEST-BIPARTITE algorithm satisfies the condition for property testing.

You’ve probably noticed that the actual algorithm for property testing of graph bipartiteness is not too complex, but the process of proving that the algorithm satisfies the conditions for a property testing algorithm is much (much!) more involved. If you’re interested in learning more about sublinear algorithms, you can take a look at these class sites here and here. It is a very new field of algorithms and hence very exciting!