Bias Isn't Garden-Variety Noise--Adversarial Noise is Worse

Viktor Mayer-Schönberger has popularized an approach to Big Data that involves overwhelming the noise in "messy" data sets through sheer volume. It rests on the principle that the standard error of an estimate is proportional to the inverse of the square root of the sample size. This is sometimes called the "inverse root n" rule. It means that if you make four times as many independent measurements of the same thing, the uncertainty in your estimate is cut in half. The inverse root n diminishes slowly, but if you have unlimited amounts of data, you can push the uncertainty down as low as you need to, even if the data is "messy", by which Mayer-Schönberger means that it has a lot of random noise.
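
To see the rule in action, here is a quick simulation (a minimal sketch in Python/NumPy; the true value, noise level, and sample sizes are made up for illustration): quadrupling the number of independent measurements roughly halves the standard error of the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 10.0   # the quantity being measured (made up)
noise_sd = 2.0      # zero-mean random noise per measurement (made up)

for n in [100, 400, 1600]:
    # Spread of the sample mean, estimated over many repeated experiments
    means = [rng.normal(true_value, noise_sd, n).mean() for _ in range(2000)]
    print(f"n={n:5d}  std error of the mean = {np.std(means):.3f}"
          f"  (theory: {noise_sd / np.sqrt(n):.3f})")
```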

Mayer-Schönberger is right. This works, and it is a key factor in lots of important solutions. But it's easy to overlook a key assumption. What if the noise is non-random? What if the instrument you measured with is biased? Suppose you use a thermometer that consistently reads a bit high, or a scale that weighs things a bit heavy, or a clock that runs a second fast. The spread of your estimate still shrinks by the inverse-root-n law, but now it converges to the wrong value! Cutting straight to the chase, the lesson is that bias cannot be treated like random noise, because bias, by its nature, doesn't have a zero mean.
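
Here is the same kind of sketch with a fixed bias baked into every reading (the half-degree offset is an invented number): the uncertainty band still collapses like inverse root n, but the value it collapses onto is wrong by exactly the bias.

```python
import numpy as np

rng = np.random.default_rng(1)
true_temp = 20.0   # true temperature (made up)
bias = 0.5         # thermometer consistently reads 0.5 degrees high (assumed)
noise_sd = 2.0     # random noise per reading (made up)

for n in [100, 10_000, 1_000_000]:
    readings = rng.normal(true_temp + bias, noise_sd, n)
    est = readings.mean()
    se = readings.std(ddof=1) / np.sqrt(n)   # shrinks like inverse root n
    print(f"n={n:>9,}  estimate = {est:.3f} +/- {se:.3f}"
          f"  error vs. truth = {est - true_temp:+.3f}")
# The +/- band collapses toward zero, but the error settles near +0.5: the bias.
```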

If we have just a few data points, a little bit of bias doesn't matter too much, like in the chart below:

[Chart: small-sample distributions, unbiased (blue) vs. biased (orange)]

The statistics for the unbiased (blue) distribution are about the same as the statistics for the biased (orange) distribution. If we make a decision, like whether some measurement is a member of the blue population, using the statistics for the orange distribution will give almost exactly the same answers. We might not even notice the bias. But if we get more data points, the bias that separates the peaks remains the same while the variance collapses. The new chart looks like this:

[Chart: large-sample distributions, unbiased (blue) vs. biased (orange)]

Now, if we use the orange distribution's statistics to decide whether an individual is a member of the blue population, we'll be wrong about 2/3 of the time. 
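
A toy version of this decision problem (all the numbers here are invented; the exact error rate depends on how big the bias is relative to the collapsed spread) shows the same pattern: with a small sample, a 95% interval built from the biased "orange" statistics accepts most genuine "blue" members, but with a large sample it rejects the majority of them.

```python
import numpy as np

rng = np.random.default_rng(2)
bias = 0.5       # offset of the orange (biased) statistics (assumed)
noise_sd = 2.0   # per-measurement noise (assumed)

def wrongly_rejected(n, trials=20_000):
    """Fraction of genuine 'blue' sample means that fall outside a 95%
    interval built from the biased 'orange' statistics, both at sample size n."""
    se = noise_sd / np.sqrt(n)                    # spread of the sample mean
    lo, hi = bias - 1.96 * se, bias + 1.96 * se   # orange interval, centered on the bias
    blue_means = rng.normal(0.0, se, trials)      # blue population is centered at 0
    return np.mean((blue_means < lo) | (blue_means > hi))

for n in [10, 100, 1000]:
    print(f"n={n:5d}  genuine members wrongly rejected: {wrongly_rejected(n):.1%}")
```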

On the surface, it seems like we did everything right. We gathered more data points to drive the uncertainty down, and what we got for it was the inability to make rational decisions: the same bias that was annoying--though non-critical--for the small sample produces completely separated distributions for the large sample.

This is not just a theoretical oddity; it arises in practice. We've seen it in target tracking applications where two different sensors appear to be tracking different vehicles, one consistently 50 feet behind the other. If the sensors have poor location accuracy, the tracker sees that the two error envelopes overlap and treats them as different measurements of the same target. It fuses the two measurements and produces a plausible estimate of the target position somewhere between them.

[Diagram: two overlapping error envelopes fused into a single track estimate between them]


It's not a good estimate, and we may wonder why neither sensor ever sees the vehicle right where the tracker predicts it (the vehicle is always a bit ahead of the track), but it's in the vicinity. Then, when the sensors get upgraded to a new model with better location accuracy, things get crazy. The error envelopes no longer overlap, so from the tracker's perspective each sensor appears to be tracking a different vehicle.

[Diagram: non-overlapping error envelopes produce two separate tracks]

Usually this happens because one of the clocks is off. In one case, one of the sensors had a leap second programmed in while the other one didn't. A one-second bias is enough to make lots of things break; with an error that large, GPS and everything that relies on it would be completely useless. Vehicles travel about 50 feet per second in city traffic, which is a lot more than the location accuracy of good moving-target sensors. The situation gets still worse if the vehicles change speed. If the vehicle slows from 35 MPH to 15 MPH for an extended time, the bias effect changes. Remember, the bias is in time, but it is coupled to space by the velocity. At 15 MPH, the spatial effect of the one-second bias drops from about 50 feet to about 20 feet. If the sensors' error envelopes are 25 feet, the two tracks will merge into one track at 15 MPH, separate again when the vehicle accelerates to 35 MPH, and merge back into a single track when the vehicle slows.
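
The arithmetic is easy to check. This back-of-the-envelope sketch (the one-second bias and the 25-foot gate come from the scenario above; the rest is just unit conversion) shows the separation crossing the association gate as the vehicle's speed changes.

```python
FT_PER_MPH = 5280 / 3600   # ~1.47 feet per second for each MPH
CLOCK_BIAS_S = 1.0         # one-second timing bias between the sensors
GATE_FT = 25.0             # assumed association gate (sensor error envelope)

def separation_ft(speed_mph: float) -> float:
    """Spatial offset produced by the clock bias at a given speed."""
    return speed_mph * FT_PER_MPH * CLOCK_BIAS_S

for mph in (35, 15):
    sep = separation_ft(mph)
    verdict = "one merged track" if sep <= GATE_FT else "two separate tracks"
    print(f"{mph:2d} MPH -> separation {sep:4.1f} ft -> {verdict}")
# 35 MPH -> ~51 ft, outside the 25 ft gate: the tracker sees two vehicles.
# 15 MPH -> ~22 ft, inside the gate: the tracks merge back into one.
```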

How often does a leap second occur? More often than you might think: since leap seconds were introduced in 1972, one has been added roughly every year and a half to two years on average.



There's an even more pernicious form of noise called "adversarial noise".

With adversarial noise, the shape of the distribution that you see is affected by an adversary or some other player with objectives different from your own. Decoys are a great example. By placing a decoy, the other player gets to inject his own messages into your sensor data. You see the decoy and believe, with high certainty, that the other player is there, because it looks like the target you are looking for. The adversary has shaped the distribution of the noise that you see. If you pursue the decoy, the adversary has used his ability to inject messages into your data to take over your controls and get you to do what he wants. This is one of the few things the Iraqi army was very good at in the first Gulf War. Pulitzer Prize winner Rick Atkinson's chronicle of that war, Crusade: The Untold Story of the Persian Gulf War, observes that Iraqi decoys were so effective that, in spite of many "successful" sorties against Scud missile launchers, not a single kill could be indisputably confirmed as a genuine Scud missile launcher (p. 147).

Classical probabilistic techniques are even more dangerous with adversarial noise than they are with bias, because the adversary can use his ability to influence what your sensors see to control what you do (bias is just one of the things that the adversary can inject). He can lure you into an ambush; he can cause you to leave your flag unprotected. A clever adversary can--unless you are very careful--use your superior sensing abilities to make you perform worse than if you had no sensing ability at all. 
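
As a toy illustration (everything here is invented: the positions, the decoy fraction, and the naive averaging "tracker"), consider a fused estimate that trusts every report it receives. The more decoy measurements the adversary injects, the closer the estimate moves to the point the adversary chose, so gathering more data actively helps the opponent.

```python
import numpy as np

rng = np.random.default_rng(3)
true_pos = 0.0     # where the real target is (made up)
decoy_pos = 100.0  # where the adversary wants you to look (made up)
sensor_sd = 5.0    # honest sensor noise (made up)

def fused_estimate(n_real: int, n_decoy: int) -> float:
    """A naive tracker that averages every report it receives."""
    real = rng.normal(true_pos, sensor_sd, n_real)
    fake = rng.normal(decoy_pos, sensor_sd, n_decoy)   # adversary-shaped "noise"
    return np.concatenate([real, fake]).mean()

for decoy_frac in (0.0, 0.1, 0.5):
    n_decoy = int(1000 * decoy_frac)
    est = fused_estimate(1000 - n_decoy, n_decoy)
    print(f"decoy fraction {decoy_frac:.0%}: estimate = {est:6.1f}  (truth = {true_pos})")
# Unlike zero-mean noise, this error does not average away with more data;
# it is whatever the adversary wants it to be.
```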

Classical methods of optimal control try to exploit every opportunity to do a better job. That's a big part of what makes them optimal. And this is exactly what makes classical control theory hopeless in the face of adversarial noise. If your opponent can fool you just a little bit, a classical optimal controller will take the bait and lead you into a course of action that looks optimal but favors your opponent. This eleven-second YouTube scene from the Disney Pixar movie Up! illustrates the problem hilariously.
[Video clip: Disney/Pixar "Up!"]


But what happens in the movie clip would not be altogether different from the outcome of controlling a swarm of "drones" using classical optimal control theory.



S3 Data Science, copyright 2015.