Even regression can be nuanced. Typically, when we solve a regression problem, we assume a model. Take the most trivial case: we want to know the weight of a sack of apples, so we weigh it on four different scales and take the mean. The model is that the weight is a single number. If we are trying to find the relationship between the weight of cows and their food intake, the regression model might be a linear one. But there are some regression problems where we don't know what the model is. Scientific discovery is full of examples. Johannes Kepler had lots of data on the planets, and somehow from that he figured out that their orbits were elliptical. That's a case of what we call "gray matter fusion". But could a computer, given that data, figure out that the orbits were elliptical? It turns out that it can. This problem is known as "symbolic regression", and it is usually solved by a technique known as "Genetic Programming", which uses Darwinian evolution to find the best model and then fits that model to the data.

Multi-Target Tracking (MTT) is another classic data fusion problem, and it illustrates many important characteristics of data fusion problems. Consider the case where there are some number of (moving) objects that you would like to track, and you have several sensors providing data about those objects. This leads to two problems:

*Association:* Which sensor measurements are associated with which objects?

*Estimation:* Where do we expect each object to be at a given time?
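A first cut at these two questions can be sketched as a naive tracker: assign each measurement to the nearest track (association), then nudge that track toward the measurement (estimation). The 1-D state, the fixed gain, and all names below are illustrative assumptions, not a standard implementation.

```python
# A naive hard-assignment tracker: each measurement goes to exactly one
# track (association), and only that track moves toward it (estimation).
# The 1-D state, the fixed gain, and all names are illustrative.

def associate(tracks, z):
    """Association: index of the track closest to measurement z."""
    return min(range(len(tracks)), key=lambda i: abs(tracks[i] - z))

def estimate(tracks, measurements, gain=0.5):
    """Estimation: move each track toward the measurements assigned to it."""
    tracks = list(tracks)
    for z in measurements:
        i = associate(tracks, z)
        tracks[i] += gain * (z - tracks[i])
    return tracks

tracks = [0.0, 10.0]
tracks = estimate(tracks, [0.4, 9.7])
# track 0 has moved toward 0.4, track 1 toward 9.7
```

Note the hard separation: a measurement that falls near the midpoint between two tracks is still forced wholly into one basket, which is exactly the weakness discussed next.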
It turns out that association and estimation are intimately connected. If you associate a different set of sensor measurements with an object, you get a different estimate; and whatever estimates you have affect how you associate the next measurements. Standard techniques for solving the MTT problem used a hard separation between association and estimation, treating the solution as a sort of assembly line in which measurements were sorted into different baskets and an estimate was then produced for each basket. State-of-the-art techniques accept that association and estimation are tightly bound, and allow a measurement to be divided among many baskets, essentially saying that any of several objects might be responsible for that measurement.

Sometimes the objects want to be tracked, as in a busy harbor where no vessel wants to collide with another. Even the smallest boats use radar reflectors to make sure that they are seen, and bigger boats broadcast their own positions, effectively announcing "I'm here. I'm here." But there are situations where the objects don't want to be tracked: they may use stealth to keep from being observed, or decoys to create false measurements. These complications don't relieve us of the task of solving the problem. We still need to estimate the locations of the targets, and there are some very advanced methods that allow us to make good estimates about an object even in the face of deception.

Bayes nets, or Probabilistic Graphical Models (PGMs), are another important class of data fusion methods. In a PGM, a graph captures the relationships between statistically dependent objects, and associated with each object is a probabilistic model. As we get more information from the world (i.e., observations) about an object, messages are sent to the graph nodes representing the dependent objects.
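A single such update is just Bayes Rule: multiply the node's current distribution (the prior) by the likelihood of the observation under each state, then renormalize. A minimal discrete sketch, where the two states and all the numbers are invented for illustration:

```python
# Bayes Rule update for one node holding a discrete distribution.
# The states ("ship", "buoy") and all probabilities are invented.

def bayes_update(prior, likelihood):
    """posterior(s) is proportional to prior(s) * likelihood(obs | s)."""
    unnorm = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnorm.values())
    return {s: p / total for s, p in unnorm.items()}

# The node's current belief about what kind of object this is ...
prior = {"ship": 0.5, "buoy": 0.5}
# ... and how likely the observation "it is moving" is under each state.
likelihood = {"ship": 0.8, "buoy": 0.1}

posterior = bayes_update(prior, likelihood)
# The normalized posterior is what gets passed along to dependent nodes.
```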
These objects update their probability model through Bayes Rule and send the update to the objects that depend on them. The graph hence captures everything that is known about the world of those objects. The combination of applying Bayes Rule and sending update messages is known as the Belief Propagation algorithm. Belief Propagation is well defined when there are no loops in the graphical model, but some of the most interesting problems have loops. For instance, in the MTT problem we've said that association and estimation are coupled: an update to a target's estimated position may change the way that observations (including past observations) are associated with it. The associations update the positions, which update the associations, which update the positions, and so on. Running Belief Propagation on such a graph is called "Loopy Belief Propagation". It is not particularly well defined, and there are some problems for which it doesn't work, or at least doesn't work very well. For other problems, Loopy Belief Propagation works well, but it is a bit ad hoc: we need rules for things like when to stop sending messages. This is an area of open research, and a very interesting one. Some of these problems are addressed by tools like Figaro, the probabilistic programming language, which can be thought of as a general tool for data fusion.
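To make the stopping-rule point concrete, here is a toy Loopy Belief Propagation run on a three-node cycle of binary variables, which stops when the messages change by less than a tolerance. The potentials, the evidence, and the convergence rule are all illustrative assumptions, and, as noted above, convergence is not guaranteed in general.

```python
# Toy Loopy Belief Propagation on a 3-node cycle of binary variables.
# Potentials, evidence, and the stopping rule are illustrative assumptions.

neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# pot[xi][xj]: neighboring variables prefer to agree (attractive coupling)
pot = [[2.0, 1.0], [1.0, 2.0]]

def normalize(m):
    total = sum(m)
    return [v / total for v in m]

def loopy_bp(evidence, max_iters=100, tol=1e-6):
    # msgs[(i, j)] = message node i sends to node j, initialized uniform
    msgs = {(i, j): [0.5, 0.5] for i in neighbors for j in neighbors[i]}
    for _ in range(max_iters):
        new = {}
        for (i, j) in msgs:
            m = [0.0, 0.0]
            for xj in (0, 1):
                for xi in (0, 1):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= msgs[(k, i)][xi]
                    m[xj] += evidence[i][xi] * pot[xi][xj] * incoming
            new[(i, j)] = normalize(m)
        done = all(abs(new[e][x] - msgs[e][x]) < tol
                   for e in msgs for x in (0, 1))
        msgs = new
        if done:  # the ad hoc stopping rule: messages have settled
            break
    # belief at node i: local evidence times all incoming messages
    beliefs = {}
    for i in neighbors:
        b = [evidence[i][x] for x in (0, 1)]
        for k in neighbors[i]:
            for x in (0, 1):
                b[x] *= msgs[(k, i)][x]
        beliefs[i] = normalize(b)
    return beliefs

# Node 0 is observed to strongly favor state 0; nodes 1 and 2 are unobserved.
evidence = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}
beliefs = loopy_bp(evidence)
# With attractive coupling, the evidence at node 0 pulls the beliefs
# at nodes 1 and 2 toward state 0 as well.
```

The `tol` threshold and the `max_iters` cap are exactly the kind of ad hoc rules mentioned above: nothing guarantees the messages will settle on a loopy graph, so we simply stop when they stop changing, or give up after a fixed number of rounds.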