The word "hacking" has two common meanings. The first has to do with breaking security. That's important, but not what we are talking about here. The second meaning has to do with what has been described as "programming by expedient". Basically, this means that the code is not founded in first principles. It's a "hack", or "quick and dirty". There tends to be confusion between the two meanings. There is honor in hacking, but most of the honor comes from the first meaning, not the second[1].

A lot of people who work with computers are proud to call themselves "hackers" under the second meaning: "programming by expedient". A whole culture has emerged around the hacker mindset, with an urban mythology presenting the hacker as a modern-day Magellan taking us to places no one previously dared to go. For the most part, it's not a new place. They just don't recognize where they are. It's another case of confusing easy with simple. Simple is hard! Easy is complex. Hacking favors the easy, at the expense of long-term complexity. You have to maintain the hack, you have to protect the hack, and, worse yet, you have to trust the hack. This ultimately leads to complexity, not simplicity.

We are not immune to hacks ourselves. Once in a while we hack. But when we do, we're not proud of it. We document it. That's part of identifying incidental complexity. And we hope to learn enough to fix the hack at some point. That's part of eliminating incidental complexity. We feel duty bound to cite a few examples of hacks we've encountered that ended up in train wrecks.
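Documenting a hack can be made mechanical. The sketch below is hypothetical (the `HACK(owner, date):` tag and the `find_hacks` helper are inventions for illustration, not any standard convention): a searchable comment tag plus a small scanner that lists every documented hack in a source tree, so none of them can quietly disappear from view.

```python
# Hypothetical sketch: keep hacks visible with a comment convention and a scanner.
# The "HACK(owner, YYYY-MM-DD): note" tag is our own invention for this example.
import re
from pathlib import Path

HACK_TAG = re.compile(r"#\s*HACK\(([^,]+),\s*([\d-]+)\):\s*(.*)")

def find_hacks(root):
    """Yield (file, line_no, owner, date, note) for each documented hack."""
    for path in Path(root).rglob("*.py"):
        for n, line in enumerate(path.read_text().splitlines(), start=1):
            m = HACK_TAG.search(line)
            if m:
                yield (str(path), n, m.group(1), m.group(2), m.group(3))
```

A line such as `x = 0.99 * y  # HACK(kim, 2023-04-01): fudge factor until calibration is fixed` would then appear in the scanner's report, with an owner and a date attached.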
Half of these examples come from data science problems, and half from simulation science. There are lots of other examples we could cite; these are just the ones we've seen up close and personal. Further, these are not "bugs" in the normal sense. These are cases where the people responsible--ourselves included--thought what they were doing was okay; a reasonable expedient to get the job done. Each of these hacks could have caused serious damage to the enterprise, and the routing hack contributed to the failure of the long distance company.

Usually, a hack occurs when you don't recognize a pattern. That's almost a definition of hacking. Data science is fundamentally about recognizing patterns. Patterns are what the enterprise exploits to produce value. Patterns are hard because patterns require abstraction; not just an abstraction but the right abstraction. Stanford Professor Keith Devlin, in an essay describing why mathematics is the "science of patterns", explains why this is so difficult. He says[2]:
Devlin could have just as easily said "data science" instead of "mathematics" to frame these patterns. The hacker mindset is antithetical to investing the time to recognize and understand the deep patterns. If you adopt that mindset, you will get very good at not recognizing or understanding patterns. Your habits, your artifacts, and the processes and procedures you set up for yourself--whether consciously or implicitly--will favor the easy over the simple; will favor not recognizing patterns; will favor just getting stuff to work: using the little undocumented trick to make the right thing happen (at least for the test data). But you don't want to bet your enterprise on hacks. That's a train wreck in the making.

Data science is useful because it brings some mathematical rigor to interpreting information and recognizing its patterns. That rigor is anathema to hacks, and so we strongly prefer to do things in an intellectually grounded and traceable way, hacking only as a last resort, and carefully documenting it when we do.

We have a page on this web site hosting a blog we keep about good software development practices. We call it Programming Computers Like You Mean It. The blog focuses mostly on software development techniques designed to reduce hacks. For other practices more tightly focused on reducing data science and simulation science hacks, refer to our Educating the User page. For a broad taxonomy of hacks, check out the book AntiPatterns[3], which describes an anti-pattern as a practice that seems expedient at the time, but turns out to increase complexity and rework in the long run. The AntiPatterns book deals with software development, but everything it says about software development applies equally to data science and simulation science.

Hacks and "Technical Debt"

We need to reiterate that creating a hack does not make you a bad person. Everyone who successfully builds systems creates hacks at some point.
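Keeping hacks on the books can be made concrete. Below is a minimal sketch, with hypothetical names and dollar amounts, of a technical-debt ledger: each documented hack carries an estimated one-time cost to remove (the principal) and a recurring maintenance cost (the interest) that accrues for as long as the hack stays in place.

```python
# Minimal sketch of a technical-debt ledger (hypothetical hacks and costs).
from dataclasses import dataclass

@dataclass
class Hack:
    description: str
    principal: int           # estimated one-time cost (in dollars) to eliminate the hack
    interest_per_month: int  # ongoing maintenance / opportunity cost while it remains

ledger = [
    Hack("hard-coded routing workaround", principal=8000, interest_per_month=500),
    Hack("undocumented flag for test data", principal=2000, interest_per_month=150),
]

def total_principal(ledger):
    """Total cost to retire every hack on the books."""
    return sum(h.principal for h in ledger)

def carrying_cost(ledger, months):
    """Interest paid if the hacks are left in place for the given number of months."""
    return sum(h.interest_per_month * months for h in ledger)

print(total_principal(ledger))    # 10000
print(carrying_cost(ledger, 12))  # 7800
```

Retiring a hack removes its entry from the ledger, and the principal total then measures how far the project's liabilities have shrunk; a project whose carrying cost outpaces the value its hacks bought is in the red.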
The problem is the mindset of hacking, because, to repeat an earlier observation, that mindset generates processes and habits antithetical to understanding patterns in data or behavior. There are also mindsets that accept hacks here and there, while still keeping their eyes on the prize of pattern recognition. The standard approach is to allow a project to accrue hacks but to "keep them on the books". That's what we do by documenting our hacks. The overall method is called "technical debt", and the idea is to manage hacks the way we would manage credit card debt. Steve McConnell has probably written the most on this subject[4].

If you account for your hacks, you can estimate the cost of eliminating them, and this gives a better index of the real value of your project. The project's balance sheet has two sides: assets on one side and liabilities on the other. Sometimes you put actual dollar signs on the technical debt. We've done this when we've served as expert witnesses reviewing software in legal disputes. It's reasonable to incur technical debt (usually short term technical debt) when it increases the net value of the project. But too many hacks will put the net value in the red, no matter what functionality the project delivers. When a hack is removed, the debt associated with it is retired. If you don't eliminate the hacks, you have to pay interest on their technical debt in terms of maintenance and opportunity costs.

A number of different kinds of technical debt have been identified: long term and short term, intentional and unintentional, etc. McConnell's taxonomy is:
This gets a bit too complicated for most of our projects, but it's useful in big software jobs. And there's a third major kind of technical debt that is probably even more important. That's the technical debt you don't know you have. We call it "dark debt". If you manage your technical debt ruthlessly, with experience you can make a reasonable estimate of how much dark debt your project is carrying. You can't put your finger on it, but you know it's there.

[1] There are people in the cyber security field who call themselves "white hats". Their job is to "hack" your cyber security and report the vulnerabilities they discover. Vulnerabilities are patterns. The white hat's job is to identify those patterns before the black hat does. So there is real honor in being a white hat hacker.
[2] Devlin's Angle, "Patterns? What Patterns?", January 1, 2012.
[3] AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, William Brown, Raphael Malveau, Skip McCormick, Tom Mowbray, John Wiley & Sons, 1998.
[4] See, for example, Steve McConnell's blog post Technical Debt.