- Database query tools like SQuirreL—Getting acquainted with existing data.
- Office suites (MS Office and LibreOffice)—Prototyping models and working with small amounts of data (say, 1,000 records) for tasks such as quickly testing whether a given relationship is linear and plotting that relationship. Good for producing presentations and reports.
- R—Handling medium-sized data sets (say, 1 million records) and reading data from existing databases. Good for sampling. Good for summarizing data and performing standard statistical tests on the data. Good for some Machine Learning.
- Matlab—Creating simulations, prototyping linear systems, numerical analysis, and mathematical models built from imperative knowledge. Good for charts, and for animations that show the evolution of a system over time.
- Mathematica—Symbolic and numerical computation, and mathematical models built from analytic or declarative knowledge. Understanding data relationships. Good for charts, analytics prototyping, and producing reports. Often a very good tool for doing a first pass over a problem and getting to understand the issues.
- Scientific Python (NumPy+SciPy+Theano+Canopy)—Good for sophisticated Machine Learning, such as deep neural networks, and for very large data sets. Makes good use of GPUs for fast computation. Good for delivering code that customers can run efficiently as part of their information systems, without the site licenses that Matlab and Mathematica would require.
- Figaro—General data fusion and Bayesian analysis (probabilistic programming and Monte Carlo simulation). Understanding complex probability distributions. Figaro is a toolkit for the new field of probabilistic programming; it is a domain-specific language embedded in Scala.
- Julia—Machine Learning, numerical linear algebra. Julia is a newer programming language designed with technical computing in mind. It has native support for things like matrices and complex numbers, but also has the features of a general-purpose programming language. It is designed to approach the speed of C while being as expressive as Python.
- Systems software development tools—Building high-performance tools that integrate into your existing system configuration. Good for Big Data. We have decades of systems software development experience. We code in Java, C/C++, Scala, and other languages, and are fluent in the most important version control and build tools (such as Git, Maven, and SBT) and the major libraries (such as Akka, Gephi, and Neo4j). We have also built our own libraries for things like probability distributions, Monte Carlo simulations, and genetic algorithms.
- Paper and pencil—Still the most flexible and powerful tools for doing science.
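As an illustration of the quick linearity check mentioned above for office suites and R, the same idea can be sketched in a few lines of plain Python. This is a minimal sketch under our own assumptions, not a description of any particular tool's method: `linearity_check` is a hypothetical helper that fits an ordinary least-squares line and reports Pearson's correlation coefficient r. A value of |r| close to 1 suggests, but does not prove, a linear relationship; plotting the data and the residuals is still the better check.

```python
import math


def linearity_check(xs, ys):
    """Hypothetical helper: fit y = slope * x + intercept by ordinary
    least squares and return (slope, intercept, r), where r is
    Pearson's correlation coefficient between xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")

    slope = sxy / sxx                      # least-squares slope
    intercept = mean_y - slope * mean_x    # line passes through the means
    r = sxy / math.sqrt(sxx * syy)         # Pearson correlation, in [-1, 1]
    return slope, intercept, r


if __name__ == "__main__":
    # Small, made-up sample that is roughly linear.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept, r = linearity_check(xs, ys)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f}")
```

For a 1,000-record sample this runs instantly; for larger data sets the same computation is better done with vectorized tools such as NumPy or R.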
S3 Data Science, copyright 2016.