Seth Gregorie Data Scientist July 13, 2015

It is indisputable that the volume of data being captured is on the rise. What remains controversial is whether this often superfluous capture will actually lead to more valid discoveries, or instead to a flood of false and random occurrences mistaken for new discoveries.

Data-driven discoveries are commonly backed by the probabilistic or statistical existence of a relationship between two or more variables. For example, measuring the correlation between outside temperature and ice cream sales would likely indicate that as one goes up, so does the other. I haven’t done the math, but for the sake of argument, let’s assume they are related.

Our world is full of chaos and uncertainty. Since data is just a set of recorded snapshots of our world at given points in time, it inherits varying degrees of that chaos and uncertainty. This is where you have to be careful. What if we added beach tourism data and shark attack counts to the ice cream and outside temperature data set mentioned above? I’d be willing to bet that a strong statistical relationship between shark attacks and ice cream sales would bubble up from the data. But is that relationship real, or is it just an artifact of other underlying relationships? A more reasonable inference is that people flock to the beaches in warm weather, and more people in the water means more shark attacks. Shark attacks rise alongside ice cream sales only because both respond to the warmer weather.
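To make the shark attack example concrete, here is a minimal simulation sketch in Python. Everything in it is invented for illustration: the coefficients, the noise levels, and the function names (`simulate`, `pearson`, `residuals`) are assumptions, and the shark attack figure is treated as a continuous index rather than a real count. In this made-up world, temperature drives both ice cream sales and beach visitors, and visitors drive shark attacks. Ice cream never influences sharks, yet the two still end up correlated until temperature is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n_days=2000, seed=0):
    """Generate fake daily data; all coefficients are arbitrary assumptions."""
    rng = random.Random(seed)
    temp = [rng.uniform(10, 35) for _ in range(n_days)]
    # Ice cream sales depend on temperature plus independent noise.
    ice_cream = [20 * t + rng.gauss(0, 60) for t in temp]
    # Beach visitors depend on temperature plus independent noise.
    visitors = [100 * t + rng.gauss(0, 300) for t in temp]
    # The shark attack index depends on visitors only, never on ice cream.
    sharks = [0.002 * v + rng.gauss(0, 1.0) for v in visitors]
    return temp, ice_cream, sharks


temp, ice_cream, sharks = simulate()

# Naive correlation between ice cream sales and shark attacks.
raw = pearson(ice_cream, sharks)

# Correlation after removing the linear effect of temperature from both.
adjusted = pearson(residuals(ice_cream, temp), residuals(sharks, temp))

print(f"raw correlation:      {raw:.2f}")
print(f"controlling for temp: {adjusted:.2f}")
```

The raw correlation comes out strongly positive, while the temperature-adjusted one sits near zero. Comparing residuals like this is a simple form of partial correlation; it only works here because the confounder is known and measured, which is rarely guaranteed with real data.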

So how do we keep from being fooled by these false relationships? First of all, don’t believe everything you see and read. Use your own judgment and expertise to validate claims. Scientific journals generally do a great job of vetting research submissions, but much of the vetting still falls to the original investigators and whether or not they applied proper research design and experimental techniques.

The evolution of big data has opened up the world to great opportunities. With those opportunities comes the responsibility to present great science and eliminate the fallacies that lie within the data.

Seth Gregorie July 13, 2015

For a Business Analyst, scope creep can be very challenging to overcome during feature development. Carefully evaluate the request and its impacts.

Jenny Walter June 30, 2015

When designing application software, expect failure. It's not just about redundancy and high availability in the infrastructure; handling failure also needs to be part of the application software design.

Fred Robinson June 10, 2015

You need an architectural blueprint to ensure there is a tomorrow. Fortunately, there are proven styles and techniques to address these challenges.

Alan Frye June 03, 2015