What is a Data Scientist?

I often struggle to explain to people what I do. When I lead with my title, I get blank stares or questions about spreadsheets or accounting, and sometimes even a joke about whether I wear a lab coat to work. When I say I’m a software engineer, people ask what language I code in and what my feelings are about agile development. I’ve also tried explaining my work as a data engineer, but again, those conversations get bogged down in questions about ETL, data warehouses, and relational databases. I’m not quite a statistician either; though statistics is a vital part of my position, I don’t spend my entire day calculating probabilities.

The truth lies somewhere in between these descriptions. I often look at big tables of numbers in a spreadsheet to gain familiarity with new data. I transform data and build machine-learning algorithms using a variety of coding languages - Python and R are my personal favorites. I have a trusty set of queries that can retrieve the appropriate data from both relational databases and newer data stores. I also use a toolbox of statistical techniques to validate my results and ensure they are accurate.

How do you sum that up in a polite conversation? My title of Data Scientist is a catchall label for a number-crunching, data-wrangling, diet software engineer/statistician who, by the way, doesn’t wear a lab coat. The more I think about it, though, the more I realize that the lab coat joke might be closer to the point. In fact, the easiest way to describe my passion and to summarize why I love my job so much is to say that I’m a researcher. Sure, I dabble in the software arts and can work my way around code and queries, and I need to be knowledgeable about our systems and functionality from both the front and back end. Ultimately, however, my goal is top-quality, industry-impacting research and results.

The researcher metaphor brings to the surface a lot of great qualities and responsibilities of a successful Data Scientist, many of which are not typically discussed. Job postings and blogs are quick to list a plethora of hard skills, but the soft skills are what push a good Data Scientist to be a great researcher.

The first of these soft skills is domain knowledge. Data Scientists need to be the experts. In this sense, I am the purest form of an academic researcher. I have to be a master of the jargon and to know the latest findings published across a variety of disciplines, both within our industry and across more traditional lines of study. I have to conduct my research with rock-solid research design to ensure I don’t find spurious results. I also, as many scientific researchers would attest, have to be one with my data. I need to know how to find it and where to look for inconsistencies. I need to be aware of what type of problem I am trying to solve and the various approaches to solving it.

The second soft skill is excellent communication. Modern office life is rife with communications from emails to social media, and as a Data Scientist, I have an even greater responsibility for written communications, like writing research summaries and white papers. I regularly need to state assumptions and methods to a variety of stakeholders in a clear, concise, and truthful manner, and written communication is only one side of that coin. I also need to communicate in ways that are neither spoken nor written, like creating engaging and helpful visualizations. Finally, I have to have excellent interpersonal communication skills to understand the assumptions of stakeholders who request research studies and to be able to perform outreach to other teams and institutions that aren’t familiar with Data Science.

The last soft skill, and perhaps the most important, is voracious curiosity. In order to be the best researcher I can be, I have to possess zero fear of the unknown - no matter how complex the subject matter. I live in the world of machine learning algorithms, statistical theorems, and big data infrastructure, so I rarely have a day where I feel 100% comfortable with the material I’ve encountered. Researchers are constantly pushing for the ultimate truth and pushing themselves to better understand it. This innate need to fully understand and to always have another question waiting helps me to grow personally and professionally by consistently pushing the boundaries that I’ve created in my head.

I work in an extremely complex and exciting field, and I continually get to improve not only the hard skills required for working at a software company, but also the soft skills required for being a great researcher. Maybe the next time someone asks me what I do, I’ll be proud to simply say, “I’m a researcher. Unfortunately, I don’t have to wear a lab coat.”