What does a typical day in the life of a data scientist look like? How do they spend their time at work? And how much does it all cost? We decided to find out more.
To gain more insight into the working lives of data scientists, we asked them: “What percentage of your time do you spend on each of these tasks?”
In this article, we’ll reveal what our 100 respondents had to say about how they spend their day and the approximate costs involved.
- Just under half (44%) of those who answered our survey said that they spent at least 50% of their time analyzing data using code – whether Python, R or something else entirely. However, 7% stated that they spent none of their time analyzing data with code, instead dedicating it to other tasks, such as talking to clients.
- Just under a third (31%) spent about half of their time working on projects involving the programming language Python, while 17% said that they spent less than 10% of their time on Python-based projects.
- Data scientists are spending just over 20% of their time modeling data using machine learning algorithms, whether supervised or unsupervised – this makes sense when you consider that Dataiku DSS was the most popular tool for this task among our respondents. Machine learning experts also dedicate 16% of their time to preparing data sets for analysis and 15% to writing SQL queries or Hive scripts, whereas 8% spend their time on R and 5% on Scala.
- When it comes to the tools used by data scientists, Microsoft Excel – unsurprisingly – takes second place with 29% of our respondents using it for between one and 10 hours a day. Python was in third place (24%), while Tableau came in fourth (9%). The “other” category allowed some people to express that they use Matlab, SAS and Spark – none of these were selected more than 1% of the time though.
- Data scientists who answered our survey work at organisations such as tech companies (19%), financial services firms (16%), consulting companies (14%) and online businesses such as ecommerce sites or search engines (12%). Dataiku also noted that around 20% work at government agencies, academic institutions and small businesses with fewer than 50 employees.
- Even though data scientists spend a significant portion of their time programming, almost half (44%) work on projects that involve analyzing large data sets – this suggests that they are involved in tasks such as optimizing business processes or analyzing customer behaviour.
- The average salary for a data scientist is $108,000, but the cost of hiring has been estimated to be between $500,000 and $1m. In addition to the financial expense, you need to allow extra time for recruitment: jobs in San Francisco advertised through Indeed take around 100 days to fill, whereas New York jobs take an average of 70 days. And it’s not just about finding the right person; you also need to ensure that they fit into your company culture.
- Dataiku’s Denis Magda noted that hiring a data scientist is “not just about finding someone who is smart and passionate about analytics”. He said: “They must be able to work well in a team, communicate effectively and most importantly – make meaningful connections between the business and technical aspects of their projects.”
- Many companies use their own in-house proprietary tools for data science tasks such as machine learning or statistical modeling, but there are plenty of powerful open-source platforms out there too. Among our respondents, 35% were using the Python data analysis package pandas; 18% used R with RStudio; 14% turned to Microsoft Excel (or another spreadsheet package) for data processing; Tableau was used by 11%; 10% relied on Spark, whereas 8% used Matlab. The “other” category mentioned Python’s Scikit-Learn, Java and C#.
- The term “data scientist” is so new that it does not appear in the Merriam-Webster dictionary yet. However, the Oxford English Dictionary (OED) states that its definition is: “A person employed to analyze complex data often with the goal of discovering previously hidden patterns and relationships.” It goes on to define the word as something belonging to or relating to “the use of statistical or other formal procedures to discover facts about the world”.
Conclusion:
Data science involves a diverse range of disciplines, from computer engineering and statistics to communication skills. You need to be comfortable working in areas such as machine learning, Hadoop and cloud computing – but most important of all is truly understanding the business you are operating in.