Do data scientists still use Excel?

Excel has been the tool of choice for data analysis for decades. With new and more powerful data science tools available, do data scientists still rely on tried-and-true Excel? The short answer is yes, absolutely. Here’s a closer look at why Excel still has an important role to play.

The Ubiquity of Excel

One of the main reasons data scientists continue to use Excel is simply that it’s everywhere. According to Microsoft, over 1 billion people worldwide use Excel. Most companies already own Excel licenses and hold years of legacy data in Excel files.

Data scientists often need to access and analyze data from multiple departments across an organization. Getting other teams to export their data into a format or platform that data scientists prefer can be challenging. It’s much easier for data scientists to meet others where they are – in Excel.

Powerful Functionality

Although Excel is not a true data science tool, it still offers powerful functionality for many data tasks. Some key features that data scientists take advantage of include:

  • Formulas – Excel contains over 400 built-in functions that can be combined in formulas to perform calculations on data.
  • PivotTables – Allow summarizing, grouping, counting, and charting data.
  • Data visualization – Excel provides tools such as charts and graphs for visualizing data.
  • Power Query – Extracts, transforms, and loads data into Excel for analysis.
  • Regression analysis – The Analysis ToolPak’s Regression tool performs linear least-squares regression, and the LOGEST function fits exponential curves.
  • Forecasting – Creates forecasts from historical time-based data.

While languages like R and Python provide more advanced ML/AI capabilities, Excel has enough statistical and visualization power for many daily data tasks.
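To make the regression and forecasting features concrete, here is a minimal Python sketch of the arithmetic behind a straight-line fit and a one-step forecast, roughly what Excel's SLOPE, INTERCEPT, and FORECAST.LINEAR functions compute. The function names fit_line and forecast and the sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(x_new, xs, ys):
    """Predict y at x_new from a straight line fitted to the history."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept


# Hypothetical monthly sales figures, invented for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
print(slope, intercept, forecast(6, months, sales))  # 2.0 8.0 20.0
```

A quick check like this is often easier to do in a spreadsheet, but the same fit in code is simple to rerun when the data changes.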

Familiar Interface

Proficiency with Excel is almost a given for most data professionals. The familiar user interface makes it easy to load in new data sets and quickly start manipulating and analyzing data without much ramp-up time. Data scientists who primarily use other tools can benefit from having a basic comfort level in Excel when they need to look at something quickly.

Collaboration Capabilities

Collaborating with other teams is an essential part of a data scientist’s job. Excel provides effective ways for data scientists to collaborate, including:

  • Co-authoring – Allows multiple users to view and edit the same workbook simultaneously. It replaces the older Shared Workbook feature, which is now a legacy option.
  • Microsoft 365 (formerly Office 365) collaboration – Provides file sharing, comments, task assignments, and @mentions in Excel for the web.
  • Excel charts/graphs – An easy way to share data insights and analysis with non-technical teams.

The ability to bring Excel analytics into presentations, emails, reports, and more makes it easy to communicate data insights across the organization.

Connection to Other Tools

Excel does not have to be an isolated tool. Many data scientists connect Excel to other platforms and languages for expanded capabilities:

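  • Python – The openpyxl library reads and writes .xlsx files, xlrd reads legacy .xls files, and xlwings automates a running Excel application from Python.
  • R – The readxl package imports Excel files into R; exporting requires a separate package such as writexl or openxlsx.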
  • Power BI – Excel data can be imported into Power BI for additional visualization and dashboard creation.
  • Azure Machine Learning – Excel data sources can feed into Azure ML experiments.

By integrating Excel with more advanced tools, data teams can optimize their workflows rather than having to standardize on a single platform.
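The Python libraries named above are third-party packages. Where installing them is not an option, one dependency-free way to hand data back and forth is CSV, which Excel opens and saves natively. The sketch below uses only Python's standard csv module; the function names and sample rows are invented for illustration, and it stands in for a library-based workflow rather than reproducing one.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Serialize a header and data rows to CSV text that Excel can open."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # quotes fields containing commas automatically
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_dicts(text):
    """Parse CSV text (for example, a sheet saved as CSV) into dicts."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


# Hypothetical sample data, invented for illustration.
text = rows_to_csv_text(
    ["region", "units"],
    [["North", 120], ["South, East", 95]],
)
records = csv_text_to_dicts(text)
print(records[1]["region"])  # South, East
```

Note that every value comes back as a string, so numeric columns need an explicit conversion. When writing to disk for Excel, opening the file with encoding="utf-8-sig" adds a byte order mark that helps Excel detect UTF-8 text.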

The Downsides of Excel

While Excel is still widely used by data scientists, it has limitations that become more apparent as data sets and analysis needs scale up, including:

  • Calculation bottlenecks – In large data sets, each change can trigger thousands of recalculations.
  • Difficult to collaborate at scale – File locking and difficulty versioning make it hard to collaborate with large numbers of users.
  • Data limitations – An Excel worksheet maxes out at 1,048,576 rows and 16,384 columns.
  • Analytics limitations – No native machine learning or AI capabilities.
  • Programming complexity – Built-in automation has traditionally relied on VBA rather than Python/R libraries.
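The size limits in the list above are exact: an .xlsx worksheet holds at most 1,048,576 rows and 16,384 columns, and the last column is labeled XFD. A minimal sketch of a pre-export check follows; the function names are hypothetical, not part of any library.

```python
MAX_ROWS = 1_048_576  # 2**20 rows per worksheet
MAX_COLS = 16_384     # 2**14 columns per worksheet


def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if not 1 <= index <= MAX_COLS:
        raise ValueError(f"column index must be between 1 and {MAX_COLS}")
    letters = ""
    while index > 0:
        # Subtract 1 because the column "alphabet" has no zero digit.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def fits_in_worksheet(n_rows, n_cols, header=True):
    """Return True if the data (plus an optional header row) fits on one sheet."""
    total_rows = n_rows + (1 if header else 0)
    return total_rows <= MAX_ROWS and n_cols <= MAX_COLS


print(column_letter(MAX_COLS))           # XFD
print(fits_in_worksheet(2_000_000, 12))  # False
```

Running a check like this before exporting avoids writing a file that would lose rows or columns when opened.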

Data scientists engaged in large-scale analytics or machine learning initiatives likely need platforms designed specifically for that type of work. But Excel still plays an important supporting role in many data science workflows.

Should Data Scientists Learn Excel?

For aspiring data scientists looking to maximize their job prospects, learning Excel should absolutely be on the list. Excel skills appear as a requirement in many data analyst and data scientist job openings, and recruiters and hiring managers often specifically call out Excel fluency.

In some organizations, data scientists will spend their time exclusively building advanced machine learning models in Python or other languages. But it’s relatively rare for a data scientist to not touch Excel at all. Having some Excel chops is useful for quickly investigating an issue, prototyping a solution, working with non-technical teams, and more.

Key Excel Skills for Data Scientists

Here are some of the most important Excel skills data scientists should aim to develop:

  • Importing and cleaning data
  • Organizing and formatting data
  • Using formulas and functions like VLOOKUP
  • Sorting and filtering data
  • Creating and formatting tables and PivotTables
  • Building charts and graphs
  • Using Power Query to transform data
  • Statistical analysis functions
  • Forecasting and regression analysis
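For readers who come from a programming background, the VLOOKUP item above maps onto a familiar idea: an exact-match VLOOKUP behaves much like a dictionary lookup keyed on the table's first column. The sketch below is an analogy, not Excel's implementation; the function names and the product table are invented for illustration.

```python
def build_lookup(table, key_col, value_col):
    """Index rows by key, keeping the first match as exact-match VLOOKUP does."""
    lookup = {}
    for row in table:
        # setdefault leaves an existing entry untouched, so the first row wins.
        lookup.setdefault(row[key_col], row[value_col])
    return lookup


def vlookup(key, lookup, default="#N/A"):
    """Return the value for key, or Excel's not-found marker if it is absent."""
    return lookup.get(key, default)


# Hypothetical product table, invented for illustration.
products = [
    ("A100", "Widget", 4.50),
    ("B200", "Gadget", 12.00),
    ("A100", "Widget (old listing)", 3.95),
]

prices = build_lookup(products, key_col=0, value_col=2)
print(vlookup("A100", prices))  # 4.5
print(vlookup("Z999", prices))  # #N/A
```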

Learning Excel shortcuts and best practices for structuring and formatting data for analysis is also hugely beneficial.

Should Current Data Scientists Learn More Excel?

For data scientists already working in industry, is it worth taking the time to improve Excel skills? That depends on your current skill level and how much your day-to-day work requires Excel. If you are already moderately proficient, learning advanced Excel functionality like Power Query, forecasting tools, and connecting Python/R to Excel might help boost productivity.

Otherwise, your time might be better spent advancing other technical skills such as Python, SQL, Hadoop, Spark, or cloud platforms. The more your work focuses on engineering and machine learning applications, the less critical ongoing Excel mastery is. Even so, hardcore data scientists are still likely to encounter Excel-based data they’ll need to handle.

Conclusion

Excel remains deeply entrenched in the day-to-day data workflow at most organizations. Even as newer and more advanced data science tools emerge, Excel offers convenience, collaboration, and flexibility. Most data scientists will continue to use Excel at least occasionally when pulling data together, communicating insights, and integrating with other platforms.

Aspiring data professionals should absolutely invest time to learn Excel alongside other key skills like Python and SQL. Current professionals can look for opportunities to level up their Excel skills, but focusing on expanding capabilities in other tools may have a bigger payoff depending on the role.