A Compilation of Common Data Science Tools

Ben Jacobs - 05 March 2025

Overview

Data science can be done in many different ways, with many different tools. I've compiled some common data science resources here, grouped them, and included links and short explanations. When I was starting out in data science, I wished I had seen the most common tools grouped together like this, so I put this list together myself.

Data Science Languages

There are many programming languages used in Data Science. Below are some of the most commonly used.

Python Packages and Frameworks

Dataframes and Basic Data Wrangling

Machine Learning in Python

PyTorch, scikit-learn, and TensorFlow are three independent Python libraries for machine learning.

Parallel Processing and Cloud Computing

APIs

Visualization Libraries

There are a lot of options for visualization, and many of these libraries overlap in what they can do. Find what works for you.

Table of Visualization Libraries

| Name | Year Released | Main Language | Other Languages / Interfaces | Strengths |
| --- | --- | --- | --- | --- |
| Altair | 2016 | Python | None – it generates Vega-Lite JSON specs | Simple syntax and simple graphs can be stacked to make more complicated ones. |
| Bokeh | 2013 | Python | JavaScript (BokehJS for rendering) | Interactive web plots and dashboards. |
| Bqplot | 2014 | Python | None | Primarily designed for interactive Jupyter notebooks. |
| D3.js | 2011 | JavaScript | Wrappers exist in other languages | Great for web visualizations. |
| ggplot (ggplot2) (Python) | 2013 | Python | None – inspired by R's ggplot2 | Great for plotting in both R and Python. Easily extendable. |
| HoloViews | 2015 | Python | None (uses Bokeh or Matplotlib) | Makes visualization easy, great for large datasets, integrates with Bokeh/Matplotlib. |
| hvPlot - HoloViz | - | Python | None (built on HoloViews) | |
| Matplotlib | 2003 | Python | None officially | |
| PlotAPI | ? | Language-agnostic | Any language that can make HTTP requests | Paid framework that excels at interactive, colorful, and dynamic visualization. |
| Plotly / Plotly Express | 2013 | Python (primary) | R, MATLAB, Julia, JavaScript | Exceptional for making graphs, especially interactive ones. |
| Seaborn | 2012 | Python | None | |
| Vega-Lite | 2016 | JSON spec (JS) | Python (via Altair), R (via wrappers) | |
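
The table notes that Altair generates Vega-Lite JSON specs. To give a rough idea of what such a spec looks like, here is a minimal sketch that builds one by hand using only Python's standard library. The helper name `scatter_spec` is made up for this example, and the data is just three release years taken from the table. Altair itself produces much richer specs than this, so treat it as an illustration of the idea rather than of Altair's output.

```python
import json

# Three rows taken from the table above (library name and release year).
rows = [
    {"library": "Matplotlib", "year": 2003},
    {"library": "D3.js", "year": 2011},
    {"library": "Altair", "year": 2016},
]


def scatter_spec(values, x_field, y_field):
    """Build a minimal Vega-Lite spec as a plain dict (hypothetical helper)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": values},   # inline data, one dict per row
        "mark": "point",              # draw one point per row
        "encoding": {
            "x": {"field": x_field, "type": "nominal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


spec = scatter_spec(rows, "library", "year")
spec_json = json.dumps(spec, indent=2)  # the JSON text a Vega-Lite renderer reads
print(spec_json)
```

The printed JSON is plain text, which is why the same chart description can be produced from Python, R, or any other language and rendered by the same JavaScript runtime.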

R and R Studio

Other Tools

Backend Integration Tools

Databases

SQL / Relational Databases

NoSQL Databases

  1. MongoDB – Document-oriented, JSON-based database.
    • Stores data as BSON (binary JSON) and is queried with MQL (MongoDB Query Language), which plays the role SQL does in a relational database
    • Free and paid options (MongoDB Atlas, Enterprise)
    • Supports sharding – splitting data across multiple servers
    • mongodb.com
  2. Cassandra – Distributed NoSQL database
    • Great for high availability and heavy loads
    • Free and open source
    • Paid options: DataStax Enterprise, Amazon Keyspaces, Azure Cosmos DB
    • “Distributed” means Cassandra can run on multiple machines while appearing to users as a unified whole
    • cassandra.apache.org
  3. Redis – In-memory key-value store, great for caching
  4. Flat (object) storage – Amazon S3
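
Sharding and "distributed" can sound abstract, so here is a toy sketch of the idea in plain Python. It assumes the simplest possible scheme: hash each key and use the result to pick one of a fixed number of shards. The class `ShardedStore` and its methods are invented for this example. Real systems like MongoDB and Cassandra use more sophisticated partitioning, and also handle replication and rebalancing, none of which is shown here.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads keys across several in-memory shards.

    Each shard is a plain dict standing in for a separate server.
    """

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self._shard_for(key)].get(key, default)


store = ShardedStore(num_shards=3)
for user_id in range(100):
    store.put(f"user:{user_id}", {"id": user_id})

# The caller never says which shard to use; the store looks like one database.
print(store.get("user:42"))
# How many keys ended up on each shard.
print([len(shard) for shard in store.shards])
```

The caller only ever talks to `store`, which is the "unified whole" part: the data lives in several places, but the routing is hidden behind one interface.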

Dashboarding and Visualization Platforms

Data Storage Formats

Monitoring and Metrics

These tools and services are geared more towards companies than individuals, but they are still worth knowing about.

Conclusion

There are so many tools and options available for data science. I hope that aggregating and succinctly explaining what these tools are for can help you better explore and analyze your next data set. Happy wrangling!