When you start working with Python for data analysis, engineering or science, pandas is likely one of the first libraries you will have to learn. This incredible library lets you manipulate two very important objects in the Python data ecosystem: the one-dimensional Series and the two-dimensional DataFrame. These objects are part of many data pipelines, and mastering them is crucial to starting your Python career.
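To make the two objects concrete, here is a minimal sketch. The tickers, prices and volumes are made up purely for illustration and are not taken from the datasets used later in the post.

```python
import pandas as pd

# A Series is one-dimensional: a single labelled column of values.
prices = pd.Series([10.5, 11.0, 10.8], name="price")

# A DataFrame is two-dimensional: rows plus named columns,
# and each column can hold a different type.
stocks = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],  # hypothetical tickers
    "price": [10.5, 11.0, 10.8],      # made-up values
    "volume": [1000, 1500, 1200],     # made-up values
})

print(prices.ndim)   # number of dimensions of the Series
print(stocks.ndim)   # number of dimensions of the DataFrame
print(stocks.shape)  # (rows, columns)
```

Each column of the DataFrame can be pulled out as a Series, for example with stocks["price"].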
DataFrames are widely used throughout data science and analytics because they let you build tabular objects whose columns can hold different types. The goal of this post is to provide a thorough guide to some well-known pandas functions and to the most important features of the library. Hopefully, after reading it, you will be ready to use those features in your own work. Many readers come to pandas from a SQL background, so I’ll include SQL comparisons alongside some of the instructions to make it easier to map between the two. Keep in mind, though, that knowing SQL is definitely not a requirement for learning pandas!
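As a taste of what such a comparison can look like, the sketch below filters rows and picks columns, with a roughly equivalent SQL statement in a comment. The table name, column names and values are assumptions made up for this example.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "close": [12.0, 150.0, 98.5],
})

# Roughly equivalent SQL, assuming a table called df:
#   SELECT ticker, close FROM df WHERE close > 50;
result = df.loc[df["close"] > 50, ["ticker", "close"]]

print(result)
```

The boolean condition inside .loc plays the role of the WHERE clause, and the list of column names plays the role of the SELECT list.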
Throughout this post, we’ll use a variety of data to learn about pandas, namely:
- We’ll build our own pandas Series and DataFrames using object creation commands.
- We’ll work with three datasets containing information about stock prices, available here (https://www.kaggle.com/datasets/rprkh15/sp500-stock-prices): Ford, Apple and AbbVie stock price data.
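Loading a downloaded CSV into a DataFrame typically goes through pd.read_csv. The sketch below uses an in-memory string as a stand-in for a file so that it runs on its own. The column names and values are assumptions for illustration and may not match the actual Kaggle files.

```python
import io

import pandas as pd

# Stand-in for a downloaded CSV file; columns and values are made up.
csv_text = """Date,Open,Close
2022-01-03,10.0,10.5
2022-01-04,10.5,10.2
"""

# With a real file, pass its path instead of the StringIO buffer.
# parse_dates converts the Date column to datetime values.
stock = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(stock.dtypes)
print(stock.head())
```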
In this post we’ll cover the most famous pandas features, namely:
- Creating dataframes
- Selecting rows
- Selecting columns
- Combining dataframes
- Plotting data
- Grouping data
- Chaining functions
Without further ado, let’s start!