When you start working with Python for data analysis, engineering or science, pandas is likely one of the first libraries you will have to learn. This incredible library lets you manipulate two very important objects in the Python data ecosystem: the one-dimensional Series and the two-dimensional DataFrame. These objects are part of a lot of data pipelines, and mastering them is crucial to starting your Python career.
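
To make the two objects concrete, here is a minimal sketch of each. The tickers and numbers are made up purely for illustration and are not taken from the datasets used later in the post.

```python
import pandas as pd

# A Series is one-dimensional: a single labelled column of values.
# These closing prices are invented for illustration.
prices = pd.Series([10.5, 11.0, 10.8], name="close")

# A DataFrame is two-dimensional: a table whose columns can hold different types.
stocks = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC"],  # hypothetical tickers
        "close": [10.5, 11.0, 10.8],
        "volume": [1000, 1500, 1200],
    }
)

print(prices.ndim)    # number of dimensions of the Series
print(stocks.ndim)    # number of dimensions of the DataFrame
print(stocks.dtypes)  # one dtype per column
```

Selecting a single column from the DataFrame, for example stocks["close"], gives you back a Series, which is why the two objects are usually learned together.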
DataFrames are widely used throughout data science and analytics, as they let you build two-dimensional tables whose columns can hold different types. The goal of this post is to provide a thorough guide on how to use some well-known pandas functions and how to work with the most important features of the library. Hopefully, after reading this guide, you will be ready to work with them on your own. You may also be coming from a SQL background, so I'll try to include comparisons with SQL code alongside some of the instructions in the post, so that it is easier to map between the two frameworks. But keep in mind that knowing SQL is definitely not a requirement to learn pandas.
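
As a small taste of what a SQL comparison can look like, here is a hedged sketch of a filter written both ways. The table name stocks, its columns and its values are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical table of closing prices, invented for illustration.
stocks = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC"],
        "close": [10.5, 250.0, 98.2],
    }
)

# SQL equivalent, assuming a table named stocks:
#   SELECT ticker, close FROM stocks WHERE close > 100;
expensive = stocks.loc[stocks["close"] > 100, ["ticker", "close"]]

print(expensive)
```

The boolean condition inside .loc plays the role of the WHERE clause, and the list of column names plays the role of the SELECT list.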
Throughout this post, we'll use a variety of data to learn about pandas:
- We'll build our own pandas Series and DataFrames using object creation commands.
- We’ll work with three datasets containing information about stock prices, available here (https://www.kaggle.com/datasets/rprkh15/sp500-stock-prices) — namely, we’ll use Ford, Apple and Abbvie stock price data.
In this post we'll cover the most famous pandas features, namely:
- Creating dataframes
- Selecting rows
- Selecting columns
- Combining dataframes
- Plotting data
- Grouping data
- Chaining functions
Without further ado, let’s start!