10 Ways to Add a Column to Pandas DataFrames | by Soner Yıldırım | Jul, 2023


We often need to derive or create new columns

Soner Yıldırım

Towards Data Science

Photo by Austin Chan on Unsplash

A DataFrame is a two-dimensional data structure with labeled rows and columns. We often need to add new columns during data analysis or feature engineering.

There are many ways to add a new column, and which one best suits your needs depends on the task at hand.

In this article, we’ll learn 10 ways to add a column to Pandas DataFrames.

Let’s start by creating a simple DataFrame with the DataFrame constructor of Pandas. We’ll pass the data as a Python dictionary whose keys are the column names and whose values are lists holding each column’s values, one per row.

import pandas as pd

# create DataFrame
df = pd.DataFrame(
    {
        "first_name": ["Jane", "John", "Max", "Emily", "Ashley"],
        "last_name": ["Doe", "Doe", "Dune", "Smith", "Fox"],
        "id": [101, 103, 143, 118, 128]
    }
)

# display DataFrame
df

df (image by author)
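To see how the constructor mapped the dictionary, we can inspect the shape, the column labels, and the values of one column. This small sketch rebuilds the same DataFrame so that it runs on its own.

```python
import pandas as pd

# Same constructor call as above: each dictionary key becomes a column label,
# and each list supplies that column's values, one per row.
df = pd.DataFrame(
    {
        "first_name": ["Jane", "John", "Max", "Emily", "Ashley"],
        "last_name": ["Doe", "Doe", "Dune", "Smith", "Fox"],
        "id": [101, 103, 143, 118, 128],
    }
)

print(df.shape)             # (5, 3): five rows, three columns
print(df.columns.tolist())  # ['first_name', 'last_name', 'id']
print(df["id"].tolist())    # [101, 103, 143, 118, 128]
```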

1. Use a constant value

We can add a new column of a constant value as follows:

df.loc[:, "department"] = "engineering"

# display DataFrame
df

df (image by author)
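The scalar on the right-hand side is broadcast to every row. The sketch below checks that on a trimmed-down DataFrame (only the first_name column, an assumption made for brevity). It also shows plain bracket assignment, a common equivalent way to add a column; the team column is hypothetical.

```python
import pandas as pd

# Trimmed-down version of the example DataFrame (assumption: first_name only).
df = pd.DataFrame({"first_name": ["Jane", "John", "Max", "Emily", "Ashley"]})

# A scalar value is broadcast to every row of the new column.
df.loc[:, "department"] = "engineering"

# Bracket assignment also adds a new column; "team" is a hypothetical name.
df["team"] = "data"

print(df["department"].unique().tolist())  # ['engineering']
print(df["team"].nunique())                # 1
```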

2. Use an array-like structure

We can also add a new column from an array-like structure such as a list or a NumPy array. In this case, the number of values must equal the number of rows in the DataFrame; otherwise, Pandas raises an error.

df.loc[:, "salary"] = [45000, 43000, 42000, 45900, 54000]

In the example above, we used a Python list. We can also generate the values randomly with NumPy’s random module.

import numpy as np

# draw one random integer in [40000, 55000) for each of the 5 rows
df.loc[:, "salary"] = np.random.randint(40000, 55000, size=5)

# display DataFrame
df
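Random values change on every run. The following is a minimal sketch, assuming NumPy's newer Generator API is acceptable in place of np.random.randint: seeding default_rng makes the salaries reproducible, and passing size=len(df) keeps the array length tied to the row count. The last lines show what happens when the lengths do not match; the bonus column is hypothetical.

```python
import numpy as np
import pandas as pd

# Trimmed-down version of the example DataFrame (assumption: first_name only).
df = pd.DataFrame({"first_name": ["Jane", "John", "Max", "Emily", "Ashley"]})

# Seeded generator: the same seed yields the same salaries on every run.
rng = np.random.default_rng(seed=42)

# size=len(df) matches the row count; like randint, the upper bound is excluded.
df.loc[:, "salary"] = rng.integers(40000, 55000, size=len(df))

# A list with the wrong number of values is rejected, not padded or truncated.
mismatch_rejected = False
try:
    df["bonus"] = [1000, 2000]  # hypothetical column: 2 values for 5 rows
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```

Because the generator is seeded, rerunning the cell reproduces the same salary column, which makes results easier to compare across runs.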


