Python Lists Vs. NumPy Arrays: A Deep Dive into Memory Layout and Performance Benefits | by Peng Qian | Jul, 2023


SCIENTIFIC COMPUTING

Exploring allocation differences and efficiency gains

Peng Qian

Towards Data Science

Python Lists Vs. NumPy Arrays: A Deep Dive into Memory Layout and Performance Benefits
Data in NumPy arrays are arranged as compactly as books on a shelf. Photo by Eliabe Costa on Unsplash

In this article, we will delve into the memory design differences between native Python lists and NumPy arrays, revealing why NumPy can provide better performance in many cases.

We will compare data structures, memory allocation, and access methods, showcasing the power of NumPy arrays.

Imagine you are preparing to go to the library to find a book. Now, you discover that the library has two shelves:

The first shelf is filled with various exquisite boxes, some containing CDs, some containing pictures, and others containing books. Only the name of the item is attached to the box.

This represents native Python lists, where each element has its memory space and type information.

However, this approach has a problem: many empty spaces in the boxes, wasting shelf space. Moreover, when you want to find a specific book, you must look inside each box, which takes extra time.

Now let’s look at the second shelf. This time there are no boxes; books, CDs, and pictures are all compactly placed together according to their categories.

This is NumPy arrays, which store data in memory in a continuous fashion, improving space utilization.

Since the items are all grouped by category, you can quickly find a book without having to search through many boxes. This is why NumPy arrays are faster than native Python lists in many operations.

Everything in Python is an object

Let’s start with the Python interpreter: although CPython is written in C, Python variables are not basic data types in C, but rather C structures that contain values and additional information.

Take a Python integer x = 10_000 as an example, x is not a basic type on the stack. Instead, it is a pointer to a memory heap object.



Source link

Leave a Comment