11 Types Of Python Data Structures For Data Analysis

May 21, 2025
20 Mins Read

Data analysis is about wrangling raw information into a usable format to discover new trends and insights. But how can you analyze this data? The answer is Python. It is a popular programming language that offers a powerful toolbox for this task. At the core of Python are fundamental data structures that act as containers for organizing and manipulating your data.

Basic data structures, categorized into mutable types like lists, dictionaries, and sets, and immutable types like tuples, play a crucial role in data analysis. Understanding these structures is important for building efficient and effective data analysis workflows. Additionally, understanding data structures is essential for beginners in data science as it aids in efficient data manipulation and problem-solving, making programming more effective and manageable.

This article dives into the essential Python data structures that are well suited to data analysis.

What are Data Structures?

Data structures are the foundation for organizing and storing data efficiently in a computer’s memory. They allow for efficient access, manipulation, and retrieval of the data. In computer science, understanding data structures is crucial as they are fundamental for programming and software development. Here are some common data structures:

  • Array: An array is a collection of data items of the same type stored at contiguous memory locations. Once it is created with a specific size, it usually cannot be resized later.
  • Tree: A tree is a fundamental data structure that represents and organizes data in a hierarchical format, making it easier to navigate. The top node of the tree is known as the root node, and other nodes below it are called the child nodes.
  • Graph: A graph is a data structure that is not linear and is composed of vertices and edges. Vertices, also known as nodes, and edges, which are lines or arcs, establish connections between any two nodes within the graph.

You will gain a solid understanding of some of the most widely used data structures in the sections below.

What are Data Structures in Python?

Python data structures fall into two groups: mutable and immutable. Mutable data structures can be changed after they are created. For example, you can add, remove, or reorder their elements. The built-in mutable data structures include lists, dictionaries, and sets.

In contrast, immutable data structures cannot be modified once they are created. Python’s built-in immutable data structures include tuples, strings, and frozen sets. In addition, third-party packages provide their own data structures, such as DataFrames and Series in Pandas or arrays in NumPy. You’ll get to know about these in later sections.

Immutable objects, such as frozen sets and tuples, help maintain data integrity because they cannot be modified after creation. Understanding the common data structures in Python is crucial for organizing and manipulating data efficiently, which is foundational for writing effective and maintainable code.

Lists

In Python, lists are implemented as dynamic, mutable arrays containing a sequence of items, allowing you to store and manipulate multiple elements together. You create a list with square brackets and add elements with the append() method. Lists are heterogeneous: you can store integers, strings, and even functions within the same list. Unlike a fixed-size array, where you have to define the size up front, a list can grow to hold as many elements as you need.

You can access elements in a list using their indices, which allows for efficient retrieval and manipulation of specific items.

Here are some common methods through which you can easily manipulate your list:


# Define a list
demo_list = [1, 2, 3, 4, 5]

# Append an element to the end of the list
demo_list.append(6)
print("After append:", demo_list)

# Insert an element at a specific index
demo_list.insert(2, 8)  # Insert 8 at index 2
print("After insert:", demo_list)

# Extend the list with another list
demo_list.extend([9, 10, 11])
print("After extend:", demo_list)

# Get the index of an element in the list
index = demo_list.index(4)
print("Index of 4:", index)

# Remove an element from the list
demo_list.remove(3)
print("After remove:", demo_list)
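
Index-based access, mentioned earlier in this section, can be sketched as follows. This is a minimal illustration; the temperatures list is made-up sample data, not something from a real dataset.

```python
# Hypothetical sample data for illustration
temperatures = [21.5, 23.0, 19.8, 25.1, 22.4]

first = temperatures[0]      # Indexing starts at 0
last = temperatures[-1]      # Negative indices count from the end
middle = temperatures[1:4]   # Slice: elements at indices 1, 2, and 3

# List comprehension: keep only readings above 22 degrees
warm = [t for t in temperatures if t > 22]

print(first, last)
print(middle)
print(warm)
```

Slicing returns a new list, so the original list is left unchanged.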

Dictionaries 

A dictionary in Python is a collection of changeable key:value pairs that preserves insertion order (as of Python 3.7). Keys are unique, hashable identifiers that give access to the associated value stored in the dictionary, and values can be any data type in Python. Dictionary objects are central to Python programming due to their efficient data retrieval, insertion, and deletion capabilities. Dictionaries are written in curly brackets ‘{}’.

Python provides many built-in methods for working with dictionaries, allowing for a wide variety of operations and manipulations.

One specialized dictionary type is the defaultdict class from the collections module. It supplies a default value for missing keys, so you do not have to catch a KeyError or call the get() method for every lookup.

Some common dictionary methods are:

  • clear() removes all the elements from the dictionary.
  • copy() returns a shallow copy of the dictionary.
  • fromkeys() returns a new dictionary with the specified keys, each mapped to the same value.
  • pop() removes the element with the specified key and returns its value.
  • values() returns a view of all the values in the dictionary.
  • update() updates the dictionary with the specified key-value pairs.

Example of Dictionaries in Python:


# Define a dictionary
my_dict = {
    "name": "Siya",
    "age": 26,
    "city": "New York"
}

# Access values by keys
print("Name:", my_dict["name"])
print("Age:", my_dict["age"])
print("City:", my_dict["city"])
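
The dictionary methods and the defaultdict class described above can be sketched like this. The person and records values are hypothetical sample data chosen for illustration.

```python
from collections import defaultdict

person = {"name": "Siya", "age": 26, "city": "New York"}

# update() adds new keys or overwrites existing ones
person.update({"age": 27, "country": "USA"})

# pop() removes a key and returns its value
city = person.pop("city")

# get() returns a fallback instead of raising KeyError
email = person.get("email", "unknown")

# values() returns a view; wrap it in list() to take a snapshot
snapshot = list(person.values())

# defaultdict(list) creates an empty list for each new key
# (the records below are hypothetical)
records = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]
groups = defaultdict(list)
for category, item in records:
    groups[category].append(item)

print(person)
print(city, email, snapshot)
print(dict(groups))
```

Grouping with defaultdict avoids checking whether a key already exists before appending to it.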

Sets

Sets in Python are another fundamental data structure that offers a unique way to store and manipulate collections of items. Unlike lists and dictionaries, sets focus on uniqueness and unordered elements. A core characteristic of sets is that they cannot contain duplicate values, effectively eliminating duplicate entries. If you try to add a duplicate element, it will be silently ignored.

This property of not allowing duplicate elements is crucial for enabling efficient membership testing and set operations such as union and intersection. Sets are defined using curly brackets ‘{}’, with elements separated by commas. Note that empty curly brackets create an empty dictionary, so use set() to create an empty set.

Membership testing is a key strength of Python sets: they let you check efficiently whether an element exists and eliminate duplicates. The underlying data structure uses hashing to speed up insertion, deletion, and lookup.

Here are some of the Set methods:

  • add() adds a new element to the set.
  • clear() removes all the elements from the set.
  • discard() removes the specified item if it is present.
  • union() returns a set containing the union of sets.
  • pop() removes and returns an arbitrary element from the set.

Example of set in Python:


# Define a set
my_set = {1, 2, 3, 4, 5}

# Add an element to the set
my_set.add(6)
print("After adding 6:", my_set)

# Remove an element from the set
my_set.discard(3)
print("After removing 3:", my_set)

# Check if an element is in the set
print("Is 2 in the set?", 2 in my_set)

# Get the length of the set
print("Length of the set:", len(my_set))

# Create another set
new_set = {4, 5, 6, 7, 8}

# Union of sets
union_set = my_set.union(new_set)
print("Union of sets:", union_set)

# Intersection of sets
intersection_set = my_set.intersection(new_set)
print("Intersection of sets:", intersection_set)
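
Duplicate removal and the empty-set pitfall can be illustrated with a short sketch. The customer_ids list is hypothetical sample data.

```python
# Hypothetical list of customer IDs containing repeats
customer_ids = [101, 102, 101, 103, 102, 104]

# Converting a list to a set collapses duplicates
unique_ids = set(customer_ids)
print(len(customer_ids), len(unique_ids))

# {} creates an empty dict; use set() for an empty set
empty = set()
print(type({}).__name__, type(empty).__name__)

# Adding an element that already exists leaves the set unchanged
unique_ids.add(101)

# difference() returns the elements not present in another set
remaining = unique_ids.difference({101, 103})
print(remaining)

# frozenset is the immutable counterpart of set
frozen = frozenset(unique_ids)
```

Because a frozenset cannot be changed, it can be used as a dictionary key or as an element of another set.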

Tuples

Tuples in Python are immutable sequences, similar to lists, but the major difference is that a tuple object cannot be modified once created. They are defined by enclosing values separated by commas within parentheses ‘()’. Tuples are often used to store corresponding pieces of information together, such as coordinates, database records, or function arguments.

To access the third element in a tuple, you can use the index position 2, as indexing starts from 0.

The methods you can apply to a tuple are:

  • count() method returns the number of times a specified value occurred in a tuple.
  • index() method searches the tuple for a particular value and returns its position.

Example of tuples in Python:


# Define a tuple
my_tuple = (1, 2, 3, 4, 5, 6, 3, 3)

# Count occurrences of a specific element
count_of_3 = my_tuple.count(3)
print("Count of 3:", count_of_3)

# Find the index of a specific element
index_of_4 = my_tuple.index(4)
print("Index of 4:", index_of_4)
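
Index access, unpacking, and immutability can be sketched as follows. The point tuple and the grid dictionary are hypothetical examples.

```python
# Hypothetical (x, y, z) coordinate
point = (3.5, 7.2, 1.0)

third = point[2]   # Index 2 is the third element
x, y, z = point    # Unpack the tuple into three variables

# Assigning to an element raises TypeError because tuples are immutable
try:
    point[0] = 9.9
except TypeError as exc:
    error_message = str(exc)
    print("Cannot modify a tuple:", error_message)

# Tuples of hashable values can serve as dictionary keys
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])
```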

Some of the user-defined data structures that you can use in Python for managing data are:

Stack

A stack is a linear data structure that follows the Last-In-First-Out (LIFO) principle. This means that the most recently added element will be the first one to be removed. The two core operations are push, which adds an element to the top of the stack, and pop, which removes the top element.

Implementation of Stack in Python

In Python, there are several ways to implement a stack. Let's explore a few of them in detail:

Method 1: Using a List

The most common way to implement a Stack in Python is by using a list. You can use append() to insert elements to the top of the stack, while pop() performs delete operations to remove the element in LIFO order.

Here is an example:


stack = []

# append() function to push elements into the stack
stack.append('k')
stack.append('l')
stack.append('m')

print(stack)  # Output: ['k', 'l', 'm']

# pop() function to remove element from stack
print(stack.pop())  # Output: m
print(stack.pop())  # Output: l
print(stack.pop())  # Output: k

Method 2: Using collections.deque

In addition to the built-in data structures, Python offers some additional options for data collection through its built-in collections module. This module includes various data structures, one of which is deque. 

The deque (pronounced "deck") is a "double-ended queue" that allows you to insert and delete elements at both the front and the rear. It is often preferred over a list because its append and pop operations run in constant time at either end, whereas a list is only efficient at its right end.

Let's understand with an Example:


from collections import deque
stack = deque()

# append() function to push elements into the stack
stack.append('x')
stack.append('y')
stack.append('z')

print(stack)  # Output: deque(['x', 'y', 'z'])

# pop() function to remove element from stack
# deque.pop() takes no index argument; it always removes the rightmost element
print(stack.pop())  # Output: z
print(stack.pop())  # Output: y
print(stack.pop())  # Output: x
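
To show the LIFO principle doing useful work, here is a small sketch that checks whether brackets in a string are balanced. The is_balanced function is a hypothetical helper written for this illustration, not part of any library.

```python
from collections import deque


def is_balanced(text):
    """Return True if every bracket in text is closed in LIFO order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = deque()
    for char in text:
        if char in pairs.values():
            stack.append(char)  # Push an opening bracket
        elif char in pairs:
            # A closing bracket must match the most recent opening bracket
            if not stack or stack.pop() != pairs[char]:
                return False
    return not stack  # Leftover opening brackets mean unbalanced input


print(is_balanced("(a[b]{c})"))
print(is_balanced("(]"))
```

The most recently opened bracket must be the first one closed, which is exactly the order in which a stack returns its elements.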

Linked Lists

Unlike an array, a linked list manages the elements stored more flexibly. Instead of relying on contiguous memory locations, it connects elements using nodes that hold data and an address pointing to the next link in the chain.

Efficient storage is achieved in linked lists by dynamically allocating memory for each element, which optimizes memory usage and allows for easy insertion and deletion of elements.

Implementation of Linked List in Python


# Initializing a node
class Node:
    def __init__(self, data):
        self.data = data  # Assign the given data to the node
        self.next = None  # No next node yet

# Creating a linked list class
class LinkedList:
    def __init__(self):
        self.head = None  # Initialize head as None (empty list)

    # Inserting a new node at the beginning of a linked list
    def insertAtBeginning(self, new_data):
        new_node = Node(new_data)  # Create a new node
        new_node.next = self.head  # Point it at the current head
        self.head = new_node       # Make it the new head
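
A self-contained sketch of a singly linked list that also supports traversal is shown below. The method name insert_at_beginning and the __iter__ method are choices made for this illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_beginning(self, new_data):
        new_node = Node(new_data)
        new_node.next = self.head  # New node points at the old head
        self.head = new_node       # New node becomes the head

    def __iter__(self):
        # Walk the chain from the head until there is no next node
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


numbers = LinkedList()
for value in (3, 2, 1):
    numbers.insert_at_beginning(value)

print(list(numbers))  # [1, 2, 3]
```

Each insertion at the head only rewires two references, so it takes constant time regardless of the list's length.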

Queues

A queue is a linear data structure in which elements are added at one end and removed from the other. It works on the First-In-First-Out (FIFO) principle, meaning the first element inserted into the queue is the first one removed.

Queues are used to manage tasks or data that need to be processed in the order they arrive. Elements are ordered by arrival rather than by value or key. The queue module in Python’s standard library provides synchronized classes such as Queue and LifoQueue, which are designed for safely exchanging data between threads.

Implementation of Queues in Python

There are different ways to implement a Queue in Python. Let’s take a look at a few of them:

Method 1: Using a List

One simple approach to creating a queue is to use a list. You can insert elements at the rear using the append() method and remove them from the front using pop(0).

Here is an example:


# Implementing queue using a list

queue = []
queue.append(5)  # enqueue
queue.append(7)
queue.append(9)
print(queue)   # [5, 7, 9]

a = queue.pop(0) # dequeue
print(a)     # 5
print(queue) # [7, 9]

Method 2: Using collections.deque

In Python, a queue can also be implemented using the deque class from the collections module. The advantage of deque is that adding and removing elements at either end takes constant time, O(1), whereas removing the first element of a list with pop(0) takes O(n) time because the remaining elements must be shifted.
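
A minimal sketch of a deque-based queue, mirroring the list example from Method 1:

```python
from collections import deque

# Implementing a queue using collections.deque
queue = deque()
queue.append(5)  # enqueue at the rear
queue.append(7)
queue.append(9)
print(queue)  # deque([5, 7, 9])

first = queue.popleft()  # dequeue from the front
print(first)  # 5
print(queue)  # deque([7, 9])
```

Here popleft() plays the role that pop(0) plays for a list, without shifting the remaining elements.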

Heaps

A heap is a specialized binary tree data structure that keeps its elements partially ordered, so the smallest element (in a min-heap) or the largest element (in a max-heap) is always at the root. This makes heaps well suited to implementing priority queues, where the highest-priority item is always returned first. Heaps are examples of advanced data structures, which are integral to building efficient algorithms and organizing data in software engineering.

Abstract data structures, such as heaps, play a crucial role in priority queues by categorizing data based on assigned priorities. For instance, airlines use priority queues to manage baggage, ensuring that items with higher priority are processed first.

In Python, the heapq module implements a min-heap on top of a regular list. This module offers functions to carry out various operations on the heap data structure. Here are a few of them:

  • heapify() converts a list into a heap.
  • heappush() inserts an element into a heap.
  • heappop() removes the smallest element from a heap.
  • nlargest() returns the n largest elements from a dataset.
  • nsmallest() returns the n smallest elements from a dataset.

Here is an example of using the heapq module:


import heapq
# Create an empty heap
heap = []

# Push items onto the heap
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 7)
heapq.heappush(heap, 1)

# Pop and print the smallest item from the heap
smallest = heapq.heappop(heap)
print(smallest)  # Output: 1
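
The remaining heapq functions from the list above, plus a simple priority queue, can be sketched as follows. The scores and task names are hypothetical sample data.

```python
import heapq

# Hypothetical sample data
scores = [42, 17, 88, 5, 63]

top_two = heapq.nlargest(2, scores)      # [88, 63]
bottom_two = heapq.nsmallest(2, scores)  # [5, 17]

# heapify() rearranges a list in place so that heap[0] is the smallest item
heap = list(scores)
heapq.heapify(heap)
print(heap[0])  # 5

# A simple priority queue: tuples are compared by their first element,
# so the lowest priority number is popped first
tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "load data"))
heapq.heappush(tasks, (3, "send email"))

next_task = heapq.heappop(tasks)
print(next_task)  # (1, 'load data')
```

Storing (priority, item) tuples is a common way to build a priority queue with heapq, since the module itself only knows how to compare elements.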

Now, let’s understand how to perform advanced data analysis using Python libraries like NumPy and Pandas. You need to import these libraries to use them in your Python code.

NumPy Arrays

NumPy, short for Numerical Python, is a robust Python library for numerical computation. It supports large, multi-dimensional arrays and provides a wide range of mathematical functions for efficient operations on these arrays.

NumPy arrays are similar to a Python list but have some key differences. The primary distinction is that NumPy arrays are much faster and have stricter requirements on the homogeneity of the objects they contain compared to lists. Additionally, NumPy arrays use less memory than lists, making them more efficient for large datasets.

NumPy arrays hold elements of the same data type. For example, a NumPy array of strings can only contain strings and no other data types. In contrast, a Python list can include a mixture of strings, numbers, booleans, and other objects. However, for scenarios requiring mixed data types, one might consider using lists or tuples, which offer flexibility in accommodating various data formats.

Here are some important NumPy functions in Python:

  • np.array() creates a NumPy array.
  • np.zeros() creates a new NumPy array filled with zeros.
  • np.arange() creates an array with a range of values.
  • np.ones() creates a new NumPy array filled with ones.
  • np.linspace() creates an array with a specified number of evenly spaced values.

Example of Numpy Arrays in Python:


import numpy as np
# Creating a NumPy array
arr1 = np.array([1, 2, 3, 4])
print("NumPy array:", arr1)
# Output: NumPy array: [1 2 3 4]

# Creating a NumPy array of zeros
arr2 = np.zeros(5)
print("NumPy array of zeros:", arr2)
# Output: NumPy array of zeros: [0. 0. 0. 0. 0.]

# Creating a NumPy array with a range of values
arr3 = np.arange(1, 5)
print("NumPy array with range:", arr3)
# Output: NumPy array with range: [1 2 3 4]
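
The homogeneity requirement and element-wise operations described above can be illustrated with a short sketch. It assumes NumPy is installed; the prices array is made-up sample data.

```python
import numpy as np

# Hypothetical sample data
prices = np.array([10.0, 12.5, 8.0, 15.0])

# Arithmetic applies to every element at once, with no explicit loop
discounted = prices * 0.9

# A boolean mask selects the elements that satisfy a condition
expensive = prices[prices > 9]

# Mixing integers and floats yields a single common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

ones = np.ones(3)            # Array of three ones
grid = np.linspace(0, 1, 5)  # Five evenly spaced values from 0 to 1

print(discounted)
print(expensive)
print(ones, grid)
```

Because every element shares one dtype, NumPy can store the data compactly and run operations like the multiplication above in optimized compiled code.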

Pandas Series

The Pandas Series can be referred to as a one-dimensional labeled array in Python that can hold data of any type, such as integers, floats, strings, and more. You can think of a Pandas Series as a single column of data, where each value in the series has a corresponding label, and these labels are collectively referred to as the index. Each label in the index is associated with its corresponding value, which is essential for understanding how data pairs function within the structure.

A Pandas Series also allows duplicate index labels. Selecting a label that appears more than once returns all of the matching values as a new Series.

Here are some commonly used methods available in the Pandas Series:

  • size is an attribute that returns the total number of elements in the underlying data of the Series.
  • head() returns a specified number of rows from the beginning of a Series.
  • tail() returns a specified number of rows from the end of a Series.
  • unique() returns the unique values in the Series.

Example of Pandas Series in Python:


# Accessing Series values by using the index
# The first value has index 0
import pandas as pd

courses = pd.Series(["Hadoop", "Spark", "Python", "Oracle"])
print(courses[2])

# Output: Python
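
Custom index labels and the Series members listed above can be sketched as follows. This assumes pandas is installed; the sales figures and labels are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales with custom index labels
sales = pd.Series([250, 300, 250, 410], index=["mon", "tue", "wed", "thu"])

print(sales["tue"])    # Access a value by its label
print(sales.size)      # size is an attribute, not a method
print(sales.head(2))   # First two rows
print(sales.tail(1))   # Last row
print(sales.unique())  # Distinct values in order of appearance

# Index labels may repeat; selecting a repeated label returns a Series
dup = pd.Series([1, 2, 3], index=["a", "a", "b"])
print(dup["a"])
```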

DataFrames

A DataFrame is a two-dimensional labeled data structure provided by the Pandas library. It is similar to a table, where data is organized in rows and columns. DataFrames help you organize data efficiently, allowing for the storage and manipulation of data in various formats. Pandas DataFrames provide a flexible way to manipulate data, from selecting or replacing specific columns and indices to reshaping the entire dataset.

In a DataFrame, the column names and the row index act as keys: each column label identifies a Series of values, and each index label identifies a row, which enables accurate data retrieval and efficient access to stored information.

Here are some commonly used methods on DataFrames in Python:

  • DataFrame.pop() removes the specified column from the frame and returns it as a Series.
  • DataFrame.tail() returns the last n rows.
  • DataFrame.to_numpy() converts the DataFrame to a NumPy array.
  • DataFrame.head() returns the first n rows.

Example of DataFrame in Python:


# Creating a simple Pandas DataFrame:
import pandas as pd

data = {
  "Score": [580, 250, 422],
  "id": [22, 37, 55]
}

#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
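
The DataFrame methods listed above can be sketched on the same data. This assumes pandas is installed; the filter threshold of 400 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Score": [580, 250, 422], "id": [22, 37, 55]})

first_two = df.head(2)  # First two rows
last_row = df.tail(1)   # Last row

# Boolean filtering: keep rows where Score exceeds 400
high = df[df["Score"] > 400]

# Convert the whole frame to a 2D NumPy array
as_array = df.to_numpy()

# pop() removes the column from df and returns it as a Series
ids = df.pop("id")

print(high)
print(as_array.shape)
print(list(df.columns))
```

Note that pop() modifies the DataFrame in place, so df contains only the Score column afterwards.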

Counter

A Counter is a dictionary subclass from Python’s collections module for counting hashable objects. It creates a dictionary where elements are stored as keys and their counts as values. This is particularly useful for tallying how often each distinct element occurs in a collection.

Here are some commonly used methods on Counter in Python:

  • Counter.elements() - Returns an iterator of elements repeating as many times as their count
  • Counter.subtract([iterable]) - Subtracts counts from another iterable
  • Counter.update([iterable]) - Adds counts from another iterable

Example of Counter in Python:


from collections import Counter
# Count items in a list
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
color_counts = Counter(colors)
print(color_counts) # Counter({'blue': 3, 'red': 2, 'green': 1})
# Get most common colors
print(color_counts.most_common(2)) # [('blue', 3), ('red', 2)]
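
The update(), subtract(), and elements() methods listed above can be sketched as follows. The inventory counts are hypothetical.

```python
from collections import Counter

# Hypothetical starting counts
inventory = Counter({"apple": 3, "pear": 1})

# update() adds counts from an iterable or mapping
inventory.update(["apple", "kiwi"])  # apple: 4, pear: 1, kiwi: 1

# subtract() removes counts; results may drop to zero or below
inventory.subtract({"apple": 2, "pear": 1})  # apple: 2, pear: 0, kiwi: 1

# elements() repeats each key by its count and skips counts below one
expanded = sorted(inventory.elements())
print(expanded)  # ['apple', 'apple', 'kiwi']

# Missing keys return 0 instead of raising KeyError
print(inventory["banana"])  # 0
```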

String

A String is an immutable sequence of characters. In Python, str objects are Unicode by default, making them well suited to handling text data. Additionally, string-like containers, such as UserString from the collections module, act as wrappers around string objects to provide additional capabilities.

Because strings are immutable, a string cannot be changed once it is created; methods such as replace() return a new string instead. This immutability helps maintain data integrity and prevents accidental modifications.

Here are some commonly used methods on Strings in Python:

  • string.split() - Splits string into a list of substrings
  • string.strip() - Removes leading and trailing whitespace
  • string.replace(old, new) - Replaces all occurrences of old with new
  • string.upper()/lower() - Converts to uppercase/lowercase

Example of String operations in Python:


text = "  Data Analysis with Python  "
cleaned = text.strip()
words = cleaned.split()
print(words)  # ['Data', 'Analysis', 'with', 'Python']
print(cleaned.lower())  # "data analysis with python"
print(cleaned.replace("Python", "R"))  # "Data Analysis with R"
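
A typical data-cleaning use of these methods is normalizing a header row. The sketch below uses a made-up raw_line value and also demonstrates that string methods return new strings rather than modifying the original.

```python
# Hypothetical messy header line
raw_line = " id, Name ,CITY \n"

# Split on commas, then trim whitespace and lowercase each part
columns = [part.strip().lower() for part in raw_line.split(",")]
print(columns)  # ['id', 'name', 'city']

# join() is the inverse of split()
header = "|".join(columns)
print(header)  # id|name|city

# Strings are immutable: item assignment raises TypeError
try:
    header[0] = "X"
except TypeError:
    print("Strings cannot be modified in place")

# replace() returns a new string and leaves the original untouched
new_header = header.replace("id", "user_id")
print(new_header)  # user_id|name|city
```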

Matrix

A Matrix is a 2D array structure, typically implemented using NumPy arrays in Python. It’s used for mathematical operations and data representation. Unlike a tree structure, which organizes data hierarchically with nodes and branches, a matrix arranges data in rows and columns.

Here are some commonly used methods on Matrices in Python:

  • matrix.transpose() - Flips matrix over its diagonal
  • matrix.dot(other) - Matrix multiplication
  • matrix.reshape() - Changes matrix dimensions
  • matrix.sum()/mean() - Calculates sum/average of elements

Example of Matrix in Python:


import numpy as np

# Create a matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Basic operations
print(matrix.transpose())  # Transpose matrix
print(matrix.dot(matrix))  # Matrix multiplication
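
The reshape(), sum(), and mean() methods from the list above can be sketched as follows. This assumes NumPy is installed and uses a small 2x3 matrix chosen for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# reshape() rearranges the same six values into 3 rows and 2 columns
reshaped = matrix.reshape(3, 2)

# The @ operator performs matrix multiplication: (2x3) @ (3x2) gives 2x2
product = matrix @ reshaped

total = matrix.sum()             # Sum of all elements
col_means = matrix.mean(axis=0)  # Mean of each column

print(reshaped)
print(product)
print(total, col_means)
```

Passing axis=0 aggregates down the rows and gives one result per column, while axis=1 would give one result per row.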

Unlock the Power of Your Data with Airbyte

Now you know what Python data structures are and how to perform different operations on them. However, to perform queries on this data, you first need to consolidate it on a single platform. This is where Airbyte can help you!

It is a data integration platform that allows you to consolidate all your segregated data on a single platform. Airbyte’s vast connector library offers more than 350 pre-built connectors that can connect to multiple sources and destinations. Beyond its extensive library of pre-built connectors, Airbyte’s Connector Development Kit (CDK) empowers you to build custom connectors to suit unique data sources or destinations.

This flexibility allows seamless integration regardless of your specific data ecosystem. Understanding data structures is crucial as they are essential components of any modern software, organizing data effectively for storage, retrieval, and modification.

Airbyte’s change data capture (CDC) capabilities also help keep your target system in sync with updates made at the source, so you can migrate data efficiently.

Key features of Airbyte are:

  • Developer-friendly UI: If you are seeking even greater control and flexibility for designing data pipelines, Airbyte offers PyAirbyte, a Python library that provides programmatic access. With PyAirbyte, you can extract data from multiple connectors supported by Airbyte, enabling the creation of custom data pipelines tailored to your distinct needs. This empowers you to use Airbyte’s capabilities within your development workflows.
  • Transformation with dbt: Airbyte provides seamless integration with robust data transformation tools like dbt. This allows you to utilize transformation capabilities within your data migration pipelines, ensuring data is appropriately cleaned, normalized, and enriched as needed.
  • Robust Security: Airbyte prioritizes security with certifications like SOC 2, GDPR, and ISO and offers HIPAA compliance through its Conduit solution. This contributes to a secure and compliant data migration.

Power Your Data Workflows with the Right Tools

Python data structures are essential for organizing, processing, and analyzing data efficiently. But as your data grows across platforms, mastering local structures is just one part of the equation.

Airbyte helps you go further.

With 600+ pre-built connectors, Airbyte makes it easy to centralize your data, no matter where it lives. You can build robust pipelines using tools like PyAirbyte or integrate seamlessly with dbt for in-flight transformations. Whether you're scaling ingestion or syncing schema changes, Airbyte delivers the flexibility, performance, and control data teams need.

Pair your Python skills with Airbyte’s infrastructure — and unlock faster, more reliable data workflows.

Start syncing smarter with Airbyte.
