
By optimizing Python code, you improve performance, reduce resource consumption, and enhance scalability. While Python is known for its simplicity and readability, these characteristics can sometimes come at the cost of efficiency.
In this article, we'll explore four ways to optimize your Python project and improve performance.
First, we'll look at how best to use data structures.
Efficient Use of Python Data Structures
We'll use some of the most well-known Python data structures to optimize our code.
Lists Vs. Tuples
Lists and tuples are probably the most basic and well-known data structures in Python. They serve different purposes, so they have different performance characteristics:
- Lists are mutable, which means they can be modified after creation.
- Tuples, on the other hand, are immutable: they cannot be changed after creation.
Before diving deep into why there are performance differences, let's write a code sample that creates a list and a tuple of 5 numbers.
```python
import timeit

# Calculate creation time
list_test = timeit.timeit(stmt="[1, 2, 3, 4, 5]", number=1000000)
tuple_test = timeit.timeit(stmt="(1, 2, 3, 4, 5)", number=1000000)

# Print results
print(f"List creation: {list_test: .3} seconds")
print(f"Tuple creation: {tuple_test: .3} seconds")
```
This results in:
```
List creation: 0.135 seconds
Tuple creation: 0.0207 seconds
```
To calculate performance differences, we use the `timeit` module like so:

- The `stmt` parameter defines the code snippet we want to evaluate. In the case of the `list_test` variable, it evaluates a list of five numbers; in `tuple_test`, it evaluates a tuple of five numbers.
- The `number` parameter specifies how many times the `stmt` snippet must be executed. In both cases, we run it 1,000,000 times, meaning the code creates the list and the tuple one million times each.
As the example shows, tuples are way faster than lists. Let's dig into why:
- Memory allocation:
- Due to their immutability, tuples are stored in a fixed-size block of memory. The size of this block is determined when the tuple is created, and it doesn’t change. This fixed size makes tuple memory allocation fast.
- Lists, on the other hand, need to support dynamic resizing. This means they often allocate extra space to accommodate potential growth without requiring frequent reallocations.
- Internal structure:
- The internal structure of a tuple consists essentially of a contiguous block of memory with a fixed layout. This layout includes the elements themselves and some metadata (like size), but since tuples are immutable, the structure remains simple.
- Lists have a more complex internal structure to manage their mutability. They need to keep track of their current size and allocated capacity, and must handle changes in size dynamically.
- Caching and optimization:
- Python can use various optimizations for tuples, such as caching, because their immutability guarantees that they won’t change after creation. These optimizations reduce the need for repeated memory allocation and speed up creation.
- While Python does optimize list operations, the potential for lists to change means that optimization is limited compared to tuples.
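You can see the allocation difference directly with `sys.getsizeof`. This is a rough sketch; the exact byte counts vary across Python versions and platforms, but the tuple is consistently smaller because it carries no spare growth capacity:

```python
import sys

# Same five elements, two different structures
numbers_tuple = (1, 2, 3, 4, 5)
numbers_list = [1, 2, 3, 4, 5]

# The tuple occupies a fixed-size block; the list also stores
# bookkeeping for dynamic resizing
print(f"Tuple size: {sys.getsizeof(numbers_tuple)} bytes")
print(f"List size:  {sys.getsizeof(numbers_list)} bytes")
```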
Dictionaries and Sets Vs. Lists in Python
In Python, dictionaries and sets are data structures that allow for fast lookups. Checking whether an item is in a set, or finding the value associated with a key in a dictionary, typically takes constant time: `O(1)` in Big O notation.
Given their structure, using dictionaries and sets can significantly improve performance when you need to frequently check for the existence of an item or access elements by a key.
Let's show this with a code snippet. For example, suppose we create a dictionary, a set, and a list with 1,000,000 numbers. We want to look for the number 999,999 and then work out how long it takes using these three different data structures:
```python
import time

# Create a large set, dictionary, and list
large_set = {i for i in range(1000000)}
large_dict = {i: str(i) for i in range(1000000)}
large_list = [i for i in range(1000000)]

# Define element to look up
element = 999999

# Timing set lookup
start_time = time.time()
found = element in large_set
end_time = time.time()
print(f"Set lookup took: {end_time - start_time:.8f} seconds")

# Timing dictionary lookup
start_time = time.time()
found = element in large_dict
end_time = time.time()
print(f"Dictionary lookup took: {end_time - start_time:.8f} seconds")

# Timing list lookup
start_time = time.time()
found = element in large_list
end_time = time.time()
print(f"List lookup took: {end_time - start_time:.8f} seconds")
```
The result is:
```
Set lookup took: 0.00000000 seconds
Dictionary lookup took: 0.00000000 seconds
List lookup took: 0.00771618 seconds
```
So, basically, the time needed to search for the element 999,999 in the set and the dictionary is so small that `time.time()`'s resolution reports (almost) 0 seconds.
Of course, if we compare the set and the dictionary directly, we'll find that the set lookup is faster (note that this micro-benchmark uses string keys for the dictionary and integers for the set, so part of the gap comes from the cost of hashing strings):
```python
import timeit

# Calculate timing performance for dictionary and set
dict_test = timeit.timeit(stmt="'a' in {'a': 1, 'b': 2, 'c': 3}", number=1000000)
set_test = timeit.timeit(stmt="1 in {1, 2, 3, 4, 5}", number=1000000)

# Print results
print(f"Dictionary lookup: {dict_test: .3} seconds")
print(f"Set lookup: {set_test: .3} seconds")
```
This results in:
```
Dictionary lookup: 0.0821 seconds
Set lookup: 0.0212 seconds
```
So, how do dictionaries and sets achieve `O(1)` lookups?
Well, dictionaries and sets use a data structure called a hash table. Here's a simplified explanation of how it works:
- Hashing: When you add a key to a dictionary or an item to a set, Python computes a hash value (a fixed-size integer) from the key or item. This hash value determines where the data is stored in memory.
- Direct access: With the hash value, Python can directly access the location where the data is stored without searching the entire data structure.
So it's very fast to check if an item exists in a set or dictionary, which is useful for operations that require frequent existence checks.
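A minimal sketch of the hashing step: `hash()` is the same function dictionaries and sets call internally, and it only works on immutable (hashable) objects, which is why a list can't be used as a dictionary key:

```python
# hash() produces the fixed-size integer that determines
# where the data is stored internally
print(hash(42))       # in CPython, small integers hash to themselves
print(hash("apple"))  # for strings, the value varies between interpreter runs

# Only hashable (immutable) objects can be dict keys or set members
valid = {("a", 1): "tuples are hashable"}
try:
    invalid = {["a", 1]: "lists are not"}
except TypeError as err:
    print(err)  # unhashable type: 'list'
```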
Choosing the Right Data Structure
Choosing the appropriate data structure based on the specific needs of your application leads to significant performance gains.
If you need to store data and you're sure it won't change over time, definitely use tuples to optimize your code.
When you need to frequently look for elements, prefer sets and dictionaries over lists or tuples.
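As a rule of thumb, when a list will be probed many times, converting it to a set once usually pays for itself. A small sketch:

```python
# One-time conversion, then every membership check is a hash lookup
user_ids = list(range(100000))
user_id_set = set(user_ids)   # pay the conversion cost once

print(99999 in user_id_set)   # True: average O(1) hash lookup
print(-1 in user_id_set)      # False: also O(1)
print(99999 in user_ids)      # True, but found via an O(n) linear scan
```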
Global Variables, Encapsulation, and Namespace
In Python, scope determines the visibility and lifetime of a variable in a program. Variables can have different scopes:
- Local Scope: This refers to variables defined within a function. They are only accessible inside that function.
- Global Scope: Variables defined at the top level of a script or module. They are accessible throughout the module.
- Class/instance Scope: Variables defined within a class, including class attributes and instance attributes.
This section describes code optimization by avoiding global variables, using class encapsulation, and managing a namespace correctly.
Avoiding Global Variables
Local variables are faster to access compared to global variables, primarily due to the way Python manages variable scopes and lookups.
In particular, Python uses the Local, Enclosing, Global, Built-in (LEGB) rule to resolve variable names:
- Local: Names defined within a function.
- Enclosing: Names in the local scopes of any enclosing functions.
- Global: Names at the top level of the module or script.
- Built-in: Preassigned names in the Python built-in namespace.
When accessing a variable, Python starts searching from the innermost scope (the local one). Since the local scope is limited to the function’s context, it contains fewer variables, making the search process quicker. On the contrary, global scope encompasses all top-level names in the module, resulting in a potentially larger search space.
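Here's a small sketch of the LEGB lookup order in action: the same name `x` is bound in three scopes, and Python resolves it to the innermost binding it finds:

```python
x = "global"              # Global scope

def outer():
    x = "enclosing"       # Enclosing scope
    def inner():
        x = "local"       # Local scope: found first under LEGB
        return x
    return inner()

print(outer())  # local
print(x)        # global (the module-level name is untouched)
```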
Here's an example to show the difference in performance when using global vs. local variables:
```python
import time

# Local variable test
def local_test():
    a = 0
    for var_1 in range(1000000):
        a += 1

# Global variable test
b = 0
def global_test():
    global b
    for var_2 in range(1000000):
        b += 1

start_time = time.time()
local_test()
local_time = time.time() - start_time
print(f"Local variable test: {local_time: .3} seconds")

start_time = time.time()
global_test()
global_time = time.time() - start_time
print(f"Global variable test: {global_time: .3} seconds")
```
And the result is:
```
Local variable test: 0.0441 seconds
Global variable test: 0.0685 seconds
```
So, whenever possible, prefer using local variables to global ones.
Encapsulation
Encapsulating variables within functions and classes can improve performance by reducing the scope and limiting the number of variables the interpreter needs to track.
Let's see the difference in performance between using encapsulation and not using it:
```python
import timeit

# Class without encapsulation
class Rectangle:
    '''This class creates a rectangle without encapsulation'''
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return 2 * (self.width + self.height)

# Class with encapsulation
class EncapsulatedRectangle:
    '''This class creates a rectangle with encapsulation'''
    def __init__(self, width, height):
        self._width = width
        self._height = height

    def get_width(self):
        return self._width

    def set_width(self, width):
        self._width = width

    def get_height(self):
        return self._height

    def set_height(self, height):
        self._height = height

    def area(self):
        return self._width * self._height

    def perimeter(self):
        return 2 * (self._width + self._height)

# Create instances of both classes
rect = Rectangle(10, 20)
enc_rect = EncapsulatedRectangle(10, 20)

# Define the test functions
def test_rect_area():
    return rect.area()

def test_enc_rect_area():
    return enc_rect.area()

def test_rect_perimeter():
    return rect.perimeter()

def test_enc_rect_perimeter():
    return enc_rect.perimeter()

# Time the functions using timeit
iterations = 5000000
rect_area_time = timeit.timeit(test_rect_area, number=iterations)
enc_rect_area_time = timeit.timeit(test_enc_rect_area, number=iterations)
rect_perimeter_time = timeit.timeit(test_rect_perimeter, number=iterations)
enc_rect_perimeter_time = timeit.timeit(test_enc_rect_perimeter, number=iterations)

# Print results
print(f"Rectangle (no encapsulation) area time: {rect_area_time:.4f} seconds")
print(f"Encapsulated Rectangle area time: {enc_rect_area_time:.4f} seconds")
print(f"Rectangle (no encapsulation) perimeter time: {rect_perimeter_time:.4f} seconds")
print(f"Encapsulated Rectangle perimeter time: {enc_rect_perimeter_time:.4f} seconds")
```
The result is:
```
Rectangle (no encapsulation) area time: 2.1583 seconds
Encapsulated Rectangle area time: 2.2764 seconds
Rectangle (no encapsulation) perimeter time: 2.5185 seconds
Encapsulated Rectangle perimeter time: 2.4265 seconds
```
As the results show, the timing difference between the two classes is marginal here. Encapsulation's benefits show up in larger applications with numerous variables: by keeping variables local to functions and classes, you reduce the interpreter's workload, since it has fewer names to manage in any given scope. This can lead to faster execution in complex programs with many functions and classes.
These are the key benefits of encapsulation:
- Reduced scope: By limiting the scope of variables to the smallest necessary context, the interpreter has fewer variables to track, which leads to faster execution.
- Memory management: Local variables are automatically deallocated when a function exits, which helps with efficient memory use.
- Avoiding naming conflicts: Encapsulation prevents variable name clashes, making code easily maintainable and less error-prone.
So, you'd better use encapsulation to improve performance when creating classes. Also, note that encapsulation provides controlled access and modification of attributes, protecting data from outside modifications.
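Note that the explicit getter/setter style above is borrowed from other languages. The more idiomatic Python approach (not benchmarked in this article) is the `@property` decorator, which keeps plain attribute syntax while still allowing controlled access and validation. A sketch:

```python
class PropertyRectangle:
    '''Encapsulation via properties: attribute syntax, controlled access'''
    def __init__(self, width, height):
        self._width = width
        self._height = height

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        # Validation runs on every assignment
        if value <= 0:
            raise ValueError("width must be positive")
        self._width = value

    @property
    def area(self):
        return self._width * self._height

rect = PropertyRectangle(10, 20)
print(rect.area)   # 200
rect.width = 15    # goes through the setter's validation
print(rect.area)   # 300
```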
Correct Namespace Management
Minimizing global namespace pollution leads to better performance. The main idea, in this case, is to use modules and packages to organize your code and keep the global namespace clean. In other words, instead of defining many global variables, functions, and classes in a single file, split them into modules and import them.
Here's a general example of inefficient namespace management:
```python
global_var_1 = "..."

def first_function():
    '''A function'''
    ...

def second_function():
    '''Another function'''
    ...

def main_function():
    '''The main function of the program'''
    ...
```
The main idea is to change this to something like:
```python
from functions.function_1 import *
from functions.function_2 import *

def main_function():
    '''The main function of the program'''
    ...
```
To do so, you have to modularize your functions so that the structure of your folders becomes something like:
```
main_folder/
|__ main.py
|
|__ functions/
    |__ function_1.py
    |__ function_2.py
```
NOTE: In this case, `function_1.py` and `function_2.py` can be bigger than `first_function()` and `second_function()`.
So, whenever possible, create modules and packages from your code. This helps with:
- Performance: With a cleaner global namespace, the interpreter has fewer top-level names to search.
- Readability: Shorter files are generally easier to read than longer ones. It's better to have small, connected modules than one big program.
- Reuse: Any module or package you create can be used in other programs, helping you save time in the future.
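One caveat, though: the `import *` form shown above re-pollutes the namespace with everything a module exports. Here's a rough illustration using the standard `math` module (rather than the hypothetical `functions` package above); in your own code, listing names explicitly keeps the namespace minimal:

```python
# Compare what each import style adds to a namespace
explicit_ns = {}
exec("from math import sqrt, pi", explicit_ns)
explicit = [n for n in explicit_ns if not n.startswith("__")]

star_ns = {}
exec("from math import *", star_ns)
star = [n for n in star_ns if not n.startswith("__")]

print(f"Explicit import added {len(explicit)} names")  # 2
print(f"Star import added {len(star)} names")          # several dozen
```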
Utilize List Comprehensions and Generator Expressions
This section describes how code performance can be improved through list comprehension and generators.
List Comprehension
List comprehension is a fast and concise way to create a new list using the power of loops and statements with one line of code.
Let's see the difference in performance first:
```python
import timeit

# Define the code snippets as functions
def loop_code():
    '''This function creates a new list with a classic for-loop'''
    squares = []
    for x in range(10):
        squares.append(x**2)

def comprehension_code():
    '''This function creates a new list with a list comprehension'''
    squares = [x**2 for x in range(10)]

# Measure execution time
loop_test = timeit.timeit(loop_code, number=1000000)
comprehension_test = timeit.timeit(comprehension_code, number=1000000)

print(f"Loop: {loop_test: .4} seconds")
print(f"List comprehension: {comprehension_test: .4} seconds")
```
This leads to:
```
Loop: 2.642 seconds
List comprehension: 2.444 seconds
```
List comprehension is more performance-friendly than standard for-loops because of:
- Reduced overhead: List comprehensions are implemented in C within the Python interpreter, making them faster (as lower-level optimizations aren't accessible in a standard Python for-loop).
- No method calls: In a traditional for-loop, the `append()` method is called repeatedly, which adds some overhead. List comprehensions avoid this by constructing the list in a single expression.
- Local scope: Variables defined within a list comprehension are scoped more tightly than variables defined in a for-loop. This reduces the potential for variable conflicts and can sometimes make garbage collection more efficient.
So, whenever possible, always prefer using list comprehension to create a new list. This enhances performance and code readability.
Generator Expressions
Generator expressions in Python provide a concise way to create generators without writing a separate generator function with the `yield` statement. They are similar to list comprehensions; the key difference is that they produce values one at a time and only when needed, which makes them more memory-efficient for large data sets.
Let's see how generator expressions can improve performance:
```python
import timeit

# Define the size of the iterable
n = 1000000

# Measure the time taken by the generator expression
gen_time = timeit.timeit('sum((i for i in range(n)))', globals=globals(), number=10)

# Measure the time taken by the list comprehension
list_time = timeit.timeit('sum([i for i in range(n)])', globals=globals(), number=10)

print(f"Generator expression took {gen_time:.4f} seconds")
print(f"List comprehension took {list_time:.4f} seconds")
```
This results in:
```
Generator expression took 1.4823 seconds
List comprehension took 1.9738 seconds
```
Generator expressions are more efficient than list comprehension due to:
- Lazy evaluation: Generator expressions generate items on the fly. This means that they do not compute all items at once, which is memory efficient.
- Memory efficiency: Since values are produced one at a time, generator expressions use less memory compared to list comprehensions, especially for large datasets.
If you need to produce values and consume them only when needed, prefer generators for better memory efficiency and performance.
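The memory claim is easy to verify with `sys.getsizeof`: a generator object stays a few hundred bytes no matter how many values it will yield, while the list holds every element at once (exact sizes vary by Python version):

```python
import sys

n = 1000000
gen_expr = (i for i in range(n))
list_comp = [i for i in range(n)]

print(f"Generator: {sys.getsizeof(gen_expr)} bytes")   # a few hundred bytes
print(f"List:      {sys.getsizeof(list_comp)} bytes")  # several megabytes

# Values are produced lazily, one at a time
print(next(gen_expr))  # 0
print(next(gen_expr))  # 1
```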
Leveraging Built-in Functions and Libraries
This section describes how using built-in libraries and functions improves your code's performance.
Standard Library Efficiency
Python’s standard library functions are often implemented in C and optimized for speed. Using these functions leads to significant performance improvements, due to:
- Lower-level operations: C operates closer to the hardware level compared to Python, providing more efficient memory and CPU usage.
- Optimized algorithms: Standard library functions implement highly optimized, battle-tested algorithms for common tasks.
- Reduced overhead: Invoking a function implemented in C avoids the overhead associated with Python's dynamic typing and interpreted execution.
Suppose we sort a list with a lot of numbers. The following example compares performance when creating a custom function versus using a built-in one:
```python
import time

# Sorting via custom function
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]

arr = [i for i in range(10000, 0, -1)]
start_time = time.time()
bubble_sort(arr)
end_time = time.time()
print(f"Bubble sort took: {end_time - start_time} seconds")

# Sorting via built-in function
arr = [i for i in range(10000, 0, -1)]
start_time = time.time()
sorted(arr)
end_time = time.time()
print(f"Sorted function took: {end_time - start_time} seconds")
```
The performance results:
```
Bubble sort took: 25.067534685134888 seconds
Sorted function took: 0.0 seconds
```
The built-in `sorted()` function completes almost instantly; it's so fast that `time.time()`'s resolution reports 0.0 seconds. The custom bubble sort, on the other hand, takes about 25 seconds to complete.
Using Third-party Libraries
We can also use third-party libraries like NumPy (which brings the computational power of languages like C and Fortran to Python) and Pandas (a fast, powerful, and flexible open-source data analysis and manipulation tool built on top of Python) for performance optimizations. These libraries are highly optimized for numerical computations and data manipulation, so it's almost always better to use them than to write a custom function.
Suppose we want to add up an array's elements. We can do so with a custom function or with NumPy's `np.sum()` function:
```python
import time
import numpy as np

def sum_array(arr):
    total = 0
    for num in arr:
        total += num
    return total

# Create a large array
large_array = list(range(1, 10000001))

# Measure time for custom function
start_time = time.time()
custom_sum = sum_array(large_array)
custom_duration = time.time() - start_time

# Convert the list to a NumPy array
large_array_np = np.array(large_array)

# Measure time for NumPy function
start_time = time.time()
numpy_sum = np.sum(large_array_np)
numpy_duration = time.time() - start_time

print(f"Duration with custom function: {custom_duration: .4} seconds")
print(f"Duration with Numpy: {numpy_duration: .4} seconds")
```
And here's the result:
```
Duration with custom function: 1.023 seconds
Duration with Numpy: 0.008745 seconds
```
The difference in performance is huge in this case!
So, remember: you don't need to reinvent the wheel. One of Python's superpowers is that it relies on a vast range of both standard and third-party libraries. You can always use them to save coding and computation time.
Wrapping Up
In this article, we've described four ways to optimize your Python code to improve its performance (and save development time). We hope you find these tips and tricks useful.
Happy coding!

Federico Trotta
Guest author Federico is a freelance Technical Writer who specializes in writing technical articles and documenting digital products. His mission is to democratize software through technical content.
