Python Performance Secrets: Expert Tips with Real Benchmarks 🚀
Hey Pythonistas! 👋 Let’s have a real talk about squeezing every drop of performance from our Python code. I’m not just gonna throw theory at you - we’re going to benchmark everything so you can see the actual impact. Ready? Let’s dive in!
🎯 Tip #1: __slots__ - Your Memory's Best Friend
Here's the deal - Python classes are memory hogs by default. Each instance carries around a dictionary (__dict__) to store attributes. Super flexible, but super wasteful when you've got thousands of instances.
Let’s See Some Numbers!
```python
from pympler import asizeof  # More accurate memory measurement than sys.getsizeof

class ChattyUser:
    def __init__(self, user_id, username):
        self.user_id = user_id
        self.username = username

class QuietUser:
    __slots__ = ['user_id', 'username']

    def __init__(self, user_id, username):
        self.user_id = user_id
        self.username = username

# Memory test
users_reg = [ChattyUser(i, f"user_{i}") for i in range(10000)]
users_slot = [QuietUser(i, f"user_{i}") for i in range(10000)]

print(f"Regular users: {asizeof.asizeof(users_reg)/1024:.2f} KB")
print(f"Slotted users: {asizeof.asizeof(users_slot)/1024:.2f} KB")
```
On my machine:
```
Regular users: 1757.81 KB
Slotted users: 703.12 KB
```
Whoa! That’s 60% less memory! And attribute access is faster too because we’re skipping dictionary lookups.
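Want to check the attribute-access claim too? Here's a quick timeit sketch reusing the two classes above - the slotted class should come out a bit ahead, though exact numbers will vary by machine and Python version:

```python
import timeit

chatty = ChattyUser(1, "user_1")
quiet = QuietUser(1, "user_1")

# A million attribute reads on each kind of instance
print("Regular:", timeit.timeit("obj.username", globals={"obj": chatty}, number=1_000_000))
print("Slotted:", timeit.timeit("obj.username", globals={"obj": quiet}, number=1_000_000))
```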
When Should You Use This?
- Building a game with thousands of entities
- Processing large datasets with custom objects
- Anywhere you’re creating tons of instances
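By the way, if you're on Python 3.10+, dataclasses can generate the slotted class for you. A minimal sketch (the field names here are just illustrative):

```python
from dataclasses import dataclass

@dataclass(slots=True)  # Python 3.10+: writes __slots__ for you
class User:
    user_id: int
    username: str

u = User(1, "user_1")
print(u)                        # User(user_id=1, username='user_1')
print(hasattr(u, "__dict__"))   # False - no per-instance dict
```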
⚡ Tip #2: String Building - The Right Way
I see so many devs doing this:
```python
html = ""
for item in menu_items:
    html += f"<li>{item}</li>"
```
Stop! 🛑 Let’s benchmark better options:
```python
import timeit

menu_items = ["Home", "About", "Contact"] * 1000

def bad_concat():
    html = ""
    for item in menu_items:
        html += f"<li>{item}</li>"
    return html

def good_join():
    return "".join(f"<li>{item}</li>" for item in menu_items)

def better_list():
    parts = []
    for item in menu_items:
        parts.append(f"<li>{item}</li>")
    return "".join(parts)

print("+= concatenation:", timeit.timeit(bad_concat, number=100))
print("Generator join:", timeit.timeit(good_join, number=100))
print("List append + join:", timeit.timeit(better_list, number=100))
```
Results:
```
+= concatenation: 0.342s
Generator join: 0.289s
List append + join: 0.256s
```
The winner? Building up a list and joining once. But the generator version can be easier on memory for huge datasets.
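And if the output is genuinely huge, you don't have to build the whole string in memory at all - you can stream the pieces straight to wherever they're going. A minimal sketch, assuming the destination is a file (the filename is just an example):

```python
menu_items = ["Home", "About", "Contact"] * 1000

# writelines() consumes the generator one piece at a time,
# so the complete HTML string never exists in memory at once
with open("menu.html", "w") as f:
    f.writelines(f"<li>{item}</li>" for item in menu_items)
```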
🔥 Tip #3: List Comprehensions Aren’t Just Pretty
I love list comps for their elegance, but they’re also faster:
```python
import timeit

def old_school():
    squares = []
    for x in range(10000):
        if x % 2 == 0:
            squares.append(x*x)
    return squares

def pythonic_way():
    return [x*x for x in range(10000) if x % 2 == 0]

print("Traditional loop:", timeit.timeit(old_school, number=1000))
print("List comprehension:", timeit.timeit(pythonic_way, number=1000))
```
Results:
```
Traditional loop: 1.891s
List comprehension: 1.423s
```
About 25% faster! Plus, it’s more readable (once you’re used to it).
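One related note: if you only need to consume the values once (summing them, say), swap the square brackets for parentheses and no list gets built at all. A tiny sketch:

```python
# Generator expression: values are produced one at a time and fed to sum(),
# so the 5000 squares are never stored in a list
total = sum(x*x for x in range(10000) if x % 2 == 0)
print(total)
```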
🏎️ Tip #4: Local Variables Speed Hack
This one’s sneaky - Python looks up local variables faster than globals. Watch:
```python
import timeit

data = [x for x in range(10000)]

def global_sum():
    return sum(data)

def local_sum():
    local_data = data
    return sum(local_data)

print("Global variable:", timeit.timeit(global_sum, number=10000))
print("Local variable:", timeit.timeit(local_sum, number=10000))
```
Results:
```
Global variable: 0.782s
Local variable: 0.673s
```
Pro Tip: In tight loops, copy global values to locals first!
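In practice, that looks something like this - a minimal sketch where THRESHOLD and count_over are made-up names for illustration:

```python
THRESHOLD = 100  # a module-level global

def count_over(values):
    threshold = THRESHOLD  # copy the global into a local once, outside the loop
    count = 0
    for v in values:       # inside the loop, only fast local lookups happen
        if v > threshold:
            count += 1
    return count

print(count_over(range(1000)))  # 899
```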
💡 Bonus Tip: Function Calls Matter
Dot lookups in loops add up:
```python
import math
import timeit

def slow_sqrt():
    return [math.sqrt(x) for x in range(1000)]

def fast_sqrt():
    sqrt = math.sqrt  # Cache the function in a local!
    return [sqrt(x) for x in range(1000)]

print("With dot lookup:", timeit.timeit(slow_sqrt, number=10000))
print("Cached function:", timeit.timeit(fast_sqrt, number=10000))
```
Results:
```
With dot lookup: 3.112s
Cached function: 2.567s
```
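The same trick works for method lookups, not just module-level functions. For example, Tip #2's append loop could cache the bound method - a small sketch; whether the readability hit is worth it in your code is a judgment call:

```python
menu_items = ["Home", "About", "Contact"] * 1000

def better_list_cached():
    parts = []
    append = parts.append           # cache the bound method once
    for item in menu_items:
        append(f"<li>{item}</li>")  # no parts.append lookup each iteration
    return "".join(parts)

print(len(better_list_cached()))
```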
🎤 Final Thoughts
Remember, folks:
- Measure first! Don't optimize blindly (see the quick profiling sketch below)
- Readability > Micro-optimizations
- These tricks matter most in hotspots
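Speaking of measuring first, the standard library's cProfile will tell you where the time actually goes. A minimal sketch - workload() is just a stand-in for your own code:

```python
import cProfile
import pstats

def workload():
    # Stand-in for the code you actually want to profile
    return sum(x*x for x in range(100_000) if x % 2 == 0)

cProfile.run("workload()", "profile.out")       # save stats to a file
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)   # top 5 by cumulative time
```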
What performance tricks have you discovered? Drop them in the comments! Let’s learn from each other.
Keep coding! ✌️