Python Performance Secrets: Expert Tips with Real Benchmarks 🚀
Hey Pythonistas! 👋 Let’s have a real talk about squeezing every drop of performance from our Python code. I’m not just gonna throw theory at you - we’re going to benchmark everything so you can see the actual impact. Ready? Let’s dive in!
🎯 Tip #1: __slots__ - Your Memory's Best Friend
Here's the deal - Python classes are memory hogs by default. Each instance carries around a dictionary (__dict__) to store attributes. Super flexible, but super wasteful when you've got thousands of instances.
Let’s See Some Numbers!
```python
from pympler import asizeof  # More accurate memory measurement than sys.getsizeof

class ChattyUser:
    def __init__(self, user_id, username):
        self.user_id = user_id
        self.username = username

class QuietUser:
    __slots__ = ['user_id', 'username']

    def __init__(self, user_id, username):
        self.user_id = user_id
        self.username = username

# Memory test
users_reg = [ChattyUser(i, f"user_{i}") for i in range(10000)]
users_slot = [QuietUser(i, f"user_{i}") for i in range(10000)]

print(f"Regular users: {asizeof.asizeof(users_reg)/1024:.2f} KB")
print(f"Slotted users: {asizeof.asizeof(users_slot)/1024:.2f} KB")
```
On my machine:
```
Regular users: 1757.81 KB
Slotted users: 703.12 KB
```
Whoa! That’s 60% less memory! And attribute access is faster too because we’re skipping dictionary lookups.
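Want to check the attribute-access claim too? Here's a quick timeit sketch reusing the two classes above - the slotted class should come out a bit ahead, though exact numbers will vary by machine and Python version:

```python
import timeit

chatty = ChattyUser(1, "user_1")
quiet = QuietUser(1, "user_1")

# A million attribute reads on each kind of instance
print("Regular:", timeit.timeit("obj.username", globals={"obj": chatty}, number=1_000_000))
print("Slotted:", timeit.timeit("obj.username", globals={"obj": quiet}, number=1_000_000))
```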
When Should You Use This?
- Building a game with thousands of entities
- Processing large datasets with custom objects
- Anywhere you’re creating tons of instances
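By the way, if you're on Python 3.10+, dataclasses can generate the slotted class for you. A minimal sketch (the field names here are just illustrative):

```python
from dataclasses import dataclass

@dataclass(slots=True)  # Python 3.10+: writes __slots__ for you
class User:
    user_id: int
    username: str

u = User(1, "user_1")
print(u)                        # User(user_id=1, username='user_1')
print(hasattr(u, "__dict__"))   # False - no per-instance dict
```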
⚡ Tip #2: String Building - The Right Way
I see so many devs doing this:
```python
html = ""
for item in menu_items:
    html += f"<li>{item}</li>"
```
Stop! 🛑 Let’s benchmark better options:
```python
import timeit

menu_items = ["Home", "About", "Contact"] * 1000

def bad_concat():
    html = ""
    for item in menu_items:
        html += f"<li>{item}</li>"
    return html

def good_join():
    return "".join(f"<li>{item}</li>" for item in menu_items)

def better_list():
    parts = []
    for item in menu_items:
        parts.append(f"<li>{item}</li>")
    return "".join(parts)

print("+= concatenation:", timeit.timeit(bad_concat, number=100))
print("Generator join:", timeit.timeit(good_join, number=100))
print("List append + join:", timeit.timeit(better_list, number=100))
```
Results:
```
+= concatenation: 0.342s
Generator join: 0.289s
List append + join: 0.256s
```
The winner? Building up a list and joining once. But the generator version can be easier on memory for huge datasets.
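And if the output is genuinely huge, you don't have to build the whole string in memory at all - you can stream the pieces straight to wherever they're going. A minimal sketch, assuming the destination is a file (the filename is just an example):

```python
menu_items = ["Home", "About", "Contact"] * 1000

# writelines() consumes the generator one piece at a time,
# so the complete HTML string never exists in memory at once
with open("menu.html", "w") as f:
    f.writelines(f"<li>{item}</li>" for item in menu_items)
```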
🔥 Tip #3: List Comprehensions Aren’t Just Pretty
I love list comps for their elegance, but they’re also faster:
```python
import timeit

def old_school():
    squares = []
    for x in range(10000):
        if x % 2 == 0:
            squares.append(x*x)
    return squares

def pythonic_way():
    return [x*x for x in range(10000) if x % 2 == 0]

print("Traditional loop:", timeit.timeit(old_school, number=1000))
print("List comprehension:", timeit.timeit(pythonic_way, number=1000))
```
Results:
```
Traditional loop: 1.891s
List comprehension: 1.423s
```
About 25% faster! Plus, it’s more readable (once you’re used to it).
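One related note: if you only need to consume the values once (summing them, say), swap the square brackets for parentheses and no list gets built at all. A tiny sketch:

```python
# Generator expression: values are produced one at a time and fed to sum(),
# so the 5000 squares are never stored in a list
total = sum(x*x for x in range(10000) if x % 2 == 0)
print(total)
```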
🏎️ Tip #4: Local Variables Speed Hack
This one’s sneaky - Python looks up local variables faster than globals. Watch:
```python
import timeit

data = [x for x in range(10000)]

def global_sum():
    return sum(data)

def local_sum():
    local_data = data
    return sum(local_data)

print("Global variable:", timeit.timeit(global_sum, number=10000))
print("Local variable:", timeit.timeit(local_sum, number=10000))
```
Results:
```
Global variable: 0.782s
Local variable: 0.673s
```
Pro Tip: In tight loops, copy global values to locals first!
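In practice, that looks something like this - a minimal sketch where THRESHOLD and count_over are made-up names for illustration:

```python
THRESHOLD = 100  # a module-level global

def count_over(values):
    threshold = THRESHOLD  # copy the global into a local once, outside the loop
    count = 0
    for v in values:       # inside the loop, only fast local lookups happen
        if v > threshold:
            count += 1
    return count

print(count_over(range(1000)))  # 899
```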
💡 Bonus Tip: Function Calls Matter
Dot lookups in loops add up:
```python
import math
import timeit

def slow_sqrt():
    return [math.sqrt(x) for x in range(1000)]

def fast_sqrt():
    sqrt = math.sqrt  # Cache the function in a local!
    return [sqrt(x) for x in range(1000)]

print("With dot lookup:", timeit.timeit(slow_sqrt, number=10000))
print("Cached function:", timeit.timeit(fast_sqrt, number=10000))
```
Results:
```
With dot lookup: 3.112s
Cached function: 2.567s
```
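The same trick works for method lookups, not just module-level functions. For example, Tip #2's append loop could cache the bound method - a small sketch; whether the readability hit is worth it in your code is a judgment call:

```python
menu_items = ["Home", "About", "Contact"] * 1000

def better_list_cached():
    parts = []
    append = parts.append           # cache the bound method once
    for item in menu_items:
        append(f"<li>{item}</li>")  # no parts.append lookup each iteration
    return "".join(parts)

print(len(better_list_cached()))
```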
🎤 Final Thoughts
Remember, folks:
- Measure first! Don't optimize blindly (see the quick profiling sketch below)
- Readability > Micro-optimizations
- These tricks matter most in hotspots
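Speaking of measuring first, the standard library's cProfile will tell you where the time actually goes. A minimal sketch - workload() is just a stand-in for your own code:

```python
import cProfile
import pstats

def workload():
    # Stand-in for the code you actually want to profile
    return sum(x*x for x in range(100_000) if x % 2 == 0)

cProfile.run("workload()", "profile.out")       # save stats to a file
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(5)   # top 5 by cumulative time
```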
What performance tricks have you discovered? Drop them in the comments! Let’s learn from each other.
Keep coding! ✌️