5. Performance & Profiling
Measure first. Use the right tools to find hotspots, choose structures wisely, and validate wins with numbers.
Question: Your Python application is slow. How would you profile it to find the bottleneck?
Answer: The first step is to measure. For finding overall CPU hot spots, cProfile is the standard tool. For a more detailed, line-by-line analysis of a specific function, line_profiler is excellent. For memory issues, tracemalloc can pinpoint allocation sites, and tools like py-spy can profile running production processes with low overhead.
Explanation: It's critical to measure before attempting to optimize. Once a bottleneck is identified, common performance wins include reducing Python-level loops, using vectorized libraries (like NumPy), choosing appropriate data structures (deque for FIFO queues, set for membership tests), and using __slots__ to reduce per-instance memory when creating large numbers of objects.
import cProfile, io, pstats

pr = cProfile.Profile()
pr.enable()
# ... run workload ...
pr.disable()

s = io.StringIO()
# Report the top 20 entries sorted by cumulative time
pstats.Stats(pr, stream=s).sort_stats("cumtime").print_stats(20)
print(s.getvalue())
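As a quick illustration of the __slots__ point above, a minimal sketch comparing per-instance memory (the class names are invented for the example):

```python
import sys

class Plain:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Slotted:
    __slots__ = ("x", "y")  # fixed attribute set; no per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Plain(1, 2), Slotted(1, 2)
# Most of the savings come from eliminating the per-instance __dict__.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
print(sys.getsizeof(s))
```

The exact byte counts vary by CPython version, but the slotted instance is consistently smaller, which compounds across millions of objects.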
Question: How would you profile memory usage in a Python application?
Answer: For diagnosing memory leaks or high memory usage, the standard library's tracemalloc module is the best starting point. For a line-by-line analysis of memory consumption in specific functions, the third-party memory-profiler library is very useful.
Explanation: tracemalloc can take snapshots of the heap to show you exactly where memory is being allocated and how it's growing over time. This is invaluable for finding the source of memory leaks in long-running applications. memory-profiler provides a decorator that reports the memory usage of each line of code within a function, helping to identify particularly memory-intensive operations.
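A minimal tracemalloc sketch along these lines; the list allocation is a stand-in for real application work:

```python
import tracemalloc

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

data = [bytes(1000) for _ in range(1000)]  # simulate growth between snapshots

snap2 = tracemalloc.take_snapshot()
# compare_to shows which source lines allocated the most new memory
for stat in snap2.compare_to(snap1, "lineno")[:3]:
    print(stat)
tracemalloc.stop()
```

In a long-running service, taking periodic snapshots and diffing them the same way surfaces the lines whose allocations keep growing.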
Question: How do you microbenchmark correctly in Python?
Answer: Use timeit to avoid common pitfalls like interpreter warmup effects and the cost of global name lookups. Benchmark minimal, representative snippets.
Explanation: Run many iterations and several repeats, then compare statistics (the minimum is usually the most stable); keep I/O and other nondeterministic work out of the timed code.
import timeit
print(timeit.timeit("sum(range(1000))", number=10000))
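Building on the snippet above, timeit.repeat makes the statistics explicit; taking the minimum of several runs filters out scheduler and background noise:

```python
import timeit

# Five repeats of 10,000 loops each; report the best (least-disturbed) run.
times = timeit.repeat("sum(range(1000))", number=10_000, repeat=5)
print(f"best of 5: {min(times):.4f}s for 10,000 loops")
```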
Question: What are common performance levers in Python?
Answer: Choose the right data structures (list vs deque vs array), reduce Python loops (vectorize), cache results (lru_cache), minimize allocations (reuse buffers, memoryview), and push hot paths to C (NumPy, Cython, Numba) when justified by profiling.
Explanation: Always verify wins with benchmarks and profile again after changes.
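As a small illustration of the caching lever, functools.lru_cache memoizes a pure function; the Fibonacci function here is just a stand-in for any expensive, repeatable computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # linear in n thanks to the cache, not exponential
print(fib.cache_info())  # hits/misses confirm the cache did the work
```

The cache_info() counters are a cheap way to verify, per the explanation above, that a cache is actually being hit rather than just occupying memory.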