5. Performance & Profiling
Measure first. Use the right tools to find hotspots, choose structures wisely, and validate wins with numbers.
Question: Your Python application is slow. How would you profile it to find the bottleneck?
Answer: The first step is to measure. For finding overall CPU hot spots, cProfile is the standard tool. For a more detailed, line-by-line analysis of a specific function, line_profiler is excellent. For memory issues, tracemalloc can pinpoint allocation sites, and tools like py-spy can profile running production processes with low overhead.
Explanation: It's critical to measure before attempting to optimize. Once a bottleneck is identified, common performance wins include reducing Python-level loops, using vectorized libraries (like NumPy), choosing appropriate data structures (deque for FIFO queues, set for membership tests), and using __slots__ to reduce per-instance memory when creating large numbers of objects.
import cProfile, io, pstats

pr = cProfile.Profile()
pr.enable()
# ... run workload ...
pr.disable()

s = io.StringIO()
# Report the top 20 entries sorted by cumulative time
pstats.Stats(pr, stream=s).sort_stats("cumtime").print_stats(20)
print(s.getvalue())
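As a quick illustration of the __slots__ point above, a minimal sketch comparing per-instance memory (the class names are invented for the example):

```python
import sys

class Plain:
    def __init__(self, x, y):
        self.x, self.y = x, y

class Slotted:
    __slots__ = ("x", "y")  # fixed attribute set; no per-instance __dict__
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Plain(1, 2), Slotted(1, 2)
# Most of the savings come from eliminating the per-instance __dict__.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
print(sys.getsizeof(s))
```

The exact byte counts vary by CPython version, but the slotted instance is consistently smaller, which compounds across millions of objects.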
Question: How would you profile memory usage in a Python application?
Answer: For diagnosing memory leaks or high memory usage, the standard library's tracemalloc module is the best starting point. For a line-by-line analysis of memory consumption in specific functions, the third-party memory-profiler library is very useful.
Explanation: tracemalloc can take snapshots of the heap to show you exactly where memory is being allocated and how it's growing over time. This is invaluable for finding the source of memory leaks in long-running applications. memory-profiler provides a decorator that reports the memory usage of each line of code within a function, helping to identify particularly memory-intensive operations.
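A minimal tracemalloc sketch along these lines; the list allocation is a stand-in for real application work:

```python
import tracemalloc

tracemalloc.start()
snap1 = tracemalloc.take_snapshot()

data = [bytes(1000) for _ in range(1000)]  # simulate growth between snapshots

snap2 = tracemalloc.take_snapshot()
# compare_to shows which source lines allocated the most new memory
for stat in snap2.compare_to(snap1, "lineno")[:3]:
    print(stat)
tracemalloc.stop()
```

In a long-running service, taking periodic snapshots and diffing them the same way surfaces the lines whose allocations keep growing.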
Question: How do you microbenchmark correctly in Python?
Answer: Use timeit to avoid common pitfalls like interpreter warmup effects and the cost of global name lookups. Benchmark minimal, representative snippets.
Explanation: Run many iterations and several repeats, then compare statistics (the minimum is usually the most stable); keep I/O and other nondeterministic work out of the timed code.
import timeit
print(timeit.timeit("sum(range(1000))", number=10000))
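Building on the snippet above, timeit.repeat makes the statistics explicit; taking the minimum of several runs filters out scheduler and background noise:

```python
import timeit

# Five repeats of 10,000 loops each; report the best (least-disturbed) run.
times = timeit.repeat("sum(range(1000))", number=10_000, repeat=5)
print(f"best of 5: {min(times):.4f}s for 10,000 loops")
```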
Question: What are common performance levers in Python?
Answer: Choose the right data structures (list vs deque vs array), reduce Python loops (vectorize), cache results (lru_cache), minimize allocations (reuse buffers, memoryview), and push hot paths to C (NumPy, Cython, Numba) when justified by profiling.
Explanation: Always verify wins with benchmarks and profile again after changes.
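As a small illustration of the caching lever, functools.lru_cache memoizes a pure function; the Fibonacci function here is just a stand-in for any expensive, repeatable computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every distinct argument
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # linear in n thanks to the cache, not exponential
print(fib.cache_info())  # hits/misses confirm the cache did the work
```

The cache_info() counters are a cheap way to verify, per the explanation above, that a cache is actually being hit rather than just occupying memory.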