“Python Booster Pack” is probably the most clickbait-y title you’ve read for a post that documents a few Python tips and tricks. Believe me, the content is not as disappointing as the title. kek. In this post, I want to explain some of the lesser-known features that Python and some of its modules offer, which may change the way you code and debug or increase your developer productivity to some extent. Not claiming anything though. Some of them are hidden in plain sight and call for a Eureka! moment.
I’m not the one who discovered or developed any of these features. Most of them appeared on my Twitter feed at 2AM or my Reddit feed at 3AM or my friend’s chat at 4AM.
    from functools import lru_cache


    @lru_cache(maxsize=None)
    def fibonacci_with_lru(n):
        return n if n in [0, 1] else (fibonacci_with_lru(n - 1) + fibonacci_with_lru(n - 2))


    def fibonacci(n):
        return n if n in [0, 1] else (fibonacci(n - 1) + fibonacci(n - 2))
Go on. Give the code mentioned above a whirl in your Jupyter notebook.
The lru_cache decorator in the functools module maintains a dictionary in a least-recently-used fashion to cache, or auto-memoize, repeated calls. The maxsize argument decides the maximum cache size (usually a power of 2); passing None, as above, lets the cache grow without bound. It has another argument called typed which, if set to True, caches 3 and 3.0 as two separate numbers.
In the snippet above, you can notice the running time shrink by a factor of almost 10^-3 even though the input increased by 10x. Caching like this is not always advisable and is certainly not a replacement for a faster non-recursive approach. It’s certainly cool, helps occasionally during prototyping, and has its own use cases such as fetching files.
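If you want to see the effect without a notebook, here is a rough timing sketch. The names fib_cached, fib_plain, time_call and square are mine, not from any library, and the exact timings will depend on your machine. The last part shows typed=True keeping the int 3 and the float 3.0 as separate cache entries.

```python
from functools import lru_cache
from time import perf_counter


@lru_cache(maxsize=None)
def fib_cached(n):
    # Each distinct n is computed once; repeats are served from the cache.
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


def fib_plain(n):
    # Same recursion without a cache: exponential number of calls.
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)


def time_call(func, n):
    # Hypothetical helper: return (result, elapsed seconds) for one call.
    start = perf_counter()
    result = func(n)
    return result, perf_counter() - start


# The cached version gets an input 10x larger than the plain one.
plain_result, plain_seconds = time_call(fib_plain, 22)
cached_result, cached_seconds = time_call(fib_cached, 220)
print(f"plain  fib(22):  {plain_seconds:.6f}s")
print(f"cached fib(220): {cached_seconds:.6f}s")
print(fib_cached.cache_info())


@lru_cache(maxsize=None, typed=True)
def square(x):
    return x * x


square(3)
square(3.0)
# With typed=True there are two entries: one for int 3, one for float 3.0.
print(square.cache_info().currsize)
```

cache_info() is handy while prototyping: it tells you how many calls were hits and how many were misses, so you can check whether the cache is earning its keep.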
binding.pry in Python? Hmmm.
I’ve seen umpteen Ruby practitioners express their undying love for binding.pry. It’s basically a debugging routine that lets you step through code line by line and check variables or object values at each stage. Well, it also exists in Python (pip install ipdb)!
Consider the example of a Binary Search routine that we want to examine from within.
    import ipdb

    debug = ipdb.set_trace


    def binary_search(needle, haystack):
        left, right = 0, len(haystack) - 1
        while left <= right:
            mid = left + (right - left) // 2
            debug()
            if haystack[mid] == needle:
                return True
            elif haystack[mid] > needle:
                right = mid - 1
            else:
                left = mid + 1
        return False


    binary_search(2, [1, 2, 3, 4])
    binary_search(5, [1, 2, 3, 4])
Observe how we’ve sneakily placed the debug() routine right after the step that picks the middle index. The program pauses at the debug() call and enables code introspection. Keep in mind that n moves to the next line and c continues execution until the next debug() call.
You could also try other debuggers. However, ipdb uses a nice IPython shell with syntax highlighting and tab-completion, which makes it my favourite.
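On Python 3.7 and later, the built-in breakpoint() gives you a similar pause point without importing anything at the call site. It calls sys.breakpointhook(), which drops into pdb by default and can be pointed at ipdb by running the script with the environment variable PYTHONBREAKPOINT=ipdb.set_trace. The sketch below is not interactive: it swaps in a made-up hook, record_mid, that just notes the value of mid each time, so you can see where the pauses would happen.

```python
import sys

mids_seen = []


def record_mid(*args, **kwargs):
    # Hypothetical stand-in for a debugger: read `mid` from the caller's frame.
    caller = sys._getframe(1)
    mids_seen.append(caller.f_locals["mid"])


# Route breakpoint() to our recorder instead of an interactive debugger.
sys.breakpointhook = record_mid


def binary_search(needle, haystack):
    left, right = 0, len(haystack) - 1
    while left <= right:
        mid = left + (right - left) // 2
        breakpoint()  # a real debugger would pause here on every iteration
        if haystack[mid] == needle:
            return True
        elif haystack[mid] > needle:
            right = mid - 1
        else:
            left = mid + 1
    return False


found = binary_search(2, [1, 2, 3, 4])
missing = binary_search(5, [1, 2, 3, 4])

# Put the default hook back so later breakpoint() calls behave normally.
sys.breakpointhook = sys.__breakpointhook__

print(found, missing, mids_seen)
```

Delete the two sys.breakpointhook lines and the same breakpoint() call opens whichever debugger PYTHONBREAKPOINT names.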
Ever wondered why your routine takes so long in spite of writing an efficient piece of code? Spending hours trying to figure out which part of your routine is eating up all the time may be injurious to the hair on your head.
Anyway, if you’re on Python 2.7, start with

    pip install line_profiler

Python 3 users need to follow

    pip3 install Cython
    git clone https://github.com/rkern/line_profiler
    cd line_profiler; sudo python3 setup.py install
Let’s take an arbitrary mathematical function as an example. We import the line_profiler and atexit modules, register the profiler’s print_stats to run when the script exits, and place the @profile decorator over the function we want to profile.
    import numpy as np
    import line_profiler
    from atexit import register

    profile = line_profiler.LineProfiler()
    register(profile.print_stats)


    @profile
    def math_function(n):
        x = np.random.normal(size=n)
        y = np.power(x, 4)
        z = np.sqrt(y)
        return np.sum(z)


    math_function(7123681)
Run your script and voila! It gives you a line by line breakdown of how many hits each line gets (greater than 1 in case of a loop), total time, time per hit and percentage time taken with respect to the entire function.
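If you can’t install line_profiler on some machine, the standard library’s cProfile is a coarser fallback. To be clear, this is a different tool: it reports time per function call, not per line. The sketch below uses a pure-Python stand-in for math_function (my own rewrite with math instead of NumPy, and a smaller n) so that it runs anywhere.

```python
import cProfile
import io
import math
import pstats


def math_function(n):
    # Pure-Python stand-in for the NumPy version, for illustration only.
    xs = [math.sin(i) for i in range(n)]
    ys = [x ** 4 for x in xs]
    zs = [math.sqrt(y) for y in ys]
    return sum(zs)


profiler = cProfile.Profile()
profiler.enable()
total = math_function(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 rows.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

It won’t tell you which line is slow, but it quickly narrows things down to the function worth decorating with @profile.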
Images == Numbers
Since an image is nothing but a collection of numbers, we can use a very handy lambda function to generate an image from a matrix of numbers. This lambda is quite fast and lightweight and gets the job done. This is also immensely helpful while prototyping or introspecting the dataset.
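    import numpy as np
    from PIL import Image

    # Min-max scale the matrix to 0..255, then hand it to PIL as 8-bit pixels.
    # np.ptp(x) is max - min; the x.ptp() method form was removed in NumPy 2.0.
    get_image = lambda x: Image.fromarray(np.uint8(255 * (x - x.min()) / np.ptp(x)))
    get_image(np.random.rand(200, 200))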
In the snippet above, we pass a random 200 x 200 matrix to the get_image routine and get an image back.
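The arithmetic inside the lambda is plain min-max scaling: subtract the minimum, divide by the range (max minus min), multiply by 255 and truncate to an integer. Here is a tiny pure-Python sketch of just that step. The helper name scale_to_uint8 is made up, and the zero-range guard is my addition; the one-line lambda has no such guard and would divide by zero on a constant matrix.

```python
def scale_to_uint8(values):
    # Hypothetical helper mirroring 255 * (x - x.min()) / (max - min).
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # Added guard: a constant input has no range to stretch.
        return [0 for _ in values]
    # int() truncates toward zero, like the cast to 8-bit integers.
    return [int(255 * (v - lowest) / span) for v in values]


pixels = scale_to_uint8([0.5, 1.0, 2.0])
print(pixels)  # [0, 85, 255]
```

The smallest value always lands on 0 and the largest on 255, which is why any matrix, whatever its original scale, comes out as a viewable grayscale image.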
Better Python REPL
Most Pythonistas may be familiar with this. For those who aren’t, run
pip install ptpython
Ptpython is a replacement for the traditional Python REPL. It boasts autocompletion, syntax highlighting, mouse support, multi-line editing and auto-tabbing. What a package!
You could also supercharge your game by setting it as an alias for your specific Python distribution.
Cython is <3
We often encounter scripts that perform a simple task but crunch a lot of data. Gaining any sort of speed up is a real pain at times. Let’s see how we can achieve this to some extent, starting with a simple Python program.
    n = 100000000
    arr = [0.0] * n

    for i in range(n):
        arr[i] = i % 3

    print(arr[:10])
13.26s user 0.31s system 99% cpu 13.636 total
Using modifications inspired by Cython, let’s convert this program to something like the one below (also change the file extension to .pyx).
    from array import array

    cdef int n = 100000000
    cdef object arr = array('d', [0.0]) * n
    cdef double[:] mv = arr
    cdef int i

    for i in range(n):
        mv[i] = i % 3

    print(arr[:10])
Execute it using

    pip install Cython
    cythonize -b -i filename.pyx
    python -c "import <filename>"
0.53s user 0.31s system 86% cpu 0.866 total
Holy smokes! That’s almost 25 times faster than our non-Cython code going by user CPU time (13.26s vs 0.53s), and about 16 times faster in total wall-clock time.
This happens because Cython compiles the module to native code that Python loads as an extension module when it is imported. I’ve heard about a few use cases where it gives a speed-up of almost 100x. If a few type declarations and giving up some of Python’s lucid syntax don’t bother you much, this is a great option for compute-heavy scripts.
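One detail worth poking at is the `cdef double[:] mv = arr` line: it creates a typed memoryview over the array’s buffer, so writes through mv land directly in arr without copying. You can watch the same sharing happen in plain Python with the built-in memoryview. This is only an illustration of the buffer sharing, with a tiny n of my choosing, and it will not give you the Cython speed-up.

```python
from array import array

n = 10
arr = array("d", [0.0]) * n   # n doubles in one contiguous buffer
mv = memoryview(arr)          # a view on the same memory, not a copy

for i in range(n):
    mv[i] = float(i % 3)      # writing through the view changes arr

view_format = mv.format       # "d": the items are C doubles
mv.release()                  # let go of the buffer when done

print(arr[:10])
```

In the .pyx version, the cdef declarations let the loop index and the element writes compile down to C-level operations on that buffer, which is where the time goes away.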
namedtuple is one of the lesser-known data structures in the collections module. Its working is explained in the snippet below.
    from collections import namedtuple

    Candidate = namedtuple('Candidate', 'name age gender')
    candidate_1 = Candidate(name='John Doe', age=35, gender='M')

    # Output
    # >>> candidate_1
    # Candidate(name='John Doe', age=35, gender='M')
    # >>> candidate_1.name
    # 'John Doe'
    # >>> candidate_1.age
    # 35
    # >>> candidate_1.gender
    # 'M'
namedtuple seems like a great fit for people who want struct-like functionality in Python and want to avoid classes for maintaining basic records.
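A few more things a namedtuple can do, sketched with the same Candidate record. The variable names older, as_dict and mutation_error are mine. Since it is still a tuple underneath, it unpacks and indexes like one, and it is immutable, so “changing” a field means building a new record with _replace.

```python
from collections import namedtuple

Candidate = namedtuple("Candidate", "name age gender")
candidate_1 = Candidate(name="John Doe", age=35, gender="M")

# Still a tuple: positional unpacking and indexing both work.
name, age, gender = candidate_1
first_field = candidate_1[0]

# Immutable: _replace returns a new record, the original is untouched.
older = candidate_1._replace(age=36)

# _asdict gives a plain mapping of field names to values.
as_dict = candidate_1._asdict()

# Assigning to a field raises AttributeError.
mutation_error = None
try:
    candidate_1.age = 36
except AttributeError as error:
    mutation_error = error

print(older, as_dict, type(mutation_error).__name__)
```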
This wraps up Part 1 of the “Python Booster Pack”. I’ll be on the hunt for more interesting stuff and hopefully there will be a Part 2. Until then, Adios!
Part 2: Link