I first wrote about Codon back in April 2023 in ;login:. At the time, I was excited about the attempt to create a compiler for Python that could run programs much faster than the Python interpreter. Recently, the founders of Codon sent me a pointer to a blog post with updates to their project that I found important and, in some ways, exciting.
When I tried Codon before, I struggled to get a simple script that I use to summarize ;login: downloads to compile. This time around, Codon had none of the problems I encountered then. I attribute this to work done by Codon committers to improve the compiler's ability to convert Python scripts into the intermediate representation they then present to an LLVM backend. Not that my script seemed to benefit from being compiled by Codon; it took about as long to run. But my script doesn't do much more than fill up an associative array, sum the keys, then print the sorted totals as output.
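For readers who want a sense of what such a script looks like, here's a hypothetical sketch in the same shape (this is my reconstruction for illustration, not the actual script):

```python
from collections import defaultdict

# Hypothetical sketch: tally download counts per file from lines of
# "path count" pairs, then return the sorted totals.
def summarize(lines):
    totals = defaultdict(int)
    for line in lines:
        path, count = line.split()
        totals[path] += int(count)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

log = ["a.pdf 3", "b.pdf 1", "a.pdf 2"]
for path, total in summarize(log):
    print(path, total)
```

A script like this spends its time in dictionary operations and string parsing, which the interpreter already handles reasonably well, so there is little for a compiler to win back.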
In their blog post, the authors provide charts showing the improvement of Codon over regular Python when running the NPBench NumPy benchmarks. The geometric mean of speedups is modest, 2.4x, but the maximum is crazy, at 900x. The reason for this is that the Codon team has ported NumPy, a Python library, directly into Codon.
I assumed that the values shown in the chart were correct rather than trying to run the benchmarks myself. I learned a long time ago that attempting to duplicate someone else's benchmark results can be a fool's errand: the people who created the hardware or software know much more about how to make it run fast. But I did want to try out a simple Python script that the blog's authors claimed could be sped up 300x when using Codon-NumPy (in the Loops section of their blog):
import numpy as np
import time
a = np.empty((300, 300, 300), dtype=np.float32)
t0 = time.time()
for i in range(300):
    for j in range(300):
        for k in range(300):
            a[i, j, k] = i + j + k
t1 = time.time()
print(a.mean())
print('time:', t1 - t0)
When I first tried this on my modest Debian on x86 desktop, I didn't see much performance improvement.
rik@nuke:~/C/Codon$ python3 loop.py
448.50006
time: 4.611926078796387
rik@nuke:~/C/Codon$ codon build loop.py
rik@nuke:~/C/Codon$ ./loop
448.5
time: 2.58449
rik@nuke:~/C/Codon$
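For comparison, the conventional way to make this fast under the stock interpreter is to vectorize the loop with NumPy broadcasting; the benchmark uses explicit loops precisely because interpreted Python loops are what Codon compiles well. A minimal vectorized equivalent (my own sketch, not from the blog post):

```python
import numpy as np

n = 300
i = np.arange(n, dtype=np.float32)
# Broadcasting builds the same 300x300x300 array of i + j + k
# without any Python-level loops.
a = i[:, None, None] + i[None, :, None] + i[None, None, :]
print(a.mean())  # approximately 448.5, matching the looped version
```

This runs in a fraction of a second under CPython, too; Codon's advantage is that you get comparable speed without having to rewrite loops in vectorized form.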
I contacted Ariya Shajii, CEO of Exaloop, and he replied that I had forgotten to include the -release flag. Since that flag isn't mentioned in the blog post, I hadn't really forgotten it. When I include -release, I do see a 115x improvement, and the executable is much smaller. Apparently, without the -release flag the regular NumPy library gets included instead of Codon-NumPy, something I could guess because the binary built without -release is much larger and contains strings that appear to be hooks from Codon into NumPy.
rik@nuke:~/C/Codon$ codon build -release loop.py
rik@nuke:~/C/Codon$ ./loop
448.5
time: 0.0398803
rik@nuke:~/C/Codon$ ls -l loop*
-rwxrwxr-x 1 rik rik 16224 Mar 7 14:35 loop
-rwxrwxr-x 1 rik rik 652688 Mar 5 18:07 loop-no
-rw-rw-r-- 1 rik rik 267 Mar 5 18:06 loop.py
rik@nuke:~/C/Codon$
It's really important that you use -release when using Codon-NumPy. I suggested that they make it the default.
There's also support in Codon-NumPy for using GPUs via decorators, as well as for telling Codon how many threads to use in a loop.
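Per the Codon documentation, the thread count is given with an @par annotation on a loop. This is Codon-specific syntax, so the sketch below compiles with codon but will not run under CPython; treat it as an illustration rather than tested code:

```python
# Codon only: build with `codon build -release par.py`.
a = [0.0] * 100000

@par(num_threads=4)      # run loop iterations across four OpenMP threads
for i in range(100000):
    a[i] = float(i) * 2.0
```

The Codon docs also describe a gpu=True variant of @par for offloading a loop to a GPU, along with decorators for writing GPU kernels directly.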
That's two improvements, one in usability and one in performance, and a very big win for anyone using NumPy in Python scripts. Because Python behaves as if it were single-threaded due to the global interpreter lock (GIL), Codon's ability to execute portions of loops in parallel is a big win, just as having compiled rather than interpreted code is. Note that Codon does not currently run on Windows, just Linux and macOS.
The other big news is that Exaloop, the company behind Codon, has changed its license to Apache 2, one of the most liberal open source licenses: commercial use and derivative works of Codon, for example, are now permitted without a commercial license.
The bottom line is simple: if you are using Python to process large amounts of data with NumPy, you really want to start using Codon. More trivial uses, like my own routine processing of weblogs, don't benefit much from compiled Python. On the other hand, if you are constantly spinning up lambdas that run Python code, I imagine that starting a compiled script will be much faster, and certainly cheaper, than invoking a Python interpreter and having it process a script for every lambda instance.