Rewriting Python Benchmarks

To understand the performance of RPython, we are trying to rewrite some Python benchmarks with RPython. In this section, we use several examples (fibonacci number and binary tree) to illustrate tips to rewrite Python code with RPython. As you will see, rewriting Python with RPython doesn’t require much effort, and we will gain a lot of performance improvements.

Fibonacci Number

The following code is to calculate fibonacci number recursively.

def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def main():
    fib(40)

if __name__ == '__main__':
    main()

To run this benchmark, we should add a target function to tell RPython the entry point. Also, we change to get n from arguments instead of hard-coding a constant to avoid optimization. The following code can be compiled by RPython.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

# n should not be a constant, otherwise RPython will optimize it
def main(n):
    ret = fib(n)
    # output the result, otherwise RPython will optimize out the above call
    # (the `transform_dead_op_vars` pass)
    print ret

def target(*args):              # we need a target function
    return entry_point, None

def entry_point(argv):
    main(int(argv[1]))          # get n from the arguments
    return 0

Note

There are some passes to simplify flow graph for optimization. These passes include transform_dead_op_vars, eliminate_empty_blocks, remove_assertion_errors, remove_identical_vars_SSA, constfold_exitswitch, remove_trivial_links, SSA_to_SSI, coalesce_bool, transform_ovfcheck, simplify_exceptions, transform_xxxitem, and remove_dead_exceptions.

Here, we simply use the GNU time command to measure the execution time and maximum resident set size during the process’s lifetime.

$ /usr/bin/time -f "%C\t%U\t%M" python fib.py > /dev/null
python fib.py   18.21   5956

$ /usr/bin/time -f "%C\t%U\t%M" pypy fib.py > /dev/null
pypy fib.py     4.77    112960

$ rpython fib_rpy.py

$ /usr/bin/time -f "%C\t%U\t%M" ./fib-c 40 > /dev/null
./fib_rpy-c 40  0.40    1704

Binary Tree

Rewriting the binary tree benchmark needs a little more efforts. In the Python version, it uses a tuple of tuples to represent nodes the the tree. Since RPython doesn’t allow variable length tuples and mixed types tuples, we write a new Node class to represent nodes in binary trees. In addition, we rewrite the sum built-in function. The following code only shows diff of original Python code and modified RPython version.

--- /drone/src/code/benchmarks/binary-tree.py
+++ /drone/src/code/benchmarks/binary-tree_rpy.py
@@ -5,18 +5,29 @@
 # contributed by Antoine Pitrou
 # modified by Dominique Wahli and Daniel Nanz
 
+import math
+
+class Node(object):
+    """
+    Represent nodes in a binary tree.
+    """
+    def __init__(self, value, left, right):
+        self.value = value
+        self.left = left     # left node
+        self.right = right   #  right node
+
 def make_tree(i, d):
 
     if d > 0:
         i2 = i + i
         d -= 1
-        return (i, make_tree(i2 - 1, d), make_tree(i2, d))
-    return (i, None, None)
+        return Node(i, make_tree(i2 - 1, d), make_tree(i2, d))
+    return Node(i, None, None)    # use Node instead of tuple
 
 
 def check_tree(node):
 
-    (i, l, r) = node
+    (i, l, r) = (node.value, node.left, node.right)
     if l is None:
         return i
     else:
@@ -54,15 +65,24 @@
 
     mmd = max_depth + min_depth
     for d in range(min_depth, stretch_depth, 2):
-        i = 2 ** (mmd - d)
+        i = int(math.pow(2, (mmd - d)))    # use math.pow instead of built-in pow
         cs = 0
-        for argchunk in get_argchunks(i,d):
-            cs += sum(map(make_check, argchunk))
-        print '%d\t trees of depth %d\t check: %d' % (i * 2, d, cs)
+        for argchunk in get_argchunks(i,d):    # write the sum built-in function
+            s = 0
+            for a in argchunk:
+                s += make_check(a)
+            cs += s
+        print '%d\t trees of depth %d\t check: %d' % (i*2, d, cs)
 
     print 'long lived tree of depth %d\t check: %d' % (
           max_depth, check_tree(long_lived_tree))
 
-
 if __name__ == '__main__':
     main(17)
+
+def entry_point(argv):
+    main(int(argv[1]))    # get argument from input to avoid optimization
+    return 0
+
+# add a target
+def target(*args): return entry_point, None

Let us measure execution times of the binary tree benchmark with Python and PyPy, and RPython rewrite as well.

$ /usr/bin/time -f "%C\t%U\t%M" python binary-tree.py > /dev/null
python binary-tree.py   10.45   60432

$ /usr/bin/time -f "%C\t%U\t%M" pypy binary-tree.py > /dev/null
pypy binary-tree.py     1.60    187256

$ /usr/bin/time -f "%C\t%U\t%M" ./binary-tree_rpy-c 17 > /dev/null
./binary-tree_rpy-c 17  0.38    68312

Spectral Norm

For the spectral norm benchmark, we made the following changes:

  • rewrite map function

  • rewrite generators

  • use list instead of tuple

  • use loop to rewrite izip function

--- /drone/src/code/benchmarks/spectral-norm.py
+++ /drone/src/code/benchmarks/spectral-norm_rpy.py
@@ -7,18 +7,27 @@
 # Dirtily sped up by Simon Descarpentries
 # Concurrency by Jason Stitt
 
-from itertools       import izip
-
 def eval_A (i, j):
     return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)
 
 def eval_A_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_A_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        ret.append(part_A_times_u(i))
+    return ret
 
 def eval_At_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_At_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        tmp = part_At_times_u(i)
+        ret.append(tmp)
+    return ret
 
 def eval_AtA_times_u (u):
     return eval_At_times_u (eval_A_times_u (u))
@@ -39,6 +48,7 @@
 
 def main(n):
     for i in range(n):
+        v = []
         u = [1] * DEFAULT_N
 
         for dummy in xrange (10):
@@ -47,9 +57,20 @@
 
         vBv = vv = 0
 
-        for ue, ve in izip (u, v):
-            vBv += ue * ve
-            vv  += ve * ve
+        l = 0
+        if len(u) > len(v): l = len(u)
+        else: l = len(v)
+        for i in range(l):
+            vBv += u[i] * v[i]
+            vv  += v[i] * v[i]
+        print vBv, vv
 
 if __name__ == "__main__":
     main(400)
+
+def entry_point(argv):
+    main(int(argv[1]))
+    return 0
+
+def target(*args):
+    return entry_point, None
$ /usr/bin/time -f "%C\t%U\t%M" python spectral-norm.py > /dev/null
python spectral-norm.py 34.83   6140

$ /usr/bin/time -f "%C\t%U\t%M" pypy spectral-norm.py > /dev/null
pypy spectral-norm.py   1.20    81016

$ /usr/bin/time -f "%C\t%U\t%M" ./spectral-norm_rpy-c 17 > /dev/null
./spectral-norm_rpy-c 400       0.26    7892

Benchmark Results

In addition to “fibonacci number” and “binary tree”, we also rewrite some other benchmarks (source code). The following table summarize the benchmark results.

Python 2.7.15

PyPy2 v6.0.0

RPython (PyPy2 v6.0.0)

Time (s)

Time (s)

Speedup

Time (s)

Speedup

Size (KB)

fib.py

18.21

4.77

3.82

0.40

45.53

282

binary-tree.py

10.45

1.60

6.53

0.38

27.50

301

float.py

11.47

1.51

7.60

0.57

20.12

277

fannkuch.py

3.91

0.54

7.24

0.32

12.22

282

buffer.py

1.23

0.64

1.92

0.52

2.37

284

spectral-norm.py

34.83

1.20

29.03

0.26

133.96

307

As you can see, the average speedup of RPython compared to Python 2.7.15 is about 21. Moreover, compared to the very large Python interpreter, the size of RPython binary is pretty small.