# Rewriting Python Benchmarks

To understand the performance of RPython, we rewrite some Python benchmarks in RPython. In this section, we use several examples (Fibonacci number, binary tree, and spectral norm) to illustrate tips for rewriting Python code in RPython. As you will see, rewriting Python in RPython doesn't require much effort, and it yields large performance improvements.

## Fibonacci Number

The following code calculates Fibonacci numbers recursively.

```
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def main():
    fib(40)

if __name__ == '__main__':
    main()
```

To run this benchmark, we should add a target function to tell RPython the entry point. Also, we read `n` from the command-line arguments instead of hard-coding a constant, to keep RPython from optimizing the call away. The following code can be compiled by RPython.

```
def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

# n should not be a constant, otherwise RPython will optimize it
def main(n):
    ret = fib(n)
    # output the result, otherwise RPython will optimize out the above call
    # (the `transform_dead_op_vars` pass)
    print ret

def target(*args):  # we need a target function
    return entry_point, None

def entry_point(argv):
    main(int(argv[1]))  # get n from the arguments
    return 0
```

Note

RPython runs several passes that simplify the flow graph before optimization. These passes include `transform_dead_op_vars`, `eliminate_empty_blocks`, `remove_assertion_errors`, `remove_identical_vars_SSA`, `constfold_exitswitch`, `remove_trivial_links`, `SSA_to_SSI`, `coalesce_bool`, `transform_ovfcheck`, `simplify_exceptions`, `transform_xxxitem`, and `remove_dead_exceptions`.
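To see why the unused-result call in `main` could otherwise disappear, here is a toy, plain-Python sketch of dead-assignment removal in the spirit of `transform_dead_op_vars`. This is an illustration only, not RPython's actual algorithm or data structures; the operation format `(target, read_vars, has_side_effect)` is made up for the example.

```python
def remove_dead_assignments(ops):
    """Toy dead-op removal: walk the operations backwards, keeping an
    operation only if it has a side effect (e.g. print) or its target
    is read later. Unused pure computations are dropped."""
    live = set()   # variables read by some later, kept operation
    kept = []
    for target, read_vars, has_side_effect in reversed(ops):
        if has_side_effect or target in live:
            kept.append((target, read_vars, has_side_effect))
            live.discard(target)      # this op defines target anew
            live.update(read_vars)    # its inputs become live
    return list(reversed(kept))
```

With this toy pass, `a = fib(n); print a` keeps the call, while a bare `fib(n)` whose result is never read would be removed, which is exactly why the RPython version prints `ret`.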

Here, we simply use the GNU `time` command to measure the execution time and maximum resident set size during the process’s lifetime.

```
$ /usr/bin/time -f "%C\t%U\t%M" python fib.py > /dev/null
python fib.py   18.21   5956

$ /usr/bin/time -f "%C\t%U\t%M" pypy fib.py > /dev/null
pypy fib.py     4.77    112960

$ rpython fib_rpy.py

$ /usr/bin/time -f "%C\t%U\t%M" ./fib_rpy-c 40 > /dev/null
./fib_rpy-c 40  0.40    1704
```

## Binary Tree

Rewriting the binary tree benchmark needs a little more effort. The Python version uses a tuple of tuples to represent the nodes of the tree. Since RPython allows neither variable-length tuples nor mixed-type tuples, we write a new `Node` class to represent nodes in the binary tree. In addition, we rewrite the `sum` built-in function as an explicit loop. The following code shows only the diff between the original Python code and the modified RPython version.

```
--- /drone/src/code/benchmarks/binary-tree.py
+++ /drone/src/code/benchmarks/binary-tree_rpy.py
@@ -5,18 +5,29 @@
 # contributed by Antoine Pitrou
 # modified by Dominique Wahli and Daniel Nanz
 
+import math
+
+class Node(object):
+    """
+    Represent nodes in a binary tree.
+    """
+    def __init__(self, value, left, right):
+        self.value = value
+        self.left = left     # left node
+        self.right = right   # right node
+
 def make_tree(i, d):
 
     if d > 0:
         i2 = i + i
         d -= 1
-        return (i, make_tree(i2 - 1, d), make_tree(i2, d))
-    return (i, None, None)
+        return Node(i, make_tree(i2 - 1, d), make_tree(i2, d))
+    return Node(i, None, None)    # use Node instead of tuple
 
 def check_tree(node):
 
-    (i, l, r) = node
+    (i, l, r) = (node.value, node.left, node.right)
     if l is None:
         return i
     else:
@@ -54,15 +65,24 @@
 
     mmd = max_depth + min_depth
     for d in range(min_depth, stretch_depth, 2):
-        i = 2 ** (mmd - d)
+        i = int(math.pow(2, (mmd - d)))    # use math.pow instead of built-in pow
         cs = 0
-        for argchunk in get_argchunks(i,d):
-            cs += sum(map(make_check, argchunk))
-        print '%d\t trees of depth %d\t check: %d' % (i * 2, d, cs)
+        for argchunk in get_argchunks(i,d):    # rewrite the sum built-in function
+            s = 0
+            for a in argchunk:
+                s += make_check(a)
+            cs += s
+        print '%d\t trees of depth %d\t check: %d' % (i*2, d, cs)
 
     print 'long lived tree of depth %d\t check: %d' % (
         max_depth, check_tree(long_lived_tree))
 
-
 if __name__ == '__main__':
     main(17)
+
+def entry_point(argv):
+    main(int(argv[1]))    # get argument from input to avoid optimization
+    return 0
+
+def target(*args): return entry_point, None
```
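The `Node`-based pieces of the rewrite also run as ordinary Python, which makes them easy to check before translating. A minimal self-contained sketch of the rewritten tree code (assembled from the diff above):

```python
class Node(object):
    """RPython-friendly replacement for the (value, left, right) tuple."""
    def __init__(self, value, left, right):
        self.value = value
        self.left = left
        self.right = right

def make_tree(i, d):
    """Build a binary tree of depth d rooted at item i."""
    if d > 0:
        i2 = i + i
        d -= 1
        return Node(i, make_tree(i2 - 1, d), make_tree(i2, d))
    return Node(i, None, None)

def check_tree(node):
    """Checksum over the tree, reading attributes instead of unpacking a tuple."""
    i, l, r = node.value, node.left, node.right
    if l is None:
        return i
    return i + check_tree(l) - check_tree(r)
```

Because every `Node` field always holds the same type (`value` is an int, `left` and `right` are a `Node` or `None`), RPython's annotator can infer precise types for the whole tree.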

Let us measure the execution time of the binary tree benchmark with Python, PyPy, and the RPython rewrite.

```
$ /usr/bin/time -f "%C\t%U\t%M" python binary-tree.py > /dev/null
python binary-tree.py   10.45   60432

$ /usr/bin/time -f "%C\t%U\t%M" pypy binary-tree.py > /dev/null
pypy binary-tree.py     1.60    187256

$ /usr/bin/time -f "%C\t%U\t%M" ./binary-tree_rpy-c 17 > /dev/null
./binary-tree_rpy-c 17  0.38    68312
```

## Spectral Norm

For the spectral norm benchmark, we made the following changes:

• rewrite calls to the `map` function as explicit loops

• rewrite generator expressions as list-building loops

• use lists instead of tuples

• use an index-based loop to replace the `izip` function
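These patterns can be illustrated in plain Python before looking at the diff. The sketch below is not the benchmark code itself; `double` and `dot` are made-up helper names showing the generator/`map`/`izip` style next to the loop-based RPython-friendly style:

```python
def double(x):
    return 2 * x

# Original style: a generator expression fed into map
def doubled_gen(xs):
    return list(map(double, (x for x in xs)))

# RPython-friendly style: build explicit lists with plain loops
def doubled_loop(xs):
    args = []
    for x in xs:
        args.append(x)
    ret = []
    for x in args:
        ret.append(double(x))
    return ret

# Index-based loop replacing itertools.izip over two equal-length lists
def dot(u, v):
    s = 0
    for i in range(len(u)):
        s += u[i] * v[i]
    return s
```

Both styles compute the same values; the loop versions are simply easier for RPython's annotator, which cannot handle generators or the dynamic typing of `map` and `izip`.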

```
--- /drone/src/code/benchmarks/spectral-norm.py
+++ /drone/src/code/benchmarks/spectral-norm_rpy.py
@@ -7,18 +7,27 @@
 # Dirtily sped up by Simon Descarpentries
 # Concurrency by Jason Stitt
 
-from itertools       import izip
-
 def eval_A (i, j):
     return 1.0 / ((i + j) * (i + j + 1) / 2 + i + 1)
 
 def eval_A_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_A_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        ret.append(part_A_times_u(i))
+    return ret
 
 def eval_At_times_u (u):
-    args = ((i,u) for i in xrange(len(u)))
-    return map(part_At_times_u, args)
+    args = []
+    for i in range(len(u)):
+        args.append((i, u))
+    ret = []
+    for i in args:
+        tmp = part_At_times_u(i)
+        ret.append(tmp)
+    return ret
 
 def eval_AtA_times_u (u):
     return eval_At_times_u (eval_A_times_u (u))
@@ -39,6 +48,7 @@
 
 def main(n):
     for i in range(n):
+        v = []
         u = [1] * DEFAULT_N
 
         for dummy in xrange (10):
@@ -47,9 +57,20 @@
 
         vBv = vv = 0
 
-        for ue, ve in izip (u, v):
-            vBv += ue * ve
-            vv  += ve * ve
+        l = 0
+        if len(u) > len(v): l = len(u)
+        else: l = len(v)
+        for i in range(l):
+            vBv += u[i] * v[i]
+            vv  += v[i] * v[i]
+        print vBv, vv
 
 if __name__ == "__main__":
     main(400)
+
+def entry_point(argv):
+    main(int(argv[1]))
+    return 0
+
+def target(*args):
+    return entry_point, None
```
```
$ /usr/bin/time -f "%C\t%U\t%M" python spectral-norm.py > /dev/null
python spectral-norm.py 34.83   6140

$ /usr/bin/time -f "%C\t%U\t%M" pypy spectral-norm.py > /dev/null
pypy spectral-norm.py   1.20    81016

$ /usr/bin/time -f "%C\t%U\t%M" ./spectral-norm_rpy-c 400 > /dev/null
./spectral-norm_rpy-c 400       0.26    7892
```

## Benchmark Results

In addition to the benchmarks above, we also rewrote some other benchmarks (source code). The following table summarizes the benchmark results.

| Benchmark | Python 2.7.15 Time (s) | PyPy2 v6.0.0 Time (s) | PyPy2 Speedup | RPython (PyPy2 v6.0.0) Time (s) | RPython Speedup | RPython Size (KB) |
| --- | --- | --- | --- | --- | --- | --- |
| fib.py | 18.21 | 4.77 | 3.82 | 0.40 | 45.53 | 282 |
| binary-tree.py | 10.45 | 1.60 | 6.53 | 0.38 | 27.50 | 301 |
| float.py | 11.47 | 1.51 | 7.60 | 0.57 | 20.12 | 277 |
| fannkuch.py | 3.91 | 0.54 | 7.24 | 0.32 | 12.22 | 282 |
| buffer.py | 1.23 | 0.64 | 1.92 | 0.52 | 2.37 | 284 |
| spectral-norm.py | 34.83 | 1.20 | 29.03 | 0.26 | 133.96 | 307 |

As you can see, the geometric mean of RPython's speedups over Python 2.7.15 is about 21. Moreover, compared to the very large Python interpreter, the RPython binaries are quite small.
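The "about 21" figure matches the geometric mean of the speedup column (the arithmetic mean would be roughly 40, so a geometric mean is the fairer summary across benchmarks of very different magnitudes). A quick plain-Python check, copying the speedup column from the table:

```python
import math

# RPython-vs-CPython speedups from the table above
speedups = [45.53, 27.50, 20.12, 12.22, 2.37, 133.96]

# Geometric mean: exp of the mean of the logs
geomean = math.exp(sum(math.log(s) for s in speedups) / len(speedups))
```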