In this article I will show how to implement a function in C, compile it, and import it into Python. This can help you gain performance in certain hot spots of a Python application. Below I describe the longest common subsequence algorithm and implement it in pure Python, in Cython, and in C imported into Python. At the end I benchmark these solutions.

I wrote this article partly for myself, so that I would have a place to look up the details of the procedures described. :)

The task

Formulation. We have two sequences of elements (numbers or symbols, for example). We need to find the longest subsequence whose elements appear in both of the given sequences in the same relative order. For example, for "ABCBDAB" and "BDCABA" one such subsequence is "BCBA", of length 4. Read more about it on Wikipedia.

I think most programmers know this problem. In particular, the algorithm is at the heart of the standard diff utility, which compares the contents of two files for differences.

As a sequence let us take the characters of a string, and instead of the subsequence itself we will compute only its length. A deep and comprehensive treatment is not our goal in this article; we just need an example of an algorithm that is non-trivial on the one hand, yet easy to implement on the other.
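The dynamic-programming recurrence behind all the implementations below may be easiest to see in a direct memoized form. This is an illustrative sketch only; it is far too slow for the benchmark sizes used later:

```python
from functools import lru_cache

def lcs_length(s1, s2):
    """Length of the longest common subsequence, straight from the recurrence."""
    @lru_cache(maxsize=None)
    def rec(i1, i2):
        if i1 == 0 or i2 == 0:
            return 0  # an empty prefix has nothing in common with anything
        if s1[i1 - 1] == s2[i2 - 1]:
            return 1 + rec(i1 - 1, i2 - 1)  # last symbols match: extend the LCS
        # otherwise drop the last symbol of one string or the other
        return max(rec(i1 - 1, i2), rec(i1, i2 - 1))

    return rec(len(s1), len(s2))

# "BCBA" is one longest common subsequence of these classic example strings
print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```

The row-based implementations that follow compute exactly this recurrence bottom-up, keeping only two rows of the DP table in memory.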

Python implementation

# lcs_py.py

def solve(s1, s2):
    n1 = len(s1)
    n2 = len(s2)

    if n1 == 0 or n2 == 0:
        return 0

    # row[i1] holds the LCS length of s1[:i1 + 1] and the part of s2
    # processed so far; row_p is the previous row of the DP table.
    row = [0] * n1
    for i2 in range(n2):
        row_p = row[:]
        for i1 in range(n1):
            if s1[i1] == s2[i2]:
                row[i1] = 1 + (row_p[i1 - 1] if i1 > 0 else 0)
            else:
                row[i1] = max(row[i1 - 1] if i1 > 0 else 0, row_p[i1])

    return row[-1]

Cythonization

Cythonization is a way to speed up a Python function. In short, the idea is to modify the Python code in some places (turning it into Cython code), which can then be compiled into C and built as a shared library. After that, the library can be imported as an ordinary Python module with the familiar syntax.

# lcs_pyx.pyx

def solve(str s1, str s2):
    n1 = len(s1)
    n2 = len(s2)

    if n1 == 0 or n2 == 0:
        return 0

    # Same rolling-row DP as the pure-Python version; only the argument
    # types are declared so far.
    row = [0] * n1
    for i2 in range(n2):
        row_p = row[:]
        for i1 in range(n1):
            if s1[i1] == s2[i2]:
                row[i1] = 1 + (row_p[i1 - 1] if i1 > 0 else 0)
            else:
                row[i1] = max(row[i1 - 1] if i1 > 0 else 0, row_p[i1])

    return row[-1]

Before importing the module we need an auxiliary file to build it:

# setup_pyx.py

from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize("lcs_pyx.pyx")
)
$ python3 setup_pyx.py build_ext --inplace

C implementation

The same algorithm in C looks like this:

/* lcs_c.h */

int solve(const char* s1, const char* s2);
/* lcs_c.c */

#include <string.h>
#include <stdlib.h>

#include "lcs_c.h"

#define max(x, y) (((x) >= (y)) ? (x) : (y))


int solve(const char* s1, const char* s2) {
    int n1 = strlen(s1);
    int n2 = strlen(s2);

    if (n1 == 0 || n2 == 0) {
        return 0;
    }

    /* calloc(nmemb, size) already zero-initializes, so no memset is needed */
    int* row = (int*)calloc(n1, sizeof(int));
    int* row_p = (int*)calloc(n1, sizeof(int));

    for (int i2 = 0; i2 < n2; i2++) {
        memcpy(row_p, row, sizeof(int) * n1);
        for (int i1 = 0; i1 < n1; i1++) {
            if (s1[i1] == s2[i2]) {
                row[i1] = 1 + ((i1 > 0) ? row_p[i1 - 1] : 0);
            }
            else {
                row[i1] = max(
                    (i1 > 0) ? row[i1 - 1] : 0,
                    row_p[i1]
                );
            }
        }
    }

    int result = row[n1 - 1];

    free(row);
    free(row_p);

    return result;
}

Compiling it:

$ gcc -c -fPIC "lcs_c.c" -o "lcs_c.o"
$ gcc "lcs_c.o" -shared -o "lcs_c.so"

Benchmark

# test.py

import ctypes
from time import time
from random import choice

import lcs_py
import lcs_pyx

lcs_c = ctypes.CDLL('./lcs_c.so')  # the library built above, in the same directory
lcs_c_solve = lambda s1, s2: lcs_c.solve(
    ctypes.c_char_p(s1.encode('utf-8')),
    ctypes.c_char_p(s2.encode('utf-8'))
)


LENGTH = 5000


def gen_random_word(length):
    return ''.join(choice('0123456789abcdef') for _ in range(length))


def test(func, word1, word2):
    tb = time()
    res = func(word1, word2)
    te = time()
    print("time: {:.2f}, result: {}".format(te - tb, res))


word1 = gen_random_word(LENGTH)
word2 = gen_random_word(LENGTH)

test(lcs_py.solve, word1, word2)
test(lcs_pyx.solve, word1, word2)
test(lcs_c_solve, word1, word2)
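One caveat about the lambda above: by default ctypes assumes every foreign function takes and returns a C int, so it pays to declare the prototype explicitly via argtypes and restype. A minimal sketch of the pattern on a POSIX system, using libc's strlen as a stand-in for our lcs_c.so (the same calls apply to any library loaded with CDLL):

```python
import ctypes
import ctypes.util

# libc is used here only because it is guaranteed to be present;
# loading our own lcs_c.so works identically.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the prototype: ctypes then type-checks the arguments and
# converts the return value (size_t here) instead of assuming int.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hello"))  # 5
```

For lcs_c.solve the declaration would be argtypes = [c_char_p, c_char_p] and restype = c_int, matching the C signature in lcs_c.h.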

In the script I generate two words of length LENGTH = 5000 consisting of random hexadecimal digits (0-9 and a-f). After that I measure the execution time of each implementation. The results are impressive. On my machine they are as follows:

$ python3 test.py
time: 8.21, result: 1971
time: 3.48, result: 1971
time: 0.10, result: 1971

As we can see, pure Python is about 82 times slower than C. Cythonization did help, but only by a factor of about 2.4.
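The modest Cython gain is expected: the .pyx above only annotates the argument types, so the inner loop still manipulates Python objects. Declaring C types for the loop counters (and, ideally, moving the rows into C arrays or typed memoryviews) usually narrows the gap to C considerably. An untimed sketch of that direction:

```cython
# lcs_pyx_typed.pyx -- a sketch, not benchmarked above

def solve(str s1, str s2):
    cdef int n1 = len(s1)
    cdef int n2 = len(s2)
    cdef int i1, i2

    if n1 == 0 or n2 == 0:
        return 0

    row = [0] * n1
    row_p = [0] * n1
    for i2 in range(n2):
        row_p[:] = row  # reuse the buffer instead of allocating a new list
        for i1 in range(n1):
            if s1[i1] == s2[i2]:
                row[i1] = 1 + (row_p[i1 - 1] if i1 > 0 else 0)
            else:
                row[i1] = max(row[i1 - 1] if i1 > 0 else 0, row_p[i1])

    return row[-1]
```

With typed counters, Cython compiles the range loops down to plain C for loops; how much that buys in practice is something the reader can measure with the same benchmark script.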

Conclusion

I think the conclusion is obvious. Each approach is acceptable depending on the task, the time available for development, the performance requirements, and other factors. If performance is not critical, Python is a good choice. Otherwise, consider writing in C. Cython is an intermediate option; it is especially convenient if you already have the code in Python.