Photo by pypy

With support of huge number of libraries which enables provision for easy compatibility for any tool, Python has been one of the most used programming language for scripting. In this blog, we will look at how can we can easily increase the efficiency of our scripts multifold based on our usecase.

It was rightly said by the Creater Of Python in PyCon 2015

          "If you want your code to run faster,
             you should probably just use PyPy."
                                       - Guido van Rossum



Default Python Execution

Lets first understand how python program is executed.

CPython Implementation
CPython Implementation
  • First you write your python code in file, lets call that file foo.py.
  • You execute your code using below command
    python foo.py

  • Ofcourse, computer doesn’t understand this high level language as computer works only on machine language. To convert your code to machine language, python performs following steps:
    • Python code gets compiled to bytecode and gives .pyc file, in our case foo.pyc
    • This .pyc file then gets interpreted to machine language in PVM(Python Virtual Machine) and finally you get your output
  • This process of implementation is referred as CPython.
  • Python is a programming language & CPython is default implementation of python.



Default Python (CPython) VS PyPy

  • CPython:
    • CPython is the default byte-code interpreter of Python, which is written in C programming language.
    • When you install python from the official website, default interpreter is CPython.
    • CPython basically converts your code to bytecode first and then interprets this bytecode to machine language and gives you desired output.
    • The Global Interpreter Lock (GIL) in CPython allows only one OS thread to execute Python bytecode at any given time, and the consequence of this is that it’s not possible to speed up CPU-intensive Python code by distributing the work among multiple threads
    • CPython tends to be slower due to multilevel conversion of your code and single thread execution of bytecode even with multi-core machines.
  • PyPy
    • As we now understand the reason for CPython to be slow, to fill this gap, PyPy is used.
    • PyPy is written in RPython(Restricted Python - a subset of python)
    • But what makes PyPy faster ? It is JIT Compiler. (And is GIL-less)
    • PyPy uses JIT or Just-in-Time compilation approach to compile the source code directly into native machine code instead of bytecode. Thereby making the code to run faster by skipping any intermediate conversion.
    • PyPy is a compiler while CPython is an interpreter.
    • PyPy works faster for only pure python code i.e. there shouldn’t be any embedded C code. If that’s the case, it’s performance deteriorates.
    • So the use case where PyPy works best is when executing long-running programs where a significant fraction of the time is spent executing Python code.
    • Features of PyPy:
      • Significant increase in execution speed (4.4 times faster than CPython on an average.)
      • Efficient memory usage
      • Stackless
    • Drawbacks of PyPy:
      • if your code doesn’t run for at least a few seconds, then the JIT compiler won’t have enough time to warm up.
      • If all the time is spent in run-time libraries (i.e. in C functions), and not actually running Python code, the JIT compiler will not be any better than usual execution.
      • Not suitable for data science libraries as those contain C functions and not pure python code.

Just to be clear, don’t get confused between PyPy & PyPI(Python Package Index). The Python Package Index (PyPI) is a repository of software for the Python programming language. And we are going to discuss about PyPy in this blog.



Installion and setting up of PyPy3:

Let’s walk through installation of PyPy3 on different machines.

  • Ubuntu:

    • Adding pypy to repository
      sudo add-apt-repository ppa:pypy/ppa

    • Updating packages
      sudo apt update

    • Installing pypy3
      sudo apt install pypy3

  • Mac:

    • Installing pypy3 using brew
      brew install pypy3
  • Windows:

    • Download Windows Binary file from official website
    • Execute the binary for installation


Along with PyPy, we would also need pip for python package management

  • Installation of Pip:
    • Download the latest version of get-pip.py
    • Go to the folder where you download get-pip and run following cmd: pypy get-pip.py

Now finally, it’s similar to use PyPy just like using Cpython for python code execution

  • Running python code using pypy:
    • First write your python code.
    • then run following command instead of running python3 <file_name.py> pypy3 <file_name.py>



CPython & PyPy Execution

Let’s check upon actual execution time of same piece of code in both Cpython & PyPy for getting an idea of its difference!
For comparision, we will execute a long iterating for loop and record execution period using time.

CPython
time taken by CPython
PyPy
time taken by PyPy

As we can see by above example, PyPy took only 5 seconds to complete while CPython took 2 minutes and 20 seconds to complete execution of same loop.

Conclusion

No doubt, Python (CPython) is vastly used for ML/AI projects as most of its libraries have embedded C functions included. So in such case, PyPy won’t serve its purpose well !
But for usecase involving pure python code, it will be best to use PyPy instead of CPython for better execution efficiency.