Python is just awesome! It is the most versatile and readable programming language of all time.
But hidden in that statement is an ambiguity, one that an experienced Python engineer will spot right away. Of course, Python is a great choice for beginners learning to program; it is, surprisingly, the most readable and understandable programming language I've ever encountered.
But what do I mean when I say "Python"? Is it the abstract interface? Or is it the implementation of the abstract interface?
Understanding the difference between an interface and its implementation is one of the concepts a beginner should be aware of before heading into the realm of programming. It matters because the implementation usually determines how well, and how efficiently, the language performs in a particular environment. It would be almost impossible to deploy a hybrid application targeting multiple platforms without getting familiar with the different flavours, unless you use some third-party framework.
Therefore, in this post, I give a brief introduction to what an interface and an implementation are, and how the various Python implementations compare when it comes to performance. (Do not confuse Python implementations with Python versions.)
Interface
In layman's terms, an interface can be understood as a set of rules, or, say, the grammar of the programming language. It decides what a particular word, or token, means, and therefore decides how the language behaves in various contexts.
In computer science, this is often called the abstract interface (or specification) of the language. It is neither data nor code, just a description of how things should happen: a blueprint of the language's syntax and the meaning of each construct (a.k.a. semantics).
Implementation
An implementation, on the other hand, is the code that brings those predefined behaviours to life. Technically speaking, it is the realization of the specification, or abstract interface. How it is implemented is entirely up to its designers and engineers.
Just like any other software product, an implementation is written in some well-established programming language. So it's no surprise that its performance depends on the underlying technology used. The implementation also decides whether the language is compiled or interpreted, each of which brings its own set of pros and cons.
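Python's own `abc` module expresses the same interface/implementation split at a much smaller scale, which may help make the idea concrete. This is only an analogy, and the class names `SumSpec`, `LoopSum` and `BuiltinSum` are made up for illustration: the abstract class fixes *what* `total` must return, while each subclass decides *how*, with different machinery underneath.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class SumSpec(ABC):
    """The 'interface': it only promises what total() returns, not how."""

    @abstractmethod
    def total(self, numbers: Sequence[int]) -> int:
        """Return the sum of all numbers."""


class LoopSum(SumSpec):
    """One implementation: a plain Python loop."""

    def total(self, numbers: Sequence[int]) -> int:
        result = 0
        for n in numbers:
            result += n
        return result


class BuiltinSum(SumSpec):
    """Another implementation: delegates to the built-in sum(), which runs in C on CPython."""

    def total(self, numbers: Sequence[int]) -> int:
        return sum(numbers)


for impl in (LoopSum(), BuiltinSum()):
    # Same interface, same observable behaviour, different machinery underneath.
    print(type(impl).__name__, impl.total([1, 2, 3, 4]))
```

Callers only depend on `SumSpec`, so either implementation can be swapped in, just as the same Python program can run on different Python implementations.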
So, is Python interpreted or compiled?
Well, asking the question this way is one of the most common mistakes Python beginners make.
The first thing to realize when making a comparison is that ‘Python’ is an interface. There’s a specification of what Python should do and how it should behave (as with any interface). And there are multiple implementations (as with any interface).
The second thing to realize is that ‘interpreted’ and ‘compiled’ are properties of an implementation, not an interface.
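Because "Python" names the interface, a running program can ask which implementation it is actually on. The sketch below uses two standard-library calls: `platform.python_implementation()`, which returns a string such as `'CPython'`, `'PyPy'`, `'Jython'` or `'IronPython'`, and `platform.python_version()` for the language version. The helper name `describe_runtime` is just for illustration.

```python
import platform


def describe_runtime():
    """Report which implementation is running this code, separately from the language version."""
    return {
        # Which implementation: e.g. 'CPython', 'PyPy', 'Jython' or 'IronPython'.
        "implementation": platform.python_implementation(),
        # Which revision of the language it implements, e.g. '3.12.1'.
        "language_version": platform.python_version(),
    }


print(describe_runtime())
```

On Python 3.3 and later, the lowercase `sys.implementation.name` (e.g. `'cpython'` or `'pypy'`) carries the same information.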
That said, for the most common Python implementation (CPython: written in C, often referred to as simply ‘Python’, and surely what you’re using if you have no idea what I’m talking about), the answer is: interpreted, with some compilation. CPython compiles Python source code to bytecode, and then interprets this bytecode, executing it as it goes.
One thing to note here: "compilation" usually means translating a high-level language into machine code. Translating source code into bytecode is still called "compilation", because it produces a lower-level instruction set. By default, however, CPython never turns your program into machine code: its interpreter loop, which is itself compiled machine code, reads each bytecode instruction and carries it out.
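You can look at this bytecode yourself with the standard `dis` module. In the small sketch below (the function `add_one` is just an example), CPython has already compiled the function body by the time `def` finishes; `dis.dis` prints the instructions its virtual machine will execute. The exact opcode names vary between CPython versions (for instance, `BINARY_ADD` became `BINARY_OP` in 3.11).

```python
import dis


def add_one(x):
    return x + 1


# The compiled function carries a code object whose raw bytecode is a bytes string.
print(type(add_one.__code__.co_code))  # <class 'bytes'>

# Human-readable listing, typically LOAD_FAST / LOAD_CONST / an add instruction / RETURN_VALUE.
dis.dis(add_one)
```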
Let’s look at bytecode and machine code more closely, as they will help us understand some of the concepts that come up later in the post.
Bytecode vs. Machine Code
Before moving on to the discussion of Python implementations, it's good to know the difference between bytecode and machine code (a.k.a. native code). A comparison should make it clear:
- C compiles to machine code, which is then run directly on your processor. Each instruction instructs your CPU to move stuff around.
- Java compiles to bytecode, which is then run on the Java Virtual Machine (JVM), an abstraction of a computer that executes programs. Each instruction is then handled by the JVM, which interacts with your computer.
Because machine code consists of a specific processor's instructions, it looks different depending on the underlying architecture. Bytecode, however, looks the same on all platforms. That is a great feature, but bytecode has to be interpreted before it can do anything, and this is where virtual machines come into the picture. The idea is to build a VM for every platform, compiled to that platform's native code, so that the same bytecode program can run on any system that has the VM.
In short: machine code is generally much faster, but bytecode is more portable and secure.
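To make "bytecode run by a virtual machine" concrete, here is a toy stack-based VM. The instruction set (`PUSH`, `ADD`, `MUL`, `PRINT`) is invented for this sketch and has nothing to do with real CPython or JVM opcodes. The point is that the program is plain data, while the VM is ordinary code (here Python; in CPython's case, C compiled to machine code) that reads each instruction and performs it.

```python
from typing import List, Tuple

# A toy "bytecode" program is just data: a list of (opcode, argument) pairs.
Program = List[Tuple[str, int]]


def run(program: Program) -> List[int]:
    """Execute a toy stack-machine program and return everything it printed."""
    stack: List[int] = []
    output: List[int] = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# (2 + 3) * 4 as toy bytecode; the same list runs unchanged wherever this VM runs.
program: Program = [("PUSH", 2), ("PUSH", 3), ("ADD", 0), ("PUSH", 4), ("MUL", 0), ("PRINT", 0)]
print(run(program))  # [20]
```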
Returning to the CPython implementation, the toolchain works as follows:
- CPython compiles your Python source code into bytecode.
- That bytecode is then executed on the CPython Virtual Machine.
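Both steps can be reproduced by hand with the built-ins `compile()` and `exec()`. This is a small sketch; the source string, the variable names and the module path `spam.py` are arbitrary examples.

```python
import importlib.util

source = "greeting = 'hello from ' + name\n"

# Step 1: compile the source text into a code object, i.e. bytecode plus metadata.
code = compile(source, "<example>", "exec")

# Step 2: hand the code object to the virtual machine, which executes it.
namespace = {"name": "the CPython VM"}
exec(code, namespace)
print(namespace["greeting"])  # hello from the CPython VM

# When a module is imported, its bytecode is cached in a .pyc file so step 1 can be
# skipped next time. "spam.py" is a hypothetical module path.
print(importlib.util.cache_from_source("spam.py"))  # e.g. __pycache__/spam.cpython-312.pyc
```

The cache file name includes an implementation-and-version tag, because the bytecode format is specific to the implementation and version that produced it.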
Alternate VMs: Jython, IronPython, and more...
Apart from CPython, there are several other implementations of Python. As I've mentioned earlier, CPython is the most common of all, but there are others that should be mentioned for the sake of this comparison guide.
First on the list is Jython, a Python implementation written in Java that runs on the JVM. While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM (the same stuff that's produced when you compile a Java program).
If you're a beginner, you might be thinking, "Why the hell do I need an alternate implementation?" Well, for one, different Python implementations play nicely with different technology stacks. (For those who don't know, a technology stack is the combination of programming languages, tools and frameworks that developers use to build a software application.)
How?
CPython makes it very easy to write C extensions for your Python code because, in the end, your code is executed by an interpreter written in C. Jython, on the other hand, makes it very easy to work with other Java programs: you can import any Java class with no additional effort and use it from within your Jython programs. (If you haven't thought about it closely, this may sound a bit absurd. We're at the point where you can mix and mash different languages and compile them all down to the same substance.)
Here's what an interactive Jython session looks like:
[Java HotSpot(TM) 64-bit Server VM (Oracle Corporation)] on 1.8.0_45
>>> from java.util import HashSet
>>> s = HashSet(5)
>>> s.add("Foo")
>>> s.add("Bar")
>>> s
[Foo, Bar]
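In practice you may want code that takes advantage of Java classes on Jython but still runs on CPython. Below is one way to sketch that, assuming Jython's convention that `sys.platform` starts with `"java"` (e.g. `"java1.8.0_45"`). The helper `make_string_set` is hypothetical, and since Jython implements Python 2.7, the sketch sticks to syntax that works on both 2.7 and 3.x.

```python
from __future__ import print_function  # keeps print() a function on Jython 2.7

import sys

# On Jython, sys.platform starts with "java" and Java packages are importable.
RUNNING_ON_JYTHON = sys.platform.startswith("java")


def make_string_set(items):
    """Return a java.util.HashSet on Jython, or a built-in Python set elsewhere."""
    if RUNNING_ON_JYTHON:
        from java.util import HashSet  # only resolvable under Jython
        result = HashSet()
    else:
        result = set()
    for item in items:
        result.add(item)  # both HashSet and set provide add()
    return result


s = make_string_set(["Foo", "Bar"])
print(type(s).__name__, sorted(s))
```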
IronPython is another popular Python implementation, written in C# and targeting the .NET stack. In particular, it runs on what you might call the .NET virtual machine, Microsoft's Common Language Runtime (CLR), which is comparable to the JVM. Just as Jython can use Java classes, IronPython can import .NET classes, including any written in C#.
IronPython 1.1 (1.1) on .NET 2.0.50727.42
Copyright (c) Microsoft Corporation. All rights reserved.
>>> for i in range(5): print i
0
1
2
3
4
>>>
"But what if I stick only to CPython?"
It's totally fine. It’s possible to survive without ever touching a non-CPython Python implementation. But there are advantages to be had from switching, most of which are dependent on your technology stack. Using a lot of JVM-based languages? Jython might be for you. All about the .NET stack? Maybe you should try IronPython.
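To sum up the VM-based implementations covered so far, here is a quick reference:
- CPython: runs on the CPython VM; pairs naturally with C.
- Jython: runs on the JVM; pairs naturally with Java.
- IronPython: runs on the .NET CLR; pairs naturally with C# and the rest of .NET.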
Just in Time Compilation (JIT): PyPy
So far, we have a Python implementation written in C, one in Java, and one in C#. The next logical step would be a Python implementation written in... Python.
For beginners, here's where things might get confusing. But before doing anything stupid, let's discuss Just-in-Time (JIT) compilation.
Recall that native machine code is much faster than bytecode. Well, what if we could compile some of our bytecode and then run it as native code? We’d have to pay some price to compile the bytecode (i.e., time), but if the end result was faster, that’d be great! This is the motivation of JIT compilation, a hybrid technique that mixes the benefits of interpreters and compilers. In basic terms, JIT wants to utilize compilation to speed up an interpreted system.
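The trade-off can be put as a back-of-the-envelope model (a simplification, not how any particular JIT actually decides). Suppose compiling a piece of bytecode costs a one-off time $C$, and each execution takes $t_i$ when interpreted and $t_c$ when run as native code, with $t_c < t_i$. After $n$ executions, compiling wins when:

$$
C + n\,t_c < n\,t_i \quad\Longleftrightarrow\quad n > \frac{C}{t_i - t_c}
$$

With purely hypothetical numbers, $C = 2\,\mathrm{ms} = 2000\,\mu\mathrm{s}$, $t_i = 10\,\mu\mathrm{s}$ and $t_c = 1\,\mu\mathrm{s}$, the break-even point is $n > 2000/9 \approx 222$, so compiling only pays off from about the 223rd execution onward. Code that runs just a handful of times is better left to the interpreter, which is why JITs focus on frequently executed bytecode.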
For example, a common approach taken by JITs:
- Identify bytecode that is executed frequently.
- Compile it down to native machine code.
- Cache the result.
- Whenever the same bytecode is about to run, grab the pre-compiled machine code instead and reap the benefits (i.e., speed boosts).
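The four steps above can be mimicked in pure Python. This is only an analogy with two loudly stated substitutions: the "bytecode" is a tiny made-up tuple of arithmetic operations, and "compiling to native code" is replaced by decoding the operations once into a chain of closures, since plain Python cannot emit machine code. The threshold `HOT_THRESHOLD` and all other names are hypothetical, and you should not expect a real speed-up from this toy; the point is the control flow.

```python
from typing import Callable, Dict, List, Tuple

Ops = Tuple[Tuple[str, int], ...]  # toy "bytecode", e.g. (("add", 3), ("mul", 2))
HOT_THRESHOLD = 3  # hypothetical: executions before a snippet counts as "hot"

call_counts: Dict[Ops, int] = {}
compiled_cache: Dict[Ops, Callable[[int], int]] = {}


def interpret(ops: Ops, x: int) -> int:
    """Slow path: decode every instruction on every run."""
    for name, arg in ops:
        if name == "add":
            x += arg
        elif name == "mul":
            x *= arg
        else:
            raise ValueError(f"unknown op: {name}")
    return x


def compile_ops(ops: Ops) -> Callable[[int], int]:
    """Stand-in for native compilation: decode once, return a ready-made function."""
    steps: List[Callable[[int], int]] = []
    for name, arg in ops:
        if name == "add":
            steps.append(lambda v, a=arg: v + a)
        elif name == "mul":
            steps.append(lambda v, a=arg: v * a)
        else:
            raise ValueError(f"unknown op: {name}")

    def run(x: int) -> int:
        for step in steps:
            x = step(x)
        return x

    return run


def execute(ops: Ops, x: int) -> int:
    # Step 4: if this snippet was already compiled, reuse the cached version.
    cached = compiled_cache.get(ops)
    if cached is not None:
        return cached(x)
    # Step 1: count how often this snippet runs.
    call_counts[ops] = call_counts.get(ops, 0) + 1
    if call_counts[ops] >= HOT_THRESHOLD:
        # Steps 2 and 3: it is hot, so "compile" it once and cache the result.
        compiled_cache[ops] = compile_ops(ops)
    return interpret(ops, x)


program: Ops = (("add", 3), ("mul", 2))
results = [execute(program, i) for i in range(5)]
print(results)  # [6, 8, 10, 12, 14]
print(program in compiled_cache)  # True
```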
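Unlike a conventional ahead-of-time compiler, which translates the whole program into machine code before it runs, a JIT compiles while the program is running. That lets it use runtime information, such as which code paths are hot and which types actually show up, to decide what to compile and how to optimize it.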
This is what the PyPy implementation is all about: bringing JIT to Python. There are, of course, other goals: PyPy aims to be cross-platform, memory-light, and to support stackless features. But JIT is really its selling point.
Benchmarks show that PyPy nails it when it comes to speed and performance. Here is a brief report from the PyPy Speed Center for reference. Believe it or not, it makes CPython look like a baby who's trying hard not to burst into tears!
(The geometric average of all benchmark times, normalized to CPython, is 0.13, i.e., PyPy is about 7.6 times faster than CPython.)
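Numbers like these depend heavily on the workload, so it is worth measuring your own code. One simple approach is to save a script like the sketch below and run it under both interpreters, e.g. `python bench.py` and `pypy3 bench.py`. The file name and the loop-heavy `workload` function are just illustrative, and `pypy3` assumes PyPy is installed under that command name.

```python
import platform
import timeit


def workload(n: int = 200_000) -> int:
    """A loop-heavy, pure-Python function: the kind of code a JIT tends to speed up."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


if __name__ == "__main__":
    # Take the best of several repeats to reduce noise from other processes.
    best = min(timeit.repeat(workload, number=5, repeat=5))
    print(f"{platform.python_implementation()}: best of 5 repeats x 5 calls = {best:.3f} s")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs are usually caused by interference rather than by the code itself. Keep in mind that a JIT needs some warm-up time, so very short runs may understate its advantage.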
Understanding PyPy is actually a difficult task in itself, and there's a lot of confusion around it. In my opinion, that's because PyPy is actually two things:
- A Python interpreter written in RPython (not Python; I lied before). RPython is a restricted subset of Python whose types can be inferred statically. You lose some flexibility, but in exchange the toolchain gets much more control over memory management and the like, which opens the door to optimizations.
- A compiler that compiles RPython code for various targets and adds in a JIT. The default platform is C, i.e., it acts as an RPython-to-C compiler, though backends for the JVM and other platforms have existed as well.
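To give a feel for what "a subset of Python with static typing" means, the two functions below are both valid Python and run fine on CPython. The comments describe how RPython's type inference (its "annotator") would treat them, as we understand its rules: every variable must have a single inferable type, so a variable that holds an int on one path and a string on another cannot be translated. Treat this as an illustration of the restriction, not as a run of the actual toolchain; the function names are hypothetical.

```python
def rpython_friendly(n):
    # Every variable keeps a single type: `total` and `i` are always ints,
    # so a type inferencer can give each one a fixed machine-level type.
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


def not_rpython(flag):
    # Fine in Python, but `result` is an int on one path and a str on the other.
    # RPython needs one type per variable, so translation would fail here.
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result


print(rpython_friendly(5))  # 10
print(not_rpython(True), not_rpython(False))  # 42 forty-two
```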
When you run a program on PyPy, PyPy's interpreter (written in RPython) compiles it into bytecode and interprets that bytecode, but the interpreter itself has to be compiled before it can run anything. It could, in principle, run on top of CPython, but that would make for a tremendously slow implementation. Instead, the RPython toolchain translates the interpreter into code for a lower-level platform (typically C) that runs natively on our machine, adding in a JIT along the way. It's magical: the toolchain automatically generates a JIT compiler from the interpreter! (Again, this is nuts: we're compiling an interpreter and, in the process, adding a separate, standalone compiler.)
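For the curious, here is roughly what the input to that toolchain looks like. This is a hedged sketch modelled on the conventions used in PyPy's interpreter-writing tutorials, not PyPy's own code: a target module exposes `target()`, which returns the `entry_point` function to translate, and `rpython.rlib.jit.JitDriver` marks the interpreter's dispatch loop so the generated JIT knows where to trace. The toy language (`-`, `>`, `j`) is invented here. The `try/except ImportError` stub is not part of RPython; it only exists so the file also runs, without any JIT, on plain CPython. Actual translation needs the PyPy source tree and a Python 2.7 host, and a realistic interpreter would likely need more hints (such as `can_enter_jit` at backward jumps).

```python
import os

try:
    from rpython.rlib.jit import JitDriver  # available only with the PyPy/RPython sources
except ImportError:
    class JitDriver(object):
        """No-op stand-in (not part of RPython) so this sketch runs on plain CPython."""

        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **livevars):
            pass


# "Green" variables identify the position in the user's program;
# "red" variables are the runtime state that changes as the program runs.
jitdriver = JitDriver(greens=["pc", "program"], reds=["acc", "total"])


def mainloop(program, acc):
    """Toy language: '-' decrements acc, '>' adds acc to total,
    'j' jumps back to the start while acc > 0."""
    pc = 0
    total = 0
    while pc < len(program):
        # Tells the JIT: this is the top of the interpreter's dispatch loop.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc, total=total)
        op = program[pc]
        if op == "-":
            acc -= 1
        elif op == ">":
            total += acc
        elif op == "j" and acc > 0:
            pc = -1  # becomes 0 after the increment below
        pc += 1
    return total


def entry_point(argv):
    start = 5
    if len(argv) > 1:
        start = int(argv[1])
    os.write(1, b"%d\n" % mainloop("->j", start))
    return 0


def target(*args):
    # Called by the RPython toolchain to find the function it should translate.
    return entry_point, None
```

On CPython you can exercise it directly, e.g. `entry_point(["prog", "5"])` prints `10` (that is, 4 + 3 + 2 + 1 + 0).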
In the end, the result is a standalone executable that interprets Python source code and exploits JIT optimizations, which is just what we wanted! I know it's a mouthful, but maybe this diagram can help:
Wrapping Up
After this long discussion, we now know what an interface is and how it is implemented. We went through different implementations of Python and understood them more closely.
Finally, we ended with PyPy and its JIT compiler.
At first, choosing an implementation might seem quite confusing, so here are some suggestions you may find helpful.
- If you're new to Python and just want to play around with it, give CPython a shot. Since you don't have to worry about speed and performance yet, it's a perfect choice. Here's a guide on how to get started on Windows and Linux.
- If you're a budding developer looking to integrate Python into your next project, check your tech stack and choose wisely. As mentioned earlier, picking an implementation that matches your tech stack can noticeably improve your application's efficiency.
- If you're writing a hybrid application targeting multiple platforms at once, it would be wise to run it on different implementations and see which one performs best on each platform.
- If you're writing a dedicated native application and speed is all that matters to you, PyPy is the one to try first, though check that the libraries you rely on (especially C extensions) work with it. It may well be the future of Python!
So that's all from my side. I hope you found this post useful and learned something new. Do mention in the comments what your next project is based on. Any queries or suggestions are welcome as well.
Happy Programming!