How Python works internally?

A deep dive into the internal workings of Python

  1. Introduction

Python is one of the most popular programming languages in the world. It was created by Guido van Rossum and first released in 1991. Its simple, easy-to-read syntax has made it very popular among developers. It is an interpreted, object-oriented, high-level, dynamically typed, general-purpose programming language. Python is also platform independent: the same source code can generally run on any operating system that has a Python interpreter, in the spirit of "write once, run anywhere". It is used in server-side web development, software development, machine learning and artificial intelligence.

Now, before we move further, let us understand interpreters and compilers in simple words.

  2. Interpreter & Compiler

The code we write, which is readable by us, is called source code. To turn source code into something the machine can run, we use either a compiler or an interpreter.

Interpreter - An interpreter is a computer program that reads a high-level program and executes it directly, statement by statement, without first producing a separate executable file. Its input can be source code, pre-compiled code, or scripts.

Compiler - A compiler is a computer program that translates code written in a high-level programming language into machine code before the program is run. In other words, it translates human-readable code into a language the computer understands (binary 0s and 1s).

| Interpreter | Compiler |
| --- | --- |
| It translates source code line by line. | It translates the entire code at once. |
| It reads the code and executes it directly. | It generates an executable file before execution. |
| Debugging is easy. | Debugging is more challenging. |
| It generally performs slower. | It generally performs faster. |
| Example - Python, JavaScript. | Example - C, C++ and Java. |

So that is why Python is generally referred to as an interpreted language: Python code is run by an interpreter rather than compiled to a standalone executable. More precisely, the source code is first compiled to byte code, and the byte code is then executed by the PVM (Python Virtual Machine), which produces the output we see. So, what is the PVM? Let us understand it in simple words in the next section.

  3. PVM (Python Virtual Machine)

The Python code we write is the source code (.py file). A CPU only understands machine code, so the source code cannot run on it directly. The Python interpreter has a built-in compiler, but it works in a slightly different manner from a traditional one: instead of producing machine code, it converts the source code into byte code. Python byte code is a series of instructions that is then executed by the PVM (Python Virtual Machine). This byte code is an intermediate representation of the code; it is platform independent and is not executed directly by the CPU. For imported modules, it is cached on disk in a .pyc file (compiled Python file); for a script that is run directly, it is normally kept only in memory.
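To make the idea of byte code a little more concrete, here is a minimal sketch that uses the built-in compile() function to turn a small piece of source code into a code object. The source string and the "<example>" file name are made up for illustration.

```python
# A tiny program held in a string (made up for this example).
source = "x = 2 + 3\nprint(x)\n"

# compile() runs Python's built-in compiler and returns a code object.
# "<example>" is only a placeholder file name used in error messages.
code = compile(source, "<example>", "exec")

print(type(code).__name__)          # the result is a code object
print(type(code.co_code).__name__)  # the raw byte code is stored as bytes
print(code.co_names)                # names the byte code refers to
```

Nothing has been executed yet at this point; the code object only holds the instructions that the PVM will run later.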

The PVM (Python Virtual Machine) is the part of the Python interpreter that provides the runtime environment for executing Python byte code. It is the main runtime engine of Python. At its core is a loop that fetches the next byte code instruction, performs the corresponding operation, and moves on to the next one. The PVM does not translate byte code into a separate machine code file; instead, the PVM itself is a program that is already in machine code (in CPython, compiled from C), and the CPU runs it as it carries out each instruction and produces the final output. If an error occurs during execution and is not handled, execution stops and the error is displayed with a message.
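The loop described above can be sketched as a toy stack machine. This is not how CPython is actually written; the opcodes PUSH, ADD and PRINT and the run function are invented here purely to show the fetch-and-execute idea.

```python
def run(instructions):
    """Execute a list of (opcode, argument) pairs on a simple stack machine."""
    stack = []
    output = []
    pc = 0  # index of the next instruction to execute
    while pc < len(instructions):
        op, arg = instructions[pc]  # fetch
        pc += 1
        if op == "PUSH":            # execute
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "PRINT":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# Hypothetical "byte code" for: print(2 + 3)
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)]
result = run(program)
print(result)  # [5]
```

The real PVM works with many more instructions and with Python objects, but the overall shape, a loop that reads one instruction and acts on it, is the same idea.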

The PVM is responsible not only for executing Python byte code but also for memory management and for Python's platform independence. In Python, everything is an object, so memory must be allocated for objects created while the program runs and released for objects that are no longer in use. Platform independence means that the same Python source code, and the byte code produced from it, can run on different platforms and operating systems without modification, as long as a compatible Python interpreter is available there.
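As a small illustration of automatic memory management, the sketch below uses a weak reference to observe when an object is released. The Node class is made up for the example, and the immediate cleanup shown here is the behaviour of CPython (which uses reference counting); other Python implementations may release the object later.

```python
import weakref


class Node:
    """A trivial class; instances of user-defined classes support weak references."""


node = Node()
ref = weakref.ref(node)        # a weak reference does not keep the object alive
alive_before = ref() is node   # the object still exists

del node                       # drop the only strong reference
alive_after = ref() is not None

print(alive_before, alive_after)
```

We never freed the memory ourselves; the interpreter noticed that nothing referred to the object any more and reclaimed it.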

  4. Internal Working of Python

Stage 1: We create a Python source code file (.py file) that contains our program and save it. When we run it, the Python compiler starts and reads the file.

Stage 2: The compiler converts the Python source code into byte code, a low-level, platform-independent series of instructions. During compilation, the compiler checks the program for syntax errors and works out its hierarchical structure, capturing the relationships between the different elements of the code. If an error is found, compilation stops. If not, byte code is produced, and for imported modules it is saved in a .pyc file (compiled Python file).
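The point that syntax errors are caught during compilation, before anything runs, can be seen with a short sketch. The source string below is made up and deliberately has an unclosed parenthesis on its second line.

```python
# The first line is valid, the second line is missing a closing parenthesis.
bad_source = "print('this line is fine')\nprint('missing parenthesis'\n"

try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except SyntaxError as exc:
    compiled = False
    print("SyntaxError caught at compile time:", exc.msg)
```

Notice that even the valid first line never prints anything, because the whole file failed to compile and so no byte code was ever executed.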

Stage 3: The byte code now goes to the PVM (Python Virtual Machine), the main runtime engine of Python. It is an interpreter that reads and executes the byte code instruction by instruction, carrying out each one by running machine code that is already part of the interpreter.

Stage 4: In the last stage, the CPU executes the machine code, and the final output is shown on the screen.
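The stages above can be walked through by hand in a few lines. This is only a sketch: normally Python does all of this for us when we run a file, and the source string and "<example>" file name here are made up.

```python
import contextlib
import io

# Stage 1: the source code (normally read from a .py file).
source = "print('hello world')\n"

# Stage 2: compile the source code into byte code (a code object).
code = compile(source, "<example>", "exec")

# Stages 3 and 4: hand the byte code to the interpreter to execute.
# The output is captured in a buffer so that we can inspect it afterwards.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, {})

output = buffer.getvalue()
print(output, end="")
```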


  5. Notes

  • Let us take an example. Say we create a file hello.py that prints hello world in the terminal. It also defines a function named one, which takes a single parameter n and prints the value that n holds. Finally, it calls one with the string "good morning", which prints good morning in the terminal.

Now we create another file, say one.py, which imports the function one from the module hello and calls it with the argument "good evening", so it prints good evening in the terminal. Because importing hello runs all of the top-level code in hello.py, running one.py also prints hello world and good morning before good evening.
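For readers who want to try this, here is a sketch that recreates the two files from the description above in a temporary folder and runs one.py. The exact file contents are an assumption reconstructed from the text, not a copy of the original files.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed contents of hello.py, reconstructed from the description.
HELLO_PY = '''\
print("hello world")

def one(n):
    print(n)

one("good morning")
'''

# Assumed contents of one.py, reconstructed from the description.
ONE_PY = '''\
from hello import one

one("good evening")
'''

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "hello.py").write_text(HELLO_PY)
    (folder / "one.py").write_text(ONE_PY)

    # Run "python one.py" inside the temporary folder and capture its output.
    result = subprocess.run(
        [sys.executable, "one.py"],
        cwd=folder,
        capture_output=True,
        text=True,
        check=True,
    )
    output_lines = result.stdout.splitlines()

print(output_lines)  # ['hello world', 'good morning', 'good evening']
```

The first two lines of output come from importing hello, and only the last line comes from the call inside one.py.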

After running one.py, a __pycache__ folder is created automatically, and it contains a file named hello.cpython-312.pyc. The __pycache__ folder is created by Python for its own use. It is created automatically, next to the source file, whenever a module is imported, and it stores the compiled byte code of that module so that Python can skip the compilation step and speed up later imports.
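We can ask Python where it would cache the byte code for a given source file. The sketch below uses importlib.util.cache_from_source from the standard library; "hello.py" is just the example file name from above, and the printed values depend on the Python version and implementation in use.

```python
import importlib.util
import sys

# The tag that identifies the implementation and version,
# for example "cpython-312" on CPython 3.12.
cache_tag = sys.implementation.cache_tag

# The path where the byte code for hello.py would be cached.
# The file does not need to exist for this call to work.
cache_path = importlib.util.cache_from_source("hello.py")

print(cache_tag)
print(cache_path)
```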

Let us break down the file name hello.cpython-312.pyc. hello is the name of the module (hello.py) from which the file was created. cpython means the byte code was generated by the CPython interpreter, the standard (reference) implementation of the Python programming language, which is written in C. 312 is the version of Python we are using, which is 3.12, and .pyc, which stands for "Python compiled", is the standard extension for Python byte code files. So this .pyc file contains the compiled byte code of the hello module, created by CPython 3.12. The file exists to make importing the module faster. If we change the source code of the imported module, here hello.py, the interpreter detects the change and automatically recompiles the byte code and updates the .pyc file. By default it does this by comparing the source file's last-modified time and size with the values recorded in the .pyc file, not by diffing the contents.
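To see what Python records in a .pyc file for this check, the sketch below compiles a small file with py_compile and reads the 16-byte header of the result. The header layout used here (magic number, flags, source modification time, source size) is the one CPython 3.7 and later use for timestamp-based .pyc files; the file contents are made up for the example.

```python
import importlib.util
import py_compile
import struct
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp) / "hello.py"
    source_path.write_text('print("hello world")\n')

    # Compile the file; py_compile.compile() returns the path of the .pyc.
    # The timestamp mode is requested explicitly so the header layout is known.
    pyc_path = py_compile.compile(
        str(source_path),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    header = Path(pyc_path).read_bytes()[:16]
    source_size = source_path.stat().st_size

# Header: 4-byte magic number, then three little-endian 32-bit integers.
magic = header[:4]
flags, mtime, size = struct.unpack("<III", header[4:16])

print(magic == importlib.util.MAGIC_NUMBER)  # identifies the byte code version
print(flags)                                 # 0 means timestamp-based checking
print(size == source_size)                   # recorded size of hello.py
```

When hello.py is edited, its modification time or size no longer matches these recorded values, which is how the interpreter knows the cached byte code is stale.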

  • Byte code is not machine code, in any programming language. Machine code is a direct instruction to the hardware, whereas byte code cannot instruct the machine directly and must be executed by a virtual machine. More specifically, Python byte code is an intermediate representation specific to Python.
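To look at byte code ourselves, we can use the dis module from the standard library. The add function below is made up for the example, and the exact instruction names differ between Python versions, so no fixed output is shown.

```python
import dis


def add(a, b):
    return a + b


# Collect the names of the byte code instructions for add().
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)

# dis.dis() prints a more detailed, human-friendly listing.
dis.dis(add)
```

These instruction names are understood only by the Python virtual machine, not by the CPU, which is exactly the difference between byte code and machine code.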

So, that's all for now. This concludes my first blog post on how Python works internally. Feel free to leave any feedback or corrections in the comments section; they are always welcome. Thank you for reading.