Python is already fast

Interview with Oz Tiram

Asynchronous programming & coroutines in Python

Developer: In your workshop at the Python Summit, you will cover Python's iterables, iterators and generators. Can you briefly explain what is special about them?

Oz Tiram: Well, actually nothing! Almost all programming languages have similar constructs, and where they don't, they are easy to implement. The cool thing about Python, however, is how much the language brings with it out of the box. To be productive with it, you don't have to implement things like iterables, iterators and generators yourself.

Practically any Python object that contains other objects is iterable. Of course, this is also true in other languages, so no surprise there. Computer programs exist to do repetitive tasks for us, so every language needs a mechanism for repeating an action until it is no longer needed. One way of doing this is to repeat an action on each element of a collection. In human language, that would be:

"For each element from a set of elements: do something with the element."

In Python, this statement is expressed like this:

>>> for item in BagOfItems:
...     do_something(item)

In this case, BagOfItems is an iterable. In my workshop at the Python Summit, for example, we look at what such a BagOfItems actually is and how Python knows when to stop once BagOfItems has no more elements.
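How a for loop knows when to stop can be sketched by spelling the loop out by hand. This is a simplified model of the iterator protocol, not CPython's actual implementation, and `manual_for` and `do_something` are hypothetical names chosen for illustration: the loop asks the iterable for an iterator with `iter()`, calls `next()` repeatedly, and stops when the iterator raises `StopIteration`.

```python
def do_something(item):
    # Hypothetical stand-in for whatever work the loop body does
    print("processing", item)

def manual_for(bag_of_items, action):
    """Roughly what `for item in bag_of_items: action(item)` does."""
    iterator = iter(bag_of_items)      # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # fetch the next element
        except StopIteration:          # the iterator signals: no elements left
            break
        action(item)

BagOfItems = ["apple", "ball", "cup"]
manual_for(BagOfItems, do_something)
```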

Iterators are a special kind of BagOfItems from which we can take the items out one at a time. Usually, once we start a task we want to finish it, but not necessarily in one go. Instead of BagOfItems, take BagOfBalls, a sack full of balls. If a person wants to find all the red balls in the sack, they can take one ball out at a time, check whether it is red and, if so, throw it into the basket for red balls. Balls of other colors go into other baskets accordingly. The person can simply take a coffee break in between and then continue where they left off. The sack is the iterator, because we can take one element at a time out of it. An iterable, on the other hand, only lets us check all the balls in one go, without a break. So when a task consists of many repetitions, in practice we sometimes need a break.
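The coffee-break idea can be illustrated with Python's built-in `iter()` and `next()`. In this sketch, the ball colors and the `sort_balls` helper are made up for illustration. The list is the iterable, and every new for loop over it would start again from the first ball. The iterator returned by `iter()` keeps its position between calls, so sorting can stop and resume.

```python
# Hypothetical sack of balls. A list is an iterable: every for loop over it
# starts again from the first ball.
bag_of_balls = ["red", "blue", "red", "green", "red"]

# iter() turns the iterable into an iterator that remembers its position
sack = iter(bag_of_balls)
red_basket = []
other_basket = []

def sort_balls(balls, count):
    """Take `count` balls out of the iterator and sort them into the baskets."""
    for _ in range(count):
        ball = next(balls)
        if ball == "red":
            red_basket.append(ball)
        else:
            other_basket.append(ball)

sort_balls(sack, 2)   # take the first two balls out...
# ...coffee break: the iterator keeps its place in the meantime...
sort_balls(sack, 3)   # ...and continue with the remaining three
print(red_basket)     # ['red', 'red', 'red']
print(other_basket)   # ['blue', 'green']
```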

A generator is, simply put, a special kind of function that helps us turn iterable elements into an iterator. A generator can also produce values without relying on data stored in advance. For this reason, generators can process large amounts of data without having to hold it all in memory. Because generator functions are so useful, they have their own keyword in Python: yield. Generators are functions that, instead of returning a value with a return statement, contain one or more yield statements in their body:

>>> def odd_counter(max):
...     x = 0
...     while x <= max:
...         if not x % 2 == 0:
...             yield x
...         x += 1
...
>>> odd_counter(10)
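The memory claim made for generators can be illustrated with `sys.getsizeof`, which reports the size of an object itself, not of the objects it refers to. The `squares` generator is a made-up example, and exact byte counts vary between Python versions and platforms. The point is that the list grows with the number of elements, while the generator only stores its paused state.

```python
import sys

def squares(n):
    """Generator: yields the squares 0, 1, 4, ... lazily, one at a time."""
    for i in range(n):
        yield i * i

n = 1_000_000
as_list = [i * i for i in range(n)]   # materializes all values in memory
as_gen = squares(n)                   # stores only the paused function state

print(sys.getsizeof(as_list))  # several megabytes for the list object alone
print(sys.getsizeof(as_gen))   # small, independent of n
print(sum(as_gen) == sum(as_list))  # True: same values, produced on demand
```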

When a generator function is called, a generator instance is created, but the generator's code does not start running immediately. To actually start it, we request the next element from the generator with next():

>>> c = odd_counter(10)
>>> next(c)
1
>>> next(c)
3
...
>>> next(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

We keep calling next() until the generator's code has run to completion. At that point, the generator raises a StopIteration exception.
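In everyday code, next() is rarely called by hand. The sketch below reuses the odd_counter generator from the interview (with the parity test written as `x % 2 != 0`). A for loop or `list()` calls `next()` internally and treats `StopIteration` as the normal end of iteration. A generator that has run to completion stays exhausted.

```python
def odd_counter(max):
    # The generator from the interview: yields the odd numbers up to max
    x = 0
    while x <= max:
        if x % 2 != 0:
            yield x
        x += 1

# A for loop calls next() for us and stops quietly on StopIteration
for n in odd_counter(10):
    print(n)                   # 1, 3, 5, 7, 9 (one per line)

# list() consumes a generator the same way
print(list(odd_counter(10)))   # [1, 3, 5, 7, 9]

# An exhausted generator stays exhausted
c = odd_counter(3)
print(list(c), list(c))        # [1, 3] []
```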

Developer: Can you give an example of how asynchronous programming can be implemented with the language features described?

Oz Tiram: Since generators don't start immediately, we can create quite a few of them and write a function that resumes each generator when needed: a kind of scheduler. If a coroutine waits on operating system resources, for example I/O, it blocks the other coroutines. However, Python has mechanisms for switching back and forth between coroutines whenever a blocking instruction needs to be executed. By the way, this is an old feature that has existed since Python 2.5, but it only really caught on once the keywords async and await were introduced in later versions (Python 3.5). To find out more, I recommend reading the corresponding PEP 342, especially its implementation of such a scheduler.
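The scheduler idea can be sketched in a few lines. This is a deliberately minimal round-robin loop, not the trampoline from PEP 342 and not how asyncio works internally. A real event loop would also wait for I/O readiness. The names `task`, `run` and `log` are hypothetical. Each `yield` hands control back to the scheduler, which resumes the next task in the queue.

```python
from collections import deque

log = []   # records the order in which the tasks run

def task(name, steps):
    """A generator-based task: each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}: step {i}")
        yield   # pause here; a real task would yield while waiting for I/O

def run(tasks):
    """Minimal round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # run the task until its next yield
        except StopIteration:
            continue               # task finished: drop it from the queue
        ready.append(current)      # not finished: queue it again

run([task("A", 2), task("B", 3)])
print(log)   # ['A: step 0', 'B: step 0', 'A: step 1', 'B: step 1', 'B: step 2']
```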

Developer: How do the coroutines work in Python?

Oz Tiram: Coroutines in Python are a bit idiosyncratic. Like generators, they contain the keyword yield in their body. In a coroutine, however, yield appears directly to the right of the assignment operator. Coroutines are the next step in generator design. Every time a generator's execution reaches a yield expression, the value to the right of yield is sent to the caller. At the same time, execution of the generator is suspended until the next call to next(). In the meantime, the caller can take care of other tasks.

A coroutine can not only produce values but also receive values from the caller. The .send(DATA) method is used to supply a coroutine with values. The caller can terminate a coroutine with .close() or even raise exceptions inside it with .throw().
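The three methods can be seen together in a small running-average coroutine. The name `averager` and the choice of `ValueError` as a "reset" signal are assumptions made for this sketch. `send()` delivers a value and returns the next yielded one. `throw()` raises an exception at the paused `yield`. `close()` raises `GeneratorExit` there, which ends the coroutine.

```python
def averager():
    """Coroutine: receives numbers via send() and yields the running average."""
    total = 0.0
    count = 0
    average = None
    while True:
        try:
            value = yield average     # hand out the average, wait for a value
        except ValueError:            # injected by the caller via .throw()
            total, count, average = 0.0, 0, None   # reset on request
            continue
        total += value
        count += 1
        average = total / count

avg = averager()
next(avg)                 # prime: run up to the first yield
print(avg.send(10))       # 10.0
print(avg.send(20))       # 15.0
avg.throw(ValueError)     # reset; throw() returns the next yielded value (None)
print(avg.send(4))        # 4.0
avg.close()               # GeneratorExit is raised inside; the coroutine ends
```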

Here is a simple example of how to define and use a coroutine:

>>> def hello_coro(subject):
...     print("Searching for: %s" % subject)
...     while True:
...         message = (yield)
...         if subject in message:
...             print("Found %s in %s" % (subject, message))
...
>>> # calling the coroutine does not run it
>>> coro = hello_coro("Python Summit")
>>> # advance the coroutine to the next yield statement (priming)
>>> next(coro)
Searching for: Python Summit
>>> coro.send("This will do nothing")
>>> coro.send("The Python Summit 2019 will be awesome")
Found Python Summit in The Python Summit 2019 will be awesome
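Forgetting the priming step makes the first send() fail with a TypeError, because a just-started generator cannot receive a value. A common pattern is to wrap coroutine creation in a decorator that primes automatically. The decorator name `coroutine` is a hypothetical choice here, not part of the standard library. Coroutines defined with async def do not need this.

```python
import functools

def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    @functools.wraps(func)
    def primed(*args, **kwargs):
        coro = func(*args, **kwargs)
        next(coro)          # the priming step, done automatically
        return coro
    return primed

@coroutine
def hello_coro(subject):
    print("Searching for: %s" % subject)
    while True:
        message = (yield)
        if subject in message:
            print("Found %s in %s" % (subject, message))

coro = hello_coro("Python Summit")   # prints "Searching for: Python Summit"
coro.send("The Python Summit 2019 will be awesome")
```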

Python and machine learning

Developer: Python is very popular in the machine learning environment. Why is that so? Is there a particularly large amount of semantic help in the language here? Is it because of the available libraries? Or are there other reasons?

Oz Tiram: One of the reasons is certainly that Python is very popular in general. Python currently ranks at the top on Stack Overflow, and in rankings such as the RedMonk Popularity Index or TIOBE, the language is consistently in the top 3, outranked at most by Java and JavaScript.

In general, Python is not necessarily the fastest language. I am referring to the reference implementation, CPython, which is the most widely used version. That's because Python is an interpreted language, not a compiled one. But Python takes the approach of being "fast enough" for most use cases and puts developer speed above execution speed. If Python code really needs to run fast, the critical parts can be optimized in C. CPython itself is written in C, and it is not difficult to extend Python with C-based modules.
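The interview does not show how C code is hooked into Python. A proper extension module is written in C against the Python C API. A lighter-weight route, sketched here, is the standard-library `ctypes` module, which calls functions in an existing C library directly from Python. The `load_libc` helper is hypothetical and assumes either a POSIX system or Windows.

```python
import ctypes
import ctypes.util
import sys

def load_libc():
    """Load the platform's C standard library (assumes POSIX or Windows)."""
    if sys.platform == "win32":
        return ctypes.cdll.msvcrt
    # find_library may return None; CDLL(None) then falls back to the symbols
    # already loaded into the running Python process, which include libc
    return ctypes.CDLL(ctypes.util.find_library("c"))

libc = load_libc()

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"Python Summit"))   # 13
```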

In machine learning and data science, a lot of code is written as prototypes to quickly try something out. In most cases, it does not matter whether that code performs well. Performance only comes into play when solving large partial differential equations or performing demanding CPU-bound algebraic calculations.

That is why most of the heavy lifting in machine learning is not written directly in Python, but in extensions written in C. NumPy, SciPy, pandas and TensorFlow implement their performance-critical parts in C or C++ and can be called directly from Python. So you get the convenience of Python at the speed of C.

What made Python so popular among scientists - and therefore also in the field of machine learning - is the fact that the language is easy to read and write. For several years now, Python has also been the most widely used language in beginner programming courses in the United States. It is therefore no surprise that Python's popularity is increasing.

Python itself has no dedicated language constructs for machine learning. But the popularity of the language and its ease of use make it interesting for AI and machine learning as well. One anecdote about data science with Python is that the popularity of the NumPy library, which is used for matrix calculations and various algebraic operations, led the Python language developers to introduce a new operator in Python 3.5: the matrix multiplication operator @. None of Python's built-in types implement this operator, but NumPy makes heavy use of it.
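Since no built-in type implements @, any class can give it a meaning by defining the special method `__matmul__`. NumPy does this for its arrays. The tiny `Matrix` class below is a hypothetical illustration using plain lists, not a substitute for NumPy.

```python
class Matrix:
    """Tiny illustrative matrix type (hypothetical; NumPy provides the real thing)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __matmul__(self, other):
        # Called for `self @ other`: standard row-by-column matrix product
        cols = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in cols]
             for row in self.rows]
        )

    def __repr__(self):
        return f"Matrix({self.rows})"

a = Matrix([[1, 2], [3, 4]])
b = Matrix([[5, 6], [7, 8]])
print(a @ b)   # Matrix([[19, 22], [43, 50]])

# Built-in types such as list do not implement @
try:
    [1, 2] @ [3, 4]
except TypeError as exc:
    print("lists do not support @:", exc)
```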

Python - on the way to typing

Developer: Where is Python language development now? What innovations are being discussed?

Oz Tiram: Python is certainly evolving, but the spirit that the language should remain easy to learn is still present. With Python 3 we have seen a trend toward adding more and more support for typing to this dynamic language. Python 3 introduced optional function annotations like these:

>>> def foo(a: int, b: float = 5.0) -> int:
...     pass
...
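That these annotations are optional metadata, and not runtime checks, can be seen directly. Python stores them in the function's `__annotations__` dictionary, but the interpreter does not enforce them. In this sketch, `foo` accepts a string and returns `None`, even though it is annotated with `int`.

```python
def foo(a: int, b: float = 5.0) -> int:
    pass

# Annotations are stored as metadata on the function...
print(foo.__annotations__)
# {'a': <class 'int'>, 'b': <class 'float'>, 'return': <class 'int'>}

# ...but the interpreter does not enforce them at runtime
print(foo("not an int"))   # None, no error despite the int annotations
```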

This trend continued in Python 3.6 with annotations for variables:

from typing import Dict, List

primes: List[int] = []

captain: str  # Note: no initial value!

class Starship:
    stats: Dict[str, int] = {}

Python 3.8 will then bring typed dictionaries:

from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int
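How such a typed dictionary is used can be sketched briefly, assuming Python 3.8 or later. The movie data is illustrative. A `Movie` value is written as an ordinary dict literal and remains a plain `dict` at runtime. The class only tells static type checkers which keys exist and which value types they have.

```python
from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int

# Written as an ordinary dict literal; the annotation is for type checkers
blade_runner: Movie = {"name": "Blade Runner", "year": 1982}
print(type(blade_runner))   # <class 'dict'>: no runtime wrapper

# Calling the class also just builds a plain dict
print(Movie(name="Blade Runner", year=1982) == blade_runner)   # True

# A type checker such as MyPy would flag the wrong value type here,
# but Python itself runs this line without complaint
bad: Movie = {"name": "Blade Runner", "year": "1982"}
```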

All of this can make Python code look as heavyweight as Java or any other statically typed language. But that is only how it looks, because all typing in Python is optional. You can leave everything as it was for small one-off scripts, and add types to larger projects where typing is needed. I expect Python to become even more popular and the image of Python as "just a scripting language" to continue to fade. An interesting project that builds on these innovations is the static analysis tool MyPy. Here is a quote from its documentation:

"MyPy is a static type checker for Python 3 and Python 2.7. If your code contains type annotations, MyPy can check them and detect common bugs."
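The following sketch shows the kind of common bug a static checker catches. The function `total_length` is a made-up example. Passing a single string where a list of strings is expected runs without error at runtime, because a string is iterable too, so the result is silently wrong. MyPy reports the call as an incompatible argument type before the code ever runs.

```python
from typing import List

def total_length(words: List[str]) -> int:
    """Return the combined length of all words."""
    return sum(len(word) for word in words)

print(total_length(["mypy", "checks", "types"]))   # 15

# Bug: a single string instead of a list of strings. At runtime this silently
# returns 13 (one per character), because a str is iterable too.
# A static checker such as MyPy reports this call as an incompatible argument type.
print(total_length("Python Summit"))
```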

I also believe that a growing ecosystem of compilers that translate Python into native code will emerge. The best known is Cython, which lets you write Python code and then compile it to machine code via a translation to C. A similar project that was recently started from scratch is Nuitka, and I think others will follow.

Developer: Thank you for this interview!

Oz Tiram studied applied geology in Tübingen. Oz has been programming with Python in various roles for 11 years: as a data and system engineer, backend developer, DevOps engineer and most recently as an IT architect at Noris Network AG. He loves Python as a language for data analysis and visualization, automation and web development.