PyCon CZ 2019 was a community event where almost 500 people gathered for 3 days full of Python and related topics. While things are still fresh in my mind, here are my top takeaways from the conference and my impressions of the current state of the Python community in general.
- Data science skills are in demand
- How is high-performance data manipulation possible in Python?
- Statistical forecasting vs deep learning forecasting
- Python is excellent for data and image processing tasks
- Start machine learning with cloud services
- Deep learning revolution gains speed
- Python community is amazing
Data science skills are in demand
I was really looking forward to the workshops on NumPy, Pandas and Matplotlib. Even though there were two workshops on the topic (2 x 30 spots), they filled up within 30 minutes of registration opening!
During the conference, I spoke with many Pythonistas whose main job revolves around data manipulation and analysis. The coding style with Pandas is a little different from standard Python, and people are looking for guidance on it.
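To illustrate that difference in style, here is a minimal sketch on a made-up table (the `price` and `quantity` columns are hypothetical): plain-Python habits lead to looping over rows, while idiomatic Pandas code expresses the computation on whole columns at once.

```python
import pandas as pd

# Hypothetical order data, used only for illustration.
orders = pd.DataFrame({
    "price": [9.5, 20.0, 3.25, 12.0],
    "quantity": [2, 1, 4, 3],
})

# Plain-Python habit: iterate over rows one at a time (slow on large frames).
totals_loop = [row.price * row.quantity for row in orders.itertuples()]

# Idiomatic Pandas: one vectorized operation on whole columns.
orders["total"] = orders["price"] * orders["quantity"]

# Filtering with a boolean mask instead of an if-statement inside a loop.
large_orders = orders[orders["total"] > 20]

print(large_orders)
```

Both approaches compute the same totals; the column-wise version is the style Pandas is optimized for and usually the one people are looking for guidance on.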
How is high-performance data manipulation possible in Python?
Python is a very efficient language, at least in terms of the programmer’s time. Jan Škoda gave an amazing presentation on performance tuning in Python. Using a simple code example, he shared his insights on profiling and common performance improvements. Here is a quick summary:
- Measure before you decide
- No need to leave Python if you need speed
- Cythonize long loops and data structures
- Establish good practices if you want to keep good performance
- Don’t over-optimize; keep your logic simple
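As a minimal illustration of "measure before you decide" (not Jan's actual example), the standard-library `timeit` module can compare candidate implementations before you commit to one. The two functions below are hypothetical stand-ins for "the code you want to speed up".

```python
import timeit


def squares_loop(n):
    """Build a list of squares with an explicit loop and append()."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# Make sure both candidates compute the same thing before timing them.
assert squares_loop(1_000) == squares_comprehension(1_000)

# repeat() returns one total per run; the minimum is the least noisy figure.
for func in (squares_loop, squares_comprehension):
    best = min(timeit.repeat(lambda: func(10_000), number=20, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 20 calls")
```

For whole programs, `python -m cProfile -s cumulative your_script.py` (the script name is a placeholder) shows where time is actually spent, which is the evidence to base optimization decisions on.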
Contrary to what we might expect, I/O and CPU are not necessarily the most common performance bottlenecks. In Jan’s experience, the culprit is often inefficient memory management, especially with data structures like the dynamic arrays we use every day in all programming languages.
NumPy and Pandas use optimized data structures that handle memory allocation more efficiently and therefore deliver much higher performance than equivalent pure-Python code.
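Python's own `list` is a dynamic array. A minimal sketch, assuming CPython (the sizes reported by `sys.getsizeof` are an implementation detail): `append()` over-allocates spare capacity, so the underlying buffer is resized in steps rather than on every append, and each resize means allocating and copying.

```python
import sys

items = []
sizes = []
for i in range(64):
    items.append(i)
    # Size of the list object's allocation, including unused spare capacity.
    sizes.append(sys.getsizeof(items))

# Print only the points where the underlying buffer was actually resized.
previous = None
for length, size in enumerate(sizes, start=1):
    if size != previous:
        print(f"len={length:3d} -> {size} bytes")
        previous = size

print("distinct allocations:", len(set(sizes)), "for", len(items), "appends")
```

When the final length is known in advance, allocating once (for example `[0] * n`, or `numpy.empty(n)` for numeric data) avoids the intermediate resizes.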
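A minimal sketch of why this matters: a NumPy `float64` array keeps its values in one contiguous buffer of 8 bytes per element, whereas a Python list holds pointers to separately allocated float objects. Vectorized NumPy operations then loop over that buffer in compiled code instead of in the interpreter. The array size is an arbitrary choice.

```python
import sys

import numpy as np

n = 100_000

# Pure Python: a list of pointers to individually boxed float objects.
py_values = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(py_values) + sum(sys.getsizeof(v) for v in py_values)

# NumPy: one contiguous buffer of raw float64 values.
np_values = np.arange(n, dtype=np.float64)

print(f"list  : ~{list_bytes / 1e6:.1f} MB")
print(f"array : {np_values.nbytes / 1e6:.1f} MB")

# The same computation, looped in Python vs. vectorized in NumPy.
py_result = sum(v * 2.0 for v in py_values)
np_result = float((np_values * 2.0).sum())
print(py_result == np_result)
```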
Statistical forecasting vs deep learning forecasting
Petr Šimeček gave a great high-level introduction to time series forecasting. With a Ph.D. in probability and statistics and a current role as a machine learning engineer, Petr is in a great position to compare classical statistical approaches (exponential smoothing, ARIMA and the like) with more recent machine learning methods (e.g., neural networks).
If you are interested in the subject, I really recommend checking out Petr’s presentation. In a nutshell:
- Statistical forecasting methods are great if you have a few time series with a sufficiently long history
- Deep learning forecasting methods are great if you have many short time series
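To make the classical side concrete, here is a minimal pure-Python sketch of simple exponential smoothing, the most basic member of the exponential smoothing family mentioned above. The smoothing factor `alpha` and the sample series are illustrative assumptions; in practice you would more likely reach for a library implementation such as statsmodels' `SimpleExpSmoothing`.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels and a flat one-step-ahead forecast.

    level[t] = alpha * y[t] + (1 - alpha) * level[t - 1],
    initialised with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")

    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    # Simple exponential smoothing forecasts are flat: every future
    # step equals the last smoothed level.
    return levels, levels[-1]


# Hypothetical monthly sales figures.
sales = [10.0, 12.0, 11.0, 13.0]
levels, forecast = simple_exponential_smoothing(sales, alpha=0.5)
print(levels)    # [10.0, 11.0, 11.0, 12.0]
print(forecast)  # 12.0
```

A larger `alpha` makes the level react faster to recent observations; a smaller one smooths out more noise.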
Python is excellent for data and image processing tasks
Karla Fejfarová works as a biostatistician at the Czech Centre of Phenogenomics, which in practice means helping other scientists with computer-related tasks.
Karla described how genetically modified mice help us study the mechanisms of diseases and design effective treatments. Once the modified mice have been bred, researchers must examine all measurable traits, including metabolism, bone development, and behaviour.
The data can come not only as numbers or categories but also in the form of images. Python’s data and image processing capabilities are excellent for this kind of scientific analysis work, especially as it’s relatively easy to get started.
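As a minimal sketch of this kind of image analysis (not Karla's actual pipeline), a grayscale image is just a NumPy array, so a simple threshold yields a mask whose pixel count measures the area of a bright region. The synthetic image and the threshold of 128 are illustrative assumptions; real images would be loaded with a library such as Pillow or scikit-image.

```python
import numpy as np

# Synthetic 100x100 grayscale "image": dark background with a bright rectangle.
image = np.zeros((100, 100), dtype=np.uint8)
image[30:50, 40:70] = 200  # a 20 x 30 pixel bright region

# Threshold: every pixel brighter than 128 belongs to the object.
mask = image > 128

area = int(mask.sum())  # number of object pixels
rows, cols = np.nonzero(mask)
bbox = (int(rows.min()), int(rows.max()), int(cols.min()), int(cols.max()))

print(area)  # 600
print(bbox)  # (30, 49, 40, 69)
```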
Start machine learning with cloud services
Machine learning is rapidly gaining popularity and is being used for a growing variety of applications. In his presentation, Piotr Grzesik showed us how much we can actually do with the cloud machine learning APIs from Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Here are the pros and cons of using those APIs according to Piotr.
Pros:
- Does not require specific ML knowledge (can be used by "regular" developers)
- Easy to consume via API/SDK
- Trained on large datasets
- Models are constantly re-trained/improved
- "Infinitely" scalable
Cons:
- Limited functionality, suitable only for common tasks
- Can be much more expensive than a custom solution in the long run
- No way to customize/tweak models (with small exceptions)
- Potential vendor lock-in
And here is a summary of which services you should consider for common tasks:
- analyze images and videos, use Rekognition
- generate audio from text, use Polly
- analyze text, use Comprehend
- recognize speech, use Transcribe
- translate text, use Translate
- extract text from documents, use Textract
- build a chatbot, use Lex
While those examples are AWS services, other providers offer similar services that may, in fact, be cheaper or perform better. The point is that many common tasks already have cloud machine learning APIs ready for you to use today!
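To show what "easy to consume via API/SDK" looks like, here is a minimal sketch around Amazon Rekognition's `detect_labels` operation as exposed by the boto3 SDK. The helper `top_labels` and the `FakeRekognitionClient` stub are hypothetical; the client is passed in so the function can be exercised without AWS credentials or network access.

```python
def top_labels(client, image_bytes, max_labels=5, min_confidence=80.0):
    """Return (name, confidence) pairs for labels detected in an image.

    `client` is expected to behave like boto3's Rekognition client, i.e.
    expose detect_labels(Image=..., MaxLabels=..., MinConfidence=...).
    """
    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]


class FakeRekognitionClient:
    """Hypothetical stand-in returning a canned response (no network)."""

    def detect_labels(self, Image, MaxLabels, MinConfidence):
        labels = [
            {"Name": "Mouse", "Confidence": 97.5},
            {"Name": "Animal", "Confidence": 99.1},
        ]
        return {"Labels": labels[:MaxLabels]}


print(top_labels(FakeRekognitionClient(), b"not-a-real-image"))
```

With configured AWS credentials, the real call would be `top_labels(boto3.client("rekognition"), image_bytes)`; the rest of the code stays the same, which is much of the appeal of these APIs.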
Deep learning revolution gains speed
Jakub Langr, a co-author of GANs in Action (Manning Publications), gave a great introduction to how far deep learning can take us.
GANs (generative adversarial networks) are a novel approach to generating data for a variety of related problems; they leverage the power of deep learning and two competing agents. Currently, most use cases revolve around synthesizing full-HD faces and other images. The results are truly breathtaking.
Though amazing, GANs’ primary use cases can raise concerns: for example, they can be used to generate unique, realistic profile photos of people who do not exist in order to automate the creation of fake social media profiles. We still need to wait for more valuable use cases of this technology to emerge.
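The two competing agents in a GAN are a generator G, which maps random noise z to synthetic samples, and a discriminator D, which estimates the probability that a sample is real. In the original formulation by Goodfellow et al. (2014), they play a minimax game over the value function:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to increase this value by telling real samples from generated ones, while G is trained to decrease it by producing samples that D cannot distinguish from real data.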
Python community is amazing
People from various backgrounds express their passions through and around Python like in no other community I have come across in the programming world so far. The PyCon CZ evening lightning talk sessions were fun and dynamic, and it felt like a whole extra “conference day” squeezed into one hour!
Some of the examples include:
- a story of a journey from Helsinki through Denmark all the way to Ostrava, with lots of metaphors for software project problems and plenty of humour
- keyboard localization problems told in ten European languages by a single speaker whose hobby is learning new languages; he even got applause for fluently pronouncing “Grzegorz Brzęczyszczykiewicz”, one of the hardest things to say in Polish
- a live async coding session that turned into a one-man programming show
The Python community is very diverse. That is probably because Python’s use cases in data processing, computer vision, and machine learning make it extremely attractive to various groups of curious minds.
While data science and machine learning were the main topics at the conference, many people I talked to actually do web development with Python. The GraphQL API in Python workshop was the only strictly web-development session at the conference. To me, this suggests that Python web development is a stable platform and that the community is expanding its interests into new areas.