Thursday, August 7, 2014

PyCon AU 2014 Writeup

I recently attended PyCon AU in Brisbane, Australia. It was an amazing conference, and I wanted to record my thoughts. I have organised this post chronologically.

Videos are coming out, and will all eventually be published at https://www.youtube.com/user/PyConAU/videos.

Friday Miniconfs


The first day consisted of "miniconfs": independently organised streams focused on specific topics. I attended the "Science and Data" miniconf. It is clear that this is a huge and growing component of the Python community. However, science and data still suffer from a lack of integration with the general Python community. The tools being put in place do appear to be having a transformative effect on the scientists who adopt them (notable technologies include IPython Notebook, scipy, numpy, and efforts such as Software Carpentry). However, general best practices around software design, team workflow, testing, version control and code review have not been so enthusiastically adopted. Going the other way, data-oriented techniques and self-measurement have not been widely adopted within open source.

One of the major "new" tools is "pandas", which provides extremely strong data management for row/column-based data. The library is a few years old, but is really coming into its own. It supports powerful indexing and data-relation methods, along with basic statistical techniques for handling missing data and simple plotting. More advanced analysis and plotting can be achieved by extracting the pandas data structures as numpy arrays and passing them to the existing Python libraries built for those purposes.
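To make that concrete, here is a small sketch of my own (toy data, not from any of the talks) showing the kind of workflow pandas enables: label-based indexing, filling in missing values, and dropping down to a plain numpy array for libraries that expect one.

```python
import numpy as np
import pandas as pd

# A small, made-up daily time series with one missing observation.
dates = pd.date_range("2014-08-01", periods=5, freq="D")
frame = pd.DataFrame(
    {"temperature": [21.5, 22.1, np.nan, 20.8, 21.0]},
    index=dates,
)

# Label-based indexing by date, and simple handling of missing data.
first_two = frame.loc["2014-08-01":"2014-08-02"]
filled = frame.fillna(frame["temperature"].mean())

# Extracting a plain numpy array hands the data to any library
# that doesn't know about pandas.
values = filled["temperature"].values
print(first_two)
print(values.mean())
```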

Saturday: Main Conference Day One


The main conference was opened by a keynote from Dr James Curran, who gave an inspiring presentation on the new Australian national curriculum. It is to include coding from the early years through to year ten as a standard part of the education given to all Australians. This is an amazing development for software and computing, and it looks likely that Python may have a strong role to play in it.

I presented next on the topic of "Verification: Truth in Statistics". I can't give an unbiased review, but as a presenter, I felt comfortable with the quality of the presentation and I hope I gave the audience value.

I attended "Graphs, Networks and Python: The Power of Interconnection" by Lachlan Blackhall, which included an interesting presentation of applying the NetworkX library to a variety of network-based computing problems.

For those looking for a relevant introduction, "IPython parallel for distributed computing" by Nathan Faggian was a good overview.

"Record linkage: Join for real life" by Rhydwyn Mcguire gave an interesting discussion of techniques for identity matching, in this case for the purpose of cross-matching partially identified patients in the medical system to reduce errors and improve medical histories.

"The Quest for the Pocket-Sized Python" by Christopher Neugebauer was an informative refresh of my understanding on Python for developing mobile applications. Short version: still use Kivy.

Sunday: Main Conference Day Two

The keynote on day two was given by Katie Cunningham on the topic of "Accessibility: Myths and Delusions". This was a fantastically practical, interesting and well-thought-out presentation, and I highly recommend that everyone watch it. It left a strong impression on many members of the audience, as would become apparent later during the sprint sessions.

"Software Carpentry in Australia: current activity and future directions" by Damien Irving further addressed many of the issues hinted at during the data and science miniconf. It covered familiar ground for me in that I am very much working at the intersection of software, systems and science anyway. One of the great tips for helping to break down some of the barriers when presenting software concepts to scientists was to work directly with existing work teams, as those scientists will be more comfortable working together where they have a good understanding of their colleagues work practises and levels of software experience. In a crowd of strangers, it can be much more confronting to talk about unfamiliar areas. It strikes me that the reverse is also probably true when talking about improving scientific and mathematical skills for developers.

"Patents and Copyright and Trademarks… Oh, why!?" by Andrea Casillas gave a very thorough and informative introductory talk on legal issues in open source IP management. She was involved with http://www.linuxdefenders.org/, a group of legal activities which protect IP for open source projects.

"PyPy.js: What? How? Why?" by Ryan Kelly was a surprisingly practical-sounding affair after you get over the initial surprise and implementing a Python interpreter in Javascript. One argument for doing this rather than a customised web browser is for uniformity of experience across browsers. If a reasonably effective Python implementation can be delivered via javascript, that could help to pave the way for more efficient solutions later.

The final talk was one of the highlights of the conference: "Serialization formats aren't toys" by Tom Eastman. It highlighted the frankly wide-open security vulnerabilities that come from ingesting XML or JSON (and presumably a variety of other serialisation formats) without a high degree of awareness. Many parsers will interpret parts of a document as executable code or instructions, allowing people to execute arbitrary commands against your system if they can inject a crafted document into it. For example, if you allow the uploading of XML or JSON, a naive implementation of reading that data can allow untrusted, arbitrary code execution. I think this left a big impression on a lot of people.
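To illustrate the class of problem, here is a toy example of my own (not from the talk, and using YAML rather than the XML and JSON cases discussed, because it shows the arbitrary-execution risk very directly): a document that looks like plain data can smuggle in a Python call, which an unrestricted loader would happily execute.

```python
import yaml  # PyYAML, a third-party library

# A "data" document that actually asks the loader to call os.system.
doc = "!!python/object/apply:os.system ['echo this ran on your machine']"

# Historically, yaml.load() with an unrestricted loader would construct
# arbitrary Python objects from input like this, invoking os.system in
# the process. yaml.safe_load() restricts input to plain data types
# (strings, numbers, lists, dicts) and rejects the document instead.
try:
    yaml.safe_load(doc)
except yaml.YAMLError as exc:
    print("safe_load rejected the document:", exc)
```

The general lesson is the same for every format: treat uploaded documents as hostile, and use the restricted or hardened parsing path your library provides.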

Monday and Tuesday: Developer Sprints

One of the other conference attendees (Nick Farrell) was aware of my experience in natural language generation, and suggested I help him put together a system for providing automatic text descriptions of graphs. These descriptions can be read aloud by screen readers, giving (among others) visually impaired users access to information not otherwise available to them.

Together with around eight other developers over the course of the next two days, I provided coordination and an initial design for a system which could do this. The approach taken combines standard NLG design patterns (data transformation --> feature identification --> language realisation) with a selection of appropriate modern Python tools. We used "Jinja2", a templating language usually used for rendering dynamic web page components, to provide the language realisation. This had the distinct advantage of being familiar to the developers present at the sprint, and provided a ready-to-go system for text generation. I believe this approach has significant limitations around complexity which may become a problem later; however, it was an excellent choice for getting the initial prototype built quickly.
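As a rough sketch of how the realisation stage works in this style of pipeline (the feature names and template text below are my own invention for illustration, not wordgraph's actual templates):

```python
from jinja2 import Template

# Hypothetical features produced by the "feature identification" stage
# for a single time series; the names here are illustrative only.
features = {
    "metric": "requests per second",
    "trend": "rising",
    "peak_value": 1250,
    "peak_time": "14:30",
}

# The "language realisation" stage renders those features into prose.
TEMPLATE = Template(
    "The graph shows {{ metric }}, which is {{ trend }} overall. "
    "It peaks at {{ peak_value }} around {{ peak_time }}."
)

print(TEMPLATE.render(**features))
```

The appeal of templating is exactly what made it work at a sprint: anyone who has built a web page can read and extend the templates, even though more sophisticated grammar handling would eventually need something richer.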

You can find the code at https://github.com/tleeuwenburg/wordgraph and the documentation at https://wordgraph.readthedocs.org/en/latest/. Wordgraph is the initial working name chosen quickly during the sprints -- it may be that a more specific name should be chosen at some point.  The documentation provides the acknowledgments for all the developers who volunteered their time over this period.

It was very exciting to work so fast with an amazing group of co-contributors. We were able to complete a functional proof of concept in just two days, capable of producing paragraph-length English descriptions of data sets produced by "graphite", a standard systems-metrics web application which produces time-series data. The wordgraph design is easily extensible to other formats and other kinds of description. If the system proves to be of wider use, there is a lot of room to grow. However, there is also a long way to go before it could be said to be truly generally useful.

Concluding Remarks

This was a fantastic achievement by the organising committee, and the strong set of presentations made it highly worthwhile and valuable for anyone who might be considering attending in future. It sparked a great deal of commentary among attendees, I have a lot of ideas for the future, and I am sure my work practices will also benefit.

The conference vibe was without doubt the friendliest I have ever experienced, improving even further on previous years' commitment to openness and welcoming new people to the community. This was no doubt partially a result of the Indigenous "Welcome to Country" which opened the first day, setting a tone of acceptance, welcome and diversity for the remainder of the event. The dinners and hallway conversations were a true highlight.

I hope that anyone reading this may be encouraged to come and participate in future years. There are major parts of the conference that I haven't even mentioned yet, including the pre-conference workshops, the Django Girls event, organised icebreaker dinners and all the associated activities. It is suitable for everyone from those who have never programmed before through to experienced developers looking for highly technical content. It is a conference, as far as I am concerned, for anybody at all who is interested or even merely curious.

Finally, I would just like to extend my personal thanks to everyone that I met, talked to, ate with, drank with or coded with. I'd also like to thank those people I didn't encounter who were volunteering, presenting to others, or in any way making the event happen. PyCon AU is basically the highlight of my year from a personal and professional development perspective, and this year was no exception.