I’ve spent most of the last six years playing around with data and drawing insights from it (many of those insights have been published in Mint). Much of the work I’ve done falls under the (rather large) umbrella of “data science”, and some of it can be classified as “machine learning”. Over the last couple of years, though, I’ve been rather disappointed by what goes on in the name of data science.
Stripped to its bare essentials, machine learning is an exercise in pattern recognition. Given a set of inputs and outputs, the system tunes a set of parameters in a mathematical formula such that the outputs can be predicted as accurately as possible from the inputs (I’m massively oversimplifying here, but this captures enough of the essence for this discussion).
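To make the idea of “tuning parameters in a formula” concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything used in the work described here: the formula is a straight line, y = w*x + b, the function name `fit_line` is made up, and the toy data is invented so that the right answer (w = 2, b = 1) is known in advance. The tuning method is plain gradient descent on the mean squared error, which is one common choice among many.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Tune w and b in y = w*x + b to reduce the mean squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0  # start with an uninformed guess
    n = len(xs)
    for _ in range(steps):
        # How the mean squared error changes as each parameter changes
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Invented toy data that follows y = 2x + 1 exactly
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Real systems use formulas with thousands or millions of parameters instead of two, but the loop is the same in spirit: guess, measure the error, adjust, repeat.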
One big advantage of machine learning is that algorithms can sometimes recognize patterns that are not easily visible to the human eye. The most spectacular application of this has been in the field of medical imaging, where time and again algorithms have been shown to outperform human experts at analysing images.
In February last year, a team of researchers from Stanford University showed that a deep learning algorithm they had built performed on par with a team of expert doctors in detecting skin cancer. In July, another team from Stanford built an algorithm to detect heart arrhythmia by analysing electrocardiograms, and showed that it outperformed the average cardiologist. More recently, algorithms to detect pneumonia and breast cancer have been shown to perform better than expert doctors.