We aim to give everyone the power to do great machine learning, and seek a highly creative and motivated PhD student to help us progressively automate the data science process, from raw data to machine learning models, to help everyone do better machine learning, faster.
Whereas machine learning aims to build systems that are able to learn and improve over time, the creation of such systems is still done largely manually. The data science process contains many tedious, error-prone, or downright painful tasks, such as data wrangling, data preprocessing, model selection, and proper model evaluation. Current approaches to automating these tasks tackle only small parts of this process, often don't work well with raw (or dirty) data and, importantly, don't always learn much from one problem to the next.
The key research question that we want to answer is how we can (semi-)automate the data science process, and how we can learn effectively from one problem to the next. This involves the combination of optimization (model-based optimization, bandits, genetic programming, ...) and machine learning, including meta-learning (learning to learn) and deep learning. The end goal is to develop automated processes that use large numbers of prior experiments to build the most promising 'pipelines' of processes for new datasets, evaluate them, and learn from them to propose ever better pipelines. We will leverage OpenML, the open machine learning platform, and develop a series of 'bots' that immediately make the results of this work tangible to the wider data science community and maximize interaction. While certainly challenging, breakthroughs in this area will enable many more people to use machine learning effectively and to solve problems important to them.
You will be collaborating with Dr. ir. Joaquin Vanschoren and the rest of the OpenML core team. This work is set in a very interactive environment, including the Eindhoven Data Mining Group, the OpenML team, and the AutoML community. Interaction is also possible with the US program on 'Data Driven Discovery of Models' and with companies working in this area. The availability of extensive (meta)data and expertise offers a unique opportunity for a bright student to tackle this hard problem and become a key player in this field.
For any further inquiries on the content of the position, please contact Joaquin Vanschoren, e-mail
For information about employment conditions please contact P. Hertogs LLM, MSc (HR advisor), e-mail: