Using big data solutions to solve the mystery of worm behavior

A research team from FOM Institute AMOLF, VU Amsterdam and the Okinawa Institute of Science and Technology studies the strategies animals use to navigate the world around them. As a model system, they use roundworms, including the popular genetic model organism C. elegans. These worms are about 1 mm long and have a simple nervous system. In the case of C. elegans, researchers know its genome, the developmental history of every cell in its body, and all of its neural connections. Despite this, the behaviour of C. elegans remains largely a mystery.

Creating a digital worm model

The project's approach is to generate high-quality quantitative data on worm motility, and then use tools from statistical physics to build simple models that accurately describe worm behaviour. By comparing these models across many individuals and species, the researchers can discover which aspects of the behaviour are important and how organisms adapt their behaviour to different conditions.

The project faced a number of data challenges. The team tracks the worms using high-resolution automated video imaging. Worm behaviour spans diverse timescales: from repetitive body motions lasting about a second, to abrupt actions on the scale of tens of seconds, to changes in average behaviour over minutes. Capturing all of these required imaging at high frame rates over long periods of time.

More worms, more computing power

Because of hardware constraints, no more than four worms could be recorded simultaneously. To record more worms, Stephen Helms, one of the researchers, needed more computing power for the video analysis. In addition, a lack of local storage space limited how much video could be recorded each day.

Furthermore, the approach requires studying hundreds of individuals for each experiment. As a result, the researchers accumulate terabytes of imaging data that must be stored, processed, analysed and shared among team members at FOM Institute AMOLF, VU Amsterdam, Virginia Commonwealth University (VCU) and the Okinawa Institute of Science and Technology.

The technical challenges at each of these steps would have held the team back from its scientific goals. The computational resources and connectivity provided by SURF, the Dutch research and education network, together with advice from the Netherlands eScience Center, were key to achieving the goal of using big data to understand the worm.

Technical expertise

The Netherlands eScience Center provided technical expertise and helped Helms port his code to an open-source platform and run it on the SURF infrastructure, which made the analysis faster and more efficient. Helms: “I essentially had a programmer from the eScience Center, who already knew all the infrastructure, knew how to run programs on it and who would take me through things. And from that point, it was much easier for me to customize things and make my own versions of the stuff.”

SURF provided extra computing power and storage space, as well as technical support for using those facilities, enabling a significant scale-up of the experimental pipeline. The video analysis was done on the Lisa Compute Cluster, a facility suited to running large numbers of independent, moderately parallel computing tasks. The data of each worm is sent to a different core or computer in the cluster. With this parallelisation, the analysis of one particular experiment on ageing worms could be done in one-sixteenth of the time.
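Because each worm's data can be analysed independently of the others, the workload is what is often called "embarrassingly parallel". The sketch below is a minimal illustration of that idea only, not the team's actual pipeline. It uses Python's standard `concurrent.futures.ProcessPoolExecutor` on a single machine as a stand-in for the cores and nodes of a cluster. The functions `analyze_worm` and `analyze_experiment` are hypothetical, and the "analysis" (averaging a list of per-frame speeds) is a placeholder for real video processing.

```python
from concurrent.futures import ProcessPoolExecutor


def analyze_worm(track):
    """Hypothetical per-worm analysis: the mean of per-frame speeds.

    A placeholder for the real video analysis of one worm's recording.
    """
    return sum(track) / len(track)


def analyze_experiment(tracks, workers=4):
    """Analyse every worm in an experiment in parallel (illustrative sketch).

    Each worm's data is independent, so each one can be handed to a separate
    worker process; on a cluster, each would go to its own core or node.
    Results come back in the same order as the input tracks.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_worm, tracks))


if __name__ == "__main__":
    # Three made-up worm tracks (per-frame speeds, arbitrary units).
    tracks = [[1.0, 2.0, 3.0], [4.0, 4.0], [0.5, 1.5]]
    print(analyze_experiment(tracks))  # [2.0, 4.0, 1.0]
```

Since no worm's result depends on another's, adding workers shortens the wall-clock time roughly in proportion until the number of workers reaches the number of worms. This matches the kind of independent, moderately parallel workload the Lisa cluster is described as suited to.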

From 30 to 2.5 hours

Helms: “If I started the analysis the moment I got in in the morning, then before lunch we had the data. But previously analysing a single video was taking around 30 hours. So with the help from the support provided within the Enlighten Your Research project I was able to speed up my analysis from 30 to 2.5 hours. With the old infrastructure and with my old analysis approach, there would be no way we could have analysed this data.”

Essentially, with the current set-up it is possible to generate and analyse as much data in one day as Helms was previously able to in one year. Helms:

“Before, we had around 150 experiments, now in this ageing experiment we have around 2000.”

The storage space available to Helms on the Lisa cluster was 200 GB. This proved sufficient for a day's batch of experimental data: the image data could be analysed on the cluster and then transferred back to local computers at AMOLF for further, more interactive analyses.

This project was supported by the Enlighten Your Research programs EYR4 and EYR-Global, which were organized with SURFnet, SURFsara, NLeSC and international R&E networks such as Internet2. The participating organizations provided up to 20 TB of storage and computing resources for the video data at SURFsara. The Netherlands eScience Center migrated the research group's analysis code to run on the HPC infrastructure. Internet2 and SURFnet connected the participating institutes to SURFsara using lightpath connections. The participating research institutes and universities were FOM Institute AMOLF, the Vrije Universiteit Amsterdam, the Okinawa Institute of Science and Technology and Virginia Commonwealth University.

Published: 07/2018
