A virtual “data house” for genomics researchers

Genome researchers in the Netherlands are working in close cooperation in the field of omics data (such as genomic and metabolomic data). This collaboration is crucial: if significant patterns are to be identified in data, large samples are required, especially now that the entire genome can be analyzed.

“Gathering all the data yourself is too expensive and simply impossible,” explains Marian Beekman. Beekman works at Leiden University Medical Center (LUMC) and is work package coordinator at BBMRI-NL (Biobanking and BioMolecular Resources Research Infrastructure Netherlands), the national infrastructure for biobanks. To facilitate collaboration, BBMRI-NL is currently building a single virtual platform, part of which is federated and part of which is centralised within SURF. This omics data platform is a virtual ‘data house’ containing different types of data sets that enable researchers from different University Medical Centers (UMCs) to perform analyses.

Sharing data between UMCs

“Within the BBMRI, researchers work on large data sets that they are unable to analyse in full at a single location,” says LUMC researcher Jeroen Laros. An obvious solution is the ability to transfer data quickly and securely between the various UMCs and to share computing power.

“This is already happening between the UMCG (University Medical Center Groningen) and the LUMC. In the long term, we want to share data and computing power with other UMCs as well,” says Laros.

Genome atlas

One of the BBMRI projects is creating an atlas of the entire genome. This atlas will be published as a website that is accessible to all. A researcher who is studying a specific disease and arrives at a specific DNA region can use the genome atlas to find out what that region relates to.

The BBMRI-NL consortium is also investigating the metabolic profile of diseases. Metabolites are the breakdown products of metabolism and can be measured in the blood. Patterns and abnormalities in a person’s metabolic profile give an indication of their health status. Using a patient’s metabolic profile, it is possible to tell whether they have diabetes or have had a heart attack.

BBMRI-NL is taking this a step further through new research. The researchers are currently investigating whether the metabolic profile of a cardiac patient can be used to predict their chances of having another heart attack, and whether it is possible to predict the progression of diabetes in a diabetic patient. These types of prediction require a large amount of data.

Special network infrastructure

Combining and sharing such large data sets requires a special network infrastructure. To ensure that these omics data can be shared easily and securely, a pilot involving E-LAN network technology is currently underway.

This has led to the development of a shared network environment which is separate from the Internet: the UMC Research LAN. This is effectively a national ‘local’ network for UMCs (see Figure 1). It combines data and computing clusters from different UMCs and SURF virtually in a single location. This allows researchers to share and analyse data within a protected network environment which is optimised for research purposes.

As SURFsara is also connected to the UMC Research LAN, it is easy for researchers to obtain more processing power from SURFsara if there is insufficient capacity within their own UMC.

Figure 1. UMC Research LAN. Generic Internet connection (left) and closed UMC Research LAN (right) for high performance purposes.

E-LAN, a virtual private network between institutions

In an E-LAN, multiple institutions are connected to each other via a layer 2 multipoint-to-multipoint connection. The exchange of data is restricted to the endpoints within the E-LAN. In the UMC Research LAN pilot, the institutions consist of the three UMCs (LUMC, UMCU, UMCG/RuG) and SURFsara, thereby creating a virtual private network.

Thanks to the layer 2 functionality within an E-LAN, each location can exchange data with every other location as if they were present on the same LAN. This means that the UMCs involved in the project have access to a national network which, in terms of performance, feels like a local network.

At the moment the E-LAN is statically configured. It is an environment which is isolated from the Internet for a limited number of locations. In future, the connections may be dynamically configured.
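
The multipoint-to-multipoint behaviour described above can be illustrated with a small Python sketch. It is only a conceptual model, not a network configuration: the endpoint names are the four locations mentioned in the pilot, while the set-based membership check and the function names are assumptions made for the example.

```python
from itertools import combinations

# The four locations named in the pilot. Modelling the E-LAN as a plain
# set of members is an illustrative assumption, not how it is configured.
ELAN_MEMBERS = frozenset({"LUMC", "UMCU", "UMCG/RuG", "SURFsara"})


def can_exchange(src: str, dst: str, members: frozenset = ELAN_MEMBERS) -> bool:
    """Return True only if both endpoints are distinct members of the E-LAN."""
    return src != dst and src in members and dst in members


def reachable_pairs(members: frozenset = ELAN_MEMBERS) -> list:
    """List every unordered pair of endpoints.

    In a multipoint-to-multipoint service each member can reach every
    other member directly, so all pairs are included.
    """
    return list(combinations(sorted(members), 2))


if __name__ == "__main__":
    print(can_exchange("LUMC", "UMCG/RuG"))       # both inside the E-LAN
    print(can_exchange("LUMC", "internet-host"))  # outside endpoint is excluded
    print(len(reachable_pairs()))                 # number of direct pairings
```

With four endpoints this yields six direct pairings, and any host outside the member set is excluded, which mirrors the isolation from the Internet described in the text.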

Access to data

These partnerships, which go beyond institutional boundaries, also require suitable infrastructure for authentication and authorisation.

“It should be easy to define who has access to the data and what people can do with the data,” says Marian Beekman.

A pilot involving COmanage and a proxy component is currently underway in this field. This will enable researchers to log in using their institution account.

“This leads to various advantages for the researcher (it is their own trusted account) and for the owner of the shared dataset or processing power. It is now clear exactly who the user is, because the user has authenticated themselves through an account that has been verified by the institution. The lead researcher can use COmanage to create groups, invite researchers and assign roles in terms of who can do what. The aim is that researchers from all over the Netherlands will have access to this virtual data house. Ultimately, though, researchers involved in international collaborative projects also need to be able to access these data,” says Beekman.
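
The group-and-role model Beekman describes can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or reproduce the COmanage API: the class, the role names, the permissions and the example account names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical roles and permissions, chosen for illustration only;
# they are not taken from COmanage or from the pilot.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "data_manager": {"read", "write"},
}


@dataclass
class ResearchGroup:
    """A group created by a lead researcher, mapping members to roles."""

    name: str
    lead: str
    members: dict = field(default_factory=dict)  # user id -> role name

    def invite(self, invited_by: str, user: str, role: str) -> None:
        """Add a user with a role; only the lead researcher may do this."""
        if invited_by != self.lead:
            raise PermissionError("only the lead researcher can invite members")
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self.members[user] = role

    def is_allowed(self, user: str, action: str) -> bool:
        """Check whether a user's role permits the requested action."""
        role = self.members.get(user)
        return role is not None and action in ROLE_PERMISSIONS[role]


if __name__ == "__main__":
    # Account names stand in for institution accounts and are made up.
    group = ResearchGroup(name="metabolomics-pilot", lead="lead@lumc.example")
    group.invite("lead@lumc.example", "researcher@umcg.example", "analyst")
    print(group.is_allowed("researcher@umcg.example", "read"))
    print(group.is_allowed("researcher@umcg.example", "write"))
    print(group.is_allowed("stranger@other.example", "read"))
```

Because every user is identified by an institution-verified account, the owner of a data set only has to decide which role each account holds; the check itself stays simple.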

Collaboration on customised medicine

Customised medicine is the overarching objective behind these developments, i.e. medicine that is predictive, personalised, preventive and participative. As well as close collaboration between researchers, this requires ICT infrastructure that guarantees mutual trust and that responds to the security requirements and needs of the researchers, e.g. being able to share data sets and processing capacity quickly. Without suitable ICT infrastructure, researchers are not able to collaborate effectively with one another, and solutions that would not pass a security test may be selected as a result.

More information

Published: 08/2017