Leveraging federated learning and RO-Crates for human genomic data analysis and provenance tracking

1 January 2025 - 31 December 2026

Federated analysis is transforming genomics research by enabling collaborative analysis of distributed datasets while preserving data privacy and delivering valuable insights into genetic diseases. ELIXIR is involved in the EUCAIM (European Federation for Cancer Images) project and coordinates the European Genomic Data Infrastructure (GDI) project, which aims to provide federated access to over one million whole genome sequences.

Although the GDI project is investigating federated approaches to data analysis, it does not plan to deploy them for evaluation. The primary objective of this study is therefore to deploy multiple federated analysis infrastructures (EUCAIM's Flower and Yjs) and evaluate their strengths and weaknesses. The datasets for testing the infrastructures will come from the ELIXIR Cancer community, the DCEG/NIH division (synthetic GWAS datasets for different phenotypes), and openSNP (real GWAS data). RO-Crate, an ELIXIR-endorsed community standard for FAIR data packaging, will be used to track data access procedures.
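
As a rough illustration of how RO-Crate could record a data access event, the sketch below builds a minimal RO-Crate 1.1 metadata document in Python using only the standard library. The function name `build_access_crate`, the file name, the agent name, and the choice to model an access as a `CreateAction` are assumptions made for this example and are not taken from the project. A real crate would also need fields such as a licence and should be checked against the specification.

```python
import json


def build_access_crate(dataset_name, accessed_file, agent_name, when):
    """Return a minimal RO-Crate 1.1 metadata dict recording one access event.

    All argument values are supplied by the caller; nothing here is
    prescribed by the project. `when` is an ISO 8601 timestamp string.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: identifies this file as an RO-Crate 1.1 document.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: the package whose access is being tracked.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": dataset_name,
                "datePublished": when,
                "hasPart": [{"@id": accessed_file}],
                "mentions": [{"@id": "#access-1"}],
            },
            # The file that was accessed (hypothetical name).
            {"@id": accessed_file, "@type": "File"},
            # The person who accessed it (hypothetical agent).
            {"@id": "#agent-1", "@type": "Person", "name": agent_name},
            # The access event itself, modelled here as a CreateAction (an assumption).
            {
                "@id": "#access-1",
                "@type": "CreateAction",
                "name": "Local analysis run reading the dataset",
                "agent": {"@id": "#agent-1"},
                "object": [{"@id": accessed_file}],
                "startTime": when,
            },
        ],
    }


if __name__ == "__main__":
    crate = build_access_crate(
        "Synthetic GWAS dataset",
        "gwas_synthetic.tsv",
        "Example Analyst",
        "2025-01-01T00:00:00+00:00",
    )
    # Serialise as it would appear in ro-crate-metadata.json.
    print(json.dumps(crate, indent=2))
```

Each node participating in a federated analysis could emit a record of this kind locally, so that a provenance trail exists without the underlying data leaving the node.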

This study is a collaboration between five ELIXIR Nodes: ELIXIR United Kingdom, ELIXIR Spain, ELIXIR Belgium (Dilza Campos), ELIXIR Portugal and ELIXIR France.