Data management, workflow optimization and science gateway

Researchers of the CRC collect a wide variety of data, such as MRI images, EEG recordings, genetic data, and behavioral data, from participants in various research projects. With growing participant numbers, multi-modal assessment, improved imaging technologies, and new research projects, the amount of CRC data stored in diverse files is steadily increasing. The data relate to individuals nested within the research projects and need to be processed and analyzed with various software tools and toolboxes, e.g. for processing MRI or PET data or for statistical analyses of behavioral and experimental data. Frequent and largely standardized analyses, such as the preprocessing of fMRI data, could be widely automated with pipelines to save effort, but this requires setting up consistent metadata management. Some analyses, such as automated segmentation with the FreeSurfer software or computational modelling of behavioral data, are computationally intensive. Using the high performance computing (HPC) resources of ZIH can greatly reduce computing time, but transfer and interface issues have to be solved. Moreover, to run a project efficiently, researchers need tools to manage assessment and data collection (to recruit, schedule, and assess new subjects), to review project progress (e.g. an overview of subjects that have already been assessed), and to check the completeness and quality of data at various levels (e.g. automated detection of missing or implausible data). To increase the efficiency and thus the output of CRC research, there is a clear need for integrated management of all data and metadata during and after acquisition, as well as easy-to-access resources for analyzing the acquired data, especially when analyses are computationally demanding.
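The automated completeness and plausibility checks mentioned above could, in their simplest form, look like the following sketch. The record fields (`subject_id`, `age`, `fmri_file`) and the plausibility range are illustrative assumptions, not the CRC's actual data model.

```python
# Hypothetical sketch of an automated completeness/plausibility check.
# Field names and the plausible age range are illustrative assumptions.

PLAUSIBLE_AGE = (18, 99)  # assumed inclusion criteria, not a CRC rule

def check_record(record):
    """Return a list of issues found in a single subject record."""
    issues = []
    # Completeness: required fields must be present and non-empty.
    for field in ("subject_id", "age", "fmri_file"):
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")
    # Plausibility: numeric values must fall into an expected range.
    age = record.get("age")
    if isinstance(age, (int, float)) and not (PLAUSIBLE_AGE[0] <= age <= PLAUSIBLE_AGE[1]):
        issues.append(f"implausible age: {age}")
    return issues

def audit(records):
    """Map each problematic subject to its issues; complete records are omitted."""
    report = {}
    for i, rec in enumerate(records):
        issues = check_record(rec)
        if issues:
            report[rec.get("subject_id") or f"record-{i}"] = issues
    return report
```

A report produced this way could be surfaced in the project-progress overview, so that missing or implausible entries are flagged while the subject can still be re-assessed.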

With large amounts of data, the major problem is not storage capacity; state-of-the-art hardware can handle that. The challenge is the number of files and the amount of information that is created: files with experimental data, results of fMRI analyses, and other outputs need to be managed. Researchers must be able to recover data and results both when they are generated and long after, make them available for reuse by other researchers (Open Access), combine results from different projects, and gain new knowledge by combining results from various experimental setups. Data access has to be fast, easy, platform independent, and systematic. Semantic metadata and ontologies describing both the experimental data and the results play a crucial role in this process. The project will therefore create and set up a comprehensive service for project, data, and metadata management. A careful analysis and evaluation of existing data and metadata, as well as of existing tools, will be the starting point. As most of the data are personal data, concepts and methods for data security and data privacy will play a major role. The final system will be installed and operated in the proximity of the HPC resources of ZIH.
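To make the role of metadata concrete, the following sketch attaches a minimal machine-readable provenance record to a derived result file, so that it can later be found, verified, and reused. The schema fields and file names are illustrative assumptions, not the CRC's actual metadata model.

```python
# Minimal provenance-metadata sketch: what a result is, which project and
# source files it came from, and a checksum for later verification.
# All field names and paths are hypothetical examples.

import hashlib
import json
from datetime import datetime, timezone

def describe_result(path, data, project, source_files):
    """Build a provenance record for a derived result file."""
    return {
        "path": path,
        "project": project,
        "derived_from": source_files,           # links result back to raw data
        "sha256": hashlib.sha256(data).hexdigest(),  # integrity check on reuse
        "created": datetime.now(timezone.utc).isoformat(),
    }

record = describe_result(
    "results/sub-01_thickness.csv",             # hypothetical result file
    b"region,thickness\nprecentral,2.61\n",     # its (example) content
    "C01",                                      # hypothetical subproject ID
    ["raw/sub-01_T1w.nii.gz"],                  # hypothetical source file
)
metadata_json = json.dumps(record, indent=2)    # stored alongside the result
```

Records of this kind, described with shared ontology terms instead of ad-hoc field names, are what would let results from different projects and experimental setups be combined systematically.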

Project Members

Principal Investigators