Keywords
FAIR - biomedical research - bioinformatic - medical research - ISA
Introduction
Due to the rapid decline in costs, molecular biological data are increasingly collected
and digitally analyzed in often complex manner, so that paper-based laboratory books,
are reaching their limits to make these data and analyses reproducible. For this reason,
the topics of documentation and long-term archiving are more and more moving into
the focus of bioinformatics.[1]
In recent years in particular, both open source and proprietary platforms have been
developed that offer a solution to these problems. Especially university institutions
like university hospitals are therefore interested in a scientifically correct way
to solve these issues. Therefore, the focus of this work is on open source solutions,
as these are usually less expensive and most likely to be considered in terms of open
science.
Translational research platforms such as cBioPortal[2] and tranSMART,[3] which offer sophisticated data models for the integration and analyses of clinical
and omics data, form the basis of our considerations. These platforms allow the user
to perform analyses via graphical web interfaces but usually do not provide support
for documentation and long-term archiving. So far, this has to be achieved by using
paper-based laboratory books.
This is complicated by the fact that there is currently no generally accepted international
standard on how to structure data in this respect. But Wilkinson et al presented principles,
the so-called FAIR (Findable, Accessible, Interoperable, Reusable) principles, which
the data should follow to be sustainably reusable.
FAIR is an acronym for findable, accessible, interoperable, and reusable. These principles
emphasize the ability of computer systems to find (meta-)data, access it, interact,
or reuse it without or with minimal human intervention. This is becoming progressively
necessary as bioinformatics data become increasingly comprehensive and complex. As
a result, computational support is needed.[5] In the following study, the four principles, which Wilkinson et al named, are briefly
described:
Findable: (Meta-)data should be easy for both humans and computers to find, that is, machine-readable.
Accessible: (Meta-)data should be easily accessible by humans as well as machines using standard
communication protocols and should be archivable in the long run. Authentication and/or
authorization is not excluded.
Interoperable: (Meta-)data should be exchangeable, interpretable, and able to be combined with
other datasets in a human- and machine-readable way in a (semi-)automated manner.
Reusable: (Meta-)data should be described so well that they can be reused in future research
or that the results are reproducible. Furthermore, proper citation must be made possible.
Holub et al[6] extended the principles from FAIR to FAIR-Health. These include additional quality
aspects regarding reproducibility, incentives for enrichment of datasets, and approaches
for privacy-oriented work with human material and its data.
These principles serve to ensure data and workflow provenance by leading to more transparency
and security. Data provenance describes the digital object itself (e.g., a data record)
and strengthens its integrity by documenting who made which changes and how. Workflow
provenance, on the other hand, strengthens the integrity of the processing of these
objects by documenting which settings were made and how. This is of particular benefit
to bioinformaticians and researchers, as a key component in biomedical research is
transparency in the reporting of studies to provide reproducible results.[7]
Objectives
Based on translational research platforms such as cBioPortal and tranSMART, which
enable the integration and analysis of data but do not offer tools for FAIR data management,
the aim of this work was to identify and compare platforms that close this gap in
a structured way. With reference to the data and workflow provenance of these translational
research platforms, the software solution should follow the FAIR principles, ideally
the FAIR-Health principles.
A comparison of platforms that connect to these translational research tools to offer
solutions for their documentation, long-term archiving, and processing of clinical
and omics data according to FAIR principles does not yet exist to the best of our
knowledge. In the following study, the procedure and results of the literature search
for identifying such open source platforms for good scientific practice are described.
Methods
We have used PubMed[8] to examine the scientific literature and identified 1.111 articles that potentially
describe software solutions (PubMed queries are available in [Appendix 1]). The first query searched for articles that deal with data management or translational
research in general. The second query then filtered out those for systems biology.
The next query was to find the FAIR principles mentioned in the remaining articles.
Finally, the results were filtered by agreement, published within the last 5 years
and published in English. Then, we manually searched the literature to identify software
solutions that could both (1) document and (long-term) archive medical/omics data
and (2) process this data. A corresponding search with Google Scholar[9] completed the search.
Appendix 1
Detailed queries used for the interrogation of PubMed database (queries last run on
July 28, 2019)
Query number
|
Query
|
Items found
|
1
|
(((data management[Title/Abstract]) OR (database[Title/Abstract] OR database[Title/Abstract]))
OR (data repository[Title/Abstract] OR data repositories[Title/Abstract])) OR (translational
platform[Title/Abstract] OR translational platforms[Title/Abstract] OR translational
research platform[Title/Abstract] OR translational research platforms[Title/Abstract])
|
283 033
|
2
|
((system biology[Title/Abstract] OR systems biology[Title/Abstract])) OR (bioinformatic[Title/Abstract]
OR bioinformatics[Title/Abstract])
|
66 933
|
3
|
((((((((((data sharing[Title/Abstract]) OR data analysis[Title/Abstract]) OR FAIR[Title/Abstract])
OR (Findable[Title/Abstract] AND Accessible[Title/Abstract] AND Interoperable[Title/Abstract]
AND Reusable[Title/Abstract])) OR (metadata standard[Title/Abstract] OR metadata standards[Title/Abstract]))
OR (metadata[Title/Abstract] OR meta data[Title/Abstract])) OR open source[Title/Abstract])
OR software[Title/Abstract]) OR (solution[Title/Abstract] OR solutions[Title/Abstract]))
OR (standard[Title/Abstract] OR standards[Title/Abstract])) OR (strategy[Title/Abstract]
OR strategies[Title/Abstract])
|
2 647 727
|
4
|
(#1 AND #2)
|
6 830
|
5
|
(#3 AND #4)
|
2 263
|
6
|
Filters: Best match AND published in the past 5 years AND English [Language]
|
1 111
|
At first, the identified articles were searched for compilations of potential software
solutions, but nothing of that kind was found. However, a comparison was found by
Wruck et al[10] in which platforms for large-scale systems biology were presented. This comparison,
together with the platforms already examined by Wilkinson et al, was merged with the
results of our keyword search.
After collating the potentially relevant platforms and their literature sources, those
that did not seem to fit into the context of system biology or translational research
platforms were filtered out.
For the creation of the matrices we initially oriented ourselves on Canuel et al,[11] since our aim is to link translational research platforms such as cBioPortal and
tranSMART with a FAIR data management system and the study has already created an
overview of the former. Therefore, we took over the axis community, system requirements,
and support. The community axis grouped information about availability status and
references. The system requirements axis provides information about the required operating
system. The support axis supplies more detailed information about the help provided.
From Wruck et al the features of systems biological data management were (almost)
completely taken over. This created the axis of requirements for a data management
system for systems biology. This axis shows how data are collected, integrated, and
processed. The data corresponding to the matrices have also been adopted. However,
they have been checked and updated for topicality.
It was then examined whether the identified platforms were designated under the two
FAIR initiatives FAIRDOM[12] or FAIRsharing.[13] The fact that these initiatives mention the identified platforms is an indication
of the FAIRness of the platform in question. The FAIRDOM association references several
research platforms for laboratory research projects in life sciences. The aim is to
support sustainable research data management in these areas. FAIRsharing is a platform
that contains a variety of curated descriptions of standards, databases, and data
policies. The goal is to support reproducible and reusable scientific research. Thus,
the axis of the FAIR indication was created. The matrix was then filled in as far
as possible.
Finally, the development team or the responsible persons of the platforms were contacted
to confirm the found information. We received feedback from each development team.
The information we have found in literature which differs from the statements of the
development team, is marked with “a” in [Tables 1] and [2] (the information of the development teams was entered).
Table 1
Details of the main features of the two identified platforms for workflow provenance
Software
|
Galaxy
|
GenePattern
|
FAIR indication
|
|
|
FAIR reference
|
FAIRDOM
|
FAIRsharing
|
Community
|
|
|
Reference
|
Afgan et al 2018[a]
|
Reich et al 2006
|
DOI or PMID
|
29790989[a]
|
16642009
|
License
|
Academic Free License[a]
|
BSD
|
User mailing list or support
|
Yes
|
Yes
|
URL
|
https://galaxyproject.org/
|
http://genepattern.org
|
System requirements
|
|
|
Operating system
|
Linux/MacOS
|
Linux/MacOS
|
DB managing system
|
PostgreSQL
|
HSQLDB, Oracle, MySQL[a]
|
Main programming language
|
Python, JavaScript[a]
|
Python/Java/R
|
Support
|
|
|
Installation procedures
|
Yes
|
Yes
|
Configuration documentation
|
Yes
|
Yes
|
User documentation
|
Yes
|
Yes
|
Regular maintenance
|
Yes
|
Yes
|
Miscellaneous
|
|
|
GitHub availability
|
Yes
|
Yes
|
Demo server availability
|
–
|
–
|
Requirements for data management in system biology
|
|
|
Compliance to standards
|
SBML
|
MAGE-TAB
|
Metadata model
|
SBML-related
|
MIAME-related
|
Automation of data collection
|
Unknown
|
Unknown
|
Upload to public repositories
|
ArrayExpress
|
ArrayExpress
|
Extensibility
|
Via XML files
|
Java extensions
|
Integration with other DMS
|
Unknown
|
Unknown
|
Abbreviations: DB, database; DMS, data management system; DOI, digital object identifier;
FAIR, findable, accessible, interoperable, reusable; PMID, PubMed-ID; SBML, systems
biology markup language; URL, uniform resource locator.
a Data provided by the development team that differ from the data found in the literature.
Table 2
Details of the main features of the eight identified platforms for data provenance.docx
Software
|
BASE
|
Ensembl
|
ISA-Tools
|
LabKey Server
|
openBIS
|
SABIO-RK
|
SEEK
|
FAIR indication
|
FAIR reference
|
Unknown
|
FAIRsharing
|
FAIRsharing
|
Unknown
|
FAIRDOM
|
FAIRDOM
|
FAIRDOM
|
Note
|
–
|
–
|
Original format: ISA-Tab
|
–
|
–
|
Non-Commercial Purpose License
|
–
|
Community
|
Reference
|
Häkkinen et al. 2016[a]
|
Cunningham et al. 2019[a]
|
Rocca-Serra et al. 2010
|
Nelson et al. 2011
|
Bauch et al. 2011
|
Wittig et al. 2012
|
Wolstencroft et al. 2015
|
DOI or PMID
|
https://doi.org/10.1101/038976[a]
|
30407521[a]
|
20679334
|
21385461
|
22151573
|
22102587
|
26160520
|
License
|
GNU
|
Apache License 2.0
|
CPAL
|
Apache License 2.0
|
Apache License 2.0
|
Non-Commercial License[a]
|
BSD
|
User mailing list or support
|
Yes
|
Yes
|
Yes
|
Yes[a]
|
Yes
|
Yes
|
Yes
|
URL
|
http://base.thep.l u.se
|
http://www.ense mbl.org/index.ht ml
|
https://isa-tools.org
|
https://labkey.org
|
https://labnotebo ok.ch
|
http://sabio.hits.org/
|
https://seek4scie nce.org
|
System requirements
|
Operating system
|
Platform independent
|
Linux[a]
|
Platform independent
|
Platform independent
|
Linux
|
Linux[a]
|
Linux
|
DB managing System
|
MySQL, PostgreSQL
|
MySQL
|
–[a]
|
MySQL, PostgreSQL[a]
|
PostgreSQL
|
PostgreSQL[a]
|
PostgreSQL[a]
|
Main programming language
|
Java
|
Perl
|
Python, Java[a]
|
Java, JavaScript
|
Java
|
Java, Grails[a]
|
Ruby on Rails
|
Support
|
Installation
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
–[a]
|
Yes
|
Configuration
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
–[a]
|
Yes[a]
|
User documentation
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Regular maintenance
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Yes
|
Miscellaneous
|
GitHub availability
|
–
|
Yes
|
Yes
|
Yes
|
Only in GitLab[a]
|
–
|
Yes
|
Demo server availability
|
Yes
|
–
|
–
|
Yes
|
Yes
|
–[a]
|
Yes
|
Requirements for data management in system biology
|
Compliance to standards
|
unknown
|
Track hubs
|
MIAME[a]
|
MIAME
|
mzXml (MIBBI projected)
|
SBML, BioPAX[a]
|
MIBBI
|
Metadata model
|
MIAME-related
|
Track hubs
|
ISATab-related
|
RDF
|
Generic/controlled vocabularies
|
SBML-related
|
JERM
|
Automation of data collection
|
batch import
|
unknown
|
batch import
|
batch import
|
dropboxes
|
no automation
|
harvesters
|
Upload to public repositories
|
ArrayExpress, GEO, CIBEX
|
–[a]
|
ArrayExpress
|
–
|
not yet, but in planning
|
–[a]
|
JWS-online
|
Extensibility
|
Java plugins / extentions
|
Perl modules
|
mostly java classes
|
via client API
|
Java extensions
|
java extensions
|
ruby classes
|
Integration with other DMS
|
–[a]
|
java servlets
|
reference integration
|
non-trivial
|
via interfaces to the framework
|
–[a]
|
good, planned
|
Abbreviations: API, application programming interface; BASE, BioArray Software Environment;
BSD, Berkeley Software Distribution; CPAL, Common Public Attribution License; DMS,
data management system; DOI, digital object identifier; GEO, Gene Expression Omnibus;
FAIR, findable, accessible, interoperable, reusable; FAIRDOM, Findable, Accessible,
Interoperable, Reusable, Data, Operation, Models; gGmbH, gemeinnützige Gesellschaft
mit beschränkter Haftung (eng.: not-for-profit limited liability company); GNU, proper
name (GNU’s not Unix); ISA, Investigation, Study, Assay; MIBBI, minimum information
for biological and biomedical investigations; JERM, Just Enough Results Model; JWS,
Java Web Service; MAGE-TAB, MicroArray Gene Expression Tabular; MIAME, Minimum Information
About a Microarray Experiment; MySQL, My Structured Query Language; PMID, PubMed-Identificator;
RDF, Resource Description Framework; SABIO-RK, system for the analysis of biochemical
pathways—reaction kinetics; SBML, systems biology markup language; SysMO, Systems
Biology for Micro-Organisms.
a Data provided by the development team that differ from the data found in the literature.
Results
During this literature search it became clear that Data and Workflow Provenance each
requires its own platform. Therefore, the results are divided into two areas and compared
in matrices with the above-mentioned properties. In the first part, therefore, the
platforms that are used for processing are named. Processing means further use of
the data, for example, for integration into work processes. In the second part, the
platforms that are used for administration are named which means that the platform
supports the functions of documentation and long-term archiving. In the following
study, the identified platforms are therefore presented in a nonexhaustive enumeration.
Platforms for Workflow Provenance
In biomedical disciplines, a workflow is typically used to perform complex data processing
tasks. Workflow provenance refers to the recording of all derivations that led to
the final output of the workflow. Thus, all steps of the workflow creation should
be made transparent and reproducible. In addition to traceability, workflow provenance
can also help to avoid redundant work; because the input of the individual steps is
documented, they do not have to be re-entered each time.[14] The following platforms seem to be best suited for workflow provenance in processing.
Galaxy (2010)
It is a web-based open-source platform for computational biomedical research, which
can also be used by researchers with no programming experience. It was developed by
the Nekrutenko Laboratory at the Center for Comparative Genomics and Bioinformatics
at Penn State University, the Taylor Laboratory at Johns Hopkins University, and the
Goecks Laboratory at Oregon Health & Science University, along with contributions
from the community. It allows analysis workflows to be executed on the basis of the
researchers' data, shared across teams or other researchers to repeat the same analysis.
It thus supports reproducibility and simplifies the exchange of data and results.
Furthermore, the user does not need to compile and install a software.[15]
[16]
[17]
[18]
GenePattern (2006)
It is a web-based open-source software package for computer-aided bioinformatics that
is developed and distributed at the Broad Institute. It offers a variety of calculation
methods for the analysis of genomic data. It enables researchers to analyze data and
visualize their results. Reproducibility is ensured because the analysis methods used,
the sequence of method applications, and the parameter settings are recorded. This
ensures the data and workflow provenance. The extensible architecture makes it easy
to add additional analysis and visualization modules so researchers can regularly
access new methods.[19]
[20]
[21]
Platforms for Data Provenance
In contrast to workflow provenance, data provenance provides a more detailed view
of the integrity of the data. The focus is on the description of the digital object
itself (where it comes from, how it has reached its current state, etc.). In the field
of bioinformatics, data provenance can help to create trust in a system, e.g. to ensure
verifiability.[22] The following platforms seem to be best suited for data prevention in document management
and long-term archiving.
BASE (2005)
BioArray Software Environment (BASE) is a web-based open-source laboratory informatics
application developed at Lund University. It serves as an annotatable microarray data
repository and analysis application that offers researchers both information management
and analysis. The system integrates biomaterial information, raw images, and extracted
data. The plug-in architecture provides data transformation, data display, and analysis
modules. Individual elements can be shared with other users within the database.[23]
[24]
[25]
[26]
Ensembl (1999)
It was developed as open-source software by the European Molecular Biology Laboratories-European
Bioinformatics with the aim of providing creation, maintenance, and updating of reference
genome annotation and comparative genomic resources. It thus offers a bioinformatic
framework for the organization of biological sequences of genomes. The functionalities
range from sequence analysis to data storage and visualization.[27]
[28]
[29]
ISA-Tools (2010)
The open-source framework, developed and distributed by Investigation, Study, and
Assay (ISA)-Tools, consists of five platform-independent components. ISA is an acronym
for “Investigation,” “Study” and “Assay” and describes a data model. These components
are each desktop applications and function both as an independent and a unified system.
The entire software package consists of the ISA-Tab (structuring and communication
of metadata), ISAcreator (creation, editing, and import of experimental metadata sets),
ISA configurator (editing input fields in ISAcreator), ISA validator (checking requirements
and links of the finished data in ISA-Tab), and ISA converter (export to public repositories).
The main objective is the administration and storage of experimental metadata from
the fields of life sciences, environmental sciences, and biomedical sciences. It is
based on the ISA framework of the same name for data models and serialization, so
that the resulting data and findings are reproducible and reusable.[30]
[31]
LabKey Server (2006)
The web-based open source software was developed under the name Computational Proteomics
Analysis System at the Fred Hutchinson Cancer Research Center. Since 2011 this solution
is distributed by LabKey and available under LabKey Server. Based on the data entered,
projects are created in which the participating researchers can collaborate. The data
can be integrated either from different databases or from XLSM files. The objective
is the integration, analysis, and sharing of biomedical data, especially for very
large amounts of data.[32]
[33]
openBIS (2007)
The web-based open source software open Biology Information System (openBIS) is developed
and distributed at ETH Zurich. But it is also offered through the FAIRDOM initiative.
It consists of an information system which extracts (meta-)data from the measuring
devices used and uploads them to the system via drop boxes (directories). After the
import, the (meta-)data can be further processed and analyzed. The main objective
is to support biological research data workflows from source to publication. This
makes the data provenance more secure and allows researchers to access, edit, and
share (meta-)data.[34]
[35]
SABIO-RK (2006)
System for the Analysis of Biochemical Pathways-Reaction Kinetics (SABIO-RK) is a
web-based database and stores comprehensive information about biochemical reactions
and their kinetic properties. It is distributed by Hits gGmbH. It is free of charge
except for commercial use. Unlike other databases, it is reaction-oriented and contains
quantitative information about reaction dynamics. The data are manually entered and
commented by the researchers. The consistency of the data is checked automatically.
The database also supports data export, e.g., for further modeling tools, including
annotation using the Systems Biology Markup Language.[36]
[37]
SEEK (2009)
This web-based open-source solution consists of several tools and supports as cataloguing
and community platform in the administration, exchange, and research of (meta-)data
and models in systems biology. Originally developed for SysMO, a pan-European consortium
for the study of molecular processes in microorganisms, it is now distributed by the
FAIRDOM initiative. The access driven platform enables researchers to collaborate
on projects on a daily basis and to publish data and models. The plug-in architecture
allows the linking of experiments, their protocols, data, and models. This preserves
associations between datasets, individuals, and organizations.[38]
[39]
Conclusion
The variety of possible platforms for documentation, long-term archiving, and processing
of clinical and omics data makes the search for a suitable platform a challenge; in
particular because the solutions can be either too general or too specific.
Therefore, this work should give a structured overview of the platforms which basically
exist, to make them comparable. In addition, it was also shown which platforms seem
to take the FAIR principles into account. Thus, a first preselection for a suitable
platform can be made. Within the scope of this literature research, it can therefore
not be ruled out that further potentially suitable platforms may exist. In addition,
the focus of the search was on open source platforms, commercial products were not
considered. Furthermore, the ongoing development of the platforms cannot guarantee
that the information provided here is up to date or how long this will remain so.
Based on this literature research, it was not possible to determine to what extent
the platforms mentioned were actually implementing the FAIR principles. It was only
possible to identify indications of compliance with the principles. The same also
applies to the principles extended to FAIR-Health. The fulfilment of the principles
by the identified platforms must therefore be assessed on a case-by-case basis, as
well as their suitability for the individual needs of researchers and organizations.
A further result of this literature research was the finding that a distinction is
made between the fulfilment of provenance between data and work processes. Therefore,
it seems reasonable to select at least two platforms to ensure the provenance from
the time of documentation to processing and (long-term) archiving.
Related Work
In this review, a total of nine platforms were presented. Similarities and differences
between the platforms in the two matrices were shown. In addition, indications of
the compliance with the FAIR principles were identified.
Parts of the matrix created in this paper were adopted from Canuel et al. The study
focused on the analysis of clinical and omics data itself, e.g., safety and interoperability
functions. The ability to FAIR documentation or long-term archiving was not part of
his scope. There was also a gap in terms of further processing of the data.
Wruck et al presented data management strategies for large-scale projects in systems
biology. Parts of the matrix created in this paper were adopted. The focus of Wruck's
work was more on the technical perspective than on a general overview of existing
solutions. The FAIR principles were also not addressed because they had not yet been
developed at the time of writing.
Rating of the Identified Platforms
In order of documentation, further processing and (long-term) archiving the data models
of clinical and omics data from translational research platforms such as cBioPortal
and tranSMART in accordance with the FAIR principles, the identified platforms are
now structured. Starting with those that we consider will meet our needs the most.
SEEK convinces with its functionality for linking data records, persons, and organizations
within projects. This ensures the relationships between the objects and thus the data
provenance. The platform also offers the possibility of daily collaboration. As part
of the FAIRDOM initiative, it can be assumed that the FAIR principles are taken into
account.
Galaxy processes and shares the uploaded data using analysis workflows. Galaxy is named
by the FAIRDOM initiative. This already indicates that the processed data will be
used according to FAIR principles. A further point is the user friendliness of the
system toward noninformaticians. Although the field of bioinformatics requires a certain
affinity to computer science, other end users may (at least partially) come from other,
noninformatics disciplines such as human biology.
openBIS supports biological research, from the generation to the publication of research
data. The permanent availability of the data within the research process ensures the
data provenance. Furthermore, by participating in the FAIRDOM initiative, it can be
concluded that the FAIR principles are taken into account.
GenePattern offers a variety of calculation options for the analysis of the uploaded data. The
data provenance is additionally saved because parameter settings are recorded in addition
to the method used. The software package apparently also takes the FAIR principles
into account.
ISA-Tools convince with its relatively high flexibility: only the desired modules can be used
without having to give up their full functionality. For example, the use of ISAcreator
can be dispensed with if the input fields provided in ISA-Tab are sufficient for the
metadata. On the other hand, you have to implement ISAcreator to edit the input fields.
A further important criterion also seems to fulfil ISA-Tool: the compliance with the
FAIR principles.
Ensembl also takes the FAIR principles into account, making it a potentially usable resource.
However, specialization in genome sequences seems to limit its application to this
discipline. Whether and to what extent this solution can be applied to other fields
of bioinformatics must be examined in each individual case.
LabKey organizes datasets in projects, so researchers can potentially collaborate better
than if the datasets were “loosely” distributed. Of particular interest is the functionality
of importing data from different databases and files. Within this literature search,
however, no clues could be found as to whether and to what extent the FAIR principles
were taken into account. This point would have to be checked in a further investigation.
BASE is a software particularly developed for laboratory applications. The focus of this
application seems to be limited to sequencing and microarray experiments. A potential
use of this solution would therefore initially be limited to this area only. Furthermore,
no evidence could be found to document participation in a FAIR initiative or consideration
of the principles.
SABIO-RK provides a reaction-oriented database for biochemical reactions. Thus, the potential
field of application is also strongly limited to this discipline. This database considers
the FAIR principles, at least there are indications for it in the literature.
Future Work
The development of the FAIR principles creates a regulatory framework that facilitates
the sustainable reusability of such data. This ensures the provenance of both the
data and the workflow. Initiatives have been formed for their spreading and are already
being considered by several platforms. Together with the further development of the
principles, this underlines the importance of this subject.
However, it has also been shown that efforts have been made and are being made to
improve the existing provenance in bioinformatics. Given the variety of platforms
and the ever-increasing availability of clinical and omics data, a FAIR solution is
essential for the consistent and sustainable reuse of this data.
In the next step the identified platforms, probably SEEK/openBIS for documentation
and archiving and Galaxy for processing, will be linked with the translational research
platforms cBioPortal and tranSMART established at the university hospital. This will
ensure a FAIR and translational data management.