Open Source Secure Data Infrastructure and Processes (OSSDIP) Supporting Fully Controlled Data Visiting for Sensitive Data

Supported by EOSCSec, in cooperation with the COVID19 Future Operations Board.

Very early in the COVID19 pandemic, the need for solid, data-driven decisions was recognized. During meetings of the COVID19 Future Operations Clearing Board, a national expert platform, it became evident that access to essential data was missing. This was primarily because data owners were unable to share their data with experts, either for privacy reasons (medical, social science) or because of the massive risk involved in sharing commercially sensitive data. To break this deadlock, TU Wien set up, within two weeks, a high-security data infrastructure and corresponding processes that allow data owners to provide

  • highly selective access (data visiting)
  • to specific (fine-granular or aggregated, fingerprinted) subsets of data
  • for identified individuals
  • for limited periods of time
  • to answer precisely defined questions accepted by the data owner
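
The conditions listed above can be read as the fields of a single access grant. The following is a minimal sketch of that idea in Python, not the OSSDIP implementation: the class `AccessGrant`, its field names, and the example values are all hypothetical and chosen only to illustrate how each condition could be checked together.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of one data-visiting permission (not an OSSDIP type)."""
    analyst_id: str          # the identified individual
    dataset_subset: str      # the specific subset made visitable
    question: str            # the precisely defined question
    approved_by_owner: bool  # the data owner's explicit acceptance
    valid_from: datetime     # start of the limited access period
    valid_until: datetime    # end of the limited access period

    def permits(self, analyst_id: str, dataset_subset: str, at: datetime) -> bool:
        """Return True only if every condition of the grant holds at time `at`."""
        return (
            self.approved_by_owner
            and analyst_id == self.analyst_id
            and dataset_subset == self.dataset_subset
            and self.valid_from <= at < self.valid_until
        )


# Illustrative values only; none of these identifiers come from the project.
start = datetime(2021, 1, 1, 9, 0)
grant = AccessGrant(
    analyst_id="analyst-01",
    dataset_subset="mobility/aggregated-weekly",
    question="How did weekly mobility change over time?",
    approved_by_owner=True,
    valid_from=start,
    valid_until=start + timedelta(days=14),
)

# Same analyst, same subset, inside the access period.
print(grant.permits("analyst-01", "mobility/aggregated-weekly", start + timedelta(days=3)))
# A different individual is refused.
print(grant.permits("analyst-02", "mobility/aggregated-weekly", start + timedelta(days=3)))
# The same analyst is refused once the period has ended.
print(grant.permits("analyst-01", "mobility/aggregated-weekly", start + timedelta(days=20)))
```

In this sketch a request is allowed only when all conditions hold at once, which mirrors the intent that access is selective, personal, time-limited and tied to an accepted question.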

    COVID19 Future Operation Secure Data Infrastructure at TU Wien
    Overview of the system architecture as currently deployed

    This infrastructure, and specifically its fast set-up and deployment to support the work of the COVID19 Future Operations Board, was possible because we could build on experience gained from operating a very similar infrastructure in the health care sector for many years as part of the DEXHELPP project.

    As this infrastructure was set up and deployed in less than two weeks, it was highly "handcrafted" and targeted to the specific needs and local set-up. Yet the availability of such an infrastructure has attracted the interest of several parties who would like to provide such a data hosting / data visiting solution on their own. We have thus been granted funding to package and release these infrastructure components and the corresponding documentation. This allows other parties to set up and operate this infrastructure (or parts of it) in their own environment, so that they maintain complete control over who is able to work with which parts of their data within their own server infrastructure, without having to hand over data to external partners.

    The goal of this project, thus, is to clean up, document and enhance this infrastructure to provide a fully documented, entirely open-source based reference implementation of a secure data infrastructure (OSSDIP) supporting data visiting. This will allow institutions to quickly set up and deploy a similar solution to provide access to their data. It addresses the RDA COVID19 Recommendations that "Measures should be taken in order to organise the sharing of data and trial documents in a suitable, trustworthy and secure data repository", and provides some core functionality of the Safe Setting component of Trusted Research Environments as set out in the Green Paper of the UK Health Research Data Alliance. It allows data that cannot be shared via existing COVID19 data portals to be used for analysis by enabling data owners to make their data visitable and usable in a fully controlled manner. It is thus meant to complement existing Open Data portals and support access to sensitive data that cannot be shared openly.

    Reference Implementation

    For details on this project see the main project website at ossdip.at.

    The reference implementation as well as documentation of the set-up procedure has been released and is available for evaluation at:

    https://gitlab.tuwien.ac.at/martin.weise/ossdip

    A description of the architecture as well as design of the underlying processes is available on Zenodo:

    https://zenodo.org/record/4632903

    (The set-up, as described, caters to the more complex case of a third party hosting the infrastructure both for the data owner and for analysts. In settings where the data owner operates the infrastructure directly, the processes as well as the set-up can be simplified, eliminating the need for the data ingest procedures via the Data Owner VMs.)

    Contact: Andreas Rauber