Here you will learn about Process Management Plans (PMPs), which are used to describe eScience experiments. PMPs complement the description of scientific data by taking a process-centric view: data is seen simply as the result of underlying processes such as capture, (pre-)processing, transformation, integration and analysis. The general objective of PMPs is to foster the identification, description, sharing and preservation of scientific processes.


In the era of Research Infrastructures and Big Data, sophisticated data management practices are becoming essential building blocks of successful science. Most practices follow a data-centric approach that does not take into account the processes which created, analysed and presented the data. This limits the possibilities for reliable verification of results. Nor does it guarantee that research can be reused, which is one of the key aspects of credible data-driven science. For that reason we propose Process Management Plans, which focus on the identification, description, sharing and preservation of entire scientific processes. They enable verification and later reuse of the result data and processes of scientific experiments.


The PMP is created at the time of process design and is maintained and updated during the lifetime of the process by various stakeholders. The proposed lifecycle of the PMP is shown in the figure below.

[Figure: Lifecycle]

At the beginning, when the scientist applies for the research grant, an initial version of the PMP is created. It provides a high-level overview of the processes used in the experiment. When the proposal is accepted and the actual research starts, the users work with whatever tools and methods they prefer; the PMP imposes no burden on them. However, when they reach a milestone in the project, for example when they publish some of the results or hand over an intermediate result to another scientist, the PMP has to be updated, i.e. filled in with information that describes the experiment. When the project is finished and the data and processes are deposited in a repository, all of this information must also be provided in the PMP.

The lifecycle of the PMP does not end at this stage. While the data and processes are kept in an archive, several digital preservation actions may be applied to them, e.g. migration or emulation. All these actions have to be reflected in the PMP, because they modify the original process. Finally, when the process is redeployed and reused in a new experiment, the information stored in the PMP can be transferred to a new PMP created for that experiment. Thus the whole lifecycle of the process, from process design to process reuse, is fully documented.
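The lifecycle described above can be read as a small state machine. The following Python sketch is only an illustration under that reading: the stage names, the allowed transitions and the `PmpLifecycle` class are assumptions made for this example and are not part of the PMP proposal itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Stage names are hypothetical labels for the steps in the text.
    PROPOSAL = "proposal"          # initial PMP written with the grant application
    RESEARCH = "research"          # ongoing work, no PMP burden on the researchers
    MILESTONE = "milestone"        # publication or hand-over: PMP is updated
    DEPOSIT = "deposit"            # project finished, data and processes deposited
    PRESERVATION = "preservation"  # e.g. migration or emulation in the archive
    REUSE = "reuse"                # process redeployed in a new experiment


# Allowed transitions, read off the prose description (an assumption).
TRANSITIONS = {
    Stage.PROPOSAL: {Stage.RESEARCH},
    Stage.RESEARCH: {Stage.MILESTONE, Stage.DEPOSIT},
    Stage.MILESTONE: {Stage.RESEARCH, Stage.DEPOSIT},
    Stage.DEPOSIT: {Stage.PRESERVATION, Stage.REUSE},
    Stage.PRESERVATION: {Stage.PRESERVATION, Stage.REUSE},
    Stage.REUSE: set(),
}


@dataclass
class PmpLifecycle:
    stage: Stage = Stage.PROPOSAL
    history: list = field(default_factory=list)

    def advance(self, new_stage, note):
        """Move to a new stage and record why, so every change is documented."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"cannot move from {self.stage.value} to {new_stage.value}"
            )
        self.history.append((self.stage, new_stage, note))
        self.stage = new_stage

    def derive_for_reuse(self):
        """Create the PMP of a new experiment, carrying over the recorded history."""
        if self.stage is not Stage.REUSE:
            raise ValueError("process has not been redeployed for reuse yet")
        return PmpLifecycle(history=list(self.history))
```

Recording a note with every transition mirrors the requirement that preservation actions and milestone updates are reflected in the PMP rather than applied silently.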


PMPs address the needs of several stakeholders. The figure below presents the affected parties.

[Figure: Stakeholders]

Project applicants benefit by being able to better identify and plan the resources needed for the research. For example, if the process of transforming the experimental data assumes the use of proprietary software with an expensive license, this information will be revealed at an early stage and can be taken into account when applying for a grant.

Researchers benefit by working with better documented processes. This facilitates the sharing of results and eases the reuse of existing research. Moreover, they will need less time to get started when joining a new project, because the PMP provides useful information to them in a structured way.

From the point of view of funding bodies, PMPs safeguard the investment made in research by ensuring that research results are trustworthy and verifiable, and can be reused at later points in time. Furthermore, PMPs facilitate cooperation between projects, because they make it easier to reuse processes from other projects and to exploit the results of other funded projects. Thus PMPs lead to sustainable research and can save funding, which can then be allocated to new research projects.

Repositories that keep the deposited processes and data can better estimate the costs of curation and plan the actions needed to maintain the deposited resources. PMPs also support long-term preservation (keeping processes usable over time) and provide information on possible events that trigger necessary digital preservation activities.

PMPs also bring benefits to reusing parties, because their research can be accelerated by reusable processes. The reusing parties also have higher confidence that they can build on previous work, because reproducibility raises its quality. Furthermore, the scientists whose processes are reused in other experiments can gain more recognition and credit.

Contents of PMP

A PMP includes the following information:

  • Overview and context
  • Description of process and its implementation
    • Process description
    • Process implementation
    • Data used and produced by process
  • Preservation
    • Preservation history
    • Long term storage and funding
  • Sharing and reuse
    • Sharing
    • Reuse
    • Verification
    • Legal aspects
  • Monitoring and external dependencies
  • Adherence and Review
You can find more information in our paper [here].
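
The outline above maps naturally onto a nested data structure. The sketch below is one possible machine-readable skeleton, given only as an illustration: the section names are taken from the list, while the dictionary layout and the helper names `empty_pmp` and `missing_entries` are assumptions made for this example, not a format defined by the PMP authors.

```python
# Section and subsection titles as listed in the outline above.
PMP_SECTIONS = {
    "Overview and context": [],
    "Description of process and its implementation": [
        "Process description",
        "Process implementation",
        "Data used and produced by process",
    ],
    "Preservation": ["Preservation history", "Long term storage and funding"],
    "Sharing and reuse": ["Sharing", "Reuse", "Verification", "Legal aspects"],
    "Monitoring and external dependencies": [],
    "Adherence and Review": [],
}


def empty_pmp():
    """Return a PMP skeleton with one empty text slot per section or subsection."""
    pmp = {}
    for section, subsections in PMP_SECTIONS.items():
        if subsections:
            pmp[section] = {sub: "" for sub in subsections}
        else:
            pmp[section] = ""
    return pmp


def missing_entries(pmp):
    """List the sections and subsections that have not been filled in yet."""
    missing = []
    for section, content in pmp.items():
        if isinstance(content, dict):
            for sub, text in content.items():
                if not text.strip():
                    missing.append(f"{section} / {sub}")
        elif not content.strip():
            missing.append(section)
    return missing
```

A check such as `missing_entries` could be run at each milestone to see which parts of the plan still need to be written before results are handed over or deposited.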

Work in progress

We are currently working on automating the creation and verification of PMPs by extracting process characteristics automatically from the environment in which the process runs. A specific focus is on tool support that automates many of the documentation steps, in particular the capturing and monitoring of the components used in the process implementation. Moreover, we are currently evaluating the PMP with stakeholders from different scientific communities.
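
To give a feel for what automatic capture of process components might involve, here is a minimal Python sketch that records the interpreter, operating system and installed packages, and compares two such snapshots. It does not represent the tooling under development; the function names `capture_environment` and `changed_packages` and the snapshot layout are assumptions made for this illustration, and it uses only the Python standard library.

```python
import platform
import sys
from importlib import metadata


def capture_environment():
    """Collect basic facts about the interpreter, OS and installed packages."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with incomplete metadata
            packages[name] = dist.metadata.get("Version")
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items())),
    }


def changed_packages(old, new):
    """Report packages whose version differs between two snapshots.

    Each value is an (old_version, new_version) pair; None means the
    package is absent from that snapshot.
    """
    names = set(old["packages"]) | set(new["packages"])
    changes = {}
    for name in sorted(names):
        before = old["packages"].get(name)
        after = new["packages"].get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Comparing a snapshot stored in the PMP with a fresh one is one simple way to detect a change in an external dependency that may call for a preservation action.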

Useful links

