Tomasz Miksa - Information and Software Engineering

Current and Past Projects of Tomasz Miksa

OSTrails

Open Science Plan-Track-Assess Pathways

(2024 - 2027) OSTrails aims to advance processes and instruments for Planning, Tracking, and Assessing scientific knowledge production beyond the state of the art, working with various national and thematic contexts, improving existing infrastructure, and connecting key components. For the Plan stage, OSTrails aims to increase the efficacy of Data Management Plans, turning them from static narratives into living, interconnected "machine actionable" resources, making them the instrument of choice for improving the quality of research data management (RDM). For the Track stage, OSTrails is set to establish an open, interoperable and high-quality ecosystem of Scientific Knowledge Graphs, enriching them to become evidence of communities’ FAIR implementations. For the Assess stage, OSTrails aims to deliver modular and extendable FAIR tests, towards "machine actionable" metrics, complemented by user guidance embedded in tools assisting any stage of the research life cycle. Presented as the OSTrails Commons, the resulting methods, tools, services, guidance and training form the necessary building blocks to provide end-to-end solutions that help (i) researchers and research support personnel realise FAIR at any stage of the research life cycle, for any digital object; and (ii) research funding organisations, research performing organisations, and publishers drive the improvement of the quality of RDM for any shared, funded and published research product. OSTrails will lower the barriers to planning and practising FAIR research, moving the dial from FAIR assessment to FAIR-assisting practice, and will enhance traceability and improve evidence-based evaluation of research via a more networked scholarship. OSTrails is deeply rooted in the work of 38 partners, 22 research performing organisations, 5 ESFRI Clusters, and 24 pilots, acknowledging that there is no one-size-fits-all solution and that different national and thematic infrastructures have varying goals and priorities, as well as approaches to streamlining FAIR.

Shared RDM

Shared RDM Infrastructure and Services

(2023 - 2026) Research data management (RDM) is becoming increasingly important for researchers and requires a variety of supporting tools that can be used throughout the lifecycle of research data. In recent years, technical and organizational developments have already been advanced and are being used in various research areas (e.g. research data repositories, analysis platforms, electronic laboratory notebooks, and data stewardship programs). The aim of the project is to create a framework to offer selected tools and infrastructures in the field of RDM as shared services for selected Austrian universities and research institutions. This bundling of different expertise creates a sensible use of resources and promotes interoperability, standardization and the connection to international initiatives. The project is carried out in the spirit of the EOSC initiative and thus contributes to an even more reliable and easier re-use of research output. It creates a landscape of national RDM infrastructures and services that can be used as a use case/success story at international level and increases Austria’s visibility.

Skills4EOSC

Skills for the European Open Science Commons: Creating a Training Ecosystem for Open and FAIR Science

(2022 - 2025) Skills4EOSC brings together leading experiences of national, regional, institutional and thematic Open Science (OS) and Data Competence Centres from 18 European countries with the goal of unifying the current training landscape into a common and trusted pan-European ecosystem, in order to accelerate the upskilling of European researchers and data professionals in the field of FAIR and Open Data, data-intensive science and Scientific Data Management. Competence Centres (CCs) are seen as centres of gravity of OS and EOSC activities in their countries. These entities can be established national initiatives (as is the case of ICDI in Italy), initiatives under establishment (e.g. Austria, Greece and the Nordic countries), or organizations that make the leading or mandated contribution to OS activities nationally. CCs pool the expertise available within research institutions, universities and thematic and cross-discipline research infrastructures. They offer training and support, empowerment, lifelong learning, professionalization and resources to a variety of stakeholders, including not only researchers and data stewards, but also funders, decision makers, civil servants, and industry. Thanks to their position at the heart of the above-described multi-stakeholder landscape, the CCs represented by the Skills4EOSC partners play a pivotal role in national plans for Open Science and in the interaction with scientific communities. They also have close access to policy makers and the related funding streams. The Skills4EOSC project will leverage this reference role to establish a pan-European network of CCs on OS and data, coordinating the work done at the national level to upskill professionals in this field. The Skills4EOSC CC network will drive the co-creation of harmonised trainer accreditation pathways, academic and professional curricula and skills quality assurance, recognition frameworks, and learning material creation methodologies.

gAIa

FFG KIRAS: Predicting landslides - Development of hazard indication maps for landslides from consolidated inventory data

(2021 - 2023) The goal of the gAIa project is to generate hazard-warning maps for landslides. To this end, the project aims to develop a methodology to improve and extend existing landslide occurrence data with remote-sensing data. Such data originate from heterogeneous sources and will therefore be fused and harmonized to create a comprehensive data inventory. Based on the existing as well as newly generated remote-sensing data, a prediction model will be created using modern artificial intelligence (AI) techniques to estimate the probability of a landslide occurrence. This prediction model will serve as the basis for generating landslide-warning maps for two Austrian federal states (Lower Austria and Carinthia).

OBARIS

FFG IKT: Ontology-Based ARtificial Intelligence in the Environmental Sector

(2020 - 2022) In recent years, we have seen a renaissance in Artificial Intelligence (AI) research and the diffusion of AI into industry-strength applications, such as intelligent question answering systems or self-driving vehicles. On the one hand, technologies in the area of symbolic, semantic systems have emerged from Semantic Web research. This development culminated in the industrial adoption of Knowledge Graphs, which help to capture complex world knowledge in a reusable manner and make it possible to infer new knowledge from it. On the other hand, machine learning research has been extended to solve problems where it is important to capture latent knowledge by learning from weak signals. While advances have been made in both symbolic and subsymbolic AI research, further opportunities lie in the combined use of these two paradigms in ways that benefit from their complementary strengths, especially when applied in real-life settings that exhibit characteristics addressable by both paradigms. At this stage, there is still a lack of (i) a systematic understanding of categories of such semantic AI systems and (ii) technology stacks that enable the implementation of such systems. Finally, auditability, which is a necessary prerequisite for providing transparency, is a major concern in AI in general and in many practical application contexts in particular. In this setting, OBARIS aims to advance the state of the art in semantic AI systems by investigating conceptual aspects of these systems and by developing a technology stack that facilitates transposing these system types into concrete settings. In particular, we focus on two use cases from the environmental domain characterized by the need to perform complex analytics on collections of heterogeneous data sources.
Concretely, the project's goals are to (i) advance the understanding of typologies of semantic AI systems; (ii) establish a technology stack for creating such systems; (iii) produce guidelines and frameworks that enable building auditable, transparent and trustworthy systems; and (iv) validate the project findings by means of real-world use cases. To that end, the project follows a multidisciplinary approach that brings together research on Knowledge Graphs, Linked Data, Machine Learning, Auditability, and Environmental Informatics. The project goals will materialize in a number of concrete, innovative results including (i) a taxonomy of semantic AI systems that will foster the understanding of such system typologies by the research community and industry adopters alike; (ii) a semantic AI framework offering a technology stack for creating hybrid semantic - machine learning data processing pipelines constructed from intelligent building blocks for data acquisition, integration, and analytics; (iii) guidelines, frameworks and a research prototype that enable building and monitoring auditable, transparent and trustworthy semantic AI systems; and (iv) demonstrators that apply the developed methods and components within two use cases in the environmental domain. The involvement of Umweltbundesamt (U), the environmental specialist institution of the Austrian Federal Government, as a use case partner will give the project a strong impact, which should ultimately contribute to the general AI agenda of Austria.

WellFort

FFG Bridge: WellFort

(2019 - 2022) The WellFort project aims to research the basic mechanisms to: (a) provide secure storage for users' sensitive data, (b) deliver a trusted analysis environment for executing data analytics processes in a controlled privacy-preserving environment, (c) combine data from different companies for analysis while respecting user privacy and consent given.
Based on the project results, it will be possible to operate a trusted platform where companies can securely execute data analysis algorithms. A novelty of this approach is that companies do not get direct access to data, but only to data in aggregated or anonymised form. In addition, they can benefit from a large group of individuals who are potentially willing to share their data for research. Users, on the other hand, benefit from a platform for their data that respects privacy and security, and can contribute to research projects in a secure manner. Finally, scientific researchers gain a detailed source of microdata, provided data owners consent to their research proposals.

ExpCPS

Explainable Cyber Physical Systems

(2020 - 2021) The Explainable Cyber Physical Systems (ExpCPS) project addresses the challenge of increasing the explainability of future Cyber Physical Systems (CPS) so that they can explain their past, current, and future behavior (why certain actions or decisions are taken, how a certain goal can be achieved, etc.). The project goals are to develop solutions drawing on multiple disciplines, such as software engineering and systems engineering, covering (i) ontology integration, (ii) provenance capturing, and (iii) causality detection. The project is set in the domain of smart energy systems, particularly the smart grid, in conjunction with the BIFROST smart grid simulation environment.

VASQUA

WAW: Verification and Semantics for Data Quality Improvements (VaSQua)

(2020 - 2021) Enterango has developed an innovative travel planning and management tool for meetings and events that requires large amounts of data. Some of that data is not readily available and has to be integrated from different sources. These sources have a heterogeneous structure, might contain errors, are often incomplete and could be updated at any time. To create a sustainable and efficient data integration and cleaning process, Enterango teams up with the Information and Software Engineering team of TU Wien to apply the latest research insights to this real-world data problem. In this Verification and Semantics for Data Quality Improvements (VaSQua) project, a new data integration and workflow tracking process is described. First, data from different sources are integrated into a knowledge graph. The various datasets are enriched with metadata describing the history, origin, and processing of the data, as well as other features. The data and its metadata are then used to clean and correct data automatically; the remaining data is corrected manually by a human. The data integration process is novel in that, in line with the latest research, additional metadata findings are used to reduce the human work required during data cleaning.

FAIR Data Austria

BMBWF: FAIR Data Austria

(2020 - 2022) The FAIR Data Austria project is designed to strengthen knowledge transfer between universities, industry, and society, and supports the sustainable implementation of the European Open Science Cloud (EOSC). Within the project, implementation of the FAIR principles (which mandate that research data be Findable, Accessible, Interoperable, and Reusable) plays a major role. Observance of the FAIR principles is secured through 1) integrated data management aligned with generic and discipline-specific needs of researchers, 2) development of next-generation repositories for research data, code, and other research outputs, and 3) development of training and support services for efficient research data management. FAIR Data Austria thereby offers tools to complement the Austrian Data Lab and Services as well as RIS Synergy projects. Supporting the entire data lifecycle, from data generation all the way to data archiving, with the appropriate tools and expertise is essential to achieve efficient research data management according to the FAIR principles, a process that can only be successful when supported by all Austrian higher education institutions (HEIs). The FAIR Data Austria project therefore supports the collaboration of Austrian universities in developing coherent services for research data, thereby securing Austria's position within the international research landscape.

FAIR2environment

TUW Digitalisierung: FAIR2environment

(2020 - 2021) The FAIR2environment project focuses on the following three goals. First, environmental data from the disciplines of Earth observation, atmospheric chemistry, and hydrology, which are of interest to the participating faculties for further research, are checked for compatibility with the FAIR principles, and their metadata are harmonized. Second, the harmonized environmental data are ingested into an existing repository at TU Wien. This is initially done manually. Automated, scalable processes for uploading the various environmental data are then developed, with the aim of later incorporating alternative data or data that have yet to be processed. The third goal is access for TU Wien users that is tailored to the ingested environmental data. With these goals, the project intends to demonstrate the first use of an existing repository at TU Wien that is optimized for geodata and for interdisciplinary research questions.

IDSDL

FFG: Innovation Course Program Data Science and Deep Learning

(2017-2020) The potential of Big Data analytics in companies is very high. Companies that implement data-driven innovation have 5%-10% higher growth in productivity than those that do not. The application of Big Data analytics in the EU between 2014 and 2020 will increase GNP (GDP) by 1.9%. Due to the high demand, IDC estimates that 770,000 Data Scientist jobs will be unfilled in the EU in 2020. The innovation course "Data Science and Deep Learning" strengthens the knowledge of the latest approaches in the field of data science in companies and thus enables the implementation of data-driven innovation.

e-Infrastructures Austria Plus

BMBWF: e-Infrastructures Austria Plus

(2017-2019) The project "e-Infrastructures Austria Plus" is a project of nine Austrian universities funded by the Austrian Federal Ministry of Education, Science and Research. The aim of the project is the coordinated development of an Austrian network for the establishment and further development of common e-Infrastructures by bundling resources and existing knowledge.

Flowsense

OEAD: Flowsense

(2017 - 2019) Recently, important breakthroughs in science and industry have been achieved by applying data analytics methods to large datasets. The data analysis process is interactive and iterative and requires cooperation between data scientists and domain experts. Additionally, implementing data analysis processes requires complex infrastructure with many technologies and software/hardware dependencies, which can have an impact on the final results of the analysis. The aim of the proposed project is to define a semantic framework for the description of data analysis processes, covering the business and data understanding phase, data pre-processing and data modelling, model evaluation, and the description of the underlying process execution infrastructure. The main goal of the proposed framework is to support reusability and replicability of data analysis processes and to improve the efficiency of communication between data scientists and domain experts.

FAIRness for Life Science Data in Austria

FWF: FAIRness for Life Science Data in Austria

(2017 - 2019) Management, integration, and reuse of research data are key for innovation and the creation of new knowledge. Although numerous data sources such as ChEMBL, PubChem, UniProt, and the Protein Data Bank are available in the public domain, most of the data created in publicly funded research projects end up in PDF-based supplementary files of publications. At best they are additionally deposited in university repositories such as PHAIDRA, or on the website of the principal investigator. Although in principle public, they are quite hidden and not directly accessible to search engines. In order to push the demand for open data, the so-called FAIR principles for data (Findable, Accessible, Interoperable, Reusable) were introduced. These four foundational principles should guide data producers and publishers to ensure transparency, reproducibility, and reusability of data, methods, algorithms, and workflows. Within this project we will perform a pilot study for the data created in two multi-partner collaborative projects in the life science domain in order to make them available via PHAIDRA, the digital asset management system for long-term archiving at the University of Vienna: SFB35 - Transmembrane Transporter in Health and Disease, and MolTag - Molecular Drug Targets. In particular, we will adapt the metadata scheme in PHAIDRA in order to render the data at least partly FAIR.

openEO

H2020 Project: A Common, Open Source Interface between Earth Observation Data Infrastructures and Front-End Applications

(2017-2020) openEO is an H2020 project funded under call EO-2-2017: EO Big Data Shift. The capabilities of the latest generation of Earth observation satellites to collect large volumes of diverse and thematically rich data are unprecedented. To exploit these valuable data sets, many research and industry groups have started to shift their processing into the cloud. Although the functionalities of existing cloud computing solutions largely overlap, they are all custom-made and tailored to specific data infrastructures. This lack of standards makes it hard for end users and application developers not only to develop generic front-ends, but also to compare cloud offerings by running the same analysis against different cloud back-ends. To solve this, a common interface is needed that allows end and intermediate users to query cloud-based back offices and carry out computations on them in a simple way. The openEO project will design such an interface, implement it as an open source community project, bind it to generic analytics front-ends and evaluate it against a set of relevant Earth observation cloud back offices. The openEO interface will consist of three layers of Application Programming Interfaces (APIs): a core API for finding, accessing, and processing large datasets; driver APIs to connect to back offices operated by European and worldwide industry; and client APIs for analysing these datasets using R, Python and JavaScript. To demonstrate the capability of the openEO interface, four use cases based chiefly on Sentinel-1 and Sentinel-2 time series will be implemented. openEO will simplify the use of cloud-based processing engines, allow switching between cloud-based back office providers and comparing them, and enable reproducible, open Earth observation science.
Thereby, openEO reduces the entry barriers for the adoption of cloud computing technologies by a broad user community and paves the way for the federation of infrastructure capabilities.

ROMOR

Erasmus+ Capacity Project: Research Output Management through Open Access Institutional Repositories

(2016-2019) ROMOR aims over the course of three years to build capacity in research output management in four leading Palestinian (PS) higher education institutions (HEIs) by establishing Open Access Institutional Repositories (OAIR). The training required to establish these repositories, followed by their implementation, population and management, will be the core of the project. Learning outcomes from these activities will also be shared and disseminated in a variety of ways, including the establishment of mechanisms to proactively assist other institutions in setting up and managing repositories. The project aims not only to improve the visibility and management of scientific research, but also to support advocacy for open access to research outputs and to foster scholarly communication and coordination between PS HEIs.

TIMBUS

Digital Preservation for Timeless Business Processes and Services (Integrated Project (FP7))

TIMBUS will endeavour to enlarge the understanding of digital preservation (DP) to include the set of activities, processes and tools that ensure continued access to the services and software necessary to produce the context within which information can be accessed, properly rendered, validated and transformed into knowledge. One of the fundamental requirements is to preserve the functional and non-functional specifications of services and software, along with their dependencies.

4C

Collaboration to Clarify the Costs of Curation (FP7 project)

(2013-2015) 4C will help organisations across Europe to invest more effectively in digital curation and preservation. Research in digital preservation and curation has tended to emphasise the cost and complexity of the task in hand. 4C reminds us that the point of this investment is to realise a benefit, so our research must encompass related concepts such as risk, value, quality and sustainability. Organisations that understand this will be more able to effectively control and manage their digital assets over time, but they may also be able to create new cost-effective solutions and services for others.
