Current and Past Projects of Tomasz Miksa

OBARIS

FFG IKT: Ontology-Based ARtificial Intelligence in the Environmental Sector

(2020 - 2022) In recent years, we have seen a renaissance in Artificial Intelligence (AI) research and the diffusion of AI into industry-strength applications, such as intelligent question-answering systems or self-driving vehicles. On the one hand, technologies in the area of symbolic, semantic systems have emerged from Semantic Web research. This development culminated in the industrial adoption of Knowledge Graphs, which help to capture complex world knowledge in a reusable manner and make it possible to infer new knowledge from it. On the other hand, machine learning research has been extended to solve problems where it is important to capture latent knowledge by learning from weak signals. While advances have been made in both symbolic and subsymbolic AI research, further opportunities lie in the combined use of these two paradigms in ways that benefit from their complementary strengths, especially when applied in real-life settings that exhibit characteristics addressable by both paradigms. At this stage, there is still a lack of (i) a systematic understanding of categories of such semantic AI systems and (ii) technology stacks that enable implementation of such systems. Finally, auditability, which is a necessary prerequisite for providing transparency, is a major concern in AI in general and in many practical application contexts in particular. In this setting, OBARIS aims to advance the state of the art in semantic AI systems both by investigating conceptual aspects of these systems and by developing a technology stack that facilitates transposing these system types into concrete settings. In particular, we focus on two use cases from the environmental domain characterized by the need to perform complex analytics on collections of heterogeneous data sources.
Concretely, the project's goals are to (i) advance the understanding of typologies of semantic AI systems; (ii) establish a technology stack for creating such systems; (iii) produce guidelines and frameworks that enable building auditable, transparent and trustworthy systems; and (iv) validate the project findings by means of real-world use cases. To that end, the project follows a multidisciplinary approach that brings together research on Knowledge Graphs, Linked Data, Machine Learning, Auditability, and Environmental Informatics. The project goals will materialize in a number of concrete, innovative results including (i) a taxonomy of semantic AI systems that will foster the understanding of such system typologies by the research community and industry adopters alike; (ii) a semantic AI framework offering a technology stack for creating hybrid semantic - machine learning data processing pipelines constructed from intelligent building blocks for data acquisition, integration, and analytics; (iii) guidelines, frameworks and a research prototype that enable building and monitoring auditable, transparent and trustworthy Semantic AI Systems; and (iv) demonstrators that apply the developed methods and components within two use cases in the environmental domain. The involvement of Umweltbundesamt (U), the environmental specialist institution of the Austrian Federal Government, as a use case partner will strengthen the impact of the project, which should ultimately contribute to the general AI agenda of Austria.

WellFort

FFG Bridge: WellFort

(2019 - 2022) The WellFort project aims to research the basic mechanisms to: (a) provide secure storage for users' sensitive data, (b) deliver a trusted analysis environment for executing data analytics processes in a controlled privacy-preserving environment, and (c) combine data from different companies for analysis while respecting user privacy and the consent given.
Based on the project results, it will be possible to operate a trusted platform where companies can securely execute data analysis algorithms. A novelty of this approach is that companies do not get direct access to the data, but only to aggregated or anonymised forms of it. In addition, they can benefit from a large group of individuals who are potentially willing to share their data for research. Users, on the other hand, benefit from a platform that respects the privacy and security of their data, and can contribute to research projects in a secure manner. Finally, scientific researchers gain a detailed source of microdata, provided that data owners consent to their research proposals.

VASQUA

WAW: Verification and Semantics for Data Quality Improvements (VaSQua)

(2020 - 2021) Enterango has developed an innovative travel planning and management tool for meetings and events that requires large amounts of data. Some of that data is not readily available and has to be integrated from different sources. These sources have a heterogeneous structure, might contain errors, are often incomplete, and could be updated at any time. To create a sustainable and efficient data integration and cleaning process, Enterango teams up with the Information and Software Engineering team of the TU Wien to apply the latest research insights to this real-world data problem. In this Verification and Semantics for Data Quality Improvements (VaSQua) project, a new data integration and workflow tracking process is described. First, data from different sources are integrated into a knowledge graph. The various datasets are enriched with metadata describing the history, origin, and processing of the data, as well as other features. The data and its metadata are then used to clean and correct the data automatically, and the remaining data is corrected manually by a human. The data integration process is novel in that, in line with the latest research, it uses the additional metadata to reduce the human work required during data cleaning.

FAIR Data Austria

BMBWF: FAIR Data Austria

(2020 - 2022) The FAIR Data Austria project is designed to strengthen knowledge transfer between universities, industry, and society and supports the sustainable implementation of the European Open Science Cloud (EOSC). Within the project, implementation of the FAIR principles (which mandate that research data be Findable, Accessible, Interoperable, and Reusable) plays a major role. Adherence to the FAIR principles is secured through 1) integrated data management aligned with the generic and discipline-specific needs of researchers, 2) development of next-generation repositories for research data, code, and other research outputs, and 3) development of training and support services for efficient research data management. FAIR Data Austria thereby offers tools to complement the Austrian Data Lab and Services as well as RIS Synergy projects. Supporting the entire data lifecycle - from data generation all the way to data archiving - with the appropriate tools and expertise is essential to achieve efficient research data management according to the FAIR principles, a process that can only be successful when supported by all Austrian HEIs. The FAIR Data Austria project therefore supports the collaboration of Austrian universities in developing coherent services for research data, thereby securing Austria's position within the international research landscape.

FAIR2environment

TUW Digitalisierung: FAIR2environment

(2020 - 2021) The FAIR2environment project focuses on the following three goals. First, environmental data from the disciplines of Earth observation, atmospheric chemistry, and hydrology that the participating faculties need for further research are checked for compliance with the FAIR principles, and their metadata are harmonised. Second, the harmonised environmental data are ingested into an existing repository at TU Wien. This is initially done manually. Automated, scalable upload processes for the various environmental data are then developed, with the aim of later incorporating data that are still to be processed as well as alternative data. The third goal is access for TU Wien users, tailored to the ingested environmental data. With these goals, the project is intended to demonstrate the first use of an existing repository at TU Wien, optimised for geodata and for interdisciplinary research questions.

IDSDL

FFG: Innovationslehrgang (Innovation Course Program) Data Science and Deep Learning

(2017-2020) The potential of Big Data analytics in companies is very high. Companies that implement data-driven innovation have 5%-10% higher growth in productivity than those that do not. The application of Big Data analytics in the EU between 2014 and 2020 will increase GNP (GDP) by 1.9%. Due to the high demand, IDC estimates that 770,000 Data Scientist jobs will be unfilled in the EU in 2020. The innovation course "Data Science and Deep Learning" strengthens the knowledge of the latest approaches in the field of data science in companies and thus enables the implementation of data-driven innovation.

e-Infrastructures Austria Plus

BMBWF: e-Infrastructures Austria Plus

(2017-2019) "e-Infrastructures Austria Plus" is a project of nine Austrian universities, funded by the Austrian Federal Ministry of Education, Science and Research. The aim of the project is the coordinated development of an Austrian network for the establishment and further development of common e-Infrastructures by bundling resources and existing knowledge.

Flowsense

OEAD: Flowsense

(2017 - 2019) Recently, important breakthroughs in science and industry have been achieved by applying data analytics methods to large datasets. The data analysis process is interactive and iterative and requires cooperation between data scientists and domain experts. Additionally, implementing data analysis processes requires complex infrastructure with many technologies and software/hardware dependencies, which can have an impact on the final results of the analysis. The aim of the proposed project is to define a semantic framework for the description of data analysis processes, covering the business and data understanding phase, data pre-processing and data modelling, model evaluation, and a description of the underlying process execution infrastructure. The main goal of the proposed framework is to support the reusability and replicability of data analysis processes and to improve the efficiency of communication between data scientists and domain experts.

FAIRness for Life Science Data in Austria

FWF: FAIRness for Life Science Data in Austria

(2017 - 2019) Management, integration, and reuse of research data are key for innovation and the creation of new knowledge. Although numerous data sources such as ChEMBL, PubChem, UniProt, and the Protein Data Bank are available in the public domain, most of the data created in publicly funded research projects end up in PDF-based supplementary files of publications. At best they are additionally deposited in university repositories such as PHAIDRA, or on the website of the principal investigator. Although in principle public, they are quite hidden and not directly accessible to search engines. In order to push the demand for open data, the so-called FAIR principles for data (Findable, Accessible, Interoperable, Reusable) were introduced. These four foundational principles should guide data producers and publishers to ensure transparency, reproducibility, and reusability of data, methods, algorithms, and workflows. Within this project we will perform a pilot study for the data created in two multi-partner collaborative projects in the life science domain in order to make them available via PHAIDRA, the digital asset management system for long-term archiving at the University of Vienna: SFB35 - Transmembrane Transporter in Health and Disease, and MolTag - Molecular Drug Targets. In particular, we will adapt the metadata scheme in PHAIDRA in order to render the data at least partly FAIR.

openEO

H2020 Project: A Common, Open Source Interface between Earth Observation Data Infrastructures and Front-End Applications

(2017-2020) openEO is an H2020 project funded under call EO-2-2017: EO Big Data Shift. The capabilities of the latest generation of Earth observation satellites to collect large volumes of diverse and thematically rich data are unprecedented. To exploit these valuable data sets, many research and industry groups have started to shift their processing into the cloud. Although the functionalities of existing cloud computing solutions largely overlap, they are all custom-made and tailored to the specific data infrastructures. This lack of standards makes it hard not only for end users and application developers to develop generic front-ends, but also to compare the cloud offerings by running the same analysis against different cloud back-ends. To solve this, a common interface is needed that allows end and intermediate users to query cloud-based back offices and carry out computations on them in a simple way. The openEO project will design such an interface, implement it as an open source community project, bind it to generic analytics front-ends and evaluate it against a set of relevant Earth observation cloud back offices. The openEO interface will consist of three layers of Application Programming Interfaces, namely a core API for finding, accessing, and processing large datasets, driver APIs to connect to back offices operated by European and worldwide industry, and client APIs for analysing these datasets using R, Python and JavaScript. To demonstrate the capability of the openEO interface, four use cases based chiefly on Sentinel-1 and Sentinel-2 time series will be implemented. openEO will simplify the use of cloud-based processing engines, allow switching between cloud-based back office providers and comparing them, and enable reproducible, open Earth observation science.
Thereby, openEO reduces the entry barriers to the adoption of cloud computing technologies by a broad user community and paves the way for the federation of infrastructure capabilities.

ROMOR

Erasmus+ Capacity Project: Research Output Management through Open Access Institutional Repositories

(2016-2019) Over the course of three years, ROMOR aims to build capacity in research output management at four leading Palestinian (PS) higher education institutions (HEIs) by establishing Open Access Institutional Repositories (OAIR). The training required to establish these repositories, followed by their implementation, population and management, will be the core of the project. Learning outcomes from these activities will also be shared and disseminated in a variety of ways, including the establishment of mechanisms to proactively assist other institutions in setting up and managing repositories. The project aims not only to improve the visibility and management of scientific research, but also to support advocacy for open access to research outputs and to foster scholarly communication and coordination between PS HEIs.

TIMBUS

Digital Preservation for Timeless Business Processes and Services (Integrated Project (FP7))

TIMBUS will endeavour to enlarge the understanding of digital preservation (DP) to include the set of activities, processes and tools that ensure continued access to the services and software necessary to produce the context within which information can be accessed, properly rendered, validated and transformed into knowledge. One of the fundamental requirements is to preserve the functional and non-functional specifications of services and software, along with their dependencies.

4C

Collaboration to Clarify the Costs of Curation (FP7 project)

(2013-2015) 4C will help organisations across Europe to invest more effectively in digital curation and preservation. Research in digital preservation and curation has tended to emphasise the cost and complexity of the task in hand. 4C reminds us that the point of this investment is to realise a benefit, so our research must encompass related concepts such as risk, value, quality and sustainability. Organisations that understand this will be more able to effectively control and manage their digital assets over time, but they may also be able to create new cost-effective solutions and services for others.