PatentSemTech 2024

5th Workshop on Patent Text Mining and Semantic Technologies

PatentSemTech aims to establish a long-term collaboration and a two-way communication channel between the IP industry and academia from relevant fields such as natural-language processing (NLP), text and data mining (TDM) and semantic technologies (ST). The goal is to explore and transfer new knowledge, methods and technologies for the benefit of industrial applications, and to support research in applied sciences for the IP and neighbouring domains.

The PatentSemTech'24 workshop will be held as a full-day onsite event in conjunction with SIGIR 2024.

Important Dates:

Time zone: Anywhere on Earth (AoE)

Submission deadline: April 29, 2024 (updated)
Acceptance notification: May 29, 2024 (updated)
SIGIR PatentSemTech2024 workshop: July 18, 2024

Matthew Wahlrab: Keynote at PatentSemTech 2024

Unlocking Strategic Growth: The Role of AI Technology in Intellectual Property

Matthew Wahlrab’s keynote address will delve into the transformative role of AI in the IP industry. He will explore how AI can align IP strategy with market trends and emerging technologies, unlocking the full potential of IP portfolios. He will also address the importance of crafting compelling IP narratives for investors, using AI-driven data insights to validate market opportunities and showcase growth potential. Additionally, he will discuss navigating the IP landscape for strategic growth, identifying key trends and opportunities across industries, and leveraging AI tools for informed decision-making and competitive analysis.


Matthew Wahlrab is widely recognized as a pioneering figure in Intellectual Property, business strategy, and innovation, with a distinguished career spanning over two decades. As Founder and CEO of RapidAlpha, he specializes in maximizing organizational value through strategic management and commercialization of intangible assets. Matthew has managed commercialization efforts for patent portfolios as large as 5,600 assets.
Matthew earned his Master’s in Business Administration with a focus on Finance from Keller Graduate School of Management, complementing his Bachelor of Science in Electrical and Electronics Engineering from DeVry University West Hills. His academic journey includes coursework in Biochemistry and Molecular Biology at UC Santa Barbara, providing him with a comprehensive understanding of IP across diverse sectors.
Globally recognized as a top 300 thought leader in IP by IAM since 2017, Matthew is known for authoring influential industry guides such as Keiretsu Capital’s Exit Strategy Handbook and the 2021 ICC Intellectual Property Roadmap update. His work sets benchmarks for IP strategy and innovation management. Additionally, Matthew chairs the I3PM Risk Management Committee, where he champions excellence in global IP management practices.

Challenges of using IP data for IR

From the perspective of search task definition, users of patent information systems are highly specialised information professionals who cooperate with the research and/or legal departments of their institutions or companies. Search in this area is generally business critical. There are high requirements on the correctness and completeness of the data to be searched, on the efficiency of the search interface, on the trustworthiness of the provider, and on the quality of the search results. For general-language documents (such as news articles or Wikipedia articles) there is a variety of tools and methods to process and prepare them for a specific task. Adapting or re-designing such tools to meet the requirements of working with patent and legal documents is a highly challenging undertaking.

Patent Data Traits

Patents are a type of scientific text that is complex and difficult to analyse compared with general language. Without claiming to be complete, some reasons are:

  • Patents are very heterogeneous, both as a corpus and as single documents. A patent corpus covers very diverse scientific subject areas, such as chemistry, pharmacology, mining, and all areas of engineering, with the consequence that all kinds of terminology can be found in it.
  • A patent corpus usually covers a long time span, often from the 1950s to the present.
  • Typographical errors are not uncommon, since many patents in their machine-readable form are derived from OCR processing and machine translation.
  • Patents are composed of detailed descriptions of the invention and the claims. As a result, patents are on average two to five times longer than scientific articles.
  • Patents are usually characterized by the use of legal language.

Why work with Patent Data?

Working with patent data, despite its challenging aspects, offers a rich set of facets that can be exploited with text-mining and semantic methods:

  • It constitutes a huge corpus of scientific-technical documents for a variety of technological domains.
  • Patents are rich in available metadata such as spatial data, bibliographic data, classifications, temporal data, etc.
  • Patents describe essential scientific-technical knowledge enclosing solutions for real-world applications.
  • They provide knowledge complementary to the scientific literature, e.g. chemical and physical properties or bioscience knowledge on drug-target interactions, which appears first in patents and is mostly not published elsewhere.