Analysis of IPC classification codes frequency in patents concerning "in situ" remediation technologies

Published: 1 March 2022 | Version 1 | DOI: 10.17632/gk24h42jty.1
Contributor:
Riccardo Priore

Description

The patent dataset analysed here was built with search criteria designed to retrieve patent documents dealing with "in situ" remediation technologies. It was created in the context of the Horizon 2020 funded project "Posidon" (https://www.posidonproject.eu/). According to the European Environment Information and Observation Network for soil (EIONET-SOIL), more than 2.5 million sites are estimated to be potentially contaminated, of which about 14% (340 000 sites) are highly likely to be contaminated and hence in need of remediation measures. In terms of budget, the management of contaminated sites is estimated to cost around 6 billion euros (€) annually.
The aim of the project is to foster the development of innovative technical solutions through pre-commercial procurement selection procedures. An initial survey of the prior art, based on an extensive analysis of patent documents, is therefore fundamental. As Patlib centre staff members who also sit on the "monitoring board" of Posidon, we provide evidence that patent documents already disclose a considerable number of decontamination technologies applicable to the "in situ" reclamation of contaminated soil and/or water.
Since we are especially interested in identifying trends among the technologies that are cited most frequently within the patent dataset, we illustrate one way of "unpacking" the dataset by identifying recurrent patterns of IPC classification codes. To this end, the IPC classification codes of each patent family in the dataset are analysed in successive stages, isolating and clustering the patent documents that share specific patterns of IPC subgroups, main groups and subclasses.
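The three IPC levels used in the analysis can be derived from a single classification symbol. The following is a minimal sketch, not the authors' script: the function name `ipc_levels` and the regular expression are illustrative assumptions, and the only convention relied upon is that an IPC main group is written with "/00" after the group number.

```python
import re

# Hypothetical helper: split an IPC symbol such as "B09C 1/08" into the
# three levels discussed above. The pattern tolerates variable whitespace
# between the subclass and the group number.
IPC_PATTERN = re.compile(r"^([A-H]\d{2}[A-Z])\s*(\d{1,4})/(\d{1,6})$")


def ipc_levels(symbol):
    """Return the subclass, main group and subgroup of an IPC symbol."""
    match = IPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"Not a full IPC symbol: {symbol!r}")
    subclass, group, subgroup = match.groups()
    return {
        "subclass": subclass,                          # e.g. B09C
        "main_group": f"{subclass} {group}/00",        # e.g. B09C 1/00
        "subgroup": f"{subclass} {group}/{subgroup}",  # e.g. B09C 1/08
    }


example = ipc_levels("B09C 1/08")
```

Truncating every symbol of a patent family in this way yields the subgroup, main group and subclass sets on which each stage of the analysis operates.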
In each phase, the t-distributed stochastic neighbor embedding (tSNE) algorithm is applied to an array that records, for every patent family, the presence or absence of IPC subgroups, main groups or subclasses chosen among the most frequent in the dataset. After the first round of clustering, the patent documents sharing specific patterns of IPC subgroups are isolated and set aside for additional investigation. The remaining patent documents undergo a second round of tSNE, after which those sharing specific patterns of IPC main groups are likewise isolated. The documents still remaining undergo a final round of tSNE clustering, which separates them according to specific patterns of IPC subclasses. With this procedure, about 90% of the initial dataset (1632 simple families, as defined by the European Patent Office) is "unpacked" and clustered. Further assessments based on the patent bibliographic data can then be performed, with two essential advantages: the technical content of each cluster is homogeneous, and the results of different clusters can subsequently be aggregated when a specific IPC code is common to them.
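One stage of this procedure could be sketched as follows. This is an illustration under stated assumptions rather than the published scripts: the toy `families` mapping, the function names and the tSNE settings (perplexity, random initialisation, seed) are all hypothetical, and the description does not state how clusters are read off the embedding, so that step is left out. scikit-learn is imported only inside `embed_tsne`, so the matrix construction runs without it.

```python
from collections import Counter

# Toy data (hypothetical): patent family id -> set of IPC subgroups.
families = {
    "F1": {"B09C 1/08", "B09C 1/10"},
    "F2": {"B09C 1/08", "C02F 1/72"},
    "F3": {"C02F 1/72"},
    "F4": {"C02F 3/34", "C02F 1/72"},
}


def most_frequent_codes(families, top_n):
    """Return the top_n codes by number of families containing them."""
    counts = Counter(code for codes in families.values() for code in set(codes))
    # Sort by descending count, then alphabetically, for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [code for code, _ in ranked[:top_n]]


def presence_matrix(families, codes):
    """Build a 0/1 row per family: 1 if the family carries the code."""
    family_ids = sorted(families)
    rows = [[1 if code in families[fid] else 0 for code in codes]
            for fid in family_ids]
    return family_ids, rows


def embed_tsne(rows, random_state=0):
    """Project the presence/absence rows to 2D with scikit-learn's TSNE."""
    import numpy as np
    from sklearn.manifold import TSNE

    if len(rows) < 3:
        raise ValueError("tSNE needs at least a few families to embed")
    data = np.asarray(rows, dtype=float)
    # Perplexity must stay below the number of samples (assumed setting).
    perplexity = min(30.0, len(rows) - 1)
    model = TSNE(n_components=2, perplexity=perplexity,
                 init="random", random_state=random_state)
    return model.fit_transform(data)


top_codes = most_frequent_codes(families, top_n=2)
family_ids, rows = presence_matrix(families, top_codes)
```

Calling `embed_tsne(rows)` on a realistically sized matrix returns one 2D point per family. Families isolated at one stage would be dropped from `families`, and the same steps repeated with main groups and then subclasses in place of subgroups.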

Files

Steps to reproduce

The procedure for retrieving the relevant patent documents dealing with "in situ" reclamation techniques is explained in detail in the Methodology section. Reproducing the results requires access to patent databases such as Patstat online (managed by the European Patent Office: https://www.epo.org/searching-for-patents/business/patstat.html) and Orbit Intelligence (provided by Questel: https://www.questel.com/orbit-software-suite/orbit-intelligence/). The subsequent tSNE analysis is carried out on the data with the Python programming language. Scripts for clustering the patent families can be retrieved from an earlier publication (Python elaboration of Patstat and Orbit data, Published: 11 March 2021 | Version 1 | DOI: 10.17632/gfnhp8r52y.1). By following the instructions in the Methodology section and using the datasets included in the Data files section, our results can be reproduced in a straightforward way. The main conclusions of the whole analysis follow:
1. Although in principle no distinction was made between patent documents referring to "in situ" remediation of contaminated soil and those referring to remediation of contaminated water, the analysis of 1632 patent families reveals that at most 16% of the patent documents concern technologies dealing with both technical issues (i.e. the codes B09C… coexist with C02F…).
2. No less than 46% of the patent documents concern technologies aimed exclusively at "in situ" water remediation (C02F…).
3. No less than 36% of the patent documents concern technologies aimed exclusively at "in situ" soil reclamation (B09C…).
Results may vary slightly because our analysis used the Autumn 2021 edition of Patstat. In subsequent editions, the same search query could return a slightly different number of patent families, since additional families may be added later even when their earliest_filing_year falls within the timeframe used for our analysis (2000 to 2021).
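The split between soil-only, water-only and combined technologies reported in the conclusions rests on whether B09C and C02F codes coexist within a patent family. The sketch below shows how such shares could be tallied; the toy families, category labels and function names are hypothetical and are not taken from the dataset or from the authors' scripts.

```python
from collections import Counter

SOIL_PREFIX = "B09C"   # soil reclamation subclass named in the conclusions
WATER_PREFIX = "C02F"  # water treatment subclass named in the conclusions


def classify_family(ipc_codes):
    """Label a family by which of the two subclasses its codes fall in."""
    has_soil = any(code.startswith(SOIL_PREFIX) for code in ipc_codes)
    has_water = any(code.startswith(WATER_PREFIX) for code in ipc_codes)
    if has_soil and has_water:
        return "both"
    if has_water:
        return "water_only"
    if has_soil:
        return "soil_only"
    return "other"


def category_shares(families):
    """Return the percentage of families in each category."""
    counts = Counter(classify_family(codes) for codes in families.values())
    total = len(families)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy data (hypothetical): patent family id -> list of IPC codes.
toy_families = {
    "F1": ["B09C 1/08", "C02F 1/72"],
    "F2": ["C02F 1/72"],
    "F3": ["C02F 3/34"],
    "F4": ["B09C 1/10"],
    "F5": ["E02D 3/11"],
}
shares = category_shares(toy_families)
```

Re-running such a tally against a later Patstat edition would make any shift in the percentages directly visible.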

Institutions

Consorzio per l'AREA di Ricerca Scientifica e Tecnologica di Trieste

Categories

Patent, Business Intelligence, Patent Management, Patent Valuation, Patent Classification, European Patent Office

Licence