Italian Common Procurement Vocabulary (CPV)

This dataset contains 5M pairs of Italian tender descriptions and their corresponding Common Procurement Vocabulary (CPV) codes. The data were downloaded from the ANAC website https://dati.anticorruzione.it/opendata and split into training (3.2M), development (800K), and test (1M) sets. The original dataset is in CSV format, while the three subsets are in JSON format, suitable for fine-tuning encoder-decoder models such as T5.
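Fine-tuning an encoder-decoder model such as T5 on this data is a text-to-text task: the tender description is the input string and the CPV code is the target string. The page does not document the JSON schema, so the sketch below is a minimal illustration under stated assumptions. It assumes the files are JSON Lines (one object per line), uses the hypothetical keys `text` and `label`, and adds a hypothetical `cpv: ` task prefix. It also computes each split's share of the 5M pairs from the sizes given above.

```python
import json
from typing import Dict, Iterable, Iterator, Tuple

# ASSUMPTIONS (not documented on this page): JSON Lines layout, these key
# names, and this task prefix. Adjust them after inspecting the real files.
TEXT_KEY = "text"    # hypothetical key for the tender description
LABEL_KEY = "label"  # hypothetical key for the CPV code
PREFIX = "cpv: "     # hypothetical T5-style task prefix


def to_t5_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Turn JSON Lines records into (source, target) strings for a seq2seq model."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield PREFIX + record[TEXT_KEY], record[LABEL_KEY]


def split_shares(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total number of pairs."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


if __name__ == "__main__":
    # Split sizes as stated in the dataset description.
    shares = split_shares({"train": 3_200_000, "dev": 800_000, "test": 1_000_000})
    for name, share in shares.items():
        print(f"{name}: {share:.0%}")
```

Running the script prints `train: 64%`, `dev: 16%` and `test: 20%`, i.e. a 64/16/20 split. Iterating `to_t5_pairs` over an opened file yields pairs that can then be tokenized for any seq2seq trainer.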

Data and Resources
To access the resources you must log in.
  • 10007545 (ZIP)

Personal Data Attributes

Description: Information related to personal data

Field Value
ChildrenData No
Personal Data No
Personal data was manifestly made public by the data subject No
Additional Info
Field Value
Accessibility Both
Accessibility Mode OnLine Access
Accessibility Mode Download
Associate Project FAIR
Availability On-Line
Basic rights Download
Creation Date 2023-10-16
Creator Basile, Pierpaolo, pierpaolo.basile@uniba.it, orcid.org/0000-0002-0545-1105
Dataset Citation Siciliani, L., Tanzi, E., Basile, P., & Lops, P. (2023). Automatic Generation of Common Procurement Vocabulary Codes. In CLiC-it.
Dataset Re-Use Safeguards None
Field/Scope of use Non-commercial research only
Group Demography, Economy and Finance 2.0
Language ita, Italian
License term 2024-07-08 / 2044-07-08
Manifestation Type Replica
Processing Degree Secondary
Retention Period 2024-07-08 / 2044-07-08
SoBigData Node SoBigData EU
SoBigData Node SoBigData IT
Sublicense rights No
Territory of use World Wide
Thematic Cluster Social Data [SD]
Thematic Cluster Text and Social Media Mining [TSMM]
system:type Dataset
Management Info
Field Value
Author BASILE PIERPAOLO
Maintainer BASILE PIERPAOLO
Version 1
Last Updated 9 July 2024, 09:37 (CEST)
Created 8 July 2024, 19:07 (CEST)