dolly-15k-it

This dataset is an automatic translation into Italian of the dolly-15k dataset. dolly-15k is an open-source dataset of instruction-following records generated by thousands of Databricks employees in several of the behavioural categories outlined in the InstructGPT paper, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
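
Since the resource is distributed as a JSONL file, the records can be read one JSON object per line. The following is a minimal sketch in Python, not an official loader: it assumes the translated file keeps the field names of the original dolly-15k release (instruction, context, response, category), which this page does not confirm. The file name dolly-15k-it.jsonl and the sample records are invented placeholders.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_categories(records: Iterable[dict], key: str = "category") -> Counter:
    """Count records per category; 'category' is an assumed field name."""
    return Counter(record.get(key, "unknown") for record in records)


# Invented placeholder records, used only to make the sketch self-contained.
sample_lines = [
    '{"instruction": "Domanda di esempio", "context": "", '
    '"response": "Risposta di esempio", "category": "open_qa"}',
    '{"instruction": "Riassumi il testo", "context": "Testo di esempio", '
    '"response": "Riassunto di esempio", "category": "summarization"}',
    '{"instruction": "Altra domanda", "context": "", '
    '"response": "Altra risposta", "category": "open_qa"}',
]

counts = count_categories(read_jsonl(sample_lines))

# With a downloaded copy (hypothetical local path), the same functions apply:
# with open("dolly-15k-it.jsonl", encoding="utf-8") as handle:
#     counts = count_categories(read_jsonl(handle))
```

Reading line by line keeps memory use low, and opening the file explicitly as UTF-8 avoids platform-dependent decoding of accented Italian characters.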

Data and Resources
  • dolly-15k-it (JSONL)
Personal Data Attributes

Description: Personal Data related Information

Field Value
ChildrenData No
Personal Data No
Personal data was manifestly made public by the data subject No
Additional Info
Field Value
Accessibility Both
Accessibility Mode API Access, Download
Associate Project FAIR
Availability On-Line
Basic rights Download, Distribution, Modification
Creation Date 2023-10-02
Creator Basile, Pierpaolo, pierpaolo.basile@uniba.it, orcid.org/0000-0002-0545-1105
Dataset Citation Basile, P., Cassotti, P., Polignano, M., Siciliani, L., & Semeraro, G. (2023). On the Impact of Language Adaptation for Large Language Models: A Case Study for the Italian Language Using Only Open Resources. In CLiC-it.
Dataset Re-Use Safeguards None
Field/Scope of use Any use
Group Others
Language ita, Italian
License term 2024-07-08/2044-07-08
Manifestation Type Replica
Processing Degree Secondary
Retention Period 2024-07-08/2044-07-08
SoBigData Node SoBigData EU, SoBigData IT
Sublicense rights No
Territory of use World Wide
Thematic Cluster Text and Social Media Mining [TSMM]
system:type Dataset
Management Info
Field Value
Author BASILE PIERPAOLO
Maintainer BASILE PIERPAOLO
Version 1
Last Updated 9 July 2024, 09:37 (CEST)
Created 8 July 2024, 18:47 (CEST)