Others - SoBigData.eu Catalogue

Dataset

SWH Filenames

A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...

Dataset

Private Smart Cities Weather and Pollution conditions

A set of weather and climate conditions gathered during the Toolsmart PoN project (Open Community PA 2020 – Pon Governance 2014-2020). Data are obtained from IoT-based...

Dataset

FANCY Dataset

FANCY (FActivity, Negation, Common-sense, hYpernymy) is a new natural language inference (NLI) dataset with 4,000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,...

Dataset

Santorini Tweets July-August 2021

This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini or contain the word or hashtag "santorini" in their text. They...

Dataset

Wikipedia Word Embeddings

Embeddings were created by applying word2vec skip-gram to a corpus of Wikipedia non-stub articles from a December 2015 English dump, with the following parameters: -cbow 0...

Dataset

Conversational search dataset with labels

CAsT 2019 data is split into two files, one for training and one for testing.
- Training set: CAsT 2019 conversations from the training set and from the test set without...

Dataset

Cross-Lingual Dataset of Crisis-Related Social Media

If you use this dataset, please cite the following paper: Fedor Vitiugin, Carlos Castillo: Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive...

Dataset

Dataset for Evaluating Abstractive Summaries of Crisis-Related Social Media

This dataset was created to evaluate summaries generated from social media posts published during five natural disasters. It contains: ground truth reports created by human...
