Others - SoBigData.eu Catalogue

Dataset

Multi-Task Faces (MTF) dataset

The Multi-Task Faces (MTF) dataset consists of cropped human faces for classification tasks or other research purposes. Each image in the dataset is labelled according to four...

ZIP

Dataset

Spotify Tracks Dataset (full)

The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...

Dataset

Spotify track dataset (small)

The dataset was created using the Spotify API and the track IDs provided by the authors of https://www.kaggle.com/datasets/maharshipandya/-spotify-tracks-dataset.... The...

ZIP

Dataset

SWH Filenames

A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...

ZIP

Dataset

Private Smart Cities Weather and Pollution conditions

A set of weather and climatic conditions gathered during the Toolsmart PoN project (Open Community PA 2020 – Pon Governance 2014-2020). Data are obtained from IoT based...

Dataset

Santorini Tweets July-August 2021

This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...

ZIP

Dataset

The Italian Music Dataset

The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...

JSON

Dataset

German Academic Web

The dataset contains regular crawls of the websites of German academic institutions.

Dataset

GERDAQ Dataset

This is a benchmark dataset of annotated search-engine queries. Mentions of entities in search-engine queries are tagged with the entity they refer to. Wikipedia is used as...

XML

Dataset

MSN Search query log

The data consists of an MSN Search query log excerpt with 15 million queries, from US users, sampled over one month of activity. Data attributes made available per query: 1)...

Dataset

Wikipedia Word Embeddings

The embeddings were created by applying word2vec skip-gram to a corpus of non-stub Wikipedia articles from a December 2015 English dump with the following parameters: -cbow 0...

Dataset

CoPhIR

The CoPhIR (Content-based Photo Image Retrieval) Test-Collection has been developed to make significant tests on the scalability of the SAPIR project infrastructure (SAPIR:...

Dataset

Product Reviews for Ordinal Quantification

This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification...

13 items found
