MARBLE 🏔️: Music Audio Representation Benchmark for universaL Evaluation


For those who are already familiar with MARBLE, here is the quick link to the music understanding model leaderboard.

What is MARBLE🏔️?

Music Audio Representation Benchmark for universaL Evaluation (MARBLE 🏔️) is a benchmark proposed to help academic and industrial communities study, compare, and select pre-trained models through comprehensive evaluation.

We aim to provide a benchmark for a wide range of music information retrieval (MIR) tasks by defining a comprehensive four-level taxonomy.

You are welcome to submit your results to MARBLE!

Supported Datasets

MARBLE currently covers 14 tasks (a growing number) across 8 publicly available datasets, providing a fair and standard assessment of the representations of all open-sourced pre-trained models developed on music recordings, which serve as baselines.

The MARBLE tasks are categorised into a four-level hierarchy.

Full details of the taxonomy and datasets can be found here and in the MARBLE paper.

Submission Guideline

Based on the task set, we establish a unified protocol and provide a corresponding evaluation suite.

There are three tracks for model submissions.

The submission process is described on the submit page.
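Benchmarks of this kind typically evaluate representations by freezing the pre-trained encoder and training only a lightweight probe on its embeddings, so the score reflects the representation rather than task-specific fine-tuning. The sketch below illustrates that general probing idea only; it is not MARBLE's actual evaluation suite or API, and the synthetic embeddings stand in for features a real frozen music encoder would produce.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for embeddings extracted from a frozen pre-trained
# music encoder: shape [n_clips, embedding_dim]. In a real evaluation
# these would come from the model under test.
rng = np.random.default_rng(0)
n_train, n_test, dim, n_classes = 200, 50, 32, 4
labels_train = rng.integers(0, n_classes, n_train)
labels_test = rng.integers(0, n_classes, n_test)

# Make the embeddings weakly class-dependent so the probe has signal,
# mimicking a representation that encodes task-relevant information.
centers = rng.normal(size=(n_classes, dim))
emb_train = centers[labels_train] + rng.normal(scale=0.5, size=(n_train, dim))
emb_test = centers[labels_test] + rng.normal(scale=0.5, size=(n_test, dim))

# The "probe": a simple linear classifier trained on frozen features.
probe = LogisticRegression(max_iter=1000).fit(emb_train, labels_train)
acc = accuracy_score(labels_test, probe.predict(emb_test))
print(f"probe accuracy: {acc:.2f}")
```

Because the encoder stays frozen and the probe is deliberately simple, differences in probe accuracy across models can be attributed to the quality of the learned representations, which is the core of a unified evaluation protocol.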


The benchmark is initialised by Ruibin Yuan, Yinghao Ma, Yizhi Li, and Ge Zhang,
and is supervised by Chenghua Lin, Emmanouil Benetos, Jie Fu, and Roger Dannenberg.

The full list of organisers is on the MARBLE about page.