
giantsteps_tempo

This is the tempo_eval report for the ‘giantsteps_tempo’ corpus.

Reports for other corpora may be found here.


References for ‘giantsteps_tempo’

References

1.0

Attribute Value
Corpus GiantSteps Tempo
Version 1.0
Curator Peter Knees
Validation http://www.cp.jku.at/datasets/giantsteps/
Data Source P. Knees, Á. Faraldo, P. Herrera, R. Vogl, S. Böck, F. Hörschläger, and M. Le Goff: ‘Two Data Sets for Tempo Estimation and Key Detection in Electronic Dance Music Annotated from User Corrections.’ In Proc. of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, 2015.
Annotation Tools Beatport Forum

2.0

Attribute Value
Corpus GiantSteps Tempo
Version 2.0
Curator Hendrik Schreiber
Data Source crowdsource
Annotation Tools crowdsourced, web-based experiment
Annotator, bibtex Schreiber2018b
Annotator, ref_url http://www.tagtraum.com/tempo_estimation.html

Basic Statistics

Reference Size Min Max Avg Stdev Sweet Oct. Start Sweet Oct. Coverage
1.0 664 53.00 200.00 136.66 28.35 91.00 0.88
2.0 661 64.00 197.00 138.58 25.91 90.00 0.92

Table 1: Basic statistics.

CSV JSON LATEX PICKLE

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

CSV JSON LATEX PICKLE SVG PDF PNG

Tag Distribution for ‘tag_open’

Figure 2: Percentage of tracks tagged with tags from namespace ‘tag_open’. Annotations are from reference 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimates for ‘giantsteps_tempo’

Estimators

boeck2015/tempodetector2016_default

Attribute Value
Corpus giantsteps_tempo
Version 0.17.dev0
Annotation Tools TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex Boeck2015

boeck2019/multi_task

Attribute Value
Corpus giantsteps_tempo
Version 0.0.1
Annotation Tools model=multi_task, https://github.com/superbock/ISMIR2019
Annotator, bibtex Boeck2019

boeck2019/multi_task_hjdb

Attribute Value
Corpus giantsteps_tempo
Version 0.0.1
Annotation Tools model=multi_task_hjdb, https://github.com/superbock/ISMIR2019
Annotator, bibtex Boeck2019

boeck2020/dar

Attribute Value
Corpus giantsteps_tempo
Version 0.0.1
Annotation Tools https://github.com/superbock/ISMIR2020
Annotator, bibtex Boeck2020

davies2009/mirex_qm_tempotracker

Attribute Value  
Corpus giantsteps_tempo  
Version 1.0  
Annotation Tools QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used.  
Annotator, bibtex Davies2009 Davies2007

percival2014/stem

Attribute Value
Corpus giantsteps_tempo
Version 1.0
Annotation Tools percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex Percival2014

schreiber2014/default

Attribute Value
Corpus giantsteps_tempo
Version 0.0.1
Annotation Tools schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2014

schreiber2017/ismir2017

Attribute Value
Corpus giantsteps_tempo
Version 0.0.4
Annotation Tools schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2017/mirex2017

Attribute Value
Corpus giantsteps_tempo
Version 0.0.4
Annotation Tools schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2018/cnn

Attribute Value
Corpus giantsteps_tempo
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute Value
Corpus giantsteps_tempo
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

schreiber2018/ismir2018

Attribute Value
Corpus giantsteps_tempo
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

sun2021/default

Attribute Value
Corpus giantsteps_tempo
Version 0.0.2
Data Source Xiaoheng Sun, Qiqi He, Yongwei Gao, Wei Li. Musical Tempo Estimation Using a Multi-scale Network. In Proc. of the 22nd International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.
Annotation Tools https://github.com/Qqi-HE/TempoEstimation_MGANet
Annotator, bibtex Sun2021

Basic Statistics

Estimator Size Min Max Avg Stdev Sweet Oct. Start Sweet Oct. Coverage
boeck2015/tempodetector2016_default 664 41.10 214.29 112.49 37.34 74.00 0.70
boeck2019/multi_task 664 57.49 201.91 118.32 26.41 76.00 0.92
boeck2019/multi_task_hjdb 664 58.62 201.92 124.61 27.13 83.00 0.89
boeck2020/dar 664 47.65 201.50 134.27 28.64 91.00 0.86
davies2009/mirex_qm_tempotracker 664 63.80 191.41 115.46 27.09 84.00 0.89
percival2014/stem 664 54.98 160.25 106.59 27.22 75.00 0.85
schreiber2014/default 664 63.03 171.91 112.05 27.01 78.00 0.87
schreiber2017/ismir2017 664 63.50 176.01 118.57 27.31 80.00 0.88
schreiber2017/mirex2017 664 69.00 190.05 132.45 26.88 88.00 0.86
schreiber2018/cnn 664 67.00 216.00 135.25 27.24 90.00 0.88
schreiber2018/fcn 664 63.00 201.00 138.38 26.85 90.00 0.90
schreiber2018/ismir2018 664 69.00 186.00 129.66 26.96 89.00 0.84
sun2021/default 664 58.00 211.00 139.16 27.57 94.00 0.90

Table 2: Basic statistics.

CSV JSON LATEX PICKLE

Smoothed Tempo Distribution

Figure 3: Percentage of values in tempo interval.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy

Accuracy1 is defined as the percentage of correct estimates, allowing a 4% tolerance for individual BPM values.

Accuracy2 additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].
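For illustration, a minimal sketch of how these two metrics could be computed for a single track, assuming one estimated and one reference BPM value per track (this is not taken from the tempo_eval code base):

```python
def accuracy1(estimate, reference, tolerance=0.04):
    """True if the estimate lies within the relative tolerance of the reference."""
    return abs(estimate - reference) <= tolerance * reference

def accuracy2(estimate, reference, tolerance=0.04, factors=(1.0, 2.0, 3.0, 1/2, 1/3)):
    """True if the estimate matches the reference up to a so-called octave error."""
    return any(accuracy1(estimate * f, reference, tolerance) for f in factors)

# Example: a 70 BPM estimate against a 140 BPM reference.
print(accuracy1(70, 140))  # False -> counted as an error by Accuracy1
print(accuracy2(70, 140))  # True  -> the factor-2 error is allowed by Accuracy2
```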

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set or that the test set may have even been created using one of the tested algorithms.

Accuracy Results for 1.0

Estimator Accuracy1 Accuracy2
sun2021/default 0.8042 0.9096
schreiber2018/fcn 0.7982 0.9066
schreiber2017/mirex2017 0.7696 0.8916
boeck2020/dar 0.7651 0.8916
schreiber2018/cnn 0.7636 0.8931
schreiber2018/ismir2018 0.7349 0.8946
boeck2019/multi_task_hjdb 0.6913 0.8931
boeck2019/multi_task 0.6431 0.8886
schreiber2017/ismir2017 0.6235 0.8870
boeck2015/tempodetector2016_default 0.5798 0.8630
schreiber2014/default 0.5663 0.8690
percival2014/stem 0.5151 0.8855
davies2009/mirex_qm_tempotracker 0.2666 0.4639

Table 3: Mean accuracy of estimates compared to version 1.0 with 4% tolerance ordered by Accuracy1.

CSV JSON LATEX PICKLE

Raw data Accuracy1: CSV JSON LATEX PICKLE

Raw data Accuracy2: CSV JSON LATEX PICKLE

Accuracy1 for 1.0

Figure 4: Mean Accuracy1 for estimates compared to version 1.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for 1.0

Figure 5: Mean Accuracy2 for estimates compared to version 1.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy Results for 2.0

Estimator Accuracy1 Accuracy2
schreiber2018/fcn 0.8986 0.9803
sun2021/default 0.8911 0.9849
boeck2020/dar 0.8744 0.9697
schreiber2018/cnn 0.8638 0.9758
schreiber2017/mirex2017 0.8260 0.9607
schreiber2018/ismir2018 0.8245 0.9758
boeck2019/multi_task_hjdb 0.7670 0.9622
boeck2019/multi_task 0.7005 0.9622
schreiber2017/ismir2017 0.6899 0.9561
boeck2015/tempodetector2016_default 0.6384 0.9410
schreiber2014/default 0.6051 0.9319
percival2014/stem 0.5703 0.9607
davies2009/mirex_qm_tempotracker 0.3540 0.5325

Table 4: Mean accuracy of estimates compared to version 2.0 with 4% tolerance ordered by Accuracy1.

CSV JSON LATEX PICKLE

Raw data Accuracy1: CSV JSON LATEX PICKLE

Raw data Accuracy2: CSV JSON LATEX PICKLE

Accuracy1 for 2.0

Figure 6: Mean Accuracy1 for estimates compared to version 2.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for 2.0

Figure 7: Mean Accuracy2 for estimates compared to version 2.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Differing Items

For which items did a given estimator not estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and are therefore never estimated correctly, regardless of which estimator is used?
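A sketch of how such ‘never correct’ items could be identified, assuming per-estimator sets of item IDs that failed Accuracy1 (the variable names are illustrative, not from tempo_eval):

```python
# wrong_items maps an estimator name to the set of item IDs it got wrong (Accuracy1).
wrong_items = {
    "estimator_a": {"1030011.LOFI", "3630279.LOFI"},
    "estimator_b": {"3630279.LOFI", "942357.LOFI"},
}

# Items that every estimator got wrong, i.e. candidates for difficult or mislabeled tracks.
never_correct = set.intersection(*wrong_items.values())
print(sorted(never_correct))  # ['3630279.LOFI']
```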

Differing Items Accuracy1

Items for which the estimated tempo differs from the reference annotation (Accuracy1, 4% tolerance), listed per reference version and estimator:

1.0 compared with boeck2015/tempodetector2016_default (279 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1092771.LOFI’ ‘1177875.LOFI’ ‘1198571.LOFI’ ‘1234668.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ … CSV

1.0 compared with boeck2019/multi_task (237 differences): ‘1068430.LOFI’ ‘1120171.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (205 differences): ‘1068430.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ … CSV

1.0 compared with boeck2020/dar (156 differences): ‘1240672.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (487 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1084996.LOFI’ ‘1092771.LOFI’ ‘1114156.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1171800.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1183908.LOFI’ … CSV

1.0 compared with percival2014/stem (322 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1092771.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ … CSV

1.0 compared with schreiber2014/default (288 differences): ‘1068430.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ … CSV

1.0 compared with schreiber2017/ismir2017 (250 differences): ‘1068430.LOFI’ ‘1120171.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ … CSV

1.0 compared with schreiber2017/mirex2017 (153 differences): ‘1234668.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1317507.LOFI’ ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1479462.LOFI’ ‘172384.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ … CSV

1.0 compared with schreiber2018/cnn (157 differences): ‘1114156.LOFI’ ‘1240669.LOFI’ ‘1240672.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ … CSV

1.0 compared with schreiber2018/fcn (134 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1765409.LOFI’ ‘1885798.LOFI’ ‘1905592.LOFI’ ‘1950701.LOFI’ … CSV

1.0 compared with schreiber2018/ismir2018 (176 differences): ‘1068430.LOFI’ ‘1327052.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1765409.LOFI’ ‘1817444.LOFI’ ‘1855660.LOFI’ ‘1905591.LOFI’ … CSV

1.0 compared with sun2021/default (130 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1461087.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1765409.LOFI’ ‘1839656.LOFI’ ‘1874244.LOFI’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (239 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1092771.LOFI’ ‘1177875.LOFI’ ‘1198571.LOFI’ ‘1234668.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ … CSV

2.0 compared with boeck2019/multi_task (198 differences): ‘1068430.LOFI’ ‘1120171.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1424458.LOFI’ ‘1569136.LOFI’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (154 differences): ‘1068430.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1317507.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1728723.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1765409.LOFI’ … CSV

2.0 compared with boeck2020/dar (83 differences): ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1698047.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1874244.LOFI’ ‘2422602.LOFI’ ‘2734649.LOFI’ ‘28952.LOFI’ ‘3069960.LOFI’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (427 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1084996.LOFI’ ‘1092771.LOFI’ ‘1114156.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1171800.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1183908.LOFI’ … CSV

2.0 compared with percival2014/stem (284 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1092771.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ … CSV

2.0 compared with schreiber2014/default (261 differences): ‘1068430.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ … CSV

2.0 compared with schreiber2017/ismir2017 (205 differences): ‘1068430.LOFI’ ‘1120171.LOFI’ ‘1234668.LOFI’ ‘1234669.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1623443.LOFI’ ‘1676961.LOFI’ ‘1728723.LOFI’ … CSV

2.0 compared with schreiber2017/mirex2017 (115 differences): ‘1234668.LOFI’ ‘1240669.LOFI’ ‘1317507.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1461087.LOFI’ ‘1698047.LOFI’ ‘172384.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1855660.LOFI’ … CSV

2.0 compared with schreiber2018/cnn (90 differences): ‘1114156.LOFI’ ‘1240669.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1698047.LOFI’ ‘1743969.LOFI’ ‘1874244.LOFI’ ‘1905591.LOFI’ ‘2083969.LOFI’ ‘2757093.LOFI’ ‘3023605.LOFI’ … CSV

2.0 compared with schreiber2018/fcn (67 differences): ‘1240672.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1698047.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1885798.LOFI’ ‘2083969.LOFI’ ‘2741734.LOFI’ ‘3013673.LOFI’ ‘3023605.LOFI’ … CSV

2.0 compared with schreiber2018/ismir2018 (116 differences): ‘1240672.LOFI’ ‘1418652.LOFI’ ‘1698047.LOFI’ ‘1747518.LOFI’ ‘1817444.LOFI’ ‘1855660.LOFI’ ‘1905591.LOFI’ ‘1929611.LOFI’ ‘2706792.LOFI’ ‘2734649.LOFI’ ‘2734862.LOFI’ … CSV

2.0 compared with sun2021/default (72 differences): ‘1240672.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1698047.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1839656.LOFI’ ‘1874244.LOFI’ ‘1950701.LOFI’ ‘210560.LOFI’ ‘3013772.LOFI’ … CSV

None of the estimators estimated the following 5 items ‘correctly’ using Accuracy1: ‘3630279.LOFI’ ‘3980001.LOFI’ ‘5089294.LOFI’ ‘5137154.LOFI’ ‘942357.LOFI’ CSV

Differing Items Accuracy2

Items for which the estimated tempo differs from the reference annotation (Accuracy2, 4% tolerance), listed per reference version and estimator:

1.0 compared with boeck2015/tempodetector2016_default (91 differences): ‘1240669.LOFI’ ‘1327052.LOFI’ ‘1479462.LOFI’ ‘1698047.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1905592.LOFI’ ‘1950701.LOFI’ ‘2726350.LOFI’ … CSV

1.0 compared with boeck2019/multi_task (74 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1424458.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1905592.LOFI’ ‘2422602.LOFI’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (71 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1905592.LOFI’ ‘2039114.LOFI’ ‘2422602.LOFI’ … CSV

1.0 compared with boeck2020/dar (72 differences): ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1747518.LOFI’ ‘1905592.LOFI’ ‘2422602.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (356 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1084996.LOFI’ ‘1092771.LOFI’ ‘1114156.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1171800.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1183908.LOFI’ … CSV

1.0 compared with percival2014/stem (76 differences): ‘1030011.LOFI’ ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1424458.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1874244.LOFI’ ‘1905592.LOFI’ ‘2432724.LOFI’ … CSV

1.0 compared with schreiber2014/default (87 differences): ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1479462.LOFI’ ‘1676961.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905592.LOFI’ ‘1974485.LOFI’ ‘2676506.LOFI’ ‘2726350.LOFI’ … CSV

1.0 compared with schreiber2017/ismir2017 (75 differences): ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905592.LOFI’ ‘1974485.LOFI’ ‘2422602.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ … CSV

1.0 compared with schreiber2017/mirex2017 (72 differences): ‘1327052.LOFI’ ‘1380256.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905592.LOFI’ ‘1974485.LOFI’ ‘2422602.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ … CSV

1.0 compared with schreiber2018/cnn (71 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1874244.LOFI’ ‘1905592.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ ‘2726355.LOFI’ … CSV

1.0 compared with schreiber2018/fcn (62 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905592.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ ‘2726355.LOFI’ ‘2745205.LOFI’ … CSV

1.0 compared with schreiber2018/ismir2018 (70 differences): ‘1068430.LOFI’ ‘1327052.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905591.LOFI’ ‘1905592.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ ‘2726355.LOFI’ … CSV

1.0 compared with sun2021/default (60 differences): ‘1327052.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1479462.LOFI’ ‘1728723.LOFI’ ‘1735621.LOFI’ ‘1743969.LOFI’ ‘1905592.LOFI’ ‘2726350.LOFI’ ‘2726353.LOFI’ ‘2726355.LOFI’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (39 differences): ‘1240669.LOFI’ ‘1418652.LOFI’ ‘1747518.LOFI’ ‘3069960.LOFI’ ‘3312045.LOFI’ ‘3377892.LOFI’ ‘3414605.LOFI’ ‘3480108.LOFI’ ‘3509304.LOFI’ ‘3564559.LOFI’ ‘3630279.LOFI’ … CSV

2.0 compared with boeck2019/multi_task (25 differences): ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1424458.LOFI’ ‘1747518.LOFI’ ‘2422602.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (25 differences): ‘1329955.LOFI’ ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1747518.LOFI’ ‘2039114.LOFI’ ‘2422602.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ … CSV

2.0 compared with boeck2020/dar (20 differences): ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1747518.LOFI’ ‘2422602.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3801596.LOFI’ ‘4120277.LOFI’ ‘4237913.LOFI’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (309 differences): ‘1030011.LOFI’ ‘1068430.LOFI’ ‘1084996.LOFI’ ‘1092771.LOFI’ ‘1114156.LOFI’ ‘1118326.LOFI’ ‘1120171.LOFI’ ‘1171800.LOFI’ ‘1174239.LOFI’ ‘1177875.LOFI’ ‘1183908.LOFI’ … CSV

2.0 compared with percival2014/stem (26 differences): ‘1030011.LOFI’ ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1424458.LOFI’ ‘1874244.LOFI’ ‘2432724.LOFI’ ‘3435022.LOFI’ ‘3480108.LOFI’ ‘3565815.LOFI’ ‘3711752.LOFI’ ‘3787878.LOFI’ … CSV

2.0 compared with schreiber2014/default (45 differences): ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1676961.LOFI’ ‘1728723.LOFI’ ‘1974485.LOFI’ ‘2676506.LOFI’ ‘2759853.LOFI’ ‘3013772.LOFI’ ‘3023605.LOFI’ ‘3069960.LOFI’ ‘3128068.LOFI’ … CSV

2.0 compared with schreiber2017/ismir2017 (29 differences): ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1974485.LOFI’ ‘2422602.LOFI’ ‘3023605.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3692859.LOFI’ ‘3787878.LOFI’ … CSV

2.0 compared with schreiber2017/mirex2017 (26 differences): ‘1380256.LOFI’ ‘1418652.LOFI’ ‘1974485.LOFI’ ‘2422602.LOFI’ ‘3023605.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ ‘4017611.LOFI’ … CSV

2.0 compared with schreiber2018/cnn (16 differences): ‘1329955.LOFI’ ‘1418652.LOFI’ ‘1874244.LOFI’ ‘3023605.LOFI’ ‘3069960.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ ‘4044591.LOFI’ ‘4283854.LOFI’ … CSV

2.0 compared with schreiber2018/fcn (13 differences): ‘1329955.LOFI’ ‘1418652.LOFI’ ‘3023605.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ ‘4043892.LOFI’ ‘4288893.LOFI’ ‘4332592.LOFI’ ‘5137153.LOFI’ … CSV

2.0 compared with schreiber2018/ismir2018 (16 differences): ‘1418652.LOFI’ ‘1905591.LOFI’ ‘3069960.LOFI’ ‘3189712.LOFI’ ‘3480108.LOFI’ ‘3630279.LOFI’ ‘3642438.LOFI’ ‘3787878.LOFI’ ‘4043892.LOFI’ ‘4044591.LOFI’ ‘4288893.LOFI’ … CSV

2.0 compared with sun2021/default (10 differences): ‘1329955.LOFI’ ‘1418652.LOFI’ ‘3069960.LOFI’ ‘3630279.LOFI’ ‘3787878.LOFI’ ‘4264210.LOFI’ ‘4288893.LOFI’ ‘5137153.LOFI’ ‘5137154.LOFI’ ‘906760.LOFI’ CSV

All tracks were estimated ‘correctly’ by at least one system.

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0011 0.0000 0.0000 0.0000 0.0026 0.5938 0.0374 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0011 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2084 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0068 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.8453 1.0000 0.0189 0.0624 0.0034
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0026 0.0000 0.0000 0.0000 0.0000 1.0000 0.0047 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.5938 0.0000 0.0000 0.0000 0.0000 0.0047 1.0000 0.0009 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0374 0.2084 0.0000 0.0000 0.0000 0.0000 0.0009 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0000 0.8453 0.0000 0.0000 0.0000 0.0000 1.0000 0.7728 0.0648 0.0255 0.0368
schreiber2018/cnn 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.7728 1.0000 0.0044 0.0536 0.0048
schreiber2018/fcn 0.0000 0.0000 0.0000 0.0189 0.0000 0.0000 0.0000 0.0000 0.0648 0.0044 1.0000 0.0000 0.6985
schreiber2018/ismir2018 0.0000 0.0000 0.0068 0.0624 0.0000 0.0000 0.0000 0.0000 0.0255 0.0536 0.0000 1.0000 0.0000
sun2021/default 0.0000 0.0000 0.0000 0.0034 0.0000 0.0000 0.0000 0.0000 0.0368 0.0048 0.6985 0.0000 1.0000

Table 5: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p <= α, reject H0, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
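The following sketch shows how a McNemar p-value for one estimator pair could be computed from per-track correctness vectors, using the exact binomial form of the test. It is illustrative only; the vectors `correct_a`/`correct_b` are assumed inputs, and this is not necessarily how tempo_eval computes the values above.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Exact two-sided McNemar test on paired boolean correctness vectors."""
    # Discordant pairs: one estimator correct where the other is not.
    b = sum(x and not y for x, y in zip(correct_a, correct_b))
    c = sum(y and not x for x, y in zip(correct_a, correct_b))
    n, k = b + c, min(b, c)
    if n == 0:
        return 1.0
    # Two-sided p-value under Binomial(n, 0.5), capped at 1.
    return min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n)

# Accuracy1 per track for two estimators against the same reference annotations (toy data).
correct_a = [True, True, False, True, False, True]
correct_b = [True, False, False, True, True, True]
print(mcnemar_exact(correct_a, correct_b))
```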

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0022 0.0000 0.0000 0.0000 0.0022 0.1716 0.0172 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0022 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.5248 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0007 0.0000 0.0000 0.0004 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0029 0.5102 0.0976 0.0016 0.2418
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0022 0.0000 0.0000 0.0000 0.0000 1.0000 0.0620 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.1716 0.0000 0.0000 0.0000 0.0000 0.0620 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0172 0.5248 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0007 0.0029 0.0000 0.0000 0.0000 0.0000 1.0000 0.0215 0.0000 1.0000 0.0001
schreiber2018/cnn 0.0000 0.0000 0.0000 0.5102 0.0000 0.0000 0.0000 0.0000 0.0215 1.0000 0.0059 0.0099 0.0700
schreiber2018/fcn 0.0000 0.0000 0.0000 0.0976 0.0000 0.0000 0.0000 0.0000 0.0000 0.0059 1.0000 0.0000 0.6198
schreiber2018/ismir2018 0.0000 0.0000 0.0004 0.0016 0.0000 0.0000 0.0000 0.0000 1.0000 0.0099 0.0000 1.0000 0.0000
sun2021/default 0.0000 0.0000 0.0000 0.2418 0.0000 0.0000 0.0000 0.0000 0.0001 0.0700 0.6198 0.0000 1.0000

Table 6: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p <= α, reject H0, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0076 0.0012 0.0034 0.0000 0.0444 0.6936 0.0166 0.0043 0.0022 0.0000 0.0015 0.0000
boeck2019/multi_task 0.0076 1.0000 0.5078 0.7744 0.0000 0.8555 0.0789 1.0000 0.8388 0.6636 0.0118 0.5413 0.0005
boeck2019/multi_task_hjdb 0.0012 0.5078 1.0000 1.0000 0.0000 0.4731 0.0293 0.5235 1.0000 1.0000 0.0784 1.0000 0.0127
boeck2020/dar 0.0034 0.7744 1.0000 1.0000 0.0000 0.6076 0.0400 0.6636 1.0000 1.0000 0.0525 0.8388 0.0118
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0444 0.8555 0.4731 0.6076 0.0000 1.0000 0.1690 1.0000 0.5966 0.4049 0.0043 0.3771 0.0025
schreiber2014/default 0.6936 0.0789 0.0293 0.0400 0.0000 0.1690 1.0000 0.0576 0.0167 0.0195 0.0001 0.0161 0.0000
schreiber2017/ismir2017 0.0166 1.0000 0.5235 0.6636 0.0000 1.0000 0.0576 1.0000 0.2500 0.5235 0.0044 0.3833 0.0015
schreiber2017/mirex2017 0.0043 0.8388 1.0000 1.0000 0.0000 0.5966 0.0167 0.2500 1.0000 1.0000 0.0213 0.8145 0.0075
schreiber2018/cnn 0.0022 0.6636 1.0000 1.0000 0.0000 0.4049 0.0195 0.5235 1.0000 1.0000 0.0117 1.0000 0.0034
schreiber2018/fcn 0.0000 0.0118 0.0784 0.0525 0.0000 0.0043 0.0001 0.0044 0.0213 0.0117 1.0000 0.0386 0.7539
schreiber2018/ismir2018 0.0015 0.5413 1.0000 0.8388 0.0000 0.3771 0.0161 0.3833 0.8145 1.0000 0.0386 1.0000 0.0129
sun2021/default 0.0000 0.0005 0.0127 0.0118 0.0000 0.0025 0.0000 0.0015 0.0075 0.0034 0.7539 0.0129 1.0000

Table 7: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p <= α, reject H0, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0243 0.0243 0.0013 0.0000 0.0725 0.5044 0.1433 0.0533 0.0003 0.0000 0.0002 0.0000
boeck2019/multi_task 0.0243 1.0000 1.0000 0.2266 0.0000 1.0000 0.0045 0.5235 1.0000 0.0784 0.0075 0.0784 0.0001
boeck2019/multi_task_hjdb 0.0243 1.0000 1.0000 0.2266 0.0000 1.0000 0.0045 0.5235 1.0000 0.0784 0.0075 0.0636 0.0001
boeck2020/dar 0.0013 0.2266 0.2266 1.0000 0.0000 0.3616 0.0002 0.0784 0.2863 0.5034 0.1435 0.4807 0.0213
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0725 1.0000 1.0000 0.3616 0.0000 1.0000 0.0127 0.7283 1.0000 0.0639 0.0072 0.0987 0.0004
schreiber2014/default 0.5044 0.0045 0.0045 0.0002 0.0000 0.0127 1.0000 0.0052 0.0009 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.1433 0.5235 0.5235 0.0784 0.0000 0.7283 0.0052 1.0000 0.2500 0.0106 0.0001 0.0072 0.0001
schreiber2017/mirex2017 0.0533 1.0000 1.0000 0.2863 0.0000 1.0000 0.0009 0.2500 1.0000 0.0414 0.0010 0.0309 0.0004
schreiber2018/cnn 0.0003 0.0784 0.0784 0.5034 0.0000 0.0639 0.0000 0.0106 0.0414 1.0000 0.5488 1.0000 0.1460
schreiber2018/fcn 0.0000 0.0075 0.0075 0.1435 0.0000 0.0072 0.0000 0.0001 0.0010 0.5488 1.0000 0.5488 0.4531
schreiber2018/ismir2018 0.0002 0.0784 0.0636 0.4807 0.0000 0.0987 0.0000 0.0072 0.0309 1.0000 0.5488 1.0000 0.1460
sun2021/default 0.0000 0.0001 0.0001 0.0213 0.0000 0.0004 0.0000 0.0001 0.0004 0.1460 0.4531 0.1460 1.0000

Table 8: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p <= α, reject H0, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Accuracy1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
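A sketch of how such a tempo-conditioned subset mean could be computed, assuming parallel lists of reference tempi and per-track Accuracy1 values (not the tempo_eval implementation):

```python
def subset_mean_accuracy(ref_tempi, correct, center, half_width=10.0):
    """Mean accuracy over tracks whose reference tempo lies in [center-10, center+10] BPM."""
    hits = [ok for bpm, ok in zip(ref_tempi, correct)
            if center - half_width <= bpm <= center + half_width]
    return sum(hits) / len(hits) if hits else float("nan")

ref_tempi = [128.0, 130.0, 87.0, 174.0, 125.0]
correct = [True, True, False, True, True]
print(subset_mean_accuracy(ref_tempi, correct, center=128))  # mean over the three ~128 BPM tracks
```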

Accuracy1 on Tempo-Subsets for 1.0

Figure 8: Mean Accuracy1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on Tempo-Subsets for 2.0

Figure 9: Mean Accuracy1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Accuracy2 on Tempo-Subsets for 1.0

Figure 10: Mean Accuracy2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets for 2.0

Figure 11: Mean Accuracy2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo

When fitting a generalized additive model (GAM) to Accuracy1 values for a given ground truth, what Accuracy1 can we expect with confidence?
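As a rough sketch of the idea, one could fit a logistic GAM of per-track correctness on reference tempo, for example with the pygam library. This is an assumption about tooling for illustration only, not necessarily how the figures in this report were produced, and the data below is synthetic.

```python
import numpy as np
from pygam import LogisticGAM, s  # assumed dependency, not part of tempo_eval itself

# X: reference tempo per track (n x 1), y: Accuracy1 per track (0/1) -- toy data here.
rng = np.random.default_rng(0)
X = rng.uniform(60, 200, size=(500, 1))
y = (rng.uniform(size=500) < 0.7).astype(int)

gam = LogisticGAM(s(0)).fit(X, y)
grid = np.linspace(60, 200, 100).reshape(-1, 1)
pred = gam.predict_mu(grid)                      # expected Accuracy1 at each tempo
ci = gam.confidence_intervals(grid, width=0.95)  # approximate 95% band around the prediction
print(pred[:3], ci[:3])
```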

Estimated Accuracy1 for Tempo for 1.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 1.0.

Figure 12: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo for 2.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 2.0.

Figure 13: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo

When fitting a generalized additive model (GAM) to Accuracy2 values for a given ground truth, what Accuracy2 can we expect with confidence?

Estimated Accuracy2 for Tempo for 1.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 1.0.

Figure 14: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo for 2.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 2.0.

Figure 15: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.
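A sketch of how a per-tag mean accuracy could be aggregated, assuming each item carries a list of ‘tag_open’ tags and a per-item Accuracy1 value (the names and values are illustrative):

```python
from collections import defaultdict

# tags maps item ID -> list of 'tag_open' tags; correct maps item ID -> Accuracy1 (True/False).
tags = {"1030011.LOFI": ["techno"], "3630279.LOFI": ["drum-and-bass", "jungle"]}
correct = {"1030011.LOFI": True, "3630279.LOFI": False}

per_tag = defaultdict(list)
for item, item_tags in tags.items():
    for tag in item_tags:
        per_tag[tag].append(correct[item])

# Mean Accuracy1 per tag.
for tag, values in sorted(per_tag.items()):
    print(tag, sum(values) / len(values))
```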

Accuracy1 for ‘tag_open’ Tags for 1.0

Figure 16: Mean Accuracy1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 for ‘tag_open’ Tags for 2.0

Figure 17: Mean Accuracy1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.

Accuracy2 for ‘tag_open’ Tags for 1.0

Figure 18: Mean Accuracy2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for ‘tag_open’ Tags for 2.0

Figure 19: Mean Accuracy2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

MIREX-Style Evaluation

P-Score is defined as the salience-weighted combination of whether each of the two reference tempi was identified correctly, allowing an 8% tolerance for individual tempo values [MIREX 2006 Definition].

One Correct is the fraction of estimate pairs for which at least one of the two values matches a reference value (within an 8% tolerance).

Both Correct is the fraction of estimate pairs for which both values match the reference values (within an 8% tolerance).

See [McKinney2007].
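A minimal sketch of the three MIREX-style metrics for a single track, assuming two reference tempi with a salience weight for the first and a pair of estimated tempi (illustrative, not the tempo_eval code):

```python
def same_tempo(est, ref, tolerance=0.08):
    """True if the estimate lies within the relative tolerance of the reference tempo."""
    return abs(est - ref) <= tolerance * ref

def mirex_metrics(est1, est2, ref1, ref2, salience1, tolerance=0.08):
    """Return (p_score, one_correct, both_correct) for one track."""
    tt1 = same_tempo(est1, ref1, tolerance) or same_tempo(est2, ref1, tolerance)
    tt2 = same_tempo(est1, ref2, tolerance) or same_tempo(est2, ref2, tolerance)
    p_score = salience1 * tt1 + (1.0 - salience1) * tt2
    return p_score, tt1 or tt2, tt1 and tt2

# Reference: 70 and 140 BPM, the slower tempo carrying 40% of the perceptual weight.
print(mirex_metrics(140.0, 69.0, 70.0, 140.0, salience1=0.4))  # (1.0, True, True)
```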

Note: Very few datasets actually provide multiple annotations per track along with a salience distribution. References without suitable annotations are not shown.

MIREX Results for 2.0

Estimator P-Score One Correct Both Correct
schreiber2018/cnn 0.9636 0.9955 0.6354
schreiber2018/fcn 0.9572 0.9939 0.6157
schreiber2018/ismir2018 0.9564 0.9924 0.6142
schreiber2017/mirex2017 0.9537 0.9879 0.6384
boeck2019/multi_task_hjdb 0.9466 0.9924 0.6006
schreiber2017/ismir2017 0.9446 0.9758 0.6369
boeck2020/dar 0.9436 0.9834 0.5567
boeck2019/multi_task 0.9268 0.9834 0.5643
schreiber2014/default 0.9119 0.9531 0.6021
sun2021/default 0.8452 0.9879 0.1528
boeck2015/tempodetector2016_default 0.8442 0.9818 0.3132
percival2014/stem 0.5909 0.8956 0.0408
davies2009/mirex_qm_tempotracker 0.4060 0.5507 0.2148

Table 9: MIREX-style metrics for estimates compared to version 2.0 with 8.0% tolerance, ordered by P-Score.

CSV JSON LATEX PICKLE

Raw data P-Score: CSV JSON LATEX PICKLE

Raw data One Correct: CSV JSON LATEX PICKLE

Raw data Both Correct: CSV JSON LATEX PICKLE

P-Score for 2.0

Figure 20: Mean P-Score for estimates compared to version 2.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct for 2.0

Figure 21: Mean One Correct for estimates compared to version 2.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct for 2.0

Figure 22: Mean Both Correct for estimates compared to version 2.0 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

P-Score on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean P-Score for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

P-Score on Tempo-Subsets for 1.0

Figure 23: Mean P-Score for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

P-Score on Tempo-Subsets for 2.0

Figure 24: Mean P-Score for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean One Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

One Correct on Tempo-Subsets for 1.0

Figure 25: Mean One Correct for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct on Tempo-Subsets for 2.0

Figure 26: Mean One Correct for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Both Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Both Correct on Tempo-Subsets for 1.0

Figure 27: Mean Both Correct for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct on Tempo-Subsets for 2.0

Figure 28: Mean Both Correct for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated P-Score for Tempo

When fitting a generalized additive model (GAM) to P-Score values for a given ground truth, what P-Score can we expect with confidence?

Estimated P-Score for Tempo for 1.0

Predictions of GAMs trained on P-Score for estimates for reference 1.0.

Figure 29: P-Score predictions of a generalized additive model (GAM) fit to P-Score results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated P-Score for Tempo for 2.0

Predictions of GAMs trained on P-Score for estimates for reference 2.0.

Figure 30: P-Score predictions of a generalized additive model (GAM) fit to P-Score results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated One Correct for Tempo

When fitting a generalized additive model (GAM) to One Correct values for a given ground truth, what One Correct can we expect with confidence?

Estimated One Correct for Tempo for 1.0

Predictions of GAMs trained on One Correct for estimates for reference 1.0.

Figure 31: One Correct predictions of a generalized additive model (GAM) fit to One Correct results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated One Correct for Tempo for 2.0

Predictions of GAMs trained on One Correct for estimates for reference 2.0.

Figure 32: One Correct predictions of a generalized additive model (GAM) fit to One Correct results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Both Correct for Tempo

When fitting a generalized additive model (GAM) to Both Correct values for a given ground truth, what Both Correct can we expect with confidence?

Estimated Both Correct for Tempo for 1.0

Predictions of GAMs trained on Both Correct for estimates for reference 1.0.

Figure 33: Both Correct predictions of a generalized additive model (GAM) fit to Both Correct results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Both Correct for Tempo for 2.0

Predictions of GAMs trained on Both Correct for estimates for reference 2.0.

Figure 34: Both Correct predictions of a generalized additive model (GAM) fit to Both Correct results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

P-Score for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.

P-Score for ‘tag_open’ Tags for 1.0

Figure 35: Mean P-Score of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

P-Score for ‘tag_open’ Tags for 2.0

Figure 36: Mean P-Score of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.

One Correct for ‘tag_open’ Tags for 1.0

Figure 37: Mean One Correct of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct for ‘tag_open’ Tags for 2.0

Figure 38: Mean One Correct of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.

Both Correct for ‘tag_open’ Tags for 1.0

Figure 39: Mean Both Correct of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct for ‘tag_open’ Tags for 2.0

Figure 40: Mean Both Correct of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 and OE2

OE1 is defined as the octave error between an estimate E and a reference value R: OE1(E) = log2(E/R). This means that the most common errors, by a factor of 2 or 1/2, have the same magnitude, namely 1.

OE2 is the signed OE1 corresponding to the minimum absolute OE1, allowing the octave errors 2, 3, 1/2, and 1/3: OE2(E) = arg min_x(|x|) with x ∈ {OE1(E), OE1(2E), OE1(3E), OE1(½E), OE1(⅓E)}.
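A sketch of both octave-error measures for a single estimate (not taken from the tempo_eval source):

```python
from math import log2

def oe1(estimate, reference):
    """Signed octave error in octaves: log2 of the ratio estimate/reference."""
    return log2(estimate / reference)

def oe2(estimate, reference, factors=(1.0, 2.0, 3.0, 1/2, 1/3)):
    """Signed OE1 with the smallest magnitude over the allowed octave-error factors."""
    return min((oe1(estimate * f, reference) for f in factors), key=abs)

print(oe1(70, 140))  # -1.0: the estimate is one octave below the reference
print(oe2(70, 140))  #  0.0: the factor-2 error is forgiven by OE2
```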

Mean OE1/OE2 Results for 1.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
sun2021/default 0.0294 0.3648 -0.0089 0.0856
schreiber2018/fcn 0.0231 0.3658 -0.0025 0.0874
schreiber2017/mirex2017 -0.0423 0.3815 -0.0055 0.0939
boeck2020/dar -0.0276 0.3905 -0.0060 0.0932
schreiber2018/cnn -0.0118 0.3910 -0.0052 0.0981
schreiber2018/ismir2018 -0.0744 0.4236 -0.0045 0.0917
boeck2019/multi_task_hjdb -0.1351 0.4633 0.0028 0.0978
boeck2019/multi_task -0.2110 0.4819 0.0017 0.0995
schreiber2017/ismir2017 -0.2106 0.4933 -0.0043 0.0971
schreiber2014/default -0.2953 0.4982 -0.0056 0.1088
percival2014/stem -0.3716 0.5215 0.0130 0.1003
davies2009/mirex_qm_tempotracker -0.2466 0.5271 0.1751 0.2304
boeck2015/tempodetector2016_default -0.3380 0.6087 -0.0163 0.1192

Table 10: Mean OE1/OE2 for estimates compared to version 1.0 ordered by standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 1.0

Figure 41: OE1 for estimates compared to version 1.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 1.0

Figure 42: OE2 for estimates compared to version 1.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean OE1/OE2 Results for 2.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
schreiber2018/fcn -0.0020 0.2935 0.0010 0.0465
boeck2020/dar -0.0520 0.3224 -0.0018 0.0588
sun2021/default 0.0035 0.3269 -0.0019 0.0423
schreiber2018/cnn -0.0388 0.3414 0.0005 0.0568
schreiber2017/mirex2017 -0.0676 0.3735 -0.0041 0.0648
schreiber2018/ismir2018 -0.1022 0.3833 -0.0008 0.0541
boeck2019/multi_task_hjdb -0.1596 0.4231 0.0077 0.0659
boeck2019/multi_task -0.2358 0.4676 0.0056 0.0682
schreiber2017/ismir2017 -0.2367 0.4709 -0.0037 0.0691
davies2009/mirex_qm_tempotracker -0.2755 0.4871 0.1805 0.2200
percival2014/stem -0.3978 0.4936 0.0121 0.0723
schreiber2014/default -0.3239 0.4941 -0.0046 0.0874
boeck2015/tempodetector2016_default -0.3658 0.5811 -0.0120 0.0920

Table 11: Mean OE1/OE2 for estimates compared to version 2.0 ordered by standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 2.0

Figure 43: OE1 for estimates compared to version 2.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 2.0

Figure 44: OE2 for estimates compared to version 2.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0001 0.1542 0.0921 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0000 1.0000 0.0000 0.0000 0.0571 0.0000 0.0000 0.9816 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.3335 0.2459 0.0002 0.0017 0.0000
davies2009/mirex_qm_tempotracker 0.0001 0.0571 0.0000 0.0000 1.0000 0.0000 0.0080 0.0533 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.1542 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0921 0.0000 0.0000 0.0000 0.0080 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.9816 0.0000 0.0000 0.0533 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0000 0.3335 0.0000 0.0000 0.0000 0.0000 1.0000 0.0438 0.0000 0.0254 0.0000
schreiber2018/cnn 0.0000 0.0000 0.0000 0.2459 0.0000 0.0000 0.0000 0.0000 0.0438 1.0000 0.0031 0.0000 0.0035
schreiber2018/fcn 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0031 1.0000 0.0000 0.6052
schreiber2018/ismir2018 0.0000 0.0000 0.0001 0.0017 0.0000 0.0000 0.0000 0.0000 0.0254 0.0000 0.0000 1.0000 0.0000
sun2021/default 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0035 0.6052 0.0000 1.0000

Table 12: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p <= α, reject H0, i.e. the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
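A sketch of how one such p-value could be obtained from the per-track OE1 values of two estimators, using scipy (the arrays below are toy data, not values from this report):

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-track OE1 of two estimators against the same reference annotations (toy values).
oe1_a = np.array([0.02, -1.01, 0.00, 0.05, -0.98, 0.01])
oe1_b = np.array([0.01, -0.03, 0.02, 0.04, -1.02, 0.00])

# Paired t-test: H0 is that the mean of the per-track differences is zero.
stat, p_value = ttest_rel(oe1_a, oe1_b)
print(f"t = {stat:.3f}, p = {p_value:.3f}")
```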

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0002 0.1738 0.0968 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0000 1.0000 0.0000 0.0000 0.0334 0.0000 0.0000 0.9492 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.3023 0.3297 0.0002 0.0007 0.0000
davies2009/mirex_qm_tempotracker 0.0002 0.0334 0.0000 0.0000 1.0000 0.0000 0.0086 0.0371 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.1738 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0968 0.0000 0.0000 0.0000 0.0086 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.9492 0.0000 0.0000 0.0371 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0000 0.3023 0.0000 0.0000 0.0000 0.0000 1.0000 0.0563 0.0000 0.0159 0.0000
schreiber2018/cnn 0.0000 0.0000 0.0000 0.3297 0.0000 0.0000 0.0000 0.0000 0.0563 1.0000 0.0017 0.0000 0.0026
schreiber2018/fcn 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0017 1.0000 0.0000 0.6498
schreiber2018/ismir2018 0.0000 0.0000 0.0002 0.0007 0.0000 0.0000 0.0000 0.0000 0.0159 0.0000 0.0000 1.0000 0.0000
sun2021/default 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0026 0.6498 0.0000 1.0000

Table 13: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p <= α, reject H0, i.e. the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0167 0.0000 0.0000 0.0357 0.0096 0.0233 0.0200 0.0018 0.0095 0.0870
boeck2019/multi_task 0.0000 1.0000 0.5670 0.0002 0.0000 0.0013 0.0794 0.0541 0.0263 0.0390 0.1616 0.0259 0.0002
boeck2019/multi_task_hjdb 0.0000 0.5670 1.0000 0.0004 0.0000 0.0060 0.0331 0.0334 0.0102 0.0100 0.0515 0.0027 0.0000
boeck2020/dar 0.0167 0.0002 0.0004 1.0000 0.0000 0.0000 0.9142 0.5475 0.8776 0.7724 0.2084 0.5476 0.2777
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0000 0.0013 0.0060 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0357 0.0794 0.0331 0.9142 0.0000 0.0000 1.0000 0.7006 0.9942 0.9175 0.3962 0.7899 0.3801
schreiber2017/ismir2017 0.0096 0.0541 0.0334 0.5475 0.0000 0.0000 0.7006 1.0000 0.3385 0.7779 0.5219 0.9417 0.1427
schreiber2017/mirex2017 0.0233 0.0263 0.0102 0.8776 0.0000 0.0000 0.9942 0.3385 1.0000 0.8900 0.2295 0.7191 0.2396
schreiber2018/cnn 0.0200 0.0390 0.0100 0.7724 0.0000 0.0000 0.9175 0.7779 0.8900 1.0000 0.2475 0.8119 0.1152
schreiber2018/fcn 0.0018 0.1616 0.0515 0.2084 0.0000 0.0000 0.3962 0.5219 0.2295 0.2475 1.0000 0.3874 0.0093
schreiber2018/ismir2018 0.0095 0.0259 0.0027 0.5476 0.0000 0.0000 0.7899 0.9417 0.7191 0.8119 0.3874 1.0000 0.0804
sun2021/default 0.0870 0.0002 0.0000 0.2777 0.0000 0.0000 0.3801 0.1427 0.2396 0.1152 0.0093 0.0804 1.0000

Table 14: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p <= α, reject H0, i.e. the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0040 0.0000 0.0000 0.0994 0.0302 0.0398 0.0014 0.0006 0.0050 0.0045
boeck2019/multi_task 0.0000 1.0000 0.2759 0.0002 0.0000 0.0524 0.0061 0.0003 0.0002 0.0537 0.0670 0.0246 0.0005
boeck2019/multi_task_hjdb 0.0000 0.2759 1.0000 0.0000 0.0000 0.1933 0.0008 0.0000 0.0000 0.0048 0.0055 0.0016 0.0000
boeck2020/dar 0.0040 0.0002 0.0000 1.0000 0.0000 0.0001 0.4403 0.4434 0.3685 0.3747 0.2543 0.7125 0.9723
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0000 0.0524 0.1933 0.0001 0.0000 1.0000 0.0000 0.0000 0.0000 0.0001 0.0002 0.0001 0.0000
schreiber2014/default 0.0994 0.0061 0.0008 0.4403 0.0000 0.0000 1.0000 0.7546 0.8456 0.1341 0.0800 0.2735 0.4313
schreiber2017/ismir2017 0.0302 0.0003 0.0000 0.4434 0.0000 0.0000 0.7546 1.0000 0.7106 0.0891 0.0289 0.2838 0.4613
schreiber2017/mirex2017 0.0398 0.0002 0.0000 0.3685 0.0000 0.0000 0.8456 0.7106 1.0000 0.0472 0.0096 0.2009 0.3438
schreiber2018/cnn 0.0014 0.0537 0.0048 0.3747 0.0000 0.0001 0.1341 0.0891 0.0472 1.0000 0.7883 0.5223 0.2255
schreiber2018/fcn 0.0006 0.0670 0.0055 0.2543 0.0000 0.0002 0.0800 0.0289 0.0096 0.7883 1.0000 0.3700 0.0789
schreiber2018/ismir2018 0.0050 0.0246 0.0016 0.7125 0.0000 0.0001 0.2735 0.2838 0.2009 0.5223 0.3700 1.0000 0.6350
sun2021/default 0.0045 0.0005 0.0000 0.9723 0.0000 0.0000 0.4313 0.4613 0.3438 0.2255 0.0789 0.6350 1.0000

Table 15: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p <= α, reject H0, i.e. the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

OE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE1 on Tempo-Subsets for 1.0

Figure 45: Mean OE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets for 2.0

Figure 46: Mean OE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE2 on Tempo-Subsets for 1.0

Figure 47: Mean OE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets for 2.0

Figure 48: Mean OE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo

When fitting a generalized additive model (GAM) to OE1 values for a given ground truth, what OE1 can we expect with confidence?

Estimated OE1 for Tempo for 1.0

Predictions of GAMs trained on OE1 for estimates for reference 1.0.

Figure 49: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo for 2.0

Predictions of GAMs trained on OE1 for estimates for reference 2.0.

Figure 50: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo

When fitting a generalized additive model (GAM) to OE2 values for a given ground truth, what OE2 can we expect with confidence?

Estimated OE2 for Tempo for 1.0

Predictions of GAMs trained on OE2 for estimates for reference 1.0.

Figure 51: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo for 2.0

Predictions of GAMs trained on OE2 for estimates for reference 2.0.

Figure 52: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a particular label are taken into account? Note that some values may be based on very few estimates.

OE1 for ‘tag_open’ Tags for 1.0

Figure 53: OE1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE1 for ‘tag_open’ Tags for 2.0

Figure 54: OE1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE2 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

OE2 for ‘tag_open’ Tags for 1.0

Figure 55: OE2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE2 for ‘tag_open’ Tags for 2.0

Figure 56: OE2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE1 and AOE2

AOE1 is defined as the absolute octave error between an estimate E and a reference value R: AOE1(E) = |log2(E/R)|.

AOE2 is the minimum of AOE1 when additionally allowing the octave errors 2, 3, 1/2, and 1/3: AOE2(E) = min(AOE1(E), AOE1(2E), AOE1(3E), AOE1(½E), AOE1(⅓E)).
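
The two definitions translate directly into code; the following sketch is only a transcription of the formulas above:

```python
# AOE1 and AOE2 as defined above.
import numpy as np

def aoe1(estimate, reference):
    """Absolute octave error: |log2(estimate / reference)|."""
    return abs(np.log2(estimate / reference))

def aoe2(estimate, reference):
    """AOE1 minimised over the additional octave factors 2, 3, 1/2 and 1/3."""
    factors = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)
    return min(aoe1(estimate * f, reference) for f in factors)

print(aoe1(70.0, 140.0))  # 1.0 -- off by one octave
print(aoe2(70.0, 140.0))  # 0.0 -- the factor 2 absorbs the octave error
```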

Mean AOE1/AOE2 Results for 1.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
sun2021/default 0.1493 0.3341 0.0286 0.0811
schreiber2018/fcn 0.1504 0.3342 0.0275 0.0830
schreiber2017/mirex2017 0.1658 0.3462 0.0301 0.0891
boeck2020/dar 0.1713 0.3521 0.0339 0.0870
schreiber2018/cnn 0.1731 0.3508 0.0319 0.0930
schreiber2018/ismir2018 0.2032 0.3790 0.0295 0.0870
boeck2019/multi_task_hjdb 0.2519 0.4117 0.0340 0.0917
boeck2019/multi_task 0.2969 0.4342 0.0352 0.0930
schreiber2017/ismir2017 0.3078 0.4392 0.0318 0.0919
schreiber2014/default 0.3589 0.4546 0.0386 0.1019
boeck2015/tempodetector2016_default 0.4129 0.5606 0.0470 0.1108
percival2014/stem 0.4278 0.4765 0.0342 0.0952
davies2009/mirex_qm_tempotracker 0.4662 0.3484 0.2141 0.1946

Table 16: Mean AOE1/AOE2 for estimates compared to version 1.0, ordered by mean AOE1.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 1.0

Figure 57: AOE1 for estimates compared to version 1.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG
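
For reference, a kernel density estimate of such an error sample can be obtained with SciPy as sketched below; `aoe1_vals` is a hypothetical array of per-track AOE1 scores, not data from this report:

```python
# Sketch: empirical distribution of per-track AOE1 values via Gaussian KDE.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
aoe1_vals = np.abs(rng.normal(0.0, 0.3, size=600))  # toy AOE1 sample
kde = gaussian_kde(aoe1_vals)
grid = np.linspace(0.0, aoe1_vals.max(), 200)
density = kde(grid)                                  # estimated density along the grid
print(f"mean AOE1 = {aoe1_vals.mean():.4f}")
```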

AOE2 distribution for 1.0

Figure 58: AOE2 for estimates compared to version 1.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean AOE1/AOE2 Results for 2.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
schreiber2018/fcn 0.0968 0.2771 0.0135 0.0446
sun2021/default 0.1127 0.3069 0.0134 0.0401
boeck2020/dar 0.1179 0.3046 0.0189 0.0556
schreiber2018/cnn 0.1299 0.3182 0.0159 0.0545
schreiber2017/mirex2017 0.1574 0.3454 0.0191 0.0621
schreiber2018/ismir2018 0.1683 0.3592 0.0154 0.0519
boeck2019/multi_task_hjdb 0.2183 0.3961 0.0205 0.0631
boeck2019/multi_task 0.2873 0.4378 0.0213 0.0650
schreiber2017/ismir2017 0.2911 0.4393 0.0207 0.0661
schreiber2014/default 0.3662 0.4636 0.0287 0.0827
boeck2015/tempodetector2016_default 0.3974 0.5599 0.0302 0.0877
percival2014/stem 0.4156 0.4787 0.0217 0.0701
davies2009/mirex_qm_tempotracker 0.4325 0.3552 0.2026 0.1998

Table 17: Mean AOE1/AOE2 for estimates compared to version 2.0, ordered by mean AOE1.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 2.0

Figure 59: AOE1 for estimates compared to version 2.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 2.0

Figure 60: AOE2 for estimates compared to version 2.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0174 0.5153 0.0272 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.4269 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0010 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.7152 0.8927 0.1170 0.0303 0.0839
davies2009/mirex_qm_tempotracker 0.0174 0.0000 0.0000 0.0000 1.0000 0.0350 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.5153 0.0000 0.0000 0.0000 0.0350 1.0000 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0272 0.0000 0.0000 0.0000 0.0000 0.0001 1.0000 0.0021 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.4269 0.0002 0.0000 0.0000 0.0000 0.0021 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0000 0.7152 0.0000 0.0000 0.0000 0.0000 1.0000 0.6264 0.2746 0.0083 0.2850
schreiber2018/cnn 0.0000 0.0000 0.0000 0.8927 0.0000 0.0000 0.0000 0.0000 0.6264 1.0000 0.0509 0.0291 0.0828
schreiber2018/fcn 0.0000 0.0000 0.0000 0.1170 0.0000 0.0000 0.0000 0.0000 0.2746 0.0509 1.0000 0.0001 0.9255
schreiber2018/ismir2018 0.0000 0.0000 0.0010 0.0303 0.0000 0.0000 0.0000 0.0000 0.0083 0.0291 0.0001 1.0000 0.0004
sun2021/default 0.0000 0.0000 0.0000 0.0839 0.0000 0.0000 0.0000 0.0000 0.2850 0.0828 0.9255 0.0004 1.0000

Table 18: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE1. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.1200 0.4302 0.2061 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.7826 0.0000 0.0000 0.0000 0.0000 0.0000
boeck2019/multi_task_hjdb 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0009 0.0000
boeck2020/dar 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0077 0.3601 0.1070 0.0005 0.6831
davies2009/mirex_qm_tempotracker 0.1200 0.0000 0.0000 0.0000 1.0000 0.3503 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.4302 0.0000 0.0000 0.0000 0.3503 1.0000 0.0036 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.2061 0.0000 0.0000 0.0000 0.0002 0.0036 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.7826 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0001 0.0077 0.0000 0.0000 0.0000 0.0000 1.0000 0.0629 0.0000 0.4387 0.0039
schreiber2018/cnn 0.0000 0.0000 0.0000 0.3601 0.0000 0.0000 0.0000 0.0000 0.0629 1.0000 0.0036 0.0056 0.2156
schreiber2018/fcn 0.0000 0.0000 0.0000 0.1070 0.0000 0.0000 0.0000 0.0000 0.0000 0.0036 1.0000 0.0000 0.1835
schreiber2018/ismir2018 0.0000 0.0000 0.0009 0.0005 0.0000 0.0000 0.0000 0.0000 0.4387 0.0056 0.0000 1.0000 0.0003
sun2021/default 0.0000 0.0000 0.0000 0.6831 0.0000 0.0000 0.0000 0.0000 0.0039 0.2156 0.1835 0.0003 1.0000

Table 19: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE1. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0009 0.0002 0.0001 0.0000 0.0020 0.0475 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0009 1.0000 0.3909 0.4447 0.0000 0.7405 0.3388 0.1667 0.0436 0.1945 0.0020 0.0228 0.0027
boeck2019/multi_task_hjdb 0.0002 0.3909 1.0000 0.9727 0.0000 0.9300 0.1819 0.3765 0.1252 0.4066 0.0078 0.0610 0.0118
boeck2020/dar 0.0001 0.4447 0.9727 1.0000 0.0000 0.9172 0.1555 0.3075 0.0733 0.4135 0.0047 0.0366 0.0127
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0020 0.7405 0.9300 0.9172 0.0000 1.0000 0.2598 0.4521 0.2173 0.4194 0.0184 0.1419 0.0561
schreiber2014/default 0.0475 0.3388 0.1819 0.1555 0.0000 0.2598 1.0000 0.0154 0.0028 0.0411 0.0004 0.0058 0.0020
schreiber2017/ismir2017 0.0000 0.1667 0.3765 0.3075 0.0000 0.4521 0.0154 1.0000 0.0698 0.9566 0.0428 0.2946 0.1710
schreiber2017/mirex2017 0.0000 0.0436 0.1252 0.0733 0.0000 0.2173 0.0028 0.0698 1.0000 0.4230 0.1700 0.7367 0.4765
schreiber2018/cnn 0.0001 0.1945 0.4066 0.4135 0.0000 0.4194 0.0411 0.9566 0.4230 1.0000 0.0287 0.2110 0.1018
schreiber2018/fcn 0.0000 0.0020 0.0078 0.0047 0.0000 0.0184 0.0004 0.0428 0.1700 0.0287 1.0000 0.3123 0.5603
schreiber2018/ismir2018 0.0000 0.0228 0.0610 0.0366 0.0000 0.1419 0.0058 0.2946 0.7367 0.2110 0.3123 1.0000 0.6381
sun2021/default 0.0000 0.0027 0.0118 0.0127 0.0000 0.0561 0.0020 0.1710 0.4765 0.1018 0.5603 0.6381 1.0000

Table 20: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE2. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default
boeck2015/tempodetector2016_default 1.0000 0.0102 0.0043 0.0005 0.0000 0.0360 0.7097 0.0076 0.0020 0.0001 0.0000 0.0000 0.0000
boeck2019/multi_task 0.0102 1.0000 0.5613 0.1758 0.0000 0.8910 0.0400 0.7932 0.3719 0.0324 0.0013 0.0151 0.0001
boeck2019/multi_task_hjdb 0.0043 0.5613 1.0000 0.3719 0.0000 0.6753 0.0197 0.9376 0.5745 0.0617 0.0028 0.0254 0.0003
boeck2020/dar 0.0005 0.1758 0.3719 1.0000 0.0000 0.3648 0.0041 0.4428 0.9518 0.2092 0.0177 0.1006 0.0074
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.0360 0.8910 0.6753 0.3648 0.0000 1.0000 0.0771 0.7435 0.4184 0.0416 0.0028 0.0431 0.0013
schreiber2014/default 0.7097 0.0400 0.0197 0.0041 0.0000 0.0771 1.0000 0.0048 0.0007 0.0001 0.0000 0.0001 0.0000
schreiber2017/ismir2017 0.0076 0.7932 0.9376 0.4428 0.0000 0.7435 0.0048 1.0000 0.0857 0.0478 0.0008 0.0245 0.0026
schreiber2017/mirex2017 0.0020 0.3719 0.5745 0.9518 0.0000 0.4184 0.0007 0.0857 1.0000 0.1542 0.0041 0.0881 0.0112
schreiber2018/cnn 0.0001 0.0324 0.0617 0.2092 0.0000 0.0416 0.0001 0.0478 0.1542 1.0000 0.1911 0.7931 0.1992
schreiber2018/fcn 0.0000 0.0013 0.0028 0.0177 0.0000 0.0028 0.0000 0.0008 0.0041 0.1911 1.0000 0.3085 0.9701
schreiber2018/ismir2018 0.0000 0.0151 0.0254 0.1006 0.0000 0.0431 0.0001 0.0245 0.0881 0.7931 0.3085 1.0000 0.2806
sun2021/default 0.0000 0.0001 0.0003 0.0074 0.0000 0.0013 0.0000 0.0026 0.0112 0.1992 0.9701 0.2806 1.0000

Table 21: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE2. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

AOE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE1 on Tempo-Subsets for 1.0

Figure 61: Mean AOE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets for 2.0

Figure 62: Mean AOE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE2 on Tempo-Subsets for 1.0

Figure 63: Mean AOE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets for 2.0

Figure 64: Mean AOE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo

When fitting a generalized additive model (GAM) to AOE1 values as a function of the ground-truth tempo, what AOE1 can we expect, and with what confidence?

Estimated AOE1 for Tempo for 1.0

Predictions of GAMs trained on AOE1 values of estimates against reference 1.0.

Figure 65: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo for 2.0

Predictions of GAMs trained on AOE1 values of estimates against reference 2.0.

Figure 66: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo

When fitting a generalized additive model (GAM) to AOE2 values as a function of the ground-truth tempo, what AOE2 can we expect, and with what confidence?

Estimated AOE2 for Tempo for 1.0

Predictions of GAMs trained on AOE2 values of estimates against reference 1.0.

Figure 67: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo for 2.0

Predictions of GAMs trained on AOE2 values of estimates against reference 2.0.

Figure 68: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

AOE1 for ‘tag_open’ Tags for 1.0

Figure 69: AOE1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE1 for ‘tag_open’ Tags for 2.0

Figure 70: AOE1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE2 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

AOE2 for ‘tag_open’ Tags for 1.0

Figure 71: AOE2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE2 for ‘tag_open’ Tags for 2.0

Figure 72: AOE2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG


Generated by tempo_eval 0.1.1 on 2022-06-29 18:32. Report size: L.