acm_mirum

This is the tempo_eval report for the ‘acm_mirum’ corpus.

References for ‘acm_mirum’

References

1.0

Attribute	Value
Corpus	acm_mirum
Version	1.0
Curator	Geoffroy Peeters
Annotator, bibtex	Peeters2012
Annotator, ref_url	http://recherche.ircam.fr/anasyn/peeters/pub/2012_ACMMIRUM/

2.0

Attribute	Value
Corpus	acm_mirum
Version	2.0
Curator	Graham Percival
Annotator, bibtex	Percival2014
Annotator, ref_url	http://www.marsyas.info/tempo/

Basic Statistics

Reference	Size	Min (BPM)	Max (BPM)	Avg (BPM)	Stdev (BPM)	Sweet Oct. Start (BPM)	Sweet Oct. Coverage
1.0	1410	36.00	257.00	102.54	32.73	69.00	0.73
2.0	1410	37.00	257.00	102.72	32.59	69.00	0.73

Table 1: Basic statistics.
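
The report does not define the two "Sweet Oct." columns. A minimal sketch, assuming they describe the tempo octave [s, 2s) that contains the largest share of values (start s in BPM, coverage as a fraction of all values), could look as follows. The function name `sweet_octave`, the integer candidate range, and the sample values are illustrative assumptions, not part of tempo_eval.

```python
def sweet_octave(tempi, candidates=range(20, 201)):
    """Return (start, coverage) of the octave [start, 2*start) holding most values.

    Assumption: only integer start values are tried, and ties are resolved
    in favour of the lowest start.
    """
    n = len(tempi)
    best_start, best_coverage = None, -1.0
    for start in candidates:
        inside = sum(1 for t in tempi if start <= t < 2 * start)
        coverage = inside / n
        if coverage > best_coverage:
            best_start, best_coverage = start, coverage
    return best_start, best_coverage


# Hypothetical tempo values in BPM, for illustration only.
sample = [60, 70, 80, 100, 120, 139, 200]
start, coverage = sweet_octave(sample)
print(start, round(coverage, 2))
```

Under this reading, a coverage of 0.73 would mean that 73% of the annotated tempi fall into a single octave, which gives a rough idea of how much octave ambiguity a corpus allows.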

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

Estimates for ‘acm_mirum’

Estimators

boeck2015/tempodetector2016_default

Attribute	Value
Corpus	acm_mirum
Version	0.17.dev0
Annotation Tools	TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex	Boeck2015

boeck2019/multi_task

Attribute	Value
Corpus	acm_mirum
Version	0.0.1
Annotation Tools	model=multi_task, https://github.com/superbock/ISMIR2019
Annotator, bibtex	Boeck2019

boeck2019/multi_task_hjdb

Attribute	Value
Corpus	acm_mirum
Version	0.0.1
Annotation Tools	model=multi_task_hjdb, https://github.com/superbock/ISMIR2019
Annotator, bibtex	Boeck2019

boeck2020/dar

Attribute	Value
Corpus	acm_mirum
Version	0.0.1
Annotation Tools	https://github.com/superbock/ISMIR2020
Annotator, bibtex	Boeck2020

davies2009/mirex_qm_tempotracker

Attribute	Value
Corpus	acm_mirum
Version	1.0
Annotation Tools	QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used.
Annotator, bibtex	Davies2009, Davies2007

echonest/version_3_2_1

Attribute	Value
Corpus	acm_mirum
Version	3.2.1
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	Echo Nest track analyzer v3.2.1
Annotator, bibtex	Percival2014

gkiokas2012/default

Attribute	Value
Corpus	acm_mirum
Version	1.0
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	Gkiokas2012
Annotator, bibtex	Gkiokas2012

klapuri2006/percival2014

Attribute	Value
Corpus	acm_mirum
Version	1.0
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	Klapuri 2006
Annotator, bibtex	Klapuri2006

oliveira2010/ibt

Attribute	Value
Corpus	acm_mirum
Version	1.0
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	Oliveira 2010
Annotator, bibtex	Oliveira2010

percival2014/stem

Attribute	Value
Corpus	acm_mirum
Version	1.0
Annotation Tools	percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex	Percival2014

scheirer1998/percival2014

Attribute	Value
Corpus	acm_mirum
Version	1.0
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	Scheirer 1998
Annotator, bibtex	Scheirer1998

schreiber2014/default

Attribute	Value
Corpus	acm_mirum
Version	0.0.1
Annotation Tools	schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2014

schreiber2017/ismir2017

Attribute	Value
Corpus	acm_mirum
Version	0.0.4
Annotation Tools	schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2017

schreiber2017/mirex2017

Attribute	Value
Corpus	acm_mirum
Version	0.0.4
Annotation Tools	schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2017

schreiber2018/cnn

Attribute	Value
Corpus	acm_mirum
Version	0.0.2
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute	Value
Corpus	acm_mirum
Version	0.0.2
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

schreiber2018/ismir2018

Attribute	Value
Corpus	acm_mirum
Version	0.0.2
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

sun2021/default

Attribute	Value
Corpus	acm_mirum
Version	0.0.2
Data Source	Xiaoheng Sun, Qiqi He, Yongwei Gao, Wei Li. Musical Tempo Estimation Using a Multi-scale Network. In Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR), Online, 2021.
Annotation Tools	https://github.com/Qqi-HE/TempoEstimation_MGANet
Annotator, bibtex	Sun2021

zplane/auftakt_v3

Attribute	Value
Corpus	acm_mirum
Version	3.0
Data Source	Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools	zplane aufTAKT version 3.0, http://licensing.zplane.de/technology#auftakt
Annotator, bibtex	Percival2014

Basic Statistics

Estimator	Size	Min (BPM)	Max (BPM)	Avg (BPM)	Stdev (BPM)	Sweet Oct. Start (BPM)	Sweet Oct. Coverage
boeck2015/tempodetector2016_default	1410	41.96	240.00	118.09	34.59	84.00	0.72
boeck2019/multi_task	1410	39.89	206.08	107.56	28.79	72.00	0.82
boeck2019/multi_task_hjdb	1410	39.64	204.80	108.48	29.73	72.00	0.79
boeck2020/dar	1410	48.02	237.16	109.24	32.30	73.00	0.76
davies2009/mirex_qm_tempotracker	1410	60.80	191.41	120.01	27.50	84.00	0.87
echonest/version_3_2_1	1410	45.91	197.99	108.64	30.82	72.00	0.76
gkiokas2012/default	1410	44.00	218.00	107.42	30.72	73.00	0.77
klapuri2006/percival2014	1410	65.01	164.06	110.33	23.89	76.00	0.93
oliveira2010/ibt	1410	80.00	167.00	118.42	23.87	81.00	1.00
percival2014/stem	1410	50.67	156.60	102.13	22.97	72.00	0.92
scheirer1998/percival2014	1312	61.35	181.82	106.51	31.06	77.00	0.72
schreiber2014/default	1410	56.34	160.08	99.91	22.74	68.00	0.90
schreiber2017/ismir2017	1410	40.72	203.92	105.72	27.80	72.00	0.83
schreiber2017/mirex2017	1410	28.86	193.36	104.59	29.61	72.00	0.79
schreiber2018/cnn	1410	41.00	204.00	111.56	31.17	74.00	0.77
schreiber2018/fcn	1410	40.00	214.00	109.67	33.17	72.00	0.75
schreiber2018/ismir2018	1410	56.00	204.00	110.56	28.18	74.00	0.85
sun2021/default	1410	47.00	240.00	110.64	32.64	73.00	0.75
zplane/auftakt_v3	1410	65.00	171.00	108.30	25.58	79.00	0.85

Table 2: Basic statistics.

Smoothed Tempo Distribution

Figure 2: Percentage of values in tempo interval.

Accuracy

Accuracy₁ is defined as the share of estimates that lie within a 4% tolerance of the reference BPM value. The tables below report it as a fraction between 0 and 1.

Accuracy₂ additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set, or that the test set may even have been created using one of the tested algorithms.
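
The two accuracy definitions can be sketched in Python as follows. This is an illustration under stated assumptions, not the tempo_eval implementation: the function names are hypothetical, the tolerance is taken relative to the (scaled) reference value, and missing estimates (`None`) are counted as incorrect.

```python
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def is_correct(estimate, reference, tolerance=0.04, factors=(1.0,)):
    """True if the estimate is within the tolerance of reference * factor."""
    if estimate is None:  # assumption: a missing estimate counts as wrong
        return False
    return any(
        abs(estimate - reference * f) <= tolerance * reference * f
        for f in factors
    )


def accuracy1(estimates, references, tolerance=0.04):
    """Fraction of estimates within the tolerance of the reference tempo."""
    hits = sum(is_correct(e, r, tolerance) for e, r in zip(estimates, references))
    return hits / len(references)


def accuracy2(estimates, references, tolerance=0.04):
    """Like accuracy1, but also accepts factors 2, 3, 1/2 and 1/3."""
    hits = sum(
        is_correct(e, r, tolerance, OCTAVE_FACTORS)
        for e, r in zip(estimates, references)
    )
    return hits / len(references)


# Hypothetical values in BPM, for illustration only.
references = [100.0, 120.0, 90.0, 60.0]
estimates = [102.0, 60.0, 95.0, 180.0]
acc1 = accuracy1(estimates, references)
acc2 = accuracy2(estimates, references)
print(acc1, acc2)
```

In this example only the first estimate counts for Accuracy₁, while the half-tempo and triple-tempo estimates are additionally accepted by Accuracy₂.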

Accuracy Results for 1.0

Estimator	Accuracy₁	Accuracy₂
schreiber2017/mirex2017	0.8348	0.9078
boeck2020/dar	0.7844	0.9085
schreiber2018/cnn	0.7645	0.9071
schreiber2018/fcn	0.7631	0.9007
sun2021/default	0.7582	0.8851
schreiber2017/ismir2017	0.7511	0.8936
schreiber2018/ismir2018	0.7163	0.8972
boeck2019/multi_task	0.6936	0.9000
schreiber2014/default	0.6922	0.8794
boeck2015/tempodetector2016_default	0.6858	0.9000
boeck2019/multi_task_hjdb	0.6851	0.8972
echonest/version_3_2_1	0.6766	0.8539
percival2014/stem	0.6702	0.9000
gkiokas2012/default	0.6610	0.8972
zplane/auftakt_v3	0.6461	0.8660
klapuri2006/percival2014	0.6270	0.8865
davies2009/mirex_qm_tempotracker	0.6085	0.8645
oliveira2010/ibt	0.5794	0.8603
scheirer1998/percival2014	0.4957	0.7149

Table 3: Mean accuracy of estimates compared to version 1.0 with 4% tolerance ordered by Accuracy₁.

Accuracy₁ for 1.0

Figure 3: Mean Accuracy₁ for estimates compared to version 1.0 depending on tolerance.

Accuracy₂ for 1.0

Figure 4: Mean Accuracy₂ for estimates compared to version 1.0 depending on tolerance.

Accuracy Results for 2.0

Estimator	Accuracy₁	Accuracy₂
schreiber2017/mirex2017	0.9092	0.9872
boeck2020/dar	0.8475	0.9908
schreiber2018/cnn	0.8291	0.9844
schreiber2018/fcn	0.8277	0.9794
sun2021/default	0.8206	0.9638
schreiber2017/ismir2017	0.8184	0.9730
schreiber2018/ismir2018	0.7809	0.9766
schreiber2014/default	0.7617	0.9603
boeck2019/multi_task	0.7574	0.9773
boeck2019/multi_task_hjdb	0.7489	0.9738
boeck2015/tempodetector2016_default	0.7404	0.9780
echonest/version_3_2_1	0.7390	0.9291
percival2014/stem	0.7369	0.9794
gkiokas2012/default	0.7270	0.9801
zplane/auftakt_v3	0.7021	0.9390
klapuri2006/percival2014	0.6879	0.9688
davies2009/mirex_qm_tempotracker	0.6603	0.9348
oliveira2010/ibt	0.6305	0.9312
scheirer1998/percival2014	0.5355	0.7723

Table 4: Mean accuracy of estimates compared to version 2.0 with 4% tolerance ordered by Accuracy₁.

Accuracy₁ for 2.0

Figure 5: Mean Accuracy₁ for estimates compared to version 2.0 depending on tolerance.

Accuracy₂ for 2.0

Figure 6: Mean Accuracy₂ for estimates compared to version 2.0 depending on tolerance.

Differing Items

For which items did a given estimator fail to estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and are therefore never estimated correctly, regardless of which estimator is used?
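
Both questions reduce to set operations on per-item results. The following sketch assumes that references and estimates are available as dictionaries mapping item ids to BPM values; the function names, the item ids, and the data layout are hypothetical and are not taken from tempo_eval.

```python
def differing_items(reference, estimates, tolerance=0.04):
    """Item ids whose estimate is missing or outside the tolerance (Accuracy₁)."""
    return {
        item
        for item, ref in reference.items()
        if item not in estimates
        or abs(estimates[item] - ref) > tolerance * ref
    }


def never_correct(reference, estimators, tolerance=0.04):
    """Item ids that no estimator got right for the given reference."""
    sets = [
        differing_items(reference, est, tolerance)
        for est in estimators.values()
    ]
    return set.intersection(*sets) if sets else set()


# Hypothetical item ids and BPM values, for illustration only.
reference = {"a.clip": 100.0, "b.clip": 120.0, "c.clip": 80.0}
estimators = {
    "x": {"a.clip": 101.0, "b.clip": 60.0, "c.clip": 160.0},
    "y": {"a.clip": 50.0, "b.clip": 121.0, "c.clip": 40.0},
}
diff_x = differing_items(reference, estimators["x"])
hard = never_correct(reference, estimators)
print(sorted(diff_x), sorted(hard))
```

Items that end up in the intersection for every estimator are candidates for manual inspection, since they may be hard, unsuitable, or wrongly annotated.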

Differing Items Accuracy₁

Items with different tempo annotations (Accuracy₁, 4% tolerance) in different versions:

1.0 compared with boeck2015/tempodetector2016_default (443 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10766976.clip’ ‘10809231.clip’ … CSV

1.0 compared with boeck2019/multi_task (432 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10726039.clip’ ‘10809231.clip’ ‘10875997.clip’ ‘10893272.clip’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (444 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10726039.clip’ ‘10809231.clip’ ‘10875997.clip’ ‘10893272.clip’ … CSV

1.0 compared with boeck2020/dar (304 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726039.clip’ ‘10809231.clip’ ‘10875997.clip’ ‘10893272.clip’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (552 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726076.clip’ ‘1074945.clip’ ‘10809231.clip’ … CSV

1.0 compared with echonest/version_3_2_1 (456 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1074945.clip’ ‘10766976.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘10894070.clip’ … CSV

1.0 compared with gkiokas2012/default (478 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10270809.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1074945.clip’ ‘10809231.clip’ … CSV

1.0 compared with klapuri2006/percival2014 (526 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726076.clip’ … CSV

1.0 compared with oliveira2010/ibt (593 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘1074945.clip’ … CSV

1.0 compared with percival2014/stem (465 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ … CSV

1.0 compared with scheirer1998/percival2014 (711 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10270809.clip’ ‘10332517.clip’ ‘104186.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘10503810.clip’ ‘105233.clip’ … CSV

1.0 compared with schreiber2014/default (434 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10809231.clip’ … CSV

1.0 compared with schreiber2017/ismir2017 (351 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘10894286.clip’ ‘11173645.clip’ ‘11430821.clip’ … CSV

1.0 compared with schreiber2017/mirex2017 (233 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘12185327.clip’ … CSV

1.0 compared with schreiber2018/cnn (332 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10726076.clip’ … CSV

1.0 compared with schreiber2018/fcn (334 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10766976.clip’ ‘10809231.clip’ … CSV

1.0 compared with schreiber2018/ismir2018 (400 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10809231.clip’ ‘10875997.clip’ ‘10893272.clip’ … CSV

1.0 compared with sun2021/default (341 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726039.clip’ ‘10726076.clip’ … CSV

1.0 compared with zplane/auftakt_v3 (499 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10503810.clip’ ‘1071042.clip’ ‘10726032.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘1074945.clip’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (366 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘105233.clip’ ‘10563001.clip’ ‘1071042.clip’ ‘10766976.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘1094919.clip’ … CSV

2.0 compared with boeck2019/multi_task (342 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563001.clip’ ‘10726039.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11116238.clip’ ‘11173645.clip’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (354 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563001.clip’ ‘10726039.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11116238.clip’ ‘11173645.clip’ … CSV

2.0 compared with boeck2020/dar (215 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘1071042.clip’ ‘10726039.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11173645.clip’ ‘11401306.clip’ ‘11622.clip’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (479 differences): ‘10258351.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726076.clip’ ‘1074945.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11030307.clip’ … CSV

2.0 compared with echonest/version_3_2_1 (368 differences): ‘10258351.clip’ ‘104260.clip’ ‘10563001.clip’ ‘1074945.clip’ ‘10766976.clip’ ‘10893272.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11173645.clip’ ‘11409728.clip’ ‘11430821.clip’ … CSV

2.0 compared with gkiokas2012/default (385 differences): ‘10258351.clip’ ‘10270809.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563001.clip’ ‘1074945.clip’ ‘10894286.clip’ ‘11030307.clip’ ‘11173645.clip’ ‘11471649.clip’ … CSV

2.0 compared with klapuri2006/percival2014 (440 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘105233.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘1108465.clip’ … CSV

2.0 compared with oliveira2010/ibt (521 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘1074945.clip’ ‘10894070.clip’ … CSV

2.0 compared with percival2014/stem (371 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10563039.clip’ ‘1071042.clip’ ‘10894286.clip’ ‘11030307.clip’ ‘11116238.clip’ ‘11173645.clip’ ‘11554698.clip’ … CSV

2.0 compared with scheirer1998/percival2014 (655 differences): ‘10118334.clip’ ‘10258351.clip’ ‘10270809.clip’ ‘10332517.clip’ ‘104186.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘105233.clip’ ‘10563001.clip’ … CSV

2.0 compared with schreiber2014/default (336 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘105009.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10894286.clip’ ‘11030307.clip’ ‘11116238.clip’ ‘11173645.clip’ … CSV

2.0 compared with schreiber2017/ismir2017 (256 differences): ‘10258351.clip’ ‘104260.clip’ ‘10443543.clip’ ‘10894286.clip’ ‘11173645.clip’ ‘11612127.clip’ ‘11622.clip’ ‘1162915.clip’ ‘11812770.clip’ ‘12185327.clip’ ‘1245162.clip’ … CSV

2.0 compared with schreiber2017/mirex2017 (128 differences): ‘10258351.clip’ ‘11173645.clip’ ‘11612127.clip’ ‘12185327.clip’ ‘1245162.clip’ ‘12638738.clip’ ‘129570.clip’ ‘13041056.clip’ ‘13086293.clip’ ‘1358509.clip’ ‘1393863.clip’ … CSV

2.0 compared with schreiber2018/cnn (241 differences): ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘10563001.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ … CSV

2.0 compared with schreiber2018/fcn (243 differences): ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10563001.clip’ ‘1071042.clip’ ‘10726058.clip’ ‘10766976.clip’ ‘10875997.clip’ ‘10894286.clip’ ‘1094938.clip’ ‘11030307.clip’ … CSV

2.0 compared with schreiber2018/ismir2018 (309 differences): ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘1071042.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ ‘11173645.clip’ ‘11612127.clip’ ‘11622.clip’ … CSV

2.0 compared with sun2021/default (253 differences): ‘10258351.clip’ ‘10389587.clip’ ‘104222.clip’ ‘104260.clip’ ‘105233.clip’ ‘1071042.clip’ ‘10726039.clip’ ‘10726076.clip’ ‘10875997.clip’ ‘10894070.clip’ ‘10894286.clip’ … CSV

2.0 compared with zplane/auftakt_v3 (420 differences): ‘10118334.clip’ ‘10258351.clip’ ‘104222.clip’ ‘104260.clip’ ‘10443543.clip’ ‘1071042.clip’ ‘10726032.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘1074945.clip’ ‘10894286.clip’ … CSV

None of the estimators estimated the following 46 items ‘correctly’ using Accuracy₁: ‘10258351.clip’ ‘11173645.clip’ ‘12185327.clip’ ‘1245162.clip’ ‘12638738.clip’ ‘129570.clip’ ‘1358509.clip’ ‘1393863.clip’ ‘1433718.clip’ ‘1492191.clip’ ‘14970371.clip’ … CSV

Differing Items Accuracy₂

Items with different tempo annotations (Accuracy₂, 4% tolerance) in different versions:

1.0 compared with boeck2015/tempodetector2016_default (141 differences): ‘10118334.clip’ ‘104260.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘1108465.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ … CSV

1.0 compared with boeck2019/multi_task (141 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ ‘12906077.clip’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (145 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ ‘12906123.clip’ … CSV

1.0 compared with boeck2020/dar (129 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘1248986.clip’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (191 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘1108465.clip’ ‘11193878.clip’ ‘11430821.clip’ ‘11552767.clip’ ‘1173974.clip’ … CSV

1.0 compared with echonest/version_3_2_1 (206 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10766976.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘1173975.clip’ ‘11883189.clip’ … CSV

1.0 compared with gkiokas2012/default (145 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ … CSV

1.0 compared with klapuri2006/percival2014 (160 differences): ‘10118334.clip’ ‘105009.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563039.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘1108465.clip’ ‘11430821.clip’ ‘1173974.clip’ … CSV

1.0 compared with oliveira2010/ibt (197 differences): ‘10118334.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘1107612.clip’ ‘11095374.clip’ ‘11173645.clip’ ‘11430821.clip’ … CSV

1.0 compared with percival2014/stem (141 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ … CSV

1.0 compared with scheirer1998/percival2014 (402 differences): ‘10118334.clip’ ‘10270809.clip’ ‘104260.clip’ ‘105009.clip’ ‘10503810.clip’ ‘10563001.clip’ ‘10563039.clip’ ‘10726076.clip’ ‘1077416.clip’ ‘10809231.clip’ ‘10875997.clip’ … CSV

1.0 compared with schreiber2014/default (170 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10726058.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ … CSV

1.0 compared with schreiber2017/ismir2017 (150 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ … CSV

1.0 compared with schreiber2017/mirex2017 (130 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ … CSV

1.0 compared with schreiber2018/cnn (131 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10726058.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ … CSV

1.0 compared with schreiber2018/fcn (140 differences): ‘10118334.clip’ ‘104260.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10726058.clip’ ‘10766976.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘11612127.clip’ … CSV

1.0 compared with schreiber2018/ismir2018 (145 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10563039.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ ‘1245162.clip’ ‘12643846.clip’ … CSV

1.0 compared with sun2021/default (162 differences): ‘10118334.clip’ ‘104260.clip’ ‘10503810.clip’ ‘105233.clip’ ‘10563039.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘1107612.clip’ ‘11173645.clip’ ‘11430821.clip’ … CSV

1.0 compared with zplane/auftakt_v3 (189 differences): ‘10118334.clip’ ‘10503810.clip’ ‘10726032.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘10809231.clip’ ‘10893272.clip’ ‘11119266.clip’ ‘11430821.clip’ ‘1173974.clip’ ‘11883189.clip’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (31 differences): ‘104260.clip’ ‘105233.clip’ ‘1108465.clip’ ‘11173645.clip’ ‘1358509.clip’ ‘14600040.clip’ ‘164628.clip’ ‘1827592.clip’ ‘208474.clip’ ‘2252893.clip’ ‘2347473.clip’ … CSV

2.0 compared with boeck2019/multi_task (32 differences): ‘11173645.clip’ ‘12906077.clip’ ‘1359069.clip’ ‘1435815.clip’ ‘14600040.clip’ ‘164628.clip’ ‘168497.clip’ ‘168499.clip’ ‘2373182.clip’ ‘2517897.clip’ ‘253654.clip’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (37 differences): ‘11173645.clip’ ‘1245162.clip’ ‘1359069.clip’ ‘1435815.clip’ ‘14600040.clip’ ‘164628.clip’ ‘168497.clip’ ‘168499.clip’ ‘168502.clip’ ‘2373182.clip’ ‘2517897.clip’ … CSV

2.0 compared with boeck2020/dar (13 differences): ‘11173645.clip’ ‘1245162.clip’ ‘13036093.clip’ ‘164628.clip’ ‘168499.clip’ ‘250576.clip’ ‘3333901.clip’ ‘4326130.clip’ ‘458707.clip’ ‘6018930.clip’ ‘8231796.clip’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (92 differences): ‘10726076.clip’ ‘1108465.clip’ ‘11193878.clip’ ‘11552767.clip’ ‘12185327.clip’ ‘13036093.clip’ ‘13376370.clip’ ‘1355192.clip’ ‘1359069.clip’ ‘143682.clip’ ‘15416738.clip’ … CSV

2.0 compared with echonest/version_3_2_1 (100 differences): ‘10766976.clip’ ‘11173645.clip’ ‘11430821.clip’ ‘1173975.clip’ ‘13036093.clip’ ‘13065657.clip’ ‘13167246.clip’ ‘13376370.clip’ ‘13561397.clip’ ‘1385105.clip’ ‘13851876.clip’ … CSV

2.0 compared with gkiokas2012/default (28 differences): ‘11173645.clip’ ‘13036093.clip’ ‘1393863.clip’ ‘15416738.clip’ ‘168499.clip’ ‘1827592.clip’ ‘2252893.clip’ ‘2284109.clip’ ‘2347473.clip’ ‘256907.clip’ ‘281623.clip’ … CSV

2.0 compared with klapuri2006/percival2014 (44 differences): ‘105009.clip’ ‘105233.clip’ ‘1108465.clip’ ‘129570.clip’ ‘13376370.clip’ ‘14180121.clip’ ‘14600040.clip’ ‘166285.clip’ ‘168497.clip’ ‘168499.clip’ ‘168502.clip’ … CSV

2.0 compared with oliveira2010/ibt (97 differences): ‘104260.clip’ ‘10563039.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘1107612.clip’ ‘11095374.clip’ ‘11173645.clip’ ‘12185327.clip’ ‘12643846.clip’ ‘129570.clip’ ‘13036093.clip’ … CSV

2.0 compared with percival2014/stem (29 differences): ‘11173645.clip’ ‘1245162.clip’ ‘13851876.clip’ ‘14600040.clip’ ‘15416738.clip’ ‘164628.clip’ ‘168499.clip’ ‘1949678.clip’ ‘2232914.clip’ ‘2411992.clip’ ‘282894.clip’ … CSV

2.0 compared with scheirer1998/percival2014 (321 differences): ‘10118334.clip’ ‘10270809.clip’ ‘104260.clip’ ‘105009.clip’ ‘10563001.clip’ ‘10726076.clip’ ‘1077416.clip’ ‘10875997.clip’ ‘10893272.clip’ ‘1103942.clip’ ‘1108465.clip’ … CSV

2.0 compared with schreiber2014/default (56 differences): ‘10726058.clip’ ‘11173645.clip’ ‘11612127.clip’ ‘14180121.clip’ ‘1447613.clip’ ‘14600040.clip’ ‘14634350.clip’ ‘15952618.clip’ ‘164628.clip’ ‘168497.clip’ ‘168499.clip’ … CSV

2.0 compared with schreiber2017/ismir2017 (38 differences): ‘11173645.clip’ ‘11612127.clip’ ‘1245162.clip’ ‘14180121.clip’ ‘1447613.clip’ ‘14600040.clip’ ‘15952618.clip’ ‘164628.clip’ ‘1827592.clip’ ‘2214125.clip’ ‘2232914.clip’ … CSV

2.0 compared with schreiber2017/mirex2017 (18 differences): ‘11173645.clip’ ‘11612127.clip’ ‘1245162.clip’ ‘1447613.clip’ ‘14600040.clip’ ‘164628.clip’ ‘2232914.clip’ ‘2233308.clip’ ‘3188766.clip’ ‘3732728.clip’ ‘4326130.clip’ … CSV

2.0 compared with schreiber2018/cnn (22 differences): ‘10726058.clip’ ‘11173645.clip’ ‘11612127.clip’ ‘14600040.clip’ ‘15416738.clip’ ‘164628.clip’ ‘168499.clip’ ‘281623.clip’ ‘282894.clip’ ‘3110264.clip’ ‘3176577.clip’ … CSV

2.0 compared with schreiber2018/fcn (29 differences): ‘104260.clip’ ‘10726058.clip’ ‘10766976.clip’ ‘11173645.clip’ ‘14600040.clip’ ‘14634350.clip’ ‘15416738.clip’ ‘164628.clip’ ‘168499.clip’ ‘281623.clip’ ‘3030777.clip’ … CSV

2.0 compared with schreiber2018/ismir2018 (33 differences): ‘11173645.clip’ ‘11612127.clip’ ‘1245162.clip’ ‘12937673.clip’ ‘13851876.clip’ ‘14600040.clip’ ‘15416738.clip’ ‘168499.clip’ ‘1827592.clip’ ‘3073826.clip’ ‘3110264.clip’ … CSV

2.0 compared with sun2021/default (51 differences): ‘104260.clip’ ‘105233.clip’ ‘10726076.clip’ ‘1107612.clip’ ‘11173645.clip’ ‘12185327.clip’ ‘14180121.clip’ ‘15952618.clip’ ‘168499.clip’ ‘1827592.clip’ ‘2284109.clip’ … CSV

2.0 compared with zplane/auftakt_v3 (86 differences): ‘10726032.clip’ ‘10726058.clip’ ‘10726076.clip’ ‘11119266.clip’ ‘11612127.clip’ ‘1173974.clip’ ‘1245162.clip’ ‘12937673.clip’ ‘13036093.clip’ ‘13376370.clip’ ‘1385105.clip’ … CSV

All tracks were estimated ‘correctly’ by at least one system.

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.1783	0.5183	0.0000	0.0000	0.9574	0.3552	0.0001	0.0000	0.8385	0.0000	0.1410	0.0000	0.0000	0.0000	0.0000	0.0005	0.0000	0.0063
boeck2019/multi_task	0.1783	1.0000	0.1691	0.0000	0.0000	0.1649	0.0242	0.0000	0.0000	0.0725	0.0000	0.7416	0.0000	0.0000	0.0000	0.0000	0.0393	0.0000	0.0000
boeck2019/multi_task_hjdb	0.5183	0.1691	1.0000	0.0000	0.0000	0.4821	0.1093	0.0000	0.0000	0.3416	0.0000	0.2954	0.0000	0.0000	0.0000	0.0000	0.0051	0.0000	0.0002
boeck2020/dar	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0054	0.0000	0.0566	0.0501	0.0000	0.0019	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0077	0.0032	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0012
echonest/version_3_2_1	0.9574	0.1649	0.4821	0.0000	0.0000	1.0000	0.3904	0.0001	0.0000	0.9125	0.0000	0.0773	0.0000	0.0000	0.0000	0.0000	0.0006	0.0000	0.0055
gkiokas2012/default	0.3552	0.0242	0.1093	0.0000	0.0000	0.3904	1.0000	0.0028	0.0000	0.4306	0.0000	0.0057	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0577
klapuri2006/percival2014	0.0001	0.0000	0.0000	0.0000	0.0077	0.0001	0.0028	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.2369
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.0032	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.8385	0.0725	0.3416	0.0000	0.0000	0.9125	0.4306	0.0000	0.0000	1.0000	0.0000	0.0120	0.0000	0.0000	0.0000	0.0000	0.0001	0.0000	0.0035
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.1410	0.7416	0.2954	0.0000	0.0000	0.0773	0.0057	0.0000	0.0000	0.0120	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0871	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0000	0.0000	0.0054	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.3282	0.4020	0.0003	0.8944	0.0000
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0000	0.0566	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.3282	0.0000	1.0000	0.9385	0.0000	0.3904	0.0000
schreiber2018/fcn	0.0000	0.0000	0.0000	0.0501	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.4020	0.0000	0.9385	1.0000	0.0000	0.5182	0.0000
schreiber2018/ismir2018	0.0005	0.0393	0.0051	0.0000	0.0000	0.0006	0.0000	0.0000	0.0000	0.0001	0.0000	0.0871	0.0003	0.0000	0.0000	0.0000	1.0000	0.0002	0.0000
sun2021/default	0.0000	0.0000	0.0000	0.0019	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.8944	0.0000	0.3904	0.5182	0.0002	1.0000	0.0000
zplane/auftakt_v3	0.0063	0.0000	0.0002	0.0000	0.0012	0.0055	0.0577	0.2369	0.0000	0.0035	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000

Table 5: McNemar p-values for Accuracy₁ [Gouyon2006], using reference annotations 2.0 as ground truth. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.
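
The p-values in these tables come from paired per-track outcomes (correct or not under the chosen accuracy metric). As a minimal sketch, assuming an exact two-sided McNemar test (the report does not state which variant was used), the computation could look as follows. The function name `mcnemar_exact` and its boolean-list inputs are illustrative, not part of tempo_eval.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired boolean outcomes.

    correct_a[i] / correct_b[i] state whether estimator A / B was
    correct on track i (hypothetical inputs for illustration).
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both estimators must be scored on the same tracks")
    # Only discordant pairs (exactly one estimator correct) are informative.
    only_a = sum(bool(a) and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(bool(b) and not a for a, b in zip(correct_a, correct_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the estimators never disagree with each other
    k = min(only_a, only_b)
    # Under H0 each discordant pair favours either estimator with p = 0.5,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

The test is symmetric in its arguments and returns 1.0 for identical outcome vectors, which matches the symmetric tables with 1.0000 on the diagonal.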

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.5435	1.0000	0.0000	0.0000	0.5030	0.0680	0.0000	0.0000	0.2657	0.0000	0.6703	0.0000	0.0000	0.0000	0.0000	0.0057	0.0000	0.0031
boeck2019/multi_task	0.5435	1.0000	0.1550	0.0000	0.0000	0.1900	0.0124	0.0000	0.0000	0.0368	0.0000	0.9455	0.0000	0.0000	0.0000	0.0000	0.0401	0.0000	0.0000
boeck2019/multi_task_hjdb	1.0000	0.1550	1.0000	0.0000	0.0000	0.5411	0.0676	0.0000	0.0000	0.2295	0.0000	0.5692	0.0000	0.0000	0.0000	0.0000	0.0048	0.0000	0.0012
boeck2020/dar	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0009	0.0000	0.0350	0.0307	0.0000	0.0028	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0001	0.0727	0.0038	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0025
echonest/version_3_2_1	0.5030	0.1900	0.5411	0.0000	0.0000	1.0000	0.2360	0.0001	0.0000	0.6490	0.0000	0.2143	0.0000	0.0000	0.0000	0.0000	0.0007	0.0000	0.0172
gkiokas2012/default	0.0680	0.0124	0.0676	0.0000	0.0001	0.2360	1.0000	0.0068	0.0000	0.4541	0.0000	0.0084	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.2410
klapuri2006/percival2014	0.0000	0.0000	0.0000	0.0000	0.0727	0.0001	0.0068	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0899
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.0038	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.2657	0.0368	0.2295	0.0000	0.0000	0.6490	0.4541	0.0000	0.0000	1.0000	0.0000	0.0210	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0361
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.6703	0.9455	0.5692	0.0000	0.0000	0.2143	0.0084	0.0000	0.0000	0.0210	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0234	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0000	0.0000	0.0009	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.2020	0.2567	0.0007	0.5476	0.0000
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0000	0.0350	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.2020	0.0000	1.0000	0.9362	0.0000	0.5309	0.0000
schreiber2018/fcn	0.0000	0.0000	0.0000	0.0307	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.2567	0.0000	0.9362	1.0000	0.0000	0.6556	0.0000
schreiber2018/ismir2018	0.0057	0.0401	0.0048	0.0000	0.0000	0.0007	0.0000	0.0000	0.0000	0.0000	0.0000	0.0234	0.0007	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000
sun2021/default	0.0000	0.0000	0.0000	0.0028	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.5476	0.0000	0.5309	0.6556	0.0000	1.0000	0.0000
zplane/auftakt_v3	0.0031	0.0000	0.0012	0.0000	0.0025	0.0172	0.2410	0.0899	0.0000	0.0361	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000

Table 6: McNemar p-values for Accuracy₁ [Gouyon2006], using reference annotations 1.0 as ground truth. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	1.0000	0.4799	0.0039	0.0000	0.0000	0.7552	0.1175	0.0000	0.8804	0.0000	0.0026	0.3817	0.0470	0.2110	0.8830	0.8804	0.0105	0.0000
boeck2019/multi_task	1.0000	1.0000	0.2266	0.0005	0.0000	0.0000	0.6177	0.1550	0.0000	0.7359	0.0000	0.0012	0.4709	0.0243	0.1102	0.7428	1.0000	0.0127	0.0000
boeck2019/multi_task_hjdb	0.4799	0.2266	1.0000	0.0000	0.0000	0.0000	0.2327	0.4500	0.0000	0.2800	0.0000	0.0145	1.0000	0.0019	0.0201	0.2800	0.6718	0.0925	0.0000
boeck2020/dar	0.0039	0.0005	0.0000	1.0000	0.0000	0.0000	0.0081	0.0000	0.0000	0.0025	0.0000	0.0000	0.0000	0.3018	0.0784	0.0025	0.0005	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.5625	0.0000	0.0000	0.7139	0.0000	0.0000	0.0013	0.0000	0.0000	0.0000	0.0000	0.0000	0.0002	0.6612
echonest/version_3_2_1	0.0000	0.0000	0.0000	0.0000	0.5625	1.0000	0.0000	0.0000	0.8467	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.1980
gkiokas2012/default	0.7552	0.6177	0.2327	0.0081	0.0000	0.0000	1.0000	0.0365	0.0000	1.0000	0.0000	0.0000	0.1539	0.1325	0.3449	1.0000	0.4731	0.0004	0.0000
klapuri2006/percival2014	0.1175	0.1550	0.4500	0.0000	0.0000	0.0000	0.0365	1.0000	0.0000	0.0444	0.0000	0.1550	0.4966	0.0003	0.0009	0.0627	0.1690	0.4011	0.0000
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.7139	0.8467	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.3049
percival2014/stem	0.8804	0.7359	0.2800	0.0025	0.0000	0.0000	1.0000	0.0444	0.0000	1.0000	0.0000	0.0004	0.2221	0.0522	0.2478	1.0000	0.5966	0.0026	0.0000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0026	0.0012	0.0145	0.0000	0.0013	0.0000	0.0000	0.1550	0.0000	0.0004	0.0000	1.0000	0.0009	0.0000	0.0000	0.0000	0.0014	0.5758	0.0002
schreiber2017/ismir2017	0.3817	0.4709	1.0000	0.0000	0.0000	0.0000	0.1539	0.4966	0.0000	0.2221	0.0000	0.0009	1.0000	0.0000	0.0090	0.1877	0.4996	0.0854	0.0000
schreiber2017/mirex2017	0.0470	0.0243	0.0019	0.3018	0.0000	0.0000	0.1325	0.0003	0.0000	0.0522	0.0000	0.0000	0.0000	1.0000	0.5235	0.0433	0.0107	0.0000	0.0000
schreiber2018/cnn	0.2110	0.1102	0.0201	0.0784	0.0000	0.0000	0.3449	0.0009	0.0000	0.2478	0.0000	0.0000	0.0090	0.5235	1.0000	0.1892	0.0614	0.0000	0.0000
schreiber2018/fcn	0.8830	0.7428	0.2800	0.0025	0.0000	0.0000	1.0000	0.0627	0.0000	1.0000	0.0000	0.0000	0.1877	0.0433	0.1892	1.0000	0.6076	0.0026	0.0000
schreiber2018/ismir2018	0.8804	1.0000	0.6718	0.0005	0.0000	0.0000	0.4731	0.1690	0.0000	0.5966	0.0000	0.0014	0.4996	0.0107	0.0614	0.6076	1.0000	0.0133	0.0000
sun2021/default	0.0105	0.0127	0.0925	0.0000	0.0002	0.0000	0.0004	0.4011	0.0000	0.0026	0.0000	0.5758	0.0854	0.0000	0.0000	0.0026	0.0133	1.0000	0.0001
zplane/auftakt_v3	0.0000	0.0000	0.0000	0.0000	0.6612	0.1980	0.0000	0.0000	0.3049	0.0000	0.0000	0.0002	0.0000	0.0000	0.0000	0.0000	0.0000	0.0001	1.0000

Table 7: McNemar p-values for Accuracy₂ [Gouyon2006], using reference annotations 2.0 as ground truth. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.8918	0.6936	0.0884	0.0000	0.0000	0.6778	0.0327	0.0000	0.8918	0.0000	0.0008	0.2717	0.1263	0.2116	1.0000	0.6655	0.0154	0.0000
boeck2019/multi_task	0.8918	1.0000	0.3437	0.0501	0.0000	0.0000	0.6516	0.0248	0.0000	1.0000	0.0000	0.0001	0.2221	0.0614	0.1325	1.0000	0.6655	0.0170	0.0000
boeck2019/multi_task_hjdb	0.6936	0.3437	1.0000	0.0070	0.0000	0.0000	1.0000	0.0919	0.0000	0.6587	0.0000	0.0013	0.5515	0.0135	0.0385	0.5601	0.8918	0.0647	0.0000
boeck2020/dar	0.0884	0.0501	0.0070	1.0000	0.0000	0.0000	0.0090	0.0000	0.0000	0.0501	0.0000	0.0000	0.0008	1.0000	0.8555	0.0895	0.0139	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.2699	0.0000	0.0041	0.6634	0.0000	0.0000	0.0783	0.0002	0.0000	0.0000	0.0000	0.0000	0.0176	0.9312
echonest/version_3_2_1	0.0000	0.0000	0.0000	0.0000	0.2699	1.0000	0.0000	0.0000	0.4517	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.1219
gkiokas2012/default	0.6778	0.6516	1.0000	0.0090	0.0000	0.0000	1.0000	0.0722	0.0000	0.6516	0.0000	0.0003	0.5515	0.0275	0.0288	0.5424	1.0000	0.0270	0.0000
klapuri2006/percival2014	0.0327	0.0248	0.0919	0.0000	0.0041	0.0000	0.0722	1.0000	0.0002	0.0163	0.0000	0.2530	0.2203	0.0000	0.0001	0.0225	0.0674	0.9020	0.0008
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.6634	0.4517	0.0000	0.0002	1.0000	0.0000	0.0000	0.0048	0.0000	0.0000	0.0000	0.0000	0.0000	0.0005	0.4655
percival2014/stem	0.8918	1.0000	0.6587	0.0501	0.0000	0.0000	0.6516	0.0163	0.0000	1.0000	0.0000	0.0001	0.2430	0.0708	0.1742	1.0000	0.6440	0.0125	0.0000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0008	0.0001	0.0013	0.0000	0.0783	0.0001	0.0003	0.2530	0.0048	0.0001	0.0000	1.0000	0.0001	0.0000	0.0000	0.0000	0.0005	0.3662	0.0204
schreiber2017/ismir2017	0.2717	0.2221	0.5515	0.0008	0.0002	0.0000	0.5515	0.2203	0.0000	0.2430	0.0000	0.0001	1.0000	0.0000	0.0034	0.1641	0.5327	0.1550	0.0000
schreiber2017/mirex2017	0.1263	0.0614	0.0135	1.0000	0.0000	0.0000	0.0275	0.0000	0.0000	0.0708	0.0000	0.0000	0.0000	1.0000	1.0000	0.0987	0.0201	0.0001	0.0000
schreiber2018/cnn	0.2116	0.1325	0.0385	0.8555	0.0000	0.0000	0.0288	0.0001	0.0000	0.1742	0.0000	0.0000	0.0034	1.0000	1.0000	0.1221	0.0201	0.0000	0.0000
schreiber2018/fcn	1.0000	1.0000	0.5601	0.0895	0.0000	0.0000	0.5424	0.0225	0.0000	1.0000	0.0000	0.0000	0.1641	0.0987	0.1221	1.0000	0.5114	0.0054	0.0000
schreiber2018/ismir2018	0.6655	0.6655	0.8918	0.0139	0.0000	0.0000	1.0000	0.0674	0.0000	0.6440	0.0000	0.0005	0.5327	0.0201	0.0201	0.5114	1.0000	0.0270	0.0000
sun2021/default	0.0154	0.0170	0.0647	0.0000	0.0176	0.0000	0.0270	0.9020	0.0005	0.0125	0.0000	0.3662	0.1550	0.0001	0.0000	0.0054	0.0270	1.0000	0.0053
zplane/auftakt_v3	0.0000	0.0000	0.0000	0.0000	0.9312	0.1219	0.0000	0.0008	0.4655	0.0000	0.0000	0.0204	0.0000	0.0000	0.0000	0.0000	0.0000	0.0053	1.0000

Table 8: McNemar p-values for Accuracy₂ [Gouyon2006], using reference annotations 1.0 as ground truth. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e. the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Accuracy₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
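
To make the subset idea concrete, here is a minimal sketch of how one point of such a curve could be computed. It assumes the usual Accuracy₁ definition (an estimate counts as correct if it lies within 4% of the reference); the tolerance, the function names, and the list-based inputs are assumptions for illustration, not taken from this report.

```python
def accuracy1(estimate, reference, tolerance=0.04):
    """True if the estimate lies within +/- tolerance of the reference tempo.

    The 4% default is an assumed, commonly used tolerance.
    """
    return abs(estimate - reference) <= tolerance * reference


def mean_accuracy1_near(estimates, references, center, half_width=10.0):
    """Mean Accuracy1 over tracks whose reference tempo lies in
    [center - half_width, center + half_width] BPM.

    Returns None if no reference falls into the interval.
    """
    hits = [
        accuracy1(est, ref)
        for est, ref in zip(estimates, references)
        if abs(ref - center) <= half_width
    ]
    if not hits:
        return None
    return sum(hits) / len(hits)
```

Returning `None` for empty intervals keeps "no data" distinct from "accuracy 0", which matters because some intervals contain very few tracks.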

Accuracy₁ on Tempo-Subsets for 1.0

Figure 7: Mean Accuracy₁ for estimates compared to version 1.0 for tempo intervals around T.

Accuracy₁ on Tempo-Subsets for 2.0

Figure 8: Mean Accuracy₁ for estimates compared to version 2.0 for tempo intervals around T.

Accuracy₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
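
The same sweep can be sketched for Accuracy₂. The example below assumes that Accuracy₂ additionally accepts estimates that are off by a factor of 2, 3, 1/2, or 1/3, with a 4% tolerance applied to the scaled reference; both the tolerance and all names are illustrative assumptions.

```python
# Tempo ratios accepted as correct (assumed Accuracy2 definition).
FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def accuracy2(estimate, reference, tolerance=0.04):
    """True if the estimate matches the reference or one of its
    octave-related multiples within the relative tolerance."""
    return any(
        abs(estimate - f * reference) <= tolerance * f * reference
        for f in FACTORS
    )


def accuracy2_curve(estimates, references, centers, half_width=10.0):
    """Mean Accuracy2 for each center T, restricted to tracks whose
    reference tempo lies in [T - half_width, T + half_width] BPM.

    Intervals without any reference yield None.
    """
    curve = []
    for center in centers:
        hits = [
            accuracy2(est, ref)
            for est, ref in zip(estimates, references)
            if abs(ref - center) <= half_width
        ]
        curve.append(sum(hits) / len(hits) if hits else None)
    return curve
```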

Accuracy₂ on Tempo-Subsets for 1.0

Figure 9: Mean Accuracy₂ for estimates compared to version 1.0 for tempo intervals around T.

Accuracy₂ on Tempo-Subsets for 2.0

Figure 10: Mean Accuracy₂ for estimates compared to version 2.0 for tempo intervals around T.

Estimated Accuracy₁ for Tempo

If we fit a generalized additive model (GAM) to Accuracy₁ values as a function of the ground-truth tempo, what Accuracy₁ can we expect, and with what confidence?

Estimated Accuracy₁ for Tempo for 1.0

Predictions of GAMs trained on Accuracy₁ for estimates for reference 1.0.

Figure 11: Accuracy₁ predictions of a generalized additive model (GAM) fit to Accuracy₁ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy₁ for Tempo for 2.0

Predictions of GAMs trained on Accuracy₁ for estimates for reference 2.0.

Figure 12: Accuracy₁ predictions of a generalized additive model (GAM) fit to Accuracy₁ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy₂ for Tempo

If we fit a generalized additive model (GAM) to Accuracy₂ values as a function of the ground-truth tempo, what Accuracy₂ can we expect, and with what confidence?

Estimated Accuracy₂ for Tempo for 1.0

Predictions of GAMs trained on Accuracy₂ for estimates for reference 1.0.

Figure 13: Accuracy₂ predictions of a generalized additive model (GAM) fit to Accuracy₂ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy₂ for Tempo for 2.0

Predictions of GAMs trained on Accuracy₂ for estimates for reference 2.0.

Figure 14: Accuracy₂ predictions of a generalized additive model (GAM) fit to Accuracy₂ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

OE₁ and OE₂

OE₁ is defined as the octave error between an estimate E and a reference value R. This means that the most common errors, by a factor of 2 or ½, have the same magnitude, namely 1: OE₁(E) = log₂(E/R).

OE₂ is the signed OE₁ with the smallest absolute value when the octave errors 2, 3, 1/2, and 1/3 are allowed: OE₂(E) = arg min_x(|x|) with x ∈ {OE₁(E), OE₁(2E), OE₁(3E), OE₁(½E), OE₁(⅓E)}
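
The two definitions translate almost directly into code. The following is a minimal sketch; the function names `oe1` and `oe2` are chosen here for illustration and are not necessarily those used by tempo_eval.

```python
from math import log2


def oe1(estimate, reference):
    """Signed octave error: log2 of the ratio estimate / reference."""
    return log2(estimate / reference)


def oe2(estimate, reference):
    """Signed OE1 with the smallest magnitude after allowing the
    estimate to be rescaled by 2, 3, 1/2, or 1/3."""
    candidates = (
        oe1(estimate, reference),
        oe1(2 * estimate, reference),
        oe1(3 * estimate, reference),
        oe1(estimate / 2, reference),
        oe1(estimate / 3, reference),
    )
    # min(..., key=abs) keeps the sign of the selected candidate.
    return min(candidates, key=abs)
```

For example, an estimate of 120 BPM against a reference of 60 BPM gives an OE₁ of 1 (one octave too high) but an OE₂ of 0, because halving the estimate matches the reference exactly.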

Mean OE₁/OE₂ Results for 1.0

Estimator	OE1_MEAN	OE1_STDEV	OE2_MEAN	OE2_STDEV
schreiber2017/mirex2017	0.0430	0.3048	-0.0044	0.0699
sun2021/default	0.1207	0.3878	-0.0172	0.0879
schreiber2018/cnn	0.1386	0.3923	-0.0046	0.0729
boeck2020/dar	0.1026	0.3999	-0.0031	0.0683
schreiber2018/fcn	0.1050	0.4075	-0.0064	0.0772
schreiber2017/ismir2017	0.0678	0.4191	-0.0091	0.0859
schreiber2018/ismir2018	0.1355	0.4572	-0.0088	0.0824
echonest/version_3_2_1	0.0985	0.4642	-0.0089	0.1044
schreiber2014/default	-0.0013	0.4664	-0.0159	0.0956
boeck2019/multi_task	0.0907	0.4862	-0.0033	0.0793
boeck2019/multi_task_hjdb	0.0997	0.4925	-0.0025	0.0819
boeck2015/tempodetector2016_default	0.2135	0.4983	-0.0046	0.0765
percival2014/stem	0.0307	0.5103	-0.0040	0.0751
zplane/auftakt_v3	0.1116	0.5119	-0.0136	0.1059
gkiokas2012/default	0.0821	0.5199	-0.0076	0.0831
davies2009/mirex_qm_tempotracker	0.2614	0.5321	0.0221	0.0914
oliveira2010/ibt	0.2509	0.5421	-0.0066	0.0984
klapuri2006/percival2014	0.1444	0.5490	-0.0051	0.0839
scheirer1998/percival2014	0.0736	0.5693	0.0302	0.1589

Table 9: Mean OE₁/OE₂ for estimates compared to version 1.0, ordered by OE₁ standard deviation.
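
A table of this shape can be derived from per-track error values as sketched below. The input layout (a dict mapping an estimator name to its OE₁ and OE₂ lists) is hypothetical, and the use of the population standard deviation is an assumption; the report does not say whether the sample or population form was used.

```python
from statistics import mean, pstdev


def oe_summary(errors_by_estimator):
    """Build rows (name, OE1 mean, OE1 stdev, OE2 mean, OE2 stdev),
    sorted by ascending OE1 standard deviation.

    errors_by_estimator maps a name to a pair (oe1_values, oe2_values).
    """
    rows = [
        (name, mean(oe1s), pstdev(oe1s), mean(oe2s), pstdev(oe2s))
        for name, (oe1s, oe2s) in errors_by_estimator.items()
    ]
    # Column index 2 holds the OE1 standard deviation.
    return sorted(rows, key=lambda row: row[2])
```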

OE₁ distribution for 1.0

Figure 15: OE₁ for estimates compared to version 1.0. Shown are the mean OE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

OE₂ distribution for 1.0

Figure 16: OE₂ for estimates compared to version 1.0. Shown are the mean OE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

Mean OE₁/OE₂ Results for 2.0

Estimator	OE1_MEAN	OE1_STDEV	OE2_MEAN	OE2_STDEV
schreiber2017/mirex2017	0.0394	0.2958	-0.0044	0.0384
sun2021/default	0.1172	0.3845	-0.0187	0.0664
schreiber2018/cnn	0.1350	0.3886	-0.0046	0.0441
boeck2020/dar	0.0991	0.3962	-0.0041	0.0368
schreiber2018/fcn	0.1014	0.4024	-0.0064	0.0517
schreiber2017/ismir2017	0.0643	0.4141	-0.0091	0.0629
schreiber2018/ismir2018	0.1320	0.4518	-0.0084	0.0590
echonest/version_3_2_1	0.0950	0.4614	-0.0132	0.0875
schreiber2014/default	-0.0048	0.4615	-0.0159	0.0760
boeck2019/multi_task	0.0872	0.4829	-0.0050	0.0561
boeck2019/multi_task_hjdb	0.0962	0.4887	-0.0056	0.0616
boeck2015/tempodetector2016_default	0.2100	0.4957	-0.0049	0.0487
zplane/auftakt_v3	0.1080	0.5032	-0.0183	0.0921
percival2014/stem	0.0271	0.5040	-0.0036	0.0510
gkiokas2012/default	0.0786	0.5136	-0.0074	0.0581
davies2009/mirex_qm_tempotracker	0.2578	0.5266	0.0229	0.0714
oliveira2010/ibt	0.2473	0.5368	-0.0074	0.0815
klapuri2006/percival2014	0.1409	0.5433	-0.0053	0.0580
scheirer1998/percival2014	0.0697	0.5681	0.0306	0.1516

Table 10: Mean OE₁/OE₂ for estimates compared to version 2.0, ordered by OE₁ standard deviation.

OE₁ distribution for 2.0

Figure 17: OE₁ for estimates compared to version 2.0. Shown are the mean OE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

OE₂ distribution for 2.0

Figure 18: OE₂ for estimates compared to version 2.0. Shown are the mean OE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.0000	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0067	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
boeck2019/multi_task	0.0000	1.0000	0.1105	0.2844	0.0000	0.5284	0.5252	0.0000	0.0000	0.0000	0.3366	0.0000	0.0314	0.0000	0.0000	0.2300	0.0001	0.0111	0.0805
boeck2019/multi_task_hjdb	0.0000	0.1105	1.0000	0.7938	0.0000	0.9264	0.1975	0.0003	0.0000	0.0000	0.0873	0.0000	0.0042	0.0000	0.0006	0.6574	0.0014	0.0752	0.3264
boeck2020/dar	0.0000	0.2844	0.7938	1.0000	0.0000	0.7256	0.1375	0.0018	0.0000	0.0000	0.0573	0.0000	0.0006	0.0000	0.0001	0.8179	0.0018	0.0322	0.4812
davies2009/mirex_qm_tempotracker	0.0001	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.2650	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0000	0.5284	0.9264	0.7256	0.0000	1.0000	0.2104	0.0004	0.0000	0.0000	0.0453	0.0000	0.0064	0.0000	0.0005	0.5907	0.0018	0.0603	0.3123
gkiokas2012/default	0.0000	0.5252	0.1975	0.1375	0.0000	0.2104	1.0000	0.0000	0.0000	0.0000	0.2579	0.0000	0.2650	0.0026	0.0000	0.0860	0.0000	0.0032	0.0163
klapuri2006/percival2014	0.0000	0.0000	0.0003	0.0018	0.0000	0.0004	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.6424	0.0045	0.4080	0.0845	0.0025
oliveira2010/ibt	0.0067	0.0000	0.0000	0.0000	0.2650	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0062	0.0004	0.0011	0.3013	0.0000	0.0000	0.0000	0.0000	0.0000
scheirer1998/percival2014	0.0000	0.3366	0.0873	0.0573	0.0000	0.0453	0.2579	0.0000	0.0000	0.0062	1.0000	0.0000	0.9128	0.0463	0.0000	0.0324	0.0000	0.0017	0.0074
schreiber2014/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0004	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0314	0.0042	0.0006	0.0000	0.0064	0.2650	0.0000	0.0000	0.0011	0.9128	0.0000	1.0000	0.0044	0.0000	0.0003	0.0000	0.0000	0.0002
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0026	0.0000	0.0000	0.3013	0.0463	0.0000	0.0044	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0006	0.0001	0.0000	0.0005	0.0000	0.6424	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0004	0.7457	0.0526	0.0258
schreiber2018/fcn	0.0000	0.2300	0.6574	0.8179	0.0000	0.5907	0.0860	0.0045	0.0000	0.0000	0.0324	0.0000	0.0003	0.0000	0.0004	1.0000	0.0050	0.1092	0.6207
schreiber2018/ismir2018	0.0000	0.0001	0.0014	0.0018	0.0000	0.0018	0.0000	0.4080	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.7457	0.0050	1.0000	0.1645	0.0341
sun2021/default	0.0000	0.0111	0.0752	0.0322	0.0000	0.0603	0.0032	0.0845	0.0000	0.0000	0.0017	0.0000	0.0000	0.0000	0.0526	0.1092	0.1645	1.0000	0.4676
zplane/auftakt_v3	0.0000	0.0805	0.3264	0.4812	0.0000	0.3123	0.0163	0.0025	0.0000	0.0000	0.0074	0.0000	0.0002	0.0000	0.0258	0.6207	0.0341	0.4676	1.0000

Table 11: Paired t-test p-values for OE₁, using reference annotations 2.0 as ground truth. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e. the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.
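
As a rough illustration of the test behind these tables, the sketch below computes the paired t statistic from two lists of per-track errors. To stay within the standard library it approximates the two-sided p-value with a normal distribution instead of Student's t, which is only reasonable for large samples such as the 1410 tracks here; it is therefore not expected to reproduce the tabulated values exactly. All names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(errors_a, errors_b):
    """Return (t, p) for the paired differences errors_a[i] - errors_b[i].

    t is the usual paired t statistic; p is a two-sided p-value from a
    normal approximation (an assumption suited to large samples only).
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equally long samples with >= 2 pairs")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample standard deviation of the differences
    if d_sd == 0.0:
        # All differences are identical: no spread to test against.
        if d_mean == 0.0:
            return 0.0, 1.0
        return (float("inf") if d_mean > 0 else float("-inf")), 0.0
    t = d_mean / (d_sd / sqrt(len(diffs)))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p
```

Swapping the two inputs flips the sign of t but leaves p unchanged, which is why the tables are symmetric.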

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.0000	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0067	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
boeck2019/multi_task	0.0000	1.0000	0.1105	0.2844	0.0000	0.5284	0.5252	0.0000	0.0000	0.0000	0.3366	0.0000	0.0314	0.0000	0.0000	0.2300	0.0001	0.0111	0.0805
boeck2019/multi_task_hjdb	0.0000	0.1105	1.0000	0.7938	0.0000	0.9264	0.1975	0.0003	0.0000	0.0000	0.0873	0.0000	0.0042	0.0000	0.0006	0.6574	0.0014	0.0752	0.3264
boeck2020/dar	0.0000	0.2844	0.7938	1.0000	0.0000	0.7256	0.1375	0.0018	0.0000	0.0000	0.0573	0.0000	0.0006	0.0000	0.0001	0.8179	0.0018	0.0322	0.4812
davies2009/mirex_qm_tempotracker	0.0001	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.2650	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0000	0.5284	0.9264	0.7256	0.0000	1.0000	0.2104	0.0004	0.0000	0.0000	0.0453	0.0000	0.0064	0.0000	0.0005	0.5907	0.0018	0.0603	0.3123
gkiokas2012/default	0.0000	0.5252	0.1975	0.1375	0.0000	0.2104	1.0000	0.0000	0.0000	0.0000	0.2579	0.0000	0.2650	0.0026	0.0000	0.0860	0.0000	0.0032	0.0163
klapuri2006/percival2014	0.0000	0.0000	0.0003	0.0018	0.0000	0.0004	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.6424	0.0045	0.4080	0.0845	0.0025
oliveira2010/ibt	0.0067	0.0000	0.0000	0.0000	0.2650	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0062	0.0004	0.0011	0.3013	0.0000	0.0000	0.0000	0.0000	0.0000
scheirer1998/percival2014	0.0000	0.3366	0.0873	0.0573	0.0000	0.0453	0.2579	0.0000	0.0000	0.0062	1.0000	0.0000	0.9128	0.0463	0.0000	0.0324	0.0000	0.0017	0.0074
schreiber2014/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0004	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0314	0.0042	0.0006	0.0000	0.0064	0.2650	0.0000	0.0000	0.0011	0.9128	0.0000	1.0000	0.0044	0.0000	0.0003	0.0000	0.0000	0.0002
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0026	0.0000	0.0000	0.3013	0.0463	0.0000	0.0044	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0006	0.0001	0.0000	0.0005	0.0000	0.6424	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0004	0.7457	0.0526	0.0258
schreiber2018/fcn	0.0000	0.2300	0.6574	0.8179	0.0000	0.5907	0.0860	0.0045	0.0000	0.0000	0.0324	0.0000	0.0003	0.0000	0.0004	1.0000	0.0050	0.1092	0.6207
schreiber2018/ismir2018	0.0000	0.0001	0.0014	0.0018	0.0000	0.0018	0.0000	0.4080	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.7457	0.0050	1.0000	0.1645	0.0341
sun2021/default	0.0000	0.0111	0.0752	0.0322	0.0000	0.0603	0.0032	0.0845	0.0000	0.0000	0.0017	0.0000	0.0000	0.0000	0.0526	0.1092	0.1645	1.0000	0.4676
zplane/auftakt_v3	0.0000	0.0805	0.3264	0.4812	0.0000	0.3123	0.0163	0.0025	0.0000	0.0000	0.0074	0.0000	0.0002	0.0000	0.0258	0.6207	0.0341	0.4676	1.0000

Table 12: Paired t-test p-values for OE₁, using reference annotations 1.0 as ground truth. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e. the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.
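
A reading aid: Tables 11 and 12 contain the same values, and a short derivation suggests this is expected rather than a duplication error. Assuming both reference versions annotate the same set of tracks, the paired difference of OE₁ for two estimates E_a and E_b of one track with reference R is

```latex
\mathrm{OE}_1(E_a) - \mathrm{OE}_1(E_b)
  = \log_2\frac{E_a}{R} - \log_2\frac{E_b}{R}
  = \log_2\frac{E_a}{E_b}
```

The reference R cancels, so the paired differences, and with them the t-test, do not depend on the reference version. The same argument does not carry over to OE₂, because the factor selected by the minimization depends on R; accordingly, the OE₂ tables for the two reference versions differ.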

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.9315	0.6956	0.5957	0.0000	0.0006	0.1088	0.8152	0.2494	0.4725	0.0000	0.0000	0.0129	0.7336	0.8650	0.3635	0.0205	0.0000	0.0000
boeck2019/multi_task	0.9315	1.0000	0.5629	0.5151	0.0000	0.0007	0.1332	0.8874	0.3180	0.3772	0.0000	0.0000	0.0307	0.6790	0.7758	0.3699	0.0460	0.0000	0.0000
boeck2019/multi_task_hjdb	0.6956	0.5629	1.0000	0.3323	0.0000	0.0023	0.3173	0.8820	0.4520	0.2961	0.0000	0.0000	0.0695	0.4587	0.5392	0.6340	0.1217	0.0000	0.0000
boeck2020/dar	0.5957	0.5151	0.3323	1.0000	0.0000	0.0001	0.0315	0.4663	0.1493	0.7277	0.0000	0.0000	0.0026	0.7923	0.6600	0.0763	0.0022	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0848	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0006	0.0007	0.0023	0.0001	0.0000	1.0000	0.0142	0.0010	0.0400	0.0001	0.0000	0.2504	0.0863	0.0002	0.0004	0.0039	0.0496	0.0117	0.0604
gkiokas2012/default	0.1088	0.1332	0.3173	0.0315	0.0000	0.0142	1.0000	0.2318	0.9969	0.0134	0.0000	0.0000	0.3207	0.0473	0.0383	0.4655	0.5170	0.0000	0.0000
klapuri2006/percival2014	0.8152	0.8874	0.8820	0.4663	0.0000	0.0010	0.2318	1.0000	0.3003	0.3771	0.0000	0.0000	0.0423	0.6079	0.6827	0.5557	0.1001	0.0000	0.0000
oliveira2010/ibt	0.2494	0.3180	0.4520	0.1493	0.0000	0.0400	0.9969	0.3003	1.0000	0.1095	0.0000	0.0003	0.4727	0.1855	0.2286	0.6675	0.6843	0.0000	0.0001
percival2014/stem	0.4725	0.3772	0.2961	0.7277	0.0000	0.0001	0.0134	0.3771	0.1095	1.0000	0.0000	0.0000	0.0021	0.5542	0.5121	0.0729	0.0020	0.0000	0.0000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0848	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.2504	0.0000	0.0000	0.0003	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0859	0.2549
schreiber2017/ismir2017	0.0129	0.0307	0.0695	0.0026	0.0000	0.0863	0.3207	0.0423	0.4727	0.0021	0.0000	0.0000	1.0000	0.0011	0.0076	0.1227	0.6926	0.0000	0.0001
schreiber2017/mirex2017	0.7336	0.6790	0.4587	0.7923	0.0000	0.0002	0.0473	0.6079	0.1855	0.5542	0.0000	0.0000	0.0011	1.0000	0.8536	0.1151	0.0064	0.0000	0.0000
schreiber2018/cnn	0.8650	0.7758	0.5392	0.6600	0.0000	0.0004	0.0383	0.6827	0.2286	0.5121	0.0000	0.0000	0.0076	0.8536	1.0000	0.0896	0.0112	0.0000	0.0000
schreiber2018/fcn	0.3635	0.3699	0.6340	0.0763	0.0000	0.0039	0.4655	0.5557	0.6675	0.0729	0.0000	0.0000	0.1227	0.1151	0.0896	1.0000	0.1878	0.0000	0.0000
schreiber2018/ismir2018	0.0205	0.0460	0.1217	0.0022	0.0000	0.0496	0.5170	0.1001	0.6843	0.0020	0.0000	0.0000	0.6926	0.0064	0.0112	0.1878	1.0000	0.0000	0.0001
sun2021/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.0117	0.0000	0.0000	0.0000	0.0000	0.0000	0.0859	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.8738
zplane/auftakt_v3	0.0000	0.0000	0.0000	0.0000	0.0000	0.0604	0.0000	0.0000	0.0001	0.0000	0.0000	0.2549	0.0001	0.0000	0.0000	0.0000	0.0001	0.8738	1.0000

Table 13: Paired t-test p-values for OE₂, using reference annotations 2.0 as ground truth. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e. the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.4684	0.2678	0.3061	0.0000	0.0955	0.0925	0.7844	0.4103	0.7647	0.0000	0.0000	0.0088	0.8997	0.9845	0.2850	0.0134	0.0000	0.0008
boeck2019/multi_task	0.4684	1.0000	0.4164	0.8994	0.0000	0.0245	0.0201	0.3981	0.2038	0.6498	0.0000	0.0000	0.0030	0.4887	0.3943	0.0562	0.0041	0.0000	0.0001
boeck2019/multi_task_hjdb	0.2678	0.4164	1.0000	0.6782	0.0000	0.0120	0.0132	0.2353	0.1099	0.3874	0.0000	0.0000	0.0015	0.2814	0.2079	0.0269	0.0018	0.0000	0.0000
boeck2020/dar	0.3061	0.8994	0.6782	1.0000	0.0000	0.0236	0.0102	0.2720	0.1669	0.4681	0.0000	0.0000	0.0003	0.2214	0.1960	0.0135	0.0003	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0808	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0955	0.0245	0.0120	0.0236	0.0000	1.0000	0.6050	0.1602	0.4190	0.0627	0.0000	0.0096	0.9496	0.0847	0.0942	0.3147	0.9695	0.0008	0.1218
gkiokas2012/default	0.0925	0.0201	0.0132	0.0102	0.0000	0.6050	1.0000	0.1797	0.6699	0.0363	0.0000	0.0000	0.4085	0.0688	0.0635	0.4786	0.3920	0.0000	0.0246
klapuri2006/percival2014	0.7844	0.3981	0.2353	0.2720	0.0000	0.1602	0.1797	1.0000	0.4953	0.6075	0.0000	0.0000	0.0541	0.7114	0.7863	0.5354	0.0617	0.0000	0.0013
oliveira2010/ibt	0.4103	0.2038	0.1099	0.1669	0.0000	0.4190	0.6699	0.4953	1.0000	0.3200	0.0000	0.0001	0.3091	0.3850	0.4419	0.9495	0.3551	0.0000	0.0104
percival2014/stem	0.7647	0.6498	0.3874	0.4681	0.0000	0.0627	0.0363	0.6075	0.3200	1.0000	0.0000	0.0000	0.0058	0.7999	0.6924	0.1094	0.0023	0.0000	0.0001
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0808	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.0096	0.0000	0.0000	0.0001	0.0000	0.0000	1.0000	0.0001	0.0000	0.0000	0.0000	0.0002	0.4677	0.3381
schreiber2017/ismir2017	0.0088	0.0030	0.0015	0.0003	0.0000	0.9496	0.4085	0.0541	0.3091	0.0058	0.0000	0.0001	1.0000	0.0011	0.0076	0.1227	0.8864	0.0000	0.0667
schreiber2017/mirex2017	0.8997	0.4887	0.2814	0.2214	0.0000	0.0847	0.0688	0.7114	0.3850	0.7999	0.0000	0.0000	0.0011	1.0000	0.8536	0.1151	0.0077	0.0000	0.0004
schreiber2018/cnn	0.9845	0.3943	0.2079	0.1960	0.0000	0.0942	0.0635	0.7863	0.4419	0.6924	0.0000	0.0000	0.0076	0.8536	1.0000	0.0896	0.0079	0.0000	0.0005
schreiber2018/fcn	0.2850	0.0562	0.0269	0.0135	0.0000	0.3147	0.4786	0.5354	0.9495	0.1094	0.0000	0.0000	0.1227	0.1151	0.0896	1.0000	0.1333	0.0000	0.0052
schreiber2018/ismir2018	0.0134	0.0041	0.0018	0.0003	0.0000	0.9695	0.3920	0.0617	0.3551	0.0023	0.0000	0.0002	0.8864	0.0077	0.0079	0.1333	1.0000	0.0000	0.0696
sun2021/default	0.0000	0.0000	0.0000	0.0000	0.0000	0.0008	0.0000	0.0000	0.0000	0.0000	0.0000	0.4677	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.1541
zplane/auftakt_v3	0.0008	0.0001	0.0000	0.0000	0.0000	0.1218	0.0246	0.0013	0.0104	0.0001	0.0000	0.3381	0.0667	0.0004	0.0005	0.0052	0.0696	0.1541	1.0000

Table 14: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE₂. H₀: the true mean difference between paired samples is zero. If p ≤ α, H₀ is rejected, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
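The paired t-test behind these p-value tables can be sketched as follows. This is a minimal illustration, not the code used by tempo_eval: the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as the 1410 items of this corpus. An exact p-value would use a t distribution with n-1 degrees of freedom.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t statistic for H0: the mean of the paired differences is zero."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired samples")
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0.0:
        raise ValueError("paired differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n))


def approx_two_sided_p(t):
    """Two-sided p-value via a normal approximation (assumption, large n only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))
```

Here `errors_a` and `errors_b` stand for the per-track error values (for example OE₂) of two estimators on the same tracks, in the same order. H₀ is rejected when the resulting p-value is at most the chosen α.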

OE₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean OE₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
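The windowed mean described above can be sketched in a few lines. The function name and the parallel-list data layout are assumptions made for illustration; only the [T-10, T+10] BPM window comes from the report.

```python
from statistics import mean


def subset_mean(references, errors, center, half_width=10.0):
    """Mean error over items whose reference tempo lies in [T-10, T+10].

    Returns None when no reference tempo falls into the window.
    """
    window = [
        err
        for ref, err in zip(references, errors)
        if center - half_width <= ref <= center + half_width
    ]
    return mean(window) if window else None
```

Evaluating `subset_mean` for a range of centers T yields one curve per estimator. Windows that contain only a handful of items produce unstable means, which is why the note about values based on very few estimates matters.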

OE₁ on Tempo-Subsets for 1.0

Figure 19: Mean OE₁ for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE₁ on Tempo-Subsets for 2.0

Figure 20: Mean OE₁ for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean OE₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE₂ on Tempo-Subsets for 1.0

Figure 21: Mean OE₂ for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE₂ on Tempo-Subsets for 2.0

Figure 22: Mean OE₂ for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE₁ for Tempo

When a generalized additive model (GAM) is fit to OE₁ values as a function of the ground-truth tempo, what OE₁ can we expect, and with what confidence?
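As a rough sketch of the model form, a GAM with a single smooth term can be written as below. The identity link, the additive error term, and the symbols β₀, f, and ε are assumptions for illustration; the report does not state the exact model configuration.

```latex
\mathrm{OE}_1(T) = \beta_0 + f(T) + \varepsilon
```

Here T is the reference tempo, f is a smooth function (for example a spline) estimated from the data, and ε is the residual error. The shaded band in the figures corresponds to a confidence interval around the prediction β₀ + f(T).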

Estimated OE₁ for Tempo for 1.0

Predictions of GAMs trained on OE₁ for estimates for reference 1.0.

Figure 23: OE₁ predictions of a generalized additive model (GAM) fit to OE₁ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE₁ for Tempo for 2.0

Predictions of GAMs trained on OE₁ for estimates for reference 2.0.

Figure 24: OE₁ predictions of a generalized additive model (GAM) fit to OE₁ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE₂ for Tempo

When a generalized additive model (GAM) is fit to OE₂ values as a function of the ground-truth tempo, what OE₂ can we expect, and with what confidence?

Estimated OE₂ for Tempo for 1.0

Predictions of GAMs trained on OE₂ for estimates for reference 1.0.

Figure 25: OE₂ predictions of a generalized additive model (GAM) fit to OE₂ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE₂ for Tempo for 2.0

Predictions of GAMs trained on OE₂ for estimates for reference 2.0.

Figure 26: OE₂ predictions of a generalized additive model (GAM) fit to OE₂ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE₁ and AOE₂

AOE₁ is defined as the absolute octave error between an estimate E and a reference value R: AOE₁(E) = |log₂(E/R)|.

AOE₂ is the minimum AOE₁ when the octave errors 2, 3, 1/2, and 1/3 are allowed: AOE₂(E) = min(AOE₁(E), AOE₁(2E), AOE₁(3E), AOE₁(½E), AOE₁(⅓E)).
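The two definitions translate directly into code. The following is a minimal sketch with hypothetical function names, not the tempo_eval implementation; tempi are assumed to be positive values in BPM.

```python
import math

# Tempo factors tolerated by AOE2: the estimate itself plus the
# octave errors 2, 3, 1/2 and 1/3 named in the definition.
FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def aoe1(estimate: float, reference: float) -> float:
    """Absolute octave error |log2(E/R)| for tempi in BPM."""
    return abs(math.log2(estimate / reference))


def aoe2(estimate: float, reference: float) -> float:
    """Smallest AOE1 over the estimate scaled by each tolerated factor."""
    return min(aoe1(f * estimate, reference) for f in FACTORS)
```

For example, an estimate of 120 BPM against a reference of 60 BPM has an AOE₁ of 1 (one octave too fast) but an AOE₂ of 0, because halving the estimate matches the reference.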

Mean AOE₁/AOE₂ Results for 1.0

Estimator	AOE1_MEAN	AOE1_STDEV	AOE2_MEAN	AOE2_STDEV
schreiber2017/mirex2017	0.1109	0.2871	0.0262	0.0650
boeck2020/dar	0.1758	0.3735	0.0264	0.0631
schreiber2018/cnn	0.1862	0.3721	0.0271	0.0679
sun2021/default	0.1870	0.3605	0.0379	0.0812
schreiber2018/fcn	0.1882	0.3763	0.0290	0.0718
schreiber2017/ismir2017	0.1949	0.3772	0.0322	0.0801
schreiber2018/ismir2018	0.2357	0.4145	0.0307	0.0770
schreiber2014/default	0.2415	0.3990	0.0366	0.0897
echonest/version_3_2_1	0.2461	0.4057	0.0427	0.0957
boeck2019/multi_task	0.2569	0.4227	0.0304	0.0733
boeck2019/multi_task_hjdb	0.2643	0.4273	0.0313	0.0757
percival2014/stem	0.2764	0.4300	0.0285	0.0697
boeck2015/tempodetector2016_default	0.2859	0.4606	0.0310	0.0701
gkiokas2012/default	0.2877	0.4407	0.0309	0.0775
zplane/auftakt_v3	0.2922	0.4349	0.0424	0.0980
klapuri2006/percival2014	0.3268	0.4642	0.0329	0.0773
davies2009/mirex_qm_tempotracker	0.3488	0.4793	0.0440	0.0831
scheirer1998/percival2014	0.3613	0.4461	0.0804	0.1403
oliveira2010/ibt	0.3671	0.4712	0.0424	0.0890

Table 15: Mean AOE₁/AOE₂ for estimates compared to version 1.0, ordered by mean AOE₁.

CSV JSON LATEX PICKLE

Raw data AOE₁: CSV JSON LATEX PICKLE

Raw data AOE₂: CSV JSON LATEX PICKLE

AOE₁ distribution for 1.0

Figure 27: AOE₁ for estimates compared to version 1.0. Shown are the mean AOE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG
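The kernel density estimation mentioned in the figure caption can be illustrated with a Gaussian kernel. This is a sketch under assumptions: the kernel type, the fixed bandwidth, and the function name are illustrative choices, since the report does not state how the KDE was configured.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels."""
    if not samples or bandwidth <= 0:
        raise ValueError("need samples and a positive bandwidth")
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        # Sum one Gaussian bump per sample, then normalize to unit area.
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density
```

Applied to per-track AOE₁ values, the returned function gives a smooth estimate of how the errors are distributed. A smaller bandwidth shows more detail, while a larger one smooths over it.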

AOE₂ distribution for 1.0

Figure 28: AOE₂ for estimates compared to version 1.0. Shown are the mean AOE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean AOE₁/AOE₂ Results for 2.0

Estimator	AOE1_MEAN	AOE1_STDEV	AOE2_MEAN	AOE2_STDEV
schreiber2017/mirex2017	0.0957	0.2826	0.0128	0.0365
boeck2020/dar	0.1640	0.3740	0.0131	0.0347
schreiber2018/cnn	0.1727	0.3734	0.0136	0.0422
schreiber2018/fcn	0.1748	0.3764	0.0159	0.0497
sun2021/default	0.1750	0.3618	0.0252	0.0642
schreiber2017/ismir2017	0.1808	0.3780	0.0188	0.0607
schreiber2018/ismir2018	0.2221	0.4151	0.0175	0.0570
schreiber2014/default	0.2275	0.4015	0.0233	0.0740
echonest/version_3_2_1	0.2342	0.4087	0.0302	0.0832
boeck2019/multi_task	0.2450	0.4251	0.0176	0.0535
boeck2019/multi_task_hjdb	0.2523	0.4295	0.0191	0.0588
percival2014/stem	0.2623	0.4313	0.0158	0.0486
gkiokas2012/default	0.2745	0.4411	0.0170	0.0560
boeck2015/tempodetector2016_default	0.2746	0.4631	0.0176	0.0457
zplane/auftakt_v3	0.2801	0.4318	0.0311	0.0886
klapuri2006/percival2014	0.3141	0.4651	0.0188	0.0552
davies2009/mirex_qm_tempotracker	0.3384	0.4789	0.0325	0.0676
scheirer1998/percival2014	0.3552	0.4488	0.0708	0.1375
oliveira2010/ibt	0.3562	0.4717	0.0307	0.0759

Table 16: Mean AOE₁/AOE₂ for estimates compared to version 2.0, ordered by mean AOE₁.

CSV JSON LATEX PICKLE

Raw data AOE₁: CSV JSON LATEX PICKLE

Raw data AOE₂: CSV JSON LATEX PICKLE

AOE₁ distribution for 2.0

Figure 29: AOE₁ for estimates compared to version 2.0. Shown are the mean AOE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE₂ distribution for 2.0

Figure 30: AOE₂ for estimates compared to version 2.0. Shown are the mean AOE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.0161	0.0688	0.0000	0.0000	0.0022	0.9949	0.0035	0.0000	0.3908	0.0000	0.0008	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.6914
boeck2019/multi_task	0.0161	1.0000	0.1903	0.0000	0.0000	0.3717	0.0260	0.0000	0.0000	0.1168	0.0000	0.0980	0.0000	0.0000	0.0000	0.0000	0.0359	0.0000	0.0025
boeck2019/multi_task_hjdb	0.0688	0.1903	1.0000	0.0000	0.0000	0.1451	0.0941	0.0000	0.0000	0.3994	0.0000	0.0281	0.0000	0.0000	0.0000	0.0000	0.0063	0.0000	0.0188
boeck2020/dar	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0954	0.0000	0.3469	0.2855	0.0000	0.1860	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0127	0.0552	0.0000	0.3032	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0022	0.3717	0.1451	0.0000	0.0000	1.0000	0.0015	0.0000	0.0000	0.0222	0.0000	0.5738	0.0000	0.0000	0.0000	0.0000	0.2949	0.0000	0.0003
gkiokas2012/default	0.9949	0.0260	0.0941	0.0000	0.0000	0.0015	1.0000	0.0017	0.0000	0.2922	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.6477
klapuri2006/percival2014	0.0035	0.0000	0.0000	0.0000	0.0127	0.0000	0.0017	1.0000	0.0000	0.0000	0.0018	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0016
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.0552	0.0000	0.0000	0.0000	1.0000	0.0000	0.8445	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.3908	0.1168	0.3994	0.0000	0.0000	0.0222	0.2922	0.0000	0.0000	1.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0002	0.0000	0.1000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.3032	0.0000	0.0000	0.0018	0.8445	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0008	0.0980	0.0281	0.0000	0.0000	0.5738	0.0001	0.0000	0.0000	0.0001	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.6058	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0000	0.0000	0.0954	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.4215	0.5511	0.0001	0.5775	0.0000
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0000	0.3469	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.4215	0.0000	1.0000	0.8258	0.0000	0.7965	0.0000
schreiber2018/fcn	0.0000	0.0000	0.0000	0.2855	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.5511	0.0000	0.8258	1.0000	0.0000	0.9785	0.0000
schreiber2018/ismir2018	0.0000	0.0359	0.0063	0.0000	0.0000	0.2949	0.0000	0.0000	0.0000	0.0002	0.0000	0.6058	0.0001	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000
sun2021/default	0.0000	0.0000	0.0000	0.1860	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.5775	0.0000	0.7965	0.9785	0.0000	1.0000	0.0000
zplane/auftakt_v3	0.6914	0.0025	0.0188	0.0000	0.0000	0.0003	0.6477	0.0016	0.0000	0.1000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000

Table 17: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE₁. H₀: the true mean difference between paired samples is zero. If p ≤ α, H₀ is rejected, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.0164	0.0738	0.0000	0.0000	0.0022	0.8944	0.0022	0.0000	0.5050	0.0000	0.0014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.6397
boeck2019/multi_task	0.0164	1.0000	0.1764	0.0000	0.0000	0.3636	0.0179	0.0000	0.0000	0.0723	0.0000	0.1407	0.0000	0.0000	0.0000	0.0000	0.0497	0.0000	0.0019
boeck2019/multi_task_hjdb	0.0738	0.1764	1.0000	0.0000	0.0000	0.1351	0.0728	0.0000	0.0000	0.3027	0.0000	0.0410	0.0000	0.0000	0.0000	0.0000	0.0088	0.0000	0.0167
boeck2020/dar	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0555	0.0000	0.2548	0.2104	0.0000	0.1759	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0221	0.0469	0.0000	0.4824	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
echonest/version_3_2_1	0.0022	0.3636	0.1351	0.0000	0.0000	1.0000	0.0008	0.0000	0.0000	0.0122	0.0000	0.6968	0.0000	0.0000	0.0000	0.0000	0.3647	0.0000	0.0002
gkiokas2012/default	0.8944	0.0179	0.0728	0.0000	0.0000	0.0008	1.0000	0.0017	0.0000	0.3238	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.7065
klapuri2006/percival2014	0.0022	0.0000	0.0000	0.0000	0.0221	0.0000	0.0017	1.0000	0.0000	0.0000	0.0084	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0012
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.0469	0.0000	0.0000	0.0000	1.0000	0.0000	0.5590	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
percival2014/stem	0.5050	0.0723	0.3027	0.0000	0.0000	0.0122	0.3238	0.0000	0.0000	1.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0002	0.0000	0.1391
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.4824	0.0000	0.0000	0.0084	0.5590	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0014	0.1407	0.0410	0.0000	0.0000	0.6968	0.0001	0.0000	0.0000	0.0001	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.5818	0.0000	0.0000
schreiber2017/ismir2017	0.0000	0.0000	0.0000	0.0555	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.3870	0.5129	0.0001	0.4431	0.0000
schreiber2017/mirex2017	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2018/cnn	0.0000	0.0000	0.0000	0.2548	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.3870	0.0000	1.0000	0.8220	0.0000	0.9280	0.0000
schreiber2018/fcn	0.0000	0.0000	0.0000	0.2104	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.5129	0.0000	0.8220	1.0000	0.0000	0.8928	0.0000
schreiber2018/ismir2018	0.0000	0.0497	0.0088	0.0000	0.0000	0.3647	0.0000	0.0000	0.0000	0.0002	0.0000	0.5818	0.0001	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000
sun2021/default	0.0000	0.0000	0.0000	0.1759	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.4431	0.0000	0.9280	0.8928	0.0000	1.0000	0.0000
zplane/auftakt_v3	0.6397	0.0019	0.0167	0.0000	0.0000	0.0002	0.7065	0.0012	0.0000	0.1391	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000

Table 18: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE₁. H₀: the true mean difference between paired samples is zero. If p ≤ α, H₀ is rejected, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.9937	0.4098	0.0007	0.0000	0.0000	0.6997	0.4706	0.0000	0.2819	0.0000	0.0039	0.4474	0.0003	0.0070	0.2751	0.9416	0.0000	0.0000
boeck2019/multi_task	0.9937	1.0000	0.1345	0.0007	0.0000	0.0000	0.6958	0.5093	0.0000	0.2413	0.0000	0.0025	0.5087	0.0009	0.0041	0.2364	0.9427	0.0000	0.0000
boeck2019/multi_task_hjdb	0.4098	0.1345	1.0000	0.0000	0.0000	0.0000	0.2482	0.9069	0.0000	0.0559	0.0000	0.0295	0.8942	0.0000	0.0004	0.0486	0.3775	0.0010	0.0000
boeck2020/dar	0.0007	0.0007	0.0000	1.0000	0.0000	0.0000	0.0076	0.0002	0.0000	0.0141	0.0000	0.0000	0.0002	0.7345	0.6108	0.0184	0.0011	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.3969	0.0000	0.0000	0.4393	0.0000	0.0000	0.0002	0.0000	0.0000	0.0000	0.0000	0.0000	0.0018	0.6000
echonest/version_3_2_1	0.0000	0.0000	0.0000	0.0000	0.3969	1.0000	0.0000	0.0000	0.8334	0.0000	0.0000	0.0004	0.0000	0.0000	0.0000	0.0000	0.0000	0.0084	0.7325
gkiokas2012/default	0.6997	0.6958	0.2482	0.0076	0.0000	0.0000	1.0000	0.2831	0.0000	0.4208	0.0000	0.0003	0.2681	0.0050	0.0110	0.4177	0.7234	0.0000	0.0000
klapuri2006/percival2014	0.4706	0.5093	0.9069	0.0002	0.0000	0.0000	0.2831	1.0000	0.0000	0.0667	0.0000	0.0176	0.9914	0.0001	0.0007	0.0791	0.4415	0.0001	0.0000
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.4393	0.8334	0.0000	0.0000	1.0000	0.0000	0.0000	0.0002	0.0000	0.0000	0.0000	0.0000	0.0000	0.0057	0.8778
percival2014/stem	0.2819	0.2413	0.0559	0.0141	0.0000	0.0000	0.4208	0.0667	0.0000	1.0000	0.0000	0.0001	0.0840	0.0140	0.0762	0.9914	0.2265	0.0000	0.0000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0039	0.0025	0.0295	0.0000	0.0002	0.0004	0.0003	0.0176	0.0002	0.0001	0.0000	1.0000	0.0009	0.0000	0.0000	0.0000	0.0009	0.2286	0.0001
schreiber2017/ismir2017	0.4474	0.5087	0.8942	0.0002	0.0000	0.0000	0.2681	0.9914	0.0000	0.0840	0.0000	0.0009	1.0000	0.0000	0.0005	0.0555	0.3866	0.0000	0.0000
schreiber2017/mirex2017	0.0003	0.0009	0.0000	0.7345	0.0000	0.0000	0.0050	0.0001	0.0000	0.0140	0.0000	0.0000	0.0000	1.0000	0.4675	0.0120	0.0007	0.0000	0.0000
schreiber2018/cnn	0.0070	0.0041	0.0004	0.6108	0.0000	0.0000	0.0110	0.0007	0.0000	0.0762	0.0000	0.0000	0.0005	0.4675	1.0000	0.0254	0.0036	0.0000	0.0000
schreiber2018/fcn	0.2751	0.2364	0.0486	0.0184	0.0000	0.0000	0.4177	0.0791	0.0000	0.9914	0.0000	0.0000	0.0555	0.0120	0.0254	1.0000	0.2300	0.0000	0.0000
schreiber2018/ismir2018	0.9416	0.9427	0.3775	0.0011	0.0000	0.0000	0.7234	0.4415	0.0000	0.2265	0.0000	0.0009	0.3866	0.0007	0.0036	0.2300	1.0000	0.0000	0.0000
sun2021/default	0.0000	0.0000	0.0010	0.0000	0.0018	0.0084	0.0000	0.0001	0.0057	0.0000	0.0000	0.2286	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0065
zplane/auftakt_v3	0.0000	0.0000	0.0000	0.0000	0.6000	0.7325	0.0000	0.0000	0.8778	0.0000	0.0000	0.0001	0.0000	0.0000	0.0000	0.0000	0.0000	0.0065	1.0000

Table 19: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE₂. H₀: the true mean difference between paired samples is zero. If p ≤ α, H₀ is rejected, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator	boeck2015/tempodetector2016_default	boeck2019/multi_task	boeck2019/multi_task_hjdb	boeck2020/dar	davies2009/mirex_qm_tempotracker	echonest/version_3_2_1	gkiokas2012/default	klapuri2006/percival2014	oliveira2010/ibt	percival2014/stem	scheirer1998/percival2014	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn	schreiber2018/ismir2018	sun2021/default	zplane/auftakt_v3
boeck2015/tempodetector2016_default	1.0000	0.7217	0.8502	0.0005	0.0000	0.0000	0.9722	0.2553	0.0000	0.1165	0.0000	0.0036	0.4346	0.0003	0.0078	0.2086	0.8692	0.0001	0.0000
boeck2019/multi_task	0.7217	1.0000	0.3124	0.0015	0.0000	0.0000	0.7346	0.1657	0.0000	0.1754	0.0000	0.0007	0.3079	0.0029	0.0125	0.3208	0.8276	0.0000	0.0000
boeck2019/multi_task_hjdb	0.8502	0.3124	1.0000	0.0002	0.0000	0.0000	0.8277	0.3860	0.0000	0.0806	0.0000	0.0050	0.5966	0.0004	0.0032	0.1288	0.7470	0.0002	0.0000
boeck2020/dar	0.0005	0.0015	0.0002	1.0000	0.0000	0.0000	0.0020	0.0000	0.0000	0.0631	0.0000	0.0000	0.0001	0.8028	0.4661	0.0205	0.0011	0.0000	0.0000
davies2009/mirex_qm_tempotracker	0.0000	0.0000	0.0000	0.0000	1.0000	0.6022	0.0000	0.0000	0.4540	0.0000	0.0000	0.0029	0.0000	0.0000	0.0000	0.0000	0.0000	0.0077	0.5309
echonest/version_3_2_1	0.0000	0.0000	0.0000	0.0000	0.6022	1.0000	0.0000	0.0000	0.8885	0.0000	0.0000	0.0013	0.0000	0.0000	0.0000	0.0000	0.0000	0.0094	0.8916
gkiokas2012/default	0.9722	0.7346	0.8277	0.0020	0.0000	0.0000	1.0000	0.2317	0.0000	0.0898	0.0000	0.0008	0.4282	0.0019	0.0052	0.1812	0.8990	0.0000	0.0000
klapuri2006/percival2014	0.2553	0.1657	0.3860	0.0000	0.0000	0.0000	0.2317	1.0000	0.0000	0.0053	0.0000	0.0511	0.6883	0.0000	0.0001	0.0198	0.2004	0.0026	0.0000
oliveira2010/ibt	0.0000	0.0000	0.0000	0.0000	0.4540	0.8885	0.0000	0.0000	1.0000	0.0000	0.0000	0.0043	0.0000	0.0000	0.0000	0.0000	0.0000	0.0218	1.0000
percival2014/stem	0.1165	0.1754	0.0806	0.0631	0.0000	0.0000	0.0898	0.0053	0.0000	1.0000	0.0000	0.0000	0.0266	0.0608	0.2647	0.6959	0.0907	0.0000	0.0000
scheirer1998/percival2014	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	1.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000	0.0000
schreiber2014/default	0.0036	0.0007	0.0050	0.0000	0.0029	0.0013	0.0008	0.0511	0.0043	0.0000	0.0000	1.0000	0.0007	0.0000	0.0000	0.0000	0.0004	0.4260	0.0031
schreiber2017/ismir2017	0.4346	0.3079	0.5966	0.0001	0.0000	0.0000	0.4282	0.6883	0.0000	0.0266	0.0000	0.0007	1.0000	0.0000	0.0006	0.0335	0.3211	0.0003	0.0000
schreiber2017/mirex2017	0.0003	0.0029	0.0004	0.8028	0.0000	0.0000	0.0019	0.0000	0.0000	0.0608	0.0000	0.0000	0.0000	1.0000	0.4118	0.0190	0.0008	0.0000	0.0000
schreiber2018/cnn	0.0078	0.0125	0.0032	0.4661	0.0000	0.0000	0.0052	0.0001	0.0000	0.2647	0.0000	0.0000	0.0006	0.4118	1.0000	0.0605	0.0051	0.0000	0.0000
schreiber2018/fcn	0.2086	0.3208	0.1288	0.0205	0.0000	0.0000	0.1812	0.0198	0.0000	0.6959	0.0000	0.0000	0.0335	0.0190	0.0605	1.0000	0.1904	0.0000	0.0000
schreiber2018/ismir2018	0.8692	0.8276	0.7470	0.0011	0.0000	0.0000	0.8990	0.2004	0.0000	0.0907	0.0000	0.0004	0.3211	0.0008	0.0051	0.1904	1.0000	0.0000	0.0000
sun2021/default	0.0001	0.0000	0.0002	0.0000	0.0077	0.0094	0.0000	0.0026	0.0218	0.0000	0.0000	0.4260	0.0003	0.0000	0.0000	0.0000	0.0000	1.0000	0.0267
zplane/auftakt_v3	0.0000	0.0000	0.0000	0.0000	0.5309	0.8916	0.0000	0.0000	1.0000	0.0000	0.0000	0.0031	0.0000	0.0000	0.0000	0.0000	0.0000	0.0267	1.0000

Table 20: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE₂. H₀: the true mean difference between paired samples is zero. If p ≤ α, H₀ is rejected, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

AOE₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean AOE₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE₁ on Tempo-Subsets for 1.0

Figure 31: Mean AOE₁ for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE₁ on Tempo-Subsets for 2.0

Figure 32: Mean AOE₁ for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean AOE₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE₂ on Tempo-Subsets for 1.0

Figure 33: Mean AOE₂ for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE₂ on Tempo-Subsets for 2.0

Figure 34: Mean AOE₂ for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE₁ for Tempo

When a generalized additive model (GAM) is fit to AOE₁ values as a function of the ground-truth tempo, what AOE₁ can we expect, and with what confidence?

Estimated AOE₁ for Tempo for 1.0

Predictions of GAMs trained on AOE₁ for estimates for reference 1.0.

Figure 35: AOE₁ predictions of a generalized additive model (GAM) fit to AOE₁ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE₁ for Tempo for 2.0

Predictions of GAMs trained on AOE₁ for estimates for reference 2.0.

Figure 36: AOE₁ predictions of a generalized additive model (GAM) fit to AOE₁ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE₂ for Tempo

When a generalized additive model (GAM) is fit to AOE₂ values as a function of the ground-truth tempo, what AOE₂ can we expect, and with what confidence?

Estimated AOE₂ for Tempo for 1.0

Predictions of GAMs trained on AOE₂ for estimates for reference 1.0.

Figure 37: AOE₂ predictions of a generalized additive model (GAM) fit to AOE₂ results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE₂ for Tempo for 2.0

Predictions of GAMs trained on AOE₂ for estimates for reference 2.0.

Figure 38: AOE₂ predictions of a generalized additive model (GAM) fit to AOE₂ results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Generated by tempo_eval 0.1.1 on 2022-06-29 18:07. Size L.
