hainsworth

This is the tempo_eval report for the ‘hainsworth’ corpus.

Reports for other corpora may be found here.

References for ‘hainsworth’

References

1.0

Attribute Value
Corpus hainsworth
Version 1.0
Curator Stephen Webley Hainsworth
Data Source manual annotation
Annotation Tools derived from beat annotations
Annotation Rules mean of inter beat intervals
Annotator, bibtex Hainsworth2004

2.0

Attribute Value
Corpus hainsworth
Version 2.0
Curator Stephen Webley Hainsworth
Data Source manual annotation
Annotation Tools derived from beat annotations
Annotation Rules median of corresponding inter beat intervals
Annotator, bibtex Hainsworth2004

3.0

Attribute Value
Corpus hainsworth
Version 3.0
Curator Stephen Webley Hainsworth
Data Source manual annotation
Annotation Tools derived from beat annotations
Annotation Rules median of inter beat intervals
Annotator, bibtex Hainsworth2004
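
The reference versions above differ mainly in the rule that turns beat annotations into one tempo value: the mean of the inter-beat intervals (1.0) or a median (2.0, 3.0). The following minimal sketch illustrates the difference between the two rules. The function names and the example beat times are hypothetical, and the sketch does not attempt to reproduce what "corresponding" intervals means for version 2.0.

```python
from statistics import mean, median


def inter_beat_intervals(beat_times):
    """Differences between consecutive beat times, in seconds."""
    return [b - a for a, b in zip(beat_times, beat_times[1:])]


def tempo_from_mean_ibi(beat_times):
    """Tempo in BPM from the mean inter-beat interval."""
    return 60.0 / mean(inter_beat_intervals(beat_times))


def tempo_from_median_ibi(beat_times):
    """Tempo in BPM from the median inter-beat interval."""
    return 60.0 / median(inter_beat_intervals(beat_times))


# Hypothetical beat annotations: steady 0.5 s beats, then one 1.0 s gap.
beats = [0.0, 0.5, 1.0, 1.5, 2.5]
mean_tempo = tempo_from_mean_ibi(beats)
median_tempo = tempo_from_median_ibi(beats)
```

In this made-up example a single long interval pulls the mean-based tempo down, while the median-based tempo stays at the value implied by the steady beats. That is one plausible reason the versions report slightly different statistics.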

Basic Statistics

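| Reference | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage |
|---|---|---|---|---|---|---|---|
| 1.0 | 222 | 51.86 | 198.15 | 113.30 | 28.84 | 79.00 | 0.82 |
| 2.0 | 222 | 55.51 | 198.95 | 114.59 | 29.56 | 76.00 | 0.81 |
| 3.0 | 222 | 55.72 | 198.26 | 114.62 | 29.36 | 77.00 | 0.82 |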

Table 1: Basic statistics.
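
The "Sweet Oct." columns can be read as follows, under an assumption this report does not spell out: the sweet octave is the tempo interval [s, 2s) that contains the largest share of tempo values, "Start" is s, and "Coverage" is that share. The sketch below implements that reading with integer candidate starts. The function name, the candidate range, and the example values are hypothetical.

```python
def sweet_octave(tempi, starts=range(1, 301)):
    """Return (start, coverage) of the octave [start, 2*start) that
    contains the largest fraction of the given tempo values (BPM).

    Assumption: candidate starts are whole BPM values and ties are
    resolved in favour of the lowest start.
    """
    if not tempi:
        raise ValueError("need at least one tempo value")
    best_start, best_coverage = None, -1.0
    for start in starts:
        inside = sum(1 for t in tempi if start <= t < 2 * start)
        coverage = inside / len(tempi)
        if coverage > best_coverage:
            best_start, best_coverage = start, coverage
    return best_start, best_coverage


# Hypothetical tempo values, not taken from the corpus.
start, coverage = sweet_octave([60, 90, 100, 110, 200])
```

Here the octave beginning at 56 BPM covers four of the five values. No single octave can contain both 60 and 200, so the coverage stays below 1.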

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

Tag Distribution for ‘tag_open’

Figure 2: Percentage of tracks tagged with tags from namespace ‘tag_open’. Annotations are from reference 1.0.

Beat-Based Tempo Variation

Figure 3: Fraction of beat-annotated tracks in the dataset with cvar < τ, where cvar is the coefficient of variation of a track's inter-beat intervals.
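
As a rough illustration of the quantity plotted in Figure 3, the sketch below takes cvar to be the coefficient of variation of a track's inter-beat intervals (standard deviation divided by mean) and computes the fraction of tracks below a threshold τ. Using the population standard deviation is an assumption, and the function names and beat times are hypothetical.

```python
from statistics import mean, pstdev


def cvar(beat_times):
    """Coefficient of variation of the inter-beat intervals.

    Assumption: population standard deviation divided by the mean.
    """
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return pstdev(ibis) / mean(ibis)


def fraction_below(tracks, tau):
    """Fraction of tracks whose cvar is strictly below tau."""
    return sum(1 for beats in tracks if cvar(beats) < tau) / len(tracks)


# Hypothetical tracks: one perfectly steady, one with a doubled interval.
steady = [0.0, 0.5, 1.0, 1.5]
unsteady = [0.0, 0.5, 1.5, 2.0]
share = fraction_below([steady, unsteady], tau=0.1)
```

A steady track has a cvar of 0, so it counts for any positive τ. The unsteady track only counts once τ exceeds its cvar of roughly 0.35.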

Estimates for ‘hainsworth’

Estimators

boeck2015/tempodetector2016_default

Attribute Value
Corpus hainsworth
Version 0.17.dev0
Annotation Tools TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex Boeck2015

boeck2019/multi_task

Attribute Value
Corpus hainsworth
Version 0.0.1
Annotation Tools model=multi_task, https://github.com/superbock/ISMIR2019
Annotator, bibtex Boeck2019

boeck2019/multi_task_hjdb

Attribute Value
Corpus hainsworth
Version 0.0.1
Annotation Tools model=multi_task_hjdb, https://github.com/superbock/ISMIR2019
Annotator, bibtex Boeck2019

boeck2020/dar

Attribute Value
Corpus hainsworth
Version 0.0.1
Annotation Tools https://github.com/superbock/ISMIR2020
Annotator, bibtex Boeck2020

davies2009/mirex_qm_tempotracker

Attribute Value
Corpus hainsworth
Version 1.0
Annotation Tools QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used.
Annotator, bibtex Davies2009 Davies2007

echonest/version_3_2_1

Attribute Value
Corpus hainsworth
Version 3.2.1
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools Echo Nest track analyzer v3.2.1
Annotator, bibtex Percival2014

gkiokas2012/default

Attribute Value
Corpus hainsworth
Version 1.0
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools Gkiokas2012
Annotator, bibtex Gkiokas2012

klapuri2006/percival2014

Attribute Value
Corpus hainsworth
Version 1.0
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools Klapuri 2006
Annotator, bibtex Klapuri2006

oliveira2010/ibt

Attribute Value
Corpus hainsworth
Version 1.0
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools Oliveira 2010
Annotator, bibtex Oliveira2010

percival2014/stem

Attribute Value
Corpus hainsworth
Version 1.0
Annotation Tools percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex Percival2014

scheirer1998/percival2014

Attribute Value
Corpus hainsworth
Version 1.0
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools Scheirer 1998
Annotator, bibtex Scheirer1998

schreiber2014/default

Attribute Value
Corpus hainsworth
Version 0.0.1
Annotation Tools schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2014

schreiber2017/ismir2017

Attribute Value
Corpus hainsworth
Version 0.0.4
Annotation Tools schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2017/mirex2017

Attribute Value
Corpus hainsworth
Version 0.0.4
Annotation Tools schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2018/cnn

Attribute Value
Corpus hainsworth
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute Value
Corpus hainsworth
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

schreiber2018/ismir2018

Attribute Value
Corpus hainsworth
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

sun2021/default

Attribute Value
Corpus hainsworth
Version 0.0.2
Data Source Xiaoheng Sun, Qiqi He, Yongwei Gao, Wei Li. Musical Tempo Estimation Using a Multi-scale Network. in Proc. of the 22nd Int. Society for Music Information Retrieval Conf., Online, 2021
Annotation Tools https://github.com/Qqi-HE/TempoEstimation_MGANet
Annotator, bibtex Sun2021

zplane/auftakt_v3

Attribute Value
Corpus hainsworth
Version 3.0
Data Source Graham Percival and George Tzanetakis. Streamlined tempo estimation based on autocorrelation and crosscorrelation with pulses. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 22(12):1765–1776, 2014.
Annotation Tools zplane aufTAKT version 3.0, http://licensing.zplane.de/technology#auftakt
Annotator, bibtex Percival2014

Basic Statistics

| Estimator | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage |
|---|---|---|---|---|---|---|---|
| boeck2015/tempodetector2016_default | 222 | 41.96 | 230.77 | 112.29 | 30.61 | 70.00 | 0.82 |
| boeck2019/multi_task | 222 | 55.32 | 208.46 | 112.94 | 29.11 | 71.00 | 0.84 |
| boeck2019/multi_task_hjdb | 222 | 45.53 | 208.12 | 112.57 | 28.84 | 70.00 | 0.82 |
| boeck2020/dar | 222 | 33.35 | 231.58 | 114.71 | 33.64 | 79.00 | 0.78 |
| davies2009/mirex_qm_tempotracker | 222 | 80.75 | 234.91 | 125.65 | 26.74 | 81.00 | 0.93 |
| echonest/version_3_2_1 | 221 | 58.30 | 191.72 | 100.20 | 27.44 | 71.00 | 0.76 |
| gkiokas2012/default | 222 | 52.00 | 244.00 | 112.24 | 31.59 | 73.00 | 0.83 |
| klapuri2006/percival2014 | 222 | 74.36 | 161.50 | 114.11 | 19.76 | 76.00 | 0.98 |
| oliveira2010/ibt | 222 | 82.00 | 161.00 | 116.20 | 20.30 | 81.00 | 1.00 |
| percival2014/stem | 222 | 50.79 | 152.00 | 105.84 | 22.20 | 72.00 | 0.93 |
| scheirer1998/percival2014 | 212 | 61.35 | 181.82 | 109.47 | 28.37 | 74.00 | 0.81 |
| schreiber2014/default | 222 | 54.85 | 164.50 | 101.41 | 22.60 | 69.00 | 0.90 |
| schreiber2017/ismir2017 | 222 | 26.50 | 193.51 | 106.25 | 26.09 | 71.00 | 0.84 |
| schreiber2017/mirex2017 | 222 | 13.25 | 197.54 | 105.21 | 29.14 | 74.00 | 0.81 |
| schreiber2018/cnn | 222 | 63.00 | 216.00 | 116.03 | 29.14 | 81.00 | 0.88 |
| schreiber2018/fcn | 222 | 50.00 | 208.00 | 114.58 | 30.17 | 75.00 | 0.82 |
| schreiber2018/ismir2018 | 222 | 65.00 | 208.00 | 114.31 | 26.12 | 77.00 | 0.91 |
| sun2021/default | 222 | 56.00 | 218.00 | 115.41 | 32.97 | 73.00 | 0.79 |
| zplane/auftakt_v3 | 222 | 65.50 | 164.80 | 111.46 | 22.53 | 76.00 | 0.92 |

Table 2: Basic statistics.

Smoothed Tempo Distribution

Figure 4: Percentage of values in tempo interval.

Accuracy

Accuracy1 is defined as the percentage of correct estimates, allowing a 4% tolerance for individual BPM values.

Accuracy2 additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].
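
The two definitions above can be sketched in a few lines. This is an illustrative reading rather than the report's own implementation: it assumes the tolerance is relative to the (possibly scaled) reference value, and the function names and example values are hypothetical.

```python
def accuracy1(reference, estimate, tolerance=0.04):
    """True if the estimate lies within the relative tolerance of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def accuracy2(reference, estimate, tolerance=0.04):
    """Like accuracy1, but also accepts estimates that match the reference
    scaled by 2, 3, 1/2 or 1/3 (octave errors)."""
    factors = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)
    return any(accuracy1(reference * f, estimate, tolerance) for f in factors)


def mean_accuracy(references, estimates, metric):
    """Fraction of items for which the metric accepts the estimate."""
    hits = [metric(r, e) for r, e in zip(references, estimates)]
    return sum(hits) / len(hits)


# Hypothetical reference and estimated tempi in BPM.
refs = [100.0, 120.0, 80.0]
ests = [101.0, 60.0, 90.0]
acc1 = mean_accuracy(refs, ests, accuracy1)
acc2 = mean_accuracy(refs, ests, accuracy2)
```

In this made-up example the second estimate (60 BPM against a reference of 120 BPM) fails Accuracy1 but passes Accuracy2 as a half-tempo octave error, so Accuracy2 is never lower than Accuracy1.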

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set or that the test set may have even been created using one of the tested algorithms.

Accuracy Results for 1.0

Estimator Accuracy1 Accuracy2
boeck2020/dar 0.8108 0.8919
boeck2015/tempodetector2016_default 0.8063 0.8829
sun2021/default 0.8018 0.9099
boeck2019/multi_task_hjdb 0.7973 0.8874
boeck2019/multi_task 0.7973 0.8964
schreiber2018/ismir2018 0.7748 0.8423
schreiber2018/fcn 0.7703 0.8649
schreiber2018/cnn 0.7658 0.8468
schreiber2017/mirex2017 0.7387 0.8604
schreiber2017/ismir2017 0.7297 0.8514
oliveira2010/ibt 0.7252 0.8198
davies2009/mirex_qm_tempotracker 0.7207 0.8288
klapuri2006/percival2014 0.7162 0.8423
schreiber2014/default 0.7072 0.8694
zplane/auftakt_v3 0.6982 0.8243
percival2014/stem 0.6982 0.8694
echonest/version_3_2_1 0.6667 0.8559
gkiokas2012/default 0.6441 0.8468
scheirer1998/percival2014 0.4910 0.6532

Table 3: Mean accuracy of estimates compared to version 1.0 with 4% tolerance, ordered by Accuracy1.

Accuracy1 for 1.0

Figure 5: Mean Accuracy1 for estimates compared to version 1.0 depending on tolerance.

Accuracy2 for 1.0

Figure 6: Mean Accuracy2 for estimates compared to version 1.0 depending on tolerance.

Accuracy Results for 2.0

Estimator Accuracy1 Accuracy2
boeck2020/dar 0.8514 0.9459
boeck2015/tempodetector2016_default 0.8514 0.9279
boeck2019/multi_task_hjdb 0.8333 0.9324
boeck2019/multi_task 0.8243 0.9324
sun2021/default 0.8243 0.9279
schreiber2018/ismir2018 0.8018 0.8784
schreiber2018/fcn 0.7928 0.8919
schreiber2018/cnn 0.7838 0.8739
davies2009/mirex_qm_tempotracker 0.7523 0.8649
schreiber2017/mirex2017 0.7477 0.8964
schreiber2017/ismir2017 0.7477 0.8919
oliveira2010/ibt 0.7432 0.8378
klapuri2006/percival2014 0.7297 0.8559
zplane/auftakt_v3 0.7117 0.8468
percival2014/stem 0.7117 0.9054
schreiber2014/default 0.6982 0.8829
echonest/version_3_2_1 0.6847 0.8829
gkiokas2012/default 0.6757 0.8829
scheirer1998/percival2014 0.5180 0.6847

Table 4: Mean accuracy of estimates compared to version 2.0 with 4% tolerance, ordered by Accuracy1.

Accuracy1 for 2.0

Figure 7: Mean Accuracy1 for estimates compared to version 2.0 depending on tolerance.

Accuracy2 for 2.0

Figure 8: Mean Accuracy2 for estimates compared to version 2.0 depending on tolerance.

Accuracy Results for 3.0

Estimator Accuracy1 Accuracy2
boeck2015/tempodetector2016_default 0.8604 0.9369
boeck2020/dar 0.8514 0.9414
boeck2019/multi_task_hjdb 0.8243 0.9279
sun2021/default 0.8198 0.9279
boeck2019/multi_task 0.8153 0.9279
schreiber2018/ismir2018 0.8018 0.8739
schreiber2018/fcn 0.7928 0.8919
schreiber2018/cnn 0.7793 0.8694
davies2009/mirex_qm_tempotracker 0.7523 0.8559
schreiber2017/ismir2017 0.7477 0.8919
schreiber2017/mirex2017 0.7432 0.8919
oliveira2010/ibt 0.7432 0.8378
klapuri2006/percival2014 0.7297 0.8559
zplane/auftakt_v3 0.7117 0.8423
percival2014/stem 0.7117 0.8919
schreiber2014/default 0.7072 0.8829
echonest/version_3_2_1 0.6802 0.8784
gkiokas2012/default 0.6712 0.8784
scheirer1998/percival2014 0.5180 0.6892

Table 5: Mean accuracy of estimates compared to version 3.0 with 4% tolerance, ordered by Accuracy1.

Accuracy1 for 3.0

Figure 9: Mean Accuracy1 for estimates compared to version 3.0 depending on tolerance.

Accuracy2 for 3.0

Figure 10: Mean Accuracy2 for estimates compared to version 3.0 depending on tolerance.

Differing Items

For which items did a given estimator fail to estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and therefore never estimated correctly, regardless of which estimator is used?

Differing Items Accuracy1

Items with different tempo annotations (Accuracy1, 4% tolerance) in different versions:

1.0 compared with boeck2015/tempodetector2016_default (43 differences): ‘006’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘038’ ‘053’ ‘055’ ‘058’ ‘059’ … CSV

1.0 compared with boeck2019/multi_task (45 differences): ‘006’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘048’ ‘055’ ‘057’ ‘058’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (45 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘055’ ‘057’ ‘058’ … CSV

1.0 compared with boeck2020/dar (42 differences): ‘006’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘059’ ‘062’ ‘072’ ‘073’ ‘075’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (62 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘035’ ‘037’ ‘047’ ‘055’ … CSV

1.0 compared with echonest/version_3_2_1 (74 differences): ‘003’ ‘006’ ‘009’ ‘012’ ‘013’ ‘019’ ‘024’ ‘037’ ‘053’ ‘055’ ‘059’ … CSV

1.0 compared with gkiokas2012/default (79 differences): ‘003’ ‘006’ ‘007’ ‘008’ ‘009’ ‘010’ ‘012’ ‘013’ ‘022’ ‘024’ ‘037’ … CSV

1.0 compared with klapuri2006/percival2014 (63 differences): ‘006’ ‘007’ ‘009’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘053’ ‘055’ … CSV

1.0 compared with oliveira2010/ibt (61 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘035’ ‘047’ ‘055’ ‘058’ … CSV

1.0 compared with percival2014/stem (67 differences): ‘006’ ‘007’ ‘009’ ‘012’ ‘013’ ‘022’ ‘024’ ‘042’ ‘047’ ‘053’ ‘055’ … CSV

1.0 compared with scheirer1998/percival2014 (113 differences): ‘001’ ‘002’ ‘003’ ‘006’ ‘007’ ‘009’ ‘010’ ‘012’ ‘013’ ‘019’ ‘020’ … CSV

1.0 compared with schreiber2014/default (65 differences): ‘006’ ‘007’ ‘009’ ‘012’ ‘013’ ‘016’ ‘022’ ‘024’ ‘037’ ‘052’ ‘053’ … CSV

1.0 compared with schreiber2017/ismir2017 (60 differences): ‘006’ ‘009’ ‘012’ ‘013’ ‘016’ ‘024’ ‘025’ ‘055’ ‘058’ ‘059’ ‘061’ … CSV

1.0 compared with schreiber2017/mirex2017 (58 differences): ‘006’ ‘009’ ‘012’ ‘013’ ‘024’ ‘025’ ‘035’ ‘058’ ‘059’ ‘061’ ‘062’ … CSV

1.0 compared with schreiber2018/cnn (52 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘019’ ‘022’ ‘024’ ‘037’ ‘056’ ‘057’ ‘059’ … CSV

1.0 compared with schreiber2018/fcn (51 differences): ‘006’ ‘007’ ‘010’ ‘012’ ‘013’ ‘024’ ‘035’ ‘037’ ‘052’ ‘059’ ‘062’ … CSV

1.0 compared with schreiber2018/ismir2018 (50 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘037’ ‘047’ ‘053’ ‘057’ … CSV

1.0 compared with sun2021/default (44 differences): ‘006’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘056’ ‘057’ ‘059’ ‘062’ ‘072’ … CSV

1.0 compared with zplane/auftakt_v3 (67 differences): ‘006’ ‘007’ ‘010’ ‘012’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘055’ ‘057’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (33 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘038’ ‘053’ ‘055’ ‘058’ ‘059’ ‘066’ ‘072’ … CSV

2.0 compared with boeck2019/multi_task (39 differences): ‘006’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘048’ ‘055’ ‘058’ ‘059’ ‘072’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (37 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘058’ ‘059’ ‘060’ ‘072’ ‘073’ … CSV

2.0 compared with boeck2020/dar (33 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘059’ ‘072’ ‘073’ ‘075’ ‘097’ ‘103’ ‘107’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (55 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘057’ ‘058’ ‘059’ … CSV

2.0 compared with echonest/version_3_2_1 (70 differences): ‘003’ ‘006’ ‘009’ ‘013’ ‘019’ ‘024’ ‘037’ ‘053’ ‘055’ ‘059’ ‘078’ … CSV

2.0 compared with gkiokas2012/default (72 differences): ‘003’ ‘006’ ‘007’ ‘008’ ‘009’ ‘010’ ‘013’ ‘022’ ‘053’ ‘055’ ‘059’ … CSV

2.0 compared with klapuri2006/percival2014 (60 differences): ‘006’ ‘007’ ‘009’ ‘013’ ‘022’ ‘025’ ‘047’ ‘053’ ‘055’ ‘057’ ‘059’ … CSV

2.0 compared with oliveira2010/ibt (57 differences): ‘006’ ‘007’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘055’ ‘058’ ‘059’ ‘069’ … CSV

2.0 compared with percival2014/stem (64 differences): ‘006’ ‘007’ ‘009’ ‘013’ ‘022’ ‘042’ ‘047’ ‘053’ ‘055’ ‘057’ ‘059’ … CSV

2.0 compared with scheirer1998/percival2014 (107 differences): ‘001’ ‘002’ ‘003’ ‘006’ ‘007’ ‘009’ ‘010’ ‘012’ ‘013’ ‘019’ ‘020’ … CSV

2.0 compared with schreiber2014/default (67 differences): ‘006’ ‘007’ ‘009’ ‘013’ ‘016’ ‘022’ ‘024’ ‘052’ ‘053’ ‘055’ ‘059’ … CSV

2.0 compared with schreiber2017/ismir2017 (56 differences): ‘006’ ‘009’ ‘013’ ‘016’ ‘024’ ‘025’ ‘055’ ‘058’ ‘059’ ‘061’ ‘067’ … CSV

2.0 compared with schreiber2017/mirex2017 (56 differences): ‘006’ ‘009’ ‘013’ ‘025’ ‘035’ ‘058’ ‘059’ ‘061’ ‘062’ ‘067’ ‘075’ … CSV

2.0 compared with schreiber2018/cnn (48 differences): ‘006’ ‘007’ ‘013’ ‘019’ ‘022’ ‘056’ ‘059’ ‘060’ ‘062’ ‘070’ ‘073’ … CSV

2.0 compared with schreiber2018/fcn (46 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘035’ ‘052’ ‘059’ ‘062’ ‘074’ ‘075’ ‘079’ … CSV

2.0 compared with schreiber2018/ismir2018 (44 differences): ‘006’ ‘007’ ‘013’ ‘022’ ‘025’ ‘047’ ‘053’ ‘057’ ‘059’ ‘062’ ‘075’ … CSV

2.0 compared with sun2021/default (39 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘056’ ‘057’ ‘059’ ‘062’ ‘072’ ‘073’ ‘075’ … CSV

2.0 compared with zplane/auftakt_v3 (64 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘057’ ‘059’ ‘062’ … CSV

3.0 compared with boeck2015/tempodetector2016_default (31 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘038’ ‘053’ ‘055’ ‘058’ ‘059’ ‘066’ ‘072’ … CSV

3.0 compared with boeck2019/multi_task (41 differences): ‘006’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘048’ ‘055’ ‘058’ ‘059’ ‘072’ … CSV

3.0 compared with boeck2019/multi_task_hjdb (39 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘058’ ‘059’ ‘060’ ‘072’ ‘073’ … CSV

3.0 compared with boeck2020/dar (33 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘059’ ‘072’ ‘073’ ‘075’ ‘097’ ‘103’ ‘107’ … CSV

3.0 compared with davies2009/mirex_qm_tempotracker (55 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘057’ ‘058’ ‘059’ … CSV

3.0 compared with echonest/version_3_2_1 (71 differences): ‘003’ ‘006’ ‘007’ ‘009’ ‘013’ ‘019’ ‘024’ ‘037’ ‘053’ ‘055’ ‘059’ … CSV

3.0 compared with gkiokas2012/default (73 differences): ‘003’ ‘006’ ‘007’ ‘008’ ‘009’ ‘010’ ‘013’ ‘022’ ‘053’ ‘055’ ‘059’ … CSV

3.0 compared with klapuri2006/percival2014 (60 differences): ‘006’ ‘007’ ‘009’ ‘013’ ‘022’ ‘025’ ‘047’ ‘053’ ‘055’ ‘057’ ‘059’ … CSV

3.0 compared with oliveira2010/ibt (57 differences): ‘006’ ‘007’ ‘013’ ‘022’ ‘024’ ‘025’ ‘047’ ‘055’ ‘058’ ‘059’ ‘069’ … CSV

3.0 compared with percival2014/stem (64 differences): ‘006’ ‘007’ ‘009’ ‘013’ ‘022’ ‘042’ ‘047’ ‘053’ ‘055’ ‘059’ ‘067’ … CSV

3.0 compared with scheirer1998/percival2014 (107 differences): ‘001’ ‘002’ ‘003’ ‘006’ ‘007’ ‘009’ ‘010’ ‘012’ ‘013’ ‘019’ ‘020’ … CSV

3.0 compared with schreiber2014/default (65 differences): ‘006’ ‘009’ ‘013’ ‘016’ ‘022’ ‘024’ ‘052’ ‘053’ ‘055’ ‘059’ ‘062’ … CSV

3.0 compared with schreiber2017/ismir2017 (56 differences): ‘006’ ‘009’ ‘013’ ‘016’ ‘024’ ‘025’ ‘055’ ‘058’ ‘059’ ‘061’ ‘067’ … CSV

3.0 compared with schreiber2017/mirex2017 (57 differences): ‘006’ ‘009’ ‘013’ ‘025’ ‘035’ ‘058’ ‘059’ ‘061’ ‘062’ ‘067’ ‘075’ … CSV

3.0 compared with schreiber2018/cnn (49 differences): ‘006’ ‘007’ ‘013’ ‘019’ ‘022’ ‘056’ ‘059’ ‘060’ ‘062’ ‘070’ ‘073’ … CSV

3.0 compared with schreiber2018/fcn (46 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘035’ ‘052’ ‘059’ ‘062’ ‘074’ ‘075’ ‘079’ … CSV

3.0 compared with schreiber2018/ismir2018 (44 differences): ‘006’ ‘007’ ‘013’ ‘022’ ‘025’ ‘047’ ‘053’ ‘057’ ‘059’ ‘062’ ‘075’ … CSV

3.0 compared with sun2021/default (40 differences): ‘006’ ‘013’ ‘022’ ‘025’ ‘056’ ‘059’ ‘062’ ‘072’ ‘073’ ‘075’ ‘079’ … CSV

3.0 compared with zplane/auftakt_v3 (64 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘022’ ‘025’ ‘047’ ‘055’ ‘057’ ‘059’ ‘062’ … CSV

None of the estimators estimated the following 4 items ‘correctly’ using Accuracy1: ‘006’ ‘013’ ‘059’ ‘137’ CSV
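
Items that no estimator gets right can be found by intersecting the per-estimator sets of differing items. The sketch below shows the idea on hypothetical data: the estimator names and item IDs are invented for illustration and are not the (truncated) lists printed above.

```python
def never_correct(differing_items):
    """Return the sorted item IDs that appear in every estimator's
    list of differing items.

    differing_items maps an estimator name to the IDs it got wrong.
    """
    id_sets = [set(ids) for ids in differing_items.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# Hypothetical estimators and item IDs, for illustration only.
differing = {
    "estimator_a": ["001", "004", "009"],
    "estimator_b": ["001", "002", "009"],
    "estimator_c": ["001", "009", "010"],
}
hard_items = never_correct(differing)
```

An empty result corresponds to the case where every track was estimated correctly by at least one system.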

Differing Items Accuracy2

Items with different tempo annotations (Accuracy2, 4% tolerance) in different versions:

1.0 compared with boeck2015/tempodetector2016_default (26 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘059’ ‘062’ ‘072’ ‘075’ ‘078’ ‘107’ ‘125’ … CSV

1.0 compared with boeck2019/multi_task (23 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘057’ ‘058’ ‘059’ ‘062’ ‘072’ ‘107’ ‘126’ … CSV

1.0 compared with boeck2019/multi_task_hjdb (25 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘024’ ‘057’ ‘058’ ‘059’ ‘062’ ‘072’ ‘075’ … CSV

1.0 compared with boeck2020/dar (24 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘059’ ‘062’ ‘072’ ‘075’ ‘107’ ‘122’ ‘126’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (38 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘024’ ‘035’ ‘037’ ‘059’ ‘062’ ‘075’ ‘091’ … CSV

1.0 compared with echonest/version_3_2_1 (32 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘059’ ‘062’ ‘094’ ‘103’ ‘106’ ‘107’ ‘121’ … CSV

1.0 compared with gkiokas2012/default (34 differences): ‘003’ ‘006’ ‘010’ ‘012’ ‘013’ ‘024’ ‘037’ ‘057’ ‘059’ ‘062’ ‘091’ … CSV

1.0 compared with klapuri2006/percival2014 (35 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘057’ ‘059’ ‘062’ ‘075’ ‘091’ ‘103’ ‘107’ … CSV

1.0 compared with oliveira2010/ibt (40 differences): ‘006’ ‘007’ ‘012’ ‘024’ ‘035’ ‘058’ ‘059’ ‘062’ ‘070’ ‘075’ ‘091’ … CSV

1.0 compared with percival2014/stem (29 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘024’ ‘057’ ‘059’ ‘062’ ‘075’ ‘107’ ‘123’ … CSV

1.0 compared with scheirer1998/percival2014 (77 differences): ‘001’ ‘002’ ‘003’ ‘007’ ‘009’ ‘010’ ‘012’ ‘013’ ‘020’ ‘024’ ‘043’ … CSV

1.0 compared with schreiber2014/default (29 differences): ‘006’ ‘007’ ‘012’ ‘024’ ‘037’ ‘052’ ‘059’ ‘062’ ‘072’ ‘075’ ‘091’ … CSV

1.0 compared with schreiber2017/ismir2017 (33 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘058’ ‘059’ ‘061’ ‘062’ ‘075’ ‘091’ ‘107’ … CSV

1.0 compared with schreiber2017/mirex2017 (31 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘058’ ‘059’ ‘062’ ‘075’ ‘091’ ‘107’ ‘124’ … CSV

1.0 compared with schreiber2018/cnn (34 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘024’ ‘037’ ‘057’ ‘059’ ‘062’ ‘091’ ‘107’ … CSV

1.0 compared with schreiber2018/fcn (30 differences): ‘006’ ‘007’ ‘010’ ‘012’ ‘013’ ‘024’ ‘037’ ‘052’ ‘062’ ‘078’ ‘107’ … CSV

1.0 compared with schreiber2018/ismir2018 (35 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘024’ ‘037’ ‘057’ ‘059’ ‘062’ ‘078’ ‘091’ … CSV

1.0 compared with sun2021/default (20 differences): ‘006’ ‘012’ ‘013’ ‘024’ ‘057’ ‘059’ ‘072’ ‘075’ ‘107’ ‘127’ ‘129’ … CSV

1.0 compared with zplane/auftakt_v3 (39 differences): ‘006’ ‘007’ ‘010’ ‘012’ ‘013’ ‘024’ ‘057’ ‘059’ ‘062’ ‘075’ ‘107’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (16 differences): ‘006’ ‘013’ ‘059’ ‘072’ ‘075’ ‘107’ ‘125’ ‘126’ ‘127’ ‘132’ ‘133’ … CSV

2.0 compared with boeck2019/multi_task (15 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘072’ ‘107’ ‘126’ ‘127’ ‘132’ ‘137’ ‘138’ … CSV

2.0 compared with boeck2019/multi_task_hjdb (15 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘072’ ‘075’ ‘107’ ‘127’ ‘132’ ‘134’ ‘137’ … CSV

2.0 compared with boeck2020/dar (12 differences): ‘006’ ‘059’ ‘072’ ‘075’ ‘122’ ‘126’ ‘133’ ‘137’ ‘139’ ‘140’ ‘150’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (30 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘059’ ‘075’ ‘091’ ‘122’ ‘123’ ‘124’ ‘126’ … CSV

2.0 compared with echonest/version_3_2_1 (26 differences): ‘006’ ‘013’ ‘024’ ‘059’ ‘094’ ‘103’ ‘106’ ‘107’ ‘122’ ‘123’ ‘124’ … CSV

2.0 compared with gkiokas2012/default (26 differences): ‘003’ ‘006’ ‘010’ ‘013’ ‘059’ ‘091’ ‘107’ ‘121’ ‘122’ ‘124’ ‘125’ … CSV

2.0 compared with klapuri2006/percival2014 (32 differences): ‘006’ ‘013’ ‘057’ ‘059’ ‘075’ ‘090’ ‘091’ ‘103’ ‘107’ ‘121’ ‘122’ … CSV

2.0 compared with oliveira2010/ibt (36 differences): ‘006’ ‘007’ ‘024’ ‘058’ ‘059’ ‘070’ ‘075’ ‘091’ ‘103’ ‘121’ ‘122’ … CSV

2.0 compared with percival2014/stem (21 differences): ‘006’ ‘007’ ‘013’ ‘057’ ‘059’ ‘075’ ‘123’ ‘124’ ‘125’ ‘126’ ‘127’ … CSV

2.0 compared with scheirer1998/percival2014 (70 differences): ‘001’ ‘002’ ‘003’ ‘006’ ‘007’ ‘009’ ‘010’ ‘012’ ‘013’ ‘020’ ‘043’ … CSV

2.0 compared with schreiber2014/default (26 differences): ‘006’ ‘007’ ‘024’ ‘052’ ‘059’ ‘072’ ‘075’ ‘084’ ‘091’ ‘123’ ‘124’ … CSV

2.0 compared with schreiber2017/ismir2017 (24 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘061’ ‘075’ ‘091’ ‘124’ ‘125’ ‘126’ ‘127’ … CSV

2.0 compared with schreiber2017/mirex2017 (23 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘075’ ‘091’ ‘124’ ‘125’ ‘126’ ‘127’ ‘128’ … CSV

2.0 compared with schreiber2018/cnn (28 differences): ‘006’ ‘007’ ‘013’ ‘059’ ‘091’ ‘107’ ‘121’ ‘122’ ‘123’ ‘124’ ‘125’ … CSV

2.0 compared with schreiber2018/fcn (24 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘052’ ‘107’ ‘121’ ‘124’ ‘125’ ‘126’ ‘127’ … CSV

2.0 compared with schreiber2018/ismir2018 (27 differences): ‘006’ ‘007’ ‘013’ ‘057’ ‘059’ ‘091’ ‘107’ ‘122’ ‘123’ ‘125’ ‘126’ … CSV

2.0 compared with sun2021/default (16 differences): ‘006’ ‘013’ ‘057’ ‘059’ ‘072’ ‘075’ ‘127’ ‘137’ ‘138’ ‘139’ ‘140’ … CSV

2.0 compared with zplane/auftakt_v3 (34 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘057’ ‘059’ ‘075’ ‘107’ ‘121’ ‘122’ ‘123’ … CSV

3.0 compared with boeck2015/tempodetector2016_default (14 differences): ‘006’ ‘013’ ‘059’ ‘072’ ‘075’ ‘107’ ‘126’ ‘127’ ‘133’ ‘137’ ‘138’ … CSV

3.0 compared with boeck2019/multi_task (16 differences): ‘006’ ‘058’ ‘059’ ‘072’ ‘107’ ‘122’ ‘126’ ‘127’ ‘132’ ‘133’ ‘137’ … CSV

3.0 compared with boeck2019/multi_task_hjdb (16 differences): ‘006’ ‘058’ ‘059’ ‘072’ ‘075’ ‘107’ ‘122’ ‘127’ ‘132’ ‘133’ ‘134’ … CSV

3.0 compared with boeck2020/dar (13 differences): ‘006’ ‘059’ ‘072’ ‘075’ ‘107’ ‘122’ ‘126’ ‘133’ ‘137’ ‘139’ ‘140’ … CSV

3.0 compared with davies2009/mirex_qm_tempotracker (32 differences): ‘006’ ‘007’ ‘012’ ‘013’ ‘059’ ‘075’ ‘091’ ‘122’ ‘123’ ‘124’ ‘125’ … CSV

3.0 compared with echonest/version_3_2_1 (27 differences): ‘006’ ‘007’ ‘013’ ‘024’ ‘059’ ‘094’ ‘103’ ‘106’ ‘107’ ‘122’ ‘123’ … CSV

3.0 compared with gkiokas2012/default (27 differences): ‘003’ ‘006’ ‘010’ ‘013’ ‘059’ ‘091’ ‘107’ ‘121’ ‘122’ ‘124’ ‘125’ … CSV

3.0 compared with klapuri2006/percival2014 (32 differences): ‘006’ ‘013’ ‘057’ ‘059’ ‘075’ ‘090’ ‘091’ ‘103’ ‘107’ ‘121’ ‘122’ … CSV

3.0 compared with oliveira2010/ibt (36 differences): ‘006’ ‘007’ ‘024’ ‘058’ ‘059’ ‘070’ ‘075’ ‘091’ ‘103’ ‘121’ ‘122’ … CSV

3.0 compared with percival2014/stem (24 differences): ‘006’ ‘007’ ‘013’ ‘059’ ‘075’ ‘107’ ‘121’ ‘122’ ‘123’ ‘124’ ‘125’ … CSV

3.0 compared with scheirer1998/percival2014 (69 differences): ‘001’ ‘002’ ‘003’ ‘006’ ‘007’ ‘009’ ‘010’ ‘012’ ‘020’ ‘043’ ‘058’ … CSV

3.0 compared with schreiber2014/default (26 differences): ‘006’ ‘024’ ‘052’ ‘059’ ‘072’ ‘075’ ‘084’ ‘091’ ‘107’ ‘122’ ‘123’ … CSV

3.0 compared with schreiber2017/ismir2017 (24 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘061’ ‘075’ ‘091’ ‘107’ ‘124’ ‘125’ ‘126’ … CSV

3.0 compared with schreiber2017/mirex2017 (24 differences): ‘006’ ‘013’ ‘058’ ‘059’ ‘075’ ‘091’ ‘107’ ‘124’ ‘125’ ‘126’ ‘127’ … CSV

3.0 compared with schreiber2018/cnn (29 differences): ‘006’ ‘007’ ‘013’ ‘059’ ‘091’ ‘107’ ‘121’ ‘122’ ‘123’ ‘124’ ‘125’ … CSV

3.0 compared with schreiber2018/fcn (24 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘052’ ‘107’ ‘121’ ‘124’ ‘126’ ‘127’ ‘129’ … CSV

3.0 compared with schreiber2018/ismir2018 (28 differences): ‘006’ ‘007’ ‘013’ ‘057’ ‘059’ ‘091’ ‘107’ ‘122’ ‘123’ ‘125’ ‘126’ … CSV

3.0 compared with sun2021/default (16 differences): ‘006’ ‘013’ ‘059’ ‘072’ ‘075’ ‘121’ ‘127’ ‘133’ ‘137’ ‘138’ ‘139’ … CSV

3.0 compared with zplane/auftakt_v3 (35 differences): ‘006’ ‘007’ ‘010’ ‘013’ ‘057’ ‘059’ ‘075’ ‘107’ ‘112’ ‘121’ ‘122’ … CSV

All tracks were estimated ‘correctly’ by at least one system.

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.8318 0.8450 1.0000 0.0026 0.0000 0.0000 0.0029 0.0064 0.0003 0.0000 0.0013 0.0033 0.0201 0.2110 0.2430 0.2478 1.0000 0.0003
boeck2019/multi_task 0.8318 1.0000 1.0000 0.6776 0.0076 0.0002 0.0000 0.0079 0.0139 0.0013 0.0000 0.0078 0.0167 0.0596 0.3368 0.4408 0.4731 1.0000 0.0009
boeck2019/multi_task_hjdb 0.8450 1.0000 1.0000 0.6636 0.0060 0.0003 0.0000 0.0064 0.0113 0.0013 0.0000 0.0078 0.0201 0.0660 0.3105 0.4408 0.4583 1.0000 0.0007
boeck2020/dar 1.0000 0.6776 0.6636 1.0000 0.0037 0.0000 0.0000 0.0025 0.0054 0.0003 0.0000 0.0022 0.0039 0.0139 0.1214 0.1996 0.1849 0.8388 0.0002
davies2009/mirex_qm_tempotracker 0.0026 0.0076 0.0060 0.0037 1.0000 0.1337 0.0115 1.0000 1.0000 0.5114 0.0000 0.7838 0.8714 0.6177 0.1214 0.1173 0.0227 0.0175 0.4049
echonest/version_3_2_1 0.0000 0.0002 0.0003 0.0000 0.1337 1.0000 0.5114 0.1352 0.1048 0.3489 0.0000 0.2221 0.0436 0.0226 0.0038 0.0008 0.0009 0.0002 0.3916
gkiokas2012/default 0.0000 0.0000 0.0000 0.0000 0.0115 0.5114 1.0000 0.0139 0.0114 0.0730 0.0001 0.0488 0.0079 0.0031 0.0001 0.0000 0.0000 0.0000 0.0807
klapuri2006/percival2014 0.0029 0.0079 0.0064 0.0025 1.0000 0.1352 0.0139 1.0000 0.8145 0.5413 0.0000 0.8804 0.7428 0.5224 0.0895 0.0807 0.0146 0.0145 0.5413
oliveira2010/ibt 0.0064 0.0139 0.0113 0.0054 1.0000 0.1048 0.0114 0.8145 1.0000 0.3915 0.0000 0.6885 1.0000 0.7428 0.1755 0.1641 0.0522 0.0270 0.3269
percival2014/stem 0.0003 0.0013 0.0013 0.0003 0.5114 0.3489 0.0730 0.5413 0.3915 1.0000 0.0000 0.8714 0.2962 0.1877 0.0357 0.0195 0.0060 0.0018 1.0000
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0013 0.0078 0.0078 0.0022 0.7838 0.2221 0.0488 0.8804 0.6885 0.8714 0.0000 1.0000 0.5224 0.3489 0.0854 0.0595 0.0444 0.0065 0.8776
schreiber2017/ismir2017 0.0033 0.0167 0.0201 0.0039 0.8714 0.0436 0.0079 0.7428 1.0000 0.2962 0.0000 0.5224 1.0000 0.8036 0.2800 0.1996 0.1214 0.0259 0.3240
schreiber2017/mirex2017 0.0201 0.0596 0.0660 0.0139 0.6177 0.0226 0.0031 0.5224 0.7428 0.1877 0.0000 0.3489 0.8036 1.0000 0.4408 0.3368 0.2559 0.0649 0.1755
schreiber2018/cnn 0.2110 0.3368 0.3105 0.1214 0.1214 0.0038 0.0001 0.0895 0.1755 0.0357 0.0000 0.0854 0.2800 0.4408 1.0000 1.0000 0.8388 0.2682 0.0315
schreiber2018/fcn 0.2430 0.4408 0.4408 0.1996 0.1173 0.0008 0.0000 0.0807 0.1641 0.0195 0.0000 0.0595 0.1996 0.3368 1.0000 1.0000 1.0000 0.3489 0.0195
schreiber2018/ismir2018 0.2478 0.4731 0.4583 0.1849 0.0227 0.0009 0.0000 0.0146 0.0522 0.0060 0.0000 0.0444 0.1214 0.2559 0.8388 1.0000 1.0000 0.4177 0.0033
sun2021/default 1.0000 1.0000 1.0000 0.8388 0.0175 0.0002 0.0000 0.0145 0.0270 0.0018 0.0000 0.0065 0.0259 0.0649 0.2682 0.3489 0.4177 1.0000 0.0018
zplane/auftakt_v3 0.0003 0.0009 0.0007 0.0002 0.4049 0.3916 0.0807 0.5413 0.3269 1.0000 0.0000 0.8776 0.3240 0.1755 0.0315 0.0195 0.0033 0.0018 1.0000

Table 6: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
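As a rough illustration of how a single cell in these matrices could be obtained, the following sketch computes a two-sided McNemar p-value for one pair of estimators from their per-track Accuracy1 outcomes. It assumes the exact (binomial) variant of the test. The report does not state which variant was used, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired boolean outcomes.

    correct_a[i] / correct_b[i] say whether estimator A / B was counted
    as correct on track i. Assumption: exact binomial variant.
    """
    # Only discordant pairs (one estimator right, the other wrong) matter.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence against H0.
        return 1.0
    k = min(b, c)
    # Under H0 the discordant pairs split like a fair coin: Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up outcomes for five tracks: three favour A, one favours B.
a = [True, True, True, True, False]
b = [False, False, False, True, True]
p_value = mcnemar_exact(a, b)
```

Tracks on which both estimators agree do not influence the result, which is why the diagonal of each matrix is 1.0000.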

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.3075 0.5716 1.0000 0.0016 0.0000 0.0000 0.0001 0.0004 0.0000 0.0000 0.0000 0.0001 0.0006 0.0237 0.0596 0.0708 0.3915 0.0000
boeck2019/multi_task 0.3075 1.0000 0.7905 0.3449 0.0226 0.0001 0.0001 0.0031 0.0079 0.0005 0.0000 0.0001 0.0060 0.0161 0.2110 0.3713 0.4996 1.0000 0.0005
boeck2019/multi_task_hjdb 0.5716 0.7905 1.0000 0.5572 0.0064 0.0000 0.0000 0.0008 0.0029 0.0002 0.0000 0.0001 0.0034 0.0066 0.1081 0.2327 0.2962 0.8601 0.0001
boeck2020/dar 1.0000 0.3449 0.5572 1.0000 0.0026 0.0000 0.0000 0.0002 0.0009 0.0000 0.0000 0.0000 0.0006 0.0008 0.0167 0.0596 0.0801 0.3449 0.0000
davies2009/mirex_qm_tempotracker 0.0016 0.0226 0.0064 0.0026 1.0000 0.0722 0.0161 0.3833 0.8036 0.1877 0.0000 0.1480 1.0000 1.0000 0.3240 0.2430 0.0614 0.0440 0.1221
echonest/version_3_2_1 0.0000 0.0001 0.0000 0.0000 0.0722 1.0000 0.8776 0.1934 0.1048 0.4408 0.0000 0.7709 0.0488 0.0595 0.0046 0.0009 0.0007 0.0002 0.4799
gkiokas2012/default 0.0000 0.0001 0.0000 0.0000 0.0161 0.8776 1.0000 0.0730 0.0357 0.2682 0.0000 0.5682 0.0328 0.0293 0.0005 0.0003 0.0001 0.0001 0.2912
klapuri2006/percival2014 0.0001 0.0031 0.0008 0.0002 0.3833 0.1934 0.0730 1.0000 0.6476 0.5413 0.0000 0.3817 0.6358 0.6358 0.0652 0.0436 0.0037 0.0099 0.5572
oliveira2010/ibt 0.0004 0.0079 0.0029 0.0009 0.8036 0.1048 0.0357 0.6476 1.0000 0.3240 0.0000 0.2370 1.0000 1.0000 0.1628 0.1263 0.0241 0.0247 0.2478
percival2014/stem 0.0000 0.0005 0.0002 0.0000 0.1877 0.4408 0.2682 0.5413 0.3240 1.0000 0.0000 0.7493 0.2295 0.2559 0.0226 0.0096 0.0022 0.0013 1.0000
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0001 0.0001 0.0000 0.1480 0.7709 0.5682 0.3817 0.2370 0.7493 0.0000 1.0000 0.1081 0.1173 0.0110 0.0055 0.0022 0.0004 0.7608
schreiber2017/ismir2017 0.0001 0.0060 0.0034 0.0006 1.0000 0.0488 0.0328 0.6358 1.0000 0.2295 0.0000 0.1081 1.0000 1.0000 0.2800 0.1641 0.0652 0.0241 0.2800
schreiber2017/mirex2017 0.0006 0.0161 0.0066 0.0008 1.0000 0.0595 0.0293 0.6358 1.0000 0.2559 0.0000 0.1173 1.0000 1.0000 0.2682 0.1433 0.0730 0.0270 0.2430
schreiber2018/cnn 0.0237 0.2110 0.1081 0.0167 0.3240 0.0046 0.0005 0.0652 0.1628 0.0226 0.0000 0.0110 0.2800 0.2682 1.0000 0.8450 0.5413 0.2221 0.0195
schreiber2018/fcn 0.0596 0.3713 0.2327 0.0596 0.2430 0.0009 0.0003 0.0436 0.1263 0.0096 0.0000 0.0055 0.1641 0.1433 0.8450 1.0000 0.8450 0.3604 0.0079
schreiber2018/ismir2018 0.0708 0.4996 0.2962 0.0801 0.0614 0.0007 0.0001 0.0037 0.0241 0.0022 0.0000 0.0022 0.0652 0.0730 0.5413 0.8450 1.0000 0.5224 0.0005
sun2021/default 0.3915 1.0000 0.8601 0.3449 0.0440 0.0002 0.0001 0.0099 0.0247 0.0013 0.0000 0.0004 0.0241 0.0270 0.2221 0.3604 0.5224 1.0000 0.0010
zplane/auftakt_v3 0.0000 0.0005 0.0001 0.0000 0.1221 0.4799 0.2912 0.5572 0.2478 1.0000 0.0000 0.7608 0.2800 0.2430 0.0195 0.0079 0.0005 0.0010 1.0000

Table 7: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.0639 0.1849 0.8506 0.0007 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0001 0.0079 0.0275 0.0351 0.1496 0.0000
boeck2019/multi_task 0.0639 1.0000 0.7905 0.1686 0.0436 0.0001 0.0001 0.0066 0.0166 0.0011 0.0000 0.0007 0.0167 0.0195 0.2682 0.5424 0.7283 1.0000 0.0011
boeck2019/multi_task_hjdb 0.1849 0.7905 1.0000 0.3075 0.0139 0.0000 0.0000 0.0019 0.0064 0.0005 0.0000 0.0004 0.0095 0.0079 0.1325 0.3604 0.4731 1.0000 0.0002
boeck2020/dar 0.8506 0.1686 0.3075 1.0000 0.0026 0.0000 0.0000 0.0002 0.0009 0.0000 0.0000 0.0000 0.0006 0.0004 0.0113 0.0596 0.0801 0.2478 0.0000
davies2009/mirex_qm_tempotracker 0.0007 0.0436 0.0139 0.0026 1.0000 0.0519 0.0096 0.3833 0.8036 0.1877 0.0000 0.2288 1.0000 0.8746 0.4050 0.2430 0.0614 0.0534 0.1221
echonest/version_3_2_1 0.0000 0.0001 0.0000 0.0000 0.0519 1.0000 0.8746 0.1439 0.0759 0.3489 0.0000 0.4614 0.0357 0.0595 0.0038 0.0005 0.0004 0.0002 0.3916
gkiokas2012/default 0.0000 0.0001 0.0000 0.0000 0.0096 0.8746 1.0000 0.0470 0.0226 0.1877 0.0000 0.3020 0.0213 0.0259 0.0005 0.0001 0.0001 0.0001 0.2221
klapuri2006/percival2014 0.0000 0.0066 0.0019 0.0002 0.3833 0.1439 0.0470 1.0000 0.6476 0.5413 0.0000 0.5515 0.6358 0.7493 0.0895 0.0436 0.0037 0.0119 0.5572
oliveira2010/ibt 0.0002 0.0166 0.0064 0.0009 0.8036 0.0759 0.0226 0.6476 1.0000 0.3105 0.0000 0.3497 1.0000 1.0000 0.2153 0.1263 0.0241 0.0270 0.2478
percival2014/stem 0.0000 0.0011 0.0005 0.0000 0.1877 0.3489 0.1877 0.5413 0.3105 1.0000 0.0000 1.0000 0.2295 0.3105 0.0315 0.0079 0.0022 0.0018 1.0000
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0007 0.0004 0.0000 0.2288 0.4614 0.3020 0.5515 0.3497 1.0000 0.0000 1.0000 0.1877 0.2430 0.0328 0.0110 0.0046 0.0019 1.0000
schreiber2017/ismir2017 0.0000 0.0167 0.0095 0.0006 1.0000 0.0357 0.0213 0.6358 1.0000 0.2295 0.0000 0.1877 1.0000 1.0000 0.3604 0.1641 0.0652 0.0328 0.2800
schreiber2017/mirex2017 0.0001 0.0195 0.0079 0.0004 0.8746 0.0595 0.0259 0.7493 1.0000 0.3105 0.0000 0.2430 1.0000 1.0000 0.2559 0.0989 0.0470 0.0213 0.3105
schreiber2018/cnn 0.0079 0.2682 0.1325 0.0113 0.4050 0.0038 0.0005 0.0895 0.2153 0.0315 0.0000 0.0328 0.3604 0.2559 1.0000 0.6900 0.4049 0.1996 0.0275
schreiber2018/fcn 0.0275 0.5424 0.3604 0.0596 0.2430 0.0005 0.0001 0.0436 0.1263 0.0079 0.0000 0.0110 0.1641 0.0989 0.6900 1.0000 0.8450 0.4177 0.0079
schreiber2018/ismir2018 0.0351 0.7283 0.4731 0.0801 0.0614 0.0004 0.0001 0.0037 0.0241 0.0022 0.0000 0.0046 0.0652 0.0470 0.4049 0.8450 1.0000 0.6177 0.0005
sun2021/default 0.1496 1.0000 1.0000 0.2478 0.0534 0.0002 0.0001 0.0119 0.0270 0.0018 0.0000 0.0019 0.0328 0.0213 0.1996 0.4177 0.6177 1.0000 0.0015
zplane/auftakt_v3 0.0000 0.0011 0.0002 0.0000 0.1221 0.3916 0.2221 0.5572 0.2478 1.0000 0.0000 1.0000 0.2800 0.3105 0.0275 0.0079 0.0005 0.0015 1.0000

Table 8: McNemar p-values, using reference annotations 3.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5078 1.0000 0.6250 0.0118 0.2101 0.0963 0.0636 0.0066 0.5488 0.0000 0.6072 0.0923 0.2668 0.1153 0.4545 0.0352 0.1094 0.0106
boeck2019/multi_task 0.5078 1.0000 0.6250 1.0000 0.0026 0.0490 0.0127 0.0169 0.0009 0.1460 0.0000 0.2632 0.0129 0.0574 0.0266 0.1892 0.0042 0.5078 0.0015
boeck2019/multi_task_hjdb 1.0000 0.6250 1.0000 1.0000 0.0072 0.1892 0.0784 0.0525 0.0041 0.3437 0.0000 0.4807 0.0768 0.2101 0.0784 0.3833 0.0309 0.1797 0.0066
boeck2020/dar 0.6250 1.0000 1.0000 1.0000 0.0013 0.0574 0.0309 0.0192 0.0015 0.2266 0.0000 0.3018 0.0225 0.0923 0.0309 0.2632 0.0127 0.2891 0.0015
davies2009/mirex_qm_tempotracker 0.0118 0.0026 0.0072 0.0013 1.0000 0.2863 0.4807 0.6476 0.8036 0.0225 0.0000 0.0636 0.3323 0.1435 0.4545 0.1338 0.6072 0.0003 1.0000
echonest/version_3_2_1 0.2101 0.0490 0.1892 0.0574 0.2863 1.0000 0.8145 0.6072 0.1153 0.6072 0.0000 0.6776 1.0000 1.0000 0.8145 0.8145 0.6291 0.0118 0.1671
gkiokas2012/default 0.0963 0.0127 0.0784 0.0309 0.4807 0.8145 1.0000 1.0000 0.2863 0.3323 0.0000 0.4049 1.0000 0.6072 1.0000 0.4807 1.0000 0.0043 0.3323
klapuri2006/percival2014 0.0636 0.0169 0.0525 0.0192 0.6476 0.6072 1.0000 1.0000 0.3323 0.2632 0.0000 0.3269 0.8036 0.4545 1.0000 0.3593 1.0000 0.0015 0.5235
oliveira2010/ibt 0.0066 0.0009 0.0041 0.0015 0.8036 0.1153 0.2863 0.3323 1.0000 0.0192 0.0000 0.0522 0.1671 0.0636 0.2632 0.0872 0.3593 0.0005 1.0000
percival2014/stem 0.5488 0.1460 0.3437 0.2266 0.0225 0.6072 0.3323 0.2632 0.0192 1.0000 0.0000 1.0000 0.3877 0.7744 0.3018 1.0000 0.1094 0.0225 0.0129
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.6072 0.2632 0.4807 0.3018 0.0636 0.6776 0.4049 0.3269 0.0522 1.0000 0.0000 1.0000 0.5034 0.8238 0.3593 1.0000 0.2632 0.0490 0.0755
schreiber2017/ismir2017 0.0923 0.0129 0.0768 0.0225 0.3323 1.0000 1.0000 0.8036 0.1671 0.3877 0.0000 0.5034 1.0000 0.5000 1.0000 0.6636 0.8145 0.0044 0.2101
schreiber2017/mirex2017 0.2668 0.0574 0.2101 0.0923 0.1435 1.0000 0.6072 0.4545 0.0636 0.7744 0.0000 0.8238 0.5000 1.0000 0.6291 1.0000 0.4807 0.0192 0.0768
schreiber2018/cnn 0.1153 0.0266 0.0784 0.0309 0.4545 0.8145 1.0000 1.0000 0.2632 0.3018 0.0000 0.3593 1.0000 0.6291 1.0000 0.4807 1.0000 0.0043 0.3323
schreiber2018/fcn 0.4545 0.1892 0.3833 0.2632 0.1338 0.8145 0.4807 0.3593 0.0872 1.0000 0.0000 1.0000 0.6636 1.0000 0.4807 1.0000 0.3323 0.0414 0.0931
schreiber2018/ismir2018 0.0352 0.0042 0.0309 0.0127 0.6072 0.6291 1.0000 1.0000 0.3593 0.1094 0.0000 0.2632 0.8145 0.4807 1.0000 0.3323 1.0000 0.0007 0.5034
sun2021/default 0.1094 0.5078 0.1797 0.2891 0.0003 0.0118 0.0043 0.0015 0.0005 0.0225 0.0000 0.0490 0.0044 0.0192 0.0043 0.0414 0.0007 1.0000 0.0003
zplane/auftakt_v3 0.0106 0.0015 0.0066 0.0015 1.0000 0.1671 0.3323 0.5235 1.0000 0.0129 0.0000 0.0755 0.2101 0.0768 0.3323 0.0931 0.5034 0.0003 1.0000

Table 9: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 1.0000 1.0000 0.2891 0.0094 0.0309 0.0213 0.0015 0.0001 0.3018 0.0000 0.0309 0.0386 0.0654 0.0042 0.0963 0.0127 1.0000 0.0003
boeck2019/multi_task 1.0000 1.0000 1.0000 0.5488 0.0059 0.0192 0.0074 0.0023 0.0000 0.2379 0.0000 0.0266 0.0225 0.0386 0.0044 0.0636 0.0075 1.0000 0.0003
boeck2019/multi_task_hjdb 1.0000 1.0000 1.0000 0.5488 0.0059 0.0266 0.0192 0.0015 0.0001 0.2379 0.0000 0.0266 0.0352 0.0574 0.0072 0.0784 0.0169 1.0000 0.0005
boeck2020/dar 0.2891 0.5488 0.5488 1.0000 0.0003 0.0043 0.0043 0.0001 0.0000 0.0636 0.0000 0.0013 0.0042 0.0074 0.0004 0.0169 0.0007 0.3437 0.0000
davies2009/mirex_qm_tempotracker 0.0094 0.0059 0.0059 0.0003 1.0000 0.5413 0.5235 0.8318 0.2379 0.0352 0.0000 0.5572 0.2863 0.1892 0.8145 0.3075 0.6476 0.0094 0.5413
echonest/version_3_2_1 0.0309 0.0192 0.0266 0.0043 0.5413 1.0000 1.0000 0.2379 0.0309 0.3323 0.0000 1.0000 0.8238 0.6476 0.8145 0.8318 1.0000 0.0755 0.1338
gkiokas2012/default 0.0213 0.0074 0.0192 0.0043 0.5235 1.0000 1.0000 0.2632 0.0414 0.3593 0.0000 1.0000 0.7905 0.5811 0.7905 0.8145 1.0000 0.0639 0.0963
klapuri2006/percival2014 0.0015 0.0023 0.0015 0.0001 0.8318 0.2379 0.2632 1.0000 0.4807 0.0347 0.0000 0.3915 0.1516 0.0931 0.4807 0.1338 0.4049 0.0025 0.8318
oliveira2010/ibt 0.0001 0.0000 0.0001 0.0000 0.2379 0.0309 0.0414 0.4807 1.0000 0.0007 0.0000 0.0755 0.0118 0.0044 0.0768 0.0227 0.0636 0.0005 0.8238
percival2014/stem 0.3018 0.2379 0.2379 0.0636 0.0352 0.3323 0.3593 0.0347 0.0007 1.0000 0.0000 0.3833 0.6072 0.7905 0.1185 0.6476 0.1796 0.3593 0.0010
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0309 0.0266 0.0266 0.0013 0.5572 1.0000 1.0000 0.3915 0.0755 0.3833 0.0000 1.0000 0.8238 0.6476 0.8318 0.8450 1.0000 0.0309 0.1686
schreiber2017/ismir2017 0.0386 0.0225 0.0352 0.0042 0.2863 0.8238 0.7905 0.1516 0.0118 0.6072 0.0000 0.8238 1.0000 1.0000 0.4545 1.0000 0.6291 0.0963 0.0414
schreiber2017/mirex2017 0.0654 0.0386 0.0574 0.0074 0.1892 0.6476 0.5811 0.0931 0.0044 0.7905 0.0000 0.6476 1.0000 1.0000 0.3018 1.0000 0.4545 0.1435 0.0192
schreiber2018/cnn 0.0042 0.0044 0.0072 0.0004 0.8145 0.8145 0.7905 0.4807 0.0768 0.1185 0.0000 0.8318 0.4545 0.3018 1.0000 0.4240 1.0000 0.0227 0.2101
schreiber2018/fcn 0.0963 0.0636 0.0784 0.0169 0.3075 0.8318 0.8145 0.1338 0.0227 0.6476 0.0000 0.8450 1.0000 1.0000 0.4240 1.0000 0.6476 0.1153 0.0414
schreiber2018/ismir2018 0.0127 0.0075 0.0169 0.0007 0.6476 1.0000 1.0000 0.4049 0.0636 0.1796 0.0000 1.0000 0.6291 0.4545 1.0000 0.6476 1.0000 0.0266 0.1435
sun2021/default 1.0000 1.0000 1.0000 0.3437 0.0094 0.0755 0.0639 0.0025 0.0005 0.3593 0.0000 0.0309 0.0963 0.1435 0.0227 0.1153 0.0266 1.0000 0.0009
zplane/auftakt_v3 0.0003 0.0003 0.0005 0.0000 0.5413 0.1338 0.0963 0.8318 0.8238 0.0010 0.0000 0.1686 0.0414 0.0192 0.2101 0.0414 0.1435 0.0009 1.0000

Table 10: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.6875 0.6875 1.0000 0.0003 0.0072 0.0044 0.0003 0.0000 0.0213 0.0000 0.0018 0.0129 0.0063 0.0007 0.0309 0.0026 0.6875 0.0000
boeck2019/multi_task 0.6875 1.0000 1.0000 0.4531 0.0025 0.0192 0.0127 0.0037 0.0000 0.0963 0.0000 0.0129 0.0574 0.0386 0.0044 0.1338 0.0075 1.0000 0.0002
boeck2019/multi_task_hjdb 0.6875 1.0000 1.0000 0.4531 0.0025 0.0266 0.0266 0.0025 0.0001 0.0963 0.0000 0.0129 0.0768 0.0574 0.0072 0.1516 0.0169 1.0000 0.0003
boeck2020/dar 1.0000 0.4531 0.4531 1.0000 0.0002 0.0043 0.0043 0.0002 0.0000 0.0127 0.0000 0.0002 0.0127 0.0074 0.0004 0.0266 0.0007 0.5078 0.0000
davies2009/mirex_qm_tempotracker 0.0003 0.0025 0.0025 0.0002 1.0000 0.3833 0.3593 1.0000 0.4240 0.0574 0.0000 0.2863 0.1153 0.0963 0.5811 0.1338 0.4545 0.0015 0.6476
echonest/version_3_2_1 0.0072 0.0192 0.0266 0.0043 0.3833 1.0000 1.0000 0.3593 0.0490 0.5811 0.0000 1.0000 0.6636 0.6476 0.8036 0.6636 1.0000 0.0347 0.1153
gkiokas2012/default 0.0044 0.0127 0.0266 0.0043 0.3593 1.0000 1.0000 0.3593 0.0636 0.6291 0.0000 1.0000 0.5811 0.5811 0.7905 0.6291 1.0000 0.0266 0.0963
klapuri2006/percival2014 0.0003 0.0037 0.0025 0.0002 1.0000 0.3593 0.3593 1.0000 0.4807 0.1153 0.0000 0.3449 0.1516 0.1338 0.6291 0.1338 0.5235 0.0015 0.6776
oliveira2010/ibt 0.0000 0.0000 0.0001 0.0000 0.4240 0.0490 0.0636 0.4807 1.0000 0.0042 0.0000 0.0639 0.0169 0.0118 0.1185 0.0227 0.0963 0.0002 1.0000
percival2014/stem 0.0213 0.0963 0.0963 0.0127 0.0574 0.5811 0.6291 0.1153 0.0042 1.0000 0.0000 0.8238 1.0000 1.0000 0.2266 1.0000 0.4240 0.1153 0.0010
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0018 0.0129 0.0129 0.0002 0.2863 1.0000 1.0000 0.3449 0.0639 0.8238 0.0000 1.0000 0.8145 0.8036 0.6476 0.8388 0.8238 0.0309 0.1078
schreiber2017/ismir2017 0.0129 0.0574 0.0768 0.0127 0.1153 0.6636 0.5811 0.1516 0.0169 1.0000 0.0000 0.8145 1.0000 1.0000 0.3323 1.0000 0.4807 0.0963 0.0192
schreiber2017/mirex2017 0.0063 0.0386 0.0574 0.0074 0.0963 0.6476 0.5811 0.1338 0.0118 1.0000 0.0000 0.8036 1.0000 1.0000 0.3018 1.0000 0.4545 0.0768 0.0127
schreiber2018/cnn 0.0007 0.0044 0.0072 0.0004 0.5811 0.8036 0.7905 0.6291 0.1185 0.2266 0.0000 0.6476 0.3323 0.3018 1.0000 0.2668 1.0000 0.0044 0.1796
schreiber2018/fcn 0.0309 0.1338 0.1516 0.0266 0.1338 0.6636 0.6291 0.1338 0.0227 1.0000 0.0000 0.8388 1.0000 1.0000 0.2668 1.0000 0.5034 0.0963 0.0192
schreiber2018/ismir2018 0.0026 0.0075 0.0169 0.0007 0.4545 1.0000 1.0000 0.5235 0.0963 0.4240 0.0000 0.8238 0.4807 0.4545 1.0000 0.5034 1.0000 0.0118 0.1435
sun2021/default 0.6875 1.0000 1.0000 0.5078 0.0015 0.0347 0.0266 0.0015 0.0002 0.1153 0.0000 0.0309 0.0963 0.0768 0.0044 0.0963 0.0118 1.0000 0.0002
zplane/auftakt_v3 0.0000 0.0002 0.0003 0.0000 0.6476 0.1153 0.0963 0.6776 1.0000 0.0010 0.0000 0.1078 0.0192 0.0127 0.1796 0.0192 0.1435 0.0002 1.0000

Table 11: McNemar p-values, using reference annotations 3.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in their disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Accuracy1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?
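The subset selection can be sketched as follows. The sketch assumes that cvar is the coefficient of variation of a track's inter-beat intervals (standard deviation divided by mean) and that the population standard deviation is used. Neither detail is spelled out in this section. The helper names `cvar` and `stable_subset` are hypothetical.

```python
import statistics


def cvar(beat_times):
    """Coefficient of variation of inter-beat intervals (assumed definition)."""
    ibis = [later - earlier for earlier, later in zip(beat_times, beat_times[1:])]
    # Assumption: population standard deviation, normalized by the mean IBI.
    return statistics.pstdev(ibis) / statistics.mean(ibis)


def stable_subset(tracks, tau):
    """Return ids of tracks whose cvar is below tau.

    tracks maps a track id to a list of beat times in seconds.
    Tracks with fewer than three beats are skipped.
    """
    return [
        track_id
        for track_id, beats in tracks.items()
        if len(beats) >= 3 and cvar(beats) < tau
    ]


# Made-up beat annotations: one perfectly steady track, one uneven track.
tracks = {
    "steady": [0.0, 0.5, 1.0, 1.5, 2.0],
    "uneven": [0.0, 0.4, 1.0, 1.4, 2.0],
}
selected = stable_subset(tracks, tau=0.1)
```

Mean Accuracy1 would then be computed only over the selected tracks, once for each value of τ on the horizontal axis.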

Accuracy1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 11: Mean Accuracy1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 12: Mean Accuracy1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 13: Mean Accuracy1 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

Accuracy2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 14: Mean Accuracy2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 15: Mean Accuracy2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 16: Mean Accuracy2 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on Tempo-Subsets

How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean Accuracy1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
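A minimal sketch of the windowed mean described above. It assumes the usual Accuracy1 criterion from [Gouyon2006], under which an estimate counts as correct if it lies within 4% of the reference. The tolerance is not restated in this section, and the function names are hypothetical. The subset size is returned alongside the mean because some windows contain very few tracks.

```python
def accuracy1(estimate, reference, tolerance=0.04):
    """True if the estimate is within the relative tolerance of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def mean_accuracy1_around(pairs, center, half_width=10.0):
    """Mean Accuracy1 over tracks whose reference tempo is in [T-10, T+10].

    pairs is a list of (reference_bpm, estimate_bpm) tuples.
    Returns (mean, subset_size); mean is None for an empty subset.
    """
    subset = [
        (ref, est)
        for ref, est in pairs
        if center - half_width <= ref <= center + half_width
    ]
    if not subset:
        return None, 0
    hits = sum(accuracy1(est, ref) for ref, est in subset)
    return hits / len(subset), len(subset)


# Made-up (reference, estimate) pairs in BPM.
pairs = [(100, 101), (105, 52.5), (95, 95), (140, 140)]
mean_at_100, n_at_100 = mean_accuracy1_around(pairs, center=100)
```

Here the window around T = 100 contains three tracks, one of which is a half-tempo estimate and therefore counts as wrong under Accuracy1.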

Accuracy1 on Tempo-Subsets for 1.0

Figure 17: Mean Accuracy1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on Tempo-Subsets for 2.0

Figure 18: Mean Accuracy1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 on Tempo-Subsets for 3.0

Figure 19: Mean Accuracy1 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets

How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean Accuracy2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Accuracy2 on Tempo-Subsets for 1.0

Figure 20: Mean Accuracy2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets for 2.0

Figure 21: Mean Accuracy2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets for 3.0

Figure 22: Mean Accuracy2 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo

When fitting a generalized additive model (GAM) to Accuracy1-values and a ground truth, what Accuracy1 can we expect with confidence?

Estimated Accuracy1 for Tempo for 1.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 1.0.

Figure 23: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo for 2.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 2.0.

Figure 24: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo for 3.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 3.0.

Figure 25: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo

When fitting a generalized additive model (GAM) to Accuracy2-values and a ground truth, what Accuracy2 can we expect with confidence?

Estimated Accuracy2 for Tempo for 1.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 1.0.

Figure 26: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo for 2.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 2.0.

Figure 27: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo for 3.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 3.0.

Figure 28: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 for ‘tag_open’ Tags

How well does an estimator perform, when only taking tracks into account that are tagged with some kind of label? Note that some values may be based on very few estimates.

Accuracy1 for ‘tag_open’ Tags for 1.0

Figure 29: Mean Accuracy1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 for ‘tag_open’ Tags for 2.0

Figure 30: Mean Accuracy1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy1 for ‘tag_open’ Tags for 3.0

Figure 31: Mean Accuracy1 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for ‘tag_open’ Tags

How well does an estimator perform, when only taking tracks into account that are tagged with some kind of label? Note that some values may be based on very few estimates.

Accuracy2 for ‘tag_open’ Tags for 1.0

Figure 32: Mean Accuracy2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for ‘tag_open’ Tags for 2.0

Figure 33: Mean Accuracy2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for ‘tag_open’ Tags for 3.0

Figure 34: Mean Accuracy2 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 and OE2

OE1 is defined as the octave error between an estimate E and a reference value R: OE1(E) = log2(E/R). This means that the most common errors, by a factor of 2 or ½, have the same magnitude, namely 1.

OE2 is the signed OE1 corresponding to the minimum absolute OE1 when allowing the octave errors 2, 3, 1/2, and 1/3: OE2(E) = arg min_x(|x|) with x ∈ {OE1(E), OE1(2E), OE1(3E), OE1(½E), OE1(⅓E)}
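The two definitions can be sketched in a few lines of Python, taking OE1 as the base-2 logarithm of the ratio E/R. The function names `oe1` and `oe2` are chosen here for illustration and are not taken from the tempo_eval code.

```python
import math


def oe1(estimate, reference):
    """Signed octave error: log2 of the ratio between estimate and reference."""
    return math.log2(estimate / reference)


def oe2(estimate, reference):
    """Signed OE1 with the smallest magnitude after rescaling the estimate.

    The estimate is tried as-is and scaled by 2, 3, 1/2, and 1/3.
    """
    factors = (1, 2, 3, 1 / 2, 1 / 3)
    candidates = [oe1(estimate * factor, reference) for factor in factors]
    return min(candidates, key=abs)


# A double-tempo estimate is one octave off in OE1 but exact in OE2.
double_oe1 = oe1(120, 60)
double_oe2 = oe2(120, 60)

# A slightly sharp half-tempo estimate keeps only its small residual in OE2.
residual = oe2(61, 120)
```

This is why the OE2 standard deviations in the tables below are much smaller than the OE1 standard deviations: octave-type errors are folded away and only the remaining deviation is measured.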

Mean OE1/OE2 Results for 1.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
boeck2015/tempodetector2016_default -0.0178 0.3321 0.0092 0.0843
schreiber2018/ismir2018 0.0251 0.3390 0.0090 0.1093
boeck2020/dar 0.0025 0.3515 0.0078 0.0837
schreiber2018/cnn 0.0402 0.3517 0.0150 0.1248
boeck2019/multi_task_hjdb -0.0100 0.3579 0.0132 0.0946
sun2021/default 0.0177 0.3594 0.0087 0.0914
boeck2019/multi_task -0.0042 0.3690 0.0138 0.0918
oliveira2010/ibt 0.0633 0.3835 0.0002 0.1433
schreiber2018/fcn 0.0136 0.4133 0.0147 0.0950
schreiber2014/default -0.1485 0.4165 0.0002 0.1083
klapuri2006/percival2014 0.0366 0.4218 0.0096 0.1268
schreiber2017/ismir2017 -0.0908 0.4244 0.0117 0.1149
zplane/auftakt_v3 -0.0050 0.4379 -0.0005 0.1391
echonest/version_3_2_1 -0.1828 0.4396 0.0009 0.1036
davies2009/mirex_qm_tempotracker 0.1669 0.4416 0.0095 0.1098
percival2014/stem -0.0826 0.4537 0.0165 0.1003
schreiber2017/mirex2017 -0.1249 0.4554 0.0099 0.1200
gkiokas2012/default -0.0182 0.5383 0.0189 0.1088
scheirer1998/percival2014 -0.0452 0.5400 0.0256 0.1798

Table 12: Mean OE1/OE2 for estimates compared to version 1.0 ordered by standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 1.0

Figure 35: OE1 for estimates compared to version 1.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 1.0

Figure 36: OE2 for estimates compared to version 1.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean OE1/OE2 Results for 2.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
boeck2015/tempodetector2016_default -0.0332 0.3376 -0.0035 0.0612
schreiber2018/ismir2018 0.0098 0.3512 -0.0018 0.0976
schreiber2018/cnn 0.0249 0.3575 -0.0003 0.1097
boeck2020/dar -0.0128 0.3581 -0.0075 0.0680
sun2021/default 0.0024 0.3705 -0.0066 0.0711
boeck2019/multi_task_hjdb -0.0254 0.3737 -0.0021 0.0711
oliveira2010/ibt 0.0480 0.3851 0.0029 0.1337
boeck2019/multi_task -0.0196 0.3865 -0.0015 0.0681
klapuri2006/percival2014 0.0213 0.4203 -0.0057 0.1104
schreiber2014/default -0.1638 0.4234 -0.0106 0.0995
schreiber2018/fcn -0.0017 0.4235 0.0039 0.0750
davies2009/mirex_qm_tempotracker 0.1515 0.4353 0.0085 0.1013
schreiber2017/ismir2017 -0.1061 0.4372 0.0009 0.0961
echonest/version_3_2_1 -0.1981 0.4434 -0.0054 0.0916
zplane/auftakt_v3 -0.0203 0.4444 -0.0068 0.1232
percival2014/stem -0.0979 0.4600 0.0102 0.0831
schreiber2017/mirex2017 -0.1403 0.4666 -0.0010 0.1050
scheirer1998/percival2014 -0.0610 0.5405 0.0239 0.1732
gkiokas2012/default -0.0336 0.5414 0.0081 0.0896

Table 13: Mean OE1/OE2 for estimates compared to version 2.0 ordered by standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 2.0

Figure 37: OE1 for estimates compared to version 2.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 2.0

Figure 38: OE2 for estimates compared to version 2.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean OE1/OE2 Results for 3.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
boeck2015/tempodetector2016_default -0.0342 0.3365 -0.0046 0.0615
schreiber2018/ismir2018 0.0087 0.3481 -0.0029 0.0960
schreiber2018/cnn 0.0238 0.3559 0.0031 0.1099
boeck2020/dar -0.0139 0.3568 -0.0086 0.0682
sun2021/default 0.0013 0.3691 -0.0077 0.0713
boeck2019/multi_task_hjdb -0.0264 0.3721 -0.0031 0.0709
oliveira2010/ibt 0.0469 0.3828 0.0018 0.1350
boeck2019/multi_task -0.0206 0.3841 -0.0026 0.0691
klapuri2006/percival2014 0.0202 0.4177 -0.0068 0.1106
schreiber2018/fcn -0.0028 0.4193 0.0028 0.0758
schreiber2014/default -0.1649 0.4221 -0.0117 0.0997
davies2009/mirex_qm_tempotracker 0.1505 0.4324 0.0074 0.1005
schreiber2017/ismir2017 -0.1072 0.4349 -0.0002 0.0956
zplane/auftakt_v3 -0.0214 0.4411 -0.0079 0.1262
echonest/version_3_2_1 -0.1993 0.4417 -0.0066 0.0913
percival2014/stem -0.0989 0.4568 0.0092 0.0810
schreiber2017/mirex2017 -0.1413 0.4656 -0.0020 0.1059
gkiokas2012/default -0.0346 0.5384 0.0070 0.0904
scheirer1998/percival2014 -0.0621 0.5388 0.0228 0.1736

Table 14: Mean OE1/OE2 for estimates compared to version 3.0 ordered by standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 3.0

Figure 39: OE1 for estimates compared to version 3.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 3.0

Figure 40: OE2 for estimates compared to version 3.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5938 0.7672 0.4306 0.0000 0.0000 0.9913 0.0677 0.0026 0.0341 0.4576 0.0000 0.0153 0.0018 0.0208 0.2878 0.0625 0.1422 0.6677
boeck2019/multi_task 0.5938 1.0000 0.7486 0.7762 0.0000 0.0000 0.7105 0.1644 0.0096 0.0063 0.2050 0.0000 0.0008 0.0001 0.0745 0.5399 0.1978 0.3664 0.9777
boeck2019/multi_task_hjdb 0.7672 0.7486 1.0000 0.5725 0.0000 0.0000 0.8269 0.1139 0.0047 0.0114 0.2685 0.0000 0.0026 0.0001 0.0393 0.4422 0.1235 0.2709 0.8561
boeck2020/dar 0.4306 0.7762 0.5725 1.0000 0.0000 0.0000 0.5987 0.3078 0.0474 0.0088 0.1971 0.0000 0.0012 0.0001 0.1351 0.7191 0.3435 0.5247 0.8132
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
echonest/version_3_2_1 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0007 0.0010 0.2335 0.0010 0.0488 0.0000 0.0000 0.0000 0.0000 0.0000
gkiokas2012/default 0.9913 0.7105 0.8269 0.5987 0.0000 0.0000 1.0000 0.0495 0.0072 0.0355 0.3517 0.0000 0.0466 0.0119 0.0839 0.3557 0.2170 0.3410 0.6674
klapuri2006/percival2014 0.0677 0.1644 0.1139 0.3078 0.0000 0.0000 0.0495 1.0000 0.1107 0.0000 0.0047 0.0000 0.0000 0.0000 0.8964 0.4430 0.6348 0.5606 0.0621
oliveira2010/ibt 0.0026 0.0096 0.0047 0.0474 0.0000 0.0000 0.0072 0.1107 1.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.3292 0.0826 0.0809 0.1000 0.0010
percival2014/stem 0.0341 0.0063 0.0114 0.0088 0.0000 0.0007 0.0355 0.0000 0.0000 1.0000 0.3064 0.0075 0.7640 0.2000 0.0000 0.0019 0.0000 0.0011 0.0015
scheirer1998/percival2014 0.4576 0.2050 0.2685 0.1971 0.0000 0.0010 0.3517 0.0047 0.0001 0.3064 1.0000 0.0058 0.2376 0.0591 0.0078 0.1425 0.0348 0.0697 0.1864
schreiber2014/default 0.0000 0.0000 0.0000 0.0000 0.0000 0.2335 0.0000 0.0000 0.0000 0.0075 0.0058 1.0000 0.0350 0.4504 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0153 0.0008 0.0026 0.0012 0.0000 0.0010 0.0466 0.0000 0.0000 0.7640 0.2376 0.0350 1.0000 0.1374 0.0000 0.0008 0.0000 0.0005 0.0035
schreiber2017/mirex2017 0.0018 0.0001 0.0001 0.0001 0.0000 0.0488 0.0119 0.0000 0.0000 0.2000 0.0591 0.4504 0.1374 1.0000 0.0000 0.0001 0.0000 0.0000 0.0004
schreiber2018/cnn 0.0208 0.0745 0.0393 0.1351 0.0000 0.0000 0.0839 0.8964 0.3292 0.0000 0.0078 0.0000 0.0000 0.0000 1.0000 0.2661 0.4333 0.3520 0.1130
schreiber2018/fcn 0.2878 0.5399 0.4422 0.7191 0.0000 0.0000 0.3557 0.4430 0.0826 0.0019 0.1425 0.0000 0.0008 0.0001 0.2661 1.0000 0.6005 0.8887 0.5287
schreiber2018/ismir2018 0.0625 0.1978 0.1235 0.3435 0.0000 0.0000 0.2170 0.6348 0.0809 0.0000 0.0348 0.0000 0.0000 0.0000 0.4333 0.6005 1.0000 0.7609 0.2161
sun2021/default 0.1422 0.3664 0.2709 0.5247 0.0000 0.0000 0.3410 0.5606 0.1000 0.0011 0.0697 0.0000 0.0005 0.0000 0.3520 0.8887 0.7609 1.0000 0.4568
zplane/auftakt_v3 0.6677 0.9777 0.8561 0.8132 0.0000 0.0000 0.6674 0.0621 0.0010 0.0015 0.1864 0.0000 0.0035 0.0004 0.1130 0.5287 0.2161 0.4568 1.0000

Table 15: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
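
The paired test behind these tables can be sketched as follows. The t statistic is computed from per-track error differences. Because the Python standard library has no Student t distribution, the p-value below uses a normal approximation, which is an assumption of this sketch and only reasonable for large samples such as the 222 tracks here. The function names are hypothetical, and the values in the tables come from a proper t-test, not from this approximation.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for two paired error samples."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def approx_two_sided_p(t):
    """Two-sided p-value via a normal approximation (assumption: large n)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Illustrative per-track OE1 values for two hypothetical estimators.
oe1_a = [0.00, 1.00, 0.02, -0.01, 0.00, 1.00]
oe1_b = [0.01, 0.00, 0.00, -1.00, 0.02, 0.00]
t, dof = paired_t_statistic(oe1_a, oe1_b)
p = approx_two_sided_p(t)
print(f"t={t:.4f}, dof={dof}, approx. p={p:.4f}, reject H0: {p <= 0.05}")
```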

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5938 0.7672 0.4306 0.0000 0.0000 0.9913 0.0677 0.0026 0.0341 0.4576 0.0000 0.0153 0.0018 0.0208 0.2878 0.0625 0.1422 0.6677
boeck2019/multi_task 0.5938 1.0000 0.7486 0.7762 0.0000 0.0000 0.7105 0.1644 0.0096 0.0063 0.2050 0.0000 0.0008 0.0001 0.0745 0.5399 0.1978 0.3664 0.9777
boeck2019/multi_task_hjdb 0.7672 0.7486 1.0000 0.5725 0.0000 0.0000 0.8269 0.1139 0.0047 0.0114 0.2685 0.0000 0.0026 0.0001 0.0393 0.4422 0.1235 0.2709 0.8561
boeck2020/dar 0.4306 0.7762 0.5725 1.0000 0.0000 0.0000 0.5987 0.3078 0.0474 0.0088 0.1971 0.0000 0.0012 0.0001 0.1351 0.7191 0.3435 0.5247 0.8132
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
echonest/version_3_2_1 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0007 0.0010 0.2335 0.0010 0.0488 0.0000 0.0000 0.0000 0.0000 0.0000
gkiokas2012/default 0.9913 0.7105 0.8269 0.5987 0.0000 0.0000 1.0000 0.0495 0.0072 0.0355 0.3517 0.0000 0.0466 0.0119 0.0839 0.3557 0.2170 0.3410 0.6674
klapuri2006/percival2014 0.0677 0.1644 0.1139 0.3078 0.0000 0.0000 0.0495 1.0000 0.1107 0.0000 0.0047 0.0000 0.0000 0.0000 0.8964 0.4430 0.6348 0.5606 0.0621
oliveira2010/ibt 0.0026 0.0096 0.0047 0.0474 0.0000 0.0000 0.0072 0.1107 1.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.3292 0.0826 0.0809 0.1000 0.0010
percival2014/stem 0.0341 0.0063 0.0114 0.0088 0.0000 0.0007 0.0355 0.0000 0.0000 1.0000 0.3064 0.0075 0.7640 0.2000 0.0000 0.0019 0.0000 0.0011 0.0015
scheirer1998/percival2014 0.4576 0.2050 0.2685 0.1971 0.0000 0.0010 0.3517 0.0047 0.0001 0.3064 1.0000 0.0058 0.2376 0.0591 0.0078 0.1425 0.0348 0.0697 0.1864
schreiber2014/default 0.0000 0.0000 0.0000 0.0000 0.0000 0.2335 0.0000 0.0000 0.0000 0.0075 0.0058 1.0000 0.0350 0.4504 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0153 0.0008 0.0026 0.0012 0.0000 0.0010 0.0466 0.0000 0.0000 0.7640 0.2376 0.0350 1.0000 0.1374 0.0000 0.0008 0.0000 0.0005 0.0035
schreiber2017/mirex2017 0.0018 0.0001 0.0001 0.0001 0.0000 0.0488 0.0119 0.0000 0.0000 0.2000 0.0591 0.4504 0.1374 1.0000 0.0000 0.0001 0.0000 0.0000 0.0004
schreiber2018/cnn 0.0208 0.0745 0.0393 0.1351 0.0000 0.0000 0.0839 0.8964 0.3292 0.0000 0.0078 0.0000 0.0000 0.0000 1.0000 0.2661 0.4333 0.3520 0.1130
schreiber2018/fcn 0.2878 0.5399 0.4422 0.7191 0.0000 0.0000 0.3557 0.4430 0.0826 0.0019 0.1425 0.0000 0.0008 0.0001 0.2661 1.0000 0.6005 0.8887 0.5287
schreiber2018/ismir2018 0.0625 0.1978 0.1235 0.3435 0.0000 0.0000 0.2170 0.6348 0.0809 0.0000 0.0348 0.0000 0.0000 0.0000 0.4333 0.6005 1.0000 0.7609 0.2161
sun2021/default 0.1422 0.3664 0.2709 0.5247 0.0000 0.0000 0.3410 0.5606 0.1000 0.0011 0.0697 0.0000 0.0005 0.0000 0.3520 0.8887 0.7609 1.0000 0.4568
zplane/auftakt_v3 0.6677 0.9777 0.8561 0.8132 0.0000 0.0000 0.6674 0.0621 0.0010 0.0015 0.1864 0.0000 0.0035 0.0004 0.1130 0.5287 0.2161 0.4568 1.0000

Table 16: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5938 0.7672 0.4306 0.0000 0.0000 0.9913 0.0677 0.0026 0.0341 0.4576 0.0000 0.0153 0.0018 0.0208 0.2878 0.0625 0.1422 0.6677
boeck2019/multi_task 0.5938 1.0000 0.7486 0.7762 0.0000 0.0000 0.7105 0.1644 0.0096 0.0063 0.2050 0.0000 0.0008 0.0001 0.0745 0.5399 0.1978 0.3664 0.9777
boeck2019/multi_task_hjdb 0.7672 0.7486 1.0000 0.5725 0.0000 0.0000 0.8269 0.1139 0.0047 0.0114 0.2685 0.0000 0.0026 0.0001 0.0393 0.4422 0.1235 0.2709 0.8561
boeck2020/dar 0.4306 0.7762 0.5725 1.0000 0.0000 0.0000 0.5987 0.3078 0.0474 0.0088 0.1971 0.0000 0.0012 0.0001 0.1351 0.7191 0.3435 0.5247 0.8132
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
echonest/version_3_2_1 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0007 0.0010 0.2335 0.0010 0.0488 0.0000 0.0000 0.0000 0.0000 0.0000
gkiokas2012/default 0.9913 0.7105 0.8269 0.5987 0.0000 0.0000 1.0000 0.0495 0.0072 0.0355 0.3517 0.0000 0.0466 0.0119 0.0839 0.3557 0.2170 0.3410 0.6674
klapuri2006/percival2014 0.0677 0.1644 0.1139 0.3078 0.0000 0.0000 0.0495 1.0000 0.1107 0.0000 0.0047 0.0000 0.0000 0.0000 0.8964 0.4430 0.6348 0.5606 0.0621
oliveira2010/ibt 0.0026 0.0096 0.0047 0.0474 0.0000 0.0000 0.0072 0.1107 1.0000 0.0000 0.0001 0.0000 0.0000 0.0000 0.3292 0.0826 0.0809 0.1000 0.0010
percival2014/stem 0.0341 0.0063 0.0114 0.0088 0.0000 0.0007 0.0355 0.0000 0.0000 1.0000 0.3064 0.0075 0.7640 0.2000 0.0000 0.0019 0.0000 0.0011 0.0015
scheirer1998/percival2014 0.4576 0.2050 0.2685 0.1971 0.0000 0.0010 0.3517 0.0047 0.0001 0.3064 1.0000 0.0058 0.2376 0.0591 0.0078 0.1425 0.0348 0.0697 0.1864
schreiber2014/default 0.0000 0.0000 0.0000 0.0000 0.0000 0.2335 0.0000 0.0000 0.0000 0.0075 0.0058 1.0000 0.0350 0.4504 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0153 0.0008 0.0026 0.0012 0.0000 0.0010 0.0466 0.0000 0.0000 0.7640 0.2376 0.0350 1.0000 0.1374 0.0000 0.0008 0.0000 0.0005 0.0035
schreiber2017/mirex2017 0.0018 0.0001 0.0001 0.0001 0.0000 0.0488 0.0119 0.0000 0.0000 0.2000 0.0591 0.4504 0.1374 1.0000 0.0000 0.0001 0.0000 0.0000 0.0004
schreiber2018/cnn 0.0208 0.0745 0.0393 0.1351 0.0000 0.0000 0.0839 0.8964 0.3292 0.0000 0.0078 0.0000 0.0000 0.0000 1.0000 0.2661 0.4333 0.3520 0.1130
schreiber2018/fcn 0.2878 0.5399 0.4422 0.7191 0.0000 0.0000 0.3557 0.4430 0.0826 0.0019 0.1425 0.0000 0.0008 0.0001 0.2661 1.0000 0.6005 0.8887 0.5287
schreiber2018/ismir2018 0.0625 0.1978 0.1235 0.3435 0.0000 0.0000 0.2170 0.6348 0.0809 0.0000 0.0348 0.0000 0.0000 0.0000 0.4333 0.6005 1.0000 0.7609 0.2161
sun2021/default 0.1422 0.3664 0.2709 0.5247 0.0000 0.0000 0.3410 0.5606 0.1000 0.0011 0.0697 0.0000 0.0005 0.0000 0.3520 0.8887 0.7609 1.0000 0.4568
zplane/auftakt_v3 0.6677 0.9777 0.8561 0.8132 0.0000 0.0000 0.6674 0.0621 0.0010 0.0015 0.1864 0.0000 0.0035 0.0004 0.1130 0.5287 0.2161 0.4568 1.0000

Table 17: Paired t-test p-values, using reference annotations 3.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.3130 0.3172 0.7394 0.9645 0.2667 0.1958 0.9588 0.3735 0.2712 0.2077 0.2162 0.7056 0.9208 0.4597 0.4862 0.9792 0.9168 0.3027
boeck2019/multi_task 0.3130 1.0000 0.8774 0.2001 0.6401 0.1176 0.4638 0.6059 0.1784 0.6925 0.4067 0.0741 0.7705 0.5394 0.8701 0.9070 0.5250 0.2943 0.1235
boeck2019/multi_task_hjdb 0.3172 0.8774 1.0000 0.1831 0.6741 0.0916 0.4501 0.6344 0.2038 0.6330 0.3691 0.0761 0.8055 0.5923 0.8267 0.8564 0.6069 0.3033 0.1459
boeck2020/dar 0.7394 0.2001 0.1831 1.0000 0.8389 0.3329 0.1223 0.8195 0.4421 0.1909 0.1897 0.2634 0.5800 0.7646 0.4021 0.3945 0.8925 0.8247 0.3822
davies2009/mirex_qm_tempotracker 0.9645 0.6401 0.6741 0.8389 1.0000 0.2750 0.2925 0.9967 0.3551 0.4907 0.2803 0.3580 0.8176 0.9732 0.6158 0.5626 0.9480 0.9290 0.3809
echonest/version_3_2_1 0.2667 0.1176 0.0916 0.3329 0.2750 1.0000 0.0265 0.3708 0.8338 0.0632 0.0648 0.9395 0.1327 0.3018 0.1642 0.1410 0.4892 0.3086 0.7348
gkiokas2012/default 0.1958 0.4638 0.4501 0.1223 0.2925 0.0265 1.0000 0.3049 0.0531 0.7723 0.6182 0.0394 0.3670 0.2816 0.6248 0.6293 0.2500 0.1340 0.0464
klapuri2006/percival2014 0.9588 0.6059 0.6344 0.8195 0.9967 0.3708 0.3049 1.0000 0.4537 0.4561 0.2315 0.3106 0.8123 0.9757 0.6024 0.6133 0.9517 0.9121 0.3130
oliveira2010/ibt 0.3735 0.1784 0.2038 0.4421 0.3551 0.8338 0.0531 0.4537 1.0000 0.1193 0.0729 0.9963 0.2679 0.3669 0.1840 0.2229 0.4451 0.4163 0.9452
percival2014/stem 0.2712 0.6925 0.6330 0.1909 0.4907 0.0632 0.7723 0.4561 0.1193 1.0000 0.5529 0.0278 0.5084 0.3971 0.8589 0.8297 0.3998 0.2366 0.0921
scheirer1998/percival2014 0.2077 0.4067 0.3691 0.1897 0.2803 0.0648 0.6182 0.2315 0.0729 0.5529 1.0000 0.0604 0.3219 0.2407 0.4966 0.5532 0.2625 0.2341 0.0342
schreiber2014/default 0.2162 0.0741 0.0761 0.2634 0.3580 0.9395 0.0394 0.3106 0.9963 0.0278 0.0604 1.0000 0.1791 0.2454 0.1183 0.0728 0.3575 0.2489 0.9420
schreiber2017/ismir2017 0.7056 0.7705 0.8055 0.5800 0.8176 0.1327 0.3670 0.8123 0.2679 0.5084 0.3219 0.1791 1.0000 0.7501 0.6886 0.7298 0.7602 0.6356 0.2305
schreiber2017/mirex2017 0.9208 0.5394 0.5923 0.7646 0.9732 0.3018 0.2816 0.9757 0.3669 0.3971 0.2407 0.2454 0.7501 1.0000 0.5770 0.5967 0.9170 0.8705 0.2765
schreiber2018/cnn 0.4597 0.8701 0.8267 0.4021 0.6158 0.1642 0.6248 0.6024 0.1840 0.8589 0.4966 0.1183 0.6886 0.5770 1.0000 0.9670 0.4497 0.4054 0.1689
schreiber2018/fcn 0.4862 0.9070 0.8564 0.3945 0.5626 0.1410 0.6293 0.6133 0.2229 0.8297 0.5532 0.0728 0.7298 0.5967 0.9670 1.0000 0.3957 0.4358 0.1688
schreiber2018/ismir2018 0.9792 0.5250 0.6069 0.8925 0.9480 0.4892 0.2500 0.9517 0.4451 0.3998 0.2625 0.3575 0.7602 0.9170 0.4497 0.3957 1.0000 0.9756 0.4232
sun2021/default 0.9168 0.2943 0.3033 0.8247 0.9290 0.3086 0.1340 0.9121 0.4163 0.2366 0.2341 0.2489 0.6356 0.8705 0.4054 0.4358 0.9756 1.0000 0.3086
zplane/auftakt_v3 0.3027 0.1235 0.1459 0.3822 0.3809 0.7348 0.0464 0.3130 0.9452 0.0921 0.0342 0.9420 0.2305 0.2765 0.1689 0.1688 0.4232 0.3086 1.0000

Table 18: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5828 0.6174 0.2239 0.0540 0.7574 0.0678 0.7561 0.4967 0.0145 0.0204 0.2464 0.4697 0.6796 0.6627 0.2139 0.7983 0.3774 0.6877
boeck2019/multi_task 0.5828 1.0000 0.8774 0.2001 0.1604 0.7759 0.1325 0.6059 0.6504 0.0752 0.0409 0.1939 0.7240 0.9262 0.8701 0.3899 0.9603 0.2943 0.5350
boeck2019/multi_task_hjdb 0.6174 0.8774 1.0000 0.1831 0.1183 0.6196 0.1413 0.6344 0.6196 0.0367 0.0314 0.2022 0.6270 0.8598 0.8267 0.3613 0.9755 0.3033 0.5851
boeck2020/dar 0.2239 0.2001 0.1831 1.0000 0.0170 0.7527 0.0184 0.8195 0.2840 0.0053 0.0101 0.6141 0.2280 0.3310 0.4021 0.0940 0.4417 0.8247 0.9391
davies2009/mirex_qm_tempotracker 0.0540 0.1604 0.1183 0.0170 1.0000 0.0480 0.9614 0.0909 0.6117 0.8248 0.2332 0.0357 0.3838 0.2754 0.3368 0.5510 0.1703 0.0403 0.1196
echonest/version_3_2_1 0.7574 0.7759 0.6196 0.7527 0.0480 1.0000 0.1280 0.9844 0.5174 0.0600 0.0198 0.5478 0.3465 0.6479 0.6167 0.2098 0.8044 0.8647 0.7065
gkiokas2012/default 0.0678 0.1325 0.1413 0.0184 0.9614 0.1280 1.0000 0.0893 0.5487 0.7994 0.1868 0.0211 0.3254 0.2450 0.2863 0.5466 0.1577 0.0179 0.0748
klapuri2006/percival2014 0.7561 0.6059 0.6344 0.8195 0.0909 0.9844 0.0893 1.0000 0.4294 0.0695 0.0195 0.5780 0.4397 0.5771 0.6024 0.2936 0.6748 0.9121 0.9085
oliveira2010/ibt 0.4967 0.6504 0.6196 0.2840 0.6117 0.5174 0.5487 0.4294 1.0000 0.5189 0.1586 0.2027 0.8259 0.7071 0.7632 0.9255 0.6612 0.3287 0.3737
percival2014/stem 0.0145 0.0752 0.0367 0.0053 0.8248 0.0600 0.7994 0.0695 0.5189 1.0000 0.3388 0.0059 0.1796 0.1400 0.2460 0.3643 0.1360 0.0044 0.0847
scheirer1998/percival2014 0.0204 0.0409 0.0314 0.0101 0.2332 0.0198 0.1868 0.0195 0.1586 0.3388 1.0000 0.0073 0.0745 0.0438 0.0670 0.1632 0.0429 0.0153 0.0110
schreiber2014/default 0.2464 0.1939 0.2022 0.6141 0.0357 0.5478 0.0211 0.5780 0.2027 0.0059 0.0073 1.0000 0.1517 0.2131 0.2511 0.0509 0.3291 0.5565 0.6997
schreiber2017/ismir2017 0.4697 0.7240 0.6270 0.2280 0.3838 0.3465 0.3254 0.4397 0.8259 0.1796 0.0745 0.1517 1.0000 0.7501 0.8829 0.6894 0.7425 0.2472 0.4464
schreiber2017/mirex2017 0.6796 0.9262 0.8598 0.3310 0.2754 0.6479 0.2450 0.5771 0.7071 0.1400 0.0438 0.2131 0.7501 1.0000 0.9432 0.5486 0.9099 0.4274 0.5359
schreiber2018/cnn 0.6627 0.8701 0.8267 0.4021 0.3368 0.6167 0.2863 0.6024 0.7632 0.2460 0.0670 0.2511 0.8829 0.9432 1.0000 0.5661 0.8198 0.4054 0.5750
schreiber2018/fcn 0.2139 0.3899 0.3613 0.0940 0.5510 0.2098 0.5466 0.2936 0.9255 0.3643 0.1632 0.0509 0.6894 0.5486 0.5661 1.0000 0.3957 0.0922 0.2714
schreiber2018/ismir2018 0.7983 0.9603 0.9755 0.4417 0.1703 0.8044 0.1577 0.6748 0.6612 0.1360 0.0429 0.3291 0.7425 0.9099 0.8198 0.3957 1.0000 0.4778 0.6070
sun2021/default 0.3774 0.2943 0.3033 0.8247 0.0403 0.8647 0.0179 0.9121 0.3287 0.0044 0.0153 0.5565 0.2472 0.4274 0.4054 0.0922 0.4778 1.0000 0.9788
zplane/auftakt_v3 0.6877 0.5350 0.5851 0.9391 0.1196 0.7065 0.0748 0.9085 0.3737 0.0847 0.0110 0.6997 0.4464 0.5359 0.5750 0.2714 0.6070 0.9788 1.0000

Table 19: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.5828 0.6174 0.2239 0.0540 0.7574 0.0678 0.7561 0.4629 0.0145 0.0204 0.2464 0.4697 0.6796 0.3168 0.2139 0.7983 0.3774 0.6877
boeck2019/multi_task 0.5828 1.0000 0.8774 0.2001 0.1604 0.7759 0.1325 0.6059 0.6258 0.0752 0.0409 0.1939 0.7240 0.9262 0.4723 0.3899 0.9603 0.2943 0.5350
boeck2019/multi_task_hjdb 0.6174 0.8774 1.0000 0.1831 0.1183 0.6196 0.1413 0.6344 0.5933 0.0367 0.0314 0.2022 0.6270 0.8598 0.4325 0.3613 0.9755 0.3033 0.5851
boeck2020/dar 0.2239 0.2001 0.1831 1.0000 0.0170 0.7527 0.0184 0.8195 0.2544 0.0053 0.0101 0.6141 0.2280 0.3310 0.1914 0.0940 0.4417 0.8247 0.9391
davies2009/mirex_qm_tempotracker 0.0540 0.1604 0.1183 0.0170 1.0000 0.0480 0.9614 0.0909 0.5583 0.8248 0.2332 0.0357 0.3838 0.2754 0.6543 0.5510 0.1703 0.0403 0.1196
echonest/version_3_2_1 0.7574 0.7759 0.6196 0.7527 0.0480 1.0000 0.1280 0.9844 0.4789 0.0600 0.0198 0.5478 0.3465 0.6479 0.3465 0.2098 0.8044 0.8647 0.7065
gkiokas2012/default 0.0678 0.1325 0.1413 0.0184 0.9614 0.1280 1.0000 0.0893 0.5614 0.7994 0.1868 0.0211 0.3254 0.2450 0.6367 0.5466 0.1577 0.0179 0.0748
klapuri2006/percival2014 0.7561 0.6059 0.6344 0.8195 0.0909 0.9844 0.0893 1.0000 0.4460 0.0695 0.0195 0.5780 0.4397 0.5771 0.3428 0.2936 0.6748 0.9121 0.9085
oliveira2010/ibt 0.4629 0.6258 0.5933 0.2544 0.5583 0.4789 0.5614 0.4460 1.0000 0.5029 0.1451 0.2069 0.8336 0.6947 0.9053 0.9257 0.6264 0.3262 0.4063
percival2014/stem 0.0145 0.0752 0.0367 0.0053 0.8248 0.0600 0.7994 0.0695 0.5029 1.0000 0.3388 0.0059 0.1796 0.1400 0.4484 0.3643 0.1360 0.0044 0.0847
scheirer1998/percival2014 0.0204 0.0409 0.0314 0.0101 0.2332 0.0198 0.1868 0.0195 0.1451 0.3388 1.0000 0.0073 0.0745 0.0438 0.1537 0.1632 0.0429 0.0153 0.0110
schreiber2014/default 0.2464 0.1939 0.2022 0.6141 0.0357 0.5478 0.0211 0.5780 0.2069 0.0059 0.0073 1.0000 0.1517 0.2131 0.1064 0.0509 0.3291 0.5565 0.6997
schreiber2017/ismir2017 0.4697 0.7240 0.6270 0.2280 0.3838 0.3465 0.3254 0.4397 0.8336 0.1796 0.0745 0.1517 1.0000 0.7501 0.7085 0.6894 0.7425 0.2472 0.4464
schreiber2017/mirex2017 0.6796 0.9262 0.8598 0.3310 0.2754 0.6479 0.2450 0.5771 0.6947 0.1400 0.0438 0.2131 0.7501 1.0000 0.5977 0.5486 0.9099 0.4274 0.5359
schreiber2018/cnn 0.3168 0.4723 0.4325 0.1914 0.6543 0.3465 0.6367 0.3428 0.9053 0.4484 0.1537 0.1064 0.7085 0.5977 1.0000 0.9629 0.4377 0.1582 0.3610
schreiber2018/fcn 0.2139 0.3899 0.3613 0.0940 0.5510 0.2098 0.5466 0.2936 0.9257 0.3643 0.1632 0.0509 0.6894 0.5486 0.9629 1.0000 0.3957 0.0922 0.2714
schreiber2018/ismir2018 0.7983 0.9603 0.9755 0.4417 0.1703 0.8044 0.1577 0.6748 0.6264 0.1360 0.0429 0.3291 0.7425 0.9099 0.4377 0.3957 1.0000 0.4778 0.6070
sun2021/default 0.3774 0.2943 0.3033 0.8247 0.0403 0.8647 0.0179 0.9121 0.3262 0.0044 0.0153 0.5565 0.2472 0.4274 0.1582 0.0922 0.4778 1.0000 0.9788
zplane/auftakt_v3 0.6877 0.5350 0.5851 0.9391 0.1196 0.7065 0.0748 0.9085 0.4063 0.0847 0.0110 0.6997 0.4464 0.5359 0.3610 0.2714 0.6070 0.9788 1.0000

Table 20: Paired t-test p-values, using reference annotations 3.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates of the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

OE1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ are taken into account, i.e., tracks with a more or less stable beat?
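
The subset selection can be sketched as below. The sketch assumes that cvar is the coefficient of variation of a track's inter-beat intervals (standard deviation divided by mean) and uses the population standard deviation; neither the exact definition nor the variance convention is stated in this section. The function names are hypothetical.

```python
from statistics import mean, pstdev


def cvar(beat_times):
    """Coefficient of variation of inter-beat intervals (assumed definition)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if len(ibis) < 2:
        raise ValueError("need at least three beats")
    return pstdev(ibis) / mean(ibis)


def mean_oe1_below(tracks, tau):
    """Mean OE1 over tracks with cvar < tau.

    `tracks` is a list of (cvar, oe1) pairs; returns None for an empty subset.
    """
    subset = [oe1 for c, oe1 in tracks if c < tau]
    return mean(subset) if subset else None


steady = [0.0, 0.5, 1.0, 1.5, 2.0]    # perfectly regular beat
rubato = [0.0, 0.4, 1.0, 1.3, 2.0]    # fluctuating beat
tracks = [(cvar(steady), 0.0), (cvar(rubato), 1.0)]
print(mean_oe1_below(tracks, tau=0.05))  # only the steady track qualifies
```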

OE1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 41: Mean OE1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 42: Mean OE1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 43: Mean OE1 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ are taken into account, i.e., tracks with a more or less stable beat?

OE2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 44: Mean OE2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 45: Mean OE2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 46: Mean OE2 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
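
A minimal sketch of this windowed mean follows. It assumes OE1 is the signed octave error log2(E/R), in line with the AOE1 definition given further down in this report; the function names and the example data are hypothetical.

```python
import math
from statistics import mean


def oe1(estimate, reference):
    """Signed octave error (assumed definition: log2(E/R))."""
    return math.log2(estimate / reference)


def mean_oe1_around(pairs, center, half_width=10.0):
    """Mean OE1 over pairs whose reference tempo lies in [T-10, T+10] BPM.

    `pairs` is a list of (reference_bpm, estimated_bpm) tuples.
    Returns None if no reference falls into the window.
    """
    errors = [
        oe1(est, ref)
        for ref, est in pairs
        if center - half_width <= ref <= center + half_width
    ]
    return mean(errors) if errors else None


pairs = [(100.0, 200.0), (105.0, 105.0), (150.0, 75.0)]
for t in (60, 100, 150):
    print(t, mean_oe1_around(pairs, t))
```

Windows with few reference tempi yield means based on very few estimates, which is why the graphs should be read with care at the extremes.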

OE1 on Tempo-Subsets for 1.0

Figure 47: Mean OE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets for 2.0

Figure 48: Mean OE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets for 3.0

Figure 49: Mean OE1 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE2 on Tempo-Subsets for 1.0

Figure 50: Mean OE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets for 2.0

Figure 51: Mean OE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets for 3.0

Figure 52: Mean OE2 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo

When a generalized additive model (GAM) is fit to OE1 values and a ground truth, what OE1 can we expect, and with what confidence?

Estimated OE1 for Tempo for 1.0

Predictions of GAMs trained on OE1 for estimates for reference 1.0.

Figure 53: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo for 2.0

Predictions of GAMs trained on OE1 for estimates for reference 2.0.

Figure 54: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo for 3.0

Predictions of GAMs trained on OE1 for estimates for reference 3.0.

Figure 55: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo

When a generalized additive model (GAM) is fit to OE2 values and a ground truth, what OE2 can we expect, and with what confidence?

Estimated OE2 for Tempo for 1.0

Predictions of GAMs trained on OE2 for estimates for reference 1.0.

Figure 56: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo for 2.0

Predictions of GAMs trained on OE2 for estimates for reference 2.0.

Figure 57: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo for 3.0

Predictions of GAMs trained on OE2 for estimates for reference 3.0.

Figure 58: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

OE1 for ‘tag_open’ Tags for 1.0

Figure 59: OE1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE1 for ‘tag_open’ Tags for 2.0

Figure 60: OE1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE1 for ‘tag_open’ Tags for 3.0

Figure 61: OE1 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE2 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

OE2 for ‘tag_open’ Tags for 1.0

Figure 62: OE2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE2 for ‘tag_open’ Tags for 2.0

Figure 63: OE2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

OE2 for ‘tag_open’ Tags for 3.0

Figure 64: OE2 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE1 and AOE2

AOE1 is defined as the absolute octave error between an estimate E and a reference value R: AOE1(E) = |log2(E/R)|.

AOE2 is the minimum AOE1 when the estimate is allowed to be off by the factors 2, 3, 1/2, and 1/3: AOE2(E) = min(AOE1(E), AOE1(2E), AOE1(3E), AOE1(½E), AOE1(⅓E)).
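
The two definitions translate directly into code. The following is a minimal sketch; the function names `aoe1` and `aoe2` are chosen for illustration and are not taken from the tempo_eval code base.

```python
import math


def aoe1(estimate, reference):
    """Absolute octave error: |log2(E/R)|."""
    return abs(math.log2(estimate / reference))


def aoe2(estimate, reference):
    """Minimum AOE1 over the estimate scaled by 1, 2, 3, 1/2 and 1/3."""
    factors = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)
    return min(aoe1(f * estimate, reference) for f in factors)


# A double-tempo estimate is a full octave off under AOE1,
# but is forgiven under AOE2.
print(aoe1(120.0, 60.0), aoe2(120.0, 60.0))
# A small deviation is penalised identically by both metrics.
print(aoe1(100.0, 110.0), aoe2(100.0, 110.0))
```

This makes explicit why the AOE2 means in the tables below are much smaller than the AOE1 means: errors by a factor of 2 or 3 in either direction contribute (close to) zero to AOE2.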

Mean AOE1/AOE2 Results for 1.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
boeck2015/tempodetector2016_default 0.1277 0.3071 0.0323 0.0784
boeck2020/dar 0.1325 0.3256 0.0306 0.0783
schreiber2018/ismir2018 0.1437 0.3080 0.0459 0.0996
boeck2019/multi_task_hjdb 0.1440 0.3278 0.0341 0.0893
boeck2019/multi_task 0.1508 0.3368 0.0328 0.0868
sun2021/default 0.1537 0.3254 0.0366 0.0842
schreiber2018/cnn 0.1555 0.3180 0.0519 0.1144
schreiber2018/fcn 0.1804 0.3720 0.0392 0.0878
oliveira2010/ibt 0.1940 0.3368 0.0661 0.1271
schreiber2017/ismir2017 0.2083 0.3808 0.0465 0.1057
klapuri2006/percival2014 0.2110 0.3670 0.0526 0.1158
schreiber2017/mirex2017 0.2200 0.4178 0.0451 0.1117
schreiber2014/default 0.2250 0.3807 0.0438 0.0991
zplane/auftakt_v3 0.2261 0.3750 0.0583 0.1263
percival2014/stem 0.2357 0.3964 0.0384 0.0941
davies2009/mirex_qm_tempotracker 0.2413 0.4057 0.0584 0.0934
echonest/version_3_2_1 0.2493 0.4055 0.0405 0.0954
gkiokas2012/default 0.2992 0.4479 0.0434 0.1015
scheirer1998/percival2014 0.3447 0.4182 0.1016 0.1506

Table 21: Mean AOE1/AOE2 for estimates compared to version 1.0 ordered by mean.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 1.0

Figure 65: AOE1 for estimates compared to version 1.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 1.0

Figure 66: AOE2 for estimates compared to version 1.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean AOE1/AOE2 Results for 2.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
boeck2015/tempodetector2016_default 0.1246 0.3156 0.0217 0.0573
boeck2020/dar 0.1295 0.3341 0.0206 0.0652
schreiber2018/ismir2018 0.1439 0.3205 0.0376 0.0901
boeck2019/multi_task_hjdb 0.1460 0.3449 0.0221 0.0676
schreiber2018/cnn 0.1506 0.3252 0.0420 0.1013
sun2021/default 0.1529 0.3376 0.0272 0.0660
boeck2019/multi_task 0.1541 0.3550 0.0210 0.0648
schreiber2018/fcn 0.1814 0.3827 0.0290 0.0693
oliveira2010/ibt 0.1884 0.3392 0.0584 0.1203
klapuri2006/percival2014 0.2036 0.3683 0.0416 0.1024
schreiber2017/ismir2017 0.2118 0.3970 0.0339 0.0900
schreiber2017/mirex2017 0.2233 0.4331 0.0334 0.0996
zplane/auftakt_v3 0.2261 0.3831 0.0470 0.1141
davies2009/mirex_qm_tempotracker 0.2297 0.3996 0.0505 0.0882
schreiber2014/default 0.2304 0.3912 0.0361 0.0934
percival2014/stem 0.2369 0.4063 0.0266 0.0794
echonest/version_3_2_1 0.2514 0.4155 0.0335 0.0854
gkiokas2012/default 0.2964 0.4543 0.0324 0.0840
scheirer1998/percival2014 0.3435 0.4218 0.0953 0.1466

Table 22: Mean AOE1/AOE2 for estimates compared to version 2.0 ordered by mean.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 2.0

Figure 67: AOE1 for estimates compared to version 2.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 2.0

Figure 68: AOE2 for estimates compared to version 2.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean AOE1/AOE2 Results for 3.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
boeck2015/tempodetector2016_default 0.1243 0.3146 0.0218 0.0577
boeck2020/dar 0.1290 0.3329 0.0208 0.0655
schreiber2018/ismir2018 0.1422 0.3178 0.0377 0.0884
boeck2019/multi_task_hjdb 0.1455 0.3435 0.0223 0.0673
schreiber2018/cnn 0.1504 0.3234 0.0428 0.1013
sun2021/default 0.1530 0.3359 0.0281 0.0660
boeck2019/multi_task 0.1536 0.3526 0.0218 0.0657
schreiber2018/fcn 0.1801 0.3787 0.0299 0.0698
oliveira2010/ibt 0.1875 0.3370 0.0592 0.1214
klapuri2006/percival2014 0.2026 0.3659 0.0420 0.1025
schreiber2017/ismir2017 0.2111 0.3951 0.0346 0.0891
schreiber2017/mirex2017 0.2234 0.4323 0.0344 0.1002
zplane/auftakt_v3 0.2249 0.3801 0.0491 0.1165
davies2009/mirex_qm_tempotracker 0.2283 0.3968 0.0504 0.0872
schreiber2014/default 0.2306 0.3901 0.0372 0.0932
percival2014/stem 0.2353 0.4038 0.0270 0.0769
echonest/version_3_2_1 0.2514 0.4142 0.0340 0.0850
gkiokas2012/default 0.2961 0.4510 0.0336 0.0842
scheirer1998/percival2014 0.3424 0.4207 0.0955 0.1467

Table 23: Mean and standard deviation of AOE1 and AOE2 for estimates compared to version 3.0, ordered by mean AOE1.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 3.0

Figure 69: AOE1 for estimates compared to version 3.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 3.0

Figure 70: AOE2 for estimates compared to version 3.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.3183 0.4767 0.8417 0.0001 0.0001 0.0000 0.0029 0.0102 0.0002 0.0000 0.0003 0.0022 0.0026 0.2520 0.0637 0.4657 0.2744 0.0004
boeck2019/multi_task 0.3183 1.0000 0.6885 0.4067 0.0034 0.0013 0.0000 0.0299 0.0743 0.0024 0.0000 0.0113 0.0230 0.0226 0.8367 0.2850 0.7403 0.8967 0.0046
boeck2019/multi_task_hjdb 0.4767 0.6885 1.0000 0.5659 0.0011 0.0005 0.0000 0.0155 0.0340 0.0011 0.0000 0.0058 0.0135 0.0094 0.6135 0.2170 0.9895 0.6791 0.0017
boeck2020/dar 0.8417 0.4067 0.5659 1.0000 0.0009 0.0004 0.0000 0.0112 0.0257 0.0006 0.0000 0.0022 0.0042 0.0034 0.3207 0.1000 0.6101 0.3316 0.0011
davies2009/mirex_qm_tempotracker 0.0001 0.0034 0.0011 0.0009 1.0000 0.7607 0.0539 0.1281 0.0123 0.8503 0.0010 0.6292 0.2979 0.5218 0.0027 0.0508 0.0002 0.0076 0.5185
echonest/version_3_2_1 0.0001 0.0013 0.0005 0.0004 0.7607 1.0000 0.1165 0.2270 0.0732 0.6450 0.0034 0.3711 0.1157 0.2505 0.0030 0.0383 0.0008 0.0028 0.4183
gkiokas2012/default 0.0000 0.0000 0.0000 0.0000 0.0539 0.1165 1.0000 0.0011 0.0003 0.0340 0.2384 0.0142 0.0062 0.0203 0.0000 0.0002 0.0000 0.0000 0.0107
klapuri2006/percival2014 0.0029 0.0299 0.0155 0.0112 0.1281 0.2270 0.0011 1.0000 0.2724 0.2784 0.0000 0.6140 0.9220 0.7737 0.0336 0.2745 0.0023 0.0650 0.4645
oliveira2010/ibt 0.0102 0.0743 0.0340 0.0257 0.0123 0.0732 0.0003 0.2724 1.0000 0.0730 0.0000 0.2803 0.5851 0.3812 0.0990 0.6170 0.0147 0.1344 0.0923
percival2014/stem 0.0002 0.0024 0.0011 0.0006 0.8503 0.6450 0.0340 0.2784 0.0730 1.0000 0.0002 0.6548 0.2578 0.5728 0.0042 0.0613 0.0003 0.0059 0.6778
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0010 0.0034 0.2384 0.0000 0.0000 0.0002 1.0000 0.0002 0.0000 0.0002 0.0000 0.0000 0.0000 0.0000 0.0001
schreiber2014/default 0.0003 0.0113 0.0058 0.0022 0.6292 0.3711 0.0142 0.6140 0.2803 0.6548 0.0002 1.0000 0.5022 0.8588 0.0117 0.1548 0.0033 0.0211 0.9653
schreiber2017/ismir2017 0.0022 0.0230 0.0135 0.0042 0.2979 0.1157 0.0062 0.9220 0.5851 0.2578 0.0000 0.5022 1.0000 0.5635 0.0510 0.3250 0.0076 0.0589 0.5011
schreiber2017/mirex2017 0.0026 0.0226 0.0094 0.0034 0.5218 0.2505 0.0203 0.7737 0.3812 0.5728 0.0002 0.8588 0.5635 1.0000 0.0306 0.1818 0.0048 0.0357 0.8302
schreiber2018/cnn 0.2520 0.8367 0.6135 0.3207 0.0027 0.0030 0.0000 0.0336 0.0990 0.0042 0.0000 0.0117 0.0510 0.0306 1.0000 0.2784 0.5194 0.9382 0.0090
schreiber2018/fcn 0.0637 0.2850 0.2170 0.1000 0.0508 0.0383 0.0002 0.2745 0.6170 0.0613 0.0000 0.1548 0.3250 0.1818 0.2784 1.0000 0.0857 0.3424 0.1073
schreiber2018/ismir2018 0.4657 0.7403 0.9895 0.6101 0.0002 0.0008 0.0000 0.0023 0.0147 0.0003 0.0000 0.0033 0.0076 0.0048 0.5194 0.0857 1.0000 0.6658 0.0003
sun2021/default 0.2744 0.8967 0.6791 0.3316 0.0076 0.0028 0.0000 0.0650 0.1344 0.0059 0.0000 0.0211 0.0589 0.0357 0.9382 0.3424 0.6658 1.0000 0.0117
zplane/auftakt_v3 0.0004 0.0046 0.0017 0.0011 0.5185 0.4183 0.0107 0.4645 0.0923 0.6778 0.0001 0.9653 0.5011 0.8302 0.0090 0.1073 0.0003 0.0117 1.0000

Table 24: Paired t-test p-values for AOE1, using reference annotations 1.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
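A single cell of such a matrix could be computed roughly as follows. This is a sketch under stated assumptions, not the report's implementation: it pairs the per-track errors of two estimators, computes the paired t statistic, and approximates the two-sided p-value with a standard normal distribution instead of Student's t. That approximation is only reasonable for large samples, such as a corpus of a few hundred tracks. All names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(errors_a, errors_b):
    """t statistic for H0: mean of per-track differences is zero."""
    if len(errors_a) != len(errors_b):
        raise ValueError("samples must be paired track by track")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def approx_two_sided_p(t):
    """Large-sample approximation: normal instead of Student's t."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def is_significant(p, alpha=0.05):
    """Reject H0 when p <= alpha."""
    return p <= alpha
```

The test is symmetric in its two arguments up to the sign of t, which is why each matrix is symmetric with 1.0000 on the diagonal.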

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.2161 0.3639 0.8429 0.0005 0.0000 0.0000 0.0063 0.0142 0.0001 0.0000 0.0001 0.0013 0.0016 0.2930 0.0484 0.3820 0.2382 0.0003
boeck2019/multi_task 0.2161 1.0000 0.6403 0.2972 0.0188 0.0020 0.0001 0.0859 0.1726 0.0035 0.0000 0.0100 0.0227 0.0267 0.8841 0.3384 0.6409 0.9569 0.0082
boeck2019/multi_task_hjdb 0.3639 0.6403 1.0000 0.4446 0.0066 0.0008 0.0000 0.0440 0.0835 0.0013 0.0000 0.0046 0.0126 0.0098 0.8483 0.2397 0.9222 0.7784 0.0025
boeck2020/dar 0.8429 0.2972 0.4446 1.0000 0.0028 0.0003 0.0000 0.0206 0.0393 0.0005 0.0000 0.0013 0.0030 0.0026 0.3900 0.0804 0.5334 0.3175 0.0009
davies2009/mirex_qm_tempotracker 0.0005 0.0188 0.0066 0.0028 1.0000 0.5078 0.0306 0.1988 0.0322 0.8137 0.0003 0.9837 0.5866 0.8505 0.0065 0.1314 0.0014 0.0215 0.8816
echonest/version_3_2_1 0.0000 0.0020 0.0008 0.0003 0.5078 1.0000 0.1697 0.1404 0.0464 0.6304 0.0058 0.4628 0.1433 0.2845 0.0021 0.0399 0.0009 0.0029 0.3887
gkiokas2012/default 0.0000 0.0001 0.0000 0.0000 0.0306 0.1697 1.0000 0.0008 0.0003 0.0498 0.2243 0.0311 0.0125 0.0365 0.0000 0.0005 0.0000 0.0001 0.0166
klapuri2006/percival2014 0.0063 0.0859 0.0440 0.0206 0.1988 0.1404 0.0008 1.0000 0.3551 0.1568 0.0000 0.3418 0.7835 0.5433 0.0491 0.4473 0.0110 0.1118 0.3055
oliveira2010/ibt 0.0142 0.1726 0.0835 0.0393 0.0322 0.0464 0.0003 0.3551 1.0000 0.0461 0.0000 0.1541 0.3918 0.2564 0.1053 0.7984 0.0348 0.1904 0.0597
percival2014/stem 0.0001 0.0035 0.0013 0.0005 0.8137 0.6304 0.0498 0.1568 0.0461 1.0000 0.0004 0.7904 0.3085 0.6317 0.0028 0.0644 0.0004 0.0057 0.6471
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0003 0.0058 0.2243 0.0000 0.0000 0.0004 1.0000 0.0004 0.0000 0.0005 0.0000 0.0000 0.0000 0.0000 0.0001
schreiber2014/default 0.0001 0.0100 0.0046 0.0013 0.9837 0.4628 0.0311 0.3418 0.1541 0.7904 0.0004 1.0000 0.4617 0.8023 0.0047 0.1249 0.0022 0.0141 0.8667
schreiber2017/ismir2017 0.0013 0.0227 0.0126 0.0030 0.5866 0.1433 0.0125 0.7835 0.3918 0.3085 0.0000 0.4617 1.0000 0.5926 0.0307 0.2988 0.0065 0.0478 0.5993
schreiber2017/mirex2017 0.0016 0.0267 0.0098 0.0026 0.8505 0.2845 0.0365 0.5433 0.2564 0.6317 0.0005 0.8023 0.5926 1.0000 0.0178 0.1656 0.0044 0.0286 0.9227
schreiber2018/cnn 0.2930 0.8841 0.8483 0.3900 0.0065 0.0021 0.0000 0.0491 0.1053 0.0028 0.0000 0.0047 0.0307 0.0178 1.0000 0.1861 0.7209 0.9233 0.0063
schreiber2018/fcn 0.0484 0.3384 0.2397 0.0804 0.1314 0.0399 0.0005 0.4473 0.7984 0.0644 0.0000 0.1249 0.2988 0.1656 0.1861 1.0000 0.0821 0.3125 0.1189
schreiber2018/ismir2018 0.3820 0.6409 0.9222 0.5334 0.0014 0.0009 0.0000 0.0110 0.0348 0.0004 0.0000 0.0022 0.0065 0.0044 0.7209 0.0821 1.0000 0.6984 0.0004
sun2021/default 0.2382 0.9569 0.7784 0.3175 0.0215 0.0029 0.0001 0.1118 0.1904 0.0057 0.0000 0.0141 0.0478 0.0286 0.9233 0.3125 0.6984 1.0000 0.0111
zplane/auftakt_v3 0.0003 0.0082 0.0025 0.0009 0.8816 0.3887 0.0166 0.3055 0.0597 0.6471 0.0001 0.8667 0.5993 0.9227 0.0063 0.1189 0.0004 0.0111 1.0000

Table 25: Paired t-test p-values for AOE1, using reference annotations 2.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.2192 0.3690 0.8520 0.0006 0.0000 0.0000 0.0065 0.0145 0.0001 0.0000 0.0001 0.0013 0.0015 0.2893 0.0507 0.4143 0.2310 0.0003
boeck2019/multi_task 0.2192 1.0000 0.6426 0.2945 0.0190 0.0019 0.0001 0.0864 0.1726 0.0037 0.0000 0.0092 0.0229 0.0249 0.8935 0.3471 0.6018 0.9793 0.0084
boeck2019/multi_task_hjdb 0.3690 0.6426 1.0000 0.4415 0.0069 0.0007 0.0000 0.0447 0.0840 0.0014 0.0000 0.0043 0.0130 0.0090 0.8374 0.2476 0.8793 0.7580 0.0026
boeck2020/dar 0.8520 0.2945 0.4415 1.0000 0.0029 0.0003 0.0000 0.0206 0.0391 0.0005 0.0000 0.0011 0.0030 0.0023 0.3797 0.0832 0.5638 0.3005 0.0010
davies2009/mirex_qm_tempotracker 0.0006 0.0190 0.0069 0.0029 1.0000 0.4803 0.0279 0.2053 0.0334 0.8179 0.0003 0.9461 0.6011 0.8851 0.0073 0.1305 0.0013 0.0233 0.8878
echonest/version_3_2_1 0.0000 0.0019 0.0007 0.0003 0.4803 1.0000 0.1687 0.1300 0.0424 0.5915 0.0062 0.4674 0.1370 0.2862 0.0019 0.0352 0.0007 0.0029 0.3682
gkiokas2012/default 0.0000 0.0001 0.0000 0.0000 0.0279 0.1687 1.0000 0.0007 0.0002 0.0443 0.2315 0.0314 0.0116 0.0365 0.0000 0.0004 0.0000 0.0001 0.0145
klapuri2006/percival2014 0.0065 0.0864 0.0447 0.0206 0.2053 0.1300 0.0007 1.0000 0.3602 0.1617 0.0000 0.3170 0.7720 0.5189 0.0522 0.4414 0.0099 0.1179 0.3057
oliveira2010/ibt 0.0145 0.1726 0.0840 0.0391 0.0334 0.0424 0.0002 0.3602 1.0000 0.0482 0.0000 0.1420 0.3850 0.2426 0.1111 0.7875 0.0310 0.2004 0.0591
percival2014/stem 0.0001 0.0037 0.0014 0.0005 0.8179 0.5915 0.0443 0.1617 0.0482 1.0000 0.0004 0.8466 0.3246 0.6755 0.0031 0.0657 0.0004 0.0064 0.6593
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0003 0.0062 0.2315 0.0000 0.0000 0.0004 1.0000 0.0005 0.0000 0.0006 0.0000 0.0000 0.0000 0.0000 0.0001
schreiber2014/default 0.0001 0.0092 0.0043 0.0011 0.9461 0.4674 0.0314 0.3170 0.1420 0.8466 0.0005 1.0000 0.4413 0.7998 0.0043 0.1125 0.0017 0.0138 0.8243
schreiber2017/ismir2017 0.0013 0.0229 0.0130 0.0030 0.6011 0.1370 0.0116 0.7720 0.3850 0.3246 0.0000 0.4413 1.0000 0.5648 0.0312 0.2870 0.0057 0.0498 0.6128
schreiber2017/mirex2017 0.0015 0.0249 0.0090 0.0023 0.8851 0.2862 0.0365 0.5189 0.2426 0.6755 0.0006 0.7998 0.5648 1.0000 0.0171 0.1515 0.0037 0.0280 0.9588
schreiber2018/cnn 0.2893 0.8935 0.8374 0.3797 0.0073 0.0019 0.0000 0.0522 0.1111 0.0031 0.0000 0.0043 0.0312 0.0171 1.0000 0.1986 0.6613 0.9108 0.0066
schreiber2018/fcn 0.0507 0.3471 0.2476 0.0832 0.1305 0.0352 0.0004 0.4414 0.7875 0.0657 0.0000 0.1125 0.2870 0.1515 0.1986 1.0000 0.0781 0.3323 0.1174
schreiber2018/ismir2018 0.4143 0.6018 0.8793 0.5638 0.0013 0.0007 0.0000 0.0099 0.0310 0.0004 0.0000 0.0017 0.0057 0.0037 0.6613 0.0781 1.0000 0.6399 0.0003
sun2021/default 0.2310 0.9793 0.7580 0.3005 0.0233 0.0029 0.0001 0.1179 0.2004 0.0064 0.0000 0.0138 0.0498 0.0280 0.9108 0.3323 0.6399 1.0000 0.0122
zplane/auftakt_v3 0.0003 0.0084 0.0026 0.0010 0.8878 0.3682 0.0145 0.3057 0.0591 0.6593 0.0001 0.8243 0.6128 0.9588 0.0066 0.1174 0.0003 0.0122 1.0000

Table 26: Paired t-test p-values for AOE1, using reference annotations 3.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.9060 0.5168 0.4308 0.0000 0.1275 0.0461 0.0022 0.0000 0.1672 0.0000 0.0238 0.0103 0.0330 0.0025 0.1621 0.0198 0.0797 0.0003
boeck2019/multi_task 0.9060 1.0000 0.6816 0.6050 0.0000 0.0640 0.0539 0.0068 0.0000 0.2931 0.0000 0.0750 0.0059 0.0463 0.0048 0.2258 0.0163 0.3371 0.0006
boeck2019/multi_task_hjdb 0.5168 0.6816 1.0000 0.3054 0.0000 0.2677 0.1180 0.0081 0.0000 0.3861 0.0000 0.0941 0.0216 0.0717 0.0105 0.3448 0.0562 0.4601 0.0015
boeck2020/dar 0.4308 0.6050 0.3054 1.0000 0.0000 0.0584 0.0257 0.0010 0.0000 0.0855 0.0000 0.0068 0.0049 0.0151 0.0008 0.1016 0.0062 0.0398 0.0001
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0002 0.0047 0.2812 0.1873 0.0000 0.0000 0.0183 0.0246 0.0339 0.3224 0.0007 0.0232 0.0001 0.9936
echonest/version_3_2_1 0.1275 0.0640 0.2677 0.0584 0.0002 1.0000 0.6962 0.0344 0.0002 0.7261 0.0000 0.6212 0.4650 0.5042 0.0956 0.8139 0.5058 0.5014 0.0164
gkiokas2012/default 0.0461 0.0539 0.1180 0.0257 0.0047 0.6962 1.0000 0.1517 0.0007 0.3903 0.0000 0.9594 0.6162 0.8127 0.1800 0.4263 0.6554 0.2375 0.0266
klapuri2006/percival2014 0.0022 0.0068 0.0081 0.0010 0.2812 0.0344 0.1517 1.0000 0.0328 0.0483 0.0000 0.2258 0.3421 0.2496 0.9375 0.0443 0.3750 0.0210 0.4732
oliveira2010/ibt 0.0000 0.0000 0.0000 0.0000 0.1873 0.0002 0.0007 0.0328 1.0000 0.0000 0.0030 0.0047 0.0017 0.0013 0.0401 0.0008 0.0024 0.0002 0.2674
percival2014/stem 0.1672 0.2931 0.3861 0.0855 0.0000 0.7261 0.3903 0.0483 0.0000 1.0000 0.0000 0.3870 0.1518 0.2931 0.0049 0.8921 0.1020 0.7203 0.0033
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0030 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001
schreiber2014/default 0.0238 0.0750 0.0941 0.0068 0.0183 0.6212 0.9594 0.2258 0.0047 0.3870 0.0000 1.0000 0.6829 0.8442 0.2252 0.4292 0.7466 0.1925 0.0654
schreiber2017/ismir2017 0.0103 0.0059 0.0216 0.0049 0.0246 0.4650 0.6162 0.3421 0.0017 0.1518 0.0000 0.6829 1.0000 0.7473 0.4173 0.2605 0.9264 0.0895 0.0738
schreiber2017/mirex2017 0.0330 0.0463 0.0717 0.0151 0.0339 0.5042 0.8127 0.2496 0.0013 0.2931 0.0000 0.8442 0.7473 1.0000 0.3092 0.4097 0.8944 0.1945 0.0581
schreiber2018/cnn 0.0025 0.0048 0.0105 0.0008 0.3224 0.0956 0.1800 0.9375 0.0401 0.0049 0.0000 0.2252 0.4173 0.3092 1.0000 0.0493 0.1908 0.0246 0.3172
schreiber2018/fcn 0.1621 0.2258 0.3448 0.1016 0.0007 0.8139 0.4263 0.0443 0.0008 0.8921 0.0000 0.4292 0.2605 0.4097 0.0493 1.0000 0.2682 0.6235 0.0067
schreiber2018/ismir2018 0.0198 0.0163 0.0562 0.0062 0.0232 0.5058 0.6554 0.3750 0.0024 0.1020 0.0000 0.7466 0.9264 0.8944 0.1908 0.2682 1.0000 0.0921 0.0625
sun2021/default 0.0797 0.3371 0.4601 0.0398 0.0001 0.5014 0.2375 0.0210 0.0002 0.7203 0.0000 0.1925 0.0895 0.1945 0.0246 0.6235 0.0921 1.0000 0.0038
zplane/auftakt_v3 0.0003 0.0006 0.0015 0.0001 0.9936 0.0164 0.0266 0.4732 0.2674 0.0033 0.0001 0.0654 0.0738 0.0581 0.3172 0.0067 0.0625 0.0038 1.0000

Table 27: Paired t-test p-values for AOE2, using reference annotations 1.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.8291 0.8866 0.6496 0.0000 0.0422 0.0593 0.0023 0.0000 0.3065 0.0000 0.0057 0.0241 0.0486 0.0019 0.1644 0.0079 0.0354 0.0005
boeck2019/multi_task 0.8291 1.0000 0.7342 0.9209 0.0000 0.0123 0.0445 0.0046 0.0000 0.3249 0.0000 0.0157 0.0083 0.0446 0.0022 0.1572 0.0035 0.1375 0.0005
boeck2019/multi_task_hjdb 0.8866 0.7342 1.0000 0.6756 0.0000 0.0698 0.0920 0.0051 0.0000 0.4078 0.0000 0.0171 0.0290 0.0648 0.0048 0.2303 0.0162 0.1568 0.0011
boeck2020/dar 0.6496 0.9209 0.6756 1.0000 0.0000 0.0301 0.0437 0.0022 0.0000 0.2510 0.0000 0.0026 0.0215 0.0355 0.0012 0.1505 0.0045 0.0277 0.0002
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0009 0.0008 0.1181 0.1561 0.0000 0.0000 0.0360 0.0039 0.0111 0.1941 0.0007 0.0312 0.0002 0.6198
echonest/version_3_2_1 0.0422 0.0123 0.0698 0.0301 0.0009 1.0000 0.7802 0.1836 0.0005 0.2631 0.0000 0.7055 0.8053 0.9399 0.2138 0.4377 0.6496 0.3274 0.0840
gkiokas2012/default 0.0593 0.0445 0.0920 0.0437 0.0008 0.7802 1.0000 0.1562 0.0002 0.3259 0.0000 0.6070 0.8167 0.8904 0.1274 0.5501 0.3844 0.3660 0.0395
klapuri2006/percival2014 0.0023 0.0046 0.0051 0.0022 0.1181 0.1836 0.1562 1.0000 0.0115 0.0385 0.0000 0.4580 0.2377 0.2062 0.9621 0.0655 0.5974 0.0356 0.5048
oliveira2010/ibt 0.0000 0.0000 0.0000 0.0000 0.1561 0.0005 0.0002 0.0115 1.0000 0.0000 0.0016 0.0065 0.0002 0.0003 0.0178 0.0005 0.0027 0.0001 0.1088
percival2014/stem 0.3065 0.3249 0.4078 0.2510 0.0000 0.2631 0.3259 0.0385 0.0000 1.0000 0.0000 0.1491 0.2293 0.3096 0.0021 0.7092 0.0386 0.9185 0.0027
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0016 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0057 0.0157 0.0171 0.0026 0.0360 0.7055 0.6070 0.4580 0.0065 0.1491 0.0000 1.0000 0.7438 0.6815 0.3991 0.2539 0.8290 0.1145 0.1722
schreiber2017/ismir2017 0.0241 0.0083 0.0290 0.0215 0.0039 0.8053 0.8167 0.2377 0.0002 0.2293 0.0000 0.7438 1.0000 0.9197 0.2382 0.4713 0.5222 0.2616 0.0534
schreiber2017/mirex2017 0.0486 0.0446 0.0648 0.0355 0.0111 0.9399 0.8904 0.2062 0.0003 0.3096 0.0000 0.6815 0.9197 1.0000 0.2210 0.5493 0.5396 0.3436 0.0490
schreiber2018/cnn 0.0019 0.0022 0.0048 0.0012 0.1941 0.2138 0.1274 0.9621 0.0178 0.0021 0.0000 0.3991 0.2382 0.2210 1.0000 0.0509 0.3561 0.0349 0.4262
schreiber2018/fcn 0.1644 0.1572 0.2303 0.1505 0.0007 0.4377 0.5501 0.0655 0.0005 0.7092 0.0000 0.2539 0.4713 0.5493 0.0509 1.0000 0.1582 0.7573 0.0131
schreiber2018/ismir2018 0.0079 0.0035 0.0162 0.0045 0.0312 0.6496 0.3844 0.5974 0.0027 0.0386 0.0000 0.8290 0.5222 0.5396 0.3561 0.1582 1.0000 0.0812 0.1594
sun2021/default 0.0354 0.1375 0.1568 0.0277 0.0002 0.3274 0.3660 0.0356 0.0001 0.9185 0.0000 0.1145 0.2616 0.3436 0.0349 0.7573 0.0812 1.0000 0.0093
zplane/auftakt_v3 0.0005 0.0005 0.0011 0.0002 0.6198 0.0840 0.0395 0.5048 0.1088 0.0027 0.0000 0.1722 0.0534 0.0490 0.4262 0.0131 0.1594 0.0093 1.0000

Table 28: Paired t-test p-values for AOE2, using reference annotations 2.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default boeck2019/multi_task boeck2019/multi_task_hjdb boeck2020/dar davies2009/mirex_qm_tempotracker echonest/version_3_2_1 gkiokas2012/default klapuri2006/percival2014 oliveira2010/ibt percival2014/stem scheirer1998/percival2014 schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018 sun2021/default zplane/auftakt_v3
boeck2015/tempodetector2016_default 1.0000 0.9840 0.8485 0.6923 0.0000 0.0333 0.0377 0.0020 0.0000 0.2639 0.0000 0.0022 0.0175 0.0340 0.0013 0.1243 0.0069 0.0136 0.0002
boeck2019/multi_task 0.9840 1.0000 0.8621 0.8305 0.0000 0.0120 0.0363 0.0055 0.0000 0.3522 0.0000 0.0124 0.0088 0.0411 0.0022 0.1500 0.0040 0.1249 0.0003
boeck2019/multi_task_hjdb 0.8485 0.8621 1.0000 0.6846 0.0000 0.0586 0.0637 0.0047 0.0000 0.3819 0.0000 0.0098 0.0220 0.0495 0.0042 0.1882 0.0156 0.1017 0.0006
boeck2020/dar 0.6923 0.8305 0.6846 1.0000 0.0000 0.0241 0.0311 0.0019 0.0000 0.2249 0.0000 0.0011 0.0173 0.0260 0.0011 0.1216 0.0041 0.0163 0.0001
davies2009/mirex_qm_tempotracker 0.0000 0.0000 0.0000 0.0000 1.0000 0.0012 0.0014 0.1337 0.1112 0.0000 0.0000 0.0459 0.0053 0.0167 0.2309 0.0011 0.0315 0.0002 0.8510
echonest/version_3_2_1 0.0333 0.0120 0.0586 0.0241 0.0012 1.0000 0.8508 0.1813 0.0004 0.2386 0.0000 0.6388 0.8436 0.9852 0.2052 0.4819 0.7166 0.3397 0.0481
gkiokas2012/default 0.0377 0.0363 0.0637 0.0311 0.0014 0.8508 1.0000 0.1959 0.0003 0.2471 0.0000 0.6076 0.8728 0.9132 0.1437 0.5160 0.4905 0.3428 0.0298
klapuri2006/percival2014 0.0020 0.0055 0.0047 0.0019 0.1337 0.1813 0.1959 1.0000 0.0099 0.0339 0.0000 0.5105 0.2604 0.2408 0.9205 0.0823 0.5724 0.0414 0.3757
oliveira2010/ibt 0.0000 0.0000 0.0000 0.0000 0.1112 0.0004 0.0003 0.0099 1.0000 0.0000 0.0020 0.0070 0.0002 0.0005 0.0179 0.0005 0.0017 0.0001 0.1536
percival2014/stem 0.2639 0.3522 0.3819 0.2249 0.0000 0.2386 0.2471 0.0339 0.0000 1.0000 0.0000 0.1045 0.1920 0.2565 0.0013 0.6254 0.0330 0.8293 0.0011
scheirer1998/percival2014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0020 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2014/default 0.0022 0.0124 0.0098 0.0011 0.0459 0.6388 0.6076 0.5105 0.0070 0.1045 0.0000 1.0000 0.6984 0.6623 0.4187 0.2328 0.9391 0.1037 0.1346
schreiber2017/ismir2017 0.0175 0.0088 0.0220 0.0173 0.0053 0.8436 0.8728 0.2604 0.0002 0.1920 0.0000 0.6984 1.0000 0.9653 0.2304 0.4808 0.5831 0.2681 0.0355
schreiber2017/mirex2017 0.0340 0.0411 0.0495 0.0260 0.0167 0.9852 0.9132 0.2408 0.0005 0.2565 0.0000 0.6623 0.9653 1.0000 0.2401 0.5416 0.6197 0.3376 0.0350
schreiber2018/cnn 0.0013 0.0022 0.0042 0.0011 0.2309 0.2052 0.1437 0.9205 0.0179 0.0013 0.0000 0.4187 0.2304 0.2401 1.0000 0.0486 0.2729 0.0364 0.3148
schreiber2018/fcn 0.1243 0.1500 0.1882 0.1216 0.0011 0.4819 0.5160 0.0823 0.0005 0.6254 0.0000 0.2328 0.4808 0.5416 0.0486 1.0000 0.1934 0.7559 0.0081
schreiber2018/ismir2018 0.0069 0.0040 0.0156 0.0041 0.0315 0.7166 0.4905 0.5724 0.0017 0.0330 0.0000 0.9391 0.5831 0.6197 0.2729 0.1934 1.0000 0.0981 0.0833
sun2021/default 0.0136 0.1249 0.1017 0.0163 0.0002 0.3397 0.3428 0.0414 0.0001 0.8293 0.0000 0.1037 0.2681 0.3376 0.0364 0.7559 0.0981 1.0000 0.0061
zplane/auftakt_v3 0.0002 0.0003 0.0006 0.0001 0.8510 0.0481 0.0298 0.3757 0.1536 0.0011 0.0000 0.1346 0.0355 0.0350 0.3148 0.0081 0.0833 0.0061 1.0000

Table 29: Paired t-test p-values for AOE2, using reference annotations 3.0 as ground truth. H0: the true mean difference between paired samples is zero. If p ≤ α, we reject H0, i.e., the estimates from the two algorithms differ significantly. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

AOE1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?
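The subset selection behind these curves can be sketched as follows. The sketch assumes that cvar is the coefficient of variation of a track's inter-beat intervals (standard deviation divided by mean); the function names are hypothetical and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def cvar(beat_times):
    """Coefficient of variation of inter-beat intervals (assumed definition)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not ibis:
        raise ValueError("need at least two beats")
    return pstdev(ibis) / mean(ibis)


def mean_error_below(tau, cvars, errors):
    """Mean error over tracks whose cvar is strictly below tau.

    Returns None when no track qualifies.
    """
    selected = [e for c, e in zip(cvars, errors) if c < tau]
    return mean(selected) if selected else None
```

Sweeping tau from small to large values adds progressively less stable tracks to the subset, so each curve shows how an estimator's mean error changes as tempo stability is relaxed.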

AOE1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 71: Mean AOE1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 72: Mean AOE1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 73: Mean AOE1 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

AOE2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 74: Mean AOE2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 75: Mean AOE2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on cvar-Subsets for 3.0 based on cvar-Values from 1.0

Figure 76: Mean AOE2 compared to version 3.0 for tracks with cvar < τ based on beat annotations from 3.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
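One point of such a graph can be sketched as below: select the tracks whose reference tempo lies in [T-10,T+10] BPM and average their errors. The function name is hypothetical. Returning the subset size alongside the mean is a choice made here so that values based on very few estimates can be spotted.

```python
from statistics import mean


def mean_error_around(t, reference_bpms, errors, half_width=10.0):
    """Mean error for tracks with reference tempo in [t - hw, t + hw].

    Returns (mean, count); the mean is None for an empty subset.
    """
    selected = [
        e
        for r, e in zip(reference_bpms, errors)
        if t - half_width <= r <= t + half_width
    ]
    if not selected:
        return None, 0
    return mean(selected), len(selected)
```

Evaluating this for a range of T values yields a curve like the ones plotted; neighboring points share tracks because the 20 BPM windows overlap.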

AOE1 on Tempo-Subsets for 1.0

Figure 77: Mean AOE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets for 2.0

Figure 78: Mean AOE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets for 3.0

Figure 79: Mean AOE1 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE2 on Tempo-Subsets for 1.0

Figure 80: Mean AOE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets for 2.0

Figure 81: Mean AOE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets for 3.0

Figure 82: Mean AOE2 for estimates compared to version 3.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo

When fitting a generalized additive model (GAM) to AOE1 values as a function of the ground-truth tempo, what AOE1 can we expect, and with what confidence?

Estimated AOE1 for Tempo for 1.0

Predictions of GAMs trained on AOE1 for estimates for reference 1.0.

Figure 83: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo for 2.0

Predictions of GAMs trained on AOE1 for estimates for reference 2.0.

Figure 84: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo for 3.0

Predictions of GAMs trained on AOE1 for estimates for reference 3.0.

Figure 85: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo

When fitting a generalized additive model (GAM) to AOE2 values as a function of the ground-truth tempo, what AOE2 can we expect, and with what confidence?

Estimated AOE2 for Tempo for 1.0

Predictions of GAMs trained on AOE2 for estimates for reference 1.0.

Figure 86: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo for 2.0

Predictions of GAMs trained on AOE2 for estimates for reference 2.0.

Figure 87: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo for 3.0

Predictions of GAMs trained on AOE2 for estimates for reference 3.0.

Figure 88: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 3.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.
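The per-tag breakdown amounts to grouping per-track errors by tag and averaging within each group. The following is a minimal sketch with a hypothetical function name and placeholder tag names; it reports the group size next to each mean so that tags covering very few tracks are visible.

```python
from collections import defaultdict
from statistics import mean


def mean_error_per_tag(track_tags, errors):
    """Map each tag to (mean error, number of tracks carrying the tag).

    track_tags holds one iterable of tags per track, aligned with errors.
    A track with several tags contributes to every one of its tags.
    """
    by_tag = defaultdict(list)
    for tags, error in zip(track_tags, errors):
        for tag in set(tags):  # count each tag once per track
            by_tag[tag].append(error)
    return {tag: (mean(vals), len(vals)) for tag, vals in sorted(by_tag.items())}
```

For example, `mean_error_per_tag([["tag_a"], ["tag_a", "tag_b"]], [0.0, 1.0])` groups both tracks under `tag_a` and only the second under `tag_b`.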

AOE1 for ‘tag_open’ Tags for 1.0

Figure 89: AOE1 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE1 for ‘tag_open’ Tags for 2.0

Figure 90: AOE1 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE1 for ‘tag_open’ Tags for 3.0

Figure 91: AOE1 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE2 for ‘tag_open’ Tags

How well does an estimator perform when only tracks tagged with a given label are taken into account? Note that some values may be based on very few estimates.

AOE2 for ‘tag_open’ Tags for 1.0

Figure 92: AOE2 of estimates compared to version 1.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE2 for ‘tag_open’ Tags for 2.0

Figure 93: AOE2 of estimates compared to version 2.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG

AOE2 for ‘tag_open’ Tags for 3.0

Figure 94: AOE2 of estimates compared to version 3.0 depending on tag from namespace ‘tag_open’.

SVG PDF PNG


Generated by tempo_eval 0.1.1 on 2022-06-29 18:44. Size L.