wjd

This is the tempo_eval report for the ‘wjd’ corpus.

Reports for other corpora may be found here.

References for ‘wjd’

References

1.0

Attribute Value
Corpus wjd
Version 1.0
Curator Martin Pfleiderer
Data Source manual annotation
Annotation Tools derived from beat annotations
Annotation Rules mean of inter-beat intervals
Annotator, bibtex Pfleiderer2017
Annotator, ref_url https://jazzomat.hfm-weimar.de

2.0

Attribute Value
Corpus wjd
Version 2.0
Curator Hendrik Schreiber
Data Source manual annotation
Annotation Tools derived from beat annotations
Annotation Rules median of corresponding inter-beat intervals
Annotator, bibtex Pfleiderer2017
Annotator, ref_url https://jazzomat.hfm-weimar.de
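
The two reference versions differ only in how the inter-beat intervals (IBIs) derived from the beat annotations are aggregated: version 1.0 uses their mean, version 2.0 their median. The following Python sketch illustrates the difference on a made-up beat sequence with one missed beat. The function names are hypothetical, beat times are assumed to be in seconds, and the sketch simply uses all IBIs rather than modeling what "corresponding" means in the 2.0 rule.

```python
from statistics import mean, median


def inter_beat_intervals(beat_times):
    """Differences between consecutive beat times (in seconds)."""
    return [b - a for a, b in zip(beat_times, beat_times[1:])]


def tempo_mean_ibi(beat_times):
    """Tempo in BPM from the mean inter-beat interval (version 1.0 rule)."""
    return 60.0 / mean(inter_beat_intervals(beat_times))


def tempo_median_ibi(beat_times):
    """Tempo in BPM from the median inter-beat interval (simplified version 2.0 rule)."""
    return 60.0 / median(inter_beat_intervals(beat_times))


# Hypothetical beat times at 120 BPM with one missed beat (gap between 2.0 s and 3.0 s).
beats = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
print(round(tempo_mean_ibi(beats), 2))    # 100.0
print(round(tempo_median_ibi(beats), 2))  # 120.0
```

In this example a single missed beat pulls the mean-based tempo down by 20 BPM, while the median-based tempo is unaffected.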

Basic Statistics

Reference | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage
1.0 | 456 | 36.99 | 361.84 | 177.45 | 68.50 | 124.00 | 0.60
2.0 | 456 | 36.06 | 360.83 | 177.24 | 68.45 | 124.00 | 0.61

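Table 1: Basic statistics. Min, Max, Avg, Stdev and Sweet Oct. Start are given in BPM; Sweet Oct. Coverage is the fraction of values that lie within the sweet octave.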
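
The sweet-octave columns summarize where most tempo values lie. A minimal sketch, assuming the sweet octave is the half-open interval [s, 2s) with an integer start s in BPM that contains the largest fraction of values, with ties resolved toward the smallest s; tempo_eval's exact search range and tie-breaking may differ, and the function name is hypothetical.

```python
import math


def sweet_octave(tempi):
    """Return (start, coverage) of the octave [s, 2s) that holds the most values.

    Assumes integer start values s in BPM; ties go to the smallest s.
    """
    best_start, best_count = None, -1
    for s in range(1, math.ceil(max(tempi)) + 1):
        count = sum(1 for t in tempi if s <= t < 2 * s)
        if count > best_count:  # strict '>' keeps the smallest s on ties
            best_start, best_count = s, count
    return best_start, best_count / len(tempi)


tempi = [60, 70, 80, 130, 140, 150, 200]  # made-up BPM values
print(sweet_octave(tempi))  # (76, 0.5714285714285714)
```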

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

Beat-Based Tempo Variation

Figure 2: Fraction of beat-annotated tracks in the dataset with cvar < τ.
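
A minimal sketch of the quantity plotted in Figure 2, assuming cvar is the coefficient of variation (population standard deviation divided by mean) of a track's inter-beat intervals. The helper names are hypothetical and the beat times are made up.

```python
from statistics import mean, pstdev


def cvar(beat_times):
    """Coefficient of variation of the inter-beat intervals (assumed definition)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return pstdev(ibis) / mean(ibis)


def fraction_below(cvars, tau):
    """Fraction of tracks whose cvar is strictly below the threshold tau."""
    return sum(1 for c in cvars if c < tau) / len(cvars)


steady = [0.0, 0.5, 1.0, 1.5, 2.0]  # constant 120 BPM
drifting = [0.0, 0.5, 1.5]          # IBIs of 0.5 s and 1.0 s
cvars = [cvar(steady), cvar(drifting)]
print(fraction_below(cvars, 0.1))   # 0.5
```

Evaluating `fraction_below` for a range of τ values yields a cumulative curve of the kind shown in the figure.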

Estimates for ‘wjd’

Estimators

boeck2015/tempodetector2016_default

Attribute Value
Corpus wjd
Version 0.17.dev0
Annotation Tools TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex Boeck2015

davies2009/mirex_qm_tempotracker

Attribute Value
Corpus wjd
Version 1.0
Annotation Tools QM Tempotracker, Sonic Annotator plugin, https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker; note that the current macOS build of ‘qm-vamp-plugins’ was used.
Annotator, bibtex Davies2009 Davies2007

percival2014/stem

Attribute Value
Corpus wjd
Version 1.0
Annotation Tools percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex Percival2014

schreiber2014/default

Attribute Value
Corpus wjd
Version 0.0.1
Annotation Tools schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2014

schreiber2017/ismir2017

Attribute Value
Corpus wjd
Version 0.0.4
Annotation Tools schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2017/mirex2017

Attribute Value
Corpus wjd
Version 0.0.4
Annotation Tools schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2018/cnn

Attribute Value
Corpus wjd
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute Value
Corpus wjd
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

schreiber2018/ismir2018

Attribute Value
Corpus wjd
Version 0.0.2
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

Basic Statistics

Estimator | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage
boeck2015/tempodetector2016_default | 456 | 48.39 | 240.00 | 131.67 | 41.00 | 97.00 | 0.73
davies2009/mirex_qm_tempotracker | 456 | 69.84 | 215.33 | 130.12 | 25.81 | 90.00 | 0.91
percival2014/stem | 456 | 53.42 | 20671.90 | 147.51 | 963.58 | 68.00 | 0.82
schreiber2014/default | 456 | 56.99 | 149.43 | 97.74 | 23.60 | 68.00 | 0.87
schreiber2017/ismir2017 | 456 | 54.51 | 199.67 | 104.19 | 26.64 | 69.00 | 0.90
schreiber2017/mirex2017 | 456 | 54.81 | 198.72 | 104.16 | 26.32 | 69.00 | 0.87
schreiber2018/cnn | 456 | 61.00 | 232.00 | 139.60 | 39.74 | 109.00 | 0.78
schreiber2018/fcn | 456 | 48.00 | 237.00 | 126.78 | 40.12 | 101.00 | 0.68
schreiber2018/ismir2018 | 456 | 67.00 | 237.00 | 137.52 | 34.93 | 101.00 | 0.82

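Table 2: Basic statistics. Min, Max, Avg, Stdev and Sweet Oct. Start are given in BPM; Sweet Oct. Coverage is the fraction of values that lie within the sweet octave.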

Smoothed Tempo Distribution

Figure 3: Percentage of values in tempo interval.

Accuracy

Accuracy1 is defined as the fraction of correct estimates, where an estimate counts as correct if it lies within 4% of the reference BPM value.

Accuracy2 additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].
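
The two accuracy measures can be sketched as follows. This assumes the 4% tolerance is relative to the reference tempo and that, for Accuracy2, the estimate is rescaled by each allowed factor before the Accuracy1 test is applied. Mean accuracy is the fraction of items judged correct, matching the 0 to 1 values reported in the tables. Function names and data are hypothetical.

```python
FACTORS = (1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)


def accuracy1(estimate, reference, tolerance=0.04):
    """Correct if the estimate lies within +/- tolerance * reference of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def accuracy2(estimate, reference, tolerance=0.04):
    """Like accuracy1, but also accepts estimates off by a factor of 2, 3, 1/2 or 1/3."""
    return any(accuracy1(estimate * f, reference, tolerance) for f in FACTORS)


def mean_accuracy(pairs, measure):
    """Fraction of (estimate, reference) pairs judged correct by `measure`."""
    return sum(1 for e, r in pairs if measure(e, r)) / len(pairs)


pairs = [(103.0, 100.0), (200.0, 100.0), (150.0, 100.0), (299.0, 100.0)]
print(mean_accuracy(pairs, accuracy1))  # 0.25
print(mean_accuracy(pairs, accuracy2))  # 0.75
```

Passing a different `tolerance` and averaging again gives the accuracy-versus-tolerance curves shown in the figures below.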

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set, or that the test set may even have been created using one of the tested algorithms.

Accuracy Results for 1.0

Estimator Accuracy1 Accuracy2
schreiber2018/cnn 0.5811 0.9627
schreiber2018/ismir2018 0.5789 0.9496
boeck2015/tempodetector2016_default 0.5680 0.9912
schreiber2018/fcn 0.4934 0.9276
davies2009/mirex_qm_tempotracker 0.4320 0.9276
schreiber2017/mirex2017 0.3026 0.9013
schreiber2017/ismir2017 0.3004 0.8838
percival2014/stem 0.2675 0.8991
schreiber2014/default 0.2215 0.8947

Table 3: Mean accuracy of estimates compared to version 1.0 with 4% tolerance, ordered by Accuracy1.

Accuracy1 for 1.0

Figure 4: Mean Accuracy1 for estimates compared to version 1.0 depending on tolerance.

Accuracy2 for 1.0

Figure 5: Mean Accuracy2 for estimates compared to version 1.0 depending on tolerance.

Accuracy Results for 2.0

Estimator Accuracy1 Accuracy2
schreiber2018/cnn 0.5877 0.9715
schreiber2018/ismir2018 0.5833 0.9605
boeck2015/tempodetector2016_default 0.5724 0.9956
schreiber2018/fcn 0.5022 0.9364
davies2009/mirex_qm_tempotracker 0.4364 0.9364
schreiber2017/mirex2017 0.3048 0.9101
schreiber2017/ismir2017 0.3026 0.8925
percival2014/stem 0.2675 0.9079
schreiber2014/default 0.2215 0.8925

Table 4: Mean accuracy of estimates compared to version 2.0 with 4% tolerance, ordered by Accuracy1.

Accuracy1 for 2.0

Figure 6: Mean Accuracy1 for estimates compared to version 2.0 depending on tolerance.

Accuracy2 for 2.0

Figure 7: Mean Accuracy2 for estimates compared to version 2.0 depending on tolerance.

Differing Items

For which items did a given estimator fail to estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and are therefore never estimated correctly, regardless of which estimator is used?

Differing Items Accuracy1

Items for which an estimator's tempo differs from the reference annotation (Accuracy1, 4% tolerance), listed per reference version and estimator:

1.0 compared with boeck2015/tempodetector2016_default (197 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BennyGoodman_Whispering_Solo’ … CSV

1.0 compared with davies2009/mirex_qm_tempotracker (259 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ … CSV

1.0 compared with percival2014/stem (334 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

1.0 compared with schreiber2014/default (355 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ … CSV

1.0 compared with schreiber2017/ismir2017 (319 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

1.0 compared with schreiber2017/mirex2017 (318 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

1.0 compared with schreiber2018/cnn (191 differences): ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_BluesForBela_Solo’ … CSV

1.0 compared with schreiber2018/fcn (231 differences): ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BixBeiderbecke_RiverboatShuffle_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_BluesForBela_Solo’ … CSV

1.0 compared with schreiber2018/ismir2018 (192 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (195 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BennyGoodman_Whispering_Solo’ … CSV

2.0 compared with davies2009/mirex_qm_tempotracker (257 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ … CSV

2.0 compared with percival2014/stem (334 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

2.0 compared with schreiber2014/default (355 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ … CSV

2.0 compared with schreiber2017/ismir2017 (318 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_ByeByeBlackbird_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

2.0 compared with schreiber2017/mirex2017 (317 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_Desafinado_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BenWebster_NightAndDay_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyCarter_LongAgoAndFarAway-1_Solo’ ‘BennyCarter_LongAgoAndFarAway-2_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ … CSV

2.0 compared with schreiber2018/cnn (188 differences): ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_BluesForBela_Solo’ … CSV

2.0 compared with schreiber2018/fcn (227 differences): ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BixBeiderbecke_RiverboatShuffle_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_BluesForBela_Solo’ … CSV

2.0 compared with schreiber2018/ismir2018 (190 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BennyCarter_JustFriends_Solo’ ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ … CSV

None of the estimators estimated the following 128 items ‘correctly’ using Accuracy1: ‘BennyGoodman_Avalon_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BennyGoodman_Runnin’Wild_Solo’ ‘BennyGoodman_TigerRag-1_Solo’ ‘BennyGoodman_TigerRag-2_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_BluesForBela_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘BranfordMarsalis_ThreeLittleWords_Solo’ ‘BuckClayton_DestinationK.C._Solo’ … CSV
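
A list like the one above can be derived by intersecting the per-estimator sets of incorrectly estimated items. A minimal sketch; the estimator and item names below are hypothetical placeholders, not taken from the report.

```python
def never_correct(failures_by_estimator):
    """Items that appear in the failure list of every estimator."""
    failure_sets = [set(items) for items in failures_by_estimator.values()]
    if not failure_sets:
        return set()
    return set.intersection(*failure_sets)


# Hypothetical failure lists (item IDs are placeholders).
failures = {
    "estimator_a": ["item_1", "item_2", "item_3"],
    "estimator_b": ["item_2", "item_3", "item_4"],
    "estimator_c": ["item_3", "item_2"],
}
print(sorted(never_correct(failures)))  # ['item_2', 'item_3']
```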

Differing Items Accuracy2

Items for which an estimator's tempo differs from the reference annotation (Accuracy2, 4% tolerance), listed per reference version and estimator:

1.0 compared with boeck2015/tempodetector2016_default (4 differences): ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DonByas_OutOfNowhere_Solo’ ‘SonnyStitt_Teapot_Solo’ CSV

1.0 compared with davies2009/mirex_qm_tempotracker (33 differences): ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_HowDeepIsTheOcean_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidLiebman_NoGreaterLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DavidMurray_BodyAndSoul-1_Solo’ … CSV

1.0 compared with percival2014/stem (46 differences): ‘ArtPepper_Stardust-1_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_YouAndTheNightAndTheMusic_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ … CSV

1.0 compared with schreiber2014/default (48 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BobBerg_YouAndTheNightAndTheMusic_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_EmbraceableYou_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_Item1,D.I.T._Solo’ ‘ChrisPotter_PopTune#1_Solo’ … CSV

1.0 compared with schreiber2017/ismir2017 (53 differences): ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_Item1,D.I.T._Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ … CSV

1.0 compared with schreiber2017/mirex2017 (45 differences): ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DizzyGillespie_Be=Bop_Solo’ ‘DonByas_Be=Bop_Solo’ … CSV

1.0 compared with schreiber2018/cnn (17 differences): ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DonByas_OutOfNowhere_Solo’ ‘JoeHenderson_In’NOut-2_Solo’ ‘JohnAbercrombie_Ralph’sPianoWaltz_Solo’ ‘JohnColtrane_BlueTrain_Solo’ ‘JohnColtrane_Impressions_1961_Solo’ ‘JoshuaRedman_SweetSorrow_Solo’ ‘KennyWheeler_DoubleVision_Solo’ … CSV

1.0 compared with schreiber2018/fcn (33 differences): ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidLiebman_Milestones_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DavidMurray_BodyAndSoul-2_Solo’ ‘DizzyGillespie_Be=Bop_Solo’ ‘DonByas_Be=Bop_Solo’ ‘DonByas_OutOfNowhere_Solo’ ‘FatsNavarro_Anthropology_No1_Solo’ … CSV

1.0 compared with schreiber2018/ismir2018 (23 differences): ‘CharlieParker_EmbraceableYou_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_InASentimentalMood_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DexterGordon_Montmartre_Solo’ ‘DonByas_OutOfNowhere_Solo’ ‘JoeHenderson_In’NOut-2_Solo’ ‘JoeLovano_BodyAndSoul-2_Solo’ ‘JohnAbercrombie_Ralph’sPianoWaltz_Solo’ … CSV

2.0 compared with boeck2015/tempodetector2016_default (2 differences): ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ CSV

2.0 compared with davies2009/mirex_qm_tempotracker (29 differences): ‘ArtPepper_Stardust-1_Solo’ ‘ArtPepper_Stardust-2_Solo’ ‘BennyCarter_IGotItBad_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_HowDeepIsTheOcean_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidLiebman_NoGreaterLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DavidMurray_BodyAndSoul-1_Solo’ … CSV

2.0 compared with percival2014/stem (42 differences): ‘ArtPepper_Stardust-1_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BobBerg_Angles_Solo’ ‘BobBerg_YouAndTheNightAndTheMusic_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ … CSV

2.0 compared with schreiber2014/default (49 differences): ‘ArtPepper_Anthropology_Solo’ ‘ArtPepper_BluesForBlanche_Solo’ ‘ArtPepper_InAMellowTone_Solo’ ‘BennyGoodman_HandfulOfKeys_Solo’ ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘BobBerg_YouAndTheNightAndTheMusic_Solo’ ‘BranfordMarsalis_TheNearnessOfYou_Solo’ ‘CharlieParker_EmbraceableYou_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_Item1,D.I.T._Solo’ … CSV

2.0 compared with schreiber2017/ismir2017 (49 differences): ‘BennyGoodman_Nobody’sSweetheart_Solo’ ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_Item1,D.I.T._Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DizzyGillespie_Be=Bop_Solo’ … CSV

2.0 compared with schreiber2017/mirex2017 (41 differences): ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘CliffordBrown_I’llRememberApril_AlternateTake2_Solo’ ‘CliffordBrown_I’llRememberApril_Solo’ ‘DavidLiebman_SecretLove_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DizzyGillespie_Be=Bop_Solo’ ‘DonByas_Be=Bop_Solo’ ‘FatsNavarro_Anthropology_No1_Solo’ … CSV

2.0 compared with schreiber2018/cnn (13 differences): ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘JoeHenderson_In’NOut-2_Solo’ ‘JohnAbercrombie_Ralph’sPianoWaltz_Solo’ ‘JohnColtrane_Impressions_1961_Solo’ ‘JoshuaRedman_SweetSorrow_Solo’ ‘KennyWheeler_DoubleVision_Solo’ ‘MilesDavis_TuneUp_Solo’ ‘RoyEldridge_St.LouisBlues_Solo’ ‘WayneShorter_InfantEyes_Solo’ … CSV

2.0 compared with schreiber2018/fcn (29 differences): ‘CharlieParker_Ko=Ko_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidLiebman_Milestones_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DizzyGillespie_Be=Bop_Solo’ ‘DonByas_Be=Bop_Solo’ ‘FatsNavarro_Anthropology_No1_Solo’ ‘JoeHenderson_In’NOut-2_Solo’ ‘JoeLovano_LittleWillieLeapsIn_Solo’ … CSV

2.0 compared with schreiber2018/ismir2018 (18 differences): ‘CharlieParker_EmbraceableYou_Solo’ ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ ‘DavidMurray_BluesForTwo-1_Solo’ ‘DavidMurray_BluesForTwo-2_Solo’ ‘DexterGordon_Montmartre_Solo’ ‘JoeHenderson_In’NOut-2_Solo’ ‘JoeLovano_BodyAndSoul-2_Solo’ ‘JohnAbercrombie_Ralph’sPianoWaltz_Solo’ ‘JohnColtrane_NatureBoy_Solo’ ‘JoshuaRedman_SweetSorrow_Solo’ … CSV

None of the estimators estimated the following 2 items ‘correctly’ using Accuracy2: ‘ChrisPotter_Arjuna_Solo’ ‘ChrisPotter_PopTune#1_Solo’ CSV

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.5382 0.0026 0.6750
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0116 0.0000
percival2014/stem 0.0000 0.0000 1.0000 0.0099 0.0812 0.0396 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.0099 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.0812 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0396 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.5382 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.8899
schreiber2018/fcn 0.0026 0.0116 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.6750 0.0000 0.0000 0.0000 0.0000 0.0000 0.8899 0.0000 1.0000

Table 5: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.
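
McNemar's test only considers the discordant items: b, the number of items estimator A gets right and B gets wrong, and c, the reverse. Under H0 both kinds are equally likely, so b follows a binomial distribution with n = b + c and p = 0.5. The sketch below computes the exact two-sided p-value on that basis. It is an illustration under this assumption; the variant used to produce the tables (exact or chi-squared, with or without continuity correction) may differ, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired per-item correctness flags."""
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)  # B right, A wrong
    n = b + c
    if n == 0:
        return 1.0  # no discordant items: no evidence of a difference
    k = min(b, c)
    # Under H0, b ~ Binomial(n, 0.5); double the smaller tail and cap at 1.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


est_a = [True, True, True, False]
est_b = [True, False, False, False]
print(mcnemar_exact(est_a, est_b))  # 0.5
```

Because only discordant items matter, two estimators with very different overall accuracy can still yield a high p-value if few items are discordant, and vice versa.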

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.6098 0.0015 0.6750
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0188 0.0000
percival2014/stem 0.0000 0.0000 1.0000 0.0086 0.1013 0.0519 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.0086 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.1013 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0519 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.6098 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000
schreiber2018/fcn 0.0015 0.0188 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.6750 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000

Table 6: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0010 0.0000 0.0000
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0919 0.0169 0.0187 0.1550 0.0090 1.0000 0.0801
percival2014/stem 0.0000 0.0919 1.0000 0.3817 0.3240 1.0000 0.0000 0.0660 0.0004
schreiber2014/default 0.0000 0.0169 0.3817 1.0000 1.0000 0.2682 0.0000 0.0022 0.0000
schreiber2017/ismir2017 0.0000 0.0187 0.3240 1.0000 1.0000 0.0386 0.0000 0.0017 0.0000
schreiber2017/mirex2017 0.0000 0.1550 1.0000 0.2682 0.0386 1.0000 0.0000 0.0501 0.0008
schreiber2018/cnn 0.0010 0.0090 0.0000 0.0000 0.0000 0.0000 1.0000 0.0009 0.2266
schreiber2018/fcn 0.0000 1.0000 0.0660 0.0022 0.0017 0.0501 0.0009 1.0000 0.0347
schreiber2018/ismir2018 0.0000 0.0801 0.0004 0.0000 0.0000 0.0008 0.2266 0.0347 1.0000

Table 7: McNemar p-values, using reference annotations 2.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0000 0.0000
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0919 0.0769 0.0205 0.1619 0.0113 1.0000 0.1433
percival2014/stem 0.0000 0.0919 1.0000 0.8804 0.3368 1.0000 0.0000 0.0725 0.0014
schreiber2014/default 0.0000 0.0769 0.8804 1.0000 0.4869 0.7359 0.0000 0.0201 0.0003
schreiber2017/ismir2017 0.0000 0.0205 0.3368 0.4869 1.0000 0.0386 0.0000 0.0022 0.0000
schreiber2017/mirex2017 0.0000 0.1619 1.0000 0.7359 0.0386 1.0000 0.0000 0.0576 0.0016
schreiber2018/cnn 0.0002 0.0113 0.0000 0.0000 0.0000 0.0000 1.0000 0.0015 0.1796
schreiber2018/fcn 0.0000 1.0000 0.0725 0.0201 0.0022 0.0576 0.0015 1.0000 0.0755
schreiber2018/ismir2018 0.0000 0.1433 0.0014 0.0003 0.0000 0.0016 0.1796 0.0755 1.0000

Table 8: McNemar p-values, using reference annotations 1.0 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., the two estimators differ significantly in how often they disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Accuracy1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?
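
A sketch of the subset evaluation, assuming per-track cvar values and boolean accuracy outcomes (Accuracy1 or Accuracy2) are already available. The function name and data are hypothetical; sweeping τ produces a curve like those in the figures.

```python
import math


def accuracy_below_cvar(cvars, correct, tau):
    """Mean accuracy over the tracks whose cvar is below tau (NaN if none qualify)."""
    subset = [ok for c, ok in zip(cvars, correct) if c < tau]
    if not subset:
        return math.nan
    return sum(subset) / len(subset)


cvars = [0.01, 0.05, 0.20, 0.40]
correct = [True, False, True, False]  # e.g. per-track Accuracy1 outcomes
for tau in (0.001, 0.1, 0.3, 1.0):
    print(tau, accuracy_below_cvar(cvars, correct, tau))
```

For very small τ the subset can be empty or tiny, so values at the low end of such curves rest on few tracks.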

Accuracy1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 8: Mean Accuracy1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

Accuracy1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 9: Mean Accuracy1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

Accuracy2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

Accuracy2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 10: Mean Accuracy2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

Accuracy2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 11: Mean Accuracy2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

Accuracy1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean Accuracy1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
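
A minimal sketch of the windowed evaluation, assuming a closed interval [T-10, T+10] around each T. It also returns the subset size, which makes the caveat about very few estimates explicit. The function name and data are hypothetical.

```python
import math


def accuracy_around(references, correct, t, half_width=10.0):
    """Mean accuracy and subset size for reference tempi in [t - half_width, t + half_width]."""
    subset = [ok for r, ok in zip(references, correct)
              if t - half_width <= r <= t + half_width]
    if not subset:
        return math.nan, 0
    return sum(subset) / len(subset), len(subset)


references = [95.0, 100.0, 112.0, 150.0]  # reference tempi in BPM (made up)
correct = [True, False, True, True]        # per-item Accuracy1 outcomes (made up)
for t in range(90, 161, 10):
    acc, size = accuracy_around(references, correct, t)
    print(f"T={t} BPM: accuracy={acc:.2f} (n={size})")
```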

Accuracy1 on Tempo-Subsets for 1.0

Figure 12: Mean Accuracy1 for estimates compared to version 1.0 for tempo intervals around T.

Accuracy1 on Tempo-Subsets for 2.0

Figure 13: Mean Accuracy1 for estimates compared to version 2.0 for tempo intervals around T.

Accuracy2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean Accuracy2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Accuracy2 on Tempo-Subsets for 1.0

Figure 14: Mean Accuracy2 for estimates compared to version 1.0 for tempo intervals around T.

Accuracy2 on Tempo-Subsets for 2.0

Figure 15: Mean Accuracy2 for estimates compared to version 2.0 for tempo intervals around T.

Estimated Accuracy1 for Tempo

If we fit a generalized additive model (GAM) to the Accuracy1 values as a function of the reference tempo, what Accuracy1 can we expect for a given tempo, and with what confidence?

Estimated Accuracy1 for Tempo for 1.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 1.0.

Figure 16: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy1 for Tempo for 2.0

Predictions of GAMs trained on Accuracy1 for estimates for reference 2.0.

Figure 17: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy2 for Tempo

If we fit a generalized additive model (GAM) to the Accuracy2 values as a function of the reference tempo, what Accuracy2 can we expect for a given tempo, and with what confidence?

Estimated Accuracy2 for Tempo for 1.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 1.0.

Figure 18: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy2 for Tempo for 2.0

Predictions of GAMs trained on Accuracy2 for estimates for reference 2.0.

Figure 19: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 and OE2

OE1 is defined as the octave error between an estimate E and a reference value R: OE1(E) = log2(E/R). This means that the most common errors, by a factor of 2 or ½, have the same magnitude, namely 1.

OE2 is the signed OE1 with the smallest absolute value when the octave errors 2, 3, 1/2, and 1/3 are allowed: OE2(E) = argmin_x |x| with x ∈ {OE1(E), OE1(2E), OE1(3E), OE1(½E), OE1(⅓E)}.
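Both definitions translate directly into code. The following Python sketch is an illustration, not tempo_eval's own implementation. The function names `oe1` and `oe2` are hypothetical, and resolving ties by taking the first candidate is an assumption of this sketch.

```python
import math

def oe1(estimate, reference):
    """Signed octave error log2(E/R): +1 for double time, -1 for half time."""
    return math.log2(estimate / reference)

def oe2(estimate, reference):
    """OE1 candidate with the smallest magnitude, allowing the estimate to be
    off by a factor of 2, 3, 1/2 or 1/3 (the first minimum wins on ties)."""
    candidates = [oe1(f * estimate, reference) for f in (1, 2, 3, 1 / 2, 1 / 3)]
    return min(candidates, key=abs)

print(oe1(240.0, 120.0))  # 1.0
print(oe1(60.0, 120.0))   # -1.0
print(oe2(60.0, 120.0))   # 0.0, because 2 * 60 BPM matches the reference
```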

Mean OE1/OE2 Results for 1.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
boeck2015/tempodetector2016_default -0.3815 0.5469 -0.0021 0.0180
schreiber2018/ismir2018 -0.2925 0.5633 -0.0052 0.0738
schreiber2018/cnn -0.2846 0.5704 -0.0017 0.0658
schreiber2014/default -0.7802 0.5874 -0.0347 0.1120
schreiber2018/fcn -0.4363 0.6287 -0.0214 0.1000
schreiber2017/mirex2017 -0.6919 0.6515 -0.0330 0.1093
davies2009/mirex_qm_tempotracker -0.3538 0.6628 0.0203 0.0657
schreiber2017/ismir2017 -0.6919 0.6881 -0.0393 0.1205
percival2014/stem -0.6983 0.7411 -0.0191 0.2769

Table 9: Mean OE1/OE2 for estimates compared to version 1.0, ordered by OE1 standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 1.0

Figure 20: OE1 for estimates compared to version 1.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 1.0

Figure 21: OE2 for estimates compared to version 1.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean OE1/OE2 Results for 2.0

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
boeck2015/tempodetector2016_default -0.3798 0.5469 -0.0005 0.0131
schreiber2018/ismir2018 -0.2908 0.5629 -0.0036 0.0731
schreiber2018/cnn -0.2829 0.5702 -0.0000 0.0649
schreiber2014/default -0.7786 0.5869 -0.0330 0.1120
schreiber2018/fcn -0.4347 0.6288 -0.0198 0.0995
schreiber2017/mirex2017 -0.6902 0.6510 -0.0313 0.1087
davies2009/mirex_qm_tempotracker -0.3521 0.6626 0.0220 0.0649
schreiber2017/ismir2017 -0.6902 0.6877 -0.0377 0.1201
percival2014/stem -0.6966 0.7402 -0.0174 0.2765

Table 10: Mean OE1/OE2 for estimates compared to version 2.0, ordered by OE1 standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for 2.0

Figure 22: OE1 for estimates compared to version 2.0. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for 2.0

Figure 23: OE2 for estimates compared to version 2.0. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.2530 0.0000 0.0000 0.0000 0.0000 0.0000 0.0243 0.0000
davies2009/mirex_qm_tempotracker 0.2530 1.0000 0.0000 0.0000 0.0000 0.0000 0.0048 0.0018 0.0033
percival2014/stem 0.0000 0.0000 1.0000 0.0012 0.8177 0.8110 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.0012 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.8177 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.8110 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.0000 0.0048 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.6148
schreiber2018/fcn 0.0243 0.0018 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.0000 0.0033 0.0000 0.0000 0.0000 0.0000 0.6148 0.0000 1.0000

Table 11: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.
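The report does not show how the p-values are computed. Below is a minimal, stdlib-only sketch of a two-sided paired t-test. To stay self-contained, it approximates Student's t distribution with a standard normal. That approximation is close for the 456 tracks of this corpus but not exact. A library routine such as SciPy's `scipy.stats.ttest_rel` uses the exact t distribution. The function name `paired_t_test` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_t_test(a, b):
    """Two-sided paired t-test of H0: the mean of (a_i - b_i) is zero.

    Returns (t statistic, degrees of freedom, approximate p-value). The
    p-value uses a standard normal in place of Student's t distribution,
    an approximation that is close for large n but not exact.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    m, s = mean(diffs), stdev(diffs)
    if s == 0.0:
        # Constant differences: either no difference at all or a perfectly consistent one.
        return (0.0, n - 1, 1.0) if m == 0 else (math.copysign(math.inf, m), n - 1, 0.0)
    t = m / (s / math.sqrt(n))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, n - 1, p

# Hypothetical per-track OE1 values of two estimators on the same five tracks.
oe1_a = [0.02, -0.98, 0.05, 1.01, -0.01]
oe1_b = [0.0, -1.0, 0.0, 1.0, 0.0]
t, df, p = paired_t_test(oe1_a, oe1_b)
```

With only five tracks, as in this toy example, the normal approximation understates the p-value noticeably. The example only exercises the function.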

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.2530 0.0000 0.0000 0.0000 0.0000 0.0000 0.0243 0.0000
davies2009/mirex_qm_tempotracker 0.2530 1.0000 0.0000 0.0000 0.0000 0.0000 0.0048 0.0018 0.0033
percival2014/stem 0.0000 0.0000 1.0000 0.0012 0.8177 0.8110 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.0012 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.8177 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.8110 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.0000 0.0048 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.6148
schreiber2018/fcn 0.0243 0.0018 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.0000 0.0033 0.0000 0.0000 0.0000 0.0000 0.6148 0.0000 1.0000

Table 12: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.
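Tables 11 and 12 are identical. This follows from the OE1 definition. The per-track difference between two estimators does not depend on the reference value R, so the paired t-test sees the same samples for versions 1.0 and 2.0:

```latex
\begin{aligned}
\mathrm{OE1}(E_a) - \mathrm{OE1}(E_b)
  &= \log_2\frac{E_a}{R} - \log_2\frac{E_b}{R} \\
  &= (\log_2 E_a - \log_2 R) - (\log_2 E_b - \log_2 R) \\
  &= \log_2\frac{E_a}{E_b}.
\end{aligned}
```

The same cancellation plausibly explains the identical OE2 tables: it applies whenever the octave factor chosen for each estimate is the same under both reference versions, which is likely given how close 1.0 and 2.0 are. For AOE1 and AOE2, the absolute value prevents the cancellation, so those p-values can differ slightly between reference versions.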

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.1911 0.0000 0.0000 0.0000 0.8880 0.0000 0.3670
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0032 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.1911 0.0032 1.0000 0.2589 0.1183 0.2850 0.1921 0.8591 0.2969
schreiber2014/default 0.0000 0.0000 0.2589 1.0000 0.3157 0.7151 0.0000 0.0101 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.1183 0.3157 1.0000 0.0521 0.0000 0.0008 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.2850 0.7151 0.0521 1.0000 0.0000 0.0166 0.0000
schreiber2018/cnn 0.8880 0.0000 0.1921 0.0000 0.0000 0.0000 1.0000 0.0000 0.2090
schreiber2018/fcn 0.0000 0.0000 0.8591 0.0101 0.0008 0.0166 0.0000 1.0000 0.0003
schreiber2018/ismir2018 0.3670 0.0000 0.2969 0.0000 0.0000 0.0000 0.2090 0.0003 1.0000

Table 13: Paired t-test p-values, using reference annotations 2.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.1911 0.0000 0.0000 0.0000 0.8880 0.0000 0.3670
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0032 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
percival2014/stem 0.1911 0.0032 1.0000 0.2589 0.1183 0.2850 0.1921 0.8591 0.2969
schreiber2014/default 0.0000 0.0000 0.2589 1.0000 0.3157 0.7151 0.0000 0.0101 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.1183 0.3157 1.0000 0.0521 0.0000 0.0008 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.2850 0.7151 0.0521 1.0000 0.0000 0.0166 0.0000
schreiber2018/cnn 0.8880 0.0000 0.1921 0.0000 0.0000 0.0000 1.0000 0.0000 0.2090
schreiber2018/fcn 0.0000 0.0000 0.8591 0.0101 0.0008 0.0166 0.0000 1.0000 0.0003
schreiber2018/ismir2018 0.3670 0.0000 0.2969 0.0000 0.0000 0.0000 0.2090 0.0003 1.0000

Table 14: Paired t-test p-values, using reference annotations 1.0 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

OE1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?
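This section does not define cvar. The sketch below assumes it is the coefficient of variation of the inter-beat intervals (standard deviation divided by mean), computed with the population standard deviation. Both choices are assumptions, as are the function names `cvar` and `mean_metric_below_tau`. It then averages a per-track metric over the tracks below a threshold τ.

```python
from statistics import mean, pstdev

def cvar(beat_times):
    """Coefficient of variation of the inter-beat intervals (assumed definition).

    Needs at least two beat times (in seconds)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return pstdev(ibis) / mean(ibis)

def mean_metric_below_tau(cvars, scores, tau):
    """Return (mean score, number of tracks) over tracks with cvar < tau."""
    selected = [s for c, s in zip(cvars, scores) if c < tau]
    return (mean(selected) if selected else None), len(selected)

steady = [0.0, 0.5, 1.0, 1.5, 2.0]    # constant 0.5 s intervals -> cvar 0
drifting = [0.0, 0.5, 1.1, 1.8, 2.6]  # intervals grow from 0.5 s to 0.8 s
track_cvars = [cvar(steady), cvar(drifting)]
# Hypothetical per-track scores (e.g. OE1 values) for the same two tracks.
track_scores = [0.0, -1.0]
# Sweep tau, as along the x-axis of the cvar-subset figures.
curve = {tau: mean_metric_below_tau(track_cvars, track_scores, tau) for tau in (0.1, 0.2, 0.5)}
```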

OE1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 24: Mean OE1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 25: Mean OE1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

OE2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 26: Mean OE2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 27: Mean OE2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean OE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE1 on Tempo-Subsets for 1.0

Figure 28: Mean OE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 on Tempo-Subsets for 2.0

Figure 29: Mean OE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean OE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE2 on Tempo-Subsets for 1.0

Figure 30: Mean OE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets for 2.0

Figure 31: Mean OE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo

When a generalized additive model (GAM) is fitted to the OE1 values as a function of the ground-truth tempo, what OE1 can we expect, and with what confidence?

Estimated OE1 for Tempo for 1.0

Predictions of GAMs trained on OE1 for estimates for reference 1.0.

Figure 32: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo for 2.0

Predictions of GAMs trained on OE1 for estimates for reference 2.0.

Figure 33: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo

When a generalized additive model (GAM) is fitted to the OE2 values as a function of the ground-truth tempo, what OE2 can we expect, and with what confidence?

Estimated OE2 for Tempo for 1.0

Predictions of GAMs trained on OE2 for estimates for reference 1.0.

Figure 34: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo for 2.0

Predictions of GAMs trained on OE2 for estimates for reference 2.0.

Figure 35: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 and AOE2

AOE1 is defined as the absolute octave error between an estimate E and a reference value R: AOE1(E) = |log2(E/R)|.

AOE2 is the minimum of AOE1 allowing the octave errors 2, 3, 1/2, and 1/3: AOE2(E) = min(AOE1(E), AOE1(2E), AOE1(3E), AOE1(½E), AOE1(⅓E)).
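As with OE1 and OE2, these definitions can be written down directly. The Python functions below are a hypothetical illustration, not tempo_eval's implementation.

```python
import math

def aoe1(estimate, reference):
    """Absolute octave error |log2(E/R)|."""
    return abs(math.log2(estimate / reference))

def aoe2(estimate, reference):
    """Smallest AOE1 when the estimate may be off by a factor of 2, 3, 1/2 or 1/3."""
    return min(aoe1(f * estimate, reference) for f in (1, 2, 3, 1 / 2, 1 / 3))

print(aoe1(60.0, 120.0))  # 1.0: a half-time estimate is one octave off
print(aoe2(40.0, 120.0))  # 0.0: three times the estimate matches the reference
```

By construction, AOE1(E) = |OE1(E)|. AOE2(E) = |OE2(E)| as well, because OE2 keeps the signed candidate with the smallest magnitude. Consequently, each mean AOE value in the tables below is at least as large as the absolute value of the corresponding mean OE value.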

Mean AOE1/AOE2 Results for 1.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
schreiber2018/ismir2018 0.4102 0.4844 0.0246 0.0697
schreiber2018/cnn 0.4122 0.4862 0.0224 0.0619
boeck2015/tempodetector2016_default 0.4385 0.5024 0.0107 0.0147
schreiber2018/fcn 0.5266 0.5554 0.0363 0.0956
davies2009/mirex_qm_tempotracker 0.5634 0.4971 0.0322 0.0607
schreiber2017/mirex2017 0.7638 0.5656 0.0401 0.1069
schreiber2017/ismir2017 0.7812 0.5848 0.0473 0.1176
percival2014/stem 0.8097 0.6174 0.0518 0.2727
schreiber2014/default 0.8311 0.5130 0.0448 0.1084

Table 15: Mean AOE1/AOE2 for estimates compared to version 1.0, ordered by AOE1 mean.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 1.0

Figure 36: AOE1 for estimates compared to version 1.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 1.0

Figure 37: AOE2 for estimates compared to version 1.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Mean AOE1/AOE2 Results for 2.0

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
schreiber2018/ismir2018 0.4088 0.4841 0.0237 0.0693
schreiber2018/cnn 0.4109 0.4861 0.0214 0.0613
boeck2015/tempodetector2016_default 0.4368 0.5025 0.0092 0.0094
schreiber2018/fcn 0.5251 0.5556 0.0353 0.0951
davies2009/mirex_qm_tempotracker 0.5628 0.4963 0.0328 0.0602
schreiber2017/mirex2017 0.7619 0.5655 0.0385 0.1063
schreiber2017/ismir2017 0.7793 0.5848 0.0457 0.1172
percival2014/stem 0.8079 0.6168 0.0502 0.2725
schreiber2014/default 0.8292 0.5128 0.0439 0.1082

Table 16: Mean AOE1/AOE2 for estimates compared to version 2.0, ordered by AOE1 mean.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for 2.0

Figure 38: AOE1 for estimates compared to version 2.0. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for 2.0

Figure 39: AOE2 for estimates compared to version 2.0. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2297 0.0002 0.1843
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1453 0.0000
percival2014/stem 0.0000 0.0000 1.0000 0.3781 0.2559 0.0551 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.3781 1.0000 0.0141 0.0005 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.2559 0.0141 1.0000 0.2183 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0551 0.0005 0.2183 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.2297 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.8929
schreiber2018/fcn 0.0002 0.1453 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.1843 0.0000 0.0000 0.0000 0.0000 0.0000 0.8929 0.0000 1.0000

Table 17: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.2236 0.0002 0.1792
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.1556 0.0000
percival2014/stem 0.0000 0.0000 1.0000 0.3780 0.2561 0.0555 0.0000 0.0000 0.0000
schreiber2014/default 0.0000 0.0000 0.3780 1.0000 0.0141 0.0005 0.0000 0.0000 0.0000
schreiber2017/ismir2017 0.0000 0.0000 0.2561 0.0141 1.0000 0.2189 0.0000 0.0000 0.0000
schreiber2017/mirex2017 0.0000 0.0000 0.0555 0.0005 0.2189 1.0000 0.0000 0.0000 0.0000
schreiber2018/cnn 0.2236 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.8940
schreiber2018/fcn 0.0002 0.1556 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000
schreiber2018/ismir2018 0.1792 0.0000 0.0000 0.0000 0.0000 0.0000 0.8940 0.0000 1.0000

Table 18: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.1790 0.0450 0.0295 0.3036 0.0029 0.6070 0.0219
percival2014/stem 0.0014 0.1790 1.0000 0.6005 0.7264 0.3585 0.0255 0.2521 0.0420
schreiber2014/default 0.0000 0.0450 0.6005 1.0000 0.6790 0.2439 0.0000 0.0816 0.0002
schreiber2017/ismir2017 0.0000 0.0295 0.7264 0.6790 1.0000 0.0193 0.0000 0.0429 0.0002
schreiber2017/mirex2017 0.0000 0.3036 0.3585 0.2439 0.0193 1.0000 0.0016 0.4933 0.0071
schreiber2018/cnn 0.0000 0.0029 0.0255 0.0000 0.0000 0.0016 1.0000 0.0005 0.3602
schreiber2018/fcn 0.0000 0.6070 0.2521 0.0816 0.0429 0.4933 0.0005 1.0000 0.0031
schreiber2018/ismir2018 0.0000 0.0219 0.0420 0.0002 0.0002 0.0071 0.3602 0.0031 1.0000

Table 19: Paired t-test p-values, using reference annotations 2.0 as ground truth with AOE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn schreiber2018/ismir2018
boeck2015/tempodetector2016_default 1.0000 0.0000 0.0014 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.1315 0.0231 0.0113 0.1579 0.0098 0.3967 0.0535
percival2014/stem 0.0014 0.1315 1.0000 0.5648 0.7250 0.3586 0.0231 0.2363 0.0374
schreiber2014/default 0.0000 0.0231 0.5648 1.0000 0.5816 0.3011 0.0000 0.0860 0.0002
schreiber2017/ismir2017 0.0000 0.0113 0.7250 0.5816 1.0000 0.0198 0.0000 0.0345 0.0001
schreiber2017/mirex2017 0.0000 0.1579 0.3586 0.3011 0.0198 1.0000 0.0012 0.4285 0.0051
schreiber2018/cnn 0.0000 0.0098 0.0231 0.0000 0.0000 0.0012 1.0000 0.0005 0.3855
schreiber2018/fcn 0.0000 0.3967 0.2363 0.0860 0.0345 0.4285 0.0005 1.0000 0.0027
schreiber2018/ismir2018 0.0000 0.0535 0.0374 0.0002 0.0001 0.0051 0.3855 0.0027 1.0000

Table 20: Paired t-test p-values, using reference annotations 1.0 as ground truth with AOE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

AOE1 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

AOE1 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 40: Mean AOE1 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 41: Mean AOE1 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on cvar-Subsets

How well does an estimator perform when only tracks with a cvar value below τ, i.e., tracks with a more or less stable beat, are taken into account?

AOE2 on cvar-Subsets for 1.0 based on cvar-Values from 1.0

Figure 42: Mean AOE2 compared to version 1.0 for tracks with cvar < τ based on beat annotations from 1.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on cvar-Subsets for 2.0 based on cvar-Values from 1.0

Figure 43: Mean AOE2 compared to version 2.0 for tracks with cvar < τ based on beat annotations from 2.0.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean AOE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE1 on Tempo-Subsets for 1.0

Figure 44: Mean AOE1 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 on Tempo-Subsets for 2.0

Figure 45: Mean AOE1 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show the mean AOE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE2 on Tempo-Subsets for 1.0

Figure 46: Mean AOE2 for estimates compared to version 1.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets for 2.0

Figure 47: Mean AOE2 for estimates compared to version 2.0 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo

When a generalized additive model (GAM) is fitted to the AOE1 values as a function of the ground-truth tempo, what AOE1 can we expect, and with what confidence?

Estimated AOE1 for Tempo for 1.0

Predictions of GAMs trained on AOE1 for estimates for reference 1.0.

Figure 48: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo for 2.0

Predictions of GAMs trained on AOE1 for estimates for reference 2.0.

Figure 49: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo

When a generalized additive model (GAM) is fitted to the AOE2 values as a function of the ground-truth tempo, what AOE2 can we expect, and with what confidence?

Estimated AOE2 for Tempo for 1.0

Predictions of GAMs trained on AOE2 for estimates for reference 1.0.

Figure 50: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 1.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo for 2.0

Predictions of GAMs trained on AOE2 for estimates for reference 2.0.

Figure 51: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for 2.0. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG


Generated by tempo_eval 0.1.1 on 2022-06-29 18:59. Size L.