
queen

This is the tempo_eval report for the ‘queen’ corpus.

Reports for other corpora may be found here.


Because reference annotations are not available, we treat the estimate schreiber2018/ismir2018 as the reference. It has the highest Mean Mutual Agreement (MMA) with the other estimates, based on Accuracy1 with 4% tolerance.

References for ‘queen’

References

schreiber2018/ismir2018

Attribute Value
Corpus  
Version 0.0.3
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

Basic Statistics

Reference Size Min Max Avg Stdev Sweet Oct. Start Sweet Oct. Coverage
schreiber2018/ismir2018 51 69.00 156.00 104.10 23.31 67.00 0.92

Table 1: Basic statistics.

CSV JSON LATEX PICKLE
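The report does not define the two "Sweet Oct." columns. Below is a minimal sketch of one plausible reading. It assumes that the sweet octave is the one-octave interval [s, 2s), with an integer start s in BPM, that contains the most tempo values, and that ties go to the lower start. It further assumes that coverage is the fraction of values falling inside that interval. The function name `sweet_octave` is hypothetical and does not come from tempo_eval.

```python
from typing import Sequence, Tuple


def sweet_octave(tempi: Sequence[float]) -> Tuple[int, float]:
    """Return (start, coverage) of the octave [start, 2*start) holding the most values.

    Assumptions (illustrative only, not tempo_eval's code): integer starts in BPM,
    ties resolved in favor of the lowest start.
    """
    if not tempi:
        raise ValueError("need at least one tempo value")
    best_start, best_count = 0, -1
    for start in range(1, max(1, int(max(tempi))) + 1):
        # Count values inside the half-open octave [start, 2*start).
        count = sum(1 for t in tempi if start <= t < 2 * start)
        if count > best_count:  # strict '>' keeps the lowest start on ties
            best_start, best_count = start, count
    return best_start, best_count / len(tempi)


print(sweet_octave([70, 80, 90, 100, 130, 200]))  # (66, 0.8333333333333334)
```

In this toy example, five of the six values fit into [66, 132). The single value at 200 BPM cannot share an octave with the values around 70 BPM.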

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimates for ‘queen’

Estimators

boeck2015/tempodetector2016_default

Attribute Value
Corpus queen
Version 0.17.dev0
Annotation Tools TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex Boeck2015

davies2009/mirex_qm_tempotracker

Attribute Value  
Corpus queen  
Version 1.0  
Annotation Tools QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used.  
Annotator, bibtex Davies2009 Davies2007

percival2014/stem

Attribute Value
Corpus queen
Version 1.0
Annotation Tools percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex Percival2014

schreiber2014/default

Attribute Value
Corpus queen
Version 0.0.1
Annotation Tools schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2014

schreiber2017/ismir2017

Attribute Value
Corpus queen
Version 0.0.4
Annotation Tools schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2017/mirex2017

Attribute Value
Corpus queen
Version 0.0.4
Annotation Tools schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex Schreiber2017

schreiber2018/cnn

Attribute Value
Corpus  
Version 0.0.3
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute Value
Corpus  
Version 0.0.3
Data Source Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

Basic Statistics

Estimator Size Min Max Avg Stdev Sweet Oct. Start Sweet Oct. Coverage
boeck2015/tempodetector2016_default 51 41.96 157.89 95.53 26.50 67.00 0.82
davies2009/mirex_qm_tempotracker 51 71.78 166.71 120.76 26.00 81.00 0.94
percival2014/stem 51 63.41 142.56 99.08 21.25 67.00 0.96
schreiber2014/default 51 59.73 143.87 97.74 21.00 63.00 0.92
schreiber2017/ismir2017 51 63.09 180.19 101.61 23.67 66.00 0.94
schreiber2017/mirex2017 51 44.79 180.19 100.03 25.23 66.00 0.92
schreiber2018/cnn 51 63.00 157.00 103.78 24.44 67.00 0.90
schreiber2018/fcn 51 60.00 157.00 98.20 23.64 67.00 0.92

Table 2: Basic statistics.

CSV JSON LATEX PICKLE

Smoothed Tempo Distribution

Figure 2: Percentage of values in tempo interval.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy

Accuracy1 is defined as the percentage of correct estimates, allowing a 4% tolerance for individual BPM values.

Accuracy2 additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].
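Below is a minimal sketch of both measures for single tempo values. It assumes that the tolerance is relative to the reference value and that the boundary is inclusive; tempo_eval's exact boundary convention is not stated here, and the function names are illustrative. Mean accuracy is the fraction of items counted as correct. For example, schreiber2018/cnn differs from the reference on 3 of 51 items for Accuracy1, which gives 48/51 ≈ 0.9412, the value in Table 3.

```python
from typing import Callable, Sequence

# Factors tolerated by Accuracy2 (octave errors, plus the factor 1 for "no error").
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def accuracy1(reference: float, estimate: float, tolerance: float = 0.04) -> bool:
    """True if the estimate lies within +/- tolerance (relative) of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def accuracy2(reference: float, estimate: float, tolerance: float = 0.04) -> bool:
    """Like accuracy1, but also accepts estimates off by a factor of 2, 3, 1/2 or 1/3."""
    return any(accuracy1(reference, f * estimate, tolerance) for f in OCTAVE_FACTORS)


def mean_accuracy(references: Sequence[float], estimates: Sequence[float],
                  metric: Callable[[float, float, float], bool],
                  tolerance: float = 0.04) -> float:
    """Fraction of items for which `metric` counts the estimate as correct."""
    hits = [metric(r, e, tolerance) for r, e in zip(references, estimates)]
    return sum(hits) / len(hits)


refs = [120.0, 120.0, 120.0]
ests = [122.0, 60.0, 90.0]
print(mean_accuracy(refs, ests, accuracy1))  # only 122 BPM is within 4%
print(mean_accuracy(refs, ests, accuracy2))  # 60 BPM also counts (factor 2)
```

Passing a different `tolerance` value reproduces the kind of tolerance sweep shown in the "depending on tolerance" figures.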

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set or that the test set may have even been created using one of the tested algorithms.

Accuracy Results for schreiber2018/ismir2018

Estimator Accuracy1 Accuracy2
schreiber2018/cnn 0.9412 0.9608
schreiber2018/fcn 0.8824 0.9216
schreiber2017/mirex2017 0.8824 0.9804
schreiber2017/ismir2017 0.8824 0.9608
percival2014/stem 0.8431 0.9412
boeck2015/tempodetector2016_default 0.8235 0.9608
schreiber2014/default 0.8235 0.9020
davies2009/mirex_qm_tempotracker 0.7647 0.9216

Table 3: Mean accuracy of estimates compared to version schreiber2018/ismir2018 with 4% tolerance, ordered by Accuracy1.

CSV JSON LATEX PICKLE

Raw data Accuracy1: CSV JSON LATEX PICKLE

Raw data Accuracy2: CSV JSON LATEX PICKLE

Accuracy1 for schreiber2018/ismir2018

Figure 3: Mean Accuracy1 for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 for schreiber2018/ismir2018

Figure 4: Mean Accuracy2 for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Differing Items

For which items did a given estimator not estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and are therefore never estimated correctly, regardless of which estimator is used?

Differing Items Accuracy1

Items with different tempo annotations (Accuracy1, 4% tolerance) in different versions:

schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (9 differences): ‘Greatest Hits II/03 Radio Ga Ga’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits II/16 The Show Must Go On’ ‘Greatest Hits III/07 Heaven For Everyone’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/10 Living On My Own’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (12 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits I/13 Play The Game’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/15 Friends Will Be Friends’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/08 Las Palabras De Amor’ ‘Greatest Hits III/11 Let Me Live’ ‘Greatest Hits III/12 The Great Pretender’ … CSV

schreiber2018/ismir2018 compared with percival2014/stem (8 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2014/default (9 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/15 No-One But You’ CSV

schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/03 Barcelona’ CSV

schreiber2018/ismir2018 compared with schreiber2018/cnn (3 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/11 Let Me Live’ CSV

schreiber2018/ismir2018 compared with schreiber2018/fcn (6 differences): ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ ‘Greatest Hits III/15 No-One But You’ CSV

All tracks were estimated ‘correctly’ by at least one system.
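Set operations answer the question posed at the start of this section. An item that no estimator gets right lies in the intersection of all the estimators' difference lists. The sketch below uses three of the Accuracy1 lists above; the helper names are hypothetical. The report's statement that every track was estimated correctly by at least one system corresponds to this intersection being empty when all eight lists are used.

```python
from collections import Counter
from typing import Dict, Set


def never_correct(errors: Dict[str, Set[str]]) -> Set[str]:
    """Items that every estimator in `errors` got wrong (intersection of error sets)."""
    if not errors:
        return set()
    return set.intersection(*errors.values())


def failure_counts(errors: Dict[str, Set[str]]) -> Counter:
    """Number of estimators that failed on each item; high counts flag hard or suspect items."""
    return Counter(item for items in errors.values() for item in items)


# Accuracy1 difference lists copied from the report (three of the eight estimators).
errors = {
    "schreiber2018/cnn": {
        "Greatest Hits I/17 We Are The Champions",
        "Greatest Hits II/06 Innuendo",
        "Greatest Hits III/11 Let Me Live",
    },
    "schreiber2018/fcn": {
        "Greatest Hits I/09 Crazy Little Thing Called Love",
        "Greatest Hits I/10 Somebody To Love",
        "Greatest Hits I/17 We Are The Champions",
        "Greatest Hits II/06 Innuendo",
        "Greatest Hits III/05 Somebody To Love",
        "Greatest Hits III/15 No-One But You",
    },
    "schreiber2017/ismir2017": {
        "Greatest Hits I/07 Don't Stop Me Now",
        "Greatest Hits I/09 Crazy Little Thing Called Love",
        "Greatest Hits I/17 We Are The Champions",
        "Greatest Hits II/06 Innuendo",
        "Greatest Hits II/08 Breakthru",
        "Greatest Hits III/17 Thank God It's Christmas",
    },
}
print(sorted(never_correct(errors)))
print(failure_counts(errors).most_common(3))
```

For these three estimators, only "We Are The Champions" and "Innuendo" are always wrong. Adding the remaining estimators' lists shrinks this set further.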

Differing Items Accuracy2

Items with different tempo annotations (Accuracy2, 4% tolerance) in different versions:

schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (2 differences): ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (4 differences): ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with percival2014/stem (3 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2014/default (5 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ CSV

schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (1 difference): ‘Greatest Hits I/17 We Are The Champions’ CSV

schreiber2018/ismir2018 compared with schreiber2018/cnn (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ CSV

schreiber2018/ismir2018 compared with schreiber2018/fcn (4 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ CSV

All tracks were estimated ‘correctly’ by at least one system.

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.6476 1.0000 1.0000 0.5488 0.5488 0.1094 0.5811
davies2009/mirex_qm_tempotracker 0.6476 1.0000 0.4807 0.6072 0.2379 0.2101 0.0225 0.2379
percival2014/stem 1.0000 0.4807 1.0000 1.0000 0.6250 0.6875 0.1250 0.6875
schreiber2014/default 1.0000 0.6072 1.0000 1.0000 0.4531 0.3750 0.1094 0.5078
schreiber2017/ismir2017 0.5488 0.2379 0.6250 0.4531 1.0000 1.0000 0.3750 1.0000
schreiber2017/mirex2017 0.5488 0.2101 0.6875 0.3750 1.0000 1.0000 0.3750 1.0000
schreiber2018/cnn 0.1094 0.0225 0.1250 0.1094 0.3750 0.3750 1.0000 0.3750
schreiber2018/fcn 0.5811 0.2379 0.6875 0.5078 1.0000 1.0000 0.3750 1.0000

Table 4: McNemar p-values, using reference annotations schreiber2018/ismir2018 as ground truth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., there is a significant difference in the disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
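Below is a sketch of an exact McNemar test. It assumes the two-sided binomial form computed from the discordant items, i.e., the items that exactly one of the two estimators gets wrong. Whether tempo_eval uses this exact variant is an assumption. Applied to the Accuracy1 difference lists for schreiber2018/cnn (3 items) and schreiber2018/fcn (6 items), it gives 0.3750, the value in Table 4. The helper names are illustrative.

```python
from math import comb
from typing import Set


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items only estimator A got wrong; c: items only estimator B got wrong.
    Under H0 each discordant item is equally likely to fall on either side,
    so min(b, c) is compared against Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant items: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def mcnemar_from_errors(errors_a: Set[str], errors_b: Set[str]) -> float:
    """McNemar p-value from the sets of items each estimator got wrong."""
    return mcnemar_exact(len(errors_a - errors_b), len(errors_b - errors_a))


cnn = {
    "Greatest Hits I/17 We Are The Champions",
    "Greatest Hits II/06 Innuendo",
    "Greatest Hits III/11 Let Me Live",
}
fcn = {
    "Greatest Hits I/09 Crazy Little Thing Called Love",
    "Greatest Hits I/10 Somebody To Love",
    "Greatest Hits I/17 We Are The Champions",
    "Greatest Hits II/06 Innuendo",
    "Greatest Hits III/05 Somebody To Love",
    "Greatest Hits III/15 No-One But You",
}
print(round(mcnemar_from_errors(cnn, fcn), 4))  # 0.375
```

Items that both estimators get wrong, or both get right, carry no information about which estimator is better. For that reason, only the two set differences enter the test.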

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.6250 1.0000 0.3750 1.0000 1.0000 1.0000 0.6875
davies2009/mirex_qm_tempotracker 0.6250 1.0000 1.0000 1.0000 0.6875 0.3750 0.6875 1.0000
percival2014/stem 1.0000 1.0000 1.0000 0.7266 1.0000 0.5000 1.0000 1.0000
schreiber2014/default 0.3750 1.0000 0.7266 1.0000 0.4531 0.2188 0.4531 1.0000
schreiber2017/ismir2017 1.0000 0.6875 1.0000 0.4531 1.0000 1.0000 1.0000 0.6250
schreiber2017/mirex2017 1.0000 0.3750 0.5000 0.2188 1.0000 1.0000 1.0000 0.2500
schreiber2018/cnn 1.0000 0.6875 1.0000 0.4531 1.0000 1.0000 1.0000 0.5000
schreiber2018/fcn 0.6875 1.0000 1.0000 1.0000 0.6250 0.2500 0.5000 1.0000

Table 5: McNemar p-values, using reference annotations schreiber2018/ismir2018 as ground truth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H0, i.e., there is a significant difference in the disagreement with the ground truth. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Accuracy1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
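The subset computation can be sketched as follows. The sketch assumes per-item metric values (e.g., 1 or 0 for Accuracy1) aligned with the reference tempi, and a closed window [T-10, T+10]. The function name is illustrative. Each mean is returned together with its item count, which shows when a value rests on very few estimates.

```python
from statistics import mean
from typing import Dict, Optional, Sequence, Tuple


def subset_means(reference_tempi: Sequence[float], values: Sequence[float],
                 centers: Sequence[float], half_width: float = 10.0
                 ) -> Dict[float, Tuple[Optional[float], int]]:
    """For each center T, return (mean metric value, item count) over the items
    whose reference tempo lies in [T - half_width, T + half_width]."""
    result = {}
    for t in centers:
        selected = [v for r, v in zip(reference_tempi, values)
                    if t - half_width <= r <= t + half_width]
        result[t] = (mean(selected) if selected else None, len(selected))
    return result


refs = [70.0, 75.0, 100.0, 120.0]
acc1 = [1, 0, 1, 1]  # hypothetical per-item Accuracy1 outcomes
print(subset_means(refs, acc1, centers=[70, 100, 200]))
```

The same function applies to any per-item measure in this report (Accuracy2, P-Score, OE1, AOE1, and so on).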

Accuracy1 on Tempo-Subsets for schreiber2018/ismir2018

Figure 5: Mean Accuracy1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Accuracy2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Accuracy2 on Tempo-Subsets for schreiber2018/ismir2018

Figure 6: Mean Accuracy2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy1 for Tempo

When we fit a generalized additive model (GAM) to the Accuracy1 values as a function of the ground-truth tempo, what Accuracy1 can we confidently expect for a given tempo?

Estimated Accuracy1 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Accuracy1 for estimates for reference schreiber2018/ismir2018.

Figure 7: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Accuracy2 for Tempo

When we fit a generalized additive model (GAM) to the Accuracy2 values as a function of the ground-truth tempo, what Accuracy2 can we confidently expect for a given tempo?

Estimated Accuracy2 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Accuracy2 for estimates for reference schreiber2018/ismir2018.

Figure 8: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

MIREX-Style Evaluation

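P-Score is defined as the average of two binary scores, one per reference tempo, weighted by the perceptual strength (salience) of the respective reference tempo. A reference tempo counts as correctly estimated if an estimated value lies within an 8% tolerance of it [MIREX 2006 Definition].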

One Correct is the fraction of estimate pairs of which at least one of the two values is equal to a reference value (within an 8% tolerance).

Both Correct is the fraction of estimate pairs of which both values are equal to the reference values (within an 8% tolerance).

See [McKinney2007].

Note: Very few datasets actually have multiple annotations per track along with a salience distribution. References without suitable annotations are not shown.
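Below is a per-item sketch of the three measures. It assumes that the reference consists of two tempi T1 and T2, with salience S1 for T1 (and 1 - S1 for T2), and that the estimate is a pair of tempi. Then P = S1·TT1 + (1 - S1)·TT2, where TTi is 1 if some estimated tempo lies within 8% of Ti. The sketch follows the [McKinney2007] formulation in a similar spirit to mir_eval's `tempo.detection`. It is not tempo_eval's code, and the function name is illustrative.

```python
from typing import Tuple


def tempo_detection(ref_t1: float, ref_t2: float, ref_s1: float,
                    est_t1: float, est_t2: float,
                    tolerance: float = 0.08) -> Tuple[float, bool, bool]:
    """Return (P-Score, One Correct, Both Correct) for a single item."""
    estimates = (est_t1, est_t2)

    def hit(reference: float) -> bool:
        # A reference tempo counts as found if any estimate is within tolerance.
        return any(abs(e - reference) <= tolerance * reference for e in estimates)

    tt1, tt2 = hit(ref_t1), hit(ref_t2)
    p_score = ref_s1 * tt1 + (1.0 - ref_s1) * tt2
    return p_score, tt1 or tt2, tt1 and tt2


# Reference 60/120 BPM with salience 0.7 for 60 BPM; estimates 61 and 200 BPM.
print(tempo_detection(60.0, 120.0, 0.7, 61.0, 200.0))  # (0.7, True, False)
```

Averaging the three per-item results over all tracks gives corpus-level values of the kind listed in Table 6.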

MIREX Results for schreiber2018/ismir2018

Estimator P-Score One Correct Both Correct
schreiber2018/fcn 0.9628 1.0000 0.7843
schreiber2018/cnn 0.9586 1.0000 0.7647
schreiber2017/mirex2017 0.9461 0.9804 0.7255
schreiber2017/ismir2017 0.9261 0.9804 0.6471
schreiber2014/default 0.9148 0.9608 0.7059
davies2009/mirex_qm_tempotracker 0.8995 0.9608 0.6078
boeck2015/tempodetector2016_default 0.8964 0.9804 0.4902
percival2014/stem 0.7848 0.9412 0.0000

Table 6: Mean P-Score, One Correct, and Both Correct for estimates compared to schreiber2018/ismir2018 with 8% tolerance, ordered by P-Score.

CSV JSON LATEX PICKLE

Raw data P-Score: CSV JSON LATEX PICKLE

Raw data One Correct: CSV JSON LATEX PICKLE

Raw data Both Correct: CSV JSON LATEX PICKLE

P-Score for schreiber2018/ismir2018

Figure 9: Mean P-Score for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct for schreiber2018/ismir2018

Figure 10: Mean One Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct for schreiber2018/ismir2018

Figure 11: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

CSV JSON LATEX PICKLE SVG PDF PNG

P-Score on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean P-Score for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

P-Score on Tempo-Subsets for schreiber2018/ismir2018

Figure 12: Mean P-Score for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

One Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean One Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

One Correct on Tempo-Subsets for schreiber2018/ismir2018

Figure 13: Mean One Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Both Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Both Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Both Correct on Tempo-Subsets for schreiber2018/ismir2018

Figure 14: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated P-Score for Tempo

When we fit a generalized additive model (GAM) to the P-Score values as a function of the ground-truth tempo, what P-Score can we confidently expect for a given tempo?

Estimated P-Score for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on P-Score for estimates for reference schreiber2018/ismir2018.

Figure 15: P-Score predictions of a generalized additive model (GAM) fit to P-Score results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated One Correct for Tempo

When we fit a generalized additive model (GAM) to the One Correct values as a function of the ground-truth tempo, what One Correct can we confidently expect for a given tempo?

Estimated One Correct for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on One Correct for estimates for reference schreiber2018/ismir2018.

Figure 16: One Correct predictions of a generalized additive model (GAM) fit to One Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated Both Correct for Tempo

When we fit a generalized additive model (GAM) to the Both Correct values as a function of the ground-truth tempo, what Both Correct can we confidently expect for a given tempo?

Estimated Both Correct for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Both Correct for estimates for reference schreiber2018/ismir2018.

Figure 17: Both Correct predictions of a generalized additive model (GAM) fit to Both Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

OE1 and OE2

OE1 is defined as the octave error between an estimate E and a reference value R: OE1(E) = log2(E/R). On this logarithmic scale, the most common errors (a factor of 2 or ½) have the same magnitude, namely 1.

OE2 is the signed OE1 value with the smallest absolute value once the octave errors 2, 3, 1/2, and 1/3 are allowed: OE2(E) = arg min_x |x|, with x ∈ {OE1(E), OE1(2E), OE1(3E), OE1(½E), OE1(⅓E)}.
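Both definitions translate directly into code. The following is a minimal sketch; the function names are illustrative and not tempo_eval's API. OE2 keeps the sign of the selected candidate, so it still shows whether the remaining error is too fast or too slow.

```python
import math
from statistics import mean, stdev

# Candidate factors for OE2: no change, plus the tolerated octave errors.
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def oe1(estimate: float, reference: float) -> float:
    """Signed octave error log2(E/R): +1 if E is twice R, -1 if E is half of R."""
    return math.log2(estimate / reference)


def oe2(estimate: float, reference: float) -> float:
    """The signed OE1 with the smallest magnitude over E, 2E, 3E, E/2 and E/3."""
    return min((oe1(f * estimate, reference) for f in OCTAVE_FACTORS), key=abs)


pairs = [(200.0, 100.0), (97.0, 100.0), (310.0, 100.0)]  # (estimate, reference)
oe1_values = [oe1(e, r) for e, r in pairs]
oe2_values = [oe2(e, r) for e, r in pairs]
print(mean(oe1_values), stdev(oe1_values))
print(mean(oe2_values), stdev(oe2_values))
```

The mean and standard deviation of such per-item values correspond to the columns in Table 7.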

Mean OE1/OE2 Results for schreiber2018/ismir2018

Estimator OE1_MEAN OE1_STDEV OE2_MEAN OE2_STDEV
schreiber2018/cnn -0.0082 0.2078 0.0114 0.0595
schreiber2018/fcn -0.0892 0.2617 0.0284 0.0975
schreiber2014/default -0.0886 0.2842 0.0094 0.0722
schreiber2017/ismir2017 -0.0367 0.2975 0.0026 0.0843
schreiber2017/mirex2017 -0.0677 0.3108 0.0107 0.0598
percival2014/stem -0.0687 0.3340 0.0097 0.1030
davies2009/mirex_qm_tempotracker 0.2155 0.3779 0.0195 0.0805
boeck2015/tempodetector2016_default -0.1466 0.3969 0.0021 0.0830

Table 7: Mean OE1/OE2 for estimates compared to version schreiber2018/ismir2018, ordered by OE1 standard deviation.

CSV JSON LATEX PICKLE

Raw data OE1: CSV JSON LATEX PICKLE

Raw data OE2: CSV JSON LATEX PICKLE

OE1 distribution for schreiber2018/ismir2018

Figure 18: OE1 for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 distribution for schreiber2018/ismir2018

Figure 19: OE2 for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.0000 0.2152 0.3461 0.1227 0.2823 0.0211 0.3881
davies2009/mirex_qm_tempotracker 0.0000 1.0000 0.0000 0.0000 0.0004 0.0001 0.0001 0.0000
percival2014/stem 0.2152 0.0000 1.0000 0.6620 0.3778 0.9820 0.1995 0.6343
schreiber2014/default 0.3461 0.0000 0.6620 1.0000 0.1912 0.6485 0.0637 0.9847
schreiber2017/ismir2017 0.1227 0.0004 0.3778 0.1912 1.0000 0.1737 0.4930 0.1854
schreiber2017/mirex2017 0.2823 0.0001 0.9820 0.6485 0.1737 1.0000 0.1747 0.6150
schreiber2018/cnn 0.0211 0.0001 0.1995 0.0637 0.4930 0.1747 1.0000 0.0301
schreiber2018/fcn 0.3881 0.0000 0.6343 0.9847 0.1854 0.6150 0.0301 1.0000

Table 8: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with OE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE
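The p-values in Tables 8, 9, 11 and 12 come from paired t-tests on per-item errors. Below is a stdlib-only sketch of a two-sided paired t-test. In practice, one would typically call `scipy.stats.ttest_rel`, and this code is not tempo_eval's implementation. The t-distribution tail is computed by integrating the Student t density with Simpson's rule. This choice avoids third-party dependencies and is adequate for moderate t values.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|) for Student's t with `df` degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (`steps` must be even);
    accurate for moderate |t|, not intended for extreme values.
    """
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired t-test of H0: the mean of (a - b) is zero. Returns (t, p)."""
    diffs = [x - y for x, y in zip(a, b)]
    if len(diffs) < 2:
        raise ValueError("need at least two paired samples")
    m, sd = mean(diffs), stdev(diffs)
    if sd == 0.0:
        # All differences identical: no variance to test against.
        return (0.0, 1.0) if m == 0.0 else (math.copysign(math.inf, m), 0.0)
    t = m / (sd / math.sqrt(len(diffs)))
    return t, t_two_sided_p(t, len(diffs) - 1)


# Hypothetical per-item OE1 values of two estimators on the same five items.
oe1_a = [0.1, 0.2, 0.3, 0.4, 0.5]
oe1_b = [0.0, 0.2, 0.2, 0.4, 0.4]
t, p = paired_t_test(oe1_a, oe1_b)
print(f"t = {t:.4f}, p = {p:.4f}")
```

Pairing by item removes the variation between tracks from the comparison. This is why the test uses the per-item differences rather than the two estimators' means alone.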

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.3109 0.6835 0.6556 0.9801 0.5500 0.5150 0.1524
davies2009/mirex_qm_tempotracker 0.3109 1.0000 0.5906 0.5900 0.3123 0.5431 0.5614 0.6077
percival2014/stem 0.6835 0.5906 1.0000 0.9862 0.3951 0.9350 0.8874 0.1059
schreiber2014/default 0.6556 0.5900 0.9862 1.0000 0.6567 0.9219 0.8811 0.2836
schreiber2017/ismir2017 0.9801 0.3123 0.3951 0.6567 1.0000 0.3221 0.2938 0.0655
schreiber2017/mirex2017 0.5500 0.5431 0.9350 0.9219 0.3221 1.0000 0.6729 0.1224
schreiber2018/cnn 0.5150 0.5614 0.8874 0.8811 0.2938 0.6729 1.0000 0.1405
schreiber2018/fcn 0.1524 0.6077 0.1059 0.2836 0.0655 0.1224 0.1405 1.0000

Table 9: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with OE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

OE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE1 on Tempo-Subsets for schreiber2018/ismir2018

Figure 20: Mean OE1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

OE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE2 on Tempo-Subsets for schreiber2018/ismir2018

Figure 21: Mean OE2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE1 for Tempo

When we fit a generalized additive model (GAM) to the OE1 values as a function of the ground-truth tempo, what OE1 can we confidently expect for a given tempo?

Estimated OE1 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on OE1 for estimates for reference schreiber2018/ismir2018.

Figure 22: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated OE2 for Tempo

When we fit a generalized additive model (GAM) to the OE2 values as a function of the ground-truth tempo, what OE2 can we confidently expect for a given tempo?

Estimated OE2 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on OE2 for estimates for reference schreiber2018/ismir2018.

Figure 23: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE1 and AOE2

AOE1 is defined as the absolute octave error between an estimate E and a reference value R: AOE1(E) = |log2(E/R)|.

AOE2 is the minimum of AOE1 allowing the octave errors 2, 3, 1/2, and 1/3: AOE2(E) = min(AOE1(E), AOE1(2E), AOE1(3E), AOE1(½E), AOE1(⅓E)).
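AOE1 and AOE2 are the magnitudes of OE1 and OE2, so the sketch is short. The function names are illustrative and not tempo_eval's API. Averaging the per-item values corresponds to the AOE1_MEAN and AOE2_MEAN columns in Table 10.

```python
import math
from statistics import mean

# Candidate factors for AOE2: no change, plus the tolerated octave errors.
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def aoe1(estimate: float, reference: float) -> float:
    """Absolute octave error |log2(E/R)|."""
    return abs(math.log2(estimate / reference))


def aoe2(estimate: float, reference: float) -> float:
    """Smallest AOE1 over E, 2E, 3E, E/2 and E/3."""
    return min(aoe1(f * estimate, reference) for f in OCTAVE_FACTORS)


pairs = [(50.0, 100.0), (97.0, 100.0), (104.0, 100.0)]  # (estimate, reference)
print(mean(aoe1(e, r) for e, r in pairs))
print(mean(aoe2(e, r) for e, r in pairs))
```

Unlike OE1, AOE1 does not let positive and negative errors cancel each other out in the mean. This is why an estimator can have a mean OE1 close to zero and still a noticeably larger mean AOE1.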

Mean AOE1/AOE2 Results for schreiber2018/ismir2018

Estimator AOE1_MEAN AOE1_STDEV AOE2_MEAN AOE2_STDEV
schreiber2018/cnn 0.0522 0.2013 0.0130 0.0592
schreiber2018/fcn 0.0972 0.2588 0.0311 0.0967
schreiber2017/ismir2017 0.1060 0.2804 0.0234 0.0810
schreiber2014/default 0.1108 0.2763 0.0278 0.0673
schreiber2017/mirex2017 0.1126 0.2975 0.0153 0.0588
percival2014/stem 0.1363 0.3126 0.0310 0.0987
boeck2015/tempodetector2016_default 0.1777 0.3840 0.0240 0.0795
davies2009/mirex_qm_tempotracker 0.2155 0.3779 0.0406 0.0722

Table 10: Mean AOE1/AOE2 for estimates compared to version schreiber2018/ismir2018, ordered by AOE1 mean.

CSV JSON LATEX PICKLE

Raw data AOE1: CSV JSON LATEX PICKLE

Raw data AOE2: CSV JSON LATEX PICKLE

AOE1 distribution for schreiber2018/ismir2018

Figure 24: AOE1 for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 distribution for schreiber2018/ismir2018

Figure 25: AOE2 for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).

CSV JSON LATEX PICKLE SVG PDF PNG

Significance of Differences

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.6560 0.5079 0.2735 0.2657 0.3250 0.0369 0.2222
davies2009/mirex_qm_tempotracker 0.6560 1.0000 0.2673 0.1418 0.1332 0.1518 0.0059 0.0967
percival2014/stem 0.5079 0.2673 1.0000 0.5731 0.4019 0.5760 0.0712 0.3608
schreiber2014/default 0.2735 0.1418 0.5731 1.0000 0.9014 0.9621 0.1776 0.6669
schreiber2017/ismir2017 0.2657 0.1332 0.4019 0.9014 1.0000 0.7598 0.1903 0.8240
schreiber2017/mirex2017 0.3250 0.1518 0.5760 0.9621 0.7598 1.0000 0.1668 0.7166
schreiber2018/cnn 0.0369 0.0059 0.0712 0.1776 0.1903 0.1668 1.0000 0.2365
schreiber2018/fcn 0.2222 0.0967 0.3608 0.6669 0.8240 0.7166 0.2365 1.0000

Table 11: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with AOE1. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

Estimator boeck2015/tempodetector2016_default davies2009/mirex_qm_tempotracker percival2014/stem schreiber2014/default schreiber2017/ismir2017 schreiber2017/mirex2017 schreiber2018/cnn schreiber2018/fcn
boeck2015/tempodetector2016_default 1.0000 0.2795 0.7052 0.7866 0.9701 0.5359 0.4390 0.6973
davies2009/mirex_qm_tempotracker 0.2795 1.0000 0.5815 0.1943 0.2619 0.0505 0.0409 0.5820
percival2014/stem 0.7052 0.5815 1.0000 0.8555 0.3654 0.1794 0.1275 0.9947
schreiber2014/default 0.7866 0.1943 0.8555 1.0000 0.7706 0.3209 0.2580 0.8501
schreiber2017/ismir2017 0.9701 0.2619 0.3654 0.7706 1.0000 0.3221 0.2167 0.5901
schreiber2017/mirex2017 0.5359 0.0505 0.1794 0.3209 0.3221 1.0000 0.1712 0.1683
schreiber2018/cnn 0.4390 0.0409 0.1275 0.2580 0.2167 0.1712 1.0000 0.1162
schreiber2018/fcn 0.6973 0.5820 0.9947 0.8501 0.5901 0.1683 0.1162 1.0000

Table 12: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with AOE2. H0: the true mean difference between paired samples is zero. If p ≤ α, reject H0, i.e., there is a significant difference between the estimates from the two algorithms. In the table, p-values < 0.05 are set in bold.

CSV JSON LATEX PICKLE

AOE1 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE1 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE1 on Tempo-Subsets for schreiber2018/ismir2018

Figure 26: Mean AOE1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

AOE2 on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE2 for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE2 on Tempo-Subsets for schreiber2018/ismir2018

Figure 27: Mean AOE2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE1 for Tempo

When we fit a generalized additive model (GAM) to the AOE1 values as a function of the ground-truth tempo, what AOE1 can we confidently expect for a given tempo?

Estimated AOE1 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on AOE1 for estimates for reference schreiber2018/ismir2018.

Figure 28: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG

Estimated AOE2 for Tempo

When we fit a generalized additive model (GAM) to the AOE2 values as a function of the ground-truth tempo, what AOE2 can we confidently expect for a given tempo?

Estimated AOE2 for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on AOE2 for estimates for reference schreiber2018/ismir2018.

Figure 29: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

CSV JSON LATEX PICKLE SVG PDF PNG


Generated by tempo_eval 0.1.1 on 2022-06-29 18:50. Size L.