queen
This is the tempo_eval report for the ‘queen’ corpus.
Reports for other corpora may be found here.
Table of Contents
- References for ‘queen’
- Estimates for ‘queen’
- Estimators
- Basic Statistics
- Smoothed Tempo Distribution
- Accuracy
- MIREX-Style Evaluation
- MIREX Results for schreiber2018/ismir2018
- P-Score for schreiber2018/ismir2018
- One Correct for schreiber2018/ismir2018
- Both Correct for schreiber2018/ismir2018
- P-Score on Tempo-Subsets
- One Correct on Tempo-Subsets
- Both Correct on Tempo-Subsets
- Estimated P-Score for Tempo
- Estimated One Correct for Tempo
- Estimated Both Correct for Tempo
- OE1 and OE2
- AOE1 and AOE2
Because reference annotations are not available, we treat the estimate schreiber2018/ismir2018 as reference. It has the highest Mean Mutual Agreement (MMA), based on Accuracy1 with 4% tolerance.
References for ‘queen’
References
schreiber2018/ismir2018
Attribute | Value |
---|---|
Corpus | |
Version | 0.0.3 |
Data Source | Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. |
Annotation Tools | schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn |
Basic Statistics
Reference | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage |
---|---|---|---|---|---|---|---|
schreiber2018/ismir2018 | 51 | 69.00 | 156.00 | 104.10 | 23.31 | 67.00 | 0.92 |
Smoothed Tempo Distribution
Figure 1: Percentage of values in tempo interval.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimates for ‘queen’
Estimators
boeck2015/tempodetector2016_default
Attribute | Value |
---|---|
Corpus | queen |
Version | 0.17.dev0 |
Annotation Tools | TempoDetector.2016, madmom, https://github.com/CPJKU/madmom |
Annotator, bibtex | Boeck2015 |
davies2009/mirex_qm_tempotracker
Attribute | Value | |
---|---|---|
Corpus | queen | |
Version | 1.0 | |
Annotation Tools | QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used. | |
Annotator, bibtex | Davies2009 | Davies2007 |
percival2014/stem
Attribute | Value |
---|---|
Corpus | queen |
Version | 1.0 |
Annotation Tools | percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem |
Annotator, bibtex | Percival2014 |
schreiber2014/default
Attribute | Value |
---|---|
Corpus | queen |
Version | 0.0.1 |
Annotation Tools | schreiber 2014, http://www.tagtraum.com/tempo_estimation.html |
Annotator, bibtex | Schreiber2014 |
schreiber2017/ismir2017
Attribute | Value |
---|---|
Corpus | queen |
Version | 0.0.4 |
Annotation Tools | schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html |
Annotator, bibtex | Schreiber2017 |
schreiber2017/mirex2017
Attribute | Value |
---|---|
Corpus | queen |
Version | 0.0.4 |
Annotation Tools | schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html |
Annotator, bibtex | Schreiber2017 |
schreiber2018/cnn
Attribute | Value |
---|---|
Corpus | |
Version | 0.0.3 |
Data Source | Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. |
Annotation Tools | schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn |
schreiber2018/fcn
Attribute | Value |
---|---|
Corpus | |
Version | 0.0.3 |
Data Source | Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. |
Annotation Tools | schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn |
Basic Statistics
Estimator | Size | Min | Max | Avg | Stdev | Sweet Oct. Start | Sweet Oct. Coverage |
---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 51 | 41.96 | 157.89 | 95.53 | 26.50 | 67.00 | 0.82 |
davies2009/mirex_qm_tempotracker | 51 | 71.78 | 166.71 | 120.76 | 26.00 | 81.00 | 0.94 |
percival2014/stem | 51 | 63.41 | 142.56 | 99.08 | 21.25 | 67.00 | 0.96 |
schreiber2014/default | 51 | 59.73 | 143.87 | 97.74 | 21.00 | 63.00 | 0.92 |
schreiber2017/ismir2017 | 51 | 63.09 | 180.19 | 101.61 | 23.67 | 66.00 | 0.94 |
schreiber2017/mirex2017 | 51 | 44.79 | 180.19 | 100.03 | 25.23 | 66.00 | 0.92 |
schreiber2018/cnn | 51 | 63.00 | 157.00 | 103.78 | 24.44 | 67.00 | 0.90 |
schreiber2018/fcn | 51 | 60.00 | 157.00 | 98.20 | 23.64 | 67.00 | 0.92 |
Smoothed Tempo Distribution
Figure 2: Percentage of values in tempo interval.
CSV JSON LATEX PICKLE SVG PDF PNG
Accuracy
Accuracy1 is defined as the percentage of correct estimates, allowing a 4% tolerance for individual BPM values.
Accuracy2 additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).
See [Gouyon2006].
Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set or that the test set may have even been created using one of the tested algorithms.
Accuracy Results for schreiber2018/ismir2018
Estimator | Accuracy1 | Accuracy2 |
---|---|---|
schreiber2018/cnn | 0.9412 | 0.9608 |
schreiber2018/fcn | 0.8824 | 0.9216 |
schreiber2017/mirex2017 | 0.8824 | 0.9804 |
schreiber2017/ismir2017 | 0.8824 | 0.9608 |
percival2014/stem | 0.8431 | 0.9412 |
boeck2015/tempodetector2016_default | 0.8235 | 0.9608 |
schreiber2014/default | 0.8235 | 0.9020 |
davies2009/mirex_qm_tempotracker | 0.7647 | 0.9216 |
Table 3: Mean accuracy of estimates compared to version schreiber2018/ismir2018 with 4% tolerance ordered by Accuracy1.
Raw data Accuracy1: CSV JSON LATEX PICKLE
Raw data Accuracy2: CSV JSON LATEX PICKLE
Accuracy1 for schreiber2018/ismir2018
Figure 3: Mean Accuracy1 for estimates compared to version schreiber2018/ismir2018 depending on tolerance.
CSV JSON LATEX PICKLE SVG PDF PNG
Accuracy2 for schreiber2018/ismir2018
Figure 4: Mean Accuracy2 for estimates compared to version schreiber2018/ismir2018 depending on tolerance.
CSV JSON LATEX PICKLE SVG PDF PNG
Differing Items
For which items did a given estimator not estimate a correct value with respect to a given ground truth? Are there items which are either very difficult, not suitable for the task, or incorrectly annotated and therefore never estimated correctly, regardless which estimator is used?
Differing Items Accuracy1
Items with different tempo annotations (Accuracy1, 4% tolerance) in different versions:
schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (9 differences): ‘Greatest Hits II/03 Radio Ga Ga’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits II/16 The Show Must Go On’ ‘Greatest Hits III/07 Heaven For Everyone’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/10 Living On My Own’ ‘Greatest Hits III/12 The Great Pretender’ CSV
schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (12 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits I/13 Play The Game’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/15 Friends Will Be Friends’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/08 Las Palabras De Amor’ ‘Greatest Hits III/11 Let Me Live’ ‘Greatest Hits III/12 The Great Pretender’ … CSV
schreiber2018/ismir2018 compared with percival2014/stem (8 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV
schreiber2018/ismir2018 compared with schreiber2014/default (9 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/15 No-One But You’ CSV
schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV
schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/03 Barcelona’ CSV
schreiber2018/ismir2018 compared with schreiber2018/cnn (3 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/11 Let Me Live’ CSV
schreiber2018/ismir2018 compared with schreiber2018/fcn (6 differences): ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ ‘Greatest Hits III/15 No-One But You’ CSV
All tracks were estimated ‘correctly’ by at least one system.
Differing Items Accuracy2
Items with different tempo annotations (Accuracy2, 4% tolerance) in different versions:
schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (2 differences): ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/12 The Great Pretender’ CSV
schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (4 differences): ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/12 The Great Pretender’ CSV
schreiber2018/ismir2018 compared with percival2014/stem (3 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV
schreiber2018/ismir2018 compared with schreiber2014/default (5 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ CSV
schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV
schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (1 differences): ‘Greatest Hits I/17 We Are The Champions’ CSV
schreiber2018/ismir2018 compared with schreiber2018/cnn (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ CSV
schreiber2018/ismir2018 compared with schreiber2018/fcn (4 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ CSV
All tracks were estimated ‘correctly’ by at least one system.
Significance of Differences
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.6476 | 1.0000 | 1.0000 | 0.5488 | 0.5488 | 0.1094 | 0.5811 |
davies2009/mirex_qm_tempotracker | 0.6476 | 1.0000 | 0.4807 | 0.6072 | 0.2379 | 0.2101 | 0.0225 | 0.2379 |
percival2014/stem | 1.0000 | 0.4807 | 1.0000 | 1.0000 | 0.6250 | 0.6875 | 0.1250 | 0.6875 |
schreiber2014/default | 1.0000 | 0.6072 | 1.0000 | 1.0000 | 0.4531 | 0.3750 | 0.1094 | 0.5078 |
schreiber2017/ismir2017 | 0.5488 | 0.2379 | 0.6250 | 0.4531 | 1.0000 | 1.0000 | 0.3750 | 1.0000 |
schreiber2017/mirex2017 | 0.5488 | 0.2101 | 0.6875 | 0.3750 | 1.0000 | 1.0000 | 0.3750 | 1.0000 |
schreiber2018/cnn | 0.1094 | 0.0225 | 0.1250 | 0.1094 | 0.3750 | 0.3750 | 1.0000 | 0.3750 |
schreiber2018/fcn | 0.5811 | 0.2379 | 0.6875 | 0.5078 | 1.0000 | 1.0000 | 0.3750 | 1.0000 |
Table 4: McNemar p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with Accuracy1 [Gouyon2006]. H0: both estimators disagree with the groundtruth to the same amount. If p<=ɑ, reject H0, i.e. we have a significant difference in the disagreement with the groundtruth. In the table, p-values<0.05 are set in bold.
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.6250 | 1.0000 | 0.3750 | 1.0000 | 1.0000 | 1.0000 | 0.6875 |
davies2009/mirex_qm_tempotracker | 0.6250 | 1.0000 | 1.0000 | 1.0000 | 0.6875 | 0.3750 | 0.6875 | 1.0000 |
percival2014/stem | 1.0000 | 1.0000 | 1.0000 | 0.7266 | 1.0000 | 0.5000 | 1.0000 | 1.0000 |
schreiber2014/default | 0.3750 | 1.0000 | 0.7266 | 1.0000 | 0.4531 | 0.2188 | 0.4531 | 1.0000 |
schreiber2017/ismir2017 | 1.0000 | 0.6875 | 1.0000 | 0.4531 | 1.0000 | 1.0000 | 1.0000 | 0.6250 |
schreiber2017/mirex2017 | 1.0000 | 0.3750 | 0.5000 | 0.2188 | 1.0000 | 1.0000 | 1.0000 | 0.2500 |
schreiber2018/cnn | 1.0000 | 0.6875 | 1.0000 | 0.4531 | 1.0000 | 1.0000 | 1.0000 | 0.5000 |
schreiber2018/fcn | 0.6875 | 1.0000 | 1.0000 | 1.0000 | 0.6250 | 0.2500 | 0.5000 | 1.0000 |
Table 5: McNemar p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with Accuracy2 [Gouyon2006]. H0: both estimators disagree with the groundtruth to the same amount. If p<=ɑ, reject H0, i.e. we have a significant difference in the disagreement with the groundtruth. In the table, p-values<0.05 are set in bold.
Accuracy1 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean Accuracy1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
Accuracy1 on Tempo-Subsets for schreiber2018/ismir2018
Figure 5: Mean Accuracy1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Accuracy2 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean Accuracy2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
Accuracy2 on Tempo-Subsets for schreiber2018/ismir2018
Figure 6: Mean Accuracy2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated Accuracy1 for Tempo
When fitting a generalized additive model (GAM) to Accuracy1-values and a ground truth, what Accuracy1 can we expect with confidence?
Estimated Accuracy1 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on Accuracy1 for estimates for reference schreiber2018/ismir2018.
Figure 7: Accuracy1 predictions of a generalized additive model (GAM) fit to Accuracy1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated Accuracy2 for Tempo
When fitting a generalized additive model (GAM) to Accuracy2-values and a ground truth, what Accuracy2 can we expect with confidence?
Estimated Accuracy2 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on Accuracy2 for estimates for reference schreiber2018/ismir2018.
Figure 8: Accuracy2 predictions of a generalized additive model (GAM) fit to Accuracy2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
MIREX-Style Evaluation
P-Score is defined as the average of two tempi weighted by their perceptual strength, allowing an 8% tolerance for both tempo values [MIREX 2006 Definition].
One Correct is the fraction of estimate pairs of which at least one of the two values is equal to a reference value (within an 8% tolerance).
Both Correct is the fraction of estimate pairs of which both values are equal to the reference values (within an 8% tolerance).
See [McKinney2007].
Note: Very few datasets actually have multiple annotations per track along with a salience distributions. References without suitable annotations are not shown.
MIREX Results for schreiber2018/ismir2018
Estimator | P-Score | One Correct | Both Correct |
---|---|---|---|
schreiber2018/fcn | 0.9628 | 1.0000 | 0.7843 |
schreiber2018/cnn | 0.9586 | 1.0000 | 0.7647 |
schreiber2017/mirex2017 | 0.9461 | 0.9804 | 0.7255 |
schreiber2017/ismir2017 | 0.9261 | 0.9804 | 0.6471 |
schreiber2014/default | 0.9148 | 0.9608 | 0.7059 |
davies2009/mirex_qm_tempotracker | 0.8995 | 0.9608 | 0.6078 |
boeck2015/tempodetector2016_default | 0.8964 | 0.9804 | 0.4902 |
percival2014/stem | 0.7848 | 0.9412 | 0.0000 |
Table 6: Compared to schreiber2018/ismir2018 with 8.0% tolerance.
Raw data P-Score: CSV JSON LATEX PICKLE
Raw data One Correct: CSV JSON LATEX PICKLE
Raw data Both Correct: CSV JSON LATEX PICKLE
P-Score for schreiber2018/ismir2018
Figure 9: Mean P-Score for estimates compared to version schreiber2018/ismir2018 depending on tolerance.
CSV JSON LATEX PICKLE SVG PDF PNG
One Correct for schreiber2018/ismir2018
Figure 10: Mean One Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.
CSV JSON LATEX PICKLE SVG PDF PNG
Both Correct for schreiber2018/ismir2018
Figure 11: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.
CSV JSON LATEX PICKLE SVG PDF PNG
P-Score on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean P-Score for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
P-Score on Tempo-Subsets for schreiber2018/ismir2018
Figure 12: Mean P-Score for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
One Correct on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean One Correct for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
One Correct on Tempo-Subsets for schreiber2018/ismir2018
Figure 13: Mean One Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Both Correct on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean Both Correct for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
Both Correct on Tempo-Subsets for schreiber2018/ismir2018
Figure 14: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated P-Score for Tempo
When fitting a generalized additive model (GAM) to P-Score-values and a ground truth, what P-Score can we expect with confidence?
Estimated P-Score for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on P-Score for estimates for reference schreiber2018/ismir2018.
Figure 15: P-Score predictions of a generalized additive model (GAM) fit to P-Score results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated One Correct for Tempo
When fitting a generalized additive model (GAM) to One Correct-values and a ground truth, what One Correct can we expect with confidence?
Estimated One Correct for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on One Correct for estimates for reference schreiber2018/ismir2018.
Figure 16: One Correct predictions of a generalized additive model (GAM) fit to One Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated Both Correct for Tempo
When fitting a generalized additive model (GAM) to Both Correct-values and a ground truth, what Both Correct can we expect with confidence?
Estimated Both Correct for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on Both Correct for estimates for reference schreiber2018/ismir2018.
Figure 17: Both Correct predictions of a generalized additive model (GAM) fit to Both Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
OE1 and OE2
OE1 is defined as octave error between an estimate E
and a reference value R
.This means that the most common errors—by a factor of 2 or ½—have the same magnitude, namely 1: OE2(E) = log2(E/R)
.
OE2 is the signed OE1 corresponding to the minimum absolute OE1 allowing the octaveerrors 2, 3, 1/2, and 1/3: OE2(E) = arg minx(|x|) with x ∈ {OE1(E), OE1(2E), OE1(3E), OE1(½E), OE1(⅓E)}
Mean OE1/OE2 Results for schreiber2018/ismir2018
Estimator | OE1_MEAN | OE1_STDEV | OE2_MEAN | OE2_STDEV |
---|---|---|---|---|
schreiber2018/cnn | -0.0082 | 0.2078 | 0.0114 | 0.0595 |
schreiber2018/fcn | -0.0892 | 0.2617 | 0.0284 | 0.0975 |
schreiber2014/default | -0.0886 | 0.2842 | 0.0094 | 0.0722 |
schreiber2017/ismir2017 | -0.0367 | 0.2975 | 0.0026 | 0.0843 |
schreiber2017/mirex2017 | -0.0677 | 0.3108 | 0.0107 | 0.0598 |
percival2014/stem | -0.0687 | 0.3340 | 0.0097 | 0.1030 |
davies2009/mirex_qm_tempotracker | 0.2155 | 0.3779 | 0.0195 | 0.0805 |
boeck2015/tempodetector2016_default | -0.1466 | 0.3969 | 0.0021 | 0.0830 |
Table 7: Mean OE1/OE2 for estimates compared to version schreiber2018/ismir2018 ordered by standard deviation.
Raw data OE1: CSV JSON LATEX PICKLE
Raw data OE2: CSV JSON LATEX PICKLE
OE1 distribution for schreiber2018/ismir2018
Figure 18: OE1 for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE1 and an empirical distribution of the sample, using kernel density estimation (KDE).
CSV JSON LATEX PICKLE SVG PDF PNG
OE2 distribution for schreiber2018/ismir2018
Figure 19: OE2 for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE2 and an empirical distribution of the sample, using kernel density estimation (KDE).
CSV JSON LATEX PICKLE SVG PDF PNG
Significance of Differences
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.0000 | 0.2152 | 0.3461 | 0.1227 | 0.2823 | 0.0211 | 0.3881 |
davies2009/mirex_qm_tempotracker | 0.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0004 | 0.0001 | 0.0001 | 0.0000 |
percival2014/stem | 0.2152 | 0.0000 | 1.0000 | 0.6620 | 0.3778 | 0.9820 | 0.1995 | 0.6343 |
schreiber2014/default | 0.3461 | 0.0000 | 0.6620 | 1.0000 | 0.1912 | 0.6485 | 0.0637 | 0.9847 |
schreiber2017/ismir2017 | 0.1227 | 0.0004 | 0.3778 | 0.1912 | 1.0000 | 0.1737 | 0.4930 | 0.1854 |
schreiber2017/mirex2017 | 0.2823 | 0.0001 | 0.9820 | 0.6485 | 0.1737 | 1.0000 | 0.1747 | 0.6150 |
schreiber2018/cnn | 0.0211 | 0.0001 | 0.1995 | 0.0637 | 0.4930 | 0.1747 | 1.0000 | 0.0301 |
schreiber2018/fcn | 0.3881 | 0.0000 | 0.6343 | 0.9847 | 0.1854 | 0.6150 | 0.0301 | 1.0000 |
Table 8: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with OE1. H0: the true mean difference between paired samples is zero. If p<=ɑ, reject H0, i.e. we have a significant difference between estimates from the two algorithms. In the table, p-values<0.05 are set in bold.
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.3109 | 0.6835 | 0.6556 | 0.9801 | 0.5500 | 0.5150 | 0.1524 |
davies2009/mirex_qm_tempotracker | 0.3109 | 1.0000 | 0.5906 | 0.5900 | 0.3123 | 0.5431 | 0.5614 | 0.6077 |
percival2014/stem | 0.6835 | 0.5906 | 1.0000 | 0.9862 | 0.3951 | 0.9350 | 0.8874 | 0.1059 |
schreiber2014/default | 0.6556 | 0.5900 | 0.9862 | 1.0000 | 0.6567 | 0.9219 | 0.8811 | 0.2836 |
schreiber2017/ismir2017 | 0.9801 | 0.3123 | 0.3951 | 0.6567 | 1.0000 | 0.3221 | 0.2938 | 0.0655 |
schreiber2017/mirex2017 | 0.5500 | 0.5431 | 0.9350 | 0.9219 | 0.3221 | 1.0000 | 0.6729 | 0.1224 |
schreiber2018/cnn | 0.5150 | 0.5614 | 0.8874 | 0.8811 | 0.2938 | 0.6729 | 1.0000 | 0.1405 |
schreiber2018/fcn | 0.1524 | 0.6077 | 0.1059 | 0.2836 | 0.0655 | 0.1224 | 0.1405 | 1.0000 |
Table 9: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with OE2. H0: the true mean difference between paired samples is zero. If p<=ɑ, reject H0, i.e. we have a significant difference between estimates from the two algorithms. In the table, p-values<0.05 are set in bold.
OE1 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean OE1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
OE1 on Tempo-Subsets for schreiber2018/ismir2018
Figure 20: Mean OE1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
OE2 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean OE2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
OE2 on Tempo-Subsets for schreiber2018/ismir2018
Figure 21: Mean OE2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated OE1 for Tempo
When fitting a generalized additive model (GAM) to OE1-values and a ground truth, what OE1 can we expect with confidence?
Estimated OE1 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on OE1 for estimates for reference schreiber2018/ismir2018.
Figure 22: OE1 predictions of a generalized additive model (GAM) fit to OE1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated OE2 for Tempo
When fitting a generalized additive model (GAM) to OE2-values and a ground truth, what OE2 can we expect with confidence?
Estimated OE2 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on OE2 for estimates for reference schreiber2018/ismir2018.
Figure 23: OE2 predictions of a generalized additive model (GAM) fit to OE2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
AOE1 and AOE2
AOE1 is defined as absolute octave error between an estimate and a reference value: AOE1(E) = |log2(E/R)|
.
AOE2 is the minimum of AOE1 allowing the octave errors 2, 3, 1/2, and 1/3: AOE2(E) = min(AOE1(E), AOE1(2E), AOE1(3E), AOE1(½E), AOE1(⅓E))
.
Mean AOE1/AOE2 Results for schreiber2018/ismir2018
Estimator | AOE1_MEAN | AOE1_STDEV | AOE2_MEAN | AOE2_STDEV |
---|---|---|---|---|
schreiber2018/cnn | 0.0522 | 0.2013 | 0.0130 | 0.0592 |
schreiber2018/fcn | 0.0972 | 0.2588 | 0.0311 | 0.0967 |
schreiber2017/ismir2017 | 0.1060 | 0.2804 | 0.0234 | 0.0810 |
schreiber2014/default | 0.1108 | 0.2763 | 0.0278 | 0.0673 |
schreiber2017/mirex2017 | 0.1126 | 0.2975 | 0.0153 | 0.0588 |
percival2014/stem | 0.1363 | 0.3126 | 0.0310 | 0.0987 |
boeck2015/tempodetector2016_default | 0.1777 | 0.3840 | 0.0240 | 0.0795 |
davies2009/mirex_qm_tempotracker | 0.2155 | 0.3779 | 0.0406 | 0.0722 |
Table 10: Mean AOE1/AOE2 for estimates compared to version schreiber2018/ismir2018 ordered by mean.
Raw data AOE1: CSV JSON LATEX PICKLE
Raw data AOE2: CSV JSON LATEX PICKLE
AOE1 distribution for schreiber2018/ismir2018
Figure 24: AOE1 for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE1 and an empirical distribution of the sample, using kernel density estimation (KDE).
CSV JSON LATEX PICKLE SVG PDF PNG
AOE2 distribution for schreiber2018/ismir2018
Figure 25: AOE2 for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE2 and an empirical distribution of the sample, using kernel density estimation (KDE).
CSV JSON LATEX PICKLE SVG PDF PNG
Significance of Differences
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.6560 | 0.5079 | 0.2735 | 0.2657 | 0.3250 | 0.0369 | 0.2222 |
davies2009/mirex_qm_tempotracker | 0.6560 | 1.0000 | 0.2673 | 0.1418 | 0.1332 | 0.1518 | 0.0059 | 0.0967 |
percival2014/stem | 0.5079 | 0.2673 | 1.0000 | 0.5731 | 0.4019 | 0.5760 | 0.0712 | 0.3608 |
schreiber2014/default | 0.2735 | 0.1418 | 0.5731 | 1.0000 | 0.9014 | 0.9621 | 0.1776 | 0.6669 |
schreiber2017/ismir2017 | 0.2657 | 0.1332 | 0.4019 | 0.9014 | 1.0000 | 0.7598 | 0.1903 | 0.8240 |
schreiber2017/mirex2017 | 0.3250 | 0.1518 | 0.5760 | 0.9621 | 0.7598 | 1.0000 | 0.1668 | 0.7166 |
schreiber2018/cnn | 0.0369 | 0.0059 | 0.0712 | 0.1776 | 0.1903 | 0.1668 | 1.0000 | 0.2365 |
schreiber2018/fcn | 0.2222 | 0.0967 | 0.3608 | 0.6669 | 0.8240 | 0.7166 | 0.2365 | 1.0000 |
Table 11: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with AOE1. H0: the true mean difference between paired samples is zero. If p<=ɑ, reject H0, i.e. we have a significant difference between estimates from the two algorithms. In the table, p-values<0.05 are set in bold.
Estimator | boeck2015/tempodetector2016_default | davies2009/mirex_qm_tempotracker | percival2014/stem | schreiber2014/default | schreiber2017/ismir2017 | schreiber2017/mirex2017 | schreiber2018/cnn | schreiber2018/fcn |
---|---|---|---|---|---|---|---|---|
boeck2015/tempodetector2016_default | 1.0000 | 0.2795 | 0.7052 | 0.7866 | 0.9701 | 0.5359 | 0.4390 | 0.6973 |
davies2009/mirex_qm_tempotracker | 0.2795 | 1.0000 | 0.5815 | 0.1943 | 0.2619 | 0.0505 | 0.0409 | 0.5820 |
percival2014/stem | 0.7052 | 0.5815 | 1.0000 | 0.8555 | 0.3654 | 0.1794 | 0.1275 | 0.9947 |
schreiber2014/default | 0.7866 | 0.1943 | 0.8555 | 1.0000 | 0.7706 | 0.3209 | 0.2580 | 0.8501 |
schreiber2017/ismir2017 | 0.9701 | 0.2619 | 0.3654 | 0.7706 | 1.0000 | 0.3221 | 0.2167 | 0.5901 |
schreiber2017/mirex2017 | 0.5359 | 0.0505 | 0.1794 | 0.3209 | 0.3221 | 1.0000 | 0.1712 | 0.1683 |
schreiber2018/cnn | 0.4390 | 0.0409 | 0.1275 | 0.2580 | 0.2167 | 0.1712 | 1.0000 | 0.1162 |
schreiber2018/fcn | 0.6973 | 0.5820 | 0.9947 | 0.8501 | 0.5901 | 0.1683 | 0.1162 | 1.0000 |
Table 12: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as groundtruth with AOE2. H0: the true mean difference between paired samples is zero. If p<=ɑ, reject H0, i.e. we have a significant difference between estimates from the two algorithms. In the table, p-values<0.05 are set in bold.
AOE1 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean AOE1 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
AOE1 on Tempo-Subsets for schreiber2018/ismir2018
Figure 26: Mean AOE1 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
AOE2 on Tempo-Subsets
How well does an estimator perform, when only taking a subset of the reference annotations into account? The graphs show mean AOE2 for reference subsets with tempi in [T-10,T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
AOE2 on Tempo-Subsets for schreiber2018/ismir2018
Figure 27: Mean AOE2 for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated AOE1 for Tempo
When fitting a generalized additive model (GAM) to AOE1-values and a ground truth, what AOE1 can we expect with confidence?
Estimated AOE1 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on AOE1 for estimates for reference schreiber2018/ismir2018.
Figure 28: AOE1 predictions of a generalized additive model (GAM) fit to AOE1 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Estimated AOE2 for Tempo
When fitting a generalized additive model (GAM) to AOE2-values and a ground truth, what AOE2 can we expect with confidence?
Estimated AOE2 for Tempo for schreiber2018/ismir2018
Predictions of GAMs trained on AOE2 for estimates for reference schreiber2018/ismir2018.
Figure 29: AOE2 predictions of a generalized additive model (GAM) fit to AOE2 results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.
CSV JSON LATEX PICKLE SVG PDF PNG
Generated by tempo_eval 0.1.1 on 2022-06-29 18:50. Size L.