queen

This is the tempo_eval report for the ‘queen’ corpus.

Reports for other corpora may be found here.

Because reference annotations are not available, we treat the estimates of schreiber2018/ismir2018 as the reference. This estimator has the highest Mean Mutual Agreement (MMA), based on Accuracy₁ with 4% tolerance.
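A minimal sketch of how such a Mean Mutual Agreement could be computed. It assumes that an estimator's MMA is its mean Accuracy₁ (4% tolerance) against every other estimator in turn, with the other estimator acting as a stand-in reference. The function names and the toy data are hypothetical, and tempo_eval's exact aggregation may differ.

```python
def accuracy1(estimate: float, reference: float, tolerance: float = 0.04) -> bool:
    """True if the estimate lies within +/- tolerance * reference of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def mean_mutual_agreement(estimates: dict[str, list[float]],
                          tolerance: float = 0.04) -> dict[str, float]:
    """Hypothetical MMA: for each estimator, average its mean Accuracy1 against
    every other estimator, with the other estimator acting as the reference."""
    scores = {}
    for name, values in estimates.items():
        agreements = []
        for other, refs in estimates.items():
            if other == name:
                continue
            hits = [accuracy1(e, r, tolerance) for e, r in zip(values, refs)]
            agreements.append(sum(hits) / len(hits))
        scores[name] = sum(agreements) / len(agreements)
    return scores


# Made-up tempi (BPM) of three estimators on two tracks.
toy = {
    "a": [100.0, 120.0],
    "b": [101.0, 60.0],
    "c": [100.0, 121.0],
}
scores = mean_mutual_agreement(toy)
best = max(scores, key=scores.get)  # estimator that would serve as reference
print(scores, best)
```

In this toy case "a" and "c" tie; `max` returns the first of the tied names.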

References for ‘queen’

References

schreiber2018/ismir2018

Attribute	Value
Corpus
Version	0.0.3
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=ismir2018), https://github.com/hendriks73/tempo-cnn

Basic Statistics

Reference	Size	Min	Max	Avg	Stdev	Sweet Oct. Start	Sweet Oct. Coverage
schreiber2018/ismir2018	51	69.00	156.00	104.10	23.31	67.00	0.92

Table 1: Basic statistics.
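A minimal sketch of how the two "Sweet Oct." columns could be computed. It assumes that the sweet octave is the one-octave interval [s, 2s), with an integer start s, that contains the largest share of the tempo values, and that the coverage is that share. The lowest start wins ties here. The function name and the search range are illustrative assumptions, not tempo_eval's implementation.

```python
import math


def sweet_octave(tempi: list[float]) -> tuple[int, float]:
    """Hypothetical sweet octave: the integer start s whose octave [s, 2s)
    contains the largest fraction of the given tempi (lowest s wins ties)."""
    best_start, best_coverage = 0, -1.0
    # Candidate starts range from half the smallest tempo up to the largest tempo.
    for start in range(math.floor(min(tempi) / 2), math.ceil(max(tempi)) + 1):
        inside = sum(1 for t in tempi if start <= t < 2 * start)
        coverage = inside / len(tempi)
        if coverage > best_coverage:
            best_start, best_coverage = start, coverage
    return best_start, best_coverage


print(sweet_octave([60.0, 70.0, 80.0, 130.0, 200.0]))  # → (41, 0.6)
```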

Smoothed Tempo Distribution

Figure 1: Percentage of values in tempo interval.

Estimates for ‘queen’

Estimators

boeck2015/tempodetector2016_default

Attribute	Value
Corpus	queen
Version	0.17.dev0
Annotation Tools	TempoDetector.2016, madmom, https://github.com/CPJKU/madmom
Annotator, bibtex	Boeck2015

davies2009/mirex_qm_tempotracker

Attribute	Value
Corpus	queen
Version	1.0
Annotation Tools	QM Tempotracker, Sonic Annotator plugin. https://code.soundsoftware.ac.uk/projects/mirex2013/repository/show/audio_tempo_estimation/qm-tempotracker Note that the current macOS build of ‘qm-vamp-plugins’ was used.
Annotator, bibtex	Davies2009	Davies2007

percival2014/stem

Attribute	Value
Corpus	queen
Version	1.0
Annotation Tools	percival 2014, ‘tempo’ implementation from Marsyas, http://marsyas.info, git checkout tempo-stem
Annotator, bibtex	Percival2014

schreiber2014/default

Attribute	Value
Corpus	queen
Version	0.0.1
Annotation Tools	schreiber 2014, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2014

schreiber2017/ismir2017

Attribute	Value
Corpus	queen
Version	0.0.4
Annotation Tools	schreiber 2017, model=ismir2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2017

schreiber2017/mirex2017

Attribute	Value
Corpus	queen
Version	0.0.4
Annotation Tools	schreiber 2017, model=mirex2017, http://www.tagtraum.com/tempo_estimation.html
Annotator, bibtex	Schreiber2017

schreiber2018/cnn

Attribute	Value
Corpus
Version	0.0.3
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=cnn), https://github.com/hendriks73/tempo-cnn

schreiber2018/fcn

Attribute	Value
Corpus
Version	0.0.3
Data Source	Hendrik Schreiber, Meinard Müller. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018.
Annotation Tools	schreiber tempo-cnn (model=fcn), https://github.com/hendriks73/tempo-cnn

Basic Statistics

Estimator	Size	Min	Max	Avg	Stdev	Sweet Oct. Start	Sweet Oct. Coverage
boeck2015/tempodetector2016_default	51	41.96	157.89	95.53	26.50	67.00	0.82
davies2009/mirex_qm_tempotracker	51	71.78	166.71	120.76	26.00	81.00	0.94
percival2014/stem	51	63.41	142.56	99.08	21.25	67.00	0.96
schreiber2014/default	51	59.73	143.87	97.74	21.00	63.00	0.92
schreiber2017/ismir2017	51	63.09	180.19	101.61	23.67	66.00	0.94
schreiber2017/mirex2017	51	44.79	180.19	100.03	25.23	66.00	0.92
schreiber2018/cnn	51	63.00	157.00	103.78	24.44	67.00	0.90
schreiber2018/fcn	51	60.00	157.00	98.20	23.64	67.00	0.92

Table 2: Basic statistics.

Smoothed Tempo Distribution

Figure 2: Percentage of values in tempo interval.

Accuracy

Accuracy₁ is defined as the percentage of correct estimates, allowing a 4% tolerance for individual BPM values.

Accuracy₂ additionally permits estimates to be wrong by a factor of 2, 3, 1/2 or 1/3 (so-called octave errors).

See [Gouyon2006].
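A minimal sketch of both accuracy measures for single tracks, and of the mean over a set of tracks. It assumes the tolerance is applied relative to the (scaled) reference value; tempo_eval may apply it slightly differently. The function names are hypothetical.

```python
# Octave-error factors accepted by Accuracy2 (1.0 keeps the Accuracy1 case).
OCTAVE_FACTORS = (1.0, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0)


def accuracy1(estimate: float, reference: float, tolerance: float = 0.04) -> bool:
    """Correct if the estimate is within +/- tolerance * reference of the reference."""
    return abs(estimate - reference) <= tolerance * reference


def accuracy2(estimate: float, reference: float, tolerance: float = 0.04) -> bool:
    """Like accuracy1, but also accept 2x, 3x, 1/2x and 1/3x of the reference."""
    return any(accuracy1(estimate, reference * f, tolerance) for f in OCTAVE_FACTORS)


# Made-up example: reference vs. estimated BPM for three tracks.
references = [100.0, 120.0, 90.0]
estimates = [103.0, 60.0, 135.0]
acc1 = sum(accuracy1(e, r) for e, r in zip(estimates, references)) / len(references)
acc2 = sum(accuracy2(e, r) for e, r in zip(estimates, references)) / len(references)
print(acc1, acc2)
```

The second track (60 vs. 120 BPM) is a typical octave error: wrong under Accuracy₁, correct under Accuracy₂. The third (135 vs. 90 BPM, a factor of 1.5) is wrong under both.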

Note: When comparing accuracy values for different algorithms, keep in mind that an algorithm may have been trained on the test set or that the test set may have even been created using one of the tested algorithms.

Accuracy Results for schreiber2018/ismir2018

Estimator	Accuracy1	Accuracy2
schreiber2018/cnn	0.9412	0.9608
schreiber2018/fcn	0.8824	0.9216
schreiber2017/mirex2017	0.8824	0.9804
schreiber2017/ismir2017	0.8824	0.9608
percival2014/stem	0.8431	0.9412
boeck2015/tempodetector2016_default	0.8235	0.9608
schreiber2014/default	0.8235	0.9020
davies2009/mirex_qm_tempotracker	0.7647	0.9216

Table 3: Mean accuracy of estimates compared to version schreiber2018/ismir2018 with 4% tolerance, ordered by Accuracy₁.

Accuracy₁ for schreiber2018/ismir2018

Figure 3: Mean Accuracy₁ for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

Accuracy₂ for schreiber2018/ismir2018

Figure 4: Mean Accuracy₂ for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

Differing Items

For which items did a given estimator fail to estimate a correct value with respect to a given ground truth? Are there items that are very difficult, unsuitable for the task, or incorrectly annotated, and are therefore never estimated correctly, regardless of which estimator is used?
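Both questions reduce to simple set operations over per-track correctness. A minimal sketch on made-up data; the dictionary layout and names are hypothetical.

```python
# Made-up per-track Accuracy1 results (True = correct) for two estimators.
correct = {
    "estimator_a": {"track 1": True, "track 2": False, "track 3": True},
    "estimator_b": {"track 1": False, "track 2": False, "track 3": True},
}

# Items each estimator got wrong ("differing items").
differing = {name: sorted(t for t, ok in results.items() if not ok)
             for name, results in correct.items()}

# Items no estimator got right: candidates for difficult or mislabeled tracks.
tracks = next(iter(correct.values())).keys()
never_correct = sorted(t for t in tracks
                       if not any(results[t] for results in correct.values()))
print(differing, never_correct)
```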

Differing Items Accuracy₁

Items with different tempo annotations (Accuracy₁, 4% tolerance) in different versions:

schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (9 differences): ‘Greatest Hits II/03 Radio Ga Ga’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits II/16 The Show Must Go On’ ‘Greatest Hits III/07 Heaven For Everyone’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/10 Living On My Own’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (12 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits I/13 Play The Game’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/15 Friends Will Be Friends’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/08 Las Palabras De Amor’ ‘Greatest Hits III/11 Let Me Live’ ‘Greatest Hits III/12 The Great Pretender’ … CSV

schreiber2018/ismir2018 compared with percival2014/stem (8 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/09 Who Wants To Live Forever’ ‘Greatest Hits II/10 Headlong’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2014/default (9 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/09 Driven By You’ ‘Greatest Hits III/15 No-One But You’ CSV

schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (6 differences): ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/03 Barcelona’ CSV

schreiber2018/ismir2018 compared with schreiber2018/cnn (3 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/11 Let Me Live’ CSV

schreiber2018/ismir2018 compared with schreiber2018/fcn (6 differences): ‘Greatest Hits I/09 Crazy Little Thing Called Love’ ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ ‘Greatest Hits III/15 No-One But You’ CSV

All tracks were estimated ‘correctly’ by at least one system.

Differing Items Accuracy₂

Items with different tempo annotations (Accuracy₂, 4% tolerance) in different versions:

schreiber2018/ismir2018 compared with boeck2015/tempodetector2016_default (2 differences): ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with davies2009/mirex_qm_tempotracker (4 differences): ‘Greatest Hits I/05 Bicycle Race’ ‘Greatest Hits III/03 Barcelona’ ‘Greatest Hits III/04 Too Much Love Will Kill You’ ‘Greatest Hits III/12 The Great Pretender’ CSV

schreiber2018/ismir2018 compared with percival2014/stem (3 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2014/default (5 differences): ‘Greatest Hits I/01 Bohemian Rhapsody’ ‘Greatest Hits I/07 Don’t Stop Me Now’ ‘Greatest Hits II/08 Breakthru’ ‘Greatest Hits III/01 The Show Must Go On’ ‘Greatest Hits III/03 Barcelona’ CSV

schreiber2018/ismir2018 compared with schreiber2017/ismir2017 (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits III/17 Thank God It’s Christmas’ CSV

schreiber2018/ismir2018 compared with schreiber2017/mirex2017 (1 difference): ‘Greatest Hits I/17 We Are The Champions’ CSV

schreiber2018/ismir2018 compared with schreiber2018/cnn (2 differences): ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ CSV

schreiber2018/ismir2018 compared with schreiber2018/fcn (4 differences): ‘Greatest Hits I/10 Somebody To Love’ ‘Greatest Hits I/17 We Are The Champions’ ‘Greatest Hits II/06 Innuendo’ ‘Greatest Hits III/05 Somebody To Love’ CSV

All tracks were estimated ‘correctly’ by at least one system.

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.6476	1.0000	1.0000	0.5488	0.5488	0.1094	0.5811
davies2009/mirex_qm_tempotracker	0.6476	1.0000	0.4807	0.6072	0.2379	0.2101	0.0225	0.2379
percival2014/stem	1.0000	0.4807	1.0000	1.0000	0.6250	0.6875	0.1250	0.6875
schreiber2014/default	1.0000	0.6072	1.0000	1.0000	0.4531	0.3750	0.1094	0.5078
schreiber2017/ismir2017	0.5488	0.2379	0.6250	0.4531	1.0000	1.0000	0.3750	1.0000
schreiber2017/mirex2017	0.5488	0.2101	0.6875	0.3750	1.0000	1.0000	0.3750	1.0000
schreiber2018/cnn	0.1094	0.0225	0.1250	0.1094	0.3750	0.3750	1.0000	0.3750
schreiber2018/fcn	0.5811	0.2379	0.6875	0.5078	1.0000	1.0000	0.3750	1.0000

Table 4: McNemar p-values, using reference annotations schreiber2018/ismir2018 as ground truth with Accuracy₁ [Gouyon2006]. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e., there is a significant difference in how often the two estimators disagree with the ground truth. In the table, p-values < 0.05 are set in bold.
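McNemar's test only looks at the discordant tracks: b, the tracks estimator A gets right and estimator B gets wrong, and c, the reverse. A minimal sketch of the exact (binomial) variant follows. That tempo_eval uses this variant is an assumption, though small-count values such as 0.1250 and 0.3750 in the tables are consistent with it. The function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under H0, each discordant track is equally likely to favor either
    estimator, so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant tracks: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


print(mcnemar_exact(1, 4))  # hypothetical counts → 0.375
```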

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.6250	1.0000	0.3750	1.0000	1.0000	1.0000	0.6875
davies2009/mirex_qm_tempotracker	0.6250	1.0000	1.0000	1.0000	0.6875	0.3750	0.6875	1.0000
percival2014/stem	1.0000	1.0000	1.0000	0.7266	1.0000	0.5000	1.0000	1.0000
schreiber2014/default	0.3750	1.0000	0.7266	1.0000	0.4531	0.2188	0.4531	1.0000
schreiber2017/ismir2017	1.0000	0.6875	1.0000	0.4531	1.0000	1.0000	1.0000	0.6250
schreiber2017/mirex2017	1.0000	0.3750	0.5000	0.2188	1.0000	1.0000	1.0000	0.2500
schreiber2018/cnn	1.0000	0.6875	1.0000	0.4531	1.0000	1.0000	1.0000	0.5000
schreiber2018/fcn	0.6875	1.0000	1.0000	1.0000	0.6250	0.2500	0.5000	1.0000

Table 5: McNemar p-values, using reference annotations schreiber2018/ismir2018 as ground truth with Accuracy₂ [Gouyon2006]. H₀: both estimators disagree with the ground truth to the same extent. If p ≤ α, reject H₀, i.e., there is a significant difference in how often the two estimators disagree with the ground truth. In the table, p-values < 0.05 are set in bold.

Accuracy₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.
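A minimal sketch of how the points of such a curve could be computed from per-track metric values. The ±10 BPM window follows the [T-10, T+10] description; the inclusive bounds, the function name, and the data are assumptions. An empty window yields None, and windows containing one or two tracks illustrate why some values rest on very few estimates.

```python
from statistics import mean
from typing import Optional


def subset_mean(reference_tempi: list[float], values: list[float],
                center: float, half_width: float = 10.0) -> Optional[float]:
    """Mean metric value over tracks whose reference tempo lies in
    [center - half_width, center + half_width]; None if no track qualifies."""
    selected = [v for t, v in zip(reference_tempi, values)
                if center - half_width <= t <= center + half_width]
    return mean(selected) if selected else None


# Made-up per-track Accuracy1 values (1.0 = correct, 0.0 = wrong).
refs = [60.0, 65.0, 100.0, 104.0]
acc1 = [1.0, 0.0, 1.0, 1.0]
curve = {T: subset_mean(refs, acc1, T) for T in range(60, 111, 10)}
print(curve)
```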

Accuracy₁ on Tempo-Subsets for schreiber2018/ismir2018

Figure 5: Mean Accuracy₁ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Accuracy₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Accuracy₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Accuracy₂ on Tempo-Subsets for schreiber2018/ismir2018

Figure 6: Mean Accuracy₂ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Estimated Accuracy₁ for Tempo

If we fit a generalized additive model (GAM) to the Accuracy₁ values as a function of the ground-truth tempo, what Accuracy₁ can we expect, and with what confidence?

Estimated Accuracy₁ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Accuracy₁ for estimates for reference schreiber2018/ismir2018.

Figure 7: Accuracy₁ predictions of a generalized additive model (GAM) fit to Accuracy₁ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Estimated Accuracy₂ for Tempo

If we fit a generalized additive model (GAM) to the Accuracy₂ values as a function of the ground-truth tempo, what Accuracy₂ can we expect, and with what confidence?

Estimated Accuracy₂ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Accuracy₂ for estimates for reference schreiber2018/ismir2018.

Figure 8: Accuracy₂ predictions of a generalized additive model (GAM) fit to Accuracy₂ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

MIREX-Style Evaluation

P-Score is defined as the average of two tempi weighted by their perceptual strength, allowing an 8% tolerance for both tempo values [MIREX 2006 Definition].

One Correct is the fraction of estimate pairs of which at least one of the two values is equal to a reference value (within an 8% tolerance).

Both Correct is the fraction of estimate pairs of which both values are equal to the reference values (within an 8% tolerance).

See [McKinney2007].
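A per-track sketch of the three measures, following the definitions above as we read them: a reference tempo counts as matched if either estimate is within 8% of it. P-Score weights the two matches by the salience of T1 and T2, and One/Both Correct check whether at least one or both reference tempi are matched. The function name and the data are hypothetical; mir_eval's `mir_eval.tempo.detection` performs a similar computation.

```python
def mirex_scores(reference: tuple[float, float], salience: float,
                 estimate: tuple[float, float], tolerance: float = 0.08
                 ) -> tuple[float, bool, bool]:
    """Sketch of P-Score, One Correct and Both Correct for one track.

    reference: the two reference tempi (T1, T2); salience: perceptual strength
    of T1 in [0, 1] (T2 gets 1 - salience); estimate: the two estimated tempi.
    """
    # A reference tempo is "hit" if either estimate lies within tolerance of it.
    hits = [any(abs(e - t) <= tolerance * t for e in estimate) for t in reference]
    p_score = salience * hits[0] + (1.0 - salience) * hits[1]
    return p_score, any(hits), all(hits)


# Made-up track: reference tempi 60/120 BPM, T1 salience 0.7; estimates 61/200 BPM.
p, one, both = mirex_scores((60.0, 120.0), 0.7, (61.0, 200.0))
print(p, one, both)  # → 0.7 True False
```

Averaging these per-track values over all tracks gives the corpus-level figures.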

Note: Very few datasets actually have multiple annotations per track along with salience distributions. References without suitable annotations are not shown.

MIREX Results for schreiber2018/ismir2018

Estimator	P-Score	One Correct	Both Correct
schreiber2018/fcn	0.9628	1.0000	0.7843
schreiber2018/cnn	0.9586	1.0000	0.7647
schreiber2017/mirex2017	0.9461	0.9804	0.7255
schreiber2017/ismir2017	0.9261	0.9804	0.6471
schreiber2014/default	0.9148	0.9608	0.7059
davies2009/mirex_qm_tempotracker	0.8995	0.9608	0.6078
boeck2015/tempodetector2016_default	0.8964	0.9804	0.4902
percival2014/stem	0.7848	0.9412	0.0000

Table 6: MIREX-style results (P-Score, One Correct, Both Correct) for estimates compared to schreiber2018/ismir2018 with 8.0% tolerance, ordered by P-Score.

P-Score for schreiber2018/ismir2018

Figure 9: Mean P-Score for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

One Correct for schreiber2018/ismir2018

Figure 10: Mean One Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

Both Correct for schreiber2018/ismir2018

Figure 11: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 depending on tolerance.

P-Score on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean P-Score for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

P-Score on Tempo-Subsets for schreiber2018/ismir2018

Figure 12: Mean P-Score for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

One Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean One Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

One Correct on Tempo-Subsets for schreiber2018/ismir2018

Figure 13: Mean One Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Both Correct on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean Both Correct for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

Both Correct on Tempo-Subsets for schreiber2018/ismir2018

Figure 14: Mean Both Correct for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Estimated P-Score for Tempo

If we fit a generalized additive model (GAM) to the P-Score values as a function of the ground-truth tempo, what P-Score can we expect, and with what confidence?

Estimated P-Score for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on P-Score for estimates for reference schreiber2018/ismir2018.

Figure 15: P-Score predictions of a generalized additive model (GAM) fit to P-Score results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Estimated One Correct for Tempo

If we fit a generalized additive model (GAM) to the One Correct values as a function of the ground-truth tempo, what One Correct can we expect, and with what confidence?

Estimated One Correct for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on One Correct for estimates for reference schreiber2018/ismir2018.

Figure 16: One Correct predictions of a generalized additive model (GAM) fit to One Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Estimated Both Correct for Tempo

If we fit a generalized additive model (GAM) to the Both Correct values as a function of the ground-truth tempo, what Both Correct can we expect, and with what confidence?

Estimated Both Correct for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on Both Correct for estimates for reference schreiber2018/ismir2018.

Figure 17: Both Correct predictions of a generalized additive model (GAM) fit to Both Correct results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

OE₁ and OE₂

OE₁ is defined as the octave error between an estimate E and a reference value R: OE₁(E) = log₂(E/R). Because of the base-2 logarithm, the most common errors, by a factor of 2 or ½, have the same magnitude, namely 1.

OE₂ is the signed OE₁ with the minimum absolute value when the octave errors 2, 3, 1/2, and 1/3 are allowed: OE₂(E) = arg min_x(|x|) with x ∈ {OE₁(E), OE₁(2E), OE₁(3E), OE₁(½E), OE₁(⅓E)}.
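A minimal sketch of both definitions for a single estimate/reference pair; the function names are hypothetical. `min(..., key=abs)` keeps the sign of the winning candidate, which is what distinguishes OE₂ from AOE₂.

```python
import math


def oe1(estimate: float, reference: float) -> float:
    """Signed octave error log2(E / R): doubling gives +1, halving gives -1."""
    return math.log2(estimate / reference)


def oe2(estimate: float, reference: float) -> float:
    """Signed OE1 with the smallest magnitude over E, 2E, 3E, E/2 and E/3."""
    candidates = (estimate, 2 * estimate, 3 * estimate, estimate / 2, estimate / 3)
    return min((oe1(c, reference) for c in candidates), key=abs)


print(oe1(120.0, 60.0), oe1(60.0, 120.0))  # → 1.0 -1.0
print(oe2(121.0, 60.0))  # closest candidate is E/2 = 60.5: a small positive value
```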

Mean OE₁/OE₂ Results for schreiber2018/ismir2018

Estimator	OE1_MEAN	OE1_STDEV	OE2_MEAN	OE2_STDEV
schreiber2018/cnn	-0.0082	0.2078	0.0114	0.0595
schreiber2018/fcn	-0.0892	0.2617	0.0284	0.0975
schreiber2014/default	-0.0886	0.2842	0.0094	0.0722
schreiber2017/ismir2017	-0.0367	0.2975	0.0026	0.0843
schreiber2017/mirex2017	-0.0677	0.3108	0.0107	0.0598
percival2014/stem	-0.0687	0.3340	0.0097	0.1030
davies2009/mirex_qm_tempotracker	0.2155	0.3779	0.0195	0.0805
boeck2015/tempodetector2016_default	-0.1466	0.3969	0.0021	0.0830

Table 7: Mean OE₁/OE₂ for estimates compared to version schreiber2018/ismir2018, ordered by OE₁ standard deviation.

OE₁ distribution for schreiber2018/ismir2018

Figure 18: OE₁ for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

OE₂ distribution for schreiber2018/ismir2018

Figure 19: OE₂ for estimates compared to version schreiber2018/ismir2018. Shown are the mean OE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.0000	0.2152	0.3461	0.1227	0.2823	0.0211	0.3881
davies2009/mirex_qm_tempotracker	0.0000	1.0000	0.0000	0.0000	0.0004	0.0001	0.0001	0.0000
percival2014/stem	0.2152	0.0000	1.0000	0.6620	0.3778	0.9820	0.1995	0.6343
schreiber2014/default	0.3461	0.0000	0.6620	1.0000	0.1912	0.6485	0.0637	0.9847
schreiber2017/ismir2017	0.1227	0.0004	0.3778	0.1912	1.0000	0.1737	0.4930	0.1854
schreiber2017/mirex2017	0.2823	0.0001	0.9820	0.6485	0.1737	1.0000	0.1747	0.6150
schreiber2018/cnn	0.0211	0.0001	0.1995	0.0637	0.4930	0.1747	1.0000	0.0301
schreiber2018/fcn	0.3881	0.0000	0.6343	0.9847	0.1854	0.6150	0.0301	1.0000

Table 8: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with OE₁. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.
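The paired t-tests compare two estimators' per-track error values on the same tracks. A minimal sketch using SciPy's `scipy.stats.ttest_rel` on made-up OE₁ values follows. It assumes the default two-sided test, which may not match tempo_eval's exact settings.

```python
from scipy.stats import ttest_rel

# Made-up per-track OE1 values of two estimators on the same four tracks.
oe1_a = [0.10, 0.20, 0.30, 0.40]
oe1_b = [0.00, 0.10, 0.10, 0.20]

# Paired (related-samples) t-test: H0 is that the mean per-track difference is zero.
result = ttest_rel(oe1_a, oe1_b)
alpha = 0.05
print(result.statistic, result.pvalue, result.pvalue <= alpha)
```

Pairing matters here: each track acts as its own control, so track difficulty largely cancels out of the differences.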

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.3109	0.6835	0.6556	0.9801	0.5500	0.5150	0.1524
davies2009/mirex_qm_tempotracker	0.3109	1.0000	0.5906	0.5900	0.3123	0.5431	0.5614	0.6077
percival2014/stem	0.6835	0.5906	1.0000	0.9862	0.3951	0.9350	0.8874	0.1059
schreiber2014/default	0.6556	0.5900	0.9862	1.0000	0.6567	0.9219	0.8811	0.2836
schreiber2017/ismir2017	0.9801	0.3123	0.3951	0.6567	1.0000	0.3221	0.2938	0.0655
schreiber2017/mirex2017	0.5500	0.5431	0.9350	0.9219	0.3221	1.0000	0.6729	0.1224
schreiber2018/cnn	0.5150	0.5614	0.8874	0.8811	0.2938	0.6729	1.0000	0.1405
schreiber2018/fcn	0.1524	0.6077	0.1059	0.2836	0.0655	0.1224	0.1405	1.0000

Table 9: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with OE₂. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

OE₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE₁ on Tempo-Subsets for schreiber2018/ismir2018

Figure 20: Mean OE₁ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

OE₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean OE₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

OE₂ on Tempo-Subsets for schreiber2018/ismir2018

Figure 21: Mean OE₂ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Estimated OE₁ for Tempo

If we fit a generalized additive model (GAM) to the OE₁ values as a function of the ground-truth tempo, what OE₁ can we expect, and with what confidence?

Estimated OE₁ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on OE₁ for estimates for reference schreiber2018/ismir2018.

Figure 22: OE₁ predictions of a generalized additive model (GAM) fit to OE₁ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Estimated OE₂ for Tempo

If we fit a generalized additive model (GAM) to the OE₂ values as a function of the ground-truth tempo, what OE₂ can we expect, and with what confidence?

Estimated OE₂ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on OE₂ for estimates for reference schreiber2018/ismir2018.

Figure 23: OE₂ predictions of a generalized additive model (GAM) fit to OE₂ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

AOE₁ and AOE₂

AOE₁ is defined as the absolute octave error between an estimate E and a reference value R: AOE₁(E) = |log₂(E/R)|.

AOE₂ is the minimum of AOE₁ allowing the octave errors 2, 3, 1/2, and 1/3: AOE₂(E) = min(AOE₁(E), AOE₁(2E), AOE₁(3E), AOE₁(½E), AOE₁(⅓E)).
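A minimal sketch of both absolute measures and their means over a few made-up tracks; the function names are hypothetical. Unlike the signed OE values, AOE values cannot cancel out when averaged.

```python
import math


def aoe1(estimate: float, reference: float) -> float:
    """Absolute octave error |log2(E / R)|."""
    return abs(math.log2(estimate / reference))


def aoe2(estimate: float, reference: float) -> float:
    """Smallest AOE1 over E, 2E, 3E, E/2 and E/3."""
    candidates = (estimate, 2 * estimate, 3 * estimate, estimate / 2, estimate / 3)
    return min(aoe1(c, reference) for c in candidates)


# Made-up per-track (estimate, reference) pairs in BPM.
pairs = [(60.0, 120.0), (100.0, 100.0), (135.0, 90.0)]
mean_aoe1 = sum(aoe1(e, r) for e, r in pairs) / len(pairs)
mean_aoe2 = sum(aoe2(e, r) for e, r in pairs) / len(pairs)
print(mean_aoe1, mean_aoe2)
```

The halved tempo (60 vs. 120 BPM) costs 1.0 under AOE₁ but nothing under AOE₂. The factor-1.5 error (135 vs. 90 BPM) remains under AOE₂, where the closest candidate is E/2 = 67.5.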

Mean AOE₁/AOE₂ Results for schreiber2018/ismir2018

Estimator	AOE1_MEAN	AOE1_STDEV	AOE2_MEAN	AOE2_STDEV
schreiber2018/cnn	0.0522	0.2013	0.0130	0.0592
schreiber2018/fcn	0.0972	0.2588	0.0311	0.0967
schreiber2017/ismir2017	0.1060	0.2804	0.0234	0.0810
schreiber2014/default	0.1108	0.2763	0.0278	0.0673
schreiber2017/mirex2017	0.1126	0.2975	0.0153	0.0588
percival2014/stem	0.1363	0.3126	0.0310	0.0987
boeck2015/tempodetector2016_default	0.1777	0.3840	0.0240	0.0795
davies2009/mirex_qm_tempotracker	0.2155	0.3779	0.0406	0.0722

Table 10: Mean AOE₁/AOE₂ for estimates compared to version schreiber2018/ismir2018, ordered by mean AOE₁.

AOE₁ distribution for schreiber2018/ismir2018

Figure 24: AOE₁ for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE₁ and an empirical distribution of the sample, using kernel density estimation (KDE).

AOE₂ distribution for schreiber2018/ismir2018

Figure 25: AOE₂ for estimates compared to version schreiber2018/ismir2018. Shown are the mean AOE₂ and an empirical distribution of the sample, using kernel density estimation (KDE).

Significance of Differences

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.6560	0.5079	0.2735	0.2657	0.3250	0.0369	0.2222
davies2009/mirex_qm_tempotracker	0.6560	1.0000	0.2673	0.1418	0.1332	0.1518	0.0059	0.0967
percival2014/stem	0.5079	0.2673	1.0000	0.5731	0.4019	0.5760	0.0712	0.3608
schreiber2014/default	0.2735	0.1418	0.5731	1.0000	0.9014	0.9621	0.1776	0.6669
schreiber2017/ismir2017	0.2657	0.1332	0.4019	0.9014	1.0000	0.7598	0.1903	0.8240
schreiber2017/mirex2017	0.3250	0.1518	0.5760	0.9621	0.7598	1.0000	0.1668	0.7166
schreiber2018/cnn	0.0369	0.0059	0.0712	0.1776	0.1903	0.1668	1.0000	0.2365
schreiber2018/fcn	0.2222	0.0967	0.3608	0.6669	0.8240	0.7166	0.2365	1.0000

Table 11: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with AOE₁. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

Estimator	boeck2015/tempodetector2016_default	davies2009/mirex_qm_tempotracker	percival2014/stem	schreiber2014/default	schreiber2017/ismir2017	schreiber2017/mirex2017	schreiber2018/cnn	schreiber2018/fcn
boeck2015/tempodetector2016_default	1.0000	0.2795	0.7052	0.7866	0.9701	0.5359	0.4390	0.6973
davies2009/mirex_qm_tempotracker	0.2795	1.0000	0.5815	0.1943	0.2619	0.0505	0.0409	0.5820
percival2014/stem	0.7052	0.5815	1.0000	0.8555	0.3654	0.1794	0.1275	0.9947
schreiber2014/default	0.7866	0.1943	0.8555	1.0000	0.7706	0.3209	0.2580	0.8501
schreiber2017/ismir2017	0.9701	0.2619	0.3654	0.7706	1.0000	0.3221	0.2167	0.5901
schreiber2017/mirex2017	0.5359	0.0505	0.1794	0.3209	0.3221	1.0000	0.1712	0.1683
schreiber2018/cnn	0.4390	0.0409	0.1275	0.2580	0.2167	0.1712	1.0000	0.1162
schreiber2018/fcn	0.6973	0.5820	0.9947	0.8501	0.5901	0.1683	0.1162	1.0000

Table 12: Paired t-test p-values, using reference annotations schreiber2018/ismir2018 as ground truth with AOE₂. H₀: the true mean difference between paired samples is zero. If p ≤ α, reject H₀, i.e., there is a significant difference between the estimates of the two algorithms. In the table, p-values < 0.05 are set in bold.

AOE₁ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE₁ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE₁ on Tempo-Subsets for schreiber2018/ismir2018

Figure 26: Mean AOE₁ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

AOE₂ on Tempo-Subsets

How well does an estimator perform when only a subset of the reference annotations is taken into account? The graphs show mean AOE₂ for reference subsets with tempi in [T-10, T+10] BPM. Note that the graphs do not show confidence intervals and that some values may be based on very few estimates.

AOE₂ on Tempo-Subsets for schreiber2018/ismir2018

Figure 27: Mean AOE₂ for estimates compared to version schreiber2018/ismir2018 for tempo intervals around T.

Estimated AOE₁ for Tempo

If we fit a generalized additive model (GAM) to the AOE₁ values as a function of the ground-truth tempo, what AOE₁ can we expect, and with what confidence?

Estimated AOE₁ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on AOE₁ for estimates for reference schreiber2018/ismir2018.

Figure 28: AOE₁ predictions of a generalized additive model (GAM) fit to AOE₁ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Estimated AOE₂ for Tempo

When a generalized additive model (GAM) is fit to the AOE₂ values as a function of the reference tempo, what AOE₂ can we expect for a given tempo, and with what confidence?

Estimated AOE₂ for Tempo for schreiber2018/ismir2018

Predictions of GAMs trained on the AOE₂ values of the estimates, measured against reference schreiber2018/ismir2018.

Figure 29: AOE₂ predictions of a generalized additive model (GAM) fit to AOE₂ results for schreiber2018/ismir2018. The 95% confidence interval around the prediction is shaded in gray.

Generated by tempo_eval 0.1.1 on 2022-06-29 18:50. Size L.
