List of meta-analyses / independent benchmarking of machine learning and data mining papers

They basically all suggest that apparent improvements to the state of the art in ML and related fields are often not real, or are at least the result of factors other than those the authors claim.

The state of sparsity in deep neural networks

What is the state of neural network pruning?

On the State of the Art of Evaluation in Neural Language Models

Do Transformer Modifications Transfer Across Implementations and Applications?

Are we really making much progress? A worrying analysis of recent neural recommendation approaches

Improvements that don't add up: ad-hoc retrieval results since 1998

On the need for time series data mining benchmarks: a survey and empirical demonstration

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Stop Oversampling for Class Imbalance Learning: A Critical Review

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? An Extensive Empirical Study on Language Tasks

No True State-of-the-Art? OOD Detection Methods are Inconsistent across Datasets

Querying and mining of time series data: experimental comparison of representations and distance measures

Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress

When Do Curricula Work?

Compressed Communication for Distributed Deep Learning: Survey and Quantitative Evaluation

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

On Empirical Comparisons of Optimizers for Deep Learning

Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers

Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study

Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification

Initialization for Nonnegative Matrix Factorization: a Comprehensive Review. This one actually has a consistent finding: nonnegative ICA works best, as measured by loss after a fixed number of training iterations, followed by SVD.

What's Wrong with Social Science and How to Fix It: Reflections After Reading 2578 Papers. This one is not about ML, but I'm including it for relevance, especially since it shows that even economists, who probably understand statistical testing better than most deep learning researchers, hit basically the same issues as everyone else.

On Efficient Real-Time Semantic Segmentation: A Survey. Semantic segmentation actually is making progress, as measured in a standardized experimental setup on fixed hardware.

Leakage and the Reproducibility Crisis in ML-based Science. See also their website
