jayralencar/A Survey of Data Augmentation Approaches for NLP.json

## A Survey of Data Augmentation Approaches for NLP.json
{
    "title": "A Survey of Data Augmentation Approaches for NLP",
    "url":"https://arxiv.org/pdf/2105.03075.pdf",
    "year": 2021,
    "author": "Feng et al.",
    "sections" : [
        {
            "title": "Background",
            "subsections": [
                {
                    "title": "What is data augmentation?",
                    "articles": [
                        {
                            "title": "A survey on Image Data Augmentation for Deep Learning",
                            "year": 2019,
                            "author": "Shorten and Khoshgoftaar",
                            "abstract": "Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data. Unfortunately, many application domains do not have access to big data, such as medical image analysis. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs are heavily covered in this survey. In addition to augmentation techniques, this paper will briefly discuss other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.",
                            "urls_google_scholar": [
                                "https://link.springer.com/article/10.1186/s40537-019-0197-0?code=a6ae644c-3bfc-43d9-b292-82d77d5890d5",
                                "https://search.proquest.com/openview/37cea85c33a967e56e2a3ef8757017ed/1?pq-origsite=gscholar&cbl=2046140",
                                "https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0197-0/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/3813b88a4ec3c63919df47e9694b577f4691f7e5",
                                "https://www.semanticscholar.org/paper/A-survey-on-Image-Data-Augmentation-for-Deep-Shorten-Khoshgoftaar/3813b88a4ec3c63919df47e9694b577f4691f7e5"
                            ],
                            "context_in_section": "Data augmentation (DA) encompasses methods of increasing training data diversity without directly collecting more data. Most strategies either add slightly modified copies of existing data or create synthetic data, aiming for the augmented data to act as a regularizer and reduce overfitting when training ML models."
                        },
                        {
                            "title": "Data augmentation instead of explicit regularization",
                            "year": 2020,
                            "author": "Hernández-García and König",
                            "abstract": "Contrary to most machine learning models, modern deep artificial neural networks typically include multiple components that contribute to regularization. Despite the fact that some (explicit) regularization techniques, such as weight decay and dropout, require costly fine-tuning of sensitive hyperparameters, the interplay between them and other elements that provide implicit regularization is not well understood yet. Shedding light upon these interactions is key to efficiently using computational resources and may contribute to solving the puzzle of generalization in deep learning. Here, we first provide formal definitions of explicit and implicit regularization that help understand essential differences between techniques. Second, we contrast data augmentation with weight decay and dropout. Our results show that visual object categorization models trained with data augmentation alone achieve the same performance or higher than models trained also with weight decay and dropout, as is common practice. We conclude that the contribution on generalization of weight decay and dropout is not only superfluous when sufficient implicit regularization is provided, but also such techniques can dramatically deteriorate the performance if the hyperparameters are not carefully tuned for the architecture and data set. In contrast, data augmentation systematically provides large generalization gains and does not require hyperparameter re-tuning. In view of our results, we suggest to optimize neural networks without weight decay and dropout to save computational resources, hence carbon emissions, and focus more on data augmentation and other inductive biases to improve performance and robustness. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1806.03852",
                                "https://openreview.net/forum?id=H1eqOnNYDH",
                                "https://www.researchgate.net/profile/Alex-Hernandez-Garcia/publication/325709564_Data_augmentation_instead_of_explicit_regularization/links/5b223d720f7e9b0e3740c868/Data-augmentation-instead-of-explicit-regularization.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180603852H/abstract",
                                "https://openreview.net/forum?id=ByJWeR1AW"
                            ],
                            "urls_semantic_scholar" : [
                                "https://www.semanticscholar.org/paper/5a118cc15a3b3be19af4f2ea72c1dcb77e5f755b",
                                "https://www.semanticscholar.org/paper/Data-augmentation-instead-of-explicit-Hern%C3%A1ndez-Garc%C3%ADa-K%C3%B6nig/5a118cc15a3b3be19af4f2ea72c1dcb77e5f755b"
                            ],
                            "context_in_section": "Data augmentation (DA) encompasses methods of increasing training data diversity without directly collecting more data. Most strategies either add slightly modified copies of existing data or create synthetic data, aiming for the augmented data to act as a regularizer and reduce overfitting when training ML models."
                        }
                    ]
                },
                {
                    "title": "What are the goals and trade-offs?",
                    "articles": [
                        {
                            "title": "Character-level Convolutional Networks for Text Classification",
                            "year": 2015,
                            "author": "Zhang et al.",
                            "abstract": "This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.",
                            "urls_google_scholar": [
                                "http://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-clasclas-sification.pdf",
                                "https://papers.nips.cc/paper/5782-character-level-convolutional-networks-fortext-classification.pdf",
                                "https://openreview.net/forum?id=By4yduZd-S",
                                "http://xzh.me/docs/charconvnet.pdf",
                                "http://www.chrissmurphy.com/wp-content/uploads/2019/01/CNN-YANN-NYU-1509.01626.pdf",
                                "https://dl.acm.org/doi/abs/10.5555/2969239.2969312",
                                "https://5y1.org/download/64151eca7bdc5ebdeb91acd0fcbf21b5.pdf",
                                "https://www.researchgate.net/profile/Xiang-Zhang-115/publication/281607724_Character-level_Convolutional_Networks_for_Text_Classification/links/56097c5e08ae1396914a1af7/Character-level-Convolutional-Networks-for-Text-Classification.pdf",
                                "https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1052.7263&rep=rep1&type=pdf",
                                "https://ui.adsabs.harvard.edu/abs/2015arXiv150901626Z/abstract",
                                "https://nyuscholars.nyu.edu/en/publications/character-level-convolutional-networks-for-text-classification",
                                "https://arxiv.org/abs/1509.01626",
                                "http://people.ee.duke.edu/~lcarin/Zhe11.18.2016.pdf",
                                "https://www.academia.edu/download/59205432/%E4%B8%AD%E6%96%87%E6%83%85%E6%84%9F%E5%88%86%E6%9E%90lstm20190510-82953-13pxp3w.pdf",
                                "http://nyc.lti.cs.cmu.edu/classes/11-741/Papers/zhang-nips15.pdf",
                                "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1052.7263&rep=rep1&type=pdf"
                            ],
                            "urls_semantic_scholar" : [
                                "https://www.semanticscholar.org/paper/Character-level-Convolutional-Networks-for-Text-Zhang-Zhao/51a55df1f023571a7e07e338ee45a3e3d66ef73e",
                                "https://www.semanticscholar.org/paper/51a55df1f023571a7e07e338ee45a3e3d66ef73e"
                            ],
                            "context_in_section": "Despite challenges associated with text, many DA techniques for NLP have been proposed, ranging from rule-based manipulations."
                        },
                        {
                            "title": "Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation",
                            "year": 2020,
                            "author": "Liu et al.",
                            "abstract": "Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The result shows that Data Boost can boost the performance of classifiers especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality as the original data with respect to readability and class consistency.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2012.02952",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201202952L/abstract",
                                "https://www.aclweb.org/anthology/2020.emnlp-main.726.pdf",
                                "https://aclanthology.org/2020.emnlp-main.726.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/61e2f89f902fdc3c5e155504c74adb6621add442",
                                "https://www.semanticscholar.org/paper/Data-Boost%3A-Text-Data-Augmentation-through-Learning-Liu-Xu/61e2f89f902fdc3c5e155504c74adb6621add442"
                            ],
                            "context_in_section": "to more complicated generative approaches."
                        },
                        {
                            "title": "Robust Training under Linguistic Adversity",
                            "year": 2017,
                            "author": "Li et al.",
                            "abstract": "Deep neural networks have achieved remarkable results across many language processing tasks, however they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks. In this work, we propose a linguistically-motivated approach for training robust models based on exposing the model to corrupted text examples at training time. We consider several flavours of linguistically plausible corruption, include lexical semantic and syntactic methods. Empirically, we evaluate our method with a convolutional neural model across a range of sentiment analysis datasets. Compared with a baseline and the dropout method, our method achieves better overall performance.",
                            "urls_google_scholar": [
                                "https://www.aclweb.org/anthology/E17-2004.pdf",
                                "https://www.aclweb.org/anthology/E17-2.pdf#page=53"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/73300838d524d062e8341b242765fb6efaf48f43",
                                "https://www.semanticscholar.org/paper/Robust-Training-under-Linguistic-Adversity-Baldwin-Cohn/73300838d524d062e8341b242765fb6efaf48f43"
                            ],
                            "context_in_section": "Rule-based techniques are easy-to-implement but usually offer incremental performance improvements."
                        },
                        {
                            "title": "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks",
                            "year": 2019,
                            "author": "Wei and Zou",
                            "abstract": "We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1901.11196",
                                "https://openreview.net/forum?id=BJelsDvo84",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190111196W/abstract",
                                "https://www.aclweb.org/anthology/D19-1670/"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/162cad5df347bdac469331df540440b320b5aa21",
                                "https://www.semanticscholar.org/paper/EDA%3A-Easy-Data-Augmentation-Techniques-for-Boosting-Wei-Zou/162cad5df347bdac469331df540440b320b5aa21"
                            ],
                            "context_in_section": "Rule-based techniques are easy-to-implement but usually offer incremental performance improvements."
                        },
                        {
                            "title": "Text Augmentation in a Multi-Task View",
                            "year": 2021,
                            "author": "Wei et al.",
                            "abstract": "Traditional data augmentation aims to increase the coverage of the input distribution by generating augmented examples that strongly resemble original samples in an online fashion where augmented examples dominate training. In this paper, we propose an alternative perspective—a multi-task view (MTV) of data augmentation—in which the primary task trains on original examples and the auxiliary task trains on augmented examples. In MTV data augmentation, both original and augmented samples are weighted substantively during training, relaxing the constraint that augmented examples must resemble original data and thereby allowing us to apply stronger augmentation functions. In empirical experiments using four common data augmentation techniques on three benchmark text classification datasets, we find that using the MTV leads to higher and more robust performance than traditional augmentation.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2101.05469",
                                "http://arxiv-download.xixiaoyao.cn/pdf/2101.05469.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2021arXiv210105469W/abstract",
                                "https://www.aclweb.org/anthology/2021.eacl-main.252/",
                                "https://aclanthology.org/2021.eacl-main.252.pdf",
                                "https://arxiv-download.xixiaoyao.cn/pdf/2101.05469.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Text-Augmentation-in-a-Multi-Task-View-Wei-Huang/bd0a7c8f4d5a7460a44883b8e9fc81e654eaa4b8",
                                "https://www.semanticscholar.org/paper/bd0a7c8f4d5a7460a44883b8e9fc81e654eaa4b8"
                            ],
                            "context_in_section": "Rule-based techniques are easy-to-implement but usually offer incremental performance improvements."
                        },
                        {
                            "title": "Quantifying the Evaluation of Heuristic Methods for Textual Data Augmentation",
                            "year": 2020,
                            "author": "Kashefi and Hwa ",
                            "abstract": "Data augmentation has been shown to be effective in providing more training data for machine learning and resulting in more robust classifiers. However, for some problems, there may be multiple augmentation heuristics, and the choices of which one to use may significantly impact the success of the training. In this work, we propose a metric for evaluating augmentation heuristics; specifically, we quantify the extent to which an example is “hard to distinguish” by considering the difference between the distribution of the augmented samples of different classes. Experimenting with multiple heuristics in two prediction tasks (positive/negative sentiment and verbosity/conciseness) validates our claims by revealing the connection between the distribution difference of different classes and the classification accuracy.",
                            "urls_google_scholar": [
                                "https://www.aclweb.org/anthology/2020.wnut-1.26/",
                                "https://www.aclweb.org/anthology/2020.wnut-1.26.pdf",
                                "https://par.nsf.gov/biblio/10248657",
                                "https://www.researchgate.net/profile/Omid-Kashefi/publication/346546763_Quantifying_the_Evaluation_of_Heuristic_Methods_for_Textual_Data_Augmentation/links/5fc6a4f8299bf188d4e8d09e/Quantifying-the-Evaluation-of-Heuristic-Methods-for-Textual-Data-Augmentation.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Quantifying-the-Evaluation-of-Heuristic-Methods-for-Kashefi-Hwa/8df7a8e0572e6e3da86b4c868efadae6ca61e781",
                                "https://www.semanticscholar.org/paper/8df7a8e0572e6e3da86b4c868efadae6ca61e781",
                                "https://www.semanticscholar.org/paper/Quantifying-the-Evaluation-of-Heuristic-Methods-for-Kashefi-Hwa/8df7a8e0572e6e3da86b4c868efadae6ca61e781/figure/0"
                            ],
                            "context_in_section": "Kashefi and Hwa (2020) devise a KL-Divergence-based unsupervised procedure to preemptively choose among DA heuristics, rather than a typical \"run-all-heuristics\" comparison, which can be very time and cost intensive"
                        }
                    ]
                },
                {
                    "title": "Interpretation of DA",
                    "alternative_titles": ["Interpretation of Data Augmentation"],
                    "articles": [
                        {
                            "title": "A Kernel Theory of Modern Data Augmentation",
                            "year": 2019,
                            "author": "Dao et al.",
                            "abstract": "Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines. In this paper, we seek to establish a theoretical framework for understanding data augmentation. We approach this from two directions: First, we provide a general model of augmentation as a Markov process, and show that kernels appear naturally with respect to this model, even when we do not employ kernel classification. Next, we analyze more directly the effect of augmentation on kernel classifiers, showing that data augmentation can be approximated by first-order feature averaging and second-order variance regularization components. These frameworks both serve to illustrate the ways in which data augmentation affects the downstream learning model, and the resulting analyses provide novel connections between prior work in invariant kernels, tangent propagation, and robust optimization. Finally, we provide several proof-of-concept applications showing that our theory can be useful for accelerating machine learning workflows, such as reducing the amount of computation needed to train using augmented data, and predicting the utility of a transformation prior to training. ",
                            "urls_google_scholar": [
                                "http://proceedings.mlr.press/v97/dao19b.html",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6879382/",
                                "https://pubmed.ncbi.nlm.nih.gov/31777848/",
                                "https://icml.cc/media/Slides/icml/2019/101(11-14-00)-11-15-05-4715-a_kernel_theory.pdf",
                                "https://openreview.net/forum?id=Hy-2coZdZS",
                                "https://arxiv.org/abs/1803.06084",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180306084D/abstract",
                                "http://proceedings.mlr.press/v97/dao19b/dao19b-supp.pdf",
                                "https://europepmc.org/article/med/31777848",
                                "https://pdfs.semanticscholar.org/f2f2/5e4b550998dda05a8f253356c603249e1ea4.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/494e63c4a333bf3be8a28dc0213b0e0d39c2ef04",
                                "https://www.semanticscholar.org/paper/A-Kernel-Theory-of-Modern-Data-Augmentation-Dao-Gu/494e63c4a333bf3be8a28dc0213b0e0d39c2ef04",
                                "https://pdfs.semanticscholar.org/aa45/44f209147fc744b861380c0e5fd807eaca0d.pdf"
                            ],
                            "context_in_section": "Dao et al. (2019) note that \"data augmentation is typically performed in an adhoc manner with little understanding of the underlying theoretical principles\", and claim the typical explanation of DA as regularization to be insufficient. Dao et al. (2019) think of DA transformations as kernels, and find two ways DA helps: averaging of features and variance regularization"
                        },
                        {
                            "title": "Training with Noise is Equivalent to Tikhonov Regularization",
                            "year": 1995,
                            "author": "Bishop",
                            "abstract": "It is well known that the addition of noise to the input data of a neural network during training can, in some circumstances, lead to significant improvements in generalization performance. Previous work has shown that such training with noise is equivalent to a form of regularization in which an extra term is added to the error function. However, the regularization term, which involves second derivatives of the error function, is not bounded below, and so can lead to difficulties if used directly in a learning algorithm based on error minimization. In this paper we show that, for the purposes of network training, the regularization term can be reduced to a positive definite form which involves only first derivatives of the network mapping. For a sum-of-squares error function, the regularization term belongs to the class of generalized Tikhonov regularizers. Direct minimization of the regularized error function provides a practical alternative to training with noise.",
                            "urls_google_scholar": [
                                "https://ieeexplore.ieee.org/abstract/document/6796505/",
                                "https://www.mitpressjournals.org/doi/abs/10.1162/neco.1995.7.1.108",
                                "https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bishop-tikhonov-nc-95.pdf",
                                "http://pages.cs.wisc.edu/~yliang/cs760_spring21/Bishop_Noise.pdf",
                                "https://www.research.ed.ac.uk/en/publications/training-with-noise-is-equivalent-to-tikhonov-regularization",
                                "https://dl.acm.org/doi/abs/10.1162/neco.1995.7.1.108",
                                "https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.3270&rep=rep1&type=pdf",
                                "https://research.aston.ac.uk/en/publications/training-with-noise-is-equivalent-to-tikhonov-regularization",
                                "http://cognet.mit.edu/journal/10.1162/neco.1995.7.1.108",
                                "https://ci.nii.ac.jp/naid/80008055235/",
                                "https://direct.mit.edu/neco/article-abstract/7/1/108/5828",
                                "https://www.academia.edu/download/62261332/bishop-tikhonov-nc-9520200303-87950-kk4x93.pdf",
                                "http://cognet.mit.edu/node/30208",
                                "http://pages.cs.wisc.edu/~yliang/cs760_spring20/Bishop_Noise.pdf",
                                "https://ci.nii.ac.jp/naid/30036176655/",
                                "http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.3008&rep=rep1&type=pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/c3ecd8e19e016d15670c8953b4b9afaa5186b0f3",
                                "https://www.semanticscholar.org/paper/Training-with-Noise-is-Equivalent-to-Tikhonov-Bishop/c3ecd8e19e016d15670c8953b4b9afaa5186b0f3"
                            ],
                            "context_in_section": "Bishop (1995) show training with noised examples is reducible to Tikhonov regularization (subsumes L2)."
                        },
                        {
                            "title": "Does Data Augmentation Lead to Positive Margin?",
                            "year": 2019,
                            "author": "Rajput et al.",
                            "abstract": " Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial perturbations to the input data. Although DA is widely used, its capacity to provably improve robustness is not fully understood. In this work, we analyze the robustness that DA begets by quantifying the margin that DA enforces on empirical risk minimizers. We first focus on linear separators, and then a class of nonlinear models whose labeling is constant within small convex hulls of data points. We present lower bounds on the number of augmented data points required for non-zero margin, and show that commonly used DA techniques may only introduce significant margin after adding exponentially many points to the data set. ",
                            "urls_google_scholar": [
                                "http://proceedings.mlr.press/v97/rajput19a.html",
                                "https://par.nsf.gov/servlets/purl/10110114",
                                "http://postersession.ai.s3.amazonaws.com/b255f1d8-6939-4a12-a92b-73ee2c38f4e9.pdf",
                                "http://proceedings.mlr.press/v97/rajput19a/rajput19a-supp.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190503177R/abstract",
                                "https://openreview.net/forum?id=Sy4cmibd-H",
                                "https://arxiv.org/abs/1905.03177",
                                "https://par.nsf.gov/biblio/10149504"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/6976743376db2432243a99ef7b411308bcc8d1b8",
                                "https://www.semanticscholar.org/paper/Does-Data-Augmentation-Lead-to-Positive-Margin-Rajput-Feng/6976743376db2432243a99ef7b411308bcc8d1b8"
                            ],
                            "context_in_section": "Rajput et al. (2019) show that DA can increase the positive margin for classifiers, but only when augmenting exponentially many examples for common DA methods."
                        },
                        {
                            "title": "A Group-Theoretic Framework for Data Augmentation",
                            "year": 2020,
                            "author": "Chen et al.",
                            "abstract": "Data augmentation is a widely used trick when training deep neural networks: in addition to the original data, properly transformed data are also added to the training set. However, to the best of our knowledge, a clear mathematical framework to explain the performance benefits of data augmentation is not available. In this paper, we develop such a theoretical framework. We show data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant. We prove that it leads to variance reduction. We study empirical risk minimization, and the examples of exponential families, linear regression, and certain two-layer neural networks. We also discuss how data augmentation could be used in problems with symmetry where other approaches are prevalent, such as in cryo-electron microscopy (cryo-EM). ",
                            "urls_google_scholar": [
                                "http://proceedings.mlr.press/v97/rajput19a.html",
                                "https://par.nsf.gov/servlets/purl/10110114",
                                "http://postersession.ai.s3.amazonaws.com/b255f1d8-6939-4a12-a92b-73ee2c38f4e9.pdf",
                                "http://proceedings.mlr.press/v97/rajput19a/rajput19a-supp.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190503177R/abstract",
                                "https://openreview.net/forum?id=Sy4cmibd-H",
                                "https://arxiv.org/abs/1905.03177",
                                "https://par.nsf.gov/biblio/10149504"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/35bb704396a670da64e774e35f87e4cc482ec6cb",
                                "https://www.semanticscholar.org/paper/A-Group-Theoretic-Framework-for-Data-Augmentation-Chen-Dobriban/35bb704396a670da64e774e35f87e4cc482ec6cb"
                            ],
                            "context_in_section": " Chen et al. (2020d) show that DA leads to variance reduction by averaging over orbits of the group that keep the data distribution approximately invariant."
                        }
                    ]
                }
            ]
        },
        {
            "title": "Techniques & Methods",
            "subsections":[
                {
                    "title":"Rule-Based Techniques",
                    "articles": [
                        {
                            "title": "Low-shot Visual Recognition by Shrinking and Hallucinating Features",
                            "year": 2017,
                            "author": "Hariharan and Girshick",
                            "abstract": "Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose a) representation regularization techniques, and b) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3x on the challenging ImageNet dataset. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1606.02819",
                                "http://openaccess.thecvf.com/content_iccv_2017/html/Hariharan_Low-Shot_Visual_Recognition_ICCV_2017_paper.html",
                                "https://ui.adsabs.harvard.edu/abs/2016arXiv160602819H/abstract",
                                "https://ieeexplore.ieee.org/abstract/document/8237590/",
                                "https://research.fb.com/wp-content/uploads/2017/09/1523.pdf",
                                "https://onikle.com/articles/31207"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Low-Shot-Visual-Recognition-by-Shrinking-and-Hariharan-Girshick/7c7ab73469c25437332f5c1c1c5cb67c7b2f0855",
                                "https://www.semanticscholar.org/paper/7c7ab73469c25437332f5c1c1c5cb67c7b2f0855",
                                "https://www.semanticscholar.org/paper/Low-Shot-Visual-Recognition-by-Shrinking-and-Hariharan-Girshick/7c7ab73469c25437332f5c1c1c5cb67c7b2f0855/figure/3",
                                "https://www.semanticscholar.org/paper/Low-Shot-Visual-Recognition-by-Shrinking-and-Hariharan-Girshick/7c7ab73469c25437332f5c1c1c5cb67c7b2f0855/figure/4"
                            ],
                            "context_in_section": " Many few-shot learning approaches (Hariharan and Girshick, 2017; Schwartz et al., 2018) leverage estimated feature space \"analogy\" transformations between examples of known classes to augment for novel classes (see §4.4)"
                        },
                        {
                            "title": "Δ-encoder: an effective sample synthesis method for few-shot object recognition",
                            "year": 2018,
                            "author": "Schwartz et al.",
                            "abstract": "Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Δ-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or \"deltas\", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves the state-of-the-art of one-shot object-recognition and performs comparably in the few-shot case.",
                            "urls_google_scholar": [
                                "https://dl.acm.org/doi/abs/10.5555/3327144.3327208",
                                "https://arxiv.org/abs/1806.04734",
                                "https://openreview.net/forum?id=DBa9gVvdVJ0",
                                "https://onikle.com/articles/19111",
                                "https://scent-project.eu/wp-content/uploads/2019/10/Data-Encoder-An-Effective-Sample-Synthesis-Method-for-Few-Shot-Object-Recognition.pdf",
                                "http://papers.neurips.cc/paper/7549-delta-encoder-an-effective-sample-synthesis-method-for-few-shot-object-recognition.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180604734S/abstract",
                                "https://vista.cs.technion.ac.il/wp-content/uploads/2018/09/poster_SchKarShtHarMarKumFerGirBroNIPS18.pdf",
                                "https://pdfs.semanticscholar.org/6d55/daee74b83af740d7e4c04a404e84ec1f55d4.pdf",
                                "https://neurips.cc/media/Slides/nips/2018/220e(05-09-45)-05-09-50-12634-Delta-encoder:_.pdf",
                                "https://openreview.net/forum?id=HJZOtvZu-H"
                            ],
                            "urls_semantic_scholar":[
                                "https://pdfs.semanticscholar.org/6d55/daee74b83af740d7e4c04a404e84ec1f55d4.pdf",
                                "https://www.semanticscholar.org/paper/77e8e3e578b8abd17303f13af1df22080fd6afbb",
                                "https://www.semanticscholar.org/paper/Delta-encoder%3A-an-effective-sample-synthesis-method-Schwartz-Karlinsky/77e8e3e578b8abd17303f13af1df22080fd6afbb",
                                "https://www.semanticscholar.org/paper/Delta-encoder%3A-an-effective-sample-synthesis-method-Schwartz-Karlinsky/77e8e3e578b8abd17303f13af1df22080fd6afbb/figure/1"
                            ],
                            "context_in_section": " Many few-shot learning approaches (Hariharan and Girshick, 2017; Schwartz et al., 2018) leverage estimated feature space \"analogy\" transformations between examples of known classes to augment for novel classes (see §4.4)"
                        },
                        {
                            "title": "Data Augmentation with Manifold Exploring Geometric Transformations for Increased Performance and Robustness",
                            "year": 2019,
                            "author": "Paschali et al.",
                            "abstract": "This augmentation method populates any training dataset with images that lie on the border of the manifolds between two-classes and maximizes the variance the network is exposed to during training. Our method was thoroughly evaluated on the challenging tasks of fine-grained skin lesion classification from limited data, and breast tumor classification of mammograms. Compared with traditional augmentation methods, and with images synthesized by Generative Adversarial Networks our method not only achieves state-of-the-art performance but also significantly improves the network's robustness. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1901.04420",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190104420P/abstract",
                                "https://www.researchgate.net/profile/Walter-Simson/publication/330382208_Data_Augmentation_with_Manifold_Exploring_Geometric_Transformations_for_Increased_Performance_and_Robustness/links/5cee8f5ca6fdcc791692cea2/Data-Augmentation-with-Manifold-Exploring-Geometric-Transformations-for-Increased-Performance-and-Robustness.pdf",
                                "http://far.in.tum.de/pub/paschali2019ipmi/paschali2019ipmi.pdf",
                                "http://campar.in.tum.de/pub/paschali2019ipmi/paschali2019ipmi.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/362758f77d607aae1aac5c541aa2663822f6302b",
                                "https://www.semanticscholar.org/paper/Data-Augmentation-with-Manifold-Exploring-Geometric-Paschali-Simson/362758f77d607aae1aac5c541aa2663822f6302b"
                            ],
                            "context_in_section": "Paschali et al. (2019) use iterative affine transformations and projections to maximally \"stretch\" an example along the class-manifold."
                        },
                        {
                            "title": "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks",
                            "year": 2019,
                            "author": "Wei and Zou",
                            "abstract": "We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1901.11196",
                                "https://openreview.net/forum?id=BJelsDvo84",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190111196W/abstract",
                                "https://www.aclweb.org/anthology/D19-1670/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/162cad5df347bdac469331df540440b320b5aa21",
                                "https://www.semanticscholar.org/paper/EDA%3A-Easy-Data-Augmentation-Techniques-for-Boosting-Wei-Zou/162cad5df347bdac469331df540440b320b5aa21",
                                "https://www.semanticscholar.org/paper/EDA%3A-Easy-Data-Augmentation-Techniques-for-Boosting-Wei-Zou/162cad5df347bdac469331df540440b320b5aa21/figure/3"
                            ],
                            "context_in_section": "Wei and Zou (2019) propose EASY DATA AUGMENTATION (EDA), a set of token-level random perturbation operations including random insertion, deletion, and swap. They show improved performance on many text classification task"
                        },
                        {
                            "title": "Unsupervised Data Augmentation for Consistency Training",
                            "year": 2020,
                            "author": "Xie et al.",
                            "abstract": "Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used. Code is available at https://github.com/google-research/uda.",
                            "urls_google_scholar": [
                                "https://proceedings.neurips.cc/paper/2020/hash/44feb0096faa8326192570788b38c1d1-Abstract.html",
                                "https://arxiv.org/abs/1904.12848",
                                "https://www1.cgmh.org.tw/intr/intr2/c3sf00/caim/Content/doc/JR/PDF/190801%20Unsupervised%20Data%20Augmentation.pdf",
                                "https://static.aminer.cn/upload/pdf/904/524/1211/5f0bde8e9e795ea206ff8ef5_1.pdf",
                                "https://onikle.com/articles/19153",
                                "https://openreview.net/forum?id=ByeL1R4FvS",
                                "http://arxiv-export-lb.library.cornell.edu/abs/1904.12848",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190412848X/abstract",
                                "https://openreview.net/forum?id=o9iAYQM5Ma"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Unsupervised-Data-Augmentation-for-Consistency-Xie-Dai/0feea94f89d395436bf41bd10c797447eecbc128/figure/7",
                                "https://www.semanticscholar.org/paper/Unsupervised-Data-Augmentation-for-Consistency-Xie-Dai/0feea94f89d395436bf41bd10c797447eecbc128/figure/2",
                                "https://www.semanticscholar.org/paper/Unsupervised-Data-Augmentation-for-Consistency-Xie-Dai/0feea94f89d395436bf41bd10c797447eecbc128/figure/11",
                                "https://www.semanticscholar.org/paper/Unsupervised-Data-Augmentation-for-Consistency-Xie-Dai/0feea94f89d395436bf41bd10c797447eecbc128"
                            ],
                            "context_in_section": "UDA (Xie et al., 2020) show how supervised DA methods can be exploited for unsupervised data through consistency training on (x, DA(x)) pairs."
                        },
                        {
                            "title": "Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory",
                            "year": 2020,
                            "author": "Chen et al.",
                            "abstract": "Most NLP datasets are manually labeled, so suffer from inconsistent labeling or limited size. We propose methods for automatically improving datasets by viewing them as graphs with expected semantic properties. We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property. We use structural balance theory to identify likely mislabelings in the graph, and flip their labels. We evaluate our methods on paraphrase models trained using these datasets starting from a pretrained BERT model, and find that the automatically-enhanced training sets result in more accurate models.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.findings-emnlp.426/",
                                "https://arxiv.org/abs/2011.01856",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201101856C/abstract",
                                "https://arxiv.org/pdf/2011.01856.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/55ff1999a059ddc90e5f797dbbbb802989971ba9",
                                "https://www.semanticscholar.org/paper/Finding-Friends-and-Flipping-Frenemies%3A-Automatic-Chen-Ji/55ff1999a059ddc90e5f797dbbbb802989971ba9"
                            ],
                            "context_in_section": "For paraphrase identification, Chen et al. (2020b) construct a signed graph over the data, with individual sentences as nodes and pair labels as signed edges. They use balance theory and transitivity to infer augmented sentence pairs from this graph."
                        },
                        {
                            "title": "Data Augmentation via Dependency Tree Morphing for Low-Resource Languages",
                            "year": 2018,
                            "author": "Şahin and Steedman",
                            "abstract": "Neural NLP systems achieve high scores in the presence of sizable training dataset. Lack of such datasets leads to poor system performances in the case low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired from image processing. We “crop” sentences by removing dependency links, and we “rotate” sentences by moving the tree fragments around the root. We apply these techniques to augment the training sets of low-resource languages in Universal Dependencies project. We implement a character-level sequence tagging model and evaluate the augmented datasets on part-of-speech tagging task. We show that crop and rotate provides improvements over the models trained with non-augmented data for majority of the languages, especially for languages with rich case marking systems.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D18-1545/",
                                "https://arxiv.org/abs/1903.09460",
                                "https://www.aclweb.org/anthology/D18-1545.pdf",
                                "https://openreview.net/forum?id=B1-SbzGd-H",
                                "https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/publications/2018/2018_EMNLP_GG_Data_Augmentation.pdf",
                                "https://www.research.ed.ac.uk/en/publications/data-augmentation-via-dependency-tree-morphing-for-low-resource-l",
                                "https://www.researchgate.net/profile/Goezde-Isgueder/publication/328968573_Data_Augmentation_via_Dependency_Tree_Morphing_for_Low-Resource_Languages/links/5bede066a6fdcc3a8dd9aa98/Data-Augmentation-via-Dependency-Tree-Morphing-for-Low-Resource-Languages.pdf",
                                "https://www.aclweb.org/anthology/D18-1545/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/7a75dcef18df48157743226574f85ca4dd0f110b",
                                "https://www.semanticscholar.org/paper/Data-Augmentation-via-Dependency-Tree-Morphing-for-Sahin-Steedman/7a75dcef18df48157743226574f85ca4dd0f110b/figure/0",
                                "https://www.semanticscholar.org/paper/Data-Augmentation-via-Dependency-Tree-Morphing-for-Sahin-Steedman/7a75dcef18df48157743226574f85ca4dd0f110b"
                            ],
                            "context_in_section": "Motivated by image cropping and rotation,  ̧Sahin and Steedman (2018) propose dependency tree morphing. For dependency-annotated sentences, children of the same parent are swapped (à la rotation) or some deleted (à la cropping), as seen in Figure 2. This is most beneficial for language families with rich case marking systems (e.g. Baltic and Slavic)."
                        }
                    ]
                },
                {
                    "title":"Example Interpolation Techniques",
                    "articles": [
                        {
                            "title": "mixup: Beyond Empirical Risk Minimization",
                            "year": 2017,
                            "author": "Zhang et al.",
                            "abstract": "     Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1710.09412",
                                "https://ryansaxe.com/pdfs/mixup.pdf",
                                "https://openreview.net/forum?id=r1Ddp1-Rb&;noteId=r1Ddp1-Rb),",
                                "http://arxiv-download.xixiaoyao.cn/pdf/2101.05469.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2017arXiv171009412Z/abstract",
                                "https://research.fb.com/wp-content/uploads/2018/03/mixup_beyond-empirical-risk-minimization.pdf",
                                "http://personeltest.ru/aways/arxiv.org/pdf/1710.09412.pdf",
                                "https://openreview.net/forum?id=r1Ddp1-Rb&"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/mixup%3A-Beyond-Empirical-Risk-Minimization-Zhang-Ciss%C3%A9/4feef0fd284feb1233399b400eb897f59ec92755",
                                "https://www.semanticscholar.org/paper/4feef0fd284feb1233399b400eb897f59ec92755"
                            ],
                            "context_in_section": "Another class of DA techniques, pioneered by MIXUP (Zhang et al., 2017), interpolates the inputs and labels of two or more real examples. This class of techniques is also sometimes referred to as Mixed Sample Data Augmentation (MSDA). "
                        },
                        {
                            "title": "Manifold Mixup: Better Representations by Interpolating Hidden States",
                            "year": 2019,
                            "author": "Verma et al.",
                            "abstract": " Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose \\manifoldmixup{}, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. \\manifoldmixup{} leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with \\manifoldmixup{} learn flatter class-representations, that is, with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it empirically on practical situations, and connect it to the previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, \\manifoldmixup{} improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood. ",
                            "urls_google_scholar": [
                                "http://proceedings.mlr.press/v97/verma19a.html",
                                "http://proceedings.mlr.press/v97/verma19a/verma19a.pdf",
                                "https://www.researchgate.net/profile/Vikas-Verma-23/publication/325778354_Manifold_Mixup_Encouraging_Meaningful_On-Manifold_Interpolation_as_a_Regularizer/links/5dc06154a6fdcc2128046fac/Manifold-Mixup-Encouraging-Meaningful-On-Manifold-Interpolation-as-a-Regularizer.pdf",
                                "https://research.fb.com/wp-content/uploads/2019/06/Manifold-Mixup-Better-Representations-by-Interpolating-Hidden-States.pdf",
                                "https://research.aalto.fi/files/38544180/Verma_Manifold_Mixup.19a_1.pdf",
                                "https://openreview.net/forum?id=HJZDQh-OZB",
                                "https://acris.aalto.fi/ws/portalfiles/portal/38544180/Verma_Manifold_Mixup.19a_1.pdf",
                                "https://aaltodoc.aalto.fi/handle/123456789/41271",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180605236V/abstract",
                                "https://arxiv.org/abs/1806.05236",
                                "https://onikle.com/articles/46506",
                                "https://www.researchgate.net/profile/Amir-Najafi-3/publication/333430495_Manifold_Mixup_Better_Representations_by_Interpolating_Hidden_States/links/5ced65ba299bf109da771131/Manifold-Mixup-Better-Representations-by-Interpolating-Hidden-States.pdf",
                                "https://research.aalto.fi/en/publications/manifold-mixup-better-representations-by-interpolating-hidden-sta",
                                "https://aaltodoc.aalto.fi/doc_public/export/bibtex/?url=https://aaltodoc.aalto.fi/handle/123456789/41271"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Manifold-Mixup%3A-Better-Representations-by-Hidden-Verma-Lamb/1b59eea8ec4684381a885b59acd09c9151a49487",
                                "https://www.semanticscholar.org/paper/Manifold-Mixup%3A-Learning-Better-Representations-by-Verma-Lamb/61d7ca903206136209943fe351752dc61389ea11"
                            ],
                            "context_in_section": "Ensuing work has explored interpolating inner components (Verma et al., 2019; Faramarzi et al., 2020), more general mixing schemes (Guo, 2020), and adding adversaries (Beckham et al., 2019)."
                        },
                        {
                            "title": "PatchUp: A Regularization Technique for Convolutional Neural Networks",
                            "year": 2020,
                            "author": "Faramarzi et al.",
                            "abstract": "Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches like Mixup and CutMix. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR-10, CIFAR-100, and SVHN datasets with PreactResnet18, PreactResnet34, and WideResnet-28-10 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide better generalization to affine transformations of samples and is more robust against adversarial attacks. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2006.07794"  ,
                                "https://arxiv.org/pdf/2006.07794.pdf",
                                "https://gaokeji.info/pdf/2006.07794.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200607794F/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/PatchUp%3A-A-Regularization-Technique-for-Neural-Faramarzi-Amini/be21780b2e2fa8abb2243e91d5af5c7bd49d4079"
                            ],
                            "context_in_section": "Ensuing work has explored interpolating inner components (Verma et al., 2019; Faramarzi et al., 2020), more general mixing schemes (Guo, 2020), and adding adversaries (Beckham et al., 2019)."
                        },
                        {
                            "title": "Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification ",
                            "year": 2020,
                            "author": "Guo",
                            "abstract": "Data augmentation with Mixup (Zhang et al. 2018) has shown to be an effective model regularizer for current art deep classification networks. It generates out-of-manifold samples through linearly interpolating inputs and their corresponding labels of random sample pairs. Despite its great successes, Mixup requires convex combination of the inputs as well as the modeling targets of a sample pair, thus significantly limits the space of its synthetic samples and consequently its regularization effect. To cope with this limitation, we propose “nonlinear Mixup”. Unlike Mixup where the input and label pairs share the same, linear, scalar mixing policy, our approach embraces nonlinear interpolation policy for both the input and label pairs, where the mixing policy for the labels is adaptively learned based on the mixed input. Experiments on benchmark sentence classification datasets indicate that our approach significantly improves upon Mixup. Our empirical studies also show that the out-of-manifold samples generated by our strategy encourage training samples in each class to form a tight representation cluster that is far from others.",
                            "urls_google_scholar": [
                                "https://ojs.aaai.org/index.php/AAAI/article/view/5822",
                                "https://aiide.org/ojs/index.php/AAAI/article/view/5822"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Nonlinear-Mixup%3A-Out-Of-Manifold-Data-Augmentation-Guo/f09c89465e6facacd8c27c5648624b311653bf5b"
                            ],
                            "context_in_section": "Ensuing work has explored interpolating inner components (Verma et al., 2019; Faramarzi et al., 2020), more general mixing schemes (Guo, 2020), and adding adversaries (Beckham et al., 2019)."
                        },
                        {
                            "title": "On Adversarial Mixup Resynthesis",
                            "year": 2019,
                            "author": "Beckham et al.",
                            "abstract": "In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.",
                            "urls_google_scholar": [
                                "https://papers.nips.cc/paper/2019/hash/f708f064faaf32a43e4d3c784e6af9ea-Abstract.html",
                                "https://arxiv.org/pdf/1903.02709",
                                "http://papers.nips.cc/paper/8686-on-adversarial-mixup-resynthesis",
                                "https://openreview.net/forum?id=ryMsu4BlIr",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190302709B/abstract",
                                "https://research.aalto.fi/en/publications/on-adversarial-mixup-resynthesis",
                                "http://postersession.ai.s3.amazonaws.com/5ccaf533-75f3-4b84-ac62-26e48e11702e.pdf",
                                "https://www.researchgate.net/profile/Vikas-Verma-23/publication/331587962_Adversarial_Mixup_Resynthesizers/links/5dc0615692851c81802c4ef8/Adversarial-Mixup-Resynthesizers.pdf",
                                "https://dl.acm.org/doi/abs/10.5555/3454287.3454678",
                                "https://openreview.net/pdf?id=ge3sG7jIhB",
                                "http://papers.nips.cc/paper/8686-on-adversarial-mixup-resynthesis-supplemental.zip"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/On-Adversarial-Mixup-Resynthesis-Beckham-Honari/f1aa40ba7e3166744955ceae2e6d8d60515e7021"
                            ],
                            "context_in_section": "Ensuing work has explored interpolating inner components (Verma et al., 2019; Faramarzi et al., 2020), more general mixing schemes (Guo, 2020), and adding adversaries (Beckham et al., 2019)."
                        },
                        {
                            "title": "CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features",
                            "year": 2019,
                            "author": "Yub et al.",
                            "abstract": " Regional dropout strategies have been proposed to enhance performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout removes informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it suffers from information loss causing inefficiency in training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gain in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix can improve the model robustness against input corruptions and its out-of distribution detection performance.",
                            "urls_google_scholar": [
                                "https://openaccess.thecvf.com/content_ICCV_2019/html/Yun_CutMix_Regularization_Strategy_to_Train_Strong_Classifiers_With_Localizable_Features_ICCV_2019_paper.html",
                                "https://openaccess.thecvf.com/content_ICCV_2019/papers/Yun_CutMix_Regularization_Strategy_to_Train_Strong_Classifiers_With_Localizable_Features_ICCV_2019_paper.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190504899Y/abstract",
                                "https://openreview.net/pdf?id=TERxSfkLrj",
                                "https://www.computer.org/csdl/proceedings-article/iccv/2019/480300g022/1hVlwBUnh4s",
                                "https://www.researchgate.net/profile/Youngjoon-Yoo/publication/333078138_CutMix_Regularization_Strategy_to_Train_Strong_Classifiers_with_Localizable_Features/links/5ced198ca6fdcc18c8e7745f/CutMix-Regularization-Strategy-to-Train-Strong-Classifiers-with-Localizable-Features.pdf",
                                "https://deepai.org/publication/cutmix-regularization-strategy-to-train-strong-classifiers-with-localizable-features",
                                "https://onikle.com/articles/9846",
                                "https://ieeexplore.ieee.org/abstract/document/9008296/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/CutMix%3A-Regularization-Strategy-to-Train-Strong-Yun-Han/ed17929e66da7f8fbc3666bf5eb613d302ddde0c"
                            ],
                            "context_in_section": "Another class of extensions of MIXUP which has been growing in the vision community attempts to fuse raw input image pairs together into a single input image, rather than improve the continuous interpolation mechanism. Examples of this paradigm include CUTMIX (Yun et al., 2019), CUTOUT (DeVries and Taylor, 2017) and COPY-PASTE (Ghiasi et al., 2020). For instance, CUTMIX replaces a small sub-region of Image A with a patch sampled from Image B, with the labels mixed in proportion to sub-region sizes. There is potential to borrow ideas and inspiration from these works for NLP, e.g. for multimodal work involving both images and text (see \"Multimodal challenges\" in §6)."
                        },
                        {
                            "title": "Improved Regularization of Convolutional Neural Networks with Cutout",
                            "year": 2017,
                            "author": "DeVries and Taylor",
                            "abstract": "Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results of 2.56%, 15.20%, and 1.30% test error respectively. Code is available at this https URL ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1708.04552",
                                "https://arxiv.org/pdf/1708.04552.pdf",
                                "https://onikle.com/articles/30764",
                                "https://ui.adsabs.harvard.edu/abs/2017arXiv170804552D/abstract",
                                "https://openreview.net/pdf?id=Hylu9l9OwH",
                                "https://eva.fing.edu.uy/pluginfile.php/316645/mod_resource/content/1/1708.04552.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Improved-Regularization-of-Convolutional-Neural-Devries-Taylor/eb35fdc11a325f21a8ce0ca65058f7480a2fc91f"
                            ],
                            "context_in_section": "Another class of extensions of MIXUP which has been growing in the vision community attempts to fuse raw input image pairs together into a single input image, rather than improve the continuous interpolation mechanism. Examples of this paradigm include CUTMIX (Yun et al., 2019), CUTOUT (DeVries and Taylor, 2017) and COPY-PASTE (Ghiasi et al., 2020)."
                        },
                        {
                            "title": "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation",
                            "year": 2021,
                            "author": "Ghiasi et al.",
                            "abstract": "Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Leveraging data augmentations is a promising direction towards addressing this challenge. Here, we perform a systematic study of the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image. Prior studies on Copy-Paste relied on modeling the surrounding visual context for pasting the objects. However, we find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines. Furthermore, we show Copy-Paste is additive with semi-supervised methods that leverage extra data through pseudo labeling (e.g. self-training). On COCO instance segmentation, we achieve 49.1 mask AP and 57.3 box AP, an improvement of +0.6 mask AP and +1.5 box AP over the previous state-of-the-art. We further demonstrate that Copy-Paste can lead to significant improvements on the LVIS benchmark. Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2012.07177",
                                "http://openaccess.thecvf.com/content/CVPR2021/html/Ghiasi_Simple_Copy-Paste_Is_a_Strong_Data_Augmentation_Method_for_Instance_CVPR_2021_paper.html",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201207177G/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Simple-Copy-Paste-is-a-Strong-Data-Augmentation-for-Ghiasi-Cui/914a593b7f2e980470075a9955f1407641669a8f"
                            ],
                            "context_in_section": "Another class of extensions of MIXUP which has been growing in the vision community attempts to fuse raw input image pairs together into a single input image, rather than improve the continuous interpolation mechanism. Examples of this paradigm include CUTMIX (Yun et al., 2019), CUTOUT (DeVries and Taylor, 2017) and COPY-PASTE (Ghiasi et al., 2020)."
                        },
                        {
                            "title": "MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification",
                            "year": 2020,
                            "author": "Chen et al.",
                            "abstract": "This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fined-tuned models and other state-of-the-art semi-supervised learning methods on several text classification benchmarks. The improvement is especially prominent when supervision is extremely limited. We have publicly released our code at https://github.com/GT-SALT/MixText.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.acl-main.194/",
                                "https://aclanthology.org/2020.acl-main.194.pdf",
                                "https://arxiv.org/abs/2004.12239",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200412239C/abstract",
                                "https://www.cc.gatech.edu/~dyang888/docs/mixtext_acl_2020.pdf",
                                "https://www.aclweb.org/anthology/2020.acl-main.194/",
                                "https://openreview.net/forum?id=5MTi6yj4-No"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/MixText%3A-Linguistically-Informed-Interpolation-of-Chen-Yang/ae2c03cbe6162dadf65edd2ff7dfc5333524dca5"
                            ],
                            "context_in_section": "A bottleneck to using MIXUP for NLP tasks was the requirement of continuous inputs. This has been overcome by mixing embeddings or higher hidden layers (Chen et al., 2020c)."
                        },
                        {
                            "title": "SpeechMix — Augmenting Deep Sound Recognition Using Hidden Space Interpolations",
                            "year": 2020,
                            "author": "Jindal et al.",
                            "abstract": "This paper presents SpeechMix, a regularization and data augmentation technique for deep sound recognition. Our strategy is to create virtual training samples by interpolating speech samples in hidden space. SpeechMix has the potential to generate an infinite number of new augmented speech samples since the combination of speech samples is continuous. Thus, it allows downstream models to avoid overfitting drastically. Unlike other mixing strategies that only work on the input space, we apply our method on the intermediate layers to capture a broader representation of the feature space. Through an extensive quantitative evaluation, we demonstrate the effectiveness of SpeechMix in comparison to standard learning regimes and previously applied mixing strategies. Furthermore, we highlight how different hidden layers contribute to the improvements in classification using an ablation study. ",
                            "urls_google_scholar": [
                                "https://www.isca-speech.org/archive/interspeech_2020/jindal20_interspeech.html",
                                "https://www.isca-speech.org/archive/pdfs/interspeech_2020/jindal20_interspeech.pdf",
                                "http://www.interspeech2020.org/uploadfile/pdf/Mon-2-8-10.pdf",
                                "https://isca-speech.org/archive/Interspeech_2020/pdfs/3147.pdf",
                                "https://indico2.conference4me.psnc.pl/event/35/contributions/2910/attachments/559/586/Mon-2-8-10.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/SpeechMix-Augmenting-Deep-Sound-Recognition-Using-Jindal-Ranganatha/a50680d557f74456eda52aa30f96c87b77c83661"
                            ],
                            "context_in_section": "Later variants propose speech-tailored mixing schemes (Jindal et al., 2020b) and interpolation with adversarial examples (Cheng et al., 2020), among others."
                        },
                        {
                            "title": "AdvAug: Robust Adversarial Augmentation for Neural Machine Translation",
                            "year": 2020,
                            "author": "Cheng et al.",
                            "abstract": "In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT). The main idea is to minimize the vicinal risk over virtual sentences sampled from two vicinity distributions, in which the crucial one is a novel vicinity distribution for adversarial sentences that describes a smooth interpolated embedding space centered around observed training sentence pairs. We then discuss our approach, AdvAug, to train NMT models using the embeddings of virtual sentences in sequence-to-sequence learning. Experiments on Chinese-English, English-French, and English-German translation benchmarks show that AdvAug achieves significant improvements over theTransformer (up to 4.9 BLEU points), and substantially outperforms other data augmentation techniques (e.g.back-translation) without using extra corpora.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.acl-main.529/",
                                "https://arxiv.org/abs/2006.11834",
                                "http://www.lujiang.info/resources/advaug_acl20.pdf",
                                "https://www.researchgate.net/profile/Lu-Jiang-7/publication/342377786_AdvAug_Robust_Adversarial_Augmentation_for_Neural_Machine_Translation/links/5efac0f9299bf18816f36933/AdvAug-Robust-Adversarial-Augmentation-for-Neural-Machine-Translation.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200611834C/abstract",
                                "https://www.aclweb.org/anthology/2020.acl-main.529/",
                                "https://research.google/pubs/pub49123/",
                                "https://www.aclweb.org/anthology/2020.acl-main.529v2.pdf"
                            ],
                            "urls_semantic_scholar":[
                                 "https://www.semanticscholar.org/paper/AdvAug%3A-Robust-Adversarial-Augmentation-for-Neural-Cheng-Jiang/1e7d3a9846da556bc7b84ae1410d257b89448c30"
                            ],
                            "context_in_section": "Later variants propose speech-tailored mixing schemes (Jindal et al., 2020b) and interpolation with adversarial examples (Cheng et al., 2020), among others."
                        },
                        {
                            "title": "Sequence-Level Mixed Sample Data Augmentation",
                            "year": 2020,
                            "author": "Guo et al.",
                            "abstract": "Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems. Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set. We connect this approach to existing techniques such as SwitchOut and word dropout, and show that these techniques are all essentially approximating variants of a single objective. SeqMix consistently yields approximately 1.0 BLEU improvement on five different translation datasets over strong Transformer baselines. On tasks that require strong compositional generalization such as SCAN and semantic parsing, SeqMix also offers further improvements.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.emnlp-main.447/",
                                "https://aclanthology.org/2020.emnlp-main.447.pdf",
                                "https://arxiv.org/abs/2011.09039",
                                "https://par.nsf.gov/servlets/purl/10214139",
                                "https://openreview.net/forum?id=Q3GaKpNP6f",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201109039G/abstract",
                                "https://www.aclweb.org/anthology/2020.emnlp-main.447.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Sequence-level-Mixed-Sample-Data-Augmentation-Guo-Kim/106fb432d2b62f3824a9d6f4a1b30e1f8b6ea9d7"
                            ],
                            "context_in_section": "SEQ2MIXUP (Guo et al., 2020) generalizes MIXUP for sequence transduction tasks in two ways - the \"hard\" version samples a binary mask (from a Bernoulli with a β(α, α) prior) and picks from one of two sequences at each token position, while the \"soft\" version softly interpolates between sequences based on a coefficient sampled from β(α, α). The \"soft\" version is found to outperform the \"hard\" version and earlier interpolation-based techniques like SWITCHOUT (Wang et al., 2018a)"
                        },
                        {
                            "title": "SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation",
                            "year": 2019,
                            "author": "Wang et al.",
                            "abstract": "In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, achieving better or comparable performances to strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D18-1100/",
                                "https://arxiv.org/abs/1808.07512",
                                "https://openreview.net/forum?id=SyWw-Mfd-B",
                                "https://www.aclweb.org/anthology/D18-1100.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180807512W/abstract",
                                "https://research.google/pubs/pub47719/",
                                "https://aclanthology.org/D18-1100.pdf",
                                "https://www.aclweb.org/anthology/D18-1100/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/SwitchOut%3A-an-Efficient-Data-Augmentation-Algorithm-Wang-Pham/0ee468b9b709a2610c4b574d67218e7960350224"
                            ],
                            "context_in_section": "CONTEXT"
                        }
                    ]
                },
                {
                    "title": "Model-Based Techniques",
                    "articles": [
                        {
                            "title": "Improving Neural Machine Translation Models with Monolingual Data",
                            "year": 2016,
                            "author": "Sennirch et al.",
                            "abstract": "     Neural Machine Translation (NMT) has obtained state-of-the art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic back-translation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English<->German (+2.8-3.7 BLEU), and for the low-resourced IWSLT 14 task Turkish->English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English->German. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1511.06709",
                                "https://u.cs.biu.ac.il/~yogo/courses/mt2016/papers/rico-monolingual-hack.pdf",
                                "https://openreview.net/forum?id=rJWqh3lu-B",
                                "https://aclanthology.org/P16-1009.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2015arXiv151106709S/abstract",
                                "https://www.researchgate.net/profile/Rico-Sennrich/publication/284476349_Improving_Neural_Machine_Translation_Models_with_Monolingual_Data/links/60744f0292851c8a7bc03331/Improving-Neural-Machine-Translation-Models-with-Monolingual-Data.pdf",
                                "http://personeltest.ru/aways/www.aclweb.org/anthology/P16-1009.pdf",
                                "https://www.research.ed.ac.uk/en/publications/improving-neural-machine-translation-models-with-monolingual-data",
                                "https://www.aclweb.org/anthology/P16-1009.pdf",
                                "http://www.qt21.eu/wp-content/uploads/2018/08/qt21-d2-4.pdf#page=88",
                                "https://www.aclweb.org/anthology/P16-1009/",
                                "http://u.cs.biu.ac.il/~yogo/courses/mt2016/papers/rico-monolingual-hack.pdf",
                                "https://www.research.ed.ac.uk/en/publications/improving-neural-machine-translation-models-with-monolingual-data/projects/?status=FINISHED"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Improving-Neural-Machine-Translation-Models-with-Sennrich-Haddow/f3b96ef2dc1fc5e14982f1b963db8db6a54183bb"
                            ],
                            "context_in_section": "Seq2seq and language models have also been used for DA. The popular BACKTRANSLATION method (Sennrich et al., 2016) translates a sequence into another language and then back into the original language"
                        },
                        {
                            "title": "Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation",
                            "year": 2019,
                            "author": "Kumar et al.",
                            "abstract": "Inducing diversity in the task of paraphrasing is an important problem in NLP with applications in data augmentation and conversational agents. Previous paraphrasing approaches have mainly focused on the issue of generating semantically similar paraphrases while paying little attention towards diversity. In fact, most of the methods rely solely on top-k beam search sequences to obtain a set of paraphrases. The resulting set, however, contains many structurally similar sentences. In this work, we focus on the task of obtaining highly diverse paraphrases while not compromising on paraphrasing quality. We provide a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted towards the task of paraphrasing. Additionally, we demonstrate the effectiveness of our method for data augmentation on multiple tasks such as intent classification and paraphrase recognition. In order to drive further research, we have made the source code available.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/N19-1363/",
                                "https://aclanthology.org/N19-1363.pdf",
                                "https://www.aclweb.org/anthology/N19-1363.pdf",
                                "https://openreview.net/forum?id=rkbyNXWOZH",
                                "https://www.aclweb.org/anthology/N19-1363/",
                                "http://aclanthology.lst.uni-saarland.de/N19-1363.pdf",
                                "http://eprints.iisc.ac.in/id/eprint/65675",
                                "http://eprints.iisc.ac.in/65675/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Submodular-Optimization-based-Diverse-Paraphrasing-Kumar-Bhattamishra/1230e896fb702e81e165b642e0112370443167f8"
                            ],
                            "context_in_section": "Kumar et al. (2019a) train seq2seq models with their proposed method DiPS which learns to generate diverse paraphrases of input text using a modified decoder with a submodular objective, and show its effectiveness as DA for several classification tasks."
                        },
                        {
                            "title": "Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations",
                            "year": 2018,
                            "author": "Kobayashi",
                            "abstract": "We propose a novel data augmentation for labeled sentences called contextual augmentation. We assume an invariance that sentences are natural even if the words in the sentences are replaced with other words with paradigmatic relations. We stochastically replace words with other words that are predicted by a bi-directional language model at the word positions. Words predicted according to a context are numerous but appropriate for the augmentation of the original words. Furthermore, we retrofit a language model with a label-conditional architecture, which allows the model to augment sentences without breaking the label-compatibility. Through the experiments for six various different text classification tasks, we demonstrate that the proposed method improves classifiers based on the convolutional or recurrent neural networks.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/N18-2072/",
                                "https://aclanthology.org/N18-2072.pdf",
                                "https://arxiv.org/abs/1805.06201",
                                "https://openreview.net/forum?id=H1bXoXWO-S",
                                "https://www.aclweb.org/anthology/N18-2072.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180506201K/abstract",
                                "https://www.aclweb.org/anthology/N18-2072/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Contextual-Augmentation%3A-Data-Augmentation-by-Words-Kobayashi/78a5efc6d0ef9a1e4cfd375e189474305d0b099f"
                            ],
                            "context_in_section": "Kobayashi (2018) generate augmented examples by replacing words with others randomly drawn according to the recurrent language model’s distribution based on the current context "
                        },
                        {
                            "title": "Generative Data Augmentation for Commonsense Reasoning",
                            "year": 2020,
                            "author": "Yang et al.",
                            "abstract": "Recent advances in commonsense reasoning depend on large-scale human-annotated training sets to achieve peak performance. However, manual curation of training sets is expensive and has been shown to introduce annotation artifacts that neural models can readily exploit and overfit to. We propose a novel generative data augmentation technique, G-DAUGˆC, that aims to achieve more accurate and robust learning in a low-resource setting. Our approach generates synthetic examples using pretrained language models and selects the most informative and diverse set of examples for data augmentation. On experiments with multiple commonsense reasoning benchmarks, G-DAUGˆC consistently outperforms existing data augmentation methods based on back-translation, establishing a new state-of-the-art on WinoGrande, CODAH, and CommonsenseQA, as well as enhances out-of-distribution generalization, proving to be robust against adversaries or perturbations. Our analysis demonstrates that G-DAUGˆC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.findings-emnlp.90/",
                                "https://aclanthology.org/2020.findings-emnlp.90.pdf",
                                "https://arxiv.org/abs/2004.11546",
                                "http://114.215.220.151:8000/20200427/G-DAUG-%20Generative%20Data%20Augmentation%20for%20Commonsense%20Reasoning.pdf",
                                "https://openreview.net/forum?id=nArVGlY9g_D",
                                "https://pdfs.semanticscholar.org/f7bc/987a820fcf83263f69bd8d938a28d8c5dc44.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200411546Y/abstract",
                                "https://aclanthology.org/2020.findings-emnlp.90.pdf",
                                "https://openreview.net/forum?id=qnOgoTvOllo"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Generative-Data-Augmentation-for-Commonsense-Yang-Malaviya/1b8a5210f5ec202d3c6b880f389ca9ddc6fbaf9a",
                                "https://www.semanticscholar.org/paper/G-DAug%3A-Generative-Data-Augmentation-for-Reasoning-Yang-Malaviya/509a275d2563e08a193d4b032f43dd9eb9e6c575"
                            ],
                            "context_in_section": "Yang et al. (2020) propose GDAUGc which generates synthetic examples using pretrained transformer language models, and selects the most informative and diverse set for augmentation."
                        },
                        {
                            "title": "Soft Contextual Data Augmentation for Neural Machine Translation",
                            "year": 2019,
                            "author": "Gao et al.",
                            "abstract": "While data augmentation is an important trick to boost the accuracy of deep learning methods in computer vision tasks, its study in natural language tasks is still very limited. In this paper, we present a novel data augmentation method for neural machine translation.Different from previous augmentation methods that randomly drop, swap or replace words with other words in a sentence, we softly augment a randomly chosen word in a sentence by its contextual mixture of multiple related words. More accurately, we replace the one-hot representation of a word by a distribution (provided by a language model) over the vocabulary, i.e., replacing the embedding of this word by a weighted combination of multiple semantically similar words. Since the weights of those words depend on the contextual information of the word to be replaced,the newly generated sentences capture much richer information than previous augmentation methods. Experimental results on both small scale and large scale machine translation data sets demonstrate the superiority of our method over strong baselines.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P19-1555/",
                                "https://aclanthology.org/P19-1555.pdf",
                                "https://www.aclweb.org/anthology/P19-1555/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Soft-Contextual-Data-Augmentation-for-Neural-Zhu-Gao/258e92bd6ceaeb78e7384eeea57b4c7a2c356cfa"
                            ],
                            "context_in_section": "Gao et al. (2019) advocate retaining the full distribution through \"soft\" augmented examples, showing gains on machine translation."
                        },
                        {
                            "title": "Named Entity Recognition for Social Media Texts with Semantic Augmentation",
                            "year": 2020,
                            "author": "Nie et al.",
                            "abstract": "Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts, especially user-generated social media content. Semantic augmentation is a potential way to alleviate this problem. Given that rich semantic information is implicitly preserved in pre-trained word embeddings, they are potential ideal resources for semantic augmentation. In this paper, we propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account. In particular, we obtain the augmented semantic information from a large-scale corpus, and propose an attentive semantic augmentation module and a gate module to encode and aggregate such information, respectively. Extensive experiments are performed on three benchmark datasets collected from English and Chinese social media platforms, where the results demonstrate the superiority of our approach to previous studies across all three datasets.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.emnlp-main.107/",
                                "https://aclanthology.org/2020.emnlp-main.107.pdf",
                                "https://arxiv.org/abs/2010.15458",
                                "https://openreview.net/pdf?id=JsGhcAjBoPI",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201015458N/abstract",
                                "https://www.aclweb.org/anthology/2020.emnlp-main.107.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Named-Entity-Recognition-for-Social-Media-Texts-Nie-Tian/947ae5547af5bc418414e843e3d0a833c54c56dd"
                            ],
                            "context_in_section": "Nie et al. (2020) augment word representations with a context-sensitive attention-based mixture of their semantic neighbors from a pretrained embedding space, and show its effectiveness for NER on social media text."
                        },
                        {
                            "title": "SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness",
                            "year": 2020,
                            "author": "Ng et al.",
                            "abstract": "Models that perform well on a training domain often fail to generalize to out-of-domain (OOD) examples. Data augmentation is a common method used to prevent overfitting and improve OOD generalization. However, in natural language, it is difficult to generate new examples that stay on the underlying data manifold. We introduce SSMBA, a data augmentation method for generating synthetic training examples by using a pair of corruption and reconstruction functions to move randomly on a data manifold. We investigate the use of SSMBA in the natural language domain, leveraging the manifold assumption to reconstruct corrupted text with masked language models. In experiments on robustness benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods and baseline models on both in-domain and OOD data, achieving gains of 0.8% on OOD Amazon reviews, 1.8% accuracy on OOD MNLI, and 1.4 BLEU on in-domain IWSLT14 German-English.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.emnlp-main.97/",
                                "https://aclanthology.org/2020.emnlp-main.97.pdf",
                                "https://arxiv.org/abs/2009.10195",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200910195N/abstract",
                                "https://openreview.net/pdf?id=V3XIE0-wBol",
                                "https://www.aclweb.org/anthology/2020.emnlp-main.97.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/SSMBA%3A-Self-Supervised-Manifold-Based-Data-for-Ng-Cho/67f343b5212d3a71965d2c217bb567f8af0bbcdb"
                            ],
                            "context_in_section": "Inspired by denoising autoencoders, Ng et al. (2020) use a corrupt-and-reconstruct approach, with the corruption function q(x′|x) masking an arbitrary number of word positions and the reconstruction function r(x|x′) unmasking them using BERT (Devlin et al., 2019). Their approach works well on domain-shifted test sets across 9 datasets on sentiment, NLI, and NMT."
                        },
                        {
                            "title": "Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange",
                            "year": 2019,
                            "author": "Feng et al.",
                            "abstract": "In this paper, we present a novel method for measurably adjusting the semantics of text while preserving its sentiment and fluency, a task we call semantic text exchange. This is useful for text data augmentation and the semantic correction of text generated by chatbots and virtual assistants. We introduce a pipeline called SMERTI that combines entity replacement, similarity masking, and text infilling. We measure our pipeline’s success by its Semantic Text Exchange Score (STES): the ability to preserve the original text’s sentiment and fluency while adjusting semantic content. We propose to use masking (replacement) rate threshold as an adjustable parameter to control the amount of semantic change in the text. Our experiments demonstrate that SMERTI can outperform baseline models on Yelp reviews, Amazon reviews, and news headlines.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D19-1272/",
                                "https://aclanthology.org/D19-1272.pdf",
                                "https://arxiv.org/abs/1909.00088",
                                "https://www.aclweb.org/anthology/D19-1272/",
                                "https://www.researchgate.net/profile/Steven-Feng/publication/335599220_Keep_Calm_and_Switch_On_Preserving_Sentiment_and_Fluency_in_Semantic_Text_Exchange/links/5d8fc77b92851c33e9462f6f/Keep-Calm-and-Switch-On-Preserving-Sentiment-and-Fluency-in-Semantic-Text-Exchange.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190900088F/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Keep-Calm-and-Switch-On!-Preserving-Sentiment-and-Feng-Li/00ea88920eca898909bd8dd455df25ec7313e3e8",
                                "https://www.semanticscholar.org/paper/Keep-Calm-and-Switch-On-!-Preserving-Sentiment-and/a3cf599683ddf379c8e57794f74111eabc82b859"
                            ],
                            "context_in_section": "Feng et al. (2019) propose a task called SEMANTIC TEXT EXCHANGE (STE) which involves adjusting the overall semantics of a text to fit the context of a new word/phrase that is inserted called the replacement entity (RE). They do so by using a system called SMERTI and a masked LM approach. While not proposed directly for DA, it can be used as such, as investigated in Feng et al. (2020)."
                        },
                        {
                            "title": "GenAug: Data Augmentation for Finetuning Text Generators",
                            "year": 2020,
                            "author": "Feng et al.",
                            "abstract": "In this paper, we investigate data augmentation for text generation, which we call GenAug. Text generation and language modeling are important tasks within natural language processing, and are especially challenging for low-data regimes. We propose and evaluate various augmentation methods, including some that incorporate external knowledge, for finetuning GPT-2 on a subset of Yelp Reviews. We also examine the relationship between the amount of augmentation and the quality of the generated text. We utilize several metrics that evaluate important aspects of the generated text including its diversity and fluency. Our experiments demonstrate that insertion of character-level synthetic noise and keyword replacement with hypernyms are effective augmentation methods, and that the quality of generations improves to a peak at approximately three times the amount of original data.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.deelio-1.4/",
                                "https://aclanthology.org/2020.deelio-1.4.pdf",
                                "https://arxiv.org/abs/2010.01794",
                                "https://www.researchgate.net/profile/Steven-Feng/publication/347235611_GenAug_Data_Augmentation_for_Finetuning_Text_Generators/links/603ca4084585158939d9939f/GenAug-Data-Augmentation-for-Finetuning-Text-Generators.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201001794F/abstract",
                                "https://www.aclweb.org/anthology/2020.deelio-1.4/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/GenAug%3A-Data-Augmentation-for-Finetuning-Text-Feng-Gangal/c299a4083443bea26188567979f20b8305554c0b"
                            ],
                            "context_in_section": "Feng et al. (2019) propose a task called SEMANTIC TEXT EXCHANGE (STE) which involves adjusting the overall semantics of a text to fit the context of a new word/phrase that is inserted called the replacement entity (RE). They do so by using a system called SMERTI and a masked LM approach. While not proposed directly for DA, it can be used as such, as investigated in Feng et al. (2020)."
                        },
                        {
                            "title": "Do Not Have Enough Data? Deep Learning to the Rescue! ",
                            "year": 2020,
                            "author": "Anaby-Tavor et al.",
                            "abstract": "Based on recent advances in natural language modeling and those in text generation capabilities, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1911.03118",
                                "https://ojs.aaai.org/index.php/AAAI/article/view/6233",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv191103118A/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/e4f24dee6921b7de0cdca1ef8c8cef25d0ff24b6",
                                "https://www.semanticscholar.org/paper/Do-Not-Have-Enough-Data-Deep-Learning-to-the-Anaby-Tavor-Carmeli/e4f24dee6921b7de0cdca1ef8c8cef25d0ff24b6"
                            ],
                            "context_in_section": "Rather than starting from an existing example and modifying it, some model-based DA approaches directly estimate a generative process from the training set and sample from it. Anaby-Tavor et al. (2020) learn a label-conditioned generator by finetuning GPT-2 (Radford et al., 2019) on the training data, using this to generate candidate examples per class. A classifier trained on the original training set is then used to select top k candidate examples which confidently belong to the respective class for augmentation."
                        },
                        {
                            "title": "Textual Data Augmentation for Efficient Active Learning on Tiny Datasets",
                            "year": 2020,
                            "author": "Quteineh et al.",
                            "abstract": "In this paper we propose a novel data augmentation approach where guided outputs of a language generation model, e.g. GPT-2, when labeled, can improve the performance of text classifiers through an active learning process. We transform the data generation task into an optimization problem which maximizes the usefulness of the generated output, using Monte Carlo Tree Search (MCTS) as the optimization strategy and incorporating entropy as one of the optimization criteria. We test our approach against a Non-Guided Data Generation (NGDG) process that does not optimize for a reward function. Starting with a small set of data, our results show an increased performance with MCTS of 26% on the TREC-6 Questions dataset, and 10% on the Stanford Sentiment Treebank SST-2 dataset. Compared with NGDG, we are able to achieve increases of 3% and 5% on TREC-6 and SST-2.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.emnlp-main.600/",
                                "https://aclanthology.org/2020.emnlp-main.600.pdf",
                                "http://repository.essex.ac.uk/29084/1/2020.emnlp-main.600.pdf",
                                "http://repository.essex.ac.uk/29084/",
                                "https://core.ac.uk/download/pdf/349055548.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Textual-Data-Augmentation-for-Efficient-Active-on-Quteineh-Samothrakis/667b3fa29fa2563c22d59684bf84548aac897abf"
                            ],
                            "context_in_section": "Quteineh et al. (2020) use a similar label-conditioned GPT-2 generation method, and demonstrate its effectiveness as a DA method in an active learning setup."
                        },
                        {
                            "title": "Adversarial Example Generation with Syntactically Controlled Paraphrase Networks",
                            "year": 2018,
                            "author": "Iyyer et al.",
                            "abstract": "We propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples. Given a sentence and a target syntactic form (e.g., a constituency parse), SCPNs are trained to produce a paraphrase of the sentence with the desired syntax. We show it is possible to create training data for this task by first doing backtranslation at a very large scale, and then using a parser to label the syntactic transformations that naturally occur during this process. Such data allows us to train a neural encoder-decoder model with extra inputs to specify the target syntax. A combination of automated and human evaluations show that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems. Furthermore, they are more capable of generating syntactically adversarial examples that both (1) “fool” pretrained models and (2) improve the robustness of these models to syntactic variation when used to augment their training data.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/N18-1170/",
                                "https://aclanthology.org/N18-1170.pdf",
                                "https://arxiv.org/abs/1804.06059",
                                "https://www.aclweb.org/anthology/N18-1170.pdf",
                                "https://www.researchgate.net/profile/Kevin-Gimpel/publication/325447063_Adversarial_Example_Generation_with_Syntactically_Controlled_Paraphrase_Networks/links/5b2d39620f7e9b0df5be6764/Adversarial-Example-Generation-with-Syntactically-Controlled-Paraphrase-Networks.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180406059I/abstract",
                                "https://openreview.net/forum?id=rkWMXQZ_bB",
                                "https://www.aclweb.org/anthology/N18-1170/",
                                "http://www.cs.cmu.edu/~jwieting/wieting2018Controlled.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Adversarial-Example-Generation-with-Syntactically-Iyyer-Wieting/2b110fce160468eb179b6c43ea27e098757a56dd"
                            ],
                            "context_in_section": "Other approaches include syntactic or controlled paraphrasing (Iyyer et al., 2018; Kumar et al., 2020)"
                        },
                        {
                            "title": "Syntax-Guided Controlled Generation of Paraphrases ",
                            "year": 2020,
                            "author": "Kumar et al.",
                            "abstract": "Given a sentence (e.g., “I like mangoes”) and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g., “I hate mangoes”). Going beyond such simple constraints, recent work has started exploring the incorporation of complex syntactic-guidance as constraints in the task of controlled paraphrase generation. In these methods, syntactic-guidance is sourced from a separate exemplar sentence. However, these prior works have only utilized limited syntactic information available in the parse tree of the exemplar sentence. We address this limitation in the paper and propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation. We find that Sgcp can generate syntax-conforming sentences while not compromising on relevance. We perform extensive automated and human evaluations over multiple real-world English language datasets to demonstrate the efficacy of Sgcp over state-of-the-art baselines. To drive future research, we have made Sgcp’s source code available.1",
                            "urls_google_scholar": [
                                "https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00318/96454/Syntax-Guided-Controlled-Generation-of-Paraphrases",
                                "https://direct.mit.edu/tacl/article-abstract/doi/10.1162/tacl_a_00318/96454",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200508417K/abstract",
                                "https://www.aclweb.org/anthology/2020.tacl-1.22/",
                                "https://transacl.org/index.php/tacl/article/view/1967",
                                "https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00318",
                                "https://arxiv.org/abs/2005.08417",
                                "https://transacl.org/ojs/index.php/tacl/article/view/1967"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Syntax-Guided-Controlled-Generation-of-Paraphrases-Kumar-Ahuja/1b9bb6dabb084e0ec06c3956d8384312880bb28e"
                            ],
                            "context_in_section": "Other approaches include syntactic or controlled paraphrasing (Iyyer et al., 2018; Kumar et al., 2020)"
                        },
                        {
                            "title": "NAREOR: The Narrative Reordering Problem",
                            "year": 2021,
                            "author": "Gangal et al.",
                            "abstract": "Many implicit inferences exist in text depending on how it is structured that can critically impact the text's interpretation and meaning. One such structural aspect present in text with chronology is the order of its presentation. For narratives or stories, this is known as the narrative order. Reordering a narrative can impact the temporal, causal, event-based, and other inferences readers draw from it, which in turn can have strong effects both on its interpretation and interestingness. In this paper, we propose and investigate the task of Narrative Reordering (NAREOR) which involves rewriting a given story in a different narrative order while preserving its plot. We present a dataset, NAREORC, with human rewritings of stories within ROCStories in non-linear orders, and conduct a detailed analysis of it. Further, we propose novel task-specific training methods with suitable evaluation metrics. We perform experiments on NAREORC using state-of-the-art models such as BART and T5 and conduct extensive automatic and human evaluations. We demonstrate that although our models can perform decently, NAREOR is a challenging task with potential for further exploration. We also investigate two applications of NAREOR: generation of more interesting variations of stories and serving as adversarial sets for temporal/event-related tasks, besides discussing other prospective ones, such as for pedagogical setups related to language skills like essay writing and applications to medicine involving clinical narratives. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2104.06669",
                                "https://arxiv.org/pdf/2104.06669",
                                "https://ui.adsabs.harvard.edu/abs/2021arXiv210406669G/abstract"
                            ],
                            "context_in_section": "document-level paraphrasing (Gangal et al.,2021)",
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/NAREOR%3A-The-Narrative-Reordering-Problem-Gangal-Feng/f44514bc40cbb1a5a68f9edce238a7dd116298b5"
                            ]
                        },
                        {
                            "title": "Counterexample-Guided Data Augmentation",
                            "year": 2018,
                            "author": "Dreossi et al.",
                            "abstract": "We present a novel framework for augmenting data sets for machine learning based on counterexamples. Counterexamples are misclassified examples that have important properties for retraining and improving the model. Key components of our framework include a counterexample generator, which produces data items that are misclassified by the model and error tables, a novel data structure that stores information pertaining to misclassifications. Error tables can be used to explain the model's vulnerabilities and are used to efficiently generate counterexamples for augmentation. We show the efficacy of the proposed framework by comparing it to classical augmentation techniques on a case study of object detection in autonomous driving based on deep neural networks. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1805.06962",
                                "https://arxiv.org/pdf/1805.06962.pdf",
                                "https://people.eecs.berkeley.edu/~sseshia/pubdir/ijcai18.pdf",
                                "https://dl.acm.org/doi/abs/10.5555/3304889.3304947",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180506962D/abstract",
                                "https://par.nsf.gov/biblio/10109211",
                                "https://openreview.net/forum?id=B1bfCGM_WH",
                                "https://escholarship.org/content/qt7wz6t9k2/qt7wz6t9k2.pdf",
                                "https://www.ijcai.org/proceedings/2018/0286.pdf",
                                "https://escholarship.org/content/qt7wz6t9k2/qt7wz6t9k2_noSplash_248a8d61c4a5fcbc283ecbc6851e4a2a.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Counterexample-Guided-Data-Augmentation-Dreossi-Ghosh/bed848fabbc22707a4130bb2a86d78ff7085a1aa"
                            ],
                            "context_in_section": "augmenting misclassified examples (Dreossi et al., 2018"
                        },
                        {
                            "title":"Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
                            "year": 2021,
                            "author": "Thakur et al.",
                            "abstract": "     There are two approaches for pairwise sentence scoring: Cross-encoders, which perform full-attention over the input pair, and Bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2010.08240",
                                "https://arxiv.org/pdf/2010.08240.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201008240T/abstract",
                                "https://aclanthology.org/2021.naacl-main.28.pdf",
                                "https://www.aclweb.org/anthology/2021.naacl-main.28/"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Augmented-SBERT%3A-Data-Augmentation-Method-for-for-Thakur-Reimers/496658ac0483942a8720407f16bb94227f5627fe"
                            ],
                            "context_in_section": " BERT cross-encoder labeling of new inputs (Thakur et al., 2021)"
                        },
                        {
                            "title": "Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation",
                            "year": 2020,
                            "author": "Liu et al.",
                            "abstract": "Data augmentation is proven to be effective in many NLU tasks, especially for those suffering from data scarcity. In this paper, we present a powerful and easy to deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The result shows that Data Boost can boost the performance of classifiers especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has comparable quality as the original data with respect to readability and class consistency.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2012.02952",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201202952L/abstract",
                                "https://www.aclweb.org/anthology/2020.emnlp-main.726.pdf",
                                "https://aclanthology.org/2020.emnlp-main.726.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Data-Boost%3A-Text-Data-Augmentation-through-Learning-Liu-Xu/61e2f89f902fdc3c5e155504c74adb6621add442"
                            ],
                            "context_in_section": "guided generation using large-scale generative language models (Liu et al., 2020b,c)"
                        },
                        {
                            "title": "Learning Data Manipulation for Augmentation and Weighting",
                            "year": 2019,
                            "author": "Hu et al.",
                            "abstract": "     Manipulating data, such as weighting data examples or augmenting with new instances, has been increasingly used to improve model training. Previous work has studied various rule- or learning-based approaches designed for specific types of data manipulation. In this work, we propose a new method that supports learning different manipulation schemes with the same gradient-based algorithm. Our approach builds upon a recent connection of supervised learning and reinforcement learning (RL), and adapts an off-the-shelf reward learning algorithm from RL for joint data manipulation learning and model training. Different parameterization of the \"data reward\" function instantiates different manipulation schemes. We showcase data augmentation that learns a text transformation network, and data weighting that dynamically adapts the data sample importance. Experiments show the resulting algorithms significantly improve the image and text classification performance in low data regime and class-imbalance problems. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1910.12795",
                                "https://arxiv.org/pdf/1910.12795.pdf",
                                "https://pdfs.semanticscholar.org/1453/46191bff2e0b6268afbfcef40928aaf1953a.pdf",
                                "https://papers.nips.cc/paper/9706-learning-data-manipulation-for-augmentation-and-weighting-supplemental.zip",
                                "http://papers.neurips.cc/paper/9706-learning-data-manipulation-for-augmentation-and-weighting.pdf",
                                "http://postersession.ai.s3.amazonaws.com/9e82c14d-cc72-4651-8723-4c21eb1c9803.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv191012795H/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Learning-Data-Manipulation-for-Augmentation-and-Hu-Tan/16908e1d9b47ebca816d8cc92c5aa101eb7d7605"
                            ],
                            "context_in_section": "and automated text augmentation (Hu et al., 2019; Cai et al., 2020)"
                        },
                        {
                            "title": "Data Manipulation: Towards Effective Instance Learning for Neural Dialogue Generation via Learning to Augment and Reweight",
                            "year": 2020,
                            "author": "Cai et al.",
                            "abstract": "Current state-of-the-art neural dialogue models learn from human conversations following the data-driven paradigm. As such, a reliable training corpus is the crux of building a robust and well-behaved dialogue model. However, due to the open-ended nature of human conversations, the quality of user-generated training data varies greatly, and effective training samples are typically insufficient while noisy samples frequently appear. This impedes the learning of those data-driven neural dialogue models. Therefore, effective dialogue learning requires not only more reliable learning samples, but also fewer noisy samples. In this paper, we propose a data manipulation framework to proactively reshape the data distribution towards reliable samples by augmenting and highlighting effective learning samples as well as reducing the effect of inefficient samples simultaneously. In particular, the data manipulation model selectively augments the training samples and assigns an importance weight to each instance to reform the training data. Note that, the proposed data manipulation framework is fully data-driven and learnable. It not only manipulates training samples to optimize the dialogue generation model, but also learns to increase its manipulation skills through gradient descent with validation samples. Extensive experiments show that our framework can improve the dialogue generation performance with respect to various automatic evaluation metrics and human judgments.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/2020.acl-main.564/",
                                "https://aclanthology.org/2020.acl-main.564.pdf",
                                "https://arxiv.org/abs/2004.02594",
                                "https://www.aclweb.org/anthology/2020.acl-main.564/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Data-Manipulation%3A-Towards-Effective-Instance-for-Cai-Chen/3790b390b07d406fa2e16fa24bd23bc5136ad581"
                            ],
                            "context_in_section": "and automated text augmentation (Hu et al., 2019; Cai et al., 2020)"
                        },
                        {
                            "title": "AutoAugment: Learning Augmentation Strategies From Data",
                            "year": 2019,
                            "author": "Cubuk et al.",
                            "abstract": " Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIT Pets, FGVC Aircraft, and Stanford Cars. ",
                            "urls_google_scholar": [
                                "https://openaccess.thecvf.com/content_CVPR_2019/html/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.html",
                                "https://openaccess.thecvf.com/content_CVPR_2019/papers/Cubuk_AutoAugment_Learning_Augmentation_Strategies_From_Data_CVPR_2019_paper.pdf",
                                "https://ieeexplore.ieee.org/abstract/document/8953317/",
                                "http://personeltest.ru/aways/arxiv.org/pdf/1805.09501.pdf",
                                "https://www.computer.org/csdl/proceedings-article/cvpr/2019/329300a113/1gyrBRBt8ys",
                                "https://openreview.net/forum?id=rQV-I-7xu6r"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/AutoAugment%3A-Learning-Augmentation-Strategies-From-Cubuk-Zoph/21de3a36cb51adc205fad8a1d3d69118891dc3dd"
                            ],
                            "context_in_section": "Models can also learn to combine together simpler DA primitives (Cubuk et al., 2018; Ratner et al., 2017)"
                        },
                        {
                            "title": "Learning to Compose Domain-Specific Transformations for Data Augmentation",
                            "year": 2017,
                            "author": "Ratner et al.",
                            "abstract": " Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically search for improved data augmentation policies. In our implementation, we have designed a search space where a policy consists of many sub-policies, one of which is randomly chosen for each image in each mini-batch. A sub-policy consists of two operations, each operation being an image processing function such as translation, rotation, or shearing, and the probabilities and magnitudes with which the functions are applied. We use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. Our method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data). On ImageNet, we attain a Top-1 accuracy of 83.5% which is 0.4% better than the previous record of 83.1%. On CIFAR-10, we achieve an error rate of 1.5%, which is 0.6% better than the previous state-of-the-art. Augmentation policies we find are transferable between datasets. The policy learned on ImageNet transfers well to achieve significant improvements on other datasets, such as Oxford Flowers, Caltech-101, Oxford-IIT Pets, FGVC Aircraft, and Stanford Cars. ",
                            "urls_google_scholar": [
                                "https://proceedings.neurips.cc/paper/2017/file/f26dab9bf6a137c3b6782e562794c2f2-Paper.pdf",
                                "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5786274/",
                                "https://dl.acm.org/doi/abs/10.5555/3294996.3295083",
                                "https://arxiv.org/abs/1709.01643",
                                "https://europepmc.org/article/med/29375240",
                                "https://dawn.cs.stanford.edu/pubs/tandas-nips2017.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2017arXiv170901643R/abstract",
                                "https://openreview.net/forum?id=r1VPoU-_ZB",
                                "https://pubmed.ncbi.nlm.nih.gov/29375240/",
                                "http://papers.neurips.cc/paper/6916-learning-to-compose-domain-specific-transformations-for-data-augmentation.pdf",
                                "http://dawn.cs.stanford.edu/pubs/tandas-nips2017.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Learning-to-Compose-Domain-Specific-Transformations-Ratner-Ehrenberg/1ae265be8ccf396115ef06405f3b8421851998a9"
                            ],
                            "context_in_section": "Models can also learn to combine together simpler DA primitives (Cubuk et al., 2018; Ratner et al., 2017)"
                        },
                        {
                            "title": "Learning The Difference That Makes A Difference With Counterfactually-Augmented Data ",
                            "year": 2019,
                            "author": "Kaushik et al.",
                            "abstract": "Abstract: Despite alarm over the reliance of machine learning systems on so-called spurious patterns, the term lacks coherent meaning in standard statistical frameworks. However, the language of causality offers clarity: spurious associations are due to confounding (e.g., a common cause), but not direct or indirect causal effects. In this paper, we focus on natural language processing, introducing methods and resources  for training models less sensitive to spurious patterns. Given documents and their initial labels, we task humans with revising each document so that it (i) accords with a counterfactual target label; (ii) retains internal coherence;  and (iii) avoids unnecessary changes. Interestingly, on sentiment analysis and natural language inference tasks, classifiers trained on original data fail on their  counterfactually-revised counterparts and vice versa. Classifiers trained on combined datasets  perform remarkably well, just shy of those specialized to either domain. While classifiers trained on either original or manipulated data alone  are sensitive to spurious features (e.g., mentions of genre), models trained on the combined data are less sensitive to this signal. Both datasets are publicly available.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1909.12434",
                                "https://openreview.net/forum?id=Sklgs0NFvr",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190912434K/abstract",
                                "https://openreview.net/forum?id=Sklgs0NFvr&utm_campaign=NLP%20News&utm_medium=email&utm_source=Revue%20newsletter"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Learning-the-Difference-that-Makes-a-Difference-Kaushik-Hovy/efe9e62b40183da78174f76e875504642e8c87b6"
                            ],
                            "context_in_section": "or add human-in-the-loop (Kaushik et al., 2020, 2021)"
                        },
                        {
                            "title": "Explaining the Efficacy of Counterfactually Augmented Data ",
                            "year": 2020,
                            "author": "Kaushik et al.",
                            "abstract": "Abstract: Despite alarm over the reliance of machine learning systems on so-called spurious patterns, the term lacks coherent meaning in standard statistical frameworks. However, the language of causality offers clarity: spurious associations are due to confounding (e.g., a common cause), but not direct or indirect causal effects. In this paper, we focus on natural language processing, introducing methods and resources  for training models less sensitive to spurious patterns. Given documents and their initial labels, we task humans with revising each document so that it (i) accords with a counterfactual target label; (ii) retains internal coherence;  and (iii) avoids unnecessary changes. Interestingly, on sentiment analysis and natural language inference tasks, classifiers trained on original data fail on their  counterfactually-revised counterparts and vice versa. Classifiers trained on combined datasets  perform remarkably well, just shy of those specialized to either domain. While classifiers trained on either original or manipulated data alone  are sensitive to spurious features (e.g., mentions of genre), models trained on the combined data are less sensitive to this signal. Both datasets are publicly available.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2010.02114",
                                "http://arxiv-export-lb.library.cornell.edu/abs/2010.02114",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv201002114K/abstract",
                                "https://openreview.net/forum?id=HHiiQKWsOcV"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Explaining-The-Efficacy-of-Data-Kaushik-Setlur/24fcdaf969089e6a411f7cebc9274bbc53c25e42"
                            ],
                            "context_in_section": "or add human-in-the-loop (Kaushik et al., 2020, 2021)"
                        }
                    ]
                }
            ]
        },
        {
            "title":"Applications",
            "subsections": [
                {
                    "title": "Low-Resource Languages",
                    "articles":[
                        {
                            "title": "Generalized Data Augmentation for Low-Resource Translation",
                            "year": 2019,
                            "author": "Xia et al.",
                            "abstract": "Low-resource language pairs with a paucity of parallel data pose challenges for machine translation in terms of both adequacy and fluency. Data augmentation utilizing a large amount of monolingual data is regarded as an effective way to alleviate the problem. In this paper, we propose a general framework of data augmentation for low-resource machine translation not only using target-side monolingual data, but also by pivoting through a related high-resource language. Specifically, we experiment with a two-step pivoting method to convert high-resource data to the low-resource language, making best use of available resources to better approximate the true distribution of the low-resource language. First, we inject low-resource words into high-resource sentences through an induced bilingual dictionary. Second, we further edit the high-resource data injected with low-resource words using a modified unsupervised machine translation framework. Extensive experiments on four low-resource datasets show that under extreme low-resource settings, our data augmentation techniques improve translation quality by up to 1.5 to 8 BLEU points compared to supervised back-translation baselines.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P19-1579/",
                                "https://aclanthology.org/P19-1579.pdf",
                                "https://arxiv.org/abs/1906.03785",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190603785X/abstract",
                                "http://www.phontron.com/paper/xia19acl.pdf",
                                "https://par.nsf.gov/biblio/10104996",
                                "http://www.phontron.com/paper/xia19acl.pdf",
                                "https://www.aclweb.org/anthology/P19-1579.pdf",
                                "https://www.aclweb.org/anthology/P19-1579/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Generalized-Data-Augmentation-for-Low-Resource-Xia-Kong/595306f993993e44e2c2f674367103f44df03d9b"
                            ],
                            "context_in_section": "There are ways to leverage high-resource languages for low-resource languages, particularly if they have similar linguistic properties. Xia et al. (2019) use this approach to improve low-resource NMT."
                        },
                        {
                            "title": "A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation ",
                            "year": 2020,
                            "author": "Li et al.",
                            "abstract": " One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is not sufficient, which results in poor translation quality. In this paper, we propose a diversity data augmentation method that does not use extra monolingual data. We expand the training data by generating diversity pseudo parallel data on the source and target sides. To generate diversity data, the restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge origin data and synthetic parallel corpus to train the final model. In the experiment, the proposed approach achieved 1.96 BLEU points in the IWSLT2014 German–English translation tasks, which was used to simulate a low-resource language. Our approach also consistently and substantially obtained 1.0 to 2.0 BLEU improvement in three other low-resource translation tasks, including English–Turkish, Nepali–English, and Sinhala–English translation tasks. View Full-Text ",
                            "urls_google_scholar": [
                                "https://www.mdpi.com/709286",
                                "https://search.proquest.com/openview/ea0a6dd7da064b7c1ea9fae43bb3c59c/1?pq-origsite=gscholar&cbl=2032384",
                                "http://search.ebscohost.com/login.aspx?direct=true&profile=ehost&scope=site&authtype=crawler&jrnl=20782489&AN=143674550&h=X5ehewW5fnKl0SAO2rczKEap4aIPyae%2FsFTl4ZQz0cAcShUjJ4PR9vsJIGmmYfV91T%2Bx4LiW7ehuS%2BGbE66jlw%3D%3D&crl=c"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/A-Diverse-Data-Augmentation-Strategy-for-Neural-Li-Li/0a3df1c62b79e3451e3418294e36ac2257f76968"
                            ],
                            "context_in_section": "Li et al. (2020b) use backtranslation and self-learning to generate augmented training data"
                        },
                        {
                            "title": "Data Augmentation for Low-Resource Neural Machine Translation",
                            "year": 2017,
                            "author": "Fadaee et al.",
                            "abstract": "The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, synthetically created contexts. Experimental results on simulated low-resource settings show that our method improves translation quality by up to 2.9 BLEU points over the baseline and up to 3.2 BLEU over back-translation.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P17-2090/",
                                "https://aclanthology.org/P17-2090.pdf",
                                "https://arxiv.org/abs/1705.00440",
                                "https://ui.adsabs.harvard.edu/abs/2017arXiv170500440F/abstract",
                                "https://www.researchgate.net/profile/Marzieh-Fadaee-2/publication/318739304_Data_Augmentation_for_Low-Resource_Neural_Machine_Translation/links/5b2a689fa6fdcc72db4d9bc4/Data-Augmentation-for-Low-Resource-Neural-Machine-Translation.pdf",
                                "https://staff.fnwi.uva.nl/c.monz/html/publications/P17-2090.pdf",
                                "https://www.narcis.nl/publication/RecordID/oai:dare.uva.nl:publications%2Fa4464493-3313-465f-bda4-ce8a99cbe79a",
                                "https://staff.science.uva.nl/c.monz/ltl/publications/P17-2090.pdf",
                                "https://openreview.net/forum?id=NpgsxImYzqN",
                                "https://research.rug.nl/files/136048568/P17_2090.pdf",
                                "https://www.aclweb.org/anthology/P17-2090.pdf",
                                "https://dare.uva.nl/personal/pure/en/publications/data-augmentation-for-lowresource-neural-machine-translation(a4464493-3313-465f-bda4-ce8a99cbe79a).html",
                                "https://staff.science.uva.nl/c.monz/html/publications/P17-2090.pdf",
                                "https://www.aclweb.org/anthology/P17-2090/",
                                "https://research.rug.nl/en/publications/data-augmentation-for-low-resource-neural-machine-translation",
                                "https://www.researchgate.net/profile/Marzieh-Fadaee-2/publication/316617635_Data_Augmentation_for_Low-Resource_Neural_Machine_Translation/links/5b45ed8aaca272dc3860671d/Data-Augmentation-for-Low-Resource-Neural-Machine-Translation.pdf",
                                "https://openreview.net/forum?id=H1Z63ol_ZH"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-Low-Resource-Neural-Machine-Fadaee-Bisazza/f2e15f3f2c3d4f7e8cf8445435e40d65d828ffd5"
                            ],
                            "context_in_section": " Inspired by work in CV, Fadaee et al. (2017) generate additional training examples that contain low-frequency (rare) words in synthetically created contexts."
                        },
                        {
                            "title": "CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP ",
                            "year": 2020,
                            "author": "Qin et al.",
                            "abstract": " Multi-lingual contextualized embeddings, such as multilingual-BERT (mBERT), have shown success in a variety of zero-shot cross-lingual tasks. However, these models are limited by having inconsistent contextualized representations of subwords across different languages. Existing work addresses this issue by bilingual projection and fine-tuning technique. We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT, which encourages model to align representations from source and multiple target languages once by mixing their context information. Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages. Experimental results on five tasks with 19 languages show that our method leads to significantly improved performances for all the tasks compared with mBERT. ",
                            "urls_google_scholar": [
                                "https://www.ijcai.org/proceedings/2020/533",
                                "https://arxiv.org/abs/2006.06402",
                                "https://static.aminer.cn/upload/pdf/265/1170/1990/5ee3527191e011cb3bff760a_0.pdf",
                                "https://www.ijcai.org/Proceedings/2020/0533.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2020arXiv200606402Q/abstract",
                                "http://ir.hit.edu.cn/~car/papers/ijcai20-lbqin.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/CoSDA-ML%3A-Multi-Lingual-Code-Switching-Data-for-NLP-Qin-Ni/24e972ed9da2c24d8a331a50f1ae4ae155cfd4ef",
                                "https://www.semanticscholar.org/paper/CoSDA-ML%3A-Multi-Lingual-Code-Switching-Data-for-NLP-Qin-Ni/a4d480db889781677c47ed9ebeabd5252ac11db6"
                            ],
                            "context_in_section": "Qin et al. (2020) present a DA framework to generate multi-lingual code-switching data to fine-tune multilingual-BERT. It encourages the alignment of representations from source and multiple target languages once by mixing their context information. They see improved performance across 5 tasks with 19 languages"
                        }
                    ]
                },
                {
                    "title":"Mitigating Bias",
                    "articles": [
                        {
                            "title": "Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods",
                            "year": 2018,
                            "author": "Zhao et al.",
                            "abstract": "In this paper, we introduce a new benchmark for co-reference resolution focused on gender bias, WinoBias. Our corpus contains Winograd-schema style sentences with entities corresponding to people referred by their occupation (e.g. the nurse, the doctor, the carpenter). We demonstrate that a rule-based, a feature-rich, and a neural coreference system all link gendered pronouns to pro-stereotypical entities with higher accuracy than anti-stereotypical entities, by an average difference of 21.1 in F1 score. Finally, we demonstrate a data-augmentation approach that, in combination with existing word-embedding debiasing techniques, removes the bias demonstrated by these systems in WinoBias without significantly affecting their performance on existing datasets.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/N18-2003/",
                                "https://aclanthology.org/N18-2003.pdf",
                                "https://arxiv.org/abs/1804.06876",
                                "http://markyatskar.com/publications/winobias.pdf",
                                "https://www.vicenteordonez.com/files/winobias.pdf",
                                "https://www.researchgate.net/profile/Kai-Wei-Chang-3/publication/324643739_Gender_Bias_in_Coreference_Resolution_Evaluation_and_Debiasing_Methods/links/5af7d951aca2720af9e1bd03/Gender-Bias-in-Coreference-Resolution-Evaluation-and-Debiasing-Methods.pdf",
                                "https://par.nsf.gov/biblio/10084252",
                                "https://www.aclweb.org/anthology/N18-2003.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180406876Z/abstract",
                                "https://openreview.net/forum?id=S1Z9SQbOWH",
                                "https://www.researchgate.net/profile/Kai-Wei-Chang-3/publication/325447193_Gender_Bias_in_Coreference_Resolution_Evaluation_and_Debiasing_Methods/links/5ee09d3d92851cf1386f5a1e/Gender-Bias-in-Coreference-Resolution-Evaluation-and-Debiasing-Methods.pdf",
                                "https://www.aclweb.org/anthology/N18-2003/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Gender-Bias-in-Coreference-Resolution%3A-Evaluation-Zhao-Wang/0be19fd9896e5d40222c690cc3ff553adc7c0e27"
                            ],
                            "context_in_section": "Zhao et al. (2018) attempt to mitigate gender bias in coreference resolution by creating an augmented dataset identical to the original but biased towards the underrepresented gender (using gender swapping of entities such as replacing \"he\" with \"she\") and train on the union of the two datasets"
                        },
                        {
                            "title": "Gender Bias in Neural Natural Language Processing",
                            "year": 2020,
                            "author": "Lu et al.",
                            "abstract": "We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark data sets finds significant gender bias in how models view occupations. We then mitigate bias with counterfactual data augmentation (CDA): a generic methodology for corpus augmentation via causal interventions that breaks associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that as training proceeds on the original data set with gradient descent the gender bias grows as the loss reduces, indicating that the optimization encourages bias; CDA mitigates this behavior.",
                            "urls_google_scholar": [
                                "https://link.springer.com/chapter/10.1007%2F978-3-030-62077-6_14",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180711714L/abstract",
                                "https://arxiv.org/abs/1807.11714"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Gender-Bias-in-Neural-Natural-Language-Processing-Lu-Mardziel/fef9d9eb2d527174ac5b329b0a044e98a1808971"
                            ],
                            "context_in_section": " Lu et al. (2020) formally propose COUNTERFACTUAL DA (CDA) for gender bias mitigation, which involves causal interventions that break associations between gendered and gender-neutral words."
                        },
                        {
                            "title": "Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology",
                            "year": 2019,
                            "author": "Zmigrod et al.",
                            "abstract": "Gender stereotypes are manifest in most of the world’s languages and are consequently propagated or amplified by NLP systems. Although research has focused on mitigating gender stereotypes in English, the approaches that are commonly employed produce ungrammatical sentences in morphologically rich languages. We present a novel approach for converting between masculine-inflected and feminine-inflected sentences in such languages. For Spanish and Hebrew, our approach achieves F1 scores of 82% and 73% at the level of tags and accuracies of 90% and 87% at the level of forms. By evaluating our approach using four different languages, we show that, on average, it reduces gender stereotyping by a factor of 2.5 without any sacrifice to grammaticality.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P19-1161/",
                                "https://aclanthology.org/P19-1161v2.pdf",
                                "https://arxiv.org/abs/1906.04571",
                                "https://www.aclweb.org/anthology/P19-1161.pdf",
                                "https://pdfs.semanticscholar.org/fbb5/5fe4ab84f0bf3c75ef80f272ed5dfa9e9fc5.pdf",
                                "http://174.138.37.75/P19-1161.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190604571Z/abstract",
                                "https://openreview.net/forum?id=pzlvp4ZbaJi",
                                "https://www.aclweb.org/anthology/P19-1161v2.pdf",
                                "http://174.138.37.75/attachments/P19-1161.Presentation.pdf",
                                "https://openreview.net/forum?id=5octptlkci_"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Counterfactual-Data-Augmentation-for-Mitigating-in-Zmigrod-Mielke/835ac3cbb41f2ec47718c5491211dd33b64f382b"
                            ],
                            "context_in_section": "Zmigrod et al. (2019) and Hall Maudslayet al. (2019) propose further improvements to counterfactual data augmentation (CDA)."
                        },
                        {
                            "title": "It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution",
                            "year": 2019,
                            "author": "Hall Maudslay et al.",
                            "abstract": "This paper treats gender bias latent in word embeddings. Previous mitigation attempts rely on the operationalisation of gender bias as a projection over a linear subspace. An alternative approach is Counterfactual Data Augmentation (CDA), in which a corpus is duplicated and augmented to remove bias, e.g. by swapping all inherently-gendered words in the copy. We perform an empirical comparison of these approaches on the English Gigaword and Wikipedia, and find that whilst both successfully reduce direct bias and perform well in tasks which quantify embedding quality, CDA variants outperform projection-based methods at the task of drawing non-biased gender analogies by an average of 19% across both corpora. We propose two improvements to CDA: Counterfactual Data Substitution (CDS), a variant of CDA in which potentially biased text is randomly substituted to avoid duplication, and the Names Intervention, a novel name-pairing technique that vastly increases the number of words being treated. CDA/S with the Names Intervention is the only approach which is able to mitigate indirect gender bias: following debiasing, previously biased words are significantly less clustered according to gender (cluster purity is reduced by 49%), thus improving on the state-of-the-art for bias mitigation.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D19-1530/",
                                "https://aclanthology.org/D19-1530v2.pdf",
                                "https://arxiv.org/abs/1909.00871",
                                "https://rycolab.io/papers/hall-maudslay+al.emnlp-ijcnlp19.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190900871H/abstract",
                                "https://www.repository.cam.ac.uk/bitstream/handle/1810/319595/Lipstick_on_a_Pig%20(1).pdf?sequence=3",
                                "https://www.aclweb.org/anthology/D19-1530/",
                                "https://www.aclweb.org/anthology/D19-1530v2.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/It%E2%80%99s-All-in-the-Name%3A-Mitigating-Gender-Bias-with-Maudslay-Gonen/6addee53ecfd3bb223383b0adf801f5a7e1022fc"
                            ],
                            "context_in_section": "Zmigrod et al. (2019) and Hall Maudslayet al. (2019) propose further improvements to counterfactual data augmentation (CDA)."
                        },
                        {
                            "title": "Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures",
                            "year": 2020,
                            "author": "Moosavi et al.",
                            "abstract": "     Existing NLP datasets contain various biases, and models tend to quickly learn those biases, which in turn limits their robustness. Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective so that models learn less from biased examples. Besides, they mostly focus on addressing a specific bias, and while they improve the performance on adversarial evaluation sets of the targeted bias, they may bias the model in other ways, and therefore, hurt the overall robustness. In this paper, we propose to augment the input sentences in the training data with their corresponding predicate-argument structures, which provide a higher-level abstraction over different realizations of the same meaning and help the model to recognize important parts of sentences. We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases. In addition, we show that models can still be vulnerable to the lexical overlap bias, even when the training data does not contain this bias, and that the sentence augmentation also improves the robustness in this scenario. We will release our adversarial datasets to evaluate bias in such a scenario as well as our augmentation scripts at this https URL. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2010.12510",
                                "https://arxiv.org/pdf/2010.12510",
                                "https://openreview.net/pdf?id=2AGZUDRsHg"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Improving-Robustness-by-Augmenting-Training-with-Moosavi-Boer/66d7ada13f319ca4a2d71a8219f06b99206a656d"
                            ],
                            "context_in_section": "Moosavi et al. (2020) augment training sentences with their corresponding predicate-argument structures, improving the robustness of transformer models against various types of biases."
                        }
                    ]
                },
                {
                    "title": "Fixing Class Imabalance",
                    "articles": [
                        {
                            "title": "SMOTE: Synthetic Minority Over-sampling Technique",
                            "year": 2002,
                            "author": "Chawla et al.",
                            "abstract": " An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of ``normal'' examples with only a small percentage of ``abnormal'' or ``interesting'' examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy. ",
                            "urls_google_scholar": [
                                "https://www.jair.org/index.php/jair/article/view/10302",
                                "https://www.researchgate.net/profile/Nitesh-Chawla/publication/220543125_SMOTE_Synthetic_Minority_Over-sampling_Technique/links/58a1e5a645851598babae482/SMOTE-Synthetic-Minority-Over-sampling-Technique.pdf",
                                "https://www.jair.org/index.php/jair/article/view/10302?amfpc=648259?amfpc=648259",
                                "http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a.pdf",
                                "https://www.academia.edu/download/31919505/1106.1813.pdf",
                                "http://www.scs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a.pdf",
                                "https://eva.fing.edu.uy/pluginfile.php/63458/mod_resource/content/1/smote_synthetic_minority_over_sampling_134999.pdf",
                                "https://www.csee.usf.edu/~lohall/papers/smote.pdf",
                                "https://www.sid.ir/en/journal/ViewPaper.aspx?ID=402175",
                                "https://arxiv.org/abs/1106.1813"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/SMOTE%3A-Synthetic-Minority-Over-sampling-Technique-Chawla-Bowyer/8cb44f06586f609a29d9b496cc752ec01475dffe"
                            ],
                            "context_in_section": "Fixing class imbalance typically involves a combination of undersampling and oversampling. SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) (Chawla et al., 2002), which generates augmented minority class examples through interpolation, still remains popular (Fernández et al., 2018"
                        },
                        {
                            "title": "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary",
                            "year": 2018,
                            "author": "Fernandez et al.",
                            "abstract": "The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered \"de facto\" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to different type of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several different domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also significantly contributed to new supervised learning paradigms, including multilabel classification, incremental learning, semi-supervised learning, multi-instance learning, among others. It is standard benchmark for learning from imbalanced data. It is also featured in a number of different software packages - from open source to commercial. In this paper, marking the fifteen year anniversary of SMOTE, we reflect on the SMOTE journey, discuss the current state of affairs with SMOTE, its applications, and also identify the next set of challenges to extend SMOTE for Big Data problems.",
                            "urls_google_scholar": [
                                "https://www.jair.org/index.php/jair/article/view/11192",
                                "https://sci2s.ugr.es/sites/default/files/ficherosPublicaciones/2431_live-5590-10577-jair.pdf",
                                "https://www3.nd.edu/~dial/publications/fernandez2018smote.pdf",
                                "http://150.214.190.154/sites/default/files/ficherosPublicaciones/2431_live-5590-10577-jair.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/SMOTE-for-Learning-from-Imbalanced-Data%3A-Progress-Fern%C3%A1ndez-Garc%C3%ADa/974df2ce665155cbdba5873378f3602ed1c4ce82"
                            ],
                            "context_in_section": "Fixing class imbalance typically involves a combination of undersampling and oversampling. SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE (SMOTE) (Chawla et al., 2002), which generates augmented minority class examples through interpolation, still remains popular (Fernández et al., 2018"
                        },
                        {
                            "title": "MLSMOTE: Approaching imbalanced multilabel learning through synthetic instance generation",
                            "year": 2015,
                            "author": "Charte et al.",
                            "abstract": "Learning from imbalanced data is a problem which arises in many real-world scenarios, so does the need to build classifiers able to predict more than one class label simultaneously (multilabel classification). Dealing with imbalance by means of resampling methods is an approach that has been deeply studied lately, primarily in the context of traditional (non-multilabel) classification. In this paper the process of synthetic instance generation for multilabel datasets (MLDs) is studied and MLSMOTE (Multilabel Synthetic Minority Over-sampling Technique), a new algorithm aimed to produce synthetic instances for imbalanced MLDs, is proposed. An extensive review on how imbalance in the multilabel context has been tackled in the past is provided, along with a thorough experimental study aimed to verify the benefits of the proposed algorithm. Several multilabel classification algorithms and other multilabel oversampling methods are considered, as well as ensemble-based algorithms for imbalanced multilabel classification. The empirical analysis shows that MLSMOTE is able to improve the classification results produced by existent proposals.",
                            "urls_google_scholar": [
                                "https://www.sciencedirect.com/science/article/abs/pii/S0950705115002737?via%3Dihub",
                                "https://simidat.ujaen.es/sites/default/files/biblio/2015-KBS-MLSMOTE.pdf",
                                "https://www.infona.pl/resource/bwmeta1.element.elsevier-617c00c1-3857-3b8a-89f2-cd2210880895",
                                "https://dl.acm.org/doi/abs/10.1016/j.knosys.2015.07.019"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/MLSMOTE%3A-Approaching-imbalanced-multilabel-learning-Charte-Rivera/823a88080b9178d34f52118a2d1037b467eb2662"
                            ],
                            "context_in_section": "MULTILABEL SMOTE (MLSMOTE) (Charte et al., 2015) modifies SMOTE to balance classes for multi-label classification, where classifiers predict more than one class at the same time."
                        },
                        {
                            "title": "EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks",
                            "year": 2019,
                            "author": "Wei and Zou",
                            "abstract": "We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1901.11196",
                                "https://openreview.net/forum?id=BJelsDvo84",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190111196W/abstract",
                                "https://www.aclweb.org/anthology/D19-1670/"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/162cad5df347bdac469331df540440b320b5aa21",
                                "https://www.semanticscholar.org/paper/EDA%3A-Easy-Data-Augmentation-Techniques-for-Boosting-Wei-Zou/162cad5df347bdac469331df540440b320b5aa21"
                            ],
                            "context_in_section": "Other techniques such as EDA (Wei and Zou, 2019) can possibly be used for oversampling as well."
                        }
                    ]
                },
                {
                    "title": "Few-Shot Learning",
                    "articles": [
                        {
                            "title": "Low-shot Visual Recognition by Shrinking and Hallucinating Features",
                            "year": 2017,
                            "author": "Hariharan and Girshick",
                            "abstract": "Low-shot visual learning---the ability to recognize novel object categories from very few examples---is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose a) representation regularization techniques, and b) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3x on the challenging ImageNet dataset. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/1606.02819",
                                "https://arxiv.org/pdf/1606.02819",
                                "http://openaccess.thecvf.com/content_iccv_2017/html/Hariharan_Low-Shot_Visual_Recognition_ICCV_2017_paper.html",
                                "https://ui.adsabs.harvard.edu/abs/2016arXiv160602819H/abstract",
                                "https://ieeexplore.ieee.org/abstract/document/8237590/",
                                "https://research.fb.com/wp-content/uploads/2017/09/1523.pdf",
                                "https://openreview.net/forum?id=S1ZKcZz_-B",
                                "https://onikle.com/articles/31207"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Low-Shot-Visual-Recognition-by-Shrinking-and-Hariharan-Girshick/7c7ab73469c25437332f5c1c1c5cb67c7b2f0855"
                            ],
                            "context_in_section": "DA methods can ease few-shot learning by adding more examples for novel classes introduced in the few-shot phase. Hariharan and Girshick (2017) use learned analogy transformations φ(z1, z2, x) between example pairs from a non-novel class z1 → z2 to generate augmented examples x → x′ for novel classes."
                        },
                        {
                            "title": "Δ-encoder: an effective sample synthesis method for few-shot object recognition",
                            "year": 2018,
                            "author": "Schwartz et al.",
                            "abstract": "Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Δ-encoder, that learns to synthesize new samples for an unseen category just by seeing few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or \"deltas\", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves the state-of-the-art of one-shot object-recognition and performs comparably in the few-shot case.",
                            "urls_google_scholar": [
                                "https://dl.acm.org/doi/abs/10.5555/3327144.3327208",
                                "https://arxiv.org/abs/1806.04734",
                                "https://openreview.net/forum?id=DBa9gVvdVJ0",
                                "https://onikle.com/articles/19111",
                                "https://scent-project.eu/wp-content/uploads/2019/10/Data-Encoder-An-Effective-Sample-Synthesis-Method-for-Few-Shot-Object-Recognition.pdf",
                                "http://papers.neurips.cc/paper/7549-delta-encoder-an-effective-sample-synthesis-method-for-few-shot-object-recognition.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180604734S/abstract",
                                "https://vista.cs.technion.ac.il/wp-content/uploads/2018/09/poster_SchKarShtHarMarKumFerGirBroNIPS18.pdf",
                                "https://pdfs.semanticscholar.org/6d55/daee74b83af740d7e4c04a404e84ec1f55d4.pdf",
                                "https://neurips.cc/media/Slides/nips/2018/220e(05-09-45)-05-09-50-12634-Delta-encoder:_.pdf",
                                "https://openreview.net/forum?id=HJZOtvZu-H"
                            ],
                            "urls_semantic_scholar":[
                                "https://pdfs.semanticscholar.org/6d55/daee74b83af740d7e4c04a404e84ec1f55d4.pdf",
                                "https://www.semanticscholar.org/paper/77e8e3e578b8abd17303f13af1df22080fd6afbb",
                                "https://www.semanticscholar.org/paper/Delta-encoder%3A-an-effective-sample-synthesis-method-Schwartz-Karlinsky/77e8e3e578b8abd17303f13af1df22080fd6afbb",
                                "https://www.semanticscholar.org/paper/Delta-encoder%3A-an-effective-sample-synthesis-method-Schwartz-Karlinsky/77e8e3e578b8abd17303f13af1df22080fd6afbb/figure/1"
                            ],
                            "context_in_section": "Schwartz et al. (2018) generalize this to beyond just linear offsets, through their \"∆-network\" autoencoder which learns the distribution P (z2|z1, C) from all y∗z1 = y∗z2 = C pairs, where C is a class and y is the ground-truth labelling function. Both these methods are applied only on image tasks, but their theoretical formulations are generally applicable, and hence we discuss them."
                        },
                        {
                            "title": "A Closer Look At Feature Space Data Augmentation For Few-Shot Intent Classification",
                            "year": 2019,
                            "author": "Kumar et al.",
                            "abstract": "New conversation topics and functionalities are constantly being added to conversational AI agents like Amazon Alexa and Apple Siri. As data collection and annotation is not scalable and is often costly, only a handful of examples for the new functionalities are available, which results in poor generalization performance. We formulate it as a Few-Shot Integration (FSI) problem where a few examples are used to introduce a new intent. In this paper, we study six feature space data augmentation methods to improve classification performance in FSI setting in combination with both supervised and unsupervised representation learning methods such as BERT. Through realistic experiments on two public conversational datasets, SNIPS, and the Facebook Dialog corpus, we show that data augmentation in feature space provides an effective way to improve intent classification performance in few-shot setting beyond traditional transfer learning approaches. In particular, we show that (a) upsampling in latent space is a competitive baseline for feature space augmentation (b) adding the difference between two examples to a new example is a simple yet effective data augmentation method.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D19-6101/",
                                "https://aclanthology.org/D19-6101.pdf",
                                "https://arxiv.org/abs/1910.04176",
                                "https://assets.amazon.science/55/08/b2013f8d4e55b2680becb636b7fc/a-closer-look-at-latent-space-data-augmentation-for-few-shot-intent-classification.pdf",
                                "https://openreview.net/forum?id=01u_BUZHsFR",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv191004176K/abstract",
                                "https://www.researchgate.net/profile/Varun-Kumar-46/publication/336370365_A_Closer_Look_At_Feature_Space_Data_Augmentation_For_Few-Shot_Intent_Classification/links/5d9e2dd792851cce3c90f5c1/A-Closer-Look-At-Feature-Space-Data-Augmentation-For-Few-Shot-Intent-Classification.pdf",
                                "https://openreview.net/forum?id=6sj4miqzUZ",
                                "https://www.aclweb.org/anthology/D19-61.pdf#page=15"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/A-Closer-Look-At-Feature-Space-Data-Augmentation-Kumar-Glaude/a165c952799cb1198ab8dd2bd8dbdf5bc846356f"
                            ],
                            "context_in_section": "Kumar et al. (2019b) apply these and other DA methods for few-shot learning of novel intent classes in task-oriented dialog."
                        },
                        {
                            "title": "Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning",
                            "year": 2021,
                            "author": "Wei et al.",
                            "abstract": " Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category. This paper explores data augmentation -- a technique particularly suitable for training with limited data -- for this few-shot, highly-multiclass text classification setting. On four diverse text classification tasks, we find that common data augmentation techniques can improve the performance of triplet networks by up to 3.0% on average. To further boost performance, we present a simple training strategy called curriculum data augmentation, which leverages curriculum learning by first training on only original examples and then introducing augmented data as training progresses. We explore a two-stage and a gradual schedule, and find that, compared with standard single-stage training, curriculum data augmentation trains faster, improves performance, and remains robust to high amounts of noising from augmentation. ",
                            "urls_google_scholar": [
                                "https://arxiv.org/pdf/2103.07552",
                                "https://arxiv.org/abs/2103.07552",
                                "https://www.aclweb.org/anthology/2021.naacl-main.434/",
                                "https://aclanthology.org/2021.naacl-main.434.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2021arXiv210307552W/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Few-Shot-Text-Classification-with-Triplet-Networks%2C-Wei-Huang/8d26d4f850acb8cf4f17345ad1dee67ac318a3a1"
                            ],
                            "context_in_section": "Wei et al. (2021a) show that data augmentation facilitates curriculum learning for training triplet networks for few-shot text classification."
                        },
                        {
                            "title": "Neural Data Augmentation via Example Extrapolation",
                            "year": 2021,
                            "author": "Lee et al.",
                            "abstract": "In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such \"few-shot\" cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled from some distribution, Ex2 synthesizes new examples that also belong to the same distribution. The Ex2 model is learned by simulating the example generation procedure on data-rich slices of the data, and it is applied to underrepresented, few-shot slices. We apply Ex2 to a range of language understanding tasks and significantly improve over state-of-the-art methods on multiple few-shot learning benchmarks, including for relation extraction (FewRel) and intent classification + slot filling (SNIPS). ",
                            "urls_google_scholar": [
                                "https://arxiv.org/abs/2102.01335",
                                "https://arxiv.org/pdf/2102.01335",
                                "https://ui.adsabs.harvard.edu/abs/2021arXiv210201335L/abstract",
                                "http://arxiv-download.xixiaoyao.cn/pdf/2102.01335.pdf",
                                "https://arxiv-download.xixiaoyao.cn/pdf/2102.01335.pdf"
                            ],
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Neural-Data-Augmentation-via-Example-Extrapolation-Lee-Guu/10b15a695f837fbdc2babe0c38f8702c10af7bfb"
                            ],
                            "context_in_section": "Lee et al. (2021) use T5 to generate additional examples for data-scarce classes."
                        }
                    ]
                },
                {
                    "title":"Adversarial Examples (AVEs)",
                    "articles": [
                        {
                            "title": "Certified Robustness to Adversarial Word Substitutions",
                            "year": 2019,
                            "author": "Jia et al.",
                            "abstract": "State-of-the-art NLP models can often be fooled by adversaries that apply seemingly innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of possible transformations scales exponentially with text length, so data augmentation cannot cover all transformations of an input. This paper considers one exponentially large family of label-preserving transformations, in which every word in the input can be replaced with a similar word. We train the first models that are provably robust to all word substitutions in this family. Our training procedure uses Interval Bound Propagation (IBP) to minimize an upper bound on the worst-case loss that any combination of word substitutions can induce. To evaluate models’ robustness to these transformations, we measure accuracy on adversarially chosen word substitutions applied to test examples. Our IBP-trained models attain 75% adversarial accuracy on both sentiment analysis on IMDB and natural language inference on SNLI; in comparison, on IMDB, models trained normally and ones trained with data augmentation achieve adversarial accuracy of only 12% and 41%, respectively.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/D19-1423/",
                                "https://aclanthology.org/D19-1423.pdf",
                                "https://openreview.net/forum?id=_q30R2bhFk",
                                "https://arxiv.org/abs/1909.00986",
                                "https://171.64.67.140/pubs/jia2019certified.pdf",
                                "https://nlp.stanford.edu/pubs/jia2019certified.pdf",
                                "https://www-nlp.stanford.edu/pubs/jia2019certified.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190900986J/abstract"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/Certified-Robustness-to-Adversarial-Word-Jia-Raghunathan/4690190d6c110f7525f7250e1acf4a4eab42519f"
                            ],
                            "context_in_section": "Adversarial examples can be generated using innocuous label-preserving transformations (e.g. paraphrasing) that fool state-of-the-art NLP models, as shown in Jia et al. (2019). Specifically, they add sentences with distractor spans to passages to construct AVEs for span-based QA. "
                        },
                        {
                            "title": "PAWS: Paraphrase Adversaries from Word Scrambling",
                            "year": 2019,
                            "author": "Zhang et al.",
                            "abstract": "Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like flights from New York to Florida and flights from Florida to New York. This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/N19-1131/",
                                "https://aclanthology.org/N19-1131.pdf",
                                "https://arxiv.org/abs/1904.01130",
                                "https://ui.adsabs.harvard.edu/abs/2019arXiv190401130Z/abstract",
                                "https://www.aclweb.org/anthology/N19-1131.pdf",
                                "https://openreview.net/forum?id=H1-AAGWO-B",
                                "https://deepai.org/publication/paws-paraphrase-adversaries-from-word-scrambling",
                                "https://www.aclweb.org/anthology/N19-1131/"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/PAWS%3A-Paraphrase-Adversaries-from-Word-Scrambling-Zhang-Baldridge/fc09d6486be1c9bbfbef4165ce3c1ab664e5d084"
                            ],
                            "context_in_section": "Zhang et al. (2019d) construct AVEs for paraphrase detection using word swapping. "
                        },
                        {
                            "title": "AdvEntuRe: Adversarial Training for Textual Entailment with Knowledge-Guided Examples",
                            "year": 2018,
                            "author": "Kang et al.",
                            "abstract": "We consider the problem of learning textual entailment models with limited supervision (5K-10K training examples), and present two complementary approaches for it. First, we propose knowledge-guided adversarial example generators for incorporating large lexical resources in entailment models via only a handful of rule templates. Second, to make the entailment model—a discriminator—more robust, we propose the first GAN-style approach for training it using a natural language example generator that iteratively adjusts to the discriminator’s weaknesses. We demonstrate effectiveness using two entailment datasets, where the proposed methods increase accuracy by 4.7% on SciTail and by 2.8% on a 1% sub-sample of SNLI. Notably, even a single hand-written rule, negate, improves the accuracy of negation examples in SNLI by 6.1%.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P18-1225/",
                                "https://aclanthology.org/P18-1225.pdf",
                                "https://arxiv.org/abs/1805.04680",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180504680K/abstract",
                                "https://www.aclweb.org/anthology/P18-1225.pdf",
                                "http://www.cs.cmu.edu/~dongyeok/papers/acl18kang_adventure.pdf",
                                "https://openreview.net/forum?id=ry-_0ogO-H",
                                "https://www.aclweb.org/anthology/P18-1225/",
                                "https://www.cs.cmu.edu/~./hovy/papers/18ACL-adversarial-for-entailment.pdf"
                            ],
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/AdvEntuRe%3A-Adversarial-Training-for-Textual-with-Kang-Khot/e3e8a2276696a1e6ee62ddb213a38ad022dcb02e"
                            ],
                            "context_in_section": "Kang et al. (2018) and Glockner et al. (2018) create AVEs for textual entailment using WordNet relations."
                        },
                        {
                            "title": "Breaking NLI Systems with Sentences that Require Simple Lexical Inferences",
                            "year": 2018,
                            "author": "Glockner et al.",
                            "abstract": "We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. The new examples are simpler than the SNLI test set, containing sentences that differ by at most one word from sentences in the training set. Yet, the performance on the new test set is substantially worse across systems trained on SNLI, demonstrating that these systems are limited in their generalization ability, failing to capture many simple inferences.",
                            "urls_google_scholar": [
                                "https://aclanthology.org/P18-2103/",
                                "https://aclanthology.org/P18-2103.pdf",
                                "https://deepai.org/publication/breaking-nli-systems-with-sentences-that-require-simple-lexical-inferences",
                                "https://openreview.net/forum?id=rk-7Mned-H",
                                "https://www.aclweb.org/anthology/P18-2103.pdf",
                                "https://ui.adsabs.harvard.edu/abs/2018arXiv180502266G/abstract",
                                "https://www.aclweb.org/anthology/P18-2103/"
                            ],
                            "urls_semantic_scholar":["https://www.semanticscholar.org/paper/Breaking-NLI-Systems-with-Sentences-that-Require-Glockner-Shwartz/413a03a146e6f7b16c11e73243d83e6f1a6627a3"],
                            "context_in_section": "Kang et al. (2018) and Glockner et al. (2018) create AVEs for textual entailment using WordNet relations."
                        }
                    ]
                }
            ]
        },
        {
            "title":"Tasks",
            "subsections": [
                {
                    "title": "Summarization",
                    "articles": [
                        {
                            "title": "Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation",
                            "year": 2020,
                            "author": "FAbbri et al.",
                            "abstract": "     Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks. However, these models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains. In this work, we introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner. WikiTransfer fine-tunes pretrained models on pseudo-summaries, produced from generic Wikipedia data, which contain characteristics of the target dataset, such as the length and level of abstraction of the desired summaries. WikiTransfer models achieve state-of-the-art, zero-shot abstractive summarization performance on the CNN-DailyMail dataset and demonstrate the effectiveness of our approach on three additional diverse datasets. These models are more robust to noisy data and also achieve better or comparable few-shot performance using 10 and 100 training examples when compared to few-shot transfer from other summarization datasets. To further boost performance, we employ data augmentation via round-trip translation as well as introduce a regularization term for improved few-shot transfer. To understand the role of dataset aspects in transfer performance and the quality of the resulting output summaries, we further study the effect of the components of our unsupervised fine-tuning data and analyze few-shot performance using both automatic and human evaluation. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Improving-Zero-and-Few-Shot-Abstractive-with-and-Fabbri-Han/7ba0dc20800195c6350995695c8bf86be6227c49"
                            ],
                            "context_in_section": "Fabbri et al. (2020) investigate backtranslation as a DA method for few-shot abstractive summarization with the use of a consistency loss inspired by UDA."
                        },
                        {
                            "title": "Abstract Text Summarization: A Low Resource Challenge",
                            "year": 2019,
                            "author": "Parida and Motlicek",
                            "abstract": "Text summarization is considered as a challenging task in the NLP community. The availability of datasets for the task of multilingual text summarization is rare, and such datasets are difficult to construct. In this work, we build an abstract text summarizer for the German language text using the state-of-the-art “Transformer” model. We propose an iterative data augmentation approach which uses synthetic data along with the real summarization data for the German language. To generate synthetic data, the Common Crawl (German) dataset is exploited, which covers different domains. The synthetic data is effective for the low resource condition and is particularly helpful for our multilingual scenario where availability of summarizing data is still a challenging issue. The data are also useful in deep learning scenarios where the neural models require a large amount of training data for utilization of its capacity. The obtained summarization performance is measured in terms of ROUGE and BLEU score. We achieve an absolute improvement of +1.5 and +16.0 in ROUGE1 F1 (R1_F1) on the development and test sets, respectively, compared to the system which does not rely on data augmentation.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Abstract-Text-Summarization%3A-A-Low-Resource-Parida-Motl%C3%ADcek/b69932bd355beb359557d9f988dfb84cc361e605"
                            ],
                            "context_in_section": "Parida and Motlicek (2019) propose an iterative DA approach for abstractive summarization that uses a mix of synthetic and real data, where the former is generated from Common Crawl."
                        },
                        {
                            "title": "Transforming Wikipedia into Augmented Data for Query-Focused Summarization",
                            "year": 2019,
                            "author": "Zhu et al.",
                            "abstract": "     The manual construction of a query-focused summarization corpus is costly and timeconsuming. The limited size of existing datasets renders training data-driven summarization models challenging. In this paper, we use Wikipedia to automatically collect a large query-focused summarization dataset (named as WIKIREF) of more than 280,000 examples, which can serve as a means of data augmentation. Moreover, we develop a query-focused summarization model based on BERT to extract summaries from the documents. Experimental results on three DUC benchmarks show that the model pre-trained on WIKIREF has already achieved reasonable performance. After fine-tuning on the specific datasets, the model with data augmentation outperforms the state of the art on the benchmarks. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Transforming-Wikipedia-into-Augmented-Data-for-Zhu-Dong/dd0e2a5249cced2b6de2934a2826889a2d80d839"
                            ],
                            "context_in_section": "Zhu et al. (2019) introduce a query-focused summarization (Dang, 2005) dataset collected using Wikipedia called WIKIREF which can be used for DA. "
                        },
                        {
                            "title": "Data Augmentation for Abstractive Query-Focused Multi-Document Summarization",
                            "year": 2021,
                            "author": "Pasunuru et al.",
                            "abstract": "     The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient largescale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-Abstractive-Query-Focused-Pasunuru-Celikyilmaz/161321ef451d658d66b762cba5c202b12260220e"
                            ],
                            "context_in_section": "Pasunuru et al. (2021) use DA methods to construct two training datasets for Query-focused Multi-Document Summarization (QMDS) called QMDSCNN and QMD-SIR by modifying CNN/DM (Hermann et al., 2015) and mining search-query logs, respectively."
                        }
                    ]
                },
                {
                    "title":"Question Answering (QA)",
                    "articles": [
                        {
                            "title": "An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering",
                            "year": 2019,
                            "author": "Longpre et al.",
                            "abstract": "To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used for datasets that include unanswerable questions, such as SQuAD 2.0. When applied in conjunction with per-domain sampling, our XLNet (Yang et al., 2019)-based submission achieved the second best Exact Match and F1 in the MRQA leaderboard competition.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/An-Exploration-of-Data-Augmentation-and-Sampling-Longpre-Lu/22e510c1f0fffed225c49dc5e5f57a9d80f0d61f"
                            ],
                            "context_in_section": "Longpre et al. (2019) investigate various DA and sampling techniques for domain-agnostic QA including paraphrasing by backtranslation. "
                        },
                        {
                            "title": "Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering",
                            "year": 2019,
                            "author": "Yang et al.",
                            "abstract": "     Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset. In this paper, we present a data augmentation technique using distant supervision that exploits positive as well as negative examples. We apply a stage-wise approach to fine tuning BERT on multiple datasets, starting with data that is \"furthest\" from the test data and ending with the \"closest\". Experimental results show large gains in effectiveness over previous approaches on English QA datasets, and we establish new baselines on two recent Chinese QA datasets. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-BERT-Fine-Tuning-in-Question-Yang-Xie/f5eaf727b80240a13e9f631211c9ecec7e3b9feb"
                            ],
                            "context_in_section": "Yang et al. (2019) propose a DA method using distant supervision to improve BERT finetuning for opendomain QA."
                        },
                        {
                            "title": "Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering",
                            "year": 2021,
                            "author": "Riabi et al.",
                            "abstract": "Coupled with the availability of large scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performances of state-of-the-art multilingual models are significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one desires to support. We propose a method to improve the Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method allows to significantly outperform the baselines trained on English data only. We report a new state-of-the-art on four multilingual datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr). ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Synthetic-Data-Augmentation-for-Zero-Shot-Question-Riabi-Scialom/486dbc28a7c6ccbfb1cbaee4222d974eda0beb38"
                            ],
                            "context_in_section": "Riabi et al. (2020) leverage Question Generation models to produce augmented examples for zero-shot cross-lingual QA."
                        },
                        {
                            "title": "XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering",
                            "year": 2019,
                            "author": "Singh et al.",
                            "abstract": "While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA enhances performance of all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to 4.8%, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language. On the SQuAD question answering task, we see that XLDA provides a 1.0% performance increase on the English evaluation set. Comprehensive experiments suggest that most languages are effective as cross-lingual augmentors, that XLDA is robust to a wide range of translation quality, and that XLDA is even more effective for randomly initialized models than for pretrained models. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/XLDA%3A-Cross-Lingual-Data-Augmentation-for-Natural-Singh-McCann/c846cbb24866af99a8d02d4c73aa4d7dd1831538"
                            ],
                            "context_in_section": "Singh et al. (2019) propose XLDA, or CROSS-LINGUAL DA, which substitutes a portion of the input text with its translation in another language, improving performance across multiple languages on NLI tasks including the SQuAD QA task."
                        },
                        {
                            "title": "Logic-Guided Data Augmentation and Regularization for Consistent Question Answering",
                            "year": 2020,
                            "author": "Asai and Hajishirzi",
                            "abstract": "Many natural language questions require qualitative, quantitative or logical comparisons between two entities or events. This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by integrating logic rules and neural models. Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model. Improving the global consistency of predictions, our approach achieves large improvements over previous methods in a variety of question answering (QA) tasks, including multiple-choice qualitative reasoning, cause-effect reasoning, and extractive machine reading comprehension. In particular, our method significantly improves the performance of RoBERTa-based models by 1-5% across datasets. We advance state of the art by around 5-8% on WIQA and QuaRel and reduce consistency violations by 58% on HotpotQA. We further demonstrate that our approach can learn effectively from limited data.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Logic-Guided-Data-Augmentation-and-Regularization-Asai-Hajishirzi/a390f26af171e784c15329847dd4a5e9806e15fa"
                            ],
                            "context_in_section": "Asai and Hajishirzi (2020) use logical and linguistic knowledge to generate additional training data to improve the accuracy and consistency of QA responses by models"
                        },
                        {
                            "title": "QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension ",
                            "year": 2018,
                            "author": "Yu et al.",
                            "abstract": "Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature of RNNs. We propose a new Q&A architecture called QANet, which does not require recurrent networks:  Its encoder consists exclusively of convolution and self-attention, where convolution models local interactions and self-attention models global interactions. On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models. The speed-up gain allows us to train the model with much more data. We hence combine our model with data generated by backtranslation from a neural machine translation model. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/QANet%3A-Combining-Local-Convolution-with-Global-for-Yu-Dohan/8c1b00128e74f1cd92aede3959690615695d5101"
                            ],
                            "context_in_section": "Yu et al. (2018) introduce a new QA architecture called QANet that shows improved performance on SQuAD when combined with augmented data generated using backtranslation."
                        }
                    ]
                },
                {
                    "title":"Sequemce Tagging Tasks",
                    "articles": [
                        {
                            "title": "DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks",
                            "year": 2020,
                            "author": "Ding et al.",
                            "abstract": "Data augmentation techniques have been widely used to improve machine learning performance as they facilitate generalization. In this work, we propose a novel augmentation method to generate high quality synthetic data for low-resource tagging tasks with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised settings, we conduct extensive experiments on named entity recognition (NER), part of speech (POS) tagging and end-to-end target based sentiment analysis (E2E-TBSA) tasks. For the semi-supervised settings, we evaluate our method on the NER task under the conditions of given unlabeled data only and unlabeled data plus a knowledge base. The results show that our method can consistently outperform the baselines, particularly when the given gold training data are less.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/DAGA%3A-Data-Augmentation-with-a-Generation-Approach-Ding-Liu/7582a3ce4f28d4054f538aba5edd927098334336"
                            ],
                            "context_in_section": "Ding et al. (2020) propose DAGA, a two-step DA process. First, a language model over sequences of tags and words linearized as per a certain scheme is learned. Second, sequences are sampled from this language model and de-linearized to generate new examples"
                        },
                        {
                            "title": "Data Augmentation via Dependency Tree Morphing for Low-Resource Languages",
                            "year": 2018,
                            "author": "Şahin and Steedman",
                            "abstract": "Neural NLP systems achieve high scores in the presence of sizable training dataset. Lack of such datasets leads to poor system performances in the case low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired from image processing. We “crop” sentences by removing dependency links, and we “rotate” sentences by moving the tree fragments around the root. We apply these techniques to augment the training sets of low-resource languages in Universal Dependencies project. We implement a character-level sequence tagging model and evaluate the augmented datasets on part-of-speech tagging task. We show that crop and rotate provides improvements over the models trained with non-augmented data for majority of the languages, especially for languages with rich case marking systems.",
                            "urls_semantic_scholar":[
                                "https://www.semanticscholar.org/paper/7a75dcef18df48157743226574f85ca4dd0f110b",
                                "https://www.semanticscholar.org/paper/Data-Augmentation-via-Dependency-Tree-Morphing-for-Sahin-Steedman/7a75dcef18df48157743226574f85ca4dd0f110b/figure/0",
                                "https://www.semanticscholar.org/paper/Data-Augmentation-via-Dependency-Tree-Morphing-for-Sahin-Steedman/7a75dcef18df48157743226574f85ca4dd0f110b"
                            ],
                            "context_in_section": "Sahin and Steedman (2018), discussed in §3.1, use dependency tree morphing (Figure 2) to generate additional training examples on the downstream task of part-of-speech (POS) tagging."
                        },
                        {
                            "title": "An Analysis of Simple Data Augmentation for Named Entity Recognition",
                            "year": 2020,
                            "author": "Dai et Adel",
                            "abstract": "Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/An-Analysis-of-Simple-Data-Augmentation-for-Named-Dai-Adel/bdbb944a84b8cdec8d120d2d2535995e335d0174"
                            ],
                            "context_in_section": "Dai and Adel (2020) modify DA techniques proposed for sentence-level tasks for named entity recognition (NER), including label-wise token and synonym replacement, and show improved performance using both recurrent and transformer models"
                        },
                        {
                            "title": "SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup",
                            "year": 2020,
                            "author": "Zhang et al.",
                            "abstract": "Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each iteration, which is an inefficient way of leveraging human annotations. We propose a simple but effective data augmentation method to improve label efficiency of active sequence labeling. Our method, SeqMix, simply augments the queried samples by generating extra labeled sequences in each iteration. The key difficulty is to generate plausible sequences along with token-level labels. In SeqMix, we address this challenge by performing mixup for both sequences and token-level labels of the queried samples. Furthermore, we design a discriminator during sequence mixup, which judges whether the generated sequences are plausible or not. Our experiments on Named Entity Recognition and Event Detection tasks show that SeqMix can improve the standard active sequence labeling method by 2.27%–3.75% in terms of F1 scores. The code and data for SeqMix can be found at https://github.com/rz-zhang/SeqMix.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/SeqMix%3A-Augmenting-Active-Sequence-Labeling-via-Zhang-Yu/3bb1e24eb3429f807397833105d1e137d9927767"
                            ],
                            "context_in_section": "Zhang et al. (2020) propose a DA method based on MIXUP called SEQMIX for active sequence labeling by augmenting queried samples, showing improvements on NER and Event Detection."
                        }
                    ]
                },
                {
                    "title":"Parsing Tasks",
                    "articles": [
                        {
                            "title": "Data Recombination for Neural Semantic Parsing",
                            "year": 2016,
                            "author": "Jia and Liang",
                            "abstract": "Modeling crisp logical regularities is crucial in semantic parsing, making it difficult for neural models with no task-specific prior knowledge to achieve good results. In this paper, we introduce data recombination, a novel framework for injecting such prior knowledge into a model. From the training data, we induce a highprecision synchronous context-free grammar, which captures important conditional independence properties commonly found in semantic parsing. We then train a sequence-to-sequence recurrent network (RNN) model with a novel attention-based copying mechanism on datapoints sampled from this grammar, thereby teaching the model about these structural properties. Data recombination improves the accuracy of our RNN model on three semantic parsing datasets, leading to new state-of-the-art performance on the standard GeoQuery dataset for models with comparable supervision.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Recombination-for-Neural-Semantic-Parsing-Jia-Liang/b7eac64a8410976759445cce235469163d23ee65"
                            ],
                            "context_in_section": "Jia and Liang (2016) propose DATA RECOMBINATION for injecting task-specific priors to neural semantic parsers. A synchronous context-free grammar (SCFG) is induced from training data, and new \"recombinant\" examples are sampled."
                        },
                        {
                            "title": "GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing",
                            "year": 2020,
                            "author": "Yu et al.",
                            "abstract": "     We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/GraPPa%3A-Grammar-Augmented-Pre-Training-for-Table-Yu-Wu/eedf45f62dea0eaef5643c42c84f7cc7b80ee782"
                            ],
                            "context_in_section": "Yu et al. (2020) introduce GRAPPA, a pretraining approach for table semantic parsing, and generate synthetic question-SQL pairs via an SCFG. "
                        },
                        {
                            "title": "Good-Enough Compositional Data Augmentation",
                            "year": 2020,
                            "author": "Andreas",
                            "abstract": "We propose a simple data augmentation protocol aimed at providing a compositional inductive bias in conditional and unconditional sequence models. Under this protocol, synthetic training examples are constructed by taking real training examples and replacing (possibly discontinuous) fragments with other fragments that appear in at least one similar environment. The protocol is model-agnostic and useful for a variety of tasks. Applied to neural sequence-to-sequence models, it reduces error rate by as much as 87% on diagnostic tasks from the SCAN dataset and 16% on a semantic parsing task. Applied to n-gram language models, it reduces perplexity by roughly 1% on small corpora in several languages.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Good-Enough-Compositional-Data-Augmentation-Andreas/d4ae9dff186553d98eef4a275762b4cb15e1e41d"
                            ],
                            "context_in_section": "Andreas (2020) use compositionality to construct synthetic examples for downstream tasks like semantic parsing. Fragments of original examples are replaced with fragments from other examples in similar contexts."
                        },
                        {
                            "title": "A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages",
                            "year": 2019,
                            "author": "Vania et al.",
                            "abstract": "Parsers are available for only a handful of the world’s languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages—North Sámi, Galician, and Kazah—We find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic spaces is also very helpful.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/A-systematic-comparison-of-methods-for-low-resource-Vania-Kementchedjhieva/55ace73cb8f7c3c4bd6d8ead45c0ba6193d1afda"
                            ],
                            "context_in_section": "Vania et al. (2019) investigate DA for low-resource dependency parsing including dependency tree morphing from  ̧Sahin and Steedman (2018) (Figure 2) and modified nonce sentence generation from Gulordava et al. (2018), which replaces content words with other words of the same POS, morphological features, and dependency labels."
                        },
                        {
                            "title": "Colorless Green Recurrent Networks Dream Hierarchically",
                            "year": 2018,
                            "author": "Gulordava et al.",
                            "abstract": "Recurrent neural networks (RNNs) achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues (“The colorless green ideas I ate with the chair sleep furiously”), and, for Italian, we compare model performance to human intuitions. Our language-model-trained RNNs make reliable predictions about long-distance agreement, and do not lag much behind human performance. We thus bring support to the hypothesis that RNNs are not just shallow-pattern extractors, but they also acquire deeper grammatical competence.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Colorless-green-recurrent-networks-dream-Gulordava-Bojanowski/3d42ddf7c5ce59ae04d1d27085be9f736d1be04b",
                                "https://www.semanticscholar.org/paper/Colorless-green-recurrent-networks-dream-Gulordava-Bojanowski/17b362808f275788a4c41fa1c0a38d67d3c6447b"
                            ],
                            "context_in_section": "Gulordava et al. (2018) replaces content words with other words of the same POS, morphological features, and dependency labels."
                        }
                    ]
                },
                {
                    "title": "Grammatical Error Correction (GEC)",
                    "articles": [
                        {
                            "title": "Using Wikipedia Edits in Low Resource Grammatical Error Correction",
                            "year": 2018,
                            "author": "Boyd",
                            "abstract": "We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) for German and use it to analyze both gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014) in order to select as additional training data Wikipedia edits containing grammatical corrections similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Using-Wikipedia-Edits-in-Low-Resource-Grammatical-Boyd/b443df8079c7464af65782f0d2495cd7ef3d4d7b"
                            ],
                            "context_in_section": "There is work that makes use of additional resources. Boyd (2018) use German edits from Wikipedia revision history and use those relating to GEC as augmented training data."
                        },
                        {
                            "title": "Sequence-to-sequence Pre-training with Data Augmentation for Sentence Rewriting",
                            "year": 2019,
                            "author": "Zhang et al.",
                            "abstract": "We study sequence-to-sequence (seq2seq) pre-training with data augmentation for sentence rewriting. Instead of training a seq2seq model with gold training data and augmented data simultaneously, we separate them to train in different phases: pre-training with the augmented data and fine-tuning with the gold data. We also introduce multiple data augmentation methods to help model pre-training for sentence rewriting. We evaluate our approach in two typical well-defined sentence rewriting tasks: Grammatical Error Correction (GEC) and Formality Style Transfer (FST). Experiments demonstrate our approach can better utilize augmented data without hurting the model's trust in gold data and further improve the model's performance with our proposed data augmentation methods. \n Our approach substantially advances the state-of-the-art results in well-recognized sentence rewriting benchmarks over both GEC and FST. Specifically, it pushes the CoNLL-2014 benchmark's F0.5 score and JFLEG Test GLEU score to 62.61 and 63.54 in the restricted training setting, 66.77 and 65.22 respectively in the unrestricted setting, and advances GYAFC benchmark's BLEU to 74.24 (2.23 absolute improvement) in E&M domain and 77.97 (2.64 absolute improvement) in F&R domain. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Sequence-to-sequence-Pre-training-with-Data-for-Zhang-Ge/862a627e8e87c4a884353f47a85aea466203916d"
                            ],
                            "context_in_section": "Zhang et al. (2019b) explore multi-task transfer, or the use of annotated data from other tasks."
                        },
                        {
                            "title": "Controllable Data Synthesis Method for Grammatical Error Correction",
                            "year": 2019,
                            "author": "Yang et al.",
                            "abstract": "Due to the lack of parallel data in current Grammatical Error Correction (GEC) task, models based on Sequence to Sequence framework cannot be adequately trained to obtain higher performance. We propose two data synthesis methods which can control the error rate and the ratio of error types on synthetic data. The first approach is to corrupt each word in the monolingual corpus with a fixed probability, including replacement, insertion and deletion. Another approach is to train error generation models and further filtering the decoding results of the models. The experiments on different synthetic data show that the error rate is 40% and the ratio of error types is the same can improve the model performance better. Finally, we synthesize about 100 million data and achieve comparable performance as the state of the art, which uses twice as much data as we use. ",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Controllable-Data-Synthesis-Method-for-Grammatical-Wang-Yang/5c498521e2cfd68e372b41870bac4d9390463352"
                            ],
                            "context_in_section": "There is also work that adds synthetic errors to noise the text. Wang et al. (2019a) investigate two approaches: token-level perturbations and training error generation models with a filtering strategy to keep generations with sufficient errors"
                        },
                        {
                            "title": "Neural grammatical error correction systems with unsupervised pre-training on synthetic data.",
                            "year": 2019,
                            "author": "Grundkiewicz et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Neural-Grammatical-Error-Correction-Systems-with-on-Grundkiewicz-Junczys-Dowmunt/7cc6f009feb5ad5ad0e1ff00c551fb318fc95016",
                                ""
                            ],
                            "context_in_section": " Grundkiewicz et al. (2019) use confusion sets generated by a spellchecker for noising."
                        },
                        {
                            "title": "A neural grammatical error correction system built on better pre-training and sequential transfer learning.",
                            "year": 2019,
                            "author": "Choe et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/A-Neural-Grammatical-Error-Correction-System-Built-Choe-Ham/2704b207c7b8f6fc32bd3d04690b2f4f745c460f"
                            ],
                            "context_in_section": "Choe et al. (2019) learn error patterns from small annotated samples along with POS-specific noising."
                        },
                        {
                            "title": "Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation",
                            "year": 2020,
                            "author": "Wan et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Improving-Grammatical-Error-Correction-with-Data-by-Wan-Wan/8ed040f1ed534901db7dd61904fd4ac5bf9cc8ef"
                            ],
                            "context_in_section": "There have also been approaches to improve the diversity of generated errors. Wan et al. (2020) investigate noising through editing the latent representations of grammatical sentences"
                        },
                        {
                            "title": "Noising and Denoising Natural Language: Diverse Backtranslation for Grammar Correction",
                            "year": 2018,
                            "author": "Xie et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Noising-and-Denoising-Natural-Language%3A-Diverse-for-Xie-Genthial/be70e163473c1c6e42d02b5c4711d0faa493a49b"
                            ],
                            "context_in_section": "Xie et al. (2018) use a neural sequence transduction model and beam search noising procedures."
                        }
                    ]
                },
                {
                    "title":"Neural Machine Translation (NMT)",
                    "articles":[
                        {
                            "title": "SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation",
                            "year": 2018,
                            "author": "Wang et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/SwitchOut%3A-an-Efficient-Data-Augmentation-Algorithm-Wang-Pham/0ee468b9b709a2610c4b574d67218e7960350224"
                            ],
                            "context_in_section": "Wang et al. (2018a) propose SWITCHOUT, a DA method that randomly replaces words in both source and target sentences with other random words from their corresponding vocabularies."
                        },
                        {
                            "title": "Soft Contextual Data Augmentation for Neural Machine Translation",
                            "year": 2019,
                            "author": "Gao et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/SwitchOut%3A-an-Efficient-Data-Augmentation-Algorithm-Wang-Pham/0ee468b9b709a2610c4b574d67218e7960350224"
                            ],
                            "context_in_section": " Gao et al. (2019) introduce SOFT CONTEXTUAL DA that softly augments randomly chosen words in a sentence using a contextual mixture of multiple related words over the vocabulary"
                        },
                        {
                            "title": "Data Diversification: A Simple Strategy For Neural Machine Translation",
                            "year": 2019,
                            "author": "Nguyen et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Diversification%3A-A-Simple-Strategy-For-Neural-Nguyen-Joty/2c7d213b2230dea4ff88f0e50631089d513e9528"
                            ],
                            "context_in_section": "Nguyen et al. (2020) propose DATA DIVERSIFICATION which merges original training data with the predictions of several forward and backward models"
                        }
                    ]
                },
                {
                    "title": "Data-to-Text NLG",
                    "articles": [
                        {
                            "title": "Challenges in Data-to-Document Generation",
                            "year": 2017,
                            "author": "Wiseman et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Challenges-in-Data-to-Document-Generation-Wiseman-Shieber/13395213d47f78672ab4e81573f2b0fa0cfc8c6d"
                            ],
                            "context_in_section": "Data-to-text NLG refers to tasks which require generating natural language descriptions of structured or semi-structured data inputs, e.g. game score tables (Wiseman et al., 2017)."
                        },
                        {
                            "title": "Findings of the Third Workshop on Neural Generation and Translation",
                            "year": 2019,
                            "author": "Hayashi et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Findings-of-the-Third-Workshop-on-Neural-Generation-Hayashi-Oda/df4e3aa275b8f81e22a5332ab550805083094dae"
                            ],
                            "context_in_section": "Randomly perturbing game score values without invalidating overall game outcome is one DA strategy explored in game summary generation (Hayashi et al., 2019)."
                        },
                        {
                            "title": "Findings of the E2E NLG Challenge",
                            "year": 2018,
                            "author": "Dušek et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Findings-of-the-E2E-NLG-Challenge-Dusek-Novikova/c97dedc1f60051f81247d616a877f937a50873fe"
                            ],
                            "context_in_section": "Two popular recent benchmarks are E2E-NLG (Dušek et al., 2018) and WebNLG (Gardent et al., 2017). Both involve generation from structured inputs - meaning representation (MR) sequences and triple sequences, respectively."
                        },
                        {
                            "title": "The WebNLG Challenge: Generating Text from RDF Data",
                            "year": 2017,
                            "author": "Gardent et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/The-WebNLG-Challenge%3A-Generating-Text-from-RDF-Data-Gardent-Shimorina/a4c40532e68728fbeab5d9415f6ad8e9530db360"
                            ],
                            "context_in_section": "Two popular recent benchmarks are E2E-NLG (Dušek et al., 2018) and WebNLG (Gardent et al., 2017). Both involve generation from structured inputs - meaning representation (MR) sequences and triple sequences, respectively."
                        },
                        {
                            "title": "Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers",
                            "year": 2020,
                            "author": "Montella",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Denoising-Pre-Training-and-Data-Augmentation-for-Montella-Fabre/ced82c178037dea749f1e120abccfba320b9091a"
                            ],
                            "context_in_section": " Montella et al. (2020) show performance gains on WebNLG by DA using Wikipedia sentences as targets and parsed OpenIE triples as inputs."
                        },
                        {
                            "title": "TNT-NLG , System 2 : Data Repetition and Meaning Representation Manipulation to Improve Neural Generation",
                            "year": 2018,
                            "author": "Tandon et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/TNT-NLG-%2C-System-2-%3A-Data-Repetition-and-Meaning-to-Tandon-Oraby/65b536e121732092b7c8752c0bd15d8f1bf05d59"
                            ],
                            "context_in_section": "Tandon et al. (2018) propose DA for E2E-NLG based on permuting the input MR sequence"
                        },
                        {
                            "title": "A Good Sample is Hard to Find: Noise Injection Sampling and Self-Training for Neural Language Generation Models",
                            "year": 2019,
                            "author": "Kedzie and McKeown",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/A-Good-Sample-is-Hard-to-Find%3A-Noise-Injection-and-Kedzie-McKeown/c36419f8e72cfa72c333a8f74dc5343aef0b057e"
                            ],
                            "context_in_section": " Kedzie and McKeown (2019) inject Gaussian noise into a trained decoder’s hidden states and sample diverse augmented examples from it. This sample-augmentretrain loop helps performance on E2E-NLG."
                        }
                    ]
                },
                {
                    "title":"Open-Ended & Conditional Generation",
                    "articles": [
                        {
                            "title": "GenAug: Data Augmentation for Finetuning Text Generators",
                            "year": 2020,
                            "author": "Feng et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/GenAug%3A-Data-Augmentation-for-Finetuning-Text-Feng-Gangal/c299a4083443bea26188567979f20b8305554c0b"
                            ],
                            "context_in_section": "There has been limited work on DA for open-ended and conditional text generation. Feng et al. (2020) experiment with a suite of DA methods for finetuning GPT-2 on a low-resource domain in attempts to improve the quality of generated continuations, which they call GENAUG. They find that WN-HYPERS (WordNet hypernym replacement of key words) and SYNTHETIC NOISE (randomly perturb ing non-terminal characters in words) are useful, and the quality of generated text improves to a peak at ≈ 3x the original amount of training data."
                        }
                    ]
                },
                {
                    "title": "Dialogue",
                    "articles": [
                        {
                            "title": "Effective Data Augmentation Approaches to End-to-End Task-Oriented Dialogue",
                            "year": 2019,
                            "author": "Quan and Xiong",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Effective-Data-Augmentation-Approaches-to-Dialogue-Quan-Xiong/ba05a0fb32cc453da7fdf4e80e4941738de3fd7a"
                            ],
                            "context_in_section": "Quan and Xiong (2019) present sentence and word-level DA approaches for end-to-end task oriented dialogue."
                        },
                        {
                            "title": "Simple is Better! Lightweight Data Augmentation for Low Resource Slot Filling and Intent Classification",
                            "year": 2020,
                            "author": "Louvan and Magnini",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Simple-is-Better!-Lightweight-Data-Augmentation-for-Louvan-Magnini/0bbca7c7ae0cf493eb3b18778f8ff5d6ae8c8135"
                            ],
                            "context_in_section": "Louvan and Magnini (2020) propose LIGHTWEIGHT AUGMENTATION, a set of word-span and sentence-level DA methods for lowresource slot filling and intent classification."
                        },
                        {
                            "title": "Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding",
                            "year": 2018,
                            "author": "Hou et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Sequence-to-Sequence-Data-Augmentation-for-Dialogue-Hou-Liu/7e877011074793d6f378fc0c3e67ed5422ea3ce5"
                            ],
                            "context_in_section": "Hou et al. (2018) present a seq2seq DA framework to augment dialogue utterances for dialogue language understanding (Young et al., 2013), including a diversity rank to produce diverse utterances. There is also DA work for spoken dialogue. Hou et al. (2018), Kim et al. (2019), Zhao et al. (2019), and Yoo et al. (2019) investigate DA methods for dialogue and spoken language understanding (SLU), including generative latent variable models."
                        },
                        {
                            "title": "Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context",
                            "year": 2019,
                            "author": "Zhang et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Task-Oriented-Dialog-Systems-that-Consider-Multiple-Zhang-Ou/2b79e85bfc1b5543e50e86ee8b21aea48cb6d470"
                            ],
                            "context_in_section": "Zhang et al. (2019c) propose MADA to generate diverse responses using the property that several valid responses exist for a dialogue context."
                        },
                        {
                            "title": "Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding",
                            "year": 2019,
                            "author": "Kim et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-by-Data-Noising-for-Slots-in-Kim-Roh/a746fcaf00051568a5286caae77c178244483594"
                            ],
                            "context_in_section": "There is also DA work for spoken dialogue. Hou et al. (2018), Kim et al. (2019), Zhao et al. (2019), and Yoo et al. (2019) investigate DA methods for dialogue and spoken language understanding (SLU), including generative latent variable models."
                        },
                        {
                            "title": "Data Augmentation with Atomic Templates for Spoken Language Understanding",
                            "year": 2019,
                            "author": "Zhao et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-with-Atomic-Templates-for-Spoken-Zhao-Zhu/dddf112167dfaa02086b80eb4a15025fc3d001ea"
                            ],
                            "context_in_section": "There is also DA work for spoken dialogue. Hou et al. (2018), Kim et al. (2019), Zhao et al. (2019), and Yoo et al. (2019) investigate DA methods for dialogue and spoken language understanding (SLU), including generative latent variable models."
                        },
                        {
                            "title": " Data Augmentation for Spoken Language Understanding via Joint Variational Generation ",
                            "year": 2020,
                            "author": "Yoo et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-Spoken-Language-Understanding-Yoo-Shin/b21850c88960ab26bf102535979847861ea40901"
                            ],
                            "context_in_section": "There is also DA work for spoken dialogue. Hou et al. (2018), Kim et al. (2019), Zhao et al. (2019), and Yoo et al. (2019) investigate DA methods for dialogue and spoken language understanding (SLU), including generative latent variable models."
                        }
                    ]
                },
                {
                    "title": "Multimodal Tasks",
                    "articles":[
                        {
                            "title": "Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors",
                            "year": 2020,
                            "author": "Wang et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-Training-Dialog-Models-Robust-Wang-Fazel-Zarandi/78e6438cb86ab36edddb5a066432376d92b49535"
                            ],
                            "context_in_section": "Beginning with speech, Wang et al. (2020) propose a DA method to improve the robustness of downstream dialogue models to speech recognition errors."
                        },
                        {
                            "title": "Multi-Modal Data Augmentation for End-to-end ASR",
                            "year": 2018,
                            "author": "Renduchintala et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Multi-Modal-Data-Augmentation-for-End-to-end-ASR-Renduchintala-Ding/a010b3aa83d7d80e52c84d5f239f940eb33df904"
                            ],
                            "context_in_section": "Wiesner et al. (2018) and Renduchintala et al. (2018) propose DA methods for end-to-end automatic speech recognition (ASR)."
                        },
                        {
                            "title": "MDA: Multimodal Data Augmentation Framework for Boosting Performance on Sentiment/Emotion Classification Tasks",
                            "year": 2020,
                            "author": "Xu et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/MDA%3A-Multimodal-Data-Augmentation-Framework-for-on-Xu-Mao/6b7eeedce255cd127e90c7f4f364992655d7ec78"
                            ],
                            "context_in_section": "Looking at images or video, Xu et al. (2020) learn a cross-modality matching network to produce synthetic image-text pairs for multimodal classifiers."
                        },
                        {
                            "title": " Text Augmentation Using BERT for Image Captioning ",
                            "year": 2020,
                            "author": "Atliha and Šešok",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Text-Augmentation-Using-BERT-for-Image-Captioning-Atliha-%C5%A0e%C5%A1ok/f4deed3f04618daa34f3915a832512575d87a27c"
                            ],
                            "context_in_section": "Atliha and Šešok (2020) explore DA methods such as synonym replacement and contextualized word embeddings augmentation using BERT for image captioning."
                        },
                        {
                            "title": "Data Augmentation for Visual Question Answering",
                            "year": 2017,
                            "author": "Kafle et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Data-Augmentation-for-Visual-Question-Answering-Kafle-Yousefhussien/7e4b638e028498e900747b600f46cd723f1f231e"
                            ],
                            "context_in_section": "Kafle et al. (2017), Yokota and Nakayama (2018), and Tang et al. (2020) propose methods for visual QA including question generation and adversarial examples."
                        },
                        {
                            "title": "Augmenting Image Question Answering Dataset by Exploiting Image Captions",
                            "year": 2018,
                            "author": "Yokota and Nakayama",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Augmenting-Image-Question-Answering-Dataset-by-Yokota-Nakayama/f88a0f44ff7ec5fe0facf0facac0a094c7bd6cb8"
                            ],
                            "context_in_section": "Kafle et al. (2017), Yokota and Nakayama (2018), and Tang et al. (2020) propose methods for visual QA including question generation and adversarial examples."
                        },
                        {
                            "title": "Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering",
                            "year": 2020,
                            "author": "Tang et al.",
                            "urls_semantic_scholar": [
                                "https://www.semanticscholar.org/paper/Semantic-Equivalent-Adversarial-Data-Augmentation-Tang-Ma/ff560bbf5c11894379d7e808683d553e3d1f08c2"
                            ],
                            "context_in_section": "Kafle et al. (2017), Yokota and Nakayama (2018), and Tang et al. (2020) propose methods for visual QA including question generation and adversarial examples."
                        }
                    ]
                }
            ]
        }
    ]
}