Topological Data Analysis in Machine Learning

In this post we focus on the use of topological methods for machine learning. Both extremely fast online recognition and unsupervised learning methods in the field of machine learning are based almost exclusively on optimizing the parameters of a mapping between input and target data. In Literature on Topological Data Analysis we gave an extensive overview of the broad spectrum of literature on TDA. Our focus here is to work out how TDA is used in this very general setting. Reviewing the categorized literature, we find that it manifests itself in three ways in the landscape of machine learning.

Applications of Persistent Homology on Persistent Data

Exploratory data analysis using the Mapper algorithm provides a dimension reduction technique based on the theory of TDA: it offers a low-dimensional, visually interpretable nerve of a simplicial complex as the object of investigation (Singh, 2007). (Nicolau, 2011) identified in this way a subgroup of breast cancer tissue that is non-lethal and non-metastatic. (Pachauri, 2011) developed a machine-learning-based algorithm for the discovery of Alzheimer's markers in cortical imaging data, in particular by studying the thickness of the cortex and the signals processed across the cortical surface. From a completely different area of science, namely forestry, Riihimäki et al. correctly classified three-dimensional models of different tree species with an accuracy of over 95% by considering their topology (Riihimäki, 2019). Meteorological applications also exist: Muszynski et al. combined TDA with support vector machines to classify and detect atmospheric rivers, i.e. long and narrow corridors of strong horizontal water transport, often associated with heavy precipitation or snowfall (Muszynski, 2019). The authors achieved an accuracy of 77% to 90%.
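
To make the Mapper pipeline concrete, the following is a minimal sketch assuming the open-source KeplerMapper library (not used in any of the cited papers); the toy data, lens function, cover parameters, and clusterer are illustrative choices only.

```python
# A minimal sketch of the Mapper pipeline (Singh, 2007), assuming
# kmapper and scikit-learn are installed (pip install kmapper scikit-learn).
import numpy as np
import kmapper as km
from sklearn.cluster import DBSCAN

# Toy point cloud: two noisy concentric circles.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 400)
radius = np.concatenate([np.full(200, 1.0), np.full(200, 2.5)])
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
X += rng.normal(0, 0.05, X.shape)

mapper = km.KeplerMapper(verbose=0)
# Lens (filter) function: here simply the first coordinate.
lens = mapper.fit_transform(X, projection=[0])
# Cover the lens by overlapping intervals, cluster each preimage,
# and take the nerve of the resulting cover of the data.
graph = mapper.map(
    lens, X,
    cover=km.Cover(n_cubes=10, perc_overlap=0.3),
    clusterer=DBSCAN(eps=0.3),
)
mapper.visualize(graph, path_html="mapper_output.html")
```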

Another purpose of TDA in machine learning is the extraction of discriminative topological features. TDA allows conclusions about the topological complexity of data, which can in turn be used to adapt the architecture of learning models; this latter manifestation is discussed in the section on the topology of neural networks. Conventional machine learning methods operate on k-dimensional feature vectors of constant size. The basic idea of integrating TDA is to extract topological features from a data set, in particular estimates of its singular homology. At the same time, the qualitative character of these features poses some problems: birth and death times in persistence diagrams are highly non-linear and very difficult to use as input or output parameters (Chazal, 2017). A few approaches to tackle this problem have been suggested. (Reininghaus, 2015) developed a multi-scale kernel for persistence diagrams, which yields a vector representation of persistent homology and provides a stable summary of topological features. An alternative, the persistence landscape, has been presented by (Bubenik, 2015); built on persistence diagrams, it acts as a summary of the evolution of homology groups along a filtration. Recent work by (Hofer, 2017) suggests the use of deep neural networks to learn dynamic, task-dependent features.
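
As an illustration of one such vectorization, here is a minimal sketch of the persistence landscape (Bubenik, 2015), computed directly from a given persistence diagram; the toy diagram and grid are illustrative, and this is a simplified implementation, not Bubenik's reference code.

```python
# A simplified sketch of persistence landscapes (Bubenik, 2015): the
# k-th landscape at t is the k-th largest value of the tent functions
# max(0, min(t - birth, death - t)) over all points of the diagram.
import numpy as np

def persistence_landscape(diagram, grid, k_max=3):
    """diagram: (n, 2) array of (birth, death) pairs; grid: 1D sample points.
    Returns an array of shape (k_max, len(grid)) with the first k_max landscapes."""
    births, deaths = diagram[:, 0:1], diagram[:, 1:2]     # shape (n, 1)
    tents = np.minimum(grid - births, deaths - grid)      # shape (n, len(grid))
    tents = np.maximum(tents, 0.0)
    tents.sort(axis=0)                                    # ascending per grid point
    top_k = tents[::-1][:k_max]                           # k largest per grid point
    # Pad with zeros if the diagram has fewer than k_max points.
    if top_k.shape[0] < k_max:
        top_k = np.vstack([top_k, np.zeros((k_max - top_k.shape[0], len(grid)))])
    return top_k

# Toy diagram with two features of different persistence.
dgm = np.array([[0.0, 1.0], [0.2, 0.5]])
landscapes = persistence_landscape(dgm, np.linspace(0.0, 1.2, 100))
print(landscapes.shape)  # (3, 100): a fixed-size vector usable by standard ML models
```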

Applications of Persistent Homology on Time Series Data

TDA and machine learning have been used in combination in a number of applications in the domain of time series analysis. (Seversky, 2017) have proposed an end-to-end framework for time series data using TDA, which provides a generic processing pipeline into which various algorithms can be plugged. In particular, they investigate delay (offset) embeddings at different scales, the approximation of homology groups through persistent homology, and the learnability of complex problems. The pipeline consists of three steps (a minimal sketch of the first two follows the list):

  1. A temporal embedding is constructed from the signal, which in turn is divided into smaller segments. However, this paper does not make it clear which criteria can be used to find a suitable segmentation or embedding.
  2. Persistent homology is calculated, with the authors postulating that the 1-dimensional cycles are of particular interest because they provide insights into periodic and repetitive patterns often found in time-series data (Seversky, 2017).
  3. The scale-space topological kernel and a kernel support vector machine are then used for classification (Reininghaus, 2015). This process has been applied to well-studied time-varying data sets, with the authors concluding that TDA works particularly well for signals of physical origin, such as motion or sensor data, whereas non-physical sources such as word embeddings remain an open research problem.
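
The sketch below illustrates steps 1 and 2 under the assumption that the ripser package is available (pip install ripser); it is not the authors' original implementation, and the window parameters are illustrative.

```python
# A minimal sketch of steps 1 and 2: a sliding-window (delay) embedding
# of a 1D signal, followed by persistent homology of the resulting
# point cloud. Assumes numpy and ripser (pip install ripser).
import numpy as np
from ripser import ripser

def delay_embedding(signal, dim=3, tau=5):
    """Embed a 1D signal into R^dim using time-delay coordinates
    x_t = (s_t, s_{t+tau}, ..., s_{t+(dim-1)*tau})."""
    n = len(signal) - (dim - 1) * tau
    return np.stack([signal[i * tau : i * tau + n] for i in range(dim)], axis=1)

# Toy periodic signal: the delay embedding of a sine wave traces out a
# loop, so a prominent 1-dimensional cycle is expected in H1.
t = np.linspace(0, 8 * np.pi, 400)
signal = np.sin(t)
cloud = delay_embedding(signal, dim=3, tau=10)

# Step 2: persistent homology up to dimension 1.
diagrams = ripser(cloud, maxdim=1)['dgms']
h1 = diagrams[1]
print("most persistent 1-cycle lifetime:", (h1[:, 1] - h1[:, 0]).max())
```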

Another treatise in this area concentrates on volatile time series data with rapidly changing amplitudes (Umeda, 2017). It is assumed that time series belonging to the same class also have similar topological characteristics, where class membership is understood via approximating metric distances. However, there are signals that are very similar under metric conditions whose topological characteristics differ, and also signals that are topologically different although they come from the same domain; chaotic time series, which depend strongly on an initial state, are such an example. One purpose of this pipeline is therefore to enable the simultaneous classification of chaotic and non-chaotic time series. To this end a quasi-attractor, represented as a point cloud, is calculated from the signal via delay embedding and then used as the basis for computing persistent homology. To create features suitable for machine learning, the Betti sequence is proposed, which records the Betti numbers, i.e. the ranks of the homology groups, along the filtration. These features are then fed into convolutional neural network architectures. The process is evaluated on the most common biomedical signal datasets, with significantly higher performance and classification quality than conventional algorithms. (Gholizadeh, 2018) provide an overview of TDA in time series analysis, though without a special focus on its use within machine learning research.
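
A minimal sketch of such a Betti sequence computed from a persistence diagram follows; the toy diagram and grid are illustrative, and this is a simplified reading of the feature, not Umeda's original code.

```python
# A simplified sketch of a Betti sequence (Betti curve): for each
# filtration value t in a fixed grid, count the intervals of the
# persistence diagram that are alive, i.e. birth <= t < death.
import numpy as np

def betti_sequence(diagram, grid):
    """diagram: (n, 2) array of (birth, death); grid: 1D filtration values.
    Returns an integer vector of Betti numbers, one per grid point."""
    births, deaths = diagram[:, 0:1], diagram[:, 1:2]
    alive = (births <= grid) & (grid < deaths)   # (n, len(grid)) boolean
    return alive.sum(axis=0)

dgm = np.array([[0.0, 0.8], [0.1, 0.4], [0.5, 0.9]])
seq = betti_sequence(dgm, np.linspace(0.0, 1.0, 50))
# The fixed-length integer sequence can be fed to a 1D convolutional network.
print(seq)
```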

Topology of Neural Networks

Recent research uses topological properties to better understand and evaluate neural networks and to improve existing architectures, for instance by studying the topological properties of the hypersurface being optimized. Although deep neural networks can, under certain conditions, be considered universal function approximators (Lin, 2018), they are still regarded as black boxes: due to the high complexity of their construction and the large number of neurons, known problems such as overfitting, adversarial attacks, and the verification of correctly learned weights are of central importance (Carlsson, 2018).

(Carlsson, 2018) apply persistent homology to the weight matrix of a single layer of a neural network, more precisely to the lattice associated with that layer, and analyze it using the Mapper algorithm. In order to understand the behaviour of neural networks, the persistent homology of different deeper layers at varying learning stages is investigated; previously trained neural networks are also processed in this way. The results support long-suspected hypotheses, namely that deeper layers encode a higher abstraction of input characteristics, similar to the mammalian visual system. The authors postulate that their results can help accelerate the training of neural networks and increase effectiveness on complex data sets as well as generalization. Complexity is again defined here with regard to topological characteristics, i.e. by the elements of the respective homology groups on a filtration. (Naitzat, 2018) propose a method named M-Boost, which also relies on the Mapper algorithm but uses the output of the deep neural network as the lens function, enabling visualizations of each layer's output. The output of the Mapper algorithm indicates which information the network uses for classification, which enables the identification of distinctive features. These features are fed into a decision tree, and the two resulting classes of the decision tree are each fed into a different neural network, boosting the respective training. This step may then be repeated; the full M-Boost method consists of nine steps.
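
The following is a heavily simplified sketch in the spirit of (Carlsson, 2018), treating the normalized weight vectors of one layer as a point cloud; the original work analyzes this space with Mapper, whereas this sketch only computes its Vietoris-Rips persistence with ripser, and the random weights stand in for a trained layer.

```python
# A heavily simplified sketch in the spirit of (Carlsson, 2018): treat the
# normalized weight vectors of a layer as a point cloud and compute its
# persistent homology (pip install ripser numpy).
import numpy as np
from ripser import ripser

# Stand-in for a trained layer: 256 weight vectors of an 8x8 convolution,
# drawn at random here purely for illustration.
rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 64))

# Normalize each weight vector to the unit sphere, as is common when
# studying the geometry of learned filters.
points = weights / np.linalg.norm(weights, axis=1, keepdims=True)

diagrams = ripser(points, maxdim=1)['dgms']
for dim, dgm in enumerate(diagrams):
    print(f"H{dim}: {len(dgm)} features")
```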

(Rieck, 2019) have recently taken a very different approach, considering the persistent homology of the weights of deep neural networks, layer by layer, over the course of stochastic gradient descent. They propose neural persistence, a measure that applies concepts of persistent homology to the structure of the weights: the edges of the stratified graph of a layer are mapped to the corresponding weights, and the resulting ordering of the normalized weights induces a filtration. The neural persistence of a layer is then given by the p-th Wasserstein norm of the resulting persistence diagram and is therefore a scalar. Neural persistence can also be used as an early stopping criterion. The authors have evaluated this method and compared it with more traditional approaches using validation data sets; it shows competitive performance, with the advantage that validation data sets can be dispensed with. (Guss, 2018) have presented a related approach, with more focus on persistent homology as a complexity measure and on the generalization capability of neural networks in combination with topological methods. Their research builds on the work of Bianchini and Scarselli (Bianchini, 2014).
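
A simplified sketch of the per-layer computation follows; it relies on the standard fact that the zero-dimensional persistence pairs of a filtered graph arise from the component merges of a Kruskal-style sweep over the edges, and it is not the authors' reference implementation and omits some normalization details of the paper.

```python
# A simplified sketch of neural persistence (Rieck, 2019) for one fully
# connected layer: normalize the absolute weights, add the edges of the
# bipartite layer graph in descending weight order, and record a
# persistence pair (1, w) whenever an edge merges two components
# (union-find). The layer score is the p-norm of the persistences.
import numpy as np

def neural_persistence(W, p=2):
    n_in, n_out = W.shape
    w = np.abs(W) / np.abs(W).max()            # normalize weights to [0, 1]
    # Edges (i, j) of the bipartite graph, sorted by descending weight.
    edges = [(w[i, j], i, n_in + j) for i in range(n_in) for j in range(n_out)]
    edges.sort(reverse=True)

    parent = list(range(n_in + n_out))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path compression
            x = parent[x]
        return x

    persistences = []
    for weight, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                           # edge merges two components:
            parent[ru] = rv                    # a 0-dim class born at 1 dies here
            persistences.append(1.0 - weight)
    return np.linalg.norm(persistences, ord=p)

rng = np.random.default_rng(0)
print(neural_persistence(rng.normal(size=(20, 10))))
```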

Acknowledgement

I would like to thank Michael Nissen, who took the effort to produce a first summary of the papers mentioned above in his seminar work and who selected and structured their presentation. I further thank David Haller for proofreading parts of the written text.

References

  1. G. Singh, F. Memoli and G. Carlsson (2007): Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. Eurographics Symposium on Point-Based Graphics. DOI: 10.2312/SPBG/SPBG07/091-100.
  2. M. Nicolau, A.J. Levine and G. Carlsson (2011): Topology based Data Analysis Identifies a Subgroup of Breast Cancers with a Unique Mutational Profile and Excellent Survival. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1102826108.
  3. H. Riihimäki, W. Chachólski, J. Theorell, J. Hillert and R. Ramanujam (2019): A Topological Data Analysis based Classification Method for Multiple Measurements. Cold Spring Harbor Laboratory. DOI: 10.1101/569210.
  4. G. Muszynski, K. Kashinath, V. Kurlin, M. Wehner and Prabhat (2019): Topological Data Analysis and Machine Learning for Recognizing Atmospheric River Patterns in Large Climate Datasets. Geoscientific Model Development. DOI: 10.5194/gmd-12-613-2019.
  5. D. Pachauri, C. Hinrichs, M.K. Chung, S.C. Johnson and V. Singh (2011): Topology-Based Kernels With Application to Inference Problems in Alzheimer's Disease. IEEE Transactions on Medical Imaging. DOI: 10.1109/TMI.2011.2147327.
  6. F. Chazal and B. Michel (2017): An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. ArXiv: 1710.04019.
  7. J. Reininghaus, S. Huber, U. Bauer and R. Kwitt (2015): A Stable Multi-Scale Kernel for Topological Machine Learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. DOI: 10.1109/CVPR.2015.7299106.
  8. P. Bubenik (2015): Statistical Topological Data Analysis using Persistence Landscapes. Journal of Machine Learning Research.
  9. C. Hofer, R. Kwitt, M. Niethammer and A. Uhl (2017): Deep Learning with Topological Signatures. ArXiv: 1707.04041.
  10. L.M. Seversky, S. Davis and M. Berger (2017): On Time-Series Topological Data Analysis: New Data and Opportunities. IEEE Conference on Computer Vision and Pattern Recognition Workshops. DOI: 10.1109/CVPRW.2016.131.
  11. Y. Umeda (2017): Time Series Classification via Topological Data Analysis. Transactions of the Japanese Society for Artificial Intelligence. DOI: 10.1527/tjsai.D-G72.
  12. S. Gholizadeh and W. Zadrozny (2018): A Short Survey of Topological Data Analysis in Time Series and Systems Analysis. ArXiv: 1809.10745.
  13. H. Lin and S. Jegelka (2018): ResNet with one-neuron hidden layers is a Universal Approximator. Annual Conference on Neural Information Processing Systems.
  14. G. Carlsson and R.B. Gabrielsson (2018): Topological Approaches to Deep Learning. ArXiv: 1811.01122.
  15. G. Naitzat, N. Lokare, J. Silva and I. Kaynar-Kabul (2018): M-Boost: Profiling and Refining Deep Neural Networks with Topological Data Analysis. Knowledge Discovery in Databases.
  16. W.H. Guss and R. Salakhutdinov (2018): On Characterizing the Capacity of Neural Networks using Algebraic Topology. ArXiv: 1802.04443.
  17. M. Bianchini and F. Scarselli (2014): On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures. IEEE Transactions on Neural Networks and Learning Systems. DOI: 10.1109/TNNLS.2013.2293637.
  18. B. Rieck, M. Togninalli, C. Bock, M. Moor, M. Horn, T. Gumbsch and K.M. Borgwardt (2019): Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology. International Conference on Learning Representations.