Skip to content

Instantly share code, notes, and snippets.

@karhunenloeve
Created March 19, 2022 17:21
Show Gist options
  • Save karhunenloeve/580472f07add7824d92de4a8e37566b9 to your computer and use it in GitHub Desktop.
Save karhunenloeve/580472f07add7824d92de4a8e37566b9 to your computer and use it in GitHub Desktop.
Short survey on persistent homology.

A Brief Survey on Persistent Homology

The indexed family of growing simplicial complexes is known as filtration. It has been shown that persistent homology is – for a d-dimensional simplicial complex – the standard homology over a certain graded module over a polynomial ring [1]. Furthermore, the above mentioned analysis showed that a simple description of the persistent homology of groups over arbitrary fields exists. From this, an algorithm for the computation of persistent homology for any dimensions and over arbitrary fields could be derived [1, §4.2]. The Čech complex is one of the most important simplicial complexes, because it is homotopy equivalent to a cover of a topological space with open balls, or intuitively speaking, it reflects the topology of a triangulable topological space. An efficient algorithm for the Čech complex is given with a runtime of O(nd+3) [2], where n is the number of points and d is the dimension of the simplicial complex. In principle, d depends on the general position of the set of points. A coarser variant – the Vietoris-Rips complex – which includes the Čech complex as a subcomplex, can be calculated more efficiently, using an algorithm of runtime O(n**2), with n being the number of points [3]. This is not too bad, but still insufficient for large amounts of points.Another construction of a filtered simplicial complex, which is mathematically extraordinarily similar to the Vietoris-Rips complex, was constructed shortly thereafter, whose time of construction is O(n log n) in a n-point metric space, with a runtime of O(n) for the filtration steps [4]. The constant factors in size and runtime depend on the doubling of the dimension of the metric space and the tighness of the approximation. That is why Boissonnat, Chazal and Yvinec have written a complete overview of the most common filtrations, with their geometric background and the corresponding algorithms to compute them [5]. The extraction of invariants of simplicial complexes over a set of points represents a generalisation of the clustering problem. The question arises, which is the simplest topology that has the same invariants as most simplicial complexes along the filtration? This development provided new possibilities for the visualisation of high dimensional data.

Visualisation

The nerve of a cover can be thought of – informally – as the underlying schematic pattern of a topological space, similar to the way children draw human hands, through a surface and five protruding strokes. It tells us something about the appearance of a data set, and is a topological space itself which can be examined. One method to investigate the nerve of a cover is the mapper algorithm, which makes these structures visible through the investigation of connected components of a simplicial complex using Reeb graphs and is suitable for visualization of high dimensional data [6]. In addition to the accomplishment of being able to span simplicial complexes over smaller point sets than the entire data set using the structure obtained by mapper, a new representation of persistent homology has been in- troduced, which allows more intuitive access for feature extraction and pattern recognition of high dimensional data, namely that of the persistence barcodes [7]. Probably one of the most famous applications of the mapper algorithm is the study and resulting discovery of a subgroup of breast cancer cells with a unique mutation profile and excellent survival rate [8]. The barcodes show for each homology group how long representatives from the corresponding homology class exist along the filtration. This is the decomposition of the persis tent interval module, whose graph corresponds to a barcode. The idea of filtration was ultimately also used in the neurosciences to study brain models. The brain network is analysed using a connectivity matrix whose neuron barriers are arbitrarily chosen threshold values, as there are no accepted general criteria [9]. Using a multiscale frame by means of persistent homology and the representation of Vietoris-Rips filtrations by barcodes, investigations could be carried out for a finite set of increasing threshold values. The extraction of significant features from high-dimensional data has become a more intuitive interpretable technique with the help of new representations of persistent homology, persistence rings, used in combination with persistence-based clustering [10]. Although persistence rings do not allow us to get an idea of the structure in higher dimensions, this structure is coded into a radial pattern, which in turn can often be identified with the bare eye and assigned to a geometric object. Following this, persistent homology and the mapper algorithm was applied to a number of visualization problems [11]. Certainly, all these visualisation techniques have been used extensively, but they share a weakness. Mathematically speaking, these persistence objects are situated in a Banach space, a vector space that is not completely normed. To enable statistical calculations, we need a fully normed Hilbert space for a meaningful calculation of the k-th statistical moments. With his work on persistent landscapes Bubenik delivers such an object [12]. The persistence landscapes are remarkably analogous to the persistence diagrams, but can be aggregated by an ensemble as mean values, harmonic means or other summary statistics. For persistence landscapes stability guarantees exist in a similar way as for persistence diagrams. The first construction of objects encoding topological features in a Hilbert space naturally triggered a number of representations for persistent homology groups, such as the stable vectorisations of persistence diagrams [13] or the vectorisation using grids, which result in a stable vector representation also called persistence image [14]. We conclude our summary of persistence visualisation at this point and pass over to multidimensional persistence, an important extension of the here discussed methods.

Multipersistence

Nonetheless, persistent homology captures the topology of only one filtration, a one-parameter family of growing spaces. These are the intervals just discussed, which indicate the lifetime of the representatives of a homology class along the filtration. It could be shown [15], that no similar completely discrete invariant exists for multidimensional persistences, i.e. those parameterised along a multifiltration – of several geometrically growing spaces. Instead, the rank invariant was proposed, a robust estimate of the Betti numbers in a multifiltration [15, §6]. At this stage, it was already apparent that persistent homology would excel as a computer-based instrument in data analysis. Persistent homology has been subsequently extended to calculate essential homologies for arbitrary filtered spaces [16]. From this the possibility arose to investigate topological structures of sub-manifolds of Euclidean space with codimension one and to extend persistent homology theory to non-manifolds by Poincaré and Lefschetz duality [16, §4]. By means of commutative algebra applied to the problem of computing multipersistences, an algorithm in polynomial time was proposed [17, §4.3].

Metrics

Cohen-Steiner, Edelsbrunner and Harer have shown that persistence diagrams provide a stable representation for a large class of functions [18], those of tame functions. Stability in terms of metrics between two diagrams indicates that whenever the persistence diagrams of two real-valued functions in a topological space change slightly, the function value also changes only moderately, and vice versa. The results on stability were shown for the bottleneck distance with a linear upper bound and were soon also shown on triangulated compact metric spaces for total persistence, as a statistical measure on persistence diagrams and special cases of the Wasserstein distance by Cohen-Steiner, Edelsbrunner, Harer et al. [19]. Chazal, de Silva, Glisse et al. provide a complete overview of the structure and stability of persistence modules in their seminal book [20]. Outdot extends this threatise connecting topological data analysis with its algebraic topological aspects in a systematic manner to quiver theory, to give a broad view onto the subject [21]. He focuses on 1-dimensional persistence. Since the stability of the persistence diagrams distinguishes them as a tool for studying the data and is thus possibly the most significant result of computer-aided topology, the results of stability were investigated by studying the Hausdorff and Bottleneck distances [22]. However, the results are very pessimistic about outliers in the point set. Recent results show an elementary proof of stability also for the p-th Wasserstein distance, a generalisation of previous results that extends stability properties to more general classes of modules [22].

Some Deep Learning

We start this paragraph with persistence-based clustering in Riemannian manifolds [23]. These manifolds have a Riemannian metric inherent and are therefore referred to as smooth manifolds due to their infinitely often differentiable structure. Persistent homology has been used as a feedback mechanism for finding clusters on such manifolds as it reflects the prominence of density modes within smooth spaces [23]. In addition, this cluster method has the advantage that the learned objects are bounded in a mathematical sense to the density centres of the actual clusters. The algorithm with the designation ToMATo delivers under mild conditions the correct number of clusters and has outperformed previous clusters in all benchmark tests in that time [23, §5]. This has triggered the inclusion of topological constraints throughout the machine learning cycle. In a first paper persistent homology was used for learning densities with bounded support to obtain certain topological features compellingly during a density estimation [24]. For the first time persistent homology was combined with kernel based methods. In medicine, as well, persistent homology together with supervised learning has been applied for the first time in the context of a classification task, through the classification of hepatic liver lesions with the aid of bottleneck distance, persistent homology and support vector machines [25].

References

  1. Zomorodian, Afra and Gunnar Carlsson (2004). “Computing Persistent Homology.” In: Proceedings of the Annual Symposium on Computational Geometry. Vol. 33. 2, pp. 347–356. doi: 10.1145/997817.997870
  2. Dantchev, Stefan and Ioannis Ivrissimtzis (2012). “Efficient Construction of the Čech Complex.” In: Comput. Graph. 36.6, pp. 708–713. doi: 10.1016/j.cag.2012.02.016
  3. Zomorodian, Afra (2010). “Fast Construction of the Vietoris-Rips Complex.” In: Computers & Graphics 34.3, pp. 263–271. doi: 10.1016/j.cag.2010.03.007
  4. Sheehy, Donald (2013). “Linear-size approximations to the Vietoris–Rips filtration.” In: Discrete & Computational Geometry 49.4, pp. 778–796. doi: 10.1007/s00454-013-9513-1
  5. Boissonnat, Jean-Daniel, Frédéric Chazal, and Mariette Yvinec (2018). Geometric and Topological Inference. Vol. 57. Cambridge University Press. doi: 10.1017/9781108297806
  6. Singh, Gurjeet, Facundo Memoli, and Gunnar Carlsson (2007). “Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition.” In: Eurographics Symposium on Point-Based Graphics. Ed. by M. Botsch, R. Pajarola, B. Chen, and M. Zwicker. The Eurographics Association. doi: 10.2312/SPBG/SPBG07/091-100
  7. Ghrist, Robert (2008). “Barcodes: the persistent topology of data.” In: Bulletin of the American Mathematical Society 45.1, pp. 61–75. doi: 10.1090/S0273-0979-07-01191-3
  8. Nicolau, Monica, Arnold J Levine, and Gunnar Carlsson (2011). “Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival.” In: Proceedings of the National Academy of Sciences 108.17, pp. 7265–7270. doi: 10.1073/pnas.1102826108
  9. Lee, Hyekyoung, Hyejin Kang, Moo Chung, Bung-Nyun Kim, and Dong Lee (2012). “Persistent brain network homology from the perspective of dendrogram.” In: IEEE transactions on medical imaging 31.12, pp. 2267–2277. doi: 10 . 1109 / TMI . 2012 . 2219590
  10. Rieck, Bastian, Hubert Mara, and Heike Leitte (2012). “Multivariate data analysis using persistence-based filtering and topological signatures.” In: IEEE Transactions on Visualization and Computer Graphics 18.12, pp. 2382–2391. doi: 10.1109/TVCG.2012.248
  11. Lum, Pek, Gurjeet Singh, Alan Lehman, Tigran Ishkanov, Mikael Vejdemo-Johansson, Muthu Alagappan, John Carlsson, and Gunnar Carlsson (2013). “Extracting insights from the shape of complex data using topology.” In: Scientific reports 3, p. 1236. doi: 10.1038/srep01236
  12. Bubenik, Peter (2015). “Statistical Topological Data Analysis using Persistence Landscapes.” In: The Journal of Machine Learning Research 16.1, pp. 77–102. doi: 10.5555/2789272.2789275
  13. Carrière, Mathieu, Steve Oudot, and Maks Ovsjanikov (2015). “Stable topological signatures for points on 3d shapes.” In: Computer Graphics Forum. Vol. 34. 5, pp. 1–12. doi: 10.1111/cgf.12692
  14. Adams, Henry, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier (2017). “Persistence Images: A Stable Vector Representation of Persistent Homology.” In: The Journal of Machine Learning Research 18.1, pp. 218–252. doi: 10.5555/3122009.3122017
  15. Carlsson, Gunnar and Afra Zomorodian (2009). “The Theory of Multidimensional Persistence.” In: Discrete Computational Geometry 42.1, pp. 71–93. doi: 10.1007/s00454-009-9176-0
  16. Cohen-Steiner, David, Herbert Edelsbrunner, and John Harer (2009). “Extending persistence using Poincaré and Lefschetz duality.” In: Foundations of Computational Mathematics 9.1, pp. 79–103. doi: 10.1007/s10208-008-9038-9
  17. Carlsson, Gunnar, Gurjeet Singh, and Afra Zomorodian (2009). “Computing Multidimensional Persistence.” In: Journal of Computational Geometry, pp. 730–739. doi: 10.1007/978-3-642-10631-6_74
  18. Cohen-Steiner, David, Herbert Edelsbrunner, and John Harer (2007). “Stability of Persistence Diagrams.” In: Discrete Computational Geometry 37.1, pp. 103–120. doi: 10.1007/s00454-006-1276-5
  19. Cohen-Steiner, David, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko (2010). “Lipschitz Functions Have Lp -Stable Persistence.” In: Foundations of Computational Mathematics 10.2, pp. 127–139. doi: 10.1007/s10208-010-9060-6
  20. Chazal, Frédéric, Vin De Silva, Marc Glisse, and Steve Oudot (2016). The Structure and Stability of Persistence Modules. Springer. doi: 10.1007/978-3-319-42545-0
  21. Oudot, Steve (2015). Persistence Theory - From Quiver Representations to Data Analysis. Vol. 209. Mathematical surveys and monographs. American Mathematical Society. doi: 10.1111/j.1467-8659.2009.01516.x
  22. Skraba, Primoz and Katharine Turner (2020). “Wasserstein Stability for Persistence Diagrams.” In: arXiv 2006.16824. url: http://arxiv.org/abs/2006.16824
  23. Chazal, Frédéric, Leonidas Guibas, Steve Oudot, and Primoz Skraba (2013). “Persistence-based clustering in riemannian manifolds.” In: Journal of the Association for Computing Machinery 60.6, pp. 1–38. doi: 10.1145/2535927
  24. Pokorny, Florian, Hedvig Kjellström, Danica Kragic, and Carl Ek (2012). “Persistent homology for learning densities with bounded support.” In: Advances in Neural Information Processing Systems, pp. 1817–1825. doi: 10.5555/2999325.2999338
  25. Adcock, Aaron, Daniel Rubin, and Gunnar Carlsson (2014). “Classification of hepatic lesions using the matching metric.” In: Computer vision and image understanding 121, pp. 36–42. doi: 10.1016/j.cviu.2013.10.014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment