Introduction


Atom Probe Tomography (APT) offers enormous potential for probing the sub-nanometer character of materials [1–5]. However, recovering this information from data generated at such high spatial resolution presents the concomitant challenge of interpreting very high volume, high density data [6,7]. Even small data collections involve ~10⁷ atoms, and under the standard procedure of visualizing the data set to isolate features, important ones such as precipitates can easily be lost within the sheer volume of data. Feature extraction typically requires drawing iso-concentration surfaces [8,9] at a particular concentration threshold, visually exploring the data space to probe for various features, and then repeating the procedure over an entire range of concentration threshold values. Following up on our earlier work on rendering such high volume APT data to aid feature extraction [10–12], we now provide an alternative, data-driven approach that objectively classifies different phases, such as precipitates, by mapping the topology of the APT data set using concepts from algebraic topology, namely simplicial homology [13–15].
Topology is inherently a classification system that deals with qualitative geometric information. This includes the study of the connected components of a space and of how the space is connected in different dimensions [16]. Metric properties such as the position of a point, the distance between points, or the curvature of a surface are irrelevant to topology. Thus, a circle and a square have the same topology although they are geometrically different. Such topological information can be represented by simplicial complexes, which are combinatorial objects built from simplices that can represent spaces and separate the topology of a space from its geometry [14]. Examples of simplices include a point (0-dimensional simplex), a line segment (1-dimensional simplex), a triangle (2-dimensional simplex) and a tetrahedron (3-dimensional simplex).
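As a minimal illustration of how a simplicial complex separates topology from geometry, the sketch below stores a complex purely as sets of vertex labels, with no coordinates at all. The helper names (`closure`, `euler_characteristic`) and the use of the Euler characteristic as the invariant are assumptions made for this example and are not taken from the method described here.

```python
from itertools import combinations

def closure(maximal_simplices):
    """Return every non-empty face of the given simplices as sorted tuples."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for k in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, k))
    return faces

def euler_characteristic(simplices):
    # Alternating sum of simplex counts; a simplex with n vertices has dimension n - 1.
    return sum((-1) ** (len(s) - 1) for s in simplices)

# A "circle" made of three edges and a "square" made of four edges.
triangle_loop = closure([(0, 1), (1, 2), (0, 2)])
square_loop = closure([(0, 1), (1, 2), (2, 3), (0, 3)])
# A filled triangle (one 2-simplex) is topologically a disc, not a loop.
filled_triangle = closure([(0, 1, 2)])

print(euler_characteristic(triangle_loop))    # 0
print(euler_characteristic(square_loop))      # 0
print(euler_characteristic(filled_triangle))  # 1
```

The two loops differ in their number of vertices and edges but share the same invariant, whereas filling in the triangle changes it.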
Simplicial homology is a process that provides information about a simplicial complex through the number of cycles (a type of hole) it contains. One of its outputs is the set of Betti numbers, which record the topological invariants of an object, such as the number of connected components, holes, tunnels, or cavities [17]. While a structure can take infinitely many shapes, many of which cannot be quantified, it can have only a limited set of topological features depending on its dimension. For example, in three dimensions (3D), a structure can be simply connected, or it can be connected such that a tunnel passes through it, or it can be connected to itself such that it encloses a cavity, or it can remain unconnected. Thus, we can characterize the topology of a structure by counting its connected components, tunnels and cavities, denoted by the Betti numbers β0, β1 and β2, respectively.
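The Betti numbers of a small simplicial complex can be computed directly from the ranks of its boundary maps, using β_d = (number of d-simplices) − rank ∂_d − rank ∂_{d+1}. The following is a hedged sketch, not the implementation used in this work: it assumes coefficients in GF(2), and the function names (`closure`, `rank_gf2`, `betti_numbers`) are hypothetical.

```python
from itertools import combinations

def closure(maximal_simplices):
    """Return every non-empty face of the given simplices as sorted tuples."""
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(simplex)
        for k in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, k))
    return faces

def rank_gf2(rows):
    """Rank over GF(2) of a binary matrix whose rows are int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(maximal_simplices, max_dim=2):
    """Return [beta_0, ..., beta_max_dim] for the complex spanned by the input."""
    simplices = closure(maximal_simplices)
    by_dim = {d: sorted(s for s in simplices if len(s) == d + 1)
              for d in range(max_dim + 2)}
    ranks = {0: 0}  # the boundary of a vertex is empty
    for d in range(1, max_dim + 2):
        index = {s: i for i, s in enumerate(by_dim[d - 1])}
        rows = []
        for s in by_dim[d]:
            mask = 0
            for face in combinations(s, d):  # the (d-1)-dimensional faces of s
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[d] = rank_gf2(rows)
    return [len(by_dim[d]) - ranks[d] - ranks[d + 1] for d in range(max_dim + 1)]

loop = [(0, 1), (1, 2), (0, 2)]                                # one tunnel
hollow_tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # one cavity
solid_tetrahedron = [(0, 1, 2, 3)]                             # no holes at all

print(betti_numbers(loop))                # [1, 1, 0]
print(betti_numbers(hollow_tetrahedron))  # [1, 0, 1]
print(betti_numbers(solid_tetrahedron))   # [1, 0, 0]
```

The three examples correspond to the cases listed above: a connected structure with a tunnel, one enclosing a cavity, and one with neither.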

When dealing with point cloud data representing physical structures, such as APT data, the number and type of topological invariants clearly depend on the degree of connectivity between the various points, established through some metric such as distance. Which points to connect can be determined by defining a sphere of radius ‘ɛ’ around each point and connecting that point to all other points that lie within the sphere. This again introduces a measure of arbitrariness, now in the choice of ɛ. For randomly distributed points, a small change in ɛ can quickly change the underlying topology because of statistical noise, and with it the Betti numbers of the structure. The challenge is to determine the value of ɛ that corresponds to a meaningful feature. A powerful technique for overcoming this problem is persistent homology [18], so termed because it is based on the idea that Betti numbers arising from the random distribution of data points and from noise cannot persist as ɛ is varied. The value of ɛ is gradually increased, and the numbers of different topological components that appear and disappear are tracked as ɛ changes. This process is called filtration. Only those topological invariants that represent true features in the underlying data remain unaffected by small changes in ɛ.
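The filtration idea can be sketched for the simplest invariant, β0. The example below is a simplified illustration under stated assumptions: it tracks connected components only (not tunnels or cavities), uses a union-find structure over the sorted pairwise distances, and runs on a small synthetic point set rather than APT data. The function names (`component_deaths`, `betti0`) are hypothetical.

```python
import math
from itertools import combinations

def component_deaths(points):
    """Return the sorted values of eps at which two components merge.

    Every point starts as its own component at eps = 0; each merge
    reduces beta_0 by one.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge joins two separate components
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths

def betti0(points, eps):
    """Number of connected components when points within eps are linked."""
    return len(points) - sum(1 for d in component_deaths(points) if d <= eps)

# Synthetic example: two tight clusters of three points, far apart.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
          (10, 0, 0), (11, 0, 0), (10, 1, 0)]

for eps in (0.5, 1.5, 8.0, 9.5):
    print(eps, betti0(points, eps))
```

In this synthetic case β0 = 2 holds over the wide interval of ɛ between the intra-cluster spacing and the inter-cluster gap, which is the sense in which the two clusters persist, while the value β0 = 6 at very small ɛ disappears almost immediately.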