Theoretical and Practical Aspects of Finite Closure Systems for Mining and Learning

This thesis investigates the potential and limitations of different adaptations of half-space separations in ordinary Euclidean spaces, one of the most popular paradigms in machine learning, to abstract finite closure systems. Although closure systems and their corresponding closure operators have different applications in machine learning and data mining, this question has not been studied in these research fields. Furthermore, all machine learning and data mining applications of closure systems have so far been restricted to specific domains.
Using Jamison’s notion of half-spaces in abstract closure systems, for the adaptation we regard a closed set as a half-space if its complement is also closed. However, this generalization does not preserve the most important properties of ordinary half-space separations. In particular, there is no result corresponding to Minkowski’s hyperplane separation theorem for Euclidean spaces because disjoint closed sets are not necessarily separable by half-spaces in abstract closure systems.We show that deciding the half-space separation (HSS) problem, that is, the problem whether or not two disjoint closed sets are half-space separable in an abstract closure system is NP-hard in general. Motivated by this negative complexity result, in a first step we therefore focus on the class of closure systems in which the HSS problem has a solution for any two disjoint closed sets.This kind of closure systems will be referred to as the Kakutani closure systems.On the one hand, for the case that we have access to the abstract finite closure system via the underlying closure operator only, we show the negative worst-case result that one needs exponentially many closure operator calls to decide whether the closure system is Kakutani, or not. On the other hand, we give a forbidden minor characterization of Kakutani geodesic closure systems over graphs, which, for example, implies that disjoint closed sets in outerplanar graphs are always separable via half-spaces. As a second direction to overcome the above negative result, we relax the HSS problem to finding two inclusion maximal disjoint closed sets and present a simple greedy algorithm for this problem. It turns out that this simple algorithm has several attractive properties. In particular, it is optimal with respect to the number of closure operator calls, provides an algorithmic characterization of Kakutani closure systems, and can be extended to returning maximum margin separations by utilizing the notion of monotone linkage functions.
Regarding some practical aspects of the generic theory developed in the thesis, we demonstrate on real-world networks up to more than 100,000,000 edges that their geodesic cores can closely be approximated. As a practical application, we empirically show that the Tukey depth of the vertices in a graph can closely be approximated by our greedy algorithm.

  • Type:
    Phdthesis
  • Authors:
    Seiffarth, Florian
  • Year:
    2023

Citation information

Seiffarth, Florian: Theoretical and Practical Aspects of Finite Closure Systems for Mining and Learning, 2023, https://bonndoc.ulb.uni-bonn.de/xmlui/handle/20.500.11811/11083, Seiffarth.2023a,