Optimal Bounds for $ell_p$ Sensitivity Sampling via $ell_2$ Augmentation

Data subsampling is one of the most natural methods to approximate a massively large data set by a small representative proxy. In particular, sensitivity sampling received a lot of attention, which samples points proportional to an individual importance measure called sensitivity. This framework reduces in very general settings the size of data to roughly the VC dimension $d$ times the total sensitivity $mathfrak S$ while providing strong $(1pmvarepsilon)$ guarantees on the quality of approximation. The recent work of Woodruff and Yasuda (2023) improved substantially over the general $tilde O(varepsilon^{-2}mathfrak Sd)$ bound for the important problem of $ell_p$ subspace embeddings to $tilde O(varepsilon^{-2}mathfrak S^{2/p})$ for $pin[1,2]$. Their result was subsumed by an earlier $tilde O(varepsilon^{-2}mathfrak Sd^{1-p/2})$ bound which was implicitly given in the work of Chen and Derezinski (2021)..
We show that their result is tight when sampling according to plain $ell_p$ sensitivities. We observe that by augmenting the $ell_p$ sensitivities by $ell_2$ sensitivities, we obtain better bounds improving over the aforementioned results to optimal linear $tilde O(varepsilon^{-2}(mathfrak S+d)) = tilde O(varepsilon^{-2}d)$ sampling complexity for all $p in [1,2]$. In particular, this resolves an open question of Woodruff and Yasuda (2023) in the affirmative for $p in [1,2]$ and brings sensitivity subsampling into the regime that was previously only known to be possible using Lewis weights (Cohen & Peng, 2015).
As an application of our main result, we also obtain an $tilde O(varepsilon^{-2}mu d)$ sensitivity sampling bound for logistic regression, where $mu$ is a natural complexity measure for this problem. This improves over the previous $tilde O(varepsilon^{-2}mu^2 d)$ bound of Mai et al. (2021) which was based on Lewis weights subsampling.

  • Published in:
    International Conference on Machine Learning
  • Type:
  • Authors:
    Munteanu, Alexander; Omlor, Simon
  • Year:

Citation information

Munteanu, Alexander; Omlor, Simon: Optimal Bounds for $ell_p$ Sensitivity Sampling via $ell_2$ Augmentation, International Conference on Machine Learning, 2024, Munteanu.Omlor.2024b,