Methodology

Standard SCMS Algorithm on the Euclidean Space \(\mathbb{R}^D\)

The subspace constrained mean shift (SCMS) algorithm [2] is designed to estimate the density ridges, the lower dimensional structures at which the density peak. Given a (smooth) density function \(p\) supported on \(\mathbb{R}^D\), its \(d\)-dimensional density ridge is defined as [3]

\[R_d(p) = \left\{\mathbf{x} \in \mathbb{R}^D: V_E(\mathbf{x})^T \nabla p(\mathbf{x})=\mathbf{0}, \lambda_{d+1}(\mathbf{x}) < 0 \right\},\]

where \(\nabla p(\mathbf{x})\) is the gradient of \(p\) and \(V_E(\mathbf{x})=\left[\mathbf{v}_{d+1}(\mathbf{x}),..., \mathbf{v}_D(\mathbf{x})\right] \in \mathbb{R}^{D\times (D-d)}\) consists of the last \((D-d)\) eigenvectors of the Hessian \(\nabla\nabla p(\mathbf{x})\) associated with a descending order of eigenvalues \(\lambda_{d+1}(\mathbf{x}) \geq \cdots \geq \lambda_D(\mathbf{x})\). Under the scenario of cosmic filament detection in the flat Euclidean space \(\mathbb{R}^D\), the one-dimensional density ridge \(R_1(p)\) serve as an ideal choice of the theoretical cosmic filament model.

To estimate the theoretical density ridge \(R_d(p)\) in practice, we leverage the plug-in estimator \(R_d(\widehat{p})\) derived from the kernel density estimator \(\widehat{p}\) (KDE) as:

\[\widehat{p}(\mathbf{x}) = \frac{1}{nb^D} \sum_{i=1}^n K\left(\left\|\frac{\mathbf{x}-\mathbf{X}_i}{b} \right\|_2^2 \right),\]

where \(\{\mathbf{X}_1,...,\mathbf{X}_n\} \subset \mathbb{R}^D\) is a random sample from \(p\), \(\|\cdot\|_2\) is the usual Euclidean norm, \(K:\mathbb{R} \to \mathbb{R}^+\) is the kernel function (e.g., the Gaussian kernel \(K(r)=\frac{1}{(2\pi)^{D/2}} \exp\left(\frac{r}{2} \right)\)), and \(b\) is the smoothing bandwidth parameter. The standard SCMS algorithm in \(\mathbb{R}^D\) is then applied to an initial mesh of points and iterates the following formula for \(t=0,1,...\):

\[\mathbf{x}^{(t+1)} \gets \mathbf{x}^{(t)} + \widehat{V}_E(\mathbf{x}^{(t)}) \widehat{V}_E(\mathbf{x}^{(t)})^T \left[ \frac{\sum_{i=1}^n \mathbf{X}_i K'\left(\left\|\frac{\mathbf{x}^{(t)}-\mathbf{X}_i}{b}\right\|_2^2 \right)}{\sum_{i=1}^n K'\left(\left\|\frac{\mathbf{x}^{(t)}-\mathbf{X}_i}{b}\right\|_2^2 \right)} - \mathbf{x}^{(t)} \right]\]

until convergence on each initial point \(\mathbf{x}^{(0)}\), where \(\widehat{V}_E=\left[\widehat{\mathbf{v}}_{d+1}(\mathbf{x}),..., \widehat{\mathbf{v}}_D(\mathbf{x})\right] \in \mathbb{R}^{D\times (D-d)}\) has its columns as the last \((D-d)\) eigenvectors of the estimated Hessian \(\nabla\nabla \hat{p}(\mathbf{x})\). The set of converged points is a discrete sample from the estimated density ridge \(R_d(\widehat{p})\).

Despite its fast convergence and wide applications in identifying density ridges, the standard SCMS algorithm fails to take into account any non-linear curvature in the data space; see Figure 1 in [1] and Appendix B in [5] for the spherical data case.

Directional SCMS (DirSCMS) Algorithm on the Unit Sphere \(\mathbb{S}^q\)

Our DirSCMS algorithm [5] takes a discrete collection of observations \(\{\mathbf{X}_1,...,\mathbf{X}_n\}\) on the unit (hyper)sphere \(\mathbb{S}^q=\left\{\mathbf{x}\in \mathbb{R}^{q+1}:\|\mathbf{x}\|=1 \right\}\) and estimates the density ridge of the underlying directional density function \(f:\mathbb{S}^q \to \mathbb{R}\). The definition of the order-\(d\) directional density ridge of \(f\) is generalized from the above Euclidean counterpart as:

\[R_d(f) = \left\{\mathbf{x} \in \mathbb{S}^q: V_D(\mathbf{x})^T \mathtt{grad} f(\mathbf{x})=\mathbf{0}, \lambda_{d+1}(\mathbf{x}) < 0 \right\},\]

where \(\mathtt{grad} f(\mathbf{x})\) is the Riemannian gradient of \(f\) and \(V_D(\mathbf{x})=\left[\mathbf{v}_{d+1}(\mathbf{x}),..., \mathbf{v}_q(\mathbf{x})\right] \in \mathbb{R}^{(q+1)\times (q-d)}\) consists of the last \((q-d)\) eigenvectors of the Riemannian Hessian \(\mathcal{H} f(\mathbf{x})\) within the tangent space of \(\mathbb{S}^q\) at \(\mathbf{x}\) associated with a descending order of eigenvalues \(\lambda_{d+1}(\mathbf{x}) \geq \cdots \geq \lambda_q(\mathbf{x})\). Notice that \(\mathcal{H} f(\mathbf{x})\) has \(q\) orthonormal eigenvectors spanning the tangent space of \(\mathbb{S}^q\) at \(\mathbf{x}\) and another unit eigenvector \(\mathbf{x}\) orthogonal to the tangent space.

Figure 1: An illustration of the directional density ridge on \(\mathbb{S}^2\), where the contour lines indicate the estimated density function by directional KDE.

The directional density ridge \(R_d(f)\) is indeed adaptive to the spherical geometry of \(\mathbb{S}^q\) because it is defined through the Riemannian gradient and Hessian of the directional density \(f\) within the tangent space of \(\mathbb{S}^q\). To compute these derivative quantities, one can extend the domain of \(f\) from \(\mathbb{S}^q\) to its ambient Euclidean space \(\mathbb{R}^{q+1}\setminus\{\mathbf{0}\}\). Then, the Riemannian gradient and Hessian on \(\mathbb{S}^q\) are connected with the total gradient \(\nabla f(\mathbf{x})\) and Hessian \(\mathcal{H} f(\mathbf{x})\) in \(\mathbb{R}^{q+1}\) as:

\[\mathtt{grad} f(\mathbf{x}) = (\mathbf{I}_{q+1} -\mathbf{x}\mathbf{x}^T) \nabla f(\mathbf{x}),\]

\[\mathcal{H} f(\mathbf{x}) = (\mathbf{I}_{q+1} -\mathbf{x}\mathbf{x}^T) \left[\nabla\nabla f(\mathbf{x}) - \nabla f(\mathbf{x})^T \mathbf{x} \cdot \mathbf{I}_{q+1} \right] (\mathbf{I}_{q+1} -\mathbf{x}\mathbf{x}^T),\]

where \(\mathbf{I}_{q+1}\in \mathbb{R}^{(q+1)\times (q+1)}\) is the identity matrix. In the application of modeling cosmic filaments on the celestial sphere, one can take \(q=2\) and \(d=1\) in the above definition.

To identify the directional density ridge \(R_d(f)\) from \(\{\mathbf{X}_1,...,\mathbf{X}_n\} \subset \mathbb{S}^q\), we first estimate the directional density \(f\) via the directional KDE ([6], [7]) as:

\[\widehat{f}_b(\mathbf{x}) = \frac{C_L(b)}{n} \sum_{i=1}^n L\left(\frac{1-\mathbf{x}^T\mathbf{X}_i}{b^2} \right),\]

where \(L(\cdot)\) is the directional kernel (e.g., the von Mises kernel \(L(r)=e^{-r}\)), \(b\) is the smoothing bandwidth parameter, and \(C_L(b)\) is the normalizing constant ensuring that \(\widehat{f}_b\) is a valid density on \(\mathbb{S}^q\). Again, the estimated density ridge \(R_d(\widehat{f}_b)\) based on the directional KDE \(\widehat{f}_b\) is a statistically consistent estimator of the theoretical density ridge \(R_d(f)\) and can be practically identified by our DirSCMS algorithm with its iterative formula as:

\[\mathbf{x}^{(t+1)} \gets \mathbf{x}^{(t)} - \widehat{V}_D(\mathbf{x}^{(t)}) \widehat{V}_D(\mathbf{x}^{(t)})^T \left[\frac{\sum_{i=1}^n \mathbf{X}_i L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b^2} \right)}{\left\|\sum_{i=1}^n \mathbf{X}_i L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b^2} \right) \right\|_2} \right]\]

\[\text{ and } \quad \mathbf{x}^{(t+1)} \gets \frac{\mathbf{x}^{(t+1)}}{\left\| \mathbf{x}^{(t+1)} \right\|_2}\]

for \(t=0,1,...\), where \(\widehat{V}_D(\mathbf{x}) = \left[\widehat{\mathbf{v}}_{d+1}(\mathbf{x}),..., \widehat{\mathbf{v}}_q(\mathbf{x}) \right] \in \mathbb{R}^{(q+1)\times (q-d)}\) has its columns as the last \((q-d)\) eigenvectors of the estimated Riemannian Hessian \(\mathcal{H} \widehat{f}_b(\mathbf{x})\) within the tangent space of \(\mathbb{S}^q\) at \(\mathbf{x}\). Notice that we leverage the normalized (total) gradient

\[\frac{\nabla \hat{f}_b(\mathbf{x}^{(t)})}{\left\|\nabla \hat{f}_b(\mathbf{x}^{(t)}) \right\|_2} = \frac{\sum_{i=1}^n \mathbf{X}_i L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b^2} \right)}{\left\|\sum_{i=1}^n \mathbf{X}_i L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b^2} \right) \right\|_2}\]

in the design of our DirSCMS algorithm in pursuit of a faster convergence rate [5].

Directional-linear SCMS (DirLinSCMS) Algorithm on the 3D Light Cone \(\mathbb{S}^2\times \mathbb{R}\)

Our DirLinSCMS algorithm [8] makes a further generalization of the above DirSCMS algorithm and addresses the density ridge estimation problem on a directional-linear product space \(\mathbb{S}^q\times \mathbb{R}^D\). (The implementation of the DirLinSCMS algorithm in our sconce-scms library does not restrict to the 3D light cone but accommodates this general form \(\mathbb{S}^q\times \mathbb{R}^D\) of the directional-linear space.) We assume that its input data comprise independent and identically distributed (i.i.d.) observations \((\mathbf{X}_i,\mathbf{Z}_i) \in \mathbb{S}^q\times \mathbb{R}^D, i=1,...,n\) sampled from a directional-linear density \(f_{dl}(\mathbf{x},\mathbf{z})\). The theoretical density ridge is defined similarly as:

\[R_d(f_{dl}) = \left\{(\mathbf{x},\mathbf{z}) \in \mathbb{S}^q \times \mathbb{R}^D: V_{dl}(\mathbf{x},\mathbf{z})^T \mathtt{grad} f_{dl}(\mathbf{x},\mathbf{z})=\mathbf{0}, \lambda_{d+1}(\mathbf{x},\mathbf{z}) < 0 \right\},\]

where \(\mathtt{grad} f_{dl}(\mathbf{x},\mathbf{z})\) is the Riemannian gradient of \(f_{dl}\) and \(V_{dl}(\mathbf{x},\mathbf{z})=\left[\mathbf{v}_{d+1}(\mathbf{x},\mathbf{z}),..., \mathbf{v}_{q+D}(\mathbf{x},\mathbf{z})\right] \in \mathbb{R}^{(q+1+D)\times (q+D-d)}\) consists of the last \((q+D-d)\) eigenvectors of the Riemannian Hessian \(\mathcal{H} f_{dl}(\mathbf{x},\mathbf{z})\) within the tangent space of \(\mathbb{S}^q \times \mathbb{R}^D\) at \((\mathbf{x},\mathbf{z})\) (equivalently, the orthogonal space of \((\mathbf{x},\mathbf{0})\) in \(\mathbb{R}^{q+1+D}\)) associated with a descending order of eigenvalues \(\lambda_{d+1}(\mathbf{x},\mathbf{z}) \geq \cdots \geq \lambda_{q+D}(\mathbf{x},\mathbf{z})\). The Riemannian gradient and Hessian of \(f_{dl}\) can also be expressed in terms of its total gradient and Hessian in the ambient Euclidean space \(\mathbb{R}^{q+1+D}\); see, e.g., Appendix A in [1].

Analogously, the underlying density \(f_{dl}\) and its density ridge can be estimated by directional-linear KDE [9] as:

\[\widehat{f}_{dl}(\mathbf{x},\mathbf{z}) = \frac{C_L(b_1)}{nb_2^D} \sum_{i=1}^n L\left(\frac{1-\mathbf{X}_i^T\mathbf{x}}{b_1^2} \right) K\left(\left\| \frac{\mathbf{z}-\mathbf{Z}_i}{b_2} \right\|_2^2 \right),\]

where \(L(\cdot)\) and \(K(\cdot)\) are the directional and linear kernel functions while \(b_1,b_2\) are the smoothing bandwidth parameters for directional and linear components, respectively. The challenge lies in the formulation of the DirLinSCMS algorithm, in that a naive generalization from the mean shift algorithm to its SCMS counterpart as how the standard SCMS and DirSCMS methods use will lead to a biased estimate of \(R_d(\widehat{f}_{dl})\); see Section 4 in [8]. Fortunately, under the applications of the von Mises (directional) kernel and Gaussian (linear) kernel, we are able to formulate the correct SCMS iterative formula for \((\mathbf{x}^{(t)},\mathbf{z}^{(t)}) \in \mathbb{S}^q\times \mathbb{R}^D\) with \(t=0,1,...\) as:

\[\begin{split} \begin{pmatrix} \mathbf{x}^{(t+1)}\\ \mathbf{z}^{(t+1)} \end{pmatrix} \gets \begin{pmatrix} \mathbf{x}^{(t)}\\ \mathbf{z}^{(t)} \end{pmatrix} +\eta \cdot \widehat{V}_{dl}\left(\mathbf{x}^{(t)},\mathbf{z}^{(t)}\right) \widehat{V}_{dl}\left(\mathbf{x}^{(t)},\mathbf{z}^{(t)}\right)^T \mathbf{H}\cdot \begin{pmatrix} \frac{\sum\limits_{i=1}^n \mathbf{X}_i\cdot L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b_1^2} \right) K\left(\left\|\frac{\mathbf{z}^{(t)}-\mathbf{Z}_i}{b_2} \right\|_2^2 \right) }{\sum\limits_{i=1}^n L'\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b_1^2} \right) K\left(\left\|\frac{\mathbf{z}^{(t)}-\mathbf{Z}_i}{b_2} \right\|_2^2 \right)} -\mathbf{x}^{(t)}\\ \frac{\sum\limits_{i=1}^n \mathbf{Z}_i \cdot L\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b_1^2} \right) K'\left(\left\|\frac{\mathbf{z}^{(t)}-\mathbf{Z}_i}{b_2} \right\|_2^2 \right) }{\sum\limits_{i=1}^n L\left(\frac{1-\mathbf{X}_i^T\mathbf{x}^{(t)}}{b_1^2} \right) K'\left(\left\|\frac{\mathbf{z}^{(t)}-\mathbf{Z}_i}{b_2} \right\|_2^2 \right)} - \mathbf{z}^{(t)} \end{pmatrix}\end{split}\]

\[\text{ and } \quad \mathbf{x}^{(t+1)} \gets \frac{\mathbf{x}^{(t+1)}}{\left\| \mathbf{x}^{(t+1)} \right\|_2},\]

where \(\widehat{V}_{dl}(\mathbf{x},\mathbf{z})=\left[\widehat{\mathbf{v}}_{d+1}(\mathbf{x},\mathbf{z}),..., \widehat{\mathbf{v}}_{q+D}(\mathbf{x},\mathbf{z})\right] \in \mathbb{R}^{(q+1+D)\times (q+D-d)}\) has its columns as the last \((q+D-d)\) eigenvectors of the estimated Riemannian Hessian \(\mathcal{H} \widehat{f}_{dl}(\mathbf{x},\mathbf{z})\) within the tangent space of \(\mathbb{S}^q\times \mathbb{R}^D\) at \((\mathbf{x},\mathbf{z})\) and \(\mathbf{H}=\mathtt{Diag}\left(\underbrace{\frac{1}{b_1^2},....,\frac{1}{b_1^2}}_{(q+1) \text{ terms}}, \underbrace{\frac{1}{b_2^2},...,\frac{1}{b_2^2}}_{D \text{ terms}} \right) \in \mathbb{R}^{(q+1+D)\times (q+1+D)}\) is a diagonal bandwidth matrix.

One may notice that our DirLinSCMS algorithm is neither a simple generalization from the standard SCMS and our previously proposed DirSCMS algorithms nor a direct combination of them. Instead, our DirLinSCMS algorithm exhibits two major differences in its iterative formula. First, the DirLinSCMS algorithm scales the mean shift vector with an extra bandwidth matrix so as to ensure that it follows the correct projected gradient direction at each iterative step. Second, it inevitably requires a step size parameter \(\eta\) to control its convergence. As a theretically motivated and practically effective guideline, we suggest taking the step size as:

\[\eta = \min\left\{b_1\cdot b_2, 1\right\},\]

where the upper bound 1 is set to prevent the algorithm from overshooting the estimated ridge region when the smoothing bandwidth parameters are chosen to be large.

Methodology

Standard SCMS Algorithm on the Euclidean Space \(\mathbb{R}^D\)

Directional SCMS (DirSCMS) Algorithm on the Unit Sphere \(\mathbb{S}^q\)

Directional-linear SCMS (DirLinSCMS) Algorithm on the 3D Light Cone \(\mathbb{S}^2\times \mathbb{R}\)

References