Kernel low-rank approximations

Incomplete Cholesky Decomposition

Incomplete Cholesky Decomposition (ICD) learns a low-rank kernel matrix approximation, with pivots selected by a lower bound on the gain in approximation error.

  1. S. Fine and K. Scheinberg, “Efficient SVM Training Using Low-Rank Kernel Representations,” J. Mach. Learn. Res., vol. 2, pp. 243-264, 2001.

Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\), find a low-rank approximation \(\mathbf{G} \in \mathbb{R}^{n \times k}\).

\[\hat{\mathbf{K}} = \mathbf{G}\mathbf{G}^T\]
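The factorization can be sketched in plain NumPy. This is a minimal illustration of pivoted incomplete Cholesky, not the mklaren implementation: the pivot at each step is the largest entry of the residual diagonal, which bounds the remaining approximation error.

```python
import numpy as np

def icd(K, rank, eps=1e-10):
    """Pivoted incomplete Cholesky: return G (n x k) with K ~= G G^T.

    A minimal sketch of the scheme; the mklaren ICD class adds
    pivot-selection modes on top of this basic iteration.
    """
    n = K.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(K).astype(float).copy()   # residual diagonal of K - G G^T
    for k in range(rank):
        i = int(np.argmax(d))             # pivot with the largest residual
        if d[i] < eps:                    # tolerance reached: stop early
            return G[:, :k]
        G[:, k] = (K[:, i] - G[:, :k] @ G[i, :k]) / np.sqrt(d[i])
        d -= G[:, k] ** 2                 # update the residual diagonal
        np.maximum(d, 0.0, out=d)         # guard against round-off
    return G
```

For a kernel matrix of true rank \(r < k\), the residual diagonal vanishes after \(r\) steps and the iteration stops early with an exact factorization.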
class mklaren.projection.icd.ICD(rank, eps=1e-10, mode='norm_bound')
Variables:G – (numpy.ndarray) Low-rank approximation.
__init__(rank, eps=1e-10, mode='norm_bound')
Parameters:
  • rank – (int) Maximal decomposition rank.
  • eps – (float) Tolerance lower bound.
__call__(i, j)

Access portions of the combined kernel matrix at indices i, j.

Parameters:
  • i – (int) or (numpy.ndarray) Index/indices of data point(s).
  • j – (int) or (numpy.ndarray) Index/indices of data point(s).
Returns:

(numpy.ndarray) Value of the kernel matrix for i, j.

__getitem__(item)

Access portions of the kernel matrix generated by kernel.

Parameters:item – (tuple) Pair of indices, lists of indices, (numpy.ndarray) or (slice) objects addressing portions of the kernel matrix.
Returns:(numpy.ndarray) Value of the kernel matrix for item.
fit(K)

Learn a low-rank kernel approximation.

Parameters:K – (numpy.ndarray) or (Kinterface). The kernel to be approximated with G.

Cholesky with Side Information

Cholesky with Side Information (CSI) learns a low-rank kernel matrix approximation with respect to regression targets or class labels.

    1. F. R. Bach and M. I. Jordan, “Predictive low-rank decomposition for kernel methods,” in Proceedings of the 22nd International Conference on Machine Learning (ICML), ACM Press, New York, NY, USA, pp. 33-40, 2005.

Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\) and the targets \(\mathbf{y} \in \mathbb{R}^{n}\), find a low-rank approximation \(\mathbf{G} \in \mathbb{R}^{n \times k}\).

\[\hat{\mathbf{K}} = \mathbf{G}\mathbf{G}^T\]

The implementation is based on the MATLAB/Octave code provided by the authors. It assumes an octave executable on the system path and the oct2py Python module.
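The intuition behind CSI can be sketched with a greedy pivot rule: score each candidate Cholesky column by a kappa-weighted mix of approximation gain and predictive gain on the targets. This is a simplified illustration only, not Bach & Jordan's algorithm, which scores pivots cheaply via look-ahead estimates rather than exhaustive evaluation.

```python
import numpy as np

def csi_greedy(K, y, rank, kappa=0.99, eps=1e-10):
    """Greedy pivoting guided by side information (simplified sketch).

    Each candidate column is scored by kappa * approximation gain
    (trace of the residual it removes) plus (1 - kappa) * predictive
    gain (variance of the target residual it explains).
    """
    n = K.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(K).astype(float).copy()       # residual diagonal
    for k in range(rank):
        # residual of y after projection onto the current columns of G
        if k > 0:
            coef, *_ = np.linalg.lstsq(G[:, :k], y, rcond=None)
            y_res = y - G[:, :k] @ coef
        else:
            y_res = y
        best, best_score = -1, -np.inf
        for i in range(n):
            if d[i] < eps:
                continue
            g = (K[:, i] - G[:, :k] @ G[i, :k]) / np.sqrt(d[i])
            gain_approx = g @ g                     # drop in tr(K - G G^T)
            gain_pred = (g @ y_res) ** 2 / (g @ g)  # target variance explained
            score = kappa * gain_approx + (1.0 - kappa) * gain_pred
            if score > best_score:
                best, best_score = i, score
        if best < 0:                                # all residuals below eps
            return G[:, :k]
        G[:, k] = (K[:, best] - G[:, :k] @ G[best, :k]) / np.sqrt(d[best])
        d -= G[:, k] ** 2
        np.maximum(d, 0.0, out=d)
    return G
```

With kappa close to 1 the rule behaves like plain incomplete Cholesky; smaller values bias pivot selection toward columns that explain the targets.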

class mklaren.projection.csi.CSI(rank=40, kappa=0.99, centering=1, delta=40, eps=1e-20)
Variables:
  • G – (numpy.ndarray) Low-rank approximation.
  • Q – (numpy.ndarray) QR decomposition of G (Q matrix).
  • R – (numpy.ndarray) QR decomposition of G (R matrix).
__init__(rank=40, kappa=0.99, centering=1, delta=40, eps=1e-20)
Parameters:
  • rank – (int) Maximal decomposition rank.
  • kappa – (float) Trade-off accuracy vs. predictive gain.
  • centering – (int) 0 or 1, Centering of the kernel matrix.
  • delta – (int) Number of look-ahead columns.
  • eps – (float) Tolerance lower bound.
__call__(i, j)

Access portions of the combined kernel matrix at indices i, j.

Parameters:
  • i – (int) or (numpy.ndarray) Index/indices of data point(s).
  • j – (int) or (numpy.ndarray) Index/indices of data point(s).
Returns:

(numpy.ndarray) Value of the kernel matrix for i, j.

__getitem__(item)

Access portions of the kernel matrix generated by kernel.

Parameters:item – (tuple) Pair of indices, lists of indices, (numpy.ndarray) or (slice) objects addressing portions of the kernel matrix.
Returns:(numpy.ndarray) Value of the kernel matrix for item.
fit(K, y)

Learn a low-rank kernel approximation.

Parameters:
  • K – (numpy.ndarray) or (Kinterface). The kernel to be approximated with G.
  • y – (numpy.ndarray) Class labels \(y_i \in \{-1, 1\}\) or regression targets.

The Nystrom approximation

The Nystrom method learns a low-rank approximation of the kernel by evaluating the kernel function at a subset of data points.

  1. C. K. I. Williams and M. Seeger, “Using the Nystrom method to speed up kernel machines,” in Proceedings of the 14th Annual Conference on Neural Information Processing Systems, 2001, pp. 682-688.
  2. A. El Alaoui and M. W. Mahoney, “Fast Randomized Kernel Methods With Statistical Guarantees,” arXiv, 2014.

Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\) and an active set \(\mathcal{A}\), the kernel matrix is approximated as

\[\mathbf{\hat{K}} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}^{-1}(\mathcal{A}, \mathcal{A}) \mathbf{K}(:, \mathcal{A})^T\]

or in terms of \(\mathbf{G}\):

\[\mathbf{G} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}^{-1/2}(\mathcal{A}, \mathcal{A})\]
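The two formulas above can be sketched in NumPy for a fixed active set. This is a minimal illustration, not the mklaren implementation; the inverse square root of \(\mathbf{K}(\mathcal{A}, \mathcal{A})\) is taken via its eigendecomposition.

```python
import numpy as np

def nystrom(K, active):
    """Nystrom approximation for a fixed active set (minimal sketch).

    Returns G = K(:, A) K(A, A)^{-1/2}, so that G G^T equals
    K(:, A) K(A, A)^{-1} K(:, A)^T.
    """
    C = K[:, active]                      # K(:, A), shape (n, k)
    W = K[np.ix_(active, active)]         # K(A, A), shape (k, k)
    w, V = np.linalg.eigh(W)              # W is symmetric PSD
    w = np.maximum(w, 1e-12)              # guard against round-off negatives
    W_inv_sqrt = (V / np.sqrt(w)) @ V.T   # W^{-1/2}
    return C @ W_inv_sqrt
```

When the active set spans the range of a rank-\(k\) kernel matrix, the approximation is exact; in general it is exact on the rows and columns indexed by \(\mathcal{A}\).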
class mklaren.projection.nystrom.Nystrom(rank=10, random_state=None, lbd=0, verbose=False)
Variables:
  • K – (Kinterface) or (numpy.ndarray) the kernel matrix.
  • active_set – The selected active set of indices.
  • K_SS_i – (numpy.ndarray) the inverse kernel of the active set.
  • K_XS – (numpy.ndarray) the kernel of the active set and the full data set.
  • G – (numpy.ndarray) Low-rank approximation.
__init__(rank=10, random_state=None, lbd=0, verbose=False)
Parameters:
  • rank – (int) Maximal decomposition rank.
  • random_state – Random state for the random selection of the active set.
  • lbd – (float) Regularization parameter (to be used in Kernel Ridge Regression).
  • verbose – (bool) Set verbosity.

fit(K, inxs=None)

Fit approximation to the kernel function / matrix.

Parameters:
  • K – (numpy.ndarray) or (Kinterface). The kernel to be approximated with G.
  • inxs – (list) A predefined active set. If None, it is selected randomly; otherwise the specified indices are used, overriding the rank setting.
leverage_scores(K)

Compute leverage scores for matrix K and regularization parameter lbd.

Parameters:K – (numpy.ndarray) or (Kinterface). The kernel to be approximated with G.
Returns:(numpy.ndarray) a vector of leverage scores to determine a sampling distribution.
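One common definition of ridge leverage scores, following Alaoui & Mahoney, can be sketched as below. Whether mklaren scales the regularization parameter by \(n\) in exactly this way is an assumption of this sketch, not documented behavior.

```python
import numpy as np

def ridge_leverage_scores(K, lbd):
    """Ridge leverage scores: l_i = (K (K + n * lbd * I)^{-1})_{ii}.

    Sampling columns proportionally to these scores yields statistical
    guarantees for randomized kernel ridge regression (Alaoui & Mahoney).
    The n-scaling of lbd is an assumption of this sketch.
    """
    n = K.shape[0]
    S = K @ np.linalg.inv(K + n * lbd * np.eye(n))
    return np.diag(S).copy()
```

Each score lies in \([0, 1)\) for a PSD kernel and \(lbd > 0\); their sum is the effective dimension of the regularized problem.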
predict(X=None, inxs=None)

Predict values of the kernel for a test set.

Parameters:
  • X – (numpy.ndarray) Test samples in the input space.
  • inxs – (list) The active set.
Returns:

(numpy.ndarray) Predicted values of the kernel function against all training inputs.
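The standard out-of-sample Nystrom extension evaluates the approximate kernel between test points and the training set through the active set. Whether Nystrom.predict computes exactly this expression is an assumption of this sketch.

```python
import numpy as np

def nystrom_predict(k_test_active, K_active, K_active_train):
    """Out-of-sample Nystrom extension (standard formula, assumed here):

        K_hat(test, train) = k(test, A) K(A, A)^{-1} K(A, train)

    k_test_active  -- kernel between test points and the active set, (m, k)
    K_active       -- kernel of the active set with itself, (k, k)
    K_active_train -- kernel of the active set with the training set, (k, n)
    """
    return k_test_active @ np.linalg.solve(K_active, K_active_train)
```

Predicting for the active points themselves reproduces their rows of the kernel matrix exactly, since \(\mathbf{K}(\mathcal{A}, \mathcal{A}) \mathbf{K}^{-1}(\mathcal{A}, \mathcal{A}) = \mathbf{I}\).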