Kernel low-rank approximations¶
Incomplete Cholesky Decomposition¶
Incomplete Cholesky Decomposition (ICD) learns a low-rank kernel matrix approximation with pivot selection based on a lower bound of the gain in approximation error.
- S. Fine and K. Scheinberg, "Efficient SVM Training Using Low-Rank Kernel Representations," J. Mach. Learn. Res., vol. 2, pp. 243-264, 2001.
Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\), find a low-rank approximation \(\mathbf{G} \in \mathbb{R}^{n \times k}\).
- class mklaren.projection.icd.ICD(rank, eps=1e-10, mode='norm_bound')¶

  Variables: G – (numpy.ndarray) Low-rank approximation.

- __init__(rank, eps=1e-10, mode='norm_bound')¶

  Parameters:
  - rank – (int) Maximal decomposition rank.
  - eps – (float) Tolerance lower bound.
- __call__(i, j)¶ Access portions of the combined kernel matrix at indices i, j.

  Parameters:
  - i – (int) or (numpy.ndarray) Index/indices of data point(s).
  - j – (int) or (numpy.ndarray) Index/indices of data point(s).

  Returns: (numpy.ndarray) Value of the kernel matrix for i, j.
- __getitem__(item)¶ Access portions of the kernel matrix generated by kernel.

  Parameters: item – (tuple) A pair of indices, lists of indices, (numpy.ndarray) or (slice) addressing portions of the kernel matrix.

  Returns: (numpy.ndarray) Value of the kernel matrix for item.
- fit(K)¶ Learn a low-rank kernel approximation.

  Parameters: K – (numpy.ndarray) or (Kinterface) The kernel to be approximated with G.
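The greedy pivoting described above can be sketched in plain NumPy. This is an illustrative incomplete Cholesky, not the mklaren implementation: at each step the pivot with the largest residual diagonal is chosen, since the residual diagonal lower-bounds the gain in approximation error.

```python
import numpy as np

def icd(K, rank, eps=1e-10):
    """Incomplete Cholesky: return G (n x k) with G @ G.T ~ K.

    Pivots are chosen greedily by the largest residual diagonal entry,
    which lower-bounds the gain in approximation error."""
    n = K.shape[0]
    G = np.zeros((n, rank))
    d = np.diag(K).astype(float).copy()        # residual diagonal
    for k in range(rank):
        p = int(np.argmax(d))                  # greedy pivot selection
        if d[p] < eps:                         # tolerance lower bound reached
            return G[:, :k]
        G[:, k] = (K[:, p] - G[:, :k] @ G[p, :k]) / np.sqrt(d[p])
        d -= G[:, k] ** 2                      # update residual diagonal
    return G

# Example: approximate a small RBF kernel matrix.
X = np.random.RandomState(0).randn(50, 3)
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
G = icd(K, rank=20)
err = np.linalg.norm(K - G @ G.T)
```

The approximation error is non-increasing in the rank, since each step subtracts a positive semidefinite term from the residual.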
Cholesky with Side Information¶
Cholesky with Side Information (CSI) learns a low-rank kernel matrix approximation with respect to regression targets or class labels.
- F. R. Bach and M. I. Jordan, "Predictive low-rank decomposition for kernel methods," Proceedings of the 22nd International Conference on Machine Learning - ICML 2005. ACM Press, New York, New York, USA, pp. 33-40, 2005.
Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\) and the targets \(\mathbf{y} \in \mathbb{R}^{n}\), find a low-rank approximation \(\mathbf{G} \in \mathbb{R}^{n \times k}\).
The implementation is based on the MATLAB/Octave code provided by the authors. It assumes an octave executable in the system path and the oct2py Python module.
- class mklaren.projection.csi.CSI(rank=40, kappa=0.99, centering=1, delta=40, eps=1e-20)¶

  Variables:
  - G – (numpy.ndarray) Low-rank approximation.
  - Q – (numpy.ndarray) QR decomposition of G (Q matrix).
  - R – (numpy.ndarray) QR decomposition of G (R matrix).
- __init__(rank=40, kappa=0.99, centering=1, delta=40, eps=1e-20)¶

  Parameters:
  - rank – (int) Maximal decomposition rank.
  - kappa – (float) Trade-off between approximation accuracy and predictive gain.
  - centering – (int) 0 or 1, centering of the kernel matrix.
  - delta – (int) Number of look-ahead columns.
  - eps – (float) Tolerance lower bound.
- __call__(i, j)¶ Access portions of the combined kernel matrix at indices i, j.

  Parameters:
  - i – (int) or (numpy.ndarray) Index/indices of data point(s).
  - j – (int) or (numpy.ndarray) Index/indices of data point(s).

  Returns: (numpy.ndarray) Value of the kernel matrix for i, j.
- __getitem__(item)¶ Access portions of the kernel matrix generated by kernel.

  Parameters: item – (tuple) A pair of indices, lists of indices, (numpy.ndarray) or (slice) addressing portions of the kernel matrix.

  Returns: (numpy.ndarray) Value of the kernel matrix for item.
- fit(K, y)¶ Learn a low-rank kernel approximation.

  Parameters:
  - K – (numpy.ndarray) or (Kinterface) The kernel to be approximated with G.
  - y – (numpy.ndarray) Class labels \(y_i \in \{-1, 1\}\) or regression targets.
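The Q and R attributes listed above are the QR decomposition of the fitted low-rank factor G. Their relationship can be checked with plain NumPy; here csi_G is a hypothetical stand-in for a fitted G, not output of the actual CSI class:

```python
import numpy as np

# Hypothetical stand-in for a fitted low-rank factor G (n x k), e.g. CSI(...).G.
rng = np.random.RandomState(0)
csi_G = rng.randn(100, 10)

# Reduced QR decomposition: Q (n x k) has orthonormal columns,
# R (k x k) is upper triangular, and Q @ R reconstructs G.
Q, R = np.linalg.qr(csi_G)

recon_ok = np.allclose(Q @ R, csi_G)       # Q @ R == G
ortho_ok = np.allclose(Q.T @ Q, np.eye(10))  # columns of Q are orthonormal
```

Keeping Q and R alongside G is convenient because least-squares problems against G reduce to cheap triangular solves with R.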
The Nystrom approximation¶
The Nystrom method learns a low-rank approximation of the kernel by evaluating the kernel function at a subset of data points.
- C. K. I. Williams and M. Seeger, "Using the Nystrom method to speed up kernel machines," in Proceedings of the 14th Annual Conference on Neural Information Processing Systems, 2001, no. EPFL-CONF-161322, pp. 682-688.
- A. El Alaoui and M. Mahoney, "Fast Randomized Kernel Methods With Statistical Guarantees," arXiv, 2014.
Given a kernel matrix \(\mathbf{K} \in \mathbb{R}^{n \times n}\) and an active set \(\mathcal{A}\), the kernel matrix is approximated as

\(\mathbf{\hat{K}} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}(\mathcal{A}, \mathcal{A})^{-1} \mathbf{K}(\mathcal{A}, :)\)

or in terms of \(\mathbf{G}\):

\(\mathbf{\hat{K}} = \mathbf{G}\mathbf{G}^T, \quad \mathbf{G} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}(\mathcal{A}, \mathcal{A})^{-1/2}\)
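A minimal NumPy sketch of this approximation (not the mklaren implementation); the active set is chosen uniformly at random here, and the inverse square root of the active-set kernel is taken via an eigendecomposition:

```python
import numpy as np

def nystrom(K, rank=10, random_state=None):
    """Nystrom approximation: K ~ G @ G.T using a random active set."""
    rng = np.random.RandomState(random_state)
    n = K.shape[0]
    active = rng.choice(n, size=rank, replace=False)
    K_XS = K[:, active]                   # kernel between all points and the active set
    K_SS = K[np.ix_(active, active)]      # kernel within the active set
    # Pseudo-inverse square root of K_SS via eigendecomposition,
    # clamping eigenvalues for numerical stability.
    w, V = np.linalg.eigh(K_SS)
    w = np.maximum(w, 1e-12)
    K_SS_isqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    return K_XS @ K_SS_isqrt              # G such that G @ G.T ~ K

# Example: approximate a small RBF kernel matrix.
X = np.random.RandomState(0).randn(60, 3)
K = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
G = nystrom(K, rank=15, random_state=0)
```

By construction the approximation is exact on the rows and columns indexed by the active set.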
- class mklaren.projection.nystrom.Nystrom(rank=10, random_state=None, lbd=0, verbose=False)¶

  Variables:
  - K – (Kinterface) or (numpy.ndarray) The kernel matrix.
  - active_set – The selected active set of indices.
  - K_SS_i – (numpy.ndarray) The inverse kernel of the active set.
  - K_XS – (numpy.ndarray) The kernel of the active set and the full data set.
  - G – (numpy.ndarray) Low-rank approximation.
- __init__(rank=10, random_state=None, lbd=0, verbose=False)¶

  Parameters:
  - rank – (int) Maximal decomposition rank.
  - lbd – (float) Regularization parameter (used in kernel ridge regression).
  - verbose – (bool) Set verbosity.
- fit(K, inxs=None)¶ Fit the approximation to the kernel function / matrix.

  Parameters:
  - K – (numpy.ndarray) or (Kinterface) The kernel to be approximated with G.
  - inxs – (list) A predefined active set. If None, it is selected randomly; otherwise the specified indices are used and override the rank setting.
- leverage_scores(K)¶ Compute leverage scores for matrix K and regularization parameter lbd.

  Parameters: K – (numpy.ndarray) or (Kinterface) The kernel to be approximated with G.

  Returns: (numpy.ndarray) A vector of leverage scores determining a sampling distribution.
- predict(X=None, inxs=None)¶ Predict values of the kernel for a test set.

  Parameters:
  - X – (numpy.ndarray) Test samples in the input space.
  - inxs – (list) The active set.

  Returns: (numpy.ndarray) Predicted values of the kernel function against all training inputs.