Mklaren is a library containing Multiple kernel learning and Kernel low-rank approximation methods, written in Python.
Features:
import os
os.chdir("..")
import mklaren
import datasets
% matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(precision=3)
The input data is represented by a set pairs:
$$ \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), ..., (\mathbf{x}_n, y_n)\}$$
where $\mathbf{x}_i$ represent independent variables (covariates) and $y_i$ are real-valued regression targets. The data is split in 50% (training set) and 50% (test set) in this example.
from datasets.delve import load_boston
data = load_boston()
X = data["data"]
y = data["target"]
print X.shape
tr = range(0, len(X), 2)
te = range(1, len(X), 2)
X_tr, X_te = X[tr], X[te]
y_tr, y_te = y[tr], y[te]
The central data structure in the library is the Kinterface
(Kernel interface) class. It mimics a Numpy array storing a kernel matrix generated by the kernel
function (with parameters kernel_args
) on a finite data
sample. The idea behind Kinterface
is the dynamic calculation of the kernel matrix values, as the usual kernel matrix approximation algorithms do not require access to the full kernel matrix.
from mklaren.kernel.kinterface import Kinterface
from mklaren.kernel.kernel import rbf_kernel
K = Kinterface(data=X_tr, kernel=rbf_kernel, kernel_args={"sigma": 30})
K.shape
The Kinterface
is accessed in a same way as a standard Numpy array, via the addressing brackets. The brackets notation always assumes two arguments and addresses parts of the kernel matrix.
print "The value of K at row 42 and column 66: "
print K[42, 66]
print
print "A 3x3 submatrix of K"
print K[:3, :3]
print "Another 3x3 submatrix"
print K[[0, 10, 20], [1, 2, 5]]
print
print "The value of K of sample 0 against samples 0, 1, 2 ..., 9: "
print K[0, :10]
print
Similarly, the full kernel matrix can be generated.
plt.figure()
plt.imshow(K[:, :])
plt.show()
The low rank approximations to K
find a low rank matrix (termed $\mathbf{G}$) such that
$$ \mathbf{K} = \mathbf{G}\mathbf{G}^T $$.
Incomplete Cholesky decomposition finds $\mathbf{G}$ by greedily maximizing a lower-bound to the gain in approximation error (Fine, 2002).
from mklaren.projection.icd import ICD
model = ICD(rank=15)
model.fit(K)
G_icd = model.G
inxs = model.active_set_
print "G shape:", G_icd.shape, "Error:", np.linalg.norm(K[:, :]-G_icd.dot(G_icd.T))
The Nystrom approximation (Seeger, 2001) randomly selects a subset of active colums $\mathcal{A}$ and approximates
$$ \mathbf{\hat{K}} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}^{-1}(\mathcal{A}, \mathcal{A}) \mathbf{K}(:, \mathcal{A})^T $$
or in terms of $\mathbf{G}$:
$$ \mathbf{G} = \mathbf{K}(:, \mathcal{A}) \mathbf{K}^{-1/2}(\mathcal{A}, \mathcal{A}) $$
from mklaren.projection.nystrom import Nystrom
model = Nystrom(rank=15)
model.fit(K)
G_nyst = model.G
print "G shape:", G_nyst.shape, "Error:", np.linalg.norm(K[:, :]-G_nyst.dot(G_nyst.T))
The approximations can be compared also visually.
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10, 6))
ax[0].imshow(K[:, :])
ax[0].set_title("Original")
ax[1].imshow(G_icd.dot(G_icd.T))
ax[1].set_title("ICD")
ax[2].imshow(G_nyst.dot(G_nyst.T))
ax[2].set_title("Nystroem")
fig.tight_layout()
plt.show()
from mklaren.regression.ridge import RidgeLowRank
for method in "nystrom", "icd":
model = RidgeLowRank(method=method, rank=15)
model.fit([K], y_tr)
yp = model.predict([X_te])
rmse = np.var(y_te-yp)**0.5
print "Method:", method, "Test RMSE:", rmse
Multiple kernels can be passed to RidgeLowRank
model. This corresponds to an uniform combination of kernels in low-rank feature spaces.
from mklaren.kernel.kernel import linear_kernel, poly_kernel
K_exp = Kinterface(data=X_tr, kernel=rbf_kernel, kernel_args={"sigma": 30}) # RBF kernel
K_poly = Kinterface(data=X_tr, kernel=poly_kernel, kernel_args={"p": 3}) # polynomial kernel with degree=3
K_lin = Kinterface(data=X_tr, kernel=linear_kernel) # linear kernel
model = RidgeLowRank(method="nystrom", rank=5, lbd=1)
model.fit([K_exp, K_lin, K_poly], y_tr)
yp = model.predict([X_te, X_te, X_te]) # The features passed to each kernel
rmse = np.var(y_tr-yp)**0.5
print "Test RMSE:", rmse
Alternatively, multiple kernel learning methods can be used to learn an optimal combination of kernels. The mklaren
library implements methods based on centered kernel alignment (Cortes et. al, 2012). The following methods operate on full kernel matrices.
Given $p$ kernel matrices $\mathbf{K}_1, \mathbf{K}_2, ..., \mathbf{K}_p$, centered kernel alignment learns a linear combination of kernels resulting in a combined kernel matrix.
$$ \mathbf{K}_{c\mu} = \sum_{q=1}^p \mu_q \mathbf{K}_{cq} $$where $\mathbf{K}_{cq}$ is the centered kernel matrix: $$\mathbf{K}_{cq} = (\mathbf{I} - \frac{\mathbf{11}^1}{n})\mathbf{K}_q (\mathbf{I} - \frac{\mathbf{11}^1}{n})$$
The following methods perform a constrained optimization over $\mathbf{\mu} = (\mu_1, \mu_2, ... \mu_p)$ maximizing the centered alignment:
$$ A = \frac{<\mathbf{K}_{c\mu}, \mathbf{y}\mathbf{y}^T>_F} {n <\mathbf{K}_{c\mu}, \mathbf{K}_{c\mu}>_F} $$where $\mathbf{y}\mathbf{y}^T$ is the ideal kernel based on target vector $\mathbf{y}$ and $<., .>_F$ is a matrix inner product (the Frobenius product).
align. The align
method sets each $\mu_q$ independently such that
$$ \mu_q = <\mathbf{K}_{c\mu}, \mathbf{y}\mathbf{y}^T>_F^d $$
with a user-defined parameter $d$.
from mklaren.mkl.align import Align
model = Align()
model.fit([K_exp, K_lin, K_poly], y_tr)
model.mu # kernel weights
alignf. The alignf
method optimizes with respect to $\mathbf{\mu}$ to maximize centered alignment.
$$ \text{max}_{\mathbf{\mu}} \frac{<\mathbf{K}_{c\mu}, \mathbf{y}\mathbf{y}^T>_F} {n <\mathbf{K}_{c\mu}, \mathbf{K}_{c\mu}>_F} $$
such that (typ=linear
):
$$ \sum \mu_q = 1$$
or contraining sum of wieghts to a convex combination (typ=convex
):
$$ \sum \mu_q = 1 \text{ and } \mu_q > 0, \forall q$$
from mklaren.mkl.alignf import Alignf
model = Alignf(typ="linear")
model.fit([K_exp, K_lin, K_poly], y_tr)
model.mu # kernel weights
model = Alignf(typ="convex")
model.fit([K_exp, K_lin, K_poly], y_tr)
model.mu # kernel weights (convex combination)
The Kinterface
class provide a callable. A new kernel function is a linear combination of base kernels.
mu = model.mu
combined_kernel = lambda x, y: \
mu[0] * K_exp(x, y) + mu[1] * K_lin(x, y) + mu[2] * K_poly(x, y)
combined_kernel(np.array([0, 1, 2]), np.array([2, 1, 0]), )
The obtained callable can be used e.g. in sklearn
kernel methods.
from sklearn.kernel_ridge import KernelRidge
krr = KernelRidge(kernel=combined_kernel, alpha=10.0)
krr.fit(X_tr, y_tr)
yp = krr.predict(X_te)
print "Test RMSE:", np.var(y_te-yp)**0.5
The Mklaren algorithm (Multiple kernel learning with least-angle regression) perform simultaneous low-rank approximation of multiple kernel learning using least-angle regression (Stražar & Curk, 2016).
from mklaren.mkl.mklaren import Mklaren
K_exp = Kinterface(data=X_tr, kernel=rbf_kernel, kernel_args={"sigma": 30}) # RBF kernel
K_poly = Kinterface(data=X_tr, kernel=poly_kernel, kernel_args={"p": 3}) # polynomial kernel with degree=3
K_lin = Kinterface(data=X_tr, kernel=linear_kernel)
model = Mklaren(rank=15, lbd=1, delta=30)
model.fit([K_exp, K_lin, K_poly], y_tr)
yp = model.predict([X_te, X_te, X_te])
print "Test RMSE:", np.var(y_te-yp)**0.5
Kernel more related to targets are approximated with greater accuracy.
G_exp = model.data[0]["G"]
G_lin = model.data[1]["G"]
G_poly = model.data[2]["G"]
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(8, 4))
for i, (name, K, G) in enumerate(zip(["exp", "lin", "poly"],
[K_exp, K_lin, K_poly],
[G_exp, G_lin, G_poly])):
ax[0, i].set_title(name)
ax[0, i].imshow(K[:, :])
ax[1, i].imshow(G.dot(G.T))
ax[0, 0].set_ylabel("Original")
ax[1, 0].set_ylabel("Mklaren")
fig.tight_layout()
plt.show()