GPy.kern.src package¶

Subpackages¶

GPy.kern.src.psi_comp package

Submodules¶

GPy.kern.src.ODE_UY module¶

class ODE_UY(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, active_dims=None, name='ode_uy')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_UYC module¶

class ODE_UYC(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, ubias=1.0, active_dims=None, name='ode_uyc')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_st module¶

class ODE_st(input_dim, a=1.0, b=1.0, c=1.0, variance_Yx=3.0, variance_Yt=1.5, lengthscale_Yx=1.5, lengthscale_Yt=1.5, active_dims=None, name='ode_st')[source]¶

Bases: GPy.kern.src.kern.Kern

Kernel resulting from a first order ODE with an OU driving GP.

Parameters:

input_dim (int) – the number of input dimensions, has to be equal to one
varianceU (float) – variance of the driving GP
lengthscaleU (float) – lengthscale of the driving GP (sqrt(3)/lengthscaleU)
varianceY (float) – ‘variance’ of the transfer function
lengthscaleY (float) – ‘lengthscale’ of the transfer function (1/lengthscaleY)
Return type:	kernel object

K(X, X2=None)[source]¶: Compute the covariance matrix between X and X2.

Kdiag(X)[source]¶: Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.ODE_t module¶

class ODE_t(input_dim, a=1.0, c=1.0, variance_Yt=3.0, lengthscale_Yt=1.5, ubias=1.0, active_dims=None, name='ode_st')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶: Compute the covariance matrix between X and X2.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.add module¶

class Add(subkerns, name='sum')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

Add the given list of kernels together and propagate gradients through.

This kernel will take over the active dims of its subkernels passed in.

NOTE: The subkernels will be copies of the original kernels, to prevent unexpected behavior.

K(X, X2=None, which_parts=None)[source]¶: Add all kernels together. If a list of parts (of this kernel!) which_parts is given, only the parts of the list are taken to compute the covariance.
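As a rough illustration of the summation described above, the sketch below adds covariance matrices with plain NumPy rather than GPy. The helpers `rbf`, `linear` and `add_K` are hypothetical names introduced here, and passing the selected parts as a list of functions is an assumption of this sketch, not the library's interface.

```python
import numpy as np

def rbf(X, X2, variance=1.0, lengthscale=1.0):
    # Squared-exponential covariance between the rows of X and X2.
    sq = ((X[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def linear(X, X2, variance=1.0):
    # Linear covariance: variance * <x, x'>.
    return variance * X @ X2.T

def add_K(parts, X, X2=None, which_parts=None):
    # Sum the covariances of the selected parts (all parts by default).
    X2 = X if X2 is None else X2
    selected = parts if which_parts is None else which_parts
    return sum(k(X, X2) for k in selected)

X = np.linspace(0.0, 1.0, 4)[:, None]
parts = [rbf, linear]
K_all = add_K(parts, X)                          # rbf + linear
K_rbf_only = add_K(parts, X, which_parts=[rbf])  # only the rbf part
```

Restricting `which_parts` is mainly useful for inspecting the contribution of a single component of the sum.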

Kdiag(X, which_parts=None)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶

Compute the gradient of the objective function with respect to X.

Parameters:	dL_dK (np.ndarray (num_samples x num_inducing)) – An array of gradients of the objective function with respect to the covariance function. X (np.ndarray (num_samples x input_dim)) – Observed data inputs X2 (np.ndarray (num_inducing x input_dim)) – Observed data inputs (optional, defaults to X)

gradients_XX(dL_dK, X, X2)[source]¶: \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]

gradients_XX_diag(dL_dKdiag, X)[source]¶: The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_psi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]¶: If summarize is True, return the summarized view of the sensitivities; otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

psi0(Z, variational_posterior)[source]¶: \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]¶: \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2
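To make the relationship between the psi statistics concrete, here is a minimal Monte Carlo sketch in NumPy. It assumes a unit-variance RBF kernel and a factorized Gaussian q(X); the names `mu`, `var` and `samples` are illustrative only, and the library itself is not expected to compute these quantities by sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B):
    # Unit-variance, unit-lengthscale squared-exponential kernel.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq)

N, M, Q, S = 5, 3, 2, 2000          # data points, inducing points, dims, samples
mu = rng.normal(size=(N, Q))        # means of q(X)
var = 0.1 * np.ones((N, Q))         # variances of q(X)
Z = rng.normal(size=(M, Q))         # inducing inputs

# Draw S samples from q(X) and average kernel evaluations over them.
samples = mu + np.sqrt(var) * rng.normal(size=(S, N, Q))
Kxz = np.stack([rbf(Xs, Z) for Xs in samples])                   # (S, N, M)

psi0 = np.mean([np.diag(rbf(Xs, Xs)).sum() for Xs in samples])  # scalar
psi1 = Kxz.mean(0)                                               # (N, M)
psi2n = np.einsum('snm,snk->nmk', Kxz, Kxz) / S                  # (N, M, M)
psi2 = psi2n.sum(0)                                              # (M, M)
```

Summing `psi2n` over its first axis recovers `psi2`, which is the only difference between the two statistics.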

sde()[source]¶: Support adding kernels for sde representation

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dK, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.basis_funcs module¶

class BasisFuncKernel(input_dim, variance=1.0, active_dims=None, ARD=False, name='basis func kernel')[source]¶

Bases: GPy.kern.src.kern.Kern

Abstract superclass for kernels with explicit basis functions for use in GPy.

This class does NOT automatically add an offset to the design matrix phi!

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X, X2=None)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

concatenate_offset(X)[source]¶: Convenience function to add an offset column to phi. You can use this function to add an offset (bias on y axis) to phi in your custom self._phi(X).
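A minimal NumPy sketch of a kernel built from explicit basis functions follows. It assumes the covariance takes the form variance * phi(X) phi(X2)^T, which is not stated in the text above, and the linear `phi` is a hypothetical choice made only for this example.

```python
import numpy as np

def phi(X):
    # Hypothetical design matrix: a single linear basis function.
    return X

def concatenate_offset(Phi):
    # Prepend a column of ones so the model can learn a bias on the y axis.
    return np.c_[np.ones((Phi.shape[0], 1)), Phi]

def K(X, X2=None, variance=1.0, with_offset=False):
    X2 = X if X2 is None else X2
    P1, P2 = phi(X), phi(X2)
    if with_offset:
        P1, P2 = concatenate_offset(P1), concatenate_offset(P2)
    # Assumed form of the covariance: variance * phi(X) phi(X2)^T.
    return variance * P1 @ P2.T

X = np.array([[0.0], [1.0], [2.0]])
K_plain = K(X)                      # no offset column
K_offset = K(X, with_offset=True)   # offset column adds a constant term
```

With unit variance, the offset column contributes a constant 1 to every entry of the covariance, which is why the class leaves it as an explicit opt-in.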

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

phi(X)[source]¶

posterior_inf(X=None, posterior=None)[source]¶: Do the posterior inference on the parameters given this kernel’s functions and the model posterior, which has to be a GPy posterior, usually found at m.posterior, if m is a GPy model. If not given, we search for the highest parent that is a model containing the posterior, and for X accordingly.

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class ChangePointBasisFuncKernel(input_dim, changepoint, variance=1.0, active_dims=None, ARD=False, name='changepoint')[source]¶

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

The basis function has a changepoint. That is, it is constant, jumps at a single point (given as changepoint) and is constant again. You can give multiple changepoints. The changepoints are calculated using np.where(self.X < self.changepoint, -1, 1).
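The np.where expression can be tried directly. In the sketch below the input values are made up, and treating several changepoints as one broadcast column each is an assumption about how multiple changepoints behave.

```python
import numpy as np

X = np.array([[-2.0], [-0.5], [0.5], [3.0]])

# Single changepoint: -1 before it, +1 from it onwards.
changepoint = 0.0
phi = np.where(X < changepoint, -1, 1)

# Several changepoints (assumed to broadcast to one column per changepoint).
changepoints = np.array([0.0, 2.0])
phi_multi = np.where(X < changepoints, -1, 1)
```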

class DomainKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='constant_domain')[source]¶

Bases: GPy.kern.src.basis_funcs.LinearSlopeBasisFuncKernel

Create a constant plateau of correlation between start and stop and zero elsewhere. This is a constant shift of the outputs along the y-axis in the range from start to stop.

class LinearSlopeBasisFuncKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='linear_segment')[source]¶

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.

Start and stop can be tuples or lists of starts and stops. Behaviour of start stop is as np.where(X<start) would do.
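One possible reading of the linear segment is sketched below in NumPy. The function name `linear_slope_phi` is hypothetical, and centring on the midpoint of start and stop is an assumed way to balance the mass above and below the origin.

```python
import numpy as np

def linear_slope_phi(X, start, stop):
    # Clamp inputs to [start, stop]: constant before start and after stop.
    phi = np.where(X < start, start, X)
    phi = np.where(phi > stop, stop, phi)
    # Centre on the midpoint so the segment is symmetric about zero
    # (an assumed normalization for this sketch).
    return phi - (start + stop) / 2.0

X = np.linspace(-2.0, 4.0, 7)[:, None]   # -2, -1, 0, 1, 2, 3, 4
phi = linear_slope_phi(X, start=0.0, stop=2.0)
```

Here `phi` is flat at -1 up to the start, rises linearly through 0 at the midpoint, and is flat at +1 after the stop.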

class LogisticBasisFuncKernel(input_dim, centers, variance=1.0, slope=1.0, active_dims=None, ARD=False, ARD_slope=True, name='logistic')[source]¶

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

Create a series of logistic basis functions with centers given. The slope gets computed by datafit. The number of centers determines the number of logistic functions.
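As an illustration, a series of logistic basis functions can be written in a few lines of NumPy. The function name and the fixed slope value are assumptions of this sketch; in the kernel the slope is a parameter fitted to data.

```python
import numpy as np

def logistic_phi(X, centers, slope=1.0):
    # One column per center: 1 / (1 + exp(-slope * (x - center))).
    return 1.0 / (1.0 + np.exp(-slope * (X - np.asarray(centers))))

X = np.linspace(-3.0, 3.0, 7)[:, None]                  # -3, -2, ..., 3
phi = logistic_phi(X, centers=[-1.0, 1.0], slope=2.0)   # shape (7, 2)
```

Each column passes through 0.5 at its own center, so two centers yield two logistic functions.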

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class PolynomialBasisFuncKernel(input_dim, degree, variance=1.0, active_dims=None, ARD=True, name='polynomial_basis')[source]¶

Bases: GPy.kern.src.basis_funcs.BasisFuncKernel

A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.

Start and stop can be tuples or lists of starts and stops. Behaviour of start stop is as np.where(X<start) would do.

GPy.kern.src.brownian module¶

class Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶

Bases: GPy.kern.src.kern.Kern

Brownian motion in 1D only.

Negative times are treated as a separate (backwards!) Brownian motion.

Parameters:

input_dim (int) – the number of input dimensions
variance (float) –
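The treatment of negative times can be sketched with NumPy as follows. It uses the standard Brownian covariance variance * min(s, t) on each side of zero and sets the covariance across the origin to zero; `brownian_K` is a hypothetical helper, not the library function.

```python
import numpy as np

def brownian_K(X, X2=None, variance=1.0):
    X2 = X if X2 is None else X2
    same_side = np.sign(X) == np.sign(X2.T)
    # variance * min(|s|, |t|) when s and t share a sign, zero otherwise.
    return np.where(same_side, variance * np.fmin(np.abs(X), np.abs(X2.T)), 0.0)

t = np.array([[-2.0], [-1.0], [1.0], [3.0]])
K = brownian_K(t)
```

Entries pairing a negative time with a positive time are zero, reflecting the two independent motions running away from the origin.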

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.coregionalize module¶

class Coregionalize(input_dim, output_dim, rank=1, W=None, kappa=None, active_dims=None, name='coregion')[source]¶

Bases: GPy.kern.src.kern.Kern

Covariance function for intrinsic/linear coregionalization models

This covariance has the form:

\[\mathbf{B} = \mathbf{W}\mathbf{W}^\intercal + \mathrm{diag}(\kappa)\]

An intrinsic/linear coregionalization covariance function of the form:

\[k_2(x, y)=\mathbf{B} k(x, y)\]

it is obtained as the tensor product between a covariance function k(x, y) and B.

Parameters:

output_dim (int) – number of outputs to coregionalize
rank (int) – number of columns of the W matrix (this parameter is ignored if parameter W is not None)
W (numpy array of dimensionality (num_outputs, W_columns)) – a low rank matrix that determines the correlations between the different outputs, together with kappa it forms the coregionalization matrix B
kappa (numpy array of dimensionality (output_dim, )) – a vector which allows the outputs to behave independently
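The construction of B and its tensor product with a base covariance can be sketched in NumPy. The base covariance `base_k` and all numerical values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
output_dim, rank = 3, 1
W = rng.normal(size=(output_dim, rank))   # low rank part
kappa = 0.5 * np.ones(output_dim)         # independent part per output

# Coregionalization matrix B = W W^T + diag(kappa).
B = W @ W.T + np.diag(kappa)

def base_k(x, y, lengthscale=1.0):
    # Hypothetical base covariance k(x, y): squared exponential.
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / lengthscale**2)

x = np.linspace(0.0, 1.0, 4)
# Tensor (Kronecker) product of B with the base covariance matrix.
K_full = np.kron(B, base_k(x, x))         # shape (3 * 4, 3 * 4)
```

Because kappa is positive, B stays positive definite even when the rank of W is smaller than the number of outputs.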

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.coregionalize_cython module¶

GPy.kern.src.diff_kern module¶

class DiffKern(base_kern, dimension)[source]¶

Bases: GPy.kern.src.kern.Kern

DiffKern is a thin wrapper for using partial derivatives of kernels as kernels. For example, in combination with the Multioutput kernel it allows the user to train GPs with observations of the latent function and of its derivatives. NOTE: DiffKern only works when used with the Multioutput kernel. Do not use it as a standalone kernel.

Parameters:

base_kern – a member of the Kernel class that is used for observations
dimension – integer that indicates in which dimension the partial derivative observations are

K(X, X2=None, dimX2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

dK_dX2_wrap(X, X2)[source]¶

dK_dX_wrap(X, X2)[source]¶

gradients_X(dL_dK, X, X2)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X2(dL_dK, X, X2)[source]¶

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.

reset_gradients()[source]¶

update_gradients_dK_dX(dL_dK, X, X2=None)[source]¶

update_gradients_dK_dX2(dL_dK, X, X2=None)[source]¶

update_gradients_diag(dL_dK_diag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None, dimX2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

gradient¶

GPy.kern.src.eq_ode1 module¶

class EQ_ODE1(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, decay=None, active_dims=None, name='eq_ode1')[source]¶

Bases: GPy.kern.src.kern.Kern

Covariance function for first order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

\[\frac{\text{d}y_j}{\text{d}t} = \sum_{i=1}^R w_{j,i} u_i(t-\delta_j) - d_jy_j(t)\]

where $R$ is the rank of the system, $w_{j,i}$ is the sensitivity of the $j$th output to the $i$th latent function, $d_j$ is the decay rate of the $j$th output, $\delta_j$ is its delay, and $u_i(t)$ are independent latent Gaussian processes governed by an exponentiated quadratic covariance.

Parameters:

output_dim (int) – number of outputs driven by latent function.
W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
decay (array of length output_dim) – decay rates for the first order system.
delay (array of length output_dim) – delay between latent force and output response.
kappa (array of length output_dim) – diagonal term that allows each latent output to have an independent component to the response.
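To build intuition for the first order system that this covariance describes, the sketch below integrates a single output with forward Euler. It is not how the kernel is computed: the constant driving force, the zero delay, and the helper name `simulate_output` are all assumptions chosen for illustration, whereas the kernel assumes the driving force is a Gaussian process.

```python
import numpy as np

def simulate_output(u, w, d, dt, y0=0.0):
    # Forward-Euler integration of dy/dt = w * u(t) - d * y(t).
    y = np.empty(len(u) + 1)
    y[0] = y0
    for n, u_n in enumerate(u):
        y[n + 1] = y[n] + dt * (w * u_n - d * y[n])
    return y

dt, steps = 0.01, 2000
u = np.ones(steps)          # constant driving force, purely for illustration
y = simulate_output(u, w=2.0, d=0.5, dt=dt)
```

With a constant force the output relaxes towards w / d at a rate set by the decay d, which shows how sensitivity and decay shape the response.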

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

lnDifErf(z1, z2)[source]¶

GPy.kern.src.eq_ode2 module¶

class EQ_ODE2(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, C=None, B=None, active_dims=None, name='eq_ode2')[source]¶

Bases: GPy.kern.src.kern.Kern

Covariance function for second order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

\[\frac{\text{d}^2y_j(t)}{\text{d}t^2} + C_j \frac{\text{d}y_j(t)}{\text{d}t} + B_jy_j(t) = \sum_{i=1}^R w_{j,i} u_i(t)\]

where $R$ is the rank of the system, $w_{j,i}$ is the sensitivity of the $j$th output to the $i$th latent function, $C_j$ and $B_j$ are the damper and spring constants of the $j$th output, and $u_i(t)$ are independent latent Gaussian processes governed by an exponentiated quadratic covariance.

Parameters:

output_dim (int) – number of outputs driven by latent function.
W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
C (array of length output_dim) – damper constant for the second order system.
B (array of length output_dim) – spring constant for the second order system.

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.grid_kerns module¶

class GridKern(input_dim, variance, lengthscale, ARD, active_dims, name, originalDimensions, useGPU=False)[source]¶

Bases: GPy.kern.src.stationary.Stationary

dKd_dLen(X, dimension, lengthscale, X2=None)[source]¶

Derivative of Kernel function wrt lengthscale applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dLen(X, X2) = dKdLen_of_r((X-X2)**2)

dKd_dVar(X, X2=None)[source]¶

Derivative of Kernel function wrt variance applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dVar(X, X2) = dKdVar_of_r((X-X2)**2)

class GridRBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='gridRBF', originalDimensions=1, useGPU=False)[source]¶

Bases: GPy.kern.src.grid_kerns.GridKern

Similar to the regular RBF kernel, but supplemented with methods required for Gaussian grid regression. Radial Basis Function kernel, also known as the squared-exponential, exponentiated quadratic or Gaussian kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
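The formula for k(r) and two of its derivatives can be written out in NumPy. The function names below echo the method names listed for this class, but these are standalone sketches rather than the library code, and `scaled_dist` is a hypothetical helper for computing r.

```python
import numpy as np

def scaled_dist(X, X2, lengthscale):
    # r = |x - x'| / lengthscale for every pair of rows.
    diff = (X[:, None, :] - X2[None, :, :]) / lengthscale
    return np.sqrt((diff ** 2).sum(-1))

def K_of_r(r, variance):
    # k(r) = sigma^2 * exp(-r^2 / 2)
    return variance * np.exp(-0.5 * r**2)

def dK_dr(r, variance):
    # Derivative of k(r) with respect to r.
    return -r * K_of_r(r, variance)

def dKdVar_of_r(r):
    # Derivative of k(r) with respect to the variance sigma^2.
    return np.exp(-0.5 * r**2)

X = np.array([[0.0], [1.0], [2.0]])
r = scaled_dist(X, X, lengthscale=1.0)
K = K_of_r(r, variance=2.0)
```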

K_of_r(r)[source]¶

dK_dr(r)[source]¶

dKdLen_of_r(r, dimCheck, lengthscale)[source]¶: Compute the derivative of the kernel for a dimension with respect to the lengthscale. The computation changes when the lengthscale corresponds to the dimension of the kernel whose derivative is being computed.

dKdVar_of_r(r)[source]¶: Compute derivative of kernel wrt variance

GPy.kern.src.independent_outputs module¶

class Hierarchical(kernels, name='hierarchy')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

A kernel which can represent a simple hierarchical model.

See Hensman et al 2013, “Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters” http://www.biomedcentral.com/1471-2105/14/252

To construct this kernel, you must pass a list of kernels. The first kernel will be assumed to be the ‘base’ kernel, and will be computed everywhere. For every additional kernel, we assume another layer in the hierarchy, with a corresponding column of the input matrix which indexes which function the data are in at that level.

For more, see the ipython notebook documentation on Hierarchical covariances.
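A two-level hierarchy can be sketched in NumPy as follows. The additive combination of the base kernel with a within-group kernel is an assumption based on the description above, and the `rbf` helper and all hyperparameter values are made up for the example.

```python
import numpy as np

def rbf(X, X2, variance, lengthscale):
    sq = (X[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale**2)

# First column: the actual input; second column: index of the replicate.
X = np.array([[0.0, 0], [0.5, 0], [1.0, 0],
              [0.0, 1], [0.5, 1], [1.0, 1]])
t, group = X[:, 0], X[:, 1].astype(int)

# The base kernel is computed everywhere.
K_base = rbf(t, t, variance=1.0, lengthscale=1.0)

# The second kernel only acts between points with the same index.
same_group = group[:, None] == group[None, :]
K_level = rbf(t, t, variance=0.3, lengthscale=0.5) * same_group

K = K_base + K_level
```

Points in different replicates share only the base covariance, while points in the same replicate gain the extra within-replicate term.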

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class IndependentOutputs(kernels, index_dim=-1, name='independ')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

A kernel which can represent several independent functions. This kernel ‘switches off’ parts of the matrix where the output indexes are different.

The index of the functions is given by the last column in the input X; the rest of the columns of X are passed to the underlying kernel for computation (in blocks).

Parameters:	kernels – either a kernel, or a list of kernels to work with. If it is a list of kernels, the indices in the index_dim index the kernels you gave!
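The block structure can be sketched in NumPy. A single hypothetical `rbf` kernel is shared by all outputs here, and the index is read from the last column of X as described above.

```python
import numpy as np

def rbf(x, x2):
    return np.exp(-0.5 * (x[:, None] - x2[None, :]) ** 2)

# The last column of X holds the output index.
X = np.array([[0.0, 0], [1.0, 0], [0.2, 1], [1.2, 1]])
inputs, index = X[:, :-1].ravel(), X[:, -1].astype(int)

K = np.zeros((len(X), len(X)))
for out in np.unique(index):
    rows = np.where(index == out)[0]
    # Fill only the block belonging to this output; other entries stay zero.
    K[np.ix_(rows, rows)] = rbf(inputs[rows], inputs[rows])
```

Entries that pair different outputs remain zero, which is what makes the functions independent.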

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.integral module¶

class Integral(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶

Bases: GPy.kern.src.kern.Kern

Integral kernel between…

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, l)[source]¶

g(z)[source]¶

h(z)[source]¶

k_ff(t, tprime, l)[source]¶

k_xf(t, tprime, l)[source]¶

k_xx(t, tprime, l)[source]¶

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.integral_limits module¶

class Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶

Bases: GPy.kern.src.kern.Kern

Integral kernel. This kernel allows 1d histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs (on two dimensions) are the start and end points of each bin. The kernel’s predictions are the latent function which might have generated those binned results.

K(X, X2=None)[source]¶

Note: We have a latent function and an output function. We want to be able to find:

the covariance between values of the output function
the covariance between values of the latent function
the “cross covariance” between values of the output function and the latent function

This method is used by GPy either to get the covariance between the outputs (K_xx) or to get the cross covariance between the latent function and the outputs (K_xf). We take advantage of the places where this function is used:

if X2 is None, then we know that the items being compared (to get the covariance for) are both going to be from the OUTPUT FUNCTION.
if X2 is not None, then we know that the items being compared are from two different sets (the OUTPUT FUNCTION and the LATENT FUNCTION).

If we want the covariance between values of the LATENT FUNCTION, we take advantage of the fact that we only need that when we do prediction, and this only calls Kdiag (not K). So the covariance between LATENT FUNCTIONS is available from Kdiag.

Kdiag(X)[source]¶: I’ve used the fact that we call this method during prediction (instead of K). When we do prediction we want to know the covariance between LATENT FUNCTIONS (K_ff) (as that’s probably what the user wants). $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, s, sprime, l)[source]¶

g(z)[source]¶

h(z)[source]¶

k_ff(t, tprime, l)[source]¶: Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]¶

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]¶

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.
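The idea of a covariance between two integrals can be checked numerically. The sketch below approximates the double integral of a latent covariance over the two domains with a midpoint rule; the latent covariance `latent_k` is an assumed form, and `k_xx_numeric` is a hypothetical stand-in, not the closed form used by the method.

```python
import numpy as np

def latent_k(x, xp, l):
    # Assumed latent covariance; the exact form used by the library may differ.
    return np.exp(-((x - xp) ** 2) / l**2)

def k_xx_numeric(t, tprime, s, sprime, l, n=200):
    # Midpoint rule over [s, t] x [sprime, tprime].
    x = s + (np.arange(n) + 0.5) * (t - s) / n
    xp = sprime + (np.arange(n) + 0.5) * (tprime - sprime) / n
    cell = ((t - s) / n) * ((tprime - sprime) / n)
    return latent_k(x[:, None], xp[None, :], l).sum() * cell

# Adjacent bins are strongly correlated, distant bins are nearly independent.
near = k_xx_numeric(t=1.0, tprime=2.0, s=0.0, sprime=1.0, l=1.0)
far = k_xx_numeric(t=1.0, tprime=11.0, s=0.0, sprime=10.0, l=1.0)
```

As in the method above, the variance is left out here and would be applied as a separate factor.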

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.kern module¶

class CombinationKernel(kernels, name, extra_dims=[], link_parameters=True)[source]¶

Bases: GPy.kern.src.kern.Kern

Abstract super class for combination kernels. A combination kernel combines (a list of) kernels and works on those. Examples are the HierarchicalKernel or Add and Prod kernels.

Parameters:

kernels (list) – List of kernels to combine (can be only one element)
name (str) – name of the combination kernel
extra_dims (array-like) – if needed extra dimensions for the combination kernel to work on

input_sensitivity(summarize=True)[source]¶: If summarize is True, return the summarized view of the sensitivities; otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

parts¶

class Kern(input_dim, active_dims, name, useGPU=False, *a, **kw)[source]¶

Bases: GPy.core.parameterization.parameterized.Parameterized

The base class for a kernel: a positive definite function which forms a covariance function (kernel).

input_dim:

is the number of dimensions to work on. Make sure to give the tight dimensionality of inputs. You most likely want this to be the integer telling the number of input dimensions of the kernel.

active_dims:

is the active_dimensions of inputs X we will work on. All kernels will get sliced Xes as inputs if _all_dims_active is not None. Only positive integers are allowed in active_dims! If active_dims is None, slicing is switched off and all X will be passed through as given.

Parameters:

input_dim (int) – the number of input dimensions to the function
active_dims (array-like|None) – list of indices on which dimensions this kernel works on, or None if no slicing

Do not instantiate.

K(X, X2)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of inputs to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]
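The relationship between K and Kdiag can be illustrated with a small NumPy sketch. The kernel `k` below is a hypothetical stand-in; the point is only that the diagonal can be computed without forming the full matrix.

```python
import numpy as np

def k(x, y, variance=1.5, lengthscale=0.7):
    # Hypothetical stand-in kernel acting on single input rows.
    return variance * np.exp(-0.5 * np.sum((x - y) ** 2) / lengthscale**2)

X = np.random.default_rng(0).normal(size=(6, 2))

K = np.array([[k(xi, xj) for xj in X] for xi in X])   # N x N evaluations
Kdiag = np.array([k(xi, xi) for xi in X])             # only N evaluations
```

For a stationary kernel such as this one, every diagonal entry equals the variance.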

add(other, name='sum')[source]¶

Add another kernel to this one.

Parameters:	other (GPy.kern) – the other kernel to be added

static from_dict(input_dict)[source]¶

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.

Parameters:	input_dict (dict) – Dictionary with all the information needed to instantiate the object.
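The round trip between to_dict and from_dict can be illustrated with a toy class hierarchy in plain Python. Everything here is hypothetical, including the `registry` lookup, the `"class"` key and the `Toy` kernel; only the division of labour between from_dict and _build_from_input_dict follows the description above.

```python
import json

class Kernel:
    registry = {}                      # hypothetical class lookup table

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Kernel.registry[cls.__name__] = cls

    @staticmethod
    def from_dict(input_dict):
        # Read the derived class, then let that class rebuild itself.
        data = dict(input_dict)
        cls = Kernel.registry[data.pop("class")]
        return cls._build_from_input_dict(data)

class Toy(Kernel):
    def __init__(self, variance=1.0):
        self.variance = variance

    def to_dict(self):
        return {"class": type(self).__name__, "variance": self.variance}

    @classmethod
    def _build_from_input_dict(cls, input_dict):
        return cls(**input_dict)

serialized = json.dumps(Toy(variance=2.5).to_dict())   # json serializable
restored = Kernel.from_dict(json.loads(serialized))
```

Subclasses customize only `_build_from_input_dict`, so the dispatch logic in `from_dict` stays in one place.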

get_most_significant_input_dimensions(which_indices=None)[source]¶

Determine which dimensions should be plotted

Returns the top three most significant input dimensions.

If there are fewer than three dimensions, the non-existing dimensions are labeled as None, so for a 1-dimensional input this returns (0, None, None).

Parameters:	which_indices (int or tuple(int,int) or tuple(int,int,int)) – force the indices to be the given indices.
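The padding behaviour can be sketched with a small helper. `most_significant_dims` is a hypothetical function, and ranking dimensions by a sensitivity vector is an assumption about how significance is measured.

```python
import numpy as np

def most_significant_dims(sensitivity, which_indices=None):
    # Hypothetical helper mirroring the behaviour described above.
    if which_indices is not None:
        picked = np.atleast_1d(which_indices).tolist()
    else:
        # Indices sorted from most to least sensitive, at most three.
        picked = np.argsort(sensitivity)[::-1][:3].tolist()
    # Pad with None when fewer than three dimensions exist.
    return tuple(picked + [None] * (3 - len(picked)))

one_dim = most_significant_dims(np.array([0.4]))
four_dims = most_significant_dims(np.array([0.1, 2.0, 0.5, 1.0]))
forced = most_significant_dims(np.array([0.1, 2.0, 0.5]), which_indices=(2, 0))
```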

gradients_X(dL_dK, X, X2)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_XX(dL_dK, X, X2, cov=True)[source]¶: \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]

gradients_XX_diag(dL_dKdiag, X, cov=True)[source]¶: The diagonal of the second derivative w.r.t. X and X2

gradients_X_X2(dL_dK, X, X2)[source]¶

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior, psi0=None, psi1=None, psi2=None)[source]¶: Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

plot(*args, **kwargs)¶

plot_ARD(filtering=None, legend=False, canvas=None, **kwargs)¶

If an ARD kernel is present, plot a bar representation using matplotlib

Parameters:

fignum – figure number of the plot
filtering (list of names to use for ARD plot) – list of names to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.

plot_covariance(x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)¶

Plot a kernel covariance w.r.t. another x.

Parameters:

x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or fewer dimensions to plot.
projection ({2d|3d}) – What projection shall we use to plot the kernel?
levels (int) – for 2D projection, how many levels for the contour plot to use?
kwargs – valid kwargs for your specific plotting library

resolution – the resolution of the lines used in plotting. For 2D, this defines the grid for kernel evaluation.

prod(other, name='mul')[source]¶

Multiply two kernels (either on the same space, or on the tensor product of the input space).

Parameters:	other (GPy.kern) – the other kernel to be multiplied

psi0(Z, variational_posterior)[source]¶: \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]¶: \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2
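To make the relation between the psi statistics concrete, here is a small sketch for the degenerate case where q(X) puts all its mass on fixed points X_n (zero variance), so every expectation reduces to a plain kernel evaluation. The squared-exponential function k and the data are illustrative assumptions; this is not how GPy computes the statistics for genuinely uncertain inputs.

```python
import math


def k(x, z, variance=1.0, lengthscale=1.0):
    """Illustrative 1-D squared-exponential kernel (an assumption, not GPy code)."""
    return variance * math.exp(-0.5 * ((x - z) / lengthscale) ** 2)


X = [0.0, 0.5, 2.0]      # locations of the point-mass q(X), N = 3
Z = [0.0, 1.0]           # inducing inputs, M = 2
N, M = len(X), len(Z)

# With a point-mass q(X) the expectations are plain kernel evaluations.
psi0 = sum(k(x, x) for x in X)                      # scalar
psi1 = [[k(x, z) for z in Z] for x in X]            # N x M
psi2n = [[[psi1[n][m] * psi1[n][mp] for mp in range(M)]
          for m in range(M)]
         for n in range(N)]                         # N x M x M
# psi2 is psi2n with the data index n summed out.
psi2 = [[sum(psi2n[n][m][mp] for n in range(N)) for mp in range(M)]
        for m in range(M)]                          # M x M
```

The only difference between psi2 and psi2n in this sketch is the final sum over n, which mirrors the note above.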

reset_gradients()[source]¶

to_dict()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.kernel_slice_operations module¶

Created on 11 Mar 2014

@author: @mzwiessele

This module provides a meta class for the kernels. The meta class slices the inputs (X, X2) for the kernels before K (or any other method involving X) gets called. The _all_dims_active attribute of a kernel decides which dimensions the kernel works on.

class KernCallsViaSlicerMeta[source]¶: Bases: paramz.parameterized.ParametersChangedMeta

put_clean(dct, name, func)[source]¶

GPy.kern.src.linear module¶

class Linear(input_dim, variances=None, ARD=False, active_dims=None, name='linear')[source]¶

Bases: GPy.kern.src.kern.Kern

Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i\]

Parameters:

input_dim (int) – the number of input dimensions
variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances $\sigma^2_i$
ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter $\sigma^2$, otherwise there is one variance parameter per dimension.

Return type:

kernel object
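The linear kernel formula and the effect of the ARD flag can be sketched in a few lines of plain Python. The function linear_kernel below is a hypothetical stand-in written for this illustration, not GPy's implementation: a single float plays the role of the shared variance (ARD=False), and a list plays the role of one variance per dimension (ARD=True).

```python
def linear_kernel(X, X2=None, variances=1.0):
    """k(x, y) = sum_i variances[i] * x_i * y_i (plain-Python sketch)."""
    if X2 is None:
        X2 = X                      # same convention as K(X, X2=None)
    input_dim = len(X[0])
    # A single float is shared by all dimensions (the ARD=False case).
    if isinstance(variances, (int, float)):
        variances = [float(variances)] * input_dim
    return [[sum(v * a * b for v, a, b in zip(variances, x, y)) for y in X2]
            for x in X]


X = [[1.0, 2.0], [3.0, 4.0]]
K_iso = linear_kernel(X, variances=2.0)          # one shared variance
K_ard = linear_kernel(X, variances=[1.0, 0.5])   # one variance per dimension
Kdiag = [K_ard[i][i] for i in range(len(X))]     # diagonal, k(X_i, X_i)
```

With ARD-style variances, a small variance on one dimension down-weights that dimension's contribution to the covariance.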

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_XX(dL_dK, X, X2=None)[source]¶

Given the derivative of the objective with respect to K (dL_dK), compute the second derivative of K wrt X and X2.

Returns the full covariance matrix [QxQ] of the input dimensions for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X\partial X_2}\]

Returns:

dL2_dXdX2: [NxMxQxQ] for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None)

Thus, we return the second derivative in X2.

gradients_XX_diag(dL_dKdiag, X)[source]¶: The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

psi0(Z, variational_posterior)[source]¶: \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]¶: \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class LinearFull(input_dim, rank, W=None, kappa=None, active_dims=None, name='linear_full')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.mlp module¶

class MLP(input_dim, variance=1.0, weight_variance=1.0, bias_variance=1.0, ARD=False, active_dims=None, name='mlp')[source]¶

Bases: GPy.kern.src.kern.Kern

Multi layer perceptron kernel (also known as arc sine kernel or neural network kernel)

\[k(x,y) = \sigma^{2}\frac{2}{\pi } \text{asin} \left ( \frac{ \sigma_w^2 x^\top y+\sigma_b^2}{\sqrt{\sigma_w^2x^\top x + \sigma_b^2 + 1}\sqrt{\sigma_w^2 y^\top y + \sigma_b^2 +1}} \right )\]

Parameters:

input_dim (int) – the number of input dimensions
variance (float) – the variance $\sigma^2$
weight_variance (array or list of the appropriate size (or float if there is only one weight variance parameter)) – the vector of the variances of the prior over input weights in the neural network $\sigma^2_w$
bias_variance – the variance of the prior over bias parameters $\sigma^2_b$
ARD (Boolean) – Auto Relevance Determination. If equal to “False”, the kernel is isotropic (ie. one weight variance parameter sigma^2_w), otherwise there is one weight variance parameter per dimension.

Return type:

Kernpart object
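The MLP covariance above is easy to evaluate directly for a pair of input vectors. The following is a minimal sketch of the isotropic case (a single weight variance), written from the displayed formula only; the function name mlp_kernel and its argument names are chosen for this illustration and are not GPy's API.

```python
import math


def mlp_kernel(x, y, variance=1.0, weight_variance=1.0, bias_variance=1.0):
    """Evaluate the arc-sine (MLP) covariance for two input vectors."""
    def inner(a, b):
        # sigma_w^2 * a^T b + sigma_b^2
        return weight_variance * sum(ai * bi for ai, bi in zip(a, b)) + bias_variance

    numerator = inner(x, y)
    denominator = math.sqrt(inner(x, x) + 1.0) * math.sqrt(inner(y, y) + 1.0)
    return variance * (2.0 / math.pi) * math.asin(numerator / denominator)


k_origin = mlp_kernel([0.0], [0.0])             # both inputs at the origin
k_ab = mlp_kernel([1.0, -2.0], [0.5, 3.0])
k_ba = mlp_kernel([0.5, 3.0], [1.0, -2.0])      # same value: k is symmetric
```

At the origin with unit variances the argument of asin is 1/2, so the kernel value is (2/pi) * (pi/6) = 1/3.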

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: Compute the diagonal of the covariance matrix for X.

gradients_X(dL_dK, X, X2)[source]¶: Derivative of the covariance matrix with respect to X

gradients_X_X2(dL_dK, X, X2)[source]¶: Derivative of the covariance matrix with respect to X

gradients_X_diag(dL_dKdiag, X)[source]¶: Gradient of diagonal of covariance with respect to X

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Derivative of the covariance with respect to the parameters.

GPy.kern.src.multidimensional_integral_limits module¶

class Multidimensional_Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶

Bases: GPy.kern.src.kern.Kern

Integral kernel, can include limits on each integral value. This kernel allows an n-dimensional histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs are the start and end points of each bin: Pairs of inputs act as the limits on each bin. So inputs 4 and 5 provide the start and end values of each bin in the 3rd dimension. The kernel’s predictions are the latent function which might have generated those binned results.

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

calc_K_xx_wo_variance(X)[source]¶: Calculates K_xx without the variance term

dk_dl(t, tprime, s, sprime, l)[source]¶

g(z)[source]¶

h(z)[source]¶

k_ff(t, tprime, l)[source]¶: Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]¶

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]¶

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t); sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime).

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.multioutput_derivative_kern module¶

class KernWrapper(fk, fug, fg, base_kern)[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

gradient¶

class MultioutputDerivativeKern(kernels, cross_covariances={}, name='MultioutputDerivativeKern')[source]¶

Bases: GPy.kern.src.multioutput_kern.MultioutputKern

Multioutput derivative kernel is a meta class for combining different kernels for multioutput GPs. It is only a thin wrapper around the Multioutput kernel, so that the user does not have to define the cross covariances.

GPy.kern.src.multioutput_kern module¶

class MultioutputKern(kernels, cross_covariances={}, name='MultioutputKern')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

Multioutput kernel is a meta class for combining different kernels for multioutput GPs.

As an example let us have inputs x1 for output 1 with covariance k1 and x2 for output 2 with covariance k2. In addition, we need to define the cross covariances k12(x1,x2) and k21(x2,x1). Then the kernel becomes: k([x1,x2],[x1,x2]) = [k1(x1,x1) k12(x1, x2); k21(x2, x1), k2(x2,x2)]

For the kernel, the kernels of outputs are given as list in param “kernels” and cross covariances are given in param “cross_covariances” as a dictionary of tuples (i,j) as keys. If no cross covariance is given, it defaults to zero, as in k12(x1,x2)=0.

In the cross covariance dictionary, the value needs to be a struct with elements

- ‘kernel’: a member of Kernel class that stores the hyper parameters to be updated when optimizing the GP
- ‘K’: function defining the cross covariance
- ‘update_gradients_full’: a function to be used for updating gradients
- ‘gradients_X’: gives a gradient of the cross covariance with respect to the first input
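The block structure k([x1,x2],[x1,x2]) described above can be sketched without GPy. The code below assembles the joint covariance for two outputs from per-output kernels and optional cross covariances, defaulting the cross terms to zero as the text states. All names (rbf, gram, block_kernel, zero_cross) are hypothetical helpers for this illustration, and the squared-exponential kernels are an arbitrary choice.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Illustrative 1-D squared-exponential covariance (assumed for the demo)."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def zero_cross(x, y):
    """Default cross covariance when none is supplied: k12(x1, x2) = 0."""
    return 0.0


def gram(k, A, B):
    """Matrix of k(a, b) for all a in A (rows) and b in B (columns)."""
    return [[k(a, b) for b in B] for a in A]


def block_kernel(x1, x2, k1, k2, k12=zero_cross, k21=zero_cross):
    """Assemble [[k1(x1,x1), k12(x1,x2)], [k21(x2,x1), k2(x2,x2)]]."""
    top = [r1 + r2 for r1, r2 in zip(gram(k1, x1, x1), gram(k12, x1, x2))]
    bottom = [r1 + r2 for r1, r2 in zip(gram(k21, x2, x1), gram(k2, x2, x2))]
    return top + bottom


x1 = [0.0, 1.0]          # inputs for output 1
x2 = [0.5]               # inputs for output 2
K = block_kernel(x1, x2, k1=rbf, k2=lambda a, b: rbf(a, b, lengthscale=2.0))
```

With the default zero cross covariance the result is block diagonal, so the two outputs are modelled as independent.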

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

reset_gradients()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class ZeroKern[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

gradient¶

GPy.kern.src.periodic module¶

class Periodic(input_dim, variance, lengthscale, period, n_freq, lower, upper, active_dims, name)[source]¶

Bases: GPy.kern.src.kern.Kern

Parameters:

variance (float) – the variance of the Matern kernel
lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
period (float) – the period
n_freq (int) – the number of frequencies considered for the periodic subspace

Return type:

kernel object

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

class PeriodicExponential(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_exponential')[source]¶

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of an exponential (Matern 1/2) RKHS.

Only defined for input_dim=1.

Gram_matrix()[source]¶

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters (shape is N x num_inducing x num_params)

class PeriodicMatern32(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern32')[source]¶

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of a Matern 3/2 RKHS. Only defined for input_dim=1.

Parameters:

input_dim (int) – the number of input dimensions
variance (float) – the variance of the Matern kernel
lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
period (float) – the period
n_freq (int) – the number of frequencies considered for the periodic subspace

Return type:

kernel object

Gram_matrix()[source]¶

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2)[source]¶: derivative of the covariance matrix with respect to the parameters (shape is num_data x num_inducing x num_params)

class PeriodicMatern52(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern52')[source]¶

Bases: GPy.kern.src.periodic.Periodic

Kernel of the periodic subspace (up to a given frequency) of a Matern 5/2 RKHS. Only defined for input_dim=1.

Parameters:

input_dim (int) – the number of input dimensions
variance (float) – the variance of the Matern kernel
lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
period (float) – the period
n_freq (int) – the number of frequencies considered for the periodic subspace

Return type:

kernel object

Gram_matrix()[source]¶

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.poly module¶

class Poly(input_dim, variance=1.0, scale=1.0, bias=1.0, order=3.0, active_dims=None, name='poly')[source]¶

Bases: GPy.kern.src.kern.Kern

Polynomial kernel

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.prod module¶

class Prod(kernels, name='mul')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

Computes the product of 2 kernels

Parameters:

k1, k2 – the kernels to multiply

Return type:

kernel object

K(X, X2=None, which_parts=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:

X – the first set of inputs to the kernel
X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X, which_parts=None)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]¶: If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

sde()[source]¶

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

dkron(A, dA, B, dB, operation='prod')[source]¶

Function computes the derivative of Kronecker product A*B (or Kronecker sum A+B).

A: 2D matrix: Some matrix
dA: 3D (or 2D) matrix: Derivatives of A
B: 2D matrix: Some matrix
dB: 3D (or 2D) matrix: Derivatives of B
operation: str, ‘prod’ or ‘sum’: Which operation is considered. If the operation is ‘sum’, it is assumed that A and B are square matrices.
Output: dC: 3D matrix: Derivative of Kronecker product A*B (or Kronecker sum A+B)
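The rule behind the ‘prod’ case is the product rule for Kronecker products: for a single parameter, d(A kron B) = dA kron B + A kron dB. The sketch below checks this on a tiny example with nested lists. It is an illustration of the identity only; the helper names are hypothetical, and it handles one parameter at a time rather than the 3D derivative arrays that dkron accepts.

```python
def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def add(P, Q):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def dkron_prod(A, dA, B, dB):
    """Product rule for one parameter: d(A kron B) = dA kron B + A kron dB."""
    return add(kron(dA, B), kron(A, dB))


# A(t) = [[t, 1], [0, t]] and B(t) = [[2 t]], evaluated at t = 3.
t = 3.0
A, dA = [[t, 1.0], [0.0, t]], [[1.0, 0.0], [0.0, 1.0]]
B, dB = [[2.0 * t]], [[2.0]]
dC = dkron_prod(A, dA, B, dB)
```

Here A kron B equals [[2 t^2, 2 t], [0, 2 t^2]], whose derivative in t is [[4 t, 2], [0, 4 t]], which matches dC at t = 3.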

numpy_invalid_op_as_exception(func)[source]¶: A decorator that allows catching numpy invalid operations as exceptions (the default behaviour is raising warnings).

GPy.kern.src.rbf module¶

class RBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]¶

Bases: GPy.kern.src.stationary.Stationary

Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
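As a quick numerical check of the formula, the sketch below evaluates k(r) for a few input pairs. It assumes that r is the Euclidean distance between the inputs after dividing by the lengthscale, the same scaled distance used in the Matern entries of this package; the function rbf_kernel is a hypothetical helper, not the GPy class.

```python
import math


def rbf_kernel(x, y, variance=1.0, lengthscale=1.0):
    """sigma^2 * exp(-r^2 / 2), with r the lengthscale-scaled distance."""
    r2 = sum(((a - b) / lengthscale) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * r2)


k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], variance=2.0)   # r = 0
k_unit = rbf_kernel([0.0], [1.0])                           # r = 1
k_wide = rbf_kernel([0.0], [1.0], lengthscale=2.0)          # r = 0.5
```

A larger lengthscale shrinks r for the same pair of inputs, so k_wide is closer to the variance than k_unit.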

K_of_r(r)[source]¶

dK2_dXdX2(X, X2, dimX, dimX2)[source]¶

dK2_dlengthscaledX(X, X2, dimX)[source]¶

dK2_dlengthscaledX2(X, X2, dimX2)[source]¶

dK2_drdr(r)[source]¶

dK2_drdr_diag()[source]¶: Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.

dK2_dvariancedX(X, X2, dim)[source]¶

dK2_dvariancedX2(X, X2, dim)[source]¶

dK3_dlengthscaledXdX2(X, X2, dimX, dimX2)[source]¶

dK3_dvariancedXdX2(X, X2, dim, dimX2)[source]¶

dK_dX(X, X2, dimX)[source]¶

dK_dX2(X, X2, dimX2)[source]¶

dK_dr(r)[source]¶

dK_dvariance(X, X2)[source]¶

get_one_dimensional_kernel(dim)[source]¶: Specially intended for Grid regression.

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

parameters_changed()[source]¶: This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer

psi0(Z, variational_posterior)[source]¶: \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]¶: \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

spectrum(omega)[source]¶

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dKdiag, X)[source]¶

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

GPy.kern.src.sde_brownian module¶

Classes in this module enhance Brownian motion covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶

Bases: GPy.kern.src.brownian.Brownian

Class provides extra functionality to transfer this covariance function into SDE form.

Brownian kernel:

\[k(x,y) = \sigma^2 \min(x,y)\]
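The formula above can be evaluated directly. The sketch below builds the covariance matrix for a few time points, assuming non-negative scalar inputs; brownian_kernel is a hypothetical helper written for this illustration.

```python
def brownian_kernel(x, y, variance=1.0):
    """k(x, y) = variance * min(x, y) for non-negative scalar inputs."""
    return variance * min(x, y)


times = [0.5, 1.0, 2.0]
K = [[brownian_kernel(s, t, variance=2.0) for t in times] for s in times]
```

Each diagonal entry equals variance * t, which reflects the variance of the process growing linearly in time.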

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_linear module¶

Classes in this module enhance Linear covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Linear(input_dim, X, variances=None, ARD=False, active_dims=None, name='linear')[source]¶

Bases: GPy.kern.src.linear.Linear

Class provides extra functionality to transfer this covariance function into SDE form.

Linear kernel:

\[k(x,y) = \sum_{i=1}^{input dim} \sigma^2_i x_iy_i\]

Modify the init method, because one extra parameter is required. X - points on the X axis.

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_matern module¶

Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶

Bases: GPy.kern.src.stationary.Matern32

Class provides extra functionality to transfer this covariance function into SDE form.

Matern 3/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
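The Matern 3/2 covariance and its scaled distance r can be written out in a few lines. This is a sketch of the formula only, with hypothetical helper names; it does not touch the state space (SDE) representation that this class adds.

```python
import math


def scaled_distance(x, y, lengthscales):
    """r = sqrt(sum_i (x_i - y_i)^2 / l_i^2)."""
    return math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales)))


def matern32(x, y, variance=1.0, lengthscales=(1.0,)):
    """sigma^2 * (1 + sqrt(3) r) * exp(-sqrt(3) r)."""
    r = scaled_distance(x, y, lengthscales)
    return variance * (1.0 + math.sqrt(3.0) * r) * math.exp(-math.sqrt(3.0) * r)


k_zero = matern32([0.0], [0.0], variance=1.5)      # r = 0 gives the variance
k_one = matern32([0.0], [1.0])                     # r = 1
k_ard = matern32([0.0, 0.0], [1.0, 2.0], lengthscales=(1.0, 2.0))  # r = sqrt(2)
```

With one lengthscale per dimension, as in k_ard, each coordinate difference is rescaled before the distance is taken.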

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

class sde_Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶

Bases: GPy.kern.src.stationary.Matern52

Class provides extra functionality to transfer this covariance function into SDE form.

Matern 5/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac{5}{3}r^2) \exp(- \sqrt{5} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_standard_periodic module¶

Classes in this module enhance the Standard Periodic covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_StdPeriodic(*args, **kwargs)[source]¶

Bases: GPy.kern.src.standard_periodic.StdPeriodic

Class provides extra functionality to transfer this covariance function into SDE form.

Standard Periodic kernel:

\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin(\frac{\pi}{\lambda_i} (x_i - y_i) )}{l_i} \right)^2 \right]\]
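To see the periodicity of this covariance numerically, the sketch below evaluates the standard periodic formula, with theta_1 as the variance, lambda_i as the periods and l_i as the lengthscales. The function std_periodic and its argument names are assumptions made for this illustration and are not the GPy constructor.

```python
import math


def std_periodic(x, y, variance=1.0, periods=(1.0,), lengthscales=(1.0,)):
    """theta_1 * exp(-0.5 * sum_i (sin(pi * (x_i - y_i) / lambda_i) / l_i)^2)."""
    s = sum((math.sin(math.pi * (a - b) / p) / l) ** 2
            for a, b, p, l in zip(x, y, periods, lengthscales))
    return variance * math.exp(-0.5 * s)


k_same = std_periodic([0.3], [0.3])                   # identical inputs
k_shift = std_periodic([0.3], [2.3], periods=(2.0,))  # one full period apart
k_half = std_periodic([0.0], [1.0], periods=(2.0,))   # half a period apart
```

Inputs one full period apart are as strongly correlated as identical inputs, while the correlation is lowest half a period apart.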

Init constructor.

Two optional extra parameters are added in addition to the ones in StdPeriodic kernel.

Parameters:

approx_order (int) – approximation order for the RBF covariance. (Default 7)
balance (bool) – Whether to balance this kernel separately. (Default False). Model has a separate parameter for balancing.

sde()[source]¶

Return the state space representation of the standard periodic covariance.

Note: the lengthscale must be constrained not to drop below 0.2 (independently of the approximation order). Below this value the Bessel functions of the first kind become NaN. Rescaling the time variable might help.

Note: the period must also be kept from becoming very low, because the gradients wrt the wavelength then become unstable. However, this might depend on the data. For a test example with 300 data points the lower limit is 0.15.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

seriescoeff(m=6, lengthScale=1.0, magnSigma2=1.0, true_covariance=False)[source]¶

Calculate the coefficients q_j^2 for the covariance function approximation:

\[k(\tau) = \sum_{j=0}^{+\infty} q_j^2 \cos(j \omega_0 \tau)\]

Reference:

[1] Arno Solin and Simo Särkkä (2014). Explicit link between periodic covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014). JMLR: W&CP, volume 33.

Note: only the infinite approximation (through Bessel functions) is currently implemented.

Parameters:	m (int) – degree of approximation. Default 6.
lengthScale (float) – length scale parameter in the kernel.
magnSigma2 (float) – multiplier in front of the kernel.

Returns:	coeffs (array(m+1)) – covariance series coefficients.
coeffs_dl (array(m+1)) – derivatives of the coefficients with respect to the lengthscale.
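The cosine series can be illustrated with the canonical periodic covariance from the reference, k(τ) = σ² exp(-2 sin²(ω₀τ/2)/ℓ²), for which q_0² = σ² I_0(ℓ⁻²)/exp(ℓ⁻²) and q_j² = 2σ² I_j(ℓ⁻²)/exp(ℓ⁻²) for j ≥ 1, where I_j is the modified Bessel function of the first kind. The sketch below is a pure-Python illustration under these assumptions, not GPy's code: `bessel_i`, `series_coefficients`, `periodic_exact` and `periodic_series` are hypothetical helpers, and GPy's own lengthscale parameterisation may differ by a constant factor.

```python
import math


def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, via its power series."""
    return sum((x / 2.0) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))


def series_coefficients(m=6, lengthscale=1.0, variance=1.0):
    """Return q_0^2 .. q_m^2 for the canonical periodic covariance."""
    z = lengthscale ** -2
    scale = variance / math.exp(z)
    return [scale * (1.0 if j == 0 else 2.0) * bessel_i(j, z)
            for j in range(m + 1)]


def periodic_exact(tau, omega0=1.0, lengthscale=1.0, variance=1.0):
    """Canonical periodic covariance evaluated in closed form."""
    return variance * math.exp(
        -2.0 * math.sin(omega0 * tau / 2.0) ** 2 / lengthscale ** 2)


def periodic_series(tau, coeffs, omega0=1.0):
    """Truncated cosine series sum_j q_j^2 cos(j omega0 tau)."""
    return sum(q2 * math.cos(j * omega0 * tau) for j, q2 in enumerate(coeffs))
```

With the default degree m = 6 and unit lengthscale, the truncated series should already track the closed form closely; shorter lengthscales need more terms because the coefficients decay more slowly.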

GPy.kern.src.sde_static module¶

Classes in this module enhance Static covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶

Bases: GPy.kern.src.static.Bias

Class provides extra functionality to transfer this covariance function into SDE form.

Bias kernel:

\[k(x,y) = \alpha\]

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

class sde_White(input_dim, variance=1.0, active_dims=None, name='white')[source]¶

Bases: GPy.kern.src.static.White

Class provides extra functionality to transfer this covariance function into SDE form.

White kernel:

\[k(x,y) = \alpha \delta(x-y)\]

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

GPy.kern.src.sde_stationary module¶

Classes in this module enhance several stationary covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶

Bases: GPy.kern.src.stationary.Exponential

Class provides extra functionality to transfer this covariance function into SDE form.

Exponential kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]¶: Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

class sde_RBF(*args, **kwargs)[source]¶

Bases: GPy.kern.src.rbf.RBF

Class provides extra functionality to transfer this covariance function into SDE form.

Radial Basis Function kernel:

\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

Init constructor.

Two optional extra parameters are added in addition to the ones in RBF kernel.

Parameters:	approx_order (int) – approximation order for the RBF covariance. (Default 10) balance (bool) – Whether to balance this kernel separately. (Default True). Model has a separate parameter for balancing.

sde()[source]¶

Return the state space representation of the covariance.

Note! For Sparse GP inference, too small or too high values of lengthscale lead to instabilities. This is because Qc are too high or too low and P_inf are not full rank. This effect depends on the approximation order. For N = 10, lengthscale must be in (0.8, 8). For other N, tests must be conducted. N = 6: (0.06, 31). Variance should be within reasonable bounds as well, but its dependence is linear.

The above facts do not take into account regularization.

sde_update_gradient_full(gradients)[source]¶: Update gradient in the order in which parameters are represented in the kernel

class sde_RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶

Bases: GPy.kern.src.stationary.RatQuad

Class provides extra functionality to transfer this covariance function into SDE form.

Rational Quadratic kernel:

\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha} \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

sde()[source]¶: Return the state space representation of the covariance.

GPy.kern.src.spline module¶

class Spline(input_dim, variance=1.0, c=1.0, active_dims=None, name='spline')[source]¶

Bases: GPy.kern.src.kern.Kern

Linear spline kernel. You need to specify 2 parameters: the variance and c. The variance is defined in powers of 10; thus specifying -2 means 10^-2. The parameter c controls the stiffness of the spline fit. A very stiff spline equals linear regression. See https://www.youtube.com/watch?v=50Vgw11qn0o starting at 1:17:28. Lit: Wahba, 1990.

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.splitKern module¶

A new kernel

class DEtime(kernel, idx_p, Xp, index_dim=-1, name='DiffGenomeKern')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class SplitKern(kernel, Xp, index_dim=-1, name='SplitKern')[source]¶

Bases: GPy.kern.src.kern.CombinationKernel

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class SplitKern_cross(kernel, Xp, name='SplitKern_cross')[source]¶

Bases: GPy.kern.src.kern.Kern

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.standard_periodic module¶

The standard periodic kernel, which is described in:

[1] Gaussian Processes for Machine Learning, C. E. Rasmussen, C. K. I. Williams. The MIT Press, 2005.

[2] Introduction to Gaussian processes. D. J. C. MacKay. In C. M. Bishop, editor, Neural Networks and Machine Learning, pages 133-165. Springer, 1998.

class StdPeriodic(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]¶

Bases: GPy.kern.src.kern.Kern

Standard periodic kernel

\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin(\frac{\pi}{T_i} (x_i - y_i) )}{l_i} \right)^2 \right]\]

Parameters:	input_dim (int) – the number of input dimensions.
variance (float) – the variance \(\theta_1\) in the formula above.
period (array or list of the appropriate size, or float if there is only one period parameter) – the vector of periods \(T_i\). If None then 1.0 is assumed.
lengthscale (array or list of the appropriate size, or float if there is only one lengthscale parameter) – the vector of lengthscales \(l_i\). If None then 1.0 is assumed.
ARD1 (bool) – Automatic Relevance Determination with respect to period. If False, a single period parameter \(T_i\) is shared by all dimensions; otherwise there is one period parameter per dimension.
ARD2 (bool) – Automatic Relevance Determination with respect to lengthscale. If False, a single lengthscale parameter \(l_i\) is shared by all dimensions; otherwise there is one lengthscale parameter per dimension.
active_dims (array or list of the appropriate size) – indices of dimensions which are used in the computation of the kernel.
name (str) – name of the kernel for output.
useGPU (bool) – whether or not to use the GPU.
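To make the formula and the role of the per-dimension periods and lengthscales concrete, here is a minimal pure-Python sketch of the standard periodic kernel for a single pair of inputs. It follows the equation above directly and does not use GPy; the function name `std_periodic` and its argument handling are illustrative assumptions.

```python
import math


def std_periodic(x, y, variance=1.0, period=1.0, lengthscale=1.0):
    """k(x, y) = variance * exp(-0.5 * sum_i (sin(pi / T_i * (x_i - y_i)) / l_i)^2)."""
    dim = len(x)
    # A scalar is shared by all dimensions; a sequence gives one value per dimension.
    periods = [period] * dim if isinstance(period, (int, float)) else list(period)
    lengthscales = ([lengthscale] * dim if isinstance(lengthscale, (int, float))
                    else list(lengthscale))
    total = 0.0
    for xi, yi, T, l in zip(x, y, periods, lengthscales):
        total += (math.sin(math.pi / T * (xi - yi)) / l) ** 2
    return variance * math.exp(-0.5 * total)
```

Shifting any input coordinate by a whole multiple of its period leaves the kernel value unchanged, which is the property that distinguishes this kernel from the stationary kernels based on the scaled distance r.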

K(X, X2=None)[source]¶: Compute the covariance matrix between X and X2.

Kdiag(X)[source]¶: Compute the diagonal of the covariance matrix associated to X.

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

parameters_changed()[source]¶: This function acts as a callback for each optimization iteration. If an optimization step was successful and the parameters have changed, this callback function is called so that any precomputations for the kernel can be updated.

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dKdiag, X)[source]¶: derivative of the diagonal of the covariance matrix with respect to the parameters.

update_gradients_full(dL_dK, X, X2=None)[source]¶: derivative of the covariance matrix with respect to the parameters.

GPy.kern.src.static module¶

class Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶

Bases: GPy.kern.src.static.Static

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
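As a concrete instance of this chain rule, consider a constant kernel k(x, y) = α. Taking the psi statistics exactly as defined by the formulas in this module, with n data points and m inducing points, gives ψ_0 = nα, ψ_1^{n,m} = α and ψ_2^{m,m'} = nα². The sketch below works through the resulting gradient in pure Python. The function names and the array layout (a scalar ψ_0, nested lists for ψ_1 and ψ_2) are assumptions made for illustration and are not taken from GPy's source.

```python
def constant_kernel_psi(alpha, n, m):
    """Psi statistics of k(x, y) = alpha for n data points and m inducing points."""
    psi0 = n * alpha                                  # sum_i E[k(X_i, X_i)]
    psi1 = [[alpha] * m for _ in range(n)]            # E[k(X_n, Z_m)]
    psi2 = [[n * alpha ** 2] * m for _ in range(m)]   # sum_i E[k(Z_m, X_i) k(X_i, Z_m')]
    return psi0, psi1, psi2


def alpha_gradient(alpha, dL_dpsi0, dL_dpsi1, dL_dpsi2):
    """Chain rule: dL/dalpha = sum over psi statistics of dL/dpsi * dpsi/dalpha."""
    n = len(dL_dpsi1)
    term0 = dL_dpsi0 * n                                         # dpsi0/dalpha = n
    term1 = sum(sum(row) for row in dL_dpsi1)                    # dpsi1/dalpha = 1
    term2 = 2.0 * n * alpha * sum(sum(row) for row in dL_dpsi2)  # dpsi2/dalpha = 2 n alpha
    return term0 + term1 + term2
```

Each term is one line of the displayed equation: the incoming derivative of the objective with respect to a psi statistic, multiplied by the derivative of that statistic with respect to the kernel parameter.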

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class Fixed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='fixed')[source]¶

Bases: GPy.kern.src.static.Static

Parameters:	input_dim (int) – the number of input dimensions variance (float) – the variance of the kernel

K(X, X2)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class Precomputed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='precomputed')[source]¶

Bases: GPy.kern.src.static.Fixed

Class for precomputed kernels, indexed by columns in X

Usage example:

    import numpy as np
    from GPy.models import GPClassification
    from GPy.kern import Precomputed
    # sklearn.cross_validation was removed from scikit-learn; use model_selection.
    from sklearn.model_selection import LeaveOneOut

    n = 10
    d = 100
    X = np.arange(n).reshape((n, 1))  # column vector of indices
    y = 2 * np.random.binomial(1, 0.5, (n, 1)) - 1
    X0 = np.random.randn(n, d)
    k = np.dot(X0, X0.T)
    kern = Precomputed(1, k)  # k is an n x n covariance matrix

    cv = LeaveOneOut()
    ypred = y.copy()
    for train, test in cv.split(X):
        m = GPClassification(X[train], y[train], kernel=kern)
        m.optimize()
        ypred[test] = 2 * (m.predict(X[test])[0] > 0.5) - 1

Parameters:	input_dim (int) – the number of input dimensions variance (float) – the variance of the kernel

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class Static(input_dim, variance, active_dims, name)[source]¶

Bases: GPy.kern.src.kern.Kern

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶: \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_XX(dL_dK, X, X2=None)[source]¶: \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]

gradients_XX_diag(dL_dKdiag, X, cov=False)[source]¶: The diagonal of the second derivative w.r.t. X and X2

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶: Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel

input_sensitivity(summarize=True)[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

psi0(Z, variational_posterior)[source]¶: \[\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]\]

psi1(Z, variational_posterior)[source]¶: \[\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

class White(input_dim, variance=1.0, active_dims=None, name='white')[source]¶

Bases: GPy.kern.src.static.Static

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class WhiteHeteroscedastic(input_dim, num_data, variance=1.0, active_dims=None, name='white_hetero')[source]¶

Bases: GPy.kern.src.static.Static

A heteroscedastic White kernel (nugget/noise). It defines one variance (nugget) per input sample.

Prediction excludes any noise learnt by this Kernel, so be careful using this kernel.

You can plot the errors learnt by this kernel with something like: plt.errorbar(m.X, m.Y, yerr=2*np.sqrt(m.kern.white.variance))
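One way to picture why prediction excludes the learnt noise is the following pure-Python sketch. It assumes, as an illustration only and not as a description of GPy's source, that the kernel contributes a diagonal matrix of the per-sample variances when it is evaluated on the training inputs alone, and zeros for any other pair of input sets. The function `white_hetero_K` and its arguments are hypothetical.

```python
def white_hetero_K(variances, n_rows, n_cols=None):
    """Diagonal of per-sample variances for the training set, zeros otherwise."""
    if n_cols is None and n_rows == len(variances):
        # Training inputs against themselves: one nugget per sample on the diagonal.
        return [[variances[i] if i == j else 0.0 for j in range(n_rows)]
                for i in range(n_rows)]
    # Any other combination (e.g. test inputs) receives no noise contribution.
    cols = n_rows if n_cols is None else n_cols
    return [[0.0] * cols for _ in range(n_rows)]
```

Under this assumption, a cross-covariance between test and training inputs is all zeros, so the per-sample noise affects the fit but does not appear in the predictive covariance.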

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel. X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

psi2(Z, variational_posterior)[source]¶: \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]

psi2n(Z, variational_posterior)[source]¶: \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]

Thus, we do not sum out n, compared to psi2

to_dict()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

\[\begin{aligned}\frac{\partial L}{\partial \theta_i} & = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i}\\ & \quad + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\end{aligned}\]

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.stationary module¶

class Cosine(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Cosine')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Cosine Covariance function

\[k(r) = \sigma^2 \cos(r)\]

K_of_r(r)[source]¶

dK_dr(r)[source]¶

class ExpQuad(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='ExpQuad')[source]¶

Bases: GPy.kern.src.stationary.Stationary

The Exponentiated quadratic covariance function.

\[k(r) = \sigma^2 \exp(- 0.5 r^2)\]

Note: This is exactly the same as the RBF covariance function, but the RBF implementation also has some features for doing variational kernels (the psi-statistics).

K_of_r(r)[source]¶

dK_dr(r)[source]¶

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

class ExpQuadCosine(input_dim, variance=1.0, lengthscale=None, period=1.0, ARD=False, active_dims=None, name='ExpQuadCosine')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Exponentiated quadratic multiplied by cosine covariance function (spectral mixture kernel).

\[k(r) = \sigma^2 \exp(-2\pi^2r^2)\cos(2\pi r/T)\]

K_of_r(r)[source]¶

dK_dr(r)[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_full(dL_dK, X, X2=None)[source]¶: Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

class Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶

Bases: GPy.kern.src.stationary.Stationary

K_of_r(r)[source]¶

dK_dr(r)[source]¶

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

class Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Matern 3/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

Gram_matrix(F, F1, F2, lower, upper)[source]¶

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:	F (np.array) – vector of functions F1 (np.array) – vector of derivatives of F F2 (np.array) – vector of second derivatives of F lower,upper (floats) – boundaries of the input domain

K_of_r(r)[source]¶

dK_dr(r)[source]¶

sde()[source]¶: Return the state space representation of the covariance.

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

class Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Matern 5/2 kernel:

\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac53 r^2) \exp(- \sqrt{5} r)\]

Gram_matrix(F, F1, F2, F3, lower, upper)[source]¶

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:	F (np.array) – vector of functions F1 (np.array) – vector of derivatives of F F2 (np.array) – vector of second derivatives of F F3 (np.array) – vector of third derivatives of F lower,upper (floats) – boundaries of the input domain

K_of_r(r)[source]¶

dK_dr(r)[source]¶
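For reference, the pair `K_of_r` and `dK_dr` for the Matern 5/2 formula above can be written out and cross-checked numerically. This is a standalone pure-Python sketch derived from the displayed equation, not GPy's implementation. Differentiating the formula gives dk/dr = -σ² (5/3) r (1 + √5 r) exp(-√5 r). The function names are illustrative.

```python
import math


def matern52_K_of_r(r, variance=1.0):
    """k(r) = variance * (1 + sqrt(5) r + 5/3 r^2) * exp(-sqrt(5) r)."""
    s = math.sqrt(5.0) * r
    return variance * (1.0 + s + 5.0 / 3.0 * r ** 2) * math.exp(-s)


def matern52_dK_dr(r, variance=1.0):
    """Analytic derivative of k(r) with respect to r."""
    s = math.sqrt(5.0) * r
    return -variance * (5.0 / 3.0) * r * (1.0 + s) * math.exp(-s)
```

A central finite difference of `matern52_K_of_r` is a quick sanity check on the analytic derivative, and the same check applies to any custom `dK_dr`.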

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

class OU(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='OU')[source]¶

Bases: GPy.kern.src.stationary.Stationary

OU kernel:

\[k(r) = \sigma^2 \exp(- r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]

K_of_r(r)[source]¶

dK_dr(r)[source]¶

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

class RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Rational Quadratic Kernel

\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha}\]

K_of_r(r)[source]¶

dK_dr(r)[source]¶

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients_diag(dL_dKdiag, X)[source]¶

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_full(dL_dK, X, X2=None)[source]¶: Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

class Sinc(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Sinc')[source]¶

Bases: GPy.kern.src.stationary.Stationary

Sinc Covariance function

\[k(r) = \sigma^2 \sinc(\pi r)\]

K_of_r(r)[source]¶

dK_dr(r)[source]¶

class Stationary(input_dim, variance, lengthscale, ARD, active_dims, name, useGPU=False)[source]¶

Bases: GPy.kern.src.kern.Kern

Stationary kernels (covariance functions).

Stationary covariance functions depend only on r, where r is defined as

\[r(x, x') = \sqrt{ \sum_{q=1}^Q (x_q - x'_q)^2 }\]

The covariance function k(x, x’) can then be written k(r).

In this implementation, r is scaled by the lengthscales parameter(s):

\[r(x, x') = \sqrt{ \sum_{q=1}^Q \frac{(x_q - x'_q)^2}{\ell_q^2} }.\]

By default, there’s only one lengthscale: separate lengthscales for each dimension can be enabled by setting ARD=True.

To implement a stationary covariance function using this class, one need only define the covariance function k(r), and its derivative.

```
def K_of_r(self, r):
    return foo

def dK_dr(self, r):
    return bar
```

The lengthscale(s) and variance parameters are added to the structure automatically.
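The pattern can be tried out without GPy installed. The sketch below is a hypothetical stand-in, not the real `Stationary` base class: `ToyStationary` only computes the lengthscale-scaled distance r and delegates to `K_of_r`, and `ToyOU` fills in the OU formula k(r) = σ² exp(-r) documented in this module. A list-valued lengthscale plays the role of ARD=True.

```python
import math


class ToyStationary:
    """GPy-free stand-in mimicking the K_of_r / dK_dr pattern (hypothetical)."""

    def __init__(self, variance=1.0, lengthscale=1.0):
        self.variance = variance
        # A float is shared by all dimensions; a list gives one value per dimension.
        self.lengthscale = lengthscale

    def _scaled_dist(self, x, y):
        ls = self.lengthscale
        if isinstance(ls, (int, float)):
            ls = [ls] * len(x)
        return math.sqrt(sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, ls)))

    def K(self, X, X2=None):
        X2 = X if X2 is None else X2
        return [[self.K_of_r(self._scaled_dist(x, y)) for y in X2] for x in X]

    def K_of_r(self, r):
        raise NotImplementedError

    def dK_dr(self, r):
        raise NotImplementedError


class ToyOU(ToyStationary):
    """Only the covariance as a function of r, and its derivative, are defined."""

    def K_of_r(self, r):
        return self.variance * math.exp(-r)

    def dK_dr(self, r):
        return -self.variance * math.exp(-r)
```

For example, `ToyOU(2.0, [1.0, 2.0]).K([[0.0, 0.0], [1.0, 2.0]])` builds a 2 x 2 covariance matrix in which each input dimension is divided by its own lengthscale before the distance is taken.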

Thanks to @strongh: In Stationary, a covariance function is defined in GPy as stationary when it depends only on the l2-norm |x_1 - x_2 |. However this is the typical definition of isotropy, while stationarity is usually a bit more relaxed. The more common version of stationarity is that the covariance is a function of x_1 - x_2 (See e.g. R&W first paragraph of section 4.1).

K(X, X2=None)[source]¶

Kernel function applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

K(X, X2) = K_of_r((X-X2)**2)

K_of_r(r)[source]¶

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

dK2_drdr(r)[source]¶

dK2_drdr_diag()[source]¶: Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.

dK2_drdr_via_X(X, X2)[source]¶

dK_dr(r)[source]¶

dK_dr_via_X(X, X2)[source]¶: compute the derivative of K wrt X going through X

dgradients2_dXdX2(X, X2, dimX, dimX2)[source]¶

dgradients_dX(X, X2, dimX)[source]¶

dgradients_dX2(X, X2, dimX2)[source]¶

get_one_dimensional_kernel(dimensions)[source]¶: Specially intended for the grid regression case. For a given covariance kernel, this method returns the corresponding kernel for a single dimension. The resulting values can then be used in the algorithm for reconstructing the full covariance matrix.

gradients_X(dL_dK, X, X2=None)[source]¶: Given the derivative of the objective wrt K (dL_dK), compute the derivative wrt X

gradients_XX(dL_dK, X, X2=None)[source]¶

Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2.

Returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X\partial X_2}\]

Returns:	dL2_dXdX2 – [NxMxQxQ] in the cov=True case, or [NxMxQ] in the cov=False case, for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
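The sign relation between the two second derivatives holds for any kernel that depends on its inputs only through x - x2, and it can be checked numerically. The sketch below uses a one-dimensional RBF kernel and central finite differences in pure Python. It is an independent illustration with hypothetical helper names, not part of GPy.

```python
import math


def rbf(x, x2, variance=1.0, lengthscale=1.0):
    """One-dimensional RBF kernel, a function of x - x2 only."""
    return variance * math.exp(-0.5 * ((x - x2) / lengthscale) ** 2)


def d2_dx2dx2(k, x, x2, h=1e-3):
    """Central finite-difference estimate of d^2 k / d x2^2."""
    return (k(x, x2 + h) - 2.0 * k(x, x2) + k(x, x2 - h)) / h ** 2


def d2_dxdx2(k, x, x2, h=1e-3):
    """Central finite-difference estimate of the mixed derivative d^2 k / (dx dx2)."""
    return (k(x + h, x2 + h) - k(x + h, x2 - h)
            - k(x - h, x2 + h) + k(x - h, x2 - h)) / (4.0 * h ** 2)
```

The sum `d2_dx2dx2(rbf, x, x2) + d2_dxdx2(rbf, x, x2)` should be close to zero for any pair of inputs, up to finite-difference error.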

gradients_XX_diag(dL_dK_diag, X)[source]¶

Given the derivative of the objective dL_dK, compute the second derivative of K wrt X:

\[\frac{\partial^2 K}{\partial X\partial X}\]

Returns:	dL2_dXdX – [NxQxQ]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

input_sensitivity(summarize=True)[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

reset_gradients()[source]¶

update_gradients_diag(dL_dKdiag, X)[source]¶

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

See also update_gradients_full

update_gradients_direct(dL_dVar, dL_dLen)[source]¶: Specially intended for the grid regression case. Given the computed log-likelihood derivatives, update the corresponding kernel and likelihood gradients. Useful when the gradients have been computed beforehand.

update_gradients_full(dL_dK, X, X2=None, reset=True)[source]¶: Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient
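
For a single parameter, the computation described here is a sum over the entries of dL_dK weighted by the derivative of K with respect to that parameter. The sketch below works this out for the variance of an assumed RBF kernel, where dK/dvariance = K / variance; the function names are hypothetical and the result is returned rather than stored on a parameter object.

```python
import math

def rbf_K(X, X2, variance, lengthscale):
    """RBF kernel matrix as nested lists."""
    return [[variance * math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, x2))
                                 / lengthscale ** 2)
             for x2 in X2] for x in X]

def variance_gradient(dL_dK, X, X2, variance, lengthscale):
    """dL/dvariance = sum_ij dL_dK[i][j] * dK[i][j]/dvariance."""
    K = rbf_K(X, X2, variance, lengthscale)
    return sum(dL_dK[i][j] * K[i][j] / variance
               for i in range(len(X)) for j in range(len(X2)))

X = [[0.0], [1.0]]
X2 = [[0.5], [2.0], [3.0]]
dL_dK = [[1.0, -0.5, 2.0], [0.25, 1.5, -1.0]]
grad = variance_gradient(dL_dK, X, X2, variance=2.0, lengthscale=1.5)
```

In a kernel class, a value like this would be what ends up in a field such as self.variance.gradient.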

GPy.kern.src.stationary_cython module¶

GPy.kern.src.symbolic module¶

GPy.kern.src.symmetric module¶

class Symmetric(base_kernel, transform, symmetry_type='even')[source]¶

Bases: GPy.kern.src.kern.Kern

Symmetric kernel that models a function with even or odd symmetry:

For even symmetry we have:

\[f(x) = f(Ax)\]

we then model the function as:

\[f(x) = g(x) + g(Ax)\]

the corresponding kernel is:

\[k(x, x') + k(Ax, x') + k(x, Ax') + k(Ax, Ax')\]

For odd symmetry we have:

\[f(x) = -f(Ax)\]

we then model the function as:

\[f(x) = g(x) - g(Ax)\]

with kernel

\[k(x, x') - k(Ax, x') - k(x, Ax') + k(Ax, Ax')\]

where k(x, x’) is the kernel of g(x)

Parameters:	base_kernel – kernel to make symmetric
	transform – transformation matrix describing the symmetry plane, A in the equations above
	symmetry_type – ‘odd’ or ‘even’, depending on the symmetry needed
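
The even and odd constructions above can be checked with a small pure-Python sketch. It assumes one-dimensional inputs, an RBF base kernel, and the reflection A = -1 (so that applying A twice gives the identity); none of these choices come from GPy itself.

```python
import math

def base_k(x, x2):
    """Base kernel k of g(x): an RBF on 1-D inputs (illustrative choice)."""
    return math.exp(-0.5 * (x - x2) ** 2)

def symmetric_k(x, x2, A=-1.0, symmetry_type="even"):
    """k(x,x') +/- k(Ax,x') +/- k(x,Ax') + k(Ax,Ax'), with + for even, - for odd."""
    sign = 1.0 if symmetry_type == "even" else -1.0
    return (base_k(x, x2) + sign * base_k(A * x, x2)
            + sign * base_k(x, A * x2) + base_k(A * x, A * x2))

x, x2 = 0.7, 0.4

# Even symmetry: the kernel is unchanged when x is replaced by Ax.
even_at_x = symmetric_k(x, x2, symmetry_type="even")
even_at_Ax = symmetric_k(-x, x2, symmetry_type="even")

# Odd symmetry: the kernel flips sign when x is replaced by Ax.
odd_at_x = symmetric_k(x, x2, symmetry_type="odd")
odd_at_Ax = symmetric_k(-x, x2, symmetry_type="odd")
```

Functions drawn from a GP with the even kernel satisfy f(x) = f(Ax), and with the odd kernel f(x) = -f(Ax), which is what the covariance identities above reflect.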

K(X, X2)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel
	X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2)[source]¶

\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

update_gradients_diag(dL_dK, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2)[source]¶: Set the gradients of all parameters when doing full (N) inference.

GPy.kern.src.trunclinear module¶

class TruncLinear(input_dim, variances=None, delta=None, ARD=False, active_dims=None, name='linear')[source]¶

Bases: GPy.kern.src.kern.Kern

Truncated Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input\_dim}} \sigma^2_i \max(0, x_iy_i - \sigma_q)\]

Parameters:	input_dim (int) – the number of input dimensions
	variances (array or list of the appropriate size, or float if there is only one variance parameter) – the vector of variances \(\sigma^2_i\)
	ARD (Boolean) – Automatic Relevance Determination. If False, the kernel has only one variance parameter \(\sigma^2\); otherwise there is one variance parameter per dimension.
Return type:	kernel object
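
A direct transcription of the formula above into pure Python is sketched below. The scalar threshold argument sigma_q and the function name are assumptions made for illustration; how the class maps its constructor arguments onto \(\sigma_q\) is not stated here.

```python
def trunc_linear(x, y, variances, sigma_q):
    """Sum over dimensions of variance_i * max(0, x_i * y_i - sigma_q)."""
    return sum(v * max(0.0, xi * yi - sigma_q)
               for xi, yi, v in zip(x, y, variances))

x = [1.0, 2.0]
y = [3.0, -1.0]
variances = [2.0, 0.5]

# Dimension 0 contributes 2.0 * (3.0 - 0.5); dimension 1 is truncated to zero.
value = trunc_linear(x, y, variances, sigma_q=0.5)
```

Dimensions whose product x_i * y_i falls below the threshold contribute nothing to the sum.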

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel
	X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶

\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

input_sensitivity()[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.

class TruncLinear_inf(input_dim, interval, variances=None, ARD=False, active_dims=None, name='linear')[source]¶

Bases: GPy.kern.src.kern.Kern

Truncated Linear kernel

\[k(x,y) = \sum_{i=1}^{\text{input\_dim}} \sigma^2_i \max(0, x_iy_i - \sigma_q)\]

Parameters:	input_dim (int) – the number of input dimensions
	variances (array or list of the appropriate size, or float if there is only one variance parameter) – the vector of variances \(\sigma^2_i\)
	ARD (Boolean) – Automatic Relevance Determination. If False, the kernel has only one variance parameter \(\sigma^2\); otherwise there is one variance parameter per dimension.
Return type:	kernel object

K(X, X2=None)[source]¶

Compute the kernel function.

\[K_{ij} = k(X_i, X_j)\]

Parameters:	X – the first set of inputs to the kernel
	X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.

Kdiag(X)[source]¶: The diagonal of the kernel matrix K

\[Kdiag_{i} = k(X_i, X_i)\]

gradients_X(dL_dK, X, X2=None)[source]¶

\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]

gradients_X_diag(dL_dKdiag, X)[source]¶: The diagonal of the derivative w.r.t. X

input_sensitivity()[source]¶

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

update_gradients_diag(dL_dKdiag, X)[source]¶: update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_full(dL_dK, X, X2=None)[source]¶: Set the gradients of all parameters when doing full (N) inference.