# GPy.kern.src package¶

## GPy.kern.src.ODE_UY module¶

class ODE_UY(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, active_dims=None, name='ode_uy')[source]
K(X, X2=None)[source]
Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

## GPy.kern.src.ODE_UYC module¶

class ODE_UYC(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, ubias=1.0, active_dims=None, name='ode_uyc')[source]
K(X, X2=None)[source]
Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

## GPy.kern.src.ODE_st module¶

class ODE_st(input_dim, a=1.0, b=1.0, c=1.0, variance_Yx=3.0, variance_Yt=1.5, lengthscale_Yx=1.5, lengthscale_Yt=1.5, active_dims=None, name='ode_st')[source]

kernel resulting from a first order ODE with an OU driving GP

Parameters:
• input_dim (int) – the number of input dimensions; has to be equal to one
• varianceU (float) – variance of the driving GP
• lengthscaleU (float) – lengthscale of the driving GP (sqrt(3)/lengthscaleU)
• varianceY (float) – ‘variance’ of the transfer function
• lengthscaleY (float) – ‘lengthscale’ of the transfer function (1/lengthscaleY)

Return type: kernel object
K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

## GPy.kern.src.ODE_t module¶

class ODE_t(input_dim, a=1.0, c=1.0, variance_Yt=3.0, lengthscale_Yt=1.5, ubias=1.0, active_dims=None, name='ode_st')[source]
K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

class Add(subkerns, name='sum')[source]

This kernel will take over the active dims of its subkernels passed in.

NOTE: The subkernels will be copies of the original kernels, to prevent unexpected behavior.

K(X, X2=None, which_parts=None)[source]

Add all kernels together. If which_parts, a list of parts (of this kernel!), is given, only those parts are used to compute the covariance.
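
For illustration, a minimal sketch (the data here are made up; GPy kernels are usually summed with the + operator, which returns an Add kernel; the attribute name k.rbf for the RBF part is an assumption based on GPy's part naming):

    import numpy as np
    import GPy

    X = np.random.rand(10, 1)                   # 10 one-dimensional inputs
    k = GPy.kern.RBF(1) + GPy.kern.White(1)     # an Add kernel with two parts
    K_full = k.K(X)                             # sum of both parts
    K_rbf = k.K(X, which_parts=[k.rbf])         # only the RBF part of this kernel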

Kdiag(X, which_parts=None)[source]
gradients_X(dL_dK, X, X2=None)[source]

Compute the gradient of the objective function with respect to X.

Parameters:
• dL_dK (np.ndarray (num_samples x num_inducing)) – An array of gradients of the objective function with respect to the covariance function.
• X (np.ndarray (num_samples x input_dim)) – Observed data inputs
• X2 (np.ndarray (num_inducing x input_dim)) – Observed data inputs (optional, defaults to X)
gradients_XX(dL_dK, X, X2)[source]
gradients_XX_diag(dL_dKdiag, X)[source]
gradients_X_diag(dL_dKdiag, X)[source]
gradients_Z_expectations(dL_psi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
input_sensitivity(summarize=True)[source]
psi0(Z, variational_posterior)[source]
psi1(Z, variational_posterior)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
sde()[source]

Support adding kernels for sde representation

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]
update_gradients_diag(dL_dK, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.basis_funcs module¶

class BasisFuncKernel(input_dim, variance=1.0, active_dims=None, ARD=False, name='basis func kernel')[source]

Abstract superclass for kernels with explicit basis functions for use in GPy.

This class does NOT automatically add an offset to the design matrix phi!

K(X, X2=None)[source]
Kdiag(X, X2=None)[source]
concatenate_offset(X)[source]

Convenience function to add an offset column to phi. You can use this function to add an offset (bias on y axis) to phi in your custom self._phi(X).

parameters_changed()[source]
phi(X)[source]
posterior_inf(X=None, posterior=None)[source]

Do the posterior inference on the parameters given this kernel’s functions and the model posterior, which has to be a GPy posterior, usually found at m.posterior if m is a GPy model. If not given, we search for the highest parent that is a model containing the posterior, and for X accordingly.

update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class ChangePointBasisFuncKernel(input_dim, changepoint, variance=1.0, active_dims=None, ARD=False, name='changepoint')[source]

The basis function has a changepoint. That is, it is constant, jumps at a single point (given as changepoint) and is constant again. You can give multiple changepoints. The changepoints are calculated using np.where(self.X < self.changepoint, -1, 1).
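
A minimal usage sketch (the module path follows the heading above; the data and the changepoint location are illustrative):

    import numpy as np
    from GPy.kern.src.basis_funcs import ChangePointBasisFuncKernel

    X = np.linspace(0, 10, 50)[:, None]
    # phi(X) is -1 before the changepoint at x = 5 and +1 after it
    k = ChangePointBasisFuncKernel(input_dim=1, changepoint=5.0, variance=1.0)
    K = k.K(X)          # covariance implied by the changepoint basis function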

class DomainKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='constant_domain')[source]

Create a constant plateau of correlation between start and stop and zero elsewhere. This is a constant shift of the outputs along the y axis in the range from start to stop.

class LinearSlopeBasisFuncKernel(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='linear_segment')[source]

A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.

Start and stop can be tuples or lists of starts and stops. The behaviour of start and stop is as np.where(X < start) would do.

class LogisticBasisFuncKernel(input_dim, centers, variance=1.0, slope=1.0, active_dims=None, ARD=False, ARD_slope=True, name='logistic')[source]

Create a series of logistic basis functions with the given centers. The slope is determined by the data fit. The number of centers determines the number of logistic functions.

parameters_changed()[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class PolynomialBasisFuncKernel(input_dim, degree, variance=1.0, active_dims=None, ARD=True, name='polynomial_basis')[source]

A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.

Start and stop can be tuples or lists of starts and stops. Behaviour of start stop is as np.where(X<start) would do.

## GPy.kern.src.brownian module¶

class Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]

Brownian motion in 1D only.

Negative times are treated as a separate (backwards!) Brownian motion.

Parameters:
• input_dim (int) – the number of input dimensions
• variance (float) –
K(X, X2=None)[source]
Kdiag(X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.coregionalize module¶

class Coregionalize(input_dim, output_dim, rank=1, W=None, kappa=None, active_dims=None, name='coregion')[source]

Covariance function for intrinsic/linear coregionalization models

This covariance has the form:

$\mathbf{B} = \mathbf{W}\mathbf{W}^\top + \text{diag}(\kappa)$

An intrinsic/linear coregionalization covariance function of the form:

$k_2(x, y) = \mathbf{B} \, k(x, y)$

It is obtained as the tensor product between a covariance function k(x, y) and B.

Parameters:
• output_dim (int) – number of outputs to coregionalize
• rank (int) – number of columns of the W matrix (this parameter is ignored if parameter W is not None)
• W (numpy array of dimensionality (num_outputs, W_columns)) – a low rank matrix that determines the correlations between the different outputs, together with kappa it forms the coregionalization matrix B
• kappa (numpy array of dimensionality (output_dim, )) – a vector which allows the outputs to behave independently
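
As a sketch, the usual intrinsic coregionalization model (ICM) construction multiplies a data kernel with a Coregionalize part, with the last input column holding the output index (data here are illustrative):

    import numpy as np
    import GPy

    # 20 points for output 0 and 20 for output 1; column 1 is the output index
    X = np.vstack([np.hstack([np.random.rand(20, 1), np.zeros((20, 1))]),
                   np.hstack([np.random.rand(20, 1), np.ones((20, 1))])])

    k = GPy.kern.RBF(1, active_dims=[0]) \
        * GPy.kern.Coregionalize(1, output_dim=2, rank=1, active_dims=[1])
    K = k.K(X)    # entries are B[i, j] * k_rbf(x, x') with B = W W^T + diag(kappa)
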
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
parameters_changed()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.eq_ode1 module¶

class EQ_ODE1(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, decay=None, active_dims=None, name='eq_ode1')[source]

Covariance function for first order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

$\frac{\text{d}y_j}{\text{d}t} = \sum_{i=1}^{R} w_{j,i} u_i(t-\delta_j) - d_j y_j(t)$

where $$R$$ is the rank of the system, $$w_{j,i}$$ is the sensitivity of the $$j$$th output to the $$i$$th latent function, $$d_j$$ is the decay rate of the $$j$$th output and $$u_i(t)$$ are independent latent Gaussian processes governed by an exponentiated quadratic covariance.

param output_dim: number of outputs driven by latent function.
type output_dim: int
param W: sensitivities of each output to the latent driving function.
type W: ndarray (output_dim x rank).
param rank: If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
type rank: int
param decay: decay rates for the first order system.
type decay: array of length output_dim.
param delay: delay between latent force and output response.
type delay: array of length output_dim.
param kappa: diagonal term that allows each latent output to have an independent component to the response.
type kappa: array of length output_dim.
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
lnDifErf(z1, z2)[source]

## GPy.kern.src.eq_ode2 module¶

class EQ_ODE2(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, C=None, B=None, active_dims=None, name='eq_ode2')[source]

Covariance function for second order differential equation driven by an exponentiated quadratic covariance.

The outputs of this kernel have the form

$\frac{\text{d}^2 y_j(t)}{\text{d}t^2} + C_j \frac{\text{d}y_j(t)}{\text{d}t} + B_j y_j(t) = \sum_{i=1}^{R} w_{j,i} u_i(t)$

where $$R$$ is the rank of the system, $$w_{j,i}$$ is the sensitivity of the $$j$$th output to the $$i$$th latent function, $$C_j$$ and $$B_j$$ are the damper and spring constants of the $$j$$th output, and $$u_i(t)$$ are independent latent Gaussian processes governed by an exponentiated quadratic covariance.

param output_dim: number of outputs driven by latent function.
type output_dim: int
param W: sensitivities of each output to the latent driving function.
type W: ndarray (output_dim x rank).
param rank: If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
type rank: int
param C: damper constant for the second order system.
type C: array of length output_dim.
param B: spring constant for the second order system.
type B: array of length output_dim.
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.grid_kerns module¶

class GridKern(input_dim, variance, lengthscale, ARD, active_dims, name, originalDimensions, useGPU=False)[source]
dKd_dLen(X, dimension, lengthscale, X2=None)[source]

Derivative of the kernel function wrt lengthscale, applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dLen(X, X2) = dKdLen_of_r((X-X2)**2)

dKd_dVar(X, X2=None)[source]

Derivative of Kernel function wrt variance applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

dKd_dVar(X, X2) = dKdVar_of_r((X-X2)**2)

class GridRBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='gridRBF', originalDimensions=1, useGPU=False)[source]

Similar to the regular RBF kernel, but supplemented with methods required for Gaussian grid regression. Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:

$k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)$
K_of_r(r)[source]
dK_dr(r)[source]
dKdLen_of_r(r, dimCheck, lengthscale)[source]

Compute the derivative of the kernel for a dimension wrt lengthscale. The computation of the derivative changes when the lengthscale corresponds to the dimension of the kernel whose derivative is being computed.

dKdVar_of_r(r)[source]

Compute derivative of kernel wrt variance

## GPy.kern.src.independent_outputs module¶

class Hierarchical(kernels, name='hierarchy')[source]

A kernel which can represent a simple hierarchical model.

See Hensman et al 2013, “Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters” http://www.biomedcentral.com/1471-2105/14/252

To construct this kernel, you must pass a list of kernels. The first kernel will be assumed to be the ‘base’ kernel, and will be computed everywhere. For every additional kernel, we assume another layer in the hierarchy, with a corresponding column of the input matrix which indexes which function the data are in at that level.

For more, see the ipython notebook documentation on Hierarchical covariances.
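
A minimal sketch of the construction described above (assuming, as in the GPy hierarchical examples, that X carries one extra column per additional kernel indexing the replicate; names and data are illustrative):

    import numpy as np
    import GPy
    from GPy.kern.src.independent_outputs import Hierarchical

    # base kernel computed everywhere plus one kernel for the replicate level
    k = Hierarchical([GPy.kern.Matern32(1, name='base'),
                      GPy.kern.Matern32(1, name='replicate')])

    t = np.random.rand(30, 1)                  # observation times
    rep = np.random.randint(0, 3, (30, 1))     # replicate index per row
    X = np.hstack([t, rep])
    K = k.K(X)   # base covariance everywhere, plus replicate covariance within replicates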

K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class IndependentOutputs(kernels, index_dim=-1, name='independ')[source]

A kernel which can represent several independent functions. This kernel ‘switches off’ parts of the matrix where the output indexes are different.

The index of the functions is given by the last column in the input X; the rest of the columns of X are passed to the underlying kernel for computation (in blocks).

Parameters: kernels – either a kernel, or a list of kernels to work with. If it is a list of kernels, the indices in the index_dim column index the kernels you gave!
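
A short sketch (illustrative data; the last column of X holds the function index as described above):

    import numpy as np
    import GPy
    from GPy.kern.src.independent_outputs import IndependentOutputs

    k = IndependentOutputs(GPy.kern.RBF(1))         # one shared kernel, switched off across indices
    X = np.hstack([np.random.rand(12, 1),
                   np.repeat([0, 1], 6)[:, None]])  # functions 0 and 1
    K = k.K(X)    # block structure: zero covariance between rows with different indices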

K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
index_to_slices(index)[source]

Take a numpy array of integers (index) and return a nested list of slices such that the slices describe the start, stop points for each integer in the index.

e.g.

>>> index = np.asarray([0,0,0,1,1,1,2,2,2])

returns

[[slice(0,3,None)],[slice(3,6,None)],[slice(6,9,None)]]

or, a more complicated example:

>>> index = np.asarray([0,0,1,1,0,2,2,2,1,1])

returns

[[slice(0,2,None),slice(4,5,None)],[slice(2,4,None),slice(8,10,None)],[slice(5,8,None)]]

## GPy.kern.src.integral module¶

class Integral(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Integral kernel between…

K(X, X2=None)[source]
Kdiag(X)[source]

I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]
k_xf(t, tprime, l)[source]
k_xx(t, tprime, l)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.integral_limits module¶

class Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Integral kernel. This kernel allows 1d histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs (on two dimensions) are the start and end points of each bin. The kernel’s predictions are the latent function which might have generated those binned results.
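
A sketch of the input layout for binned 1d data (the ordering of the two boundary columns, the hyperparameter values and the use of GPRegression are assumptions for illustration):

    import numpy as np
    import GPy
    from GPy.kern.src.integral_limits import Integral_Limits

    # one row per bin; the two columns hold the bin boundaries
    # (the [end, start] ordering here is an assumption; check the kernel code)
    X = np.array([[1.0, 0.0],
                  [2.0, 1.0],
                  [3.0, 2.0]])
    y = np.array([[3.0], [5.0], [2.0]])             # counts observed in each bin

    k = Integral_Limits(input_dim=2, variances=1.0, lengthscale=2.0)
    m = GPy.models.GPRegression(X, y, k)            # model of the binned observations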

K(X, X2=None)[source]
Note: We have a latent function and an output function. We want to be able to find:
• the covariance between values of the output function
• the covariance between values of the latent function
• the “cross covariance” between values of the output function and the latent function

This method is used by GPy either to get the covariance between the outputs (K_xx) or to get the cross covariance between the latent function and the outputs (K_xf). We take advantage of the places where this function is used:

• if X2 is None, then we know that the items being compared (to get the covariance for) are both from the OUTPUT FUNCTION.
• if X2 is not None, then we know that the items being compared are from two different sets (the OUTPUT FUNCTION and the LATENT FUNCTION).

If we want the covariance between values of the LATENT FUNCTION, we take advantage of the fact that we only need that when we do prediction, and this only calls Kdiag (not K). So the covariance between LATENT FUNCTIONS is available from Kdiag.

Kdiag(X)[source]

I’ve used the fact that we call this method during prediction (instead of K). When we do prediction we want to know the covariance between LATENT FUNCTIONS (K_ff) (as that’s probably what the user wants). $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

dk_dl(t, tprime, s, sprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]

Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t); sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime).

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.

update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.kern module¶

class CombinationKernel(kernels, name, extra_dims=[], link_parameters=True)[source]

Abstract super class for combination kernels. A combination kernel combines (a list of) kernels and works on those. Examples are the HierarchicalKernel or Add and Prod kernels.

Parameters:
• kernels (list) – List of kernels to combine (can be only one element)
• name (str) – name of the combination kernel
• extra_dims (array-like) – if needed, extra dimensions for the combination kernel to work on
input_sensitivity(summarize=True)[source]

If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.

parts
class Kern(input_dim, active_dims, name, useGPU=False, *a, **kw)[source]

The base class for a kernel: a positive definite function which forms the covariance function (kernel).

input_dim:

is the number of dimensions to work on. Make sure to give the tight dimensionality of inputs. You most likely want this to be the integer telling the number of input dimensions of the kernel.

active_dims:

is the active dimensions of inputs X we will work on. All kernels will get sliced Xes as inputs, if _all_dims_active is not None. Only positive integers are allowed in active_dims! If active_dims is None, slicing is switched off and all of X will be passed through as given.

Parameters:
• input_dim (int) – the number of input dimensions to the function
• active_dims (array-like|None) – list of indices on which dimensions this kernel works on, or None if no slicing

Do not instantiate.

K(X, X2)[source]

Compute the kernel function.

$K_{ij} = k(X_i, X_j)$
Parameters:
• X – the first set of inputs to the kernel
• X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
Kdiag(X)[source]

The diagonal of the kernel matrix K

$Kdiag_{i} = k(X_i, X_i)$
add(other, name='sum')[source]

Add another kernel to this one.

Parameters: other (GPy.kern) – the other kernel to be added
static from_dict(input_dict)[source]
get_most_significant_input_dimensions(which_indices=None)[source]

Determine which dimensions should be plotted

Returns the top three most significant input dimensions.

If there are fewer than three dimensions, the non-existing dimensions are labelled as None, so for a 1-dimensional input this returns (0, None, None).

Parameters: which_indices (int or tuple(int,int) or tuple(int,int,int)) – force the indices to be the given indices.
gradients_X(dL_dK, X, X2)[source]
$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}$
gradients_XX(dL_dK, X, X2, cov=True)[source]
$\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}$
gradients_XX_diag(dL_dKdiag, X, cov=True)[source]

The diagonal of the second derivative w.r.t. X and X2

gradients_X_X2(dL_dK, X, X2)[source]
gradients_X_diag(dL_dKdiag, X)[source]

The diagonal of the derivative w.r.t. X

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior, psi0=None, psi1=None, psi2=None)[source]

Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.

gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel.

input_sensitivity(summarize=True)[source]

Returns the sensitivity for each dimension of this kernel.

This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.

Use this as relative measurement, not for absolute comparison between kernels.

plot(*args, **kwargs)
plot_ARD(kernel, filtering=None, legend=False, canvas=None, **kwargs)

If an ARD kernel is present, plot a bar representation using matplotlib

Parameters:
• fignum – figure number of the plot
• filtering (list of names to use for ARD plot) – list of names to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.
plot_covariance(kernel, x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)

Plot a kernel covariance w.r.t. another x.

Parameters:
• x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
• plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
• visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or fewer dimensions to plot.
• projection ({2d|3d}) – What projection shall we use to plot the kernel?
• levels (int) – for 2D projection, how many levels for the contour plot to use?
• kwargs – valid kwargs for your specific plotting library
• resolution – the resolution of the lines used in plotting. For 2D this defines the grid for kernel evaluation.
prod(other, name='mul')[source]

Multiply two kernels (either on the same space, or on the tensor product of the input space).

Parameters: other (GPy.kern) – the other kernel to be multiplied
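
For illustration, a sketch of both cases (illustrative data; active_dims is used to place the factors on different input dimensions for the tensor product):

    import numpy as np
    import GPy

    # product on the same 1-d input space
    k_same = GPy.kern.RBF(1) * GPy.kern.Cosine(1)

    # tensor product: each factor acts on its own input dimension
    k_tensor = GPy.kern.RBF(1, active_dims=[0]) * GPy.kern.RBF(1, active_dims=[1])

    X = np.random.rand(5, 2)
    K = k_tensor.K(X)
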
psi0(Z, variational_posterior)[source]
$\psi_0 = \sum_{i=0}^{n}E_{q(X)}[k(X_i, X_i)]$
psi1(Z, variational_posterior)[source]
$\psi_1^{n,m} = E_{q(X)}[k(X_n, Z_m)]$
psi2(Z, variational_posterior)[source]
$\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]$
psi2n(Z, variational_posterior)[source]
$\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]$

Thus, we do not sum out n, compared to psi2

reset_gradients()[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

update the gradients of all parameters when using only the diagonal elements of the covariance matrix

update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]

Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.

The essential maths is

$\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}$

Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.

update_gradients_full(dL_dK, X, X2)[source]

Set the gradients of all parameters when doing full (N) inference.

## GPy.kern.src.kernel_slice_operations module¶

Created on 11 Mar 2014

@author: @mzwiessele

This module provides a meta class for the kernels. The meta class is for slicing the inputs (X, X2) for the kernels, before K (or any other method involving X) gets called. The _all_dims_active of a kernel decides which dimensions the kernel works on.

class KernCallsViaSlicerMeta[source]

Bases: paramz.parameterized.ParametersChangedMeta

put_clean(dct, name, func)[source]

## GPy.kern.src.linear module¶

class Linear(input_dim, variances=None, ARD=False, active_dims=None, name='linear')[source]

Linear kernel

$k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i$
Parameters:
• input_dim (int) – the number of input dimensions
• variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances $$\sigma^2_i$$
• ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.

Return type: kernel object
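
A quick sketch (illustrative data) of the ARD variant, which carries one variance per input dimension:

    import numpy as np
    import GPy

    X = np.random.rand(8, 3)
    k = GPy.kern.Linear(input_dim=3, ARD=True)   # one variance parameter per dimension
    K = k.K(X)                                   # equals X diag(sigma_i^2) X^T
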
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_XX(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2:

Returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

$\frac{\partial^2 K}{\partial X_2^2} = -\frac{\partial^2 K}{\partial X \partial X_2}$

Returns: dL2_dXdX2: [NxMxQxQ] for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
gradients_XX_diag(dL_dKdiag, X)[source]
gradients_X_diag(dL_dKdiag, X)[source]
gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
input_sensitivity(summarize=True)[source]
psi0(Z, variational_posterior)[source]
psi1(Z, variational_posterior)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class LinearFull(input_dim, rank, W=None, kappa=None, active_dims=None, name='linear_full')[source]
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.mlp module¶

class MLP(input_dim, variance=1.0, weight_variance=1.0, bias_variance=1.0, ARD=False, active_dims=None, name='mlp')[source]

Multi layer perceptron kernel (also known as arc sine kernel or neural network kernel)

$k(x,y) = \sigma^{2}\frac{2}{\pi } \text{asin} \left ( \frac{ \sigma_w^2 x^\top y+\sigma_b^2}{\sqrt{\sigma_w^2x^\top x + \sigma_b^2 + 1}\sqrt{\sigma_w^2 y^\top y + \sigma_b^2 +1}} \right )$
Parameters:
• input_dim (int) – the number of input dimensions
• variance (float) – the variance $$\sigma^2$$
• weight_variance (array or list of the appropriate size (or float if there is only one weight variance parameter)) – the vector of the variances of the prior over input weights in the neural network $$\sigma^2_w$$
• bias_variance – the variance of the prior over bias parameters $$\sigma^2_b$$
• ARD (Boolean) – Auto Relevance Determination. If equal to “False”, the kernel is isotropic (i.e. one weight variance parameter sigma^2_w), otherwise there is one weight variance parameter per dimension.

Return type: Kernpart object
K(X, X2=None)[source]
Kdiag(X)[source]

Compute the diagonal of the covariance matrix for X.

gradients_X(dL_dK, X, X2)[source]

Derivative of the covariance matrix with respect to X

gradients_X_X2(dL_dK, X, X2)[source]

Derivative of the covariance matrix with respect to X

gradients_X_diag(dL_dKdiag, X)[source]

Gradient of diagonal of covariance with respect to X

update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

Derivative of the covariance with respect to the parameters.

## GPy.kern.src.multidimensional_integral_limits module¶

class Multidimensional_Integral_Limits(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]

Integral kernel, can include limits on each integral value. This kernel allows an n-dimensional histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs are the start and end points of each bin: Pairs of inputs act as the limits on each bin. So inputs 4 and 5 provide the start and end values of each bin in the 3rd dimension. The kernel’s predictions are the latent function which might have generated those binned results.

K(X, X2=None)[source]
Kdiag(X)[source]

I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$

calc_K_xx_wo_variance(X)[source]

Calculates K_xx without the variance term

dk_dl(t, tprime, s, sprime, l)[source]
g(z)[source]
h(z)[source]
k_ff(t, tprime, l)[source]

Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required

k_xf(t, tprime, s, l)[source]

Covariance between the gradient (latent value) and the actual (observed) value.

Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.

k_xx(t, tprime, s, sprime, l)[source]

Covariance between observed values.

s and t are one domain of the integral (i.e. the integral between s and t); sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime).

We’re interested in how correlated these two integrals are.

Note: We’ve not multiplied by the variance, this is done in K.

update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.multioutput_kern module¶

class MultioutputKern(kernels, cross_covariances={}, name='MultioutputKern')[source]

Multioutput kernel is a meta class for combining different kernels for multioutput GPs.

As an example let us have inputs x1 for output 1 with covariance k1 and x2 for output 2 with covariance k2. In addition, we need to define the cross covariances k12(x1,x2) and k21(x2,x1). Then the kernel becomes: k([x1,x2],[x1,x2]) = [k1(x1,x1) k12(x1, x2); k21(x2, x1), k2(x2,x2)]

For the kernel, the kernels of outputs are given as list in param “kernels” and cross covariances are given in param “cross_covariances” as a dictionary of tuples (i,j) as keys. If no cross covariance is given, it defaults to zero, as in k12(x1,x2)=0.

In the cross covariance dictionary, the value needs to be a struct with elements:
• ‘kernel’: a member of the Kernel class that stores the hyperparameters to be updated when optimizing the GP
• ‘K’: function defining the cross covariance
• ‘update_gradients_full’: a function to be used for updating gradients
• ‘gradients_X’: gives a gradient of the cross covariance with respect to the first input
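
A minimal sketch with the default zero cross covariances (the convention that the last column of X indexes the output is an assumption here; data are illustrative):

    import numpy as np
    import GPy
    from GPy.kern.src.multioutput_kern import MultioutputKern

    k1 = GPy.kern.RBF(1)
    k2 = GPy.kern.Matern32(1)
    k = MultioutputKern(kernels=[k1, k2])           # k12(x1, x2) defaults to 0

    X = np.hstack([np.random.rand(10, 1),
                   np.repeat([0, 1], 5)[:, None]])  # last column: output index (assumed)
    K = k.K(X)                                      # block diagonal [k1, 0; 0, k2]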

K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
reset_gradients()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class ZeroKern[source]
K(X, X2=None)[source]
gradients_X(dL_dK, X, X2=None)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.periodic module¶

class Periodic(input_dim, variance, lengthscale, period, n_freq, lower, upper, active_dims, name)[source]
Parameters:
• variance (float) – the variance of the Matern kernel
• lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
• period (float) – the period
• n_freq (int) – the number of frequencies considered for the periodic subspace

Return type: kernel object
K(X, X2=None)[source]
Kdiag(X)[source]
class PeriodicExponential(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_exponential')[source]

Kernel of the periodic subspace (up to a given frequency) of an exponential (Matern 1/2) RKHS.

Only defined for input_dim=1.

Gram_matrix()[source]
parameters_changed()[source]
update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters (shape is N x num_inducing x num_params)

class PeriodicMatern32(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern32')[source]

Kernel of the periodic subspace (up to a given frequency) of a Matern 3/2 RKHS. Only defined for input_dim=1.

Parameters:
• input_dim (int) – the number of input dimensions
• variance (float) – the variance of the Matern kernel
• lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
• period (float) – the period
• n_freq (int) – the number of frequencies considered for the periodic subspace

Return type: kernel object
Gram_matrix()[source]
parameters_changed()[source]
update_gradients_full(dL_dK, X, X2)[source]

derivative of the covariance matrix with respect to the parameters (shape is num_data x num_inducing x num_params)

class PeriodicMatern52(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern52')[source]

Kernel of the periodic subspace (up to a given frequency) of a Matern 5/2 RKHS. Only defined for input_dim=1.

Parameters:
• input_dim (int) – the number of input dimensions
• variance (float) – the variance of the Matern kernel
• lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
• period (float) – the period
• n_freq (int) – the number of frequencies considered for the periodic subspace

Return type: kernel object
Gram_matrix()[source]
parameters_changed()[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.poly module¶

class Poly(input_dim, variance=1.0, scale=1.0, bias=1.0, order=3.0, active_dims=None, name='poly')[source]

Polynomial kernel

K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.prod module¶

class Prod(kernels, name='mul')[source]

Computes the product of 2 kernels

Parameters: k1, k2 – the kernels to multiply

Return type: kernel object
K(X, X2=None, which_parts=None)[source]
Kdiag(X, which_parts=None)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
input_sensitivity(summarize=True)[source]
sde()[source]
sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
dkron(A, dA, B, dB, operation='prod')[source]

Function computes the derivative of Kronecker product A*B (or Kronecker sum A+B).

A: 2D matrix
    Some matrix
dA: 3D (or 2D) matrix
    Derivatives of A
B: 2D matrix
    Some matrix
dB: 3D (or 2D) matrix
    Derivatives of B
operation: str, ‘prod’ or ‘sum’
    Which operation is considered. If the operation is ‘sum’ it is assumed that A and B are square matrices.
Output:
    dC: 3D matrix, derivative of the Kronecker product A*B (or Kronecker sum A+B)
numpy_invalid_op_as_exception(func)[source]

A decorator that allows catching numpy invalid operations as exceptions (the default behaviour is raising warnings).

## GPy.kern.src.rbf module¶

class RBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]

$k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)$
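
A basic usage sketch (illustrative data):

    import numpy as np
    import GPy

    X = np.random.rand(10, 2)
    k = GPy.kern.RBF(input_dim=2, variance=1.0, lengthscale=0.5, ARD=True)
    K = k.K(X)        # full covariance matrix, shape (10, 10)
    Kd = k.Kdiag(X)   # its diagonal; for the RBF this equals the variance everywhere
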
K_of_r(r)[source]
dK2_drdr(r)[source]
dK2_drdr_diag()[source]
dK_dr(r)[source]
get_one_dimensional_kernel(dim)[source]

Specially intended for Grid regression.

gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
parameters_changed()[source]
psi0(Z, variational_posterior)[source]
psi1(Z, variational_posterior)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
spectrum(omega)[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.sde_brownian module¶

Classes in this module enhance Brownian motion covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Brownian(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Brownian motion kernel:

$k(x,y) = \sigma^2 \min(x,y)$
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

## GPy.kern.src.sde_linear module¶

Classes in this module enhance Linear covariance function with the Stochastic Differential Equation (SDE) functionality.

class sde_Linear(input_dim, X, variances=None, ARD=False, active_dims=None, name='linear')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Linear kernel:

$k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_i y_i$

The __init__ method is modified because one extra parameter is required: X, the points on the X axis.

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

## GPy.kern.src.sde_matern module¶

Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Matern 3/2 kernel:

$k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Matern 5/2 kernel:

$k(r) = \sigma^2 (1 + \sqrt{5} r + \frac{5}{3} r^2) \exp(- \sqrt{5} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

## GPy.kern.src.sde_standard_periodic module¶

Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_StdPeriodic(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Standard Periodic kernel:

$k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin( \frac{\pi}{\lambda_i} (x_i - y_i) )}{l_i} \right)^2 \right]$

sde()[source]

Return the state space representation of the covariance.

Note: one must constrain the lengthscale not to drop below 0.25; below this the Bessel functions of the first kind grow very large.

Note: one must also keep the wavelength from becoming very small, because then the gradients wrt the wavelength become unstable. However, this might depend on the data. For a test example with 300 data points the lower limit is 0.15.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

seriescoeff(m=6, lengthScale=1.0, magnSigma2=1.0, true_covariance=False)[source]

Calculate the coefficients q_j^2 for the covariance function approximation:

$k(\tau) = \sum_{j=0}^{+\infty} q_j^2 \cos(j \omega_0 \tau)$

Reference:

[1] Arno Solin and Simo Särkkä (2014). Explicit link between periodic covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014). JMLR: W&CP, volume 33.

Note! Only the infinite approximation (through the Bessel function) is currently implemented.

m: int
    Degree of approximation. Default 6.
lengthScale: float
    Length scale parameter in the kernel.
magnSigma2: float
    Multiplier in front of the kernel.
coeffs: array(m+1)
    Covariance series coefficients.
coeffs_dl: array(m+1)
    Derivatives of the coefficients with respect to lengthscale.

## GPy.kern.src.sde_static module¶

Classes in this module enhance Static covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Bias kernel:

$k(x,y) = \alpha$
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_White(input_dim, variance=1.0, active_dims=None, name='white')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

White kernel:

$k(x,y) = \alpha \, \delta(x-y)$
sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

## GPy.kern.src.sde_stationary module¶

Classes in this module enhance several stationary covariance functions with the Stochastic Differential Equation (SDE) functionality.

class sde_Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

Exponential kernel:

$k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_RBF(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]

Class provides extra functionality to transfer this covariance function into SDE form.

$k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$

sde()[source]

Return the state space representation of the covariance.

sde_update_gradient_full(gradients)[source]

Update gradient in the order in which parameters are represented in the kernel

class sde_RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]

Class provides extra functionality to transfer this covariance function into SDE form.

$k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{-\alpha} \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$

sde()[source]

Return the state space representation of the covariance.

## GPy.kern.src.spline module¶

class Spline(input_dim, variance=1.0, c=1.0, active_dims=None, name='spline')[source]

Linear spline kernel. You need to specify 2 parameters: the variance and c. The variance is defined in powers of 10; thus specifying -2 means 10^-2. The parameter c allows one to define the stiffness of the spline fit. A very stiff spline equals linear regression. See https://www.youtube.com/watch?v=50Vgw11qn0o starting at minute 1:17:28. Lit.: Wahba, 1990.

K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.splitKern module¶

A new kernel

class DEtime(kernel, idx_p, Xp, index_dim=-1, name='DiffGenomeKern')[source]
K(X, X2=None)[source]
Kdiag(X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class SplitKern(kernel, Xp, index_dim=-1, name='SplitKern')[source]
K(X, X2=None)[source]
Kdiag(X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class SplitKern_cross(kernel, Xp, name='SplitKern_cross')[source]
K(X, X2=None)[source]
Kdiag(X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.standard_periodic module¶

The standard periodic kernel which mentioned in:

[1] Gaussian Processes for Machine Learning, C. E. Rasmussen, C. K. I. Williams. The MIT Press, 2005.

[2] Introduction to Gaussian processes. D. J. C. MacKay. In C. M. Bishop, editor, Neural Networks and Machine Learning, pages 133-165. Springer, 1998.

class StdPeriodic(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]

Standard periodic kernel

$k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin( \frac{\pi}{T_i} (x_i - y_i) )}{l_i} \right)^2 \right]$

param input_dim: the number of input dimensions
type input_dim: int
param variance: the variance $$\theta_1$$ in the formula above
type variance: float
param period: the vector of periods $$T_i$$. If None then 1.0 is assumed.
type period: array or list of the appropriate size (or float if there is only one period parameter)
param lengthscale: the vector of lengthscales $$l_i$$. If None then 1.0 is assumed.
type lengthscale: array or list of the appropriate size (or float if there is only one lengthscale parameter)
param ARD1: Auto Relevance Determination with respect to period. If equal to “False” one single period parameter $$T_i$$ for all dimensions is assumed, otherwise there is one period parameter per dimension.
type ARD1: Boolean
param ARD2: Auto Relevance Determination with respect to lengthscale. If equal to “False” one single lengthscale parameter $$l_i$$ for all dimensions is assumed, otherwise there is one lengthscale parameter per dimension.
type ARD2: Boolean
param active_dims: indices of dimensions which are used in the computation of the kernel
type active_dims: array or list of the appropriate size
param name: Name of the kernel for output
type name: String
param useGPU: whether or not to use the GPU
type useGPU: Boolean

K(X, X2=None)[source]

Compute the covariance matrix between X and X2.

Kdiag(X)[source]

Compute the diagonal of the covariance matrix associated to X.

gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
input_sensitivity(summarize=True)[source]
parameters_changed()[source]

This function acts as a callback for each optimization iteration. If one optimization step was successful and the parameters changed, this callback function will be called so that any precomputations for the kernel can be updated.

to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]

derivative of the diagonal of the covariance matrix with respect to the parameters.

update_gradients_full(dL_dK, X, X2=None)[source]

derivative of the covariance matrix with respect to the parameters.

## GPy.kern.src.static module¶

class Bias(input_dim, variance=1.0, active_dims=None, name='bias')[source]
K(X, X2=None)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class Fixed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='fixed')[source]
Parameters: input_dim (int) – the number of input dimensions variance (float) – the variance of the kernel
K(X, X2)[source]
Kdiag(X)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class Precomputed(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='precomputed')[source]

Class for precomputed kernels, indexed by columns in X

Usage example:

    import numpy as np
    from GPy.models import GPClassification
    from GPy.kern import Precomputed
    from sklearn.cross_validation import LeaveOneOut

    n = 10
    d = 100
    X = np.arange(n).reshape((n,1))           # column vector of indices
    y = 2*np.random.binomial(1,0.5,(n,1))-1
    X0 = np.random.randn(n,d)
    k = np.dot(X0,X0.T)
    kern = Precomputed(1,k)                   # k is a n x n covariance matrix

    cv = LeaveOneOut(n)
    ypred = y.copy()
    for train, test in cv:
        m = GPClassification(X[train], y[train], kernel=kern)
        m.optimize()
        ypred[test] = 2*(m.predict(X[test])[0]>0.5)-1

Parameters: input_dim (int) – the number of input dimensions variance (float) – the variance of the kernel
K(X, X2=None)[source]
Kdiag(X)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class Static(input_dim, variance, active_dims, name)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_XX(dL_dK, X, X2=None)[source]
gradients_XX_diag(dL_dKdiag, X, cov=False)[source]
gradients_X_diag(dL_dKdiag, X)[source]
gradients_Z_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
gradients_qX_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
input_sensitivity(summarize=True)[source]
psi0(Z, variational_posterior)[source]
psi1(Z, variational_posterior)[source]
psi2(Z, variational_posterior)[source]
class White(input_dim, variance=1.0, active_dims=None, name='white')[source]
K(X, X2=None)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class WhiteHeteroscedastic(input_dim, num_data, variance=1.0, active_dims=None, name='white_hetero')[source]

A heteroscedastic White kernel (nugget/noise). It defines one variance (nugget) per input sample.

Prediction excludes any noise learnt by this Kernel, so be careful using this kernel.

You can plot the errors learnt by this kernel by something similar as: plt.errorbar(m.X, m.Y, yerr=2*np.sqrt(m.kern.white.variance))

K(X, X2=None)[source]
Kdiag(X)[source]
psi2(Z, variational_posterior)[source]
psi2n(Z, variational_posterior)[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_expectations(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]
update_gradients_full(dL_dK, X, X2=None)[source]

## GPy.kern.src.stationary module¶

class Cosine(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Cosine')[source]
K_of_r(r)[source]
dK_dr(r)[source]
class ExpQuad(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='ExpQuad')[source]

$k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)$
notes::
• Yes, this is exactly the same as the RBF covariance function, but the RBF implementation also has some features for doing variational kernels (the psi-statistics).
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]
class Exponential(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]
class Matern32(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]

Matern 3/2 kernel:

$k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$
Gram_matrix(F, F1, F2, lower, upper)[source]

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:
• F (np.array) – vector of functions
• F1 (np.array) – vector of derivatives of F
• F2 (np.array) – vector of second derivatives of F
• lower, upper (floats) – boundaries of the input domain
K_of_r(r)[source]
dK_dr(r)[source]
sde()[source]

Return the state space representation of the covariance.

to_dict()[source]
class Matern52(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]

Matern 5/2 kernel:

$k(r) = \sigma^2 (1 + \sqrt{5} r + \frac53 r^2) \exp(- \sqrt{5} r)$
Gram_matrix(F, F1, F2, F3, lower, upper)[source]

Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.

Parameters:
• F (np.array) – vector of functions
• F1 (np.array) – vector of derivatives of F
• F2 (np.array) – vector of second derivatives of F
• F3 (np.array) – vector of third derivatives of F
• lower, upper (floats) – boundaries of the input domain
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]
class OU(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='OU')[source]

OU kernel:

$k(r) = \sigma^2 \exp(- r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }$
K_of_r(r)[source]
dK_dr(r)[source]
class RatQuad(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]

$k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha}$
K_of_r(r)[source]
dK_dr(r)[source]
to_dict()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class Stationary(input_dim, variance, lengthscale, ARD, active_dims, name, useGPU=False)[source]

Stationary kernels (covariance functions).

Stationary covariance functions depend only on r, where r is defined as

$r(x, x') = \sqrt{ \sum_{q=1}^Q (x_q - x'_q)^2 }$

The covariance function k(x, x') can then be written k(r).

In this implementation, r is scaled by the lengthscale parameter(s):

$r(x, x') = \sqrt{ \sum_{q=1}^Q \frac{(x_q - x'_q)^2}{\ell_q^2} }.$

By default, there’s only one lengthscale: separate lengthscales for each dimension can be enabled by setting ARD=True.

To implement a stationary covariance function using this class, one need only define the covariance function k(r) and its derivative:

    def K_of_r(self, r):
        return foo

    def dK_dr(self, r):
        return bar

The lengthscale(s) and variance parameters are added to the structure automatically.

Thanks to @strongh: In Stationary, a covariance function is defined in GPy as stationary when it depends only on the l2-norm |x_1 - x_2 |. However this is the typical definition of isotropy, while stationarity is usually a bit more relaxed. The more common version of stationarity is that the covariance is a function of x_1 - x_2 (See e.g. R&W first paragraph of section 4.1).
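
Following that recipe, a minimal (hypothetical) custom stationary kernel might look like this; it reimplements the exponential covariance purely for illustration:

    import numpy as np
    from GPy.kern.src.stationary import Stationary

    class MyExponential(Stationary):
        """Illustrative stationary kernel: k(r) = variance * exp(-r)."""
        def __init__(self, input_dim, variance=1.0, lengthscale=None, ARD=False,
                     active_dims=None, name='my_exponential'):
            super(MyExponential, self).__init__(input_dim, variance, lengthscale,
                                                ARD, active_dims, name)

        def K_of_r(self, r):
            return self.variance * np.exp(-r)

        def dK_dr(self, r):
            return -self.variance * np.exp(-r)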

K(X, X2=None)[source]

Kernel function applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.

K(X, X2) = K_of_r((X-X2)**2)

K_of_r(r)[source]
Kdiag(X)[source]
dK2_drdr(r)[source]
dK2_drdr_diag()[source]

Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.

dK2_drdr_via_X(X, X2)[source]
dK_dr(r)[source]
dK_dr_via_X(X, X2)[source]

compute the derivative of K wrt X going through X

get_one_dimensional_kernel(dimensions)[source]

Specially intended for the grid regression case For a given covariance kernel, this method returns the corresponding kernel for a single dimension. The resulting values can then be used in the algorithm for reconstructing the full covariance matrix.

gradients_X(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt K (dL_dK), compute the derivative wrt X

gradients_XX(dL_dK, X, X2=None)[source]

Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2:

Returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].

$\frac{\partial^2 K}{\partial X_2^2} = -\frac{\partial^2 K}{\partial X \partial X_2}$

Returns: dL2_dXdX2: [NxMxQxQ] in the cov=True case, or [NxMxQ] in the cov=False case, for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
gradients_XX_diag(dL_dK_diag, X)[source]

Given the derivative of the objective dL_dK, compute the second derivative of K wrt X:

$\frac{\partial^2 K}{\partial X \partial X}$

Returns: dL2_dXdX: [NxQxQ]
gradients_X_diag(dL_dKdiag, X)[source]
input_sensitivity(summarize=True)[source]
reset_gradients()[source]
update_gradients_diag(dL_dKdiag, X)[source]

Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.

update_gradients_direct(dL_dVar, dL_dLen)[source]

Specially intended for the Grid regression case. Given the computed log likelihood derivatives, update the corresponding kernel and likelihood gradients. Useful for when gradients have been computed a priori.

update_gradients_full(dL_dK, X, X2=None, reset=True)[source]

Given the derivative of the objective wrt the covariance matrix (dL_dK), compute the gradient wrt the parameters of this kernel, and store in the parameters object as e.g. self.variance.gradient

## GPy.kern.src.trunclinear module¶

class TruncLinear(input_dim, variances=None, delta=None, ARD=False, active_dims=None, name='linear')[source]

Truncated Linear kernel

$k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)$

Parameters:
• input_dim (int) – the number of input dimensions
• variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances $$\sigma^2_i$$
• ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.

Return type: kernel object
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
input_sensitivity()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]
class TruncLinear_inf(input_dim, interval, variances=None, ARD=False, active_dims=None, name='linear')[source]

Truncated Linear kernel

$k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)$

Parameters:
• input_dim (int) – the number of input dimensions
• variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances $$\sigma^2_i$$
• ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.

Return type: kernel object
K(X, X2=None)[source]
Kdiag(X)[source]
gradients_X(dL_dK, X, X2=None)[source]
gradients_X_diag(dL_dKdiag, X)[source]
input_sensitivity()[source]
update_gradients_diag(dL_dKdiag, X)[source]
update_gradients_full(dL_dK, X, X2=None)[source]