GPy.kern.src package¶
Subpackages¶
- GPy.kern.src.psi_comp package
- Submodules
- GPy.kern.src.psi_comp.gaussherm module
- GPy.kern.src.psi_comp.linear_psi_comp module
- GPy.kern.src.psi_comp.rbf_psi_comp module
- GPy.kern.src.psi_comp.rbf_psi_gpucomp module
- GPy.kern.src.psi_comp.sslinear_psi_comp module
- GPy.kern.src.psi_comp.ssrbf_psi_comp module
- GPy.kern.src.psi_comp.ssrbf_psi_gpucomp module
Submodules¶
GPy.kern.src.ODE_UY module¶
-
class
ODE_UY
(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, active_dims=None, name='ode_uy')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.ODE_UYC module¶
-
class
ODE_UYC
(input_dim, variance_U=3.0, variance_Y=1.0, lengthscale_U=1.0, lengthscale_Y=1.0, ubias=1.0, active_dims=None, name='ode_uyc')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.ODE_st module¶
-
class
ODE_st
(input_dim, a=1.0, b=1.0, c=1.0, variance_Yx=3.0, variance_Yt=1.5, lengthscale_Yx=1.5, lengthscale_Yt=1.5, active_dims=None, name='ode_st')[source]¶ Bases:
GPy.kern.src.kern.Kern
Kernel resulting from a first order ODE with an OU driving GP
Parameters: - input_dim (int) – the number of input dimensions, has to be equal to one
- varianceU (float) – variance of the driving GP
- lengthscaleU (float) – lengthscale of the driving GP (sqrt(3)/lengthscaleU)
- varianceY (float) – ‘variance’ of the transfer function
- lengthscaleY (float) – ‘lengthscale’ of the transfer function (1/lengthscaleY)
Return type: kernel object
GPy.kern.src.ODE_t module¶
-
class
ODE_t
(input_dim, a=1.0, c=1.0, variance_Yt=3.0, lengthscale_Yt=1.5, ubias=1.0, active_dims=None, name='ode_st')[source]¶ Bases:
GPy.kern.src.kern.Kern
GPy.kern.src.add module¶
-
class
Add
(subkerns, name='sum')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Add the given list of kernels together. Propagates gradients through.
This kernel will take over the active dims of its subkernels passed in.
NOTE: The subkernels will be copies of the original kernels, to prevent unexpected behavior.
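A minimal usage sketch (toy data; the kernel choices are illustrative): Add is normally created with the + operator rather than instantiated directly, and its companion Prod kernel with the * operator.
```python
import numpy as np
import GPy

# Summing two kernels returns an Add kernel that propagates gradients to both parts.
k_sum = GPy.kern.RBF(input_dim=1) + GPy.kern.Linear(input_dim=1)

X = np.linspace(0, 1, 5)[:, None]
K_all = k_sum.K(X)                                  # covariance using every part
K_rbf = k_sum.K(X, which_parts=[k_sum.parts[0]])    # covariance using only the RBF part
print(K_all.shape, K_rbf.shape)
```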
-
K
(X, X2=None, which_parts=None)[source]¶ Add all kernels together. If a list of parts (of this kernel!) which_parts is given, only the parts of the list are taken to compute the covariance.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ Compute the gradient of the objective function with respect to X.
Parameters: - dL_dK (np.ndarray (num_samples x num_inducing)) – An array of gradients of the objective function with respect to the covariance function.
- X (np.ndarray (num_samples x input_dim)) – Observed data inputs
- X2 (np.ndarray (num_inducing x input_dim)) – Observed data inputs (optional, defaults to X)
-
gradients_XX
(dL_dK, X, X2)[source]¶ - \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]
-
gradients_Z_expectations
(dL_psi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
sde_update_gradient_full
(gradients)[source]¶ Update gradient in the order in which parameters are represented in the kernel
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
update_gradients_diag
(dL_dK, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.basis_funcs module¶
-
class
BasisFuncKernel
(input_dim, variance=1.0, active_dims=None, ARD=False, name='basis func kernel')[source]¶ Bases:
GPy.kern.src.kern.Kern
Abstract superclass for kernels with explicit basis functions for use in GPy.
This class does NOT automatically add an offset to the design matrix phi!
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
concatenate_offset
(X)[source]¶ Convenience function to add an offset column to phi. You can use this function to add an offset (bias on y axis) to phi in your custom self._phi(X).
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
posterior_inf
(X=None, posterior=None)[source]¶ Do the posterior inference on the parameters given this kernel's functions and the model posterior, which has to be a GPy posterior, usually found at m.posterior if m is a GPy model. If not given, we search for the highest parent that is a model, containing the posterior, and for X accordingly.
-
-
class
ChangePointBasisFuncKernel
(input_dim, changepoint, variance=1.0, active_dims=None, ARD=False, name='changepoint')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
The basis function has a changepoint. That is, it is constant, jumps at a single point (given as changepoint) and is constant again. You can give multiple changepoints. The changepoints are calculated using np.where(self.X < self.changepoint, -1, 1)
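A hedged construction sketch (toy values; the full module path from this page is used, and the ±variance structure of the resulting covariance follows from the ±1 basis above):
```python
import numpy as np
import GPy

# Basis-function kernel that is -1 before the changepoint and +1 after it,
# giving a step at x = 5.0 (toy value).
k = GPy.kern.src.basis_funcs.ChangePointBasisFuncKernel(
    input_dim=1, changepoint=5.0, variance=1.0)
X = np.linspace(0, 10, 6)[:, None]
print(k.K(X))   # same-side pairs get +variance, cross-side pairs -variance
```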
-
class
DomainKernel
(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='constant_domain')[source]¶ Bases:
GPy.kern.src.basis_funcs.LinearSlopeBasisFuncKernel
Create a constant plateau of correlation between start and stop and zero elsewhere. This is a constant shift of the outputs along the y axis in the range from start to stop.
-
class
LinearSlopeBasisFuncKernel
(input_dim, start, stop, variance=1.0, active_dims=None, ARD=False, name='linear_segment')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
A linear segment transformation. The segments start at start, are then linear to stop and constant again. The segments are normalized, so that they have exactly as much mass above as below the origin.
Start and stop can be tuples or lists of starts and stops. The behaviour of start and stop is as np.where(X < start) would do.
-
class
LogisticBasisFuncKernel
(input_dim, centers, variance=1.0, slope=1.0, active_dims=None, ARD=False, ARD_slope=True, name='logistic')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
Create a series of logistic basis functions with the given centers. The slopes are learned from the data fit. The number of centers determines the number of logistic functions.
-
class
PolynomialBasisFuncKernel
(input_dim, degree, variance=1.0, active_dims=None, ARD=True, name='polynomial_basis')[source]¶ Bases:
GPy.kern.src.basis_funcs.BasisFuncKernel
A polynomial basis function transformation: the basis functions are the powers of the input up to the given degree.
GPy.kern.src.brownian module¶
-
class
Brownian
(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶ Bases:
GPy.kern.src.kern.Kern
Brownian motion in 1D only.
Negative times are treated as a separate (backwards!) Brownian motion.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) –
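A minimal usage sketch (toy time points), using the min(s, t) covariance of Brownian motion:
```python
import numpy as np
import GPy

# Brownian motion covariance k(s, t) = variance * min(s, t) on 1-D inputs.
k = GPy.kern.Brownian(input_dim=1, variance=1.0)
t = np.array([[0.5], [1.0], [2.0]])
print(k.K(t))   # entry (i, j) equals variance * min(t_i, t_j)
```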
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
GPy.kern.src.coregionalize module¶
-
class
Coregionalize
(input_dim, output_dim, rank=1, W=None, kappa=None, active_dims=None, name='coregion')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for intrinsic/linear coregionalization models
This covariance has the form:
\[\mathbf{B} = \mathbf{W}\mathbf{W}^\intercal + \mathrm{diag}(kappa)\]An intrinsic/linear coregionalization covariance function of the form:
\[k_2(x, y)=\mathbf{B} k(x, y)\]it is obtained as the tensor product between a covariance function k(x, y) and B.
Parameters: - output_dim (int) – number of outputs to coregionalize
- rank (int) – number of columns of the W matrix (this parameter is ignored if parameter W is not None)
- W (numpy array of dimensionality (num_outputs, W_columns)) – a low rank matrix that determines the correlations between the different outputs, together with kappa it forms the coregionalization matrix B
- kappa (numpy array of dimensionality (output_dim, )) – a vector which allows the outputs to behave independently
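An illustrative sketch (toy data; the standard GPy shortcut names are assumed): an intrinsic coregionalization (ICM) covariance is built by multiplying a base kernel over the real inputs with a Coregionalize kernel over an output-index column.
```python
import numpy as np
import GPy

# Base kernel acts on column 0 (the real input); the coregionalization
# part acts on column 1, which holds the integer output index.
k_icm = GPy.kern.RBF(1, active_dims=[0]) * GPy.kern.Coregionalize(
    input_dim=1, output_dim=2, rank=1, active_dims=[1])

# Two observations of output 0 and two of output 1.
X = np.array([[0.0, 0], [0.5, 0], [0.2, 1], [0.9, 1]])
print(k_icm.K(X))   # 4x4 covariance coupling both outputs through B = WW^T + diag(kappa)
```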
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
GPy.kern.src.coregionalize_cython module¶
GPy.kern.src.diff_kern module¶
-
class
DiffKern
(base_kern, dimension)[source]¶ Bases:
GPy.kern.src.kern.Kern
Diff kernel is a thin wrapper for using partial derivatives of kernels as kernels. E.g. in combination with the Multioutput kernel this allows the user to train GPs with observations of the latent function and of latent function derivatives. NOTE: DiffKern only works when used with the Multioutput kernel. Do not use the kernel standalone.
The parameters the kernel needs are: -‘base_kern’: a member of the Kernel class that is used for observations -‘dimension’: integer that indicates in which dimension the partial derivative observations are
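A hedged sketch of the intended use (toy data; the MultioutputGP model and Gaussian likelihood classes are assumed from the wider GPy API): the same base kernel is wrapped in a DiffKern for the derivative-observation channel of a multioutput GP.
```python
import numpy as np
import GPy

# Toy 1-D data: y are function observations, y_d are derivative observations.
x = np.random.rand(10, 1);  y = np.sin(x)
x_d = np.random.rand(5, 1); y_d = np.cos(x_d)

base = GPy.kern.RBF(1)
kernels = [base, GPy.kern.DiffKern(base, 0)]          # derivative wrt input dimension 0
likelihoods = [GPy.likelihoods.Gaussian(), GPy.likelihoods.Gaussian()]

m = GPy.models.MultioutputGP(X_list=[x, x_d], Y_list=[y, y_d],
                             kernel_list=kernels, likelihood_list=likelihoods)
m.optimize()
```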
-
K
(X, X2=None, dimX2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
update_gradients_diag
(dL_dK_diag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_full
(dL_dK, X, X2=None, dimX2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
GPy.kern.src.eq_ode1 module¶
-
class
EQ_ODE1
(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, decay=None, active_dims=None, name='eq_ode1')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for first order differential equation driven by an exponentiated quadratic covariance.
The outputs of this kernel have the form
\[\frac{\text{d}y_j(t)}{\text{d}t} = \sum_{i=1}^R w_{j,i} u_i(t-\delta_j) - d_j y_j(t)\]
where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent force, \(d_j\) is the decay rate of the \(j\)th output, and the \(u_i(t)\) are independent latent Gaussian processes governed by an exponentiated quadratic covariance.
Parameters: - output_dim (int) – number of outputs driven by latent function.
- W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
- rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
- decay (array of length output_dim) – decay rates for the first order system.
- delay (array of length output_dim) – delay between latent force and output response.
- kappa (array of length output_dim) – diagonal term that allows each latent output to have an independent component to the response.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.eq_ode2 module¶
-
class
EQ_ODE2
(input_dim=2, output_dim=1, rank=1, W=None, lengthscale=None, C=None, B=None, active_dims=None, name='eq_ode2')[source]¶ Bases:
GPy.kern.src.kern.Kern
Covariance function for second order differential equation driven by an exponentiated quadratic covariance.
The outputs of this kernel have the form
\[\frac{\text{d}^2y_j(t)}{\text{d}t^2} + C_j \frac{\text{d}y_j(t)}{\text{d}t} + B_j y_j(t) = \sum_{i=1}^R w_{j,i} u_i(t)\]
where \(R\) is the rank of the system, \(w_{j,i}\) is the sensitivity of the \(j\)th output to the \(i\)th latent force, and the \(u_i(t)\) are independent latent Gaussian processes governed by an exponentiated quadratic covariance.
Parameters: - output_dim (int) – number of outputs driven by latent function.
- W (ndarray (output_dim x rank)) – sensitivities of each output to the latent driving function.
- rank (int) – If rank is greater than 1 then there are assumed to be a total of rank latent forces independently driving the system, each with identical covariance.
- C (array of length output_dim) – damper constant for the second order system.
- B (array of length output_dim) – spring constant for the second order system.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.grid_kerns module¶
-
class
GridKern
(input_dim, variance, lengthscale, ARD, active_dims, name, originalDimensions, useGPU=False)[source]¶
-
class
GridRBF
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='gridRBF', originalDimensions=1, useGPU=False)[source]¶ Bases:
GPy.kern.src.grid_kerns.GridKern
Similar to the regular RBF kernel, but supplemented with methods required for Gaussian grid regression. Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]
GPy.kern.src.independent_outputs module¶
-
class
Hierarchical
(kernels, name='hierarchy')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
A kernel which can represent a simple hierarchical model.
See Hensman et al 2013, “Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters” http://www.biomedcentral.com/1471-2105/14/252
To construct this kernel, you must pass a list of kernels. The first kernel will be assumed to be the ‘base’ kernel, and will be computed everywhere. For every additional kernel, we assume another layer in the hierarchy, with a corresponding column of the input matrix which indexes which function the data are in at that level.
For more, see the ipython notebook documentation on Hierarchical covariances.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
IndependentOutputs
(kernels, index_dim=-1, name='independ')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
A kernel which can represent several independent functions. This kernel ‘switches off’ parts of the matrix where the output indexes are different.
The index of the functions is given by the last column in the input X; the rest of the columns of X are passed to the underlying kernel for computation (in blocks).
Parameters: kernels – either a kernel, or a list of kernels to work with. If it is a list of kernels, the indices in index_dim index the kernels you gave!
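A minimal usage sketch (toy data; a single shared RBF kernel is assumed): the last column of X carries the output index, so covariances between different indices are switched off.
```python
import numpy as np
import GPy

# Two independent functions sharing one RBF kernel.
k = GPy.kern.IndependentOutputs(GPy.kern.RBF(1))
X = np.array([[0.0, 0], [0.5, 0], [0.1, 1], [0.6, 1]])
print(k.K(X))   # block-diagonal: cross-blocks between output indices 0 and 1 are zero
```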
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.integral module¶
-
class
Integral
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel between…
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
GPy.kern.src.integral_limits module¶
-
class
Integral_Limits
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel. This kernel allows 1d histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs (on two dimensions) are the start and end points of each bin. The kernel’s predictions are the latent function which might have generated those binned results.
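A hedged sketch only (toy bin limits; which input column holds the start and which the end of each bin is an assumption and should be checked against the kernel source; the full module path from this page is used):
```python
import numpy as np
import GPy

# Three bins of a 1-D latent function; each row of X holds the two
# integration limits of one bin (column order is an assumption here).
X_bins = np.array([[1.0, 0.0],
                   [2.0, 1.0],
                   [4.0, 2.0]])
k = GPy.kern.src.integral_limits.Integral_Limits(input_dim=2,
                                                 variances=1.0,
                                                 lengthscale=2.0)
print(k.K(X_bins))    # covariance between the three binned (integrated) outputs
```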
-
K
(X, X2=None)[source]¶ - Note: We have a latent function and an output function. We want to be able to find:
- the covariance between values of the output function
- the covariance between values of the latent function
- the “cross covariance” between values of the output function and the latent function
This method is used by GPy either to get the covariance between the outputs (K_xx) or to get the cross covariance between the latent function and the outputs (K_xf). We take advantage of the places where this function is used:
- if X2 is None, then we know that the items being compared (to get the covariance for) are both from the OUTPUT FUNCTION.
- if X2 is not None, then we know that the items being compared are from two different sets (the OUTPUT FUNCTION and the LATENT FUNCTION).
If we want the covariance between values of the LATENT FUNCTION, we take advantage of the fact that we only need that when we do prediction, and this only calls Kdiag (not K). So the covariance between LATENT FUNCTIONS is available from Kdiag.
-
Kdiag
(X)[source]¶ I’ve used the fact that we call this method during prediction (instead of K). When we do prediction we want to know the covariance between LATENT FUNCTIONS (K_ff) (as that’s probably what the user wants). $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$
-
k_ff
(t, tprime, l)[source]¶ Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required
-
k_xf
(t, tprime, s, l)[source]¶ Covariance between the gradient (latent value) and the actual (observed) value.
Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.
-
k_xx
(t, tprime, s, sprime, l)[source]¶ Covariance between observed values.
s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)
We’re interested in how correlated these two integrals are.
Note: We’ve not multiplied by the variance, this is done in K.
-
GPy.kern.src.kern module¶
-
class
CombinationKernel
(kernels, name, extra_dims=[], link_parameters=True)[source]¶ Bases:
GPy.kern.src.kern.Kern
Abstract super class for combination kernels. A combination kernel combines (a list of) kernels and works on those. Examples are the HierarchicalKernel or Add and Prod kernels.
Parameters: - kernels (list) – List of kernels to combine (can be only one element)
- name (str) – name of the combination kernel
- extra_dims (array-like) – if needed extra dimensions for the combination kernel to work on
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.
-
parts
¶
-
class
Kern
(input_dim, active_dims, name, useGPU=False, *a, **kw)[source]¶ Bases:
GPy.core.parameterization.parameterized.Parameterized
The base class for a kernel: a positive definite function which forms a covariance function (kernel).
input_dim:
is the number of dimensions to work on. Make sure to give the tight dimensionality of inputs. You most likely want this to be the integer telling the number of input dimensions of the kernel.
active_dims:
is the active dimensions of inputs X we will work on. All kernels will get sliced Xes as inputs, if _all_dims_active is not None. Only positive integers are allowed in active_dims! If active_dims is None, slicing is switched off and all X will be passed through as given.
Parameters: - input_dim (int) – the number of input dimensions to the function
- active_dims (array-like|None) – list of indices on which dimensions this kernel works on, or None if no slicing
Do not instantiate.
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
add
(other, name='sum')[source]¶ Add another kernel to this one.
Parameters: other (GPy.kern) – the other kernel to be added
-
static
from_dict
(input_dict)[source]¶ Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, please override _build_from_input_dict instead.
Parameters: input_dict (dict) – Dictionary with all the information needed to instantiate the object.
-
get_most_significant_input_dimensions
(which_indices=None)[source]¶ Determine which dimensions should be plotted
Returns the top three most significant input dimensions
If there are fewer than three dimensions, the non-existing dimensions are labeled as None, so for a 1-dimensional input this returns (0, None, None).
Parameters: which_indices (int or tuple(int,int) or tuple(int,int,int)) – force the indices to be the given indices.
-
gradients_X
(dL_dK, X, X2)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2, cov=True)[source]¶ - \[\frac{\partial^2 L}{\partial X\partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X\partial X_2}\]
-
gradients_XX_diag
(dL_dKdiag, X, cov=True)[source]¶ The diagonal of the second derivative w.r.t. X and X2
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior, psi0=None, psi1=None, psi2=None)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
plot
(*args, **kwargs)¶
-
plot_ARD
(filtering=None, legend=False, canvas=None, **kwargs)¶ If an ARD kernel is present, plot a bar representation using matplotlib
Parameters: - fignum – figure number of the plot
- filtering (list of names to use for ARD plot) – list of names, which to use for plotting ARD parameters. Only kernels which match names in the list of names in filtering will be used for plotting.
-
plot_covariance
(x=None, label=None, plot_limits=None, visible_dims=None, resolution=None, projection='2d', levels=20, **kwargs)¶ Plot a kernel covariance w.r.t. another x.
Parameters: - x (array-like) – the value to use for the other kernel argument (kernels are a function of two variables!)
- plot_limits (Either (xmin, xmax) for 1D or (xmin, xmax, ymin, ymax) / ((xmin, xmax), (ymin, ymax)) for 2D) – the range over which to plot the kernel
- visible_dims (array-like) – input dimensions (!) to use for x. Make sure to select 2 or less dimensions to plot.
- projection ({2d|3d}) – What projection shall we use to plot the kernel?
- levels (int) – for 2D projection, how many levels for the contour plot to use?
- kwargs – valid kwargs for your specific plotting library
Resolution: the resolution of the lines used in plotting. for 2D this defines the grid for kernel evaluation.
-
prod
(other, name='mul')[source]¶ Multiply two kernels (either on the same space, or on the tensor product of the input space).
Parameters: other (GPy.kern) – the other kernel to be multiplied
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
GPy.kern.src.kernel_slice_operations module¶
Created on 11 Mar 2014
@author: @mzwiessele
This module provides a meta class for the kernels. The meta class is for slicing the inputs (X, X2) for the kernels, before K (or any other method involving X) gets called. The _all_dims_active of a kernel decides which dimensions the kernel works on.
GPy.kern.src.linear module¶
-
class
Linear
(input_dim, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i x_iy_i\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
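A minimal usage sketch (toy data): with ARD=True the kernel carries one variance per input dimension, which input_sensitivity exposes.
```python
import numpy as np
import GPy

# ARD linear kernel on 2-D inputs.
k = GPy.kern.Linear(input_dim=2, ARD=True)
X = np.random.randn(5, 2)
print(k.K(X).shape)           # (5, 5) Gram matrix
print(k.input_sensitivity())  # per-dimension sensitivities (the variances)
```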
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective K (dL_dK), compute the second derivative of K wrt X and X2:
returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, thus the returned array is of shape [NxNxQxQ].
\[\frac{\partial^2 K}{\partial X_2^2} = - \frac{\partial^2 K}{\partial X \partial X_2}\]
- Returns:
- dL2_dXdX2: [NxMxQxQ] for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None)
- Thus, we return the second derivative in X2.
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
class
LinearFull
(input_dim, rank, W=None, kappa=None, active_dims=None, name='linear_full')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.mlp module¶
-
class
MLP
(input_dim, variance=1.0, weight_variance=1.0, bias_variance=1.0, ARD=False, active_dims=None, name='mlp')[source]¶ Bases:
GPy.kern.src.kern.Kern
Multi layer perceptron kernel (also known as arc sine kernel or neural network kernel)
\[k(x,y) = \sigma^{2}\frac{2}{\pi } \text{asin} \left ( \frac{ \sigma_w^2 x^\top y+\sigma_b^2}{\sqrt{\sigma_w^2x^\top x + \sigma_b^2 + 1}\sqrt{\sigma_w^2 y^\top y + \sigma_b^2 +1}} \right )\]Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance \(\sigma^2\)
- weight_variance (array or list of the appropriate size (or float if there is only one weight variance parameter)) – the vector of the variances of the prior over input weights in the neural network \(\sigma^2_w\)
- bias_variance – the variance of the prior over bias parameters \(\sigma^2_b\)
- ARD (Boolean) – Auto Relevance Determination. If equal to “False”, the kernel is isotropic (ie. one weight variance parameter sigma^2_w), otherwise there is one weight variance parameter per dimension.
Return type: Kernpart object
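A minimal usage sketch (toy values) of the arc-sine / neural-network covariance:
```python
import numpy as np
import GPy

# MLP kernel on 1-D inputs with explicit weight and bias prior variances.
k = GPy.kern.MLP(input_dim=1, variance=1.0, weight_variance=2.0, bias_variance=0.5)
X = np.linspace(-1, 1, 4)[:, None]
print(k.K(X))
```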
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
GPy.kern.src.multidimensional_integral_limits module¶
-
class
Multidimensional_Integral_Limits
(input_dim, variances=None, lengthscale=None, ARD=False, active_dims=None, name='integral')[source]¶ Bases:
GPy.kern.src.kern.Kern
Integral kernel, can include limits on each integral value. This kernel allows an n-dimensional histogram or binned data to be modelled. The outputs are the counts in each bin. The inputs are the start and end points of each bin: Pairs of inputs act as the limits on each bin. So inputs 4 and 5 provide the start and end values of each bin in the 3rd dimension. The kernel’s predictions are the latent function which might have generated those binned results.
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
Kdiag
(X)[source]¶ I’ve used the fact that we call this method for K_ff when finding the covariance as a hack so I know if I should return K_ff or K_xx. In this case we’re returning K_ff!! $K_{ff}^{post} = K_{ff} - K_{fx} K_{xx}^{-1} K_{xf}$
-
k_ff
(t, tprime, l)[source]¶ Doesn’t need s or sprime as we’re looking at the ‘derivatives’, so no domains over which to integrate are required
-
k_xf
(t, tprime, s, l)[source]¶ Covariance between the gradient (latent value) and the actual (observed) value.
Note that sprime isn’t actually used in this expression, presumably because the ‘primes’ are the gradient (latent) values which don’t involve an integration, and thus there is no domain over which they’re integrated, just a single value that we want.
-
k_xx
(t, tprime, s, sprime, l)[source]¶ Covariance between observed values.
s and t are one domain of the integral (i.e. the integral between s and t) sprime and tprime are another domain of the integral (i.e. the integral between sprime and tprime)
We’re interested in how correlated these two integrals are.
Note: We’ve not multiplied by the variance, this is done in K.
-
GPy.kern.src.multioutput_derivative_kern module¶
-
class
KernWrapper
(fk, fug, fg, base_kern)[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
update_gradients_full
(dL_dK, X, X2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
-
class
MultioutputDerivativeKern
(kernels, cross_covariances={}, name='MultioutputDerivativeKern')[source]¶ Bases:
GPy.kern.src.multioutput_kern.MultioutputKern
Multioutput derivative kernel is a meta class for combining different kernels for multioutput GPs. It is only a thin wrapper around the Multioutput kernel, so that the user does not have to define cross covariances.
GPy.kern.src.multioutput_kern module¶
-
class
MultioutputKern
(kernels, cross_covariances={}, name='MultioutputKern')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Multioutput kernel is a meta class for combining different kernels for multioutput GPs.
As an example let us have inputs x1 for output 1 with covariance k1 and x2 for output 2 with covariance k2. In addition, we need to define the cross covariances k12(x1,x2) and k21(x2,x1). Then the kernel becomes: k([x1,x2],[x1,x2]) = [k1(x1,x1) k12(x1, x2); k21(x2, x1), k2(x2,x2)]
For the kernel, the kernels of outputs are given as list in param “kernels” and cross covariances are given in param “cross_covariances” as a dictionary of tuples (i,j) as keys. If no cross covariance is given, it defaults to zero, as in k12(x1,x2)=0.
In the cross covariance dictionary, the value needs to be a struct with elements -‘kernel’: a member of Kernel class that stores the hyper parameters to be updated when optimizing the GP -‘K’: function defining the cross covariance -‘update_gradients_full’: a function to be used for updating gradients -‘gradients_X’: gives a gradient of the cross covariance with respect to the first input
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
-
class
ZeroKern
[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
update_gradients_full
(dL_dK, X, X2=None)[source]¶ Set the gradients of all parameters when doing full (N) inference.
-
gradient
¶
-
GPy.kern.src.periodic module¶
-
class
Periodic
(input_dim, variance, lengthscale, period, n_freq, lower, upper, active_dims, name)[source]¶ Bases:
GPy.kern.src.kern.Kern
Parameters: - variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
-
class
PeriodicExponential
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_exponential')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of an exponential (Matern 1/2) RKHS.
Only defined for input_dim=1.
-
class
PeriodicMatern32
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern32')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of a Matern 3/2 RKHS. Only defined for input_dim=1.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
-
class
PeriodicMatern52
(input_dim=1, variance=1.0, lengthscale=1.0, period=6.283185307179586, n_freq=10, lower=0.0, upper=12.566370614359172, active_dims=None, name='periodic_Matern52')[source]¶ Bases:
GPy.kern.src.periodic.Periodic
Kernel of the periodic subspace (up to a given frequency) of a Matern 5/2 RKHS. Only defined for input_dim=1.
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the Matern kernel
- lengthscale (np.ndarray of size (input_dim,)) – the lengthscale of the Matern kernel
- period (float) – the period
- n_freq (int) – the number of frequencies considered for the periodic subspace
Return type: kernel object
GPy.kern.src.poly module¶
-
class
Poly
(input_dim, variance=1.0, scale=1.0, bias=1.0, order=3.0, active_dims=None, name='poly')[source]¶ Bases:
GPy.kern.src.kern.Kern
Polynomial kernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.prod module¶
-
class
Prod
(kernels, name='mul')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
Computes the product of 2 kernels
Parameters: k1, k2 – the kernels to multiply Return type: kernel object -
K
(X, X2=None, which_parts=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
(summarize=True)[source]¶ If summarize is true, we want to get the summarized view of the sensitivities, otherwise put everything into an array with shape (#kernels, input_dim) in the order of appearance of the kernels in the parameterized object.
-
sde_update_gradient_full
(gradients)[source]¶ Update gradient in the order in which parameters are represented in the kernel
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
-
dkron
(A, dA, B, dB, operation='prod')[source]¶ Function computes the derivative of the Kronecker product A*B (or Kronecker sum A+B).
- A: 2D matrix
- Some matrix
- dA: 3D (or 2D) matrix
- Derivatives of A
- B: 2D matrix
- Some matrix
- dB: 3D (or 2D) matrix
- Derivatives of B
- operation: str ‘prod’ or ‘sum’
- Which operation is considered. If the operation is ‘sum’ it is assumed that A and B are square matrices.
- Output:
- dC: 3D matrix Derivative of Kronecker product A*B (or Kronecker sum A+B)
GPy.kern.src.rbf module¶
-
class
RBF
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='rbf', useGPU=False, inv_l=False)[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Radial Basis Function kernel, aka squared-exponential, exponentiated quadratic or Gaussian kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg)\]-
dK2_drdr_diag
()[source]¶ Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
parameters_changed
()[source]¶ This method gets called when parameters have changed. Another way of listening to param changes is to add self as a listener to the param, such that updates get passed through. See paramz.param.Observable.add_observer.
-
psi2
(Z, variational_posterior)[source]¶ - \[\psi_2^{m,m'} = \sum_{i=0}^{n}E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.
See also update_gradients_full
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.sde_brownian module¶
Classes in this module enhance Brownian motion covariance function with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Brownian
(input_dim=1, variance=1.0, active_dims=None, name='Brownian')[source]¶ Bases:
GPy.kern.src.brownian.Brownian
Class provides extra functionality to transfer this covariance function into SDE form.
Brownian motion kernel:
\[k(x,y) = \sigma^2 \min(x,y)\]
GPy.kern.src.sde_linear module¶
Classes in this module enhance Linear covariance function with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Linear
(input_dim, X, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.linear.Linear
Class provides extra functionality to transfer this covariance function into SDE form.
Linear kernel:
\[k(x,y) = \sum_{i=1}^{\text{input dim}} \sigma^2_i x_iy_i\]The init method is modified because one extra parameter is required: X - points on the X axis.
GPy.kern.src.sde_matern module¶
Classes in this module enhance Matern covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Matern32
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶ Bases:
GPy.kern.src.stationary.Matern32
Class provides extra functionality to transfer this covariance function into SDE form.
Matern 3/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
sde_Matern52
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶ Bases:
GPy.kern.src.stationary.Matern52
Class provides extra functionality to transfer this covariance function into SDE form.
Matern 5/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac{5}{3}r^2) \exp(- \sqrt{5} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
GPy.kern.src.sde_standard_periodic module¶
Classes in this module enhance the Standard Periodic covariance function with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_StdPeriodic
(*args, **kwargs)[source]¶ Bases:
GPy.kern.src.standard_periodic.StdPeriodic
Class provides extra functionality to transfer this covariance function into SDE form.
Standard Periodic kernel:
\[k(x,y) = \theta_1 \exp \left[ - \frac{1}{2} \sum_{i=1}^{\text{input\_dim}} \left( \frac{\sin( \frac{\pi}{\lambda_i} (x_i - y_i) )}{l_i} \right)^2 \right]\]
Init constructor.
Two optional extra parameters are added in addition to the ones in the StdPeriodic kernel.
Parameters: - approx_order (int) – approximation order for the RBF covariance. (Default 7)
- balance (bool) – Whether to balance this kernel separately. (Default False). Model has a separate parameter for balancing.
-
sde
()[source]¶ Return the state space representation of the standard periodic covariance.
! Note: one must constrain the lengthscale not to drop below 0.2 (independently of the approximation order); below this the Bessel functions of the first kind become NaN. Rescaling the time variable might help.
! Note: one must also keep the period not very low, because then the gradients wrt the wavelength become unstable. However this might depend on the data. For a test example with 300 data points the low limit is 0.15.
-
seriescoeff
(m=6, lengthScale=1.0, magnSigma2=1.0, true_covariance=False)[source]¶ Calculate the coefficients q_j^2 for the covariance function approximation:
\[k(\tau) = \sum_{j=0}^{+\infty} q_j^2 \cos(j\omega_0 \tau)\]Reference is:
- [1] Arno Solin and Simo Särkkä (2014). Explicit link between periodic
- covariance functions and state space models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS 2014). JMLR: W&CP, volume 33.
- Note! Only the infinite approximation (through Bessel function)
- is currently implemented.
- m: int
- Degree of approximation. Default 6.
- lengthScale: float
- Length scale parameter in the kernel
- magnSigma2:float
- Multiplier in front of the kernel.
- coeffs: array(m+1)
- Covariance series coefficients
- coeffs_dl: array(m+1)
- Derivatives of the coefficients with respect to lengthscale.
GPy.kern.src.sde_static module¶
Classes in this module enhance Static covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Bias
(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶ Bases:
GPy.kern.src.static.Bias
Class provides extra functionality to transfer this covariance function into SDE form.
Bias kernel:
\[k(x,y) = \alpha\]
-
class
sde_White
(input_dim, variance=1.0, active_dims=None, name='white')[source]¶ Bases:
GPy.kern.src.static.White
Class provides extra functionality to transfer this covariance function into SDE form.
White kernel:
\[k(x,y) = \alpha \cdot \delta(x-y)\]
GPy.kern.src.sde_stationary module¶
Classes in this module enhance several stationary covariance functions with the Stochastic Differential Equation (SDE) functionality.
-
class
sde_Exponential
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶ Bases:
GPy.kern.src.stationary.Exponential
Class provides extra functionality to transfer this covariance function into SDE form.
Exponential kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
sde_RBF
(*args, **kwargs)[source]¶ Bases:
GPy.kern.src.rbf.RBF
Class provides extra functionality to transfer this covariance function into SDE form.
Radial Basis Function kernel:
\[k(r) = \sigma^2 \exp \bigg(- \frac{1}{2} r^2 \bigg) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
Init constructor.
Two optional extra parameters are added in addition to the ones in the RBF kernel.
Parameters: - approx_order (int) – approximation order for the RBF covariance. (Default 10)
- balance (bool) – Whether to balance this kernel separately. (Default True). Model has a separate parameter for balancing.
-
sde
()[source]¶ Return the state space representation of the covariance.
Note! For sparse GP inference too small or too high values of the lengthscale lead to instabilities. This is because Qc is too high or too low and P_inf is not full rank. This effect depends on the approximation order. For N = 10 the lengthscale must be in (0.8, 8). For other N tests must be conducted. N=6: (0.06, 31). Variance should be within reasonable bounds as well, but its dependence is linear.
The above facts do not take into account regularization.
-
class
sde_RatQuad
(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶ Bases:
GPy.kern.src.stationary.RatQuad
Class provides extra functionality to transfer this covariance function into SDE form.
Rational Quadratic kernel:
\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha} \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
GPy.kern.src.spline module¶
-
class
Spline
(input_dim, variance=1.0, c=1.0, active_dims=None, name='spline')[source]¶ Bases:
GPy.kern.src.kern.Kern
Linear spline kernel. You need to specify 2 parameters: the variance and c. The variance is defined in powers of 10; thus specifying -2 means 10^-2. The parameter c defines the stiffness of the spline fit: a very stiff spline equals linear regression. See https://www.youtube.com/watch?v=50Vgw11qn0o starting at minute 1:17:28. Lit: Wahba, 1990
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ - \[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
GPy.kern.src.splitKern module¶
A new kernel
-
class
DEtime
(kernel, idx_p, Xp, index_dim=-1, name='DiffGenomeKern')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
SplitKern
(kernel, Xp, index_dim=-1, name='SplitKern')[source]¶ Bases:
GPy.kern.src.kern.CombinationKernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
-
class
SplitKern_cross
(kernel, Xp, name='SplitKern_cross')[source]¶ Bases:
GPy.kern.src.kern.Kern
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
GPy.kern.src.standard_periodic module¶
The standard periodic kernel, as mentioned in:
[1] Gaussian Processes for Machine Learning, C. E. Rasmussen, C. K. I. Williams. The MIT Press, 2005.
[2] Introduction to Gaussian processes. D. J. C. MacKay. In C. M. Bishop, editor, Neural Networks and Machine Learning, pages 133-165. Springer, 1998.
-
class
StdPeriodic
(input_dim, variance=1.0, period=None, lengthscale=None, ARD1=False, ARD2=False, active_dims=None, name='std_periodic', useGPU=False)[source]¶ Bases:
GPy.kern.src.kern.Kern
Standard periodic kernel
\[k(x,y) = \theta_1 \exp\left[ -\frac{1}{2} \sum_{i=1}^{\text{input_dim}} \left( \frac{\sin\left(\frac{\pi}{T_i}(x_i - y_i)\right)}{l_i} \right)^2 \right]\]
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance \(\theta_1\) in the formula above
- period (array or list of the appropriate size (or float if there is only one period parameter)) – the vector of periods \(T_i\). If None then 1.0 is assumed.
- lengthscale (array or list of the appropriate size (or float if there is only one lengthscale parameter)) – the vector of lengthscales \(l_i\). If None then 1.0 is assumed.
- ARD1 (Boolean) – Auto Relevance Determination with respect to period. If False, a single period parameter is shared across all dimensions, otherwise there is one period parameter per dimension.
- ARD2 (Boolean) – Auto Relevance Determination with respect to lengthscale. If False, a single lengthscale parameter is shared across all dimensions, otherwise there is one lengthscale parameter per dimension.
- active_dims (array or list of the appropriate size) – indices of dimensions which are used in the computation of the kernel
- name (String) – name of the kernel
- useGPU (Boolean) – whether or not to use the GPU
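A minimal construction sketch (GPy.kern.StdPeriodic is assumed to be the exported name; the values are illustrative):

import GPy

# 2-D inputs: one shared period, but one lengthscale per input dimension (ARD2=True)
kern = GPy.kern.StdPeriodic(input_dim=2, variance=1.0, period=2.0,
                            lengthscale=[0.5, 1.5], ARD1=False, ARD2=True)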
-
gradients_X
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
parameters_changed
()[source]¶ This function acts as a callback for each optimization iteration. Whenever an optimization step successfully updates the parameters, this callback is called so that any precomputations for the kernel can be updated.
-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
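A hedged sketch of serializing a kernel and rebuilding it (Kern.from_dict is assumed to be the matching deserialization entry point):

import GPy

k = GPy.kern.StdPeriodic(1, variance=1.0, period=2.0, lengthscale=0.5)
d = k.to_dict()                     # json-serializable dictionary
k2 = GPy.kern.Kern.from_dict(d)     # reconstruct an equivalent kernel from the dictionary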
GPy.kern.src.static module¶
-
class
Bias
(input_dim, variance=1.0, active_dims=None, name='bias')[source]¶ Bases:
GPy.kern.src.static.Static
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶
\[\psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
-
class
Fixed
(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='fixed')[source]¶ Bases:
GPy.kern.src.static.Static
Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the kernel
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶
\[\psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
class
Precomputed
(input_dim, covariance_matrix, variance=1.0, active_dims=None, name='precomputed')[source]¶ Bases:
GPy.kern.src.static.Fixed
Class for precomputed kernels, indexed by columns in X
Usage example:
import numpy as np
from GPy.models import GPClassification
from GPy.kern import Precomputed
from sklearn.cross_validation import LeaveOneOut

n = 10
d = 100
X = np.arange(n).reshape((n,1))          # column vector of indices
y = 2*np.random.binomial(1,0.5,(n,1))-1
X0 = np.random.randn(n,d)
k = np.dot(X0,X0.T)
kern = Precomputed(1,k)                  # k is a n x n covariance matrix

cv = LeaveOneOut(n)
ypred = y.copy()
for train, test in cv:
    m = GPClassification(X[train], y[train], kernel=kern)
    m.optimize()
    ypred[test] = 2*(m.predict(X[test])[0]>0.5)-1

Parameters: - input_dim (int) – the number of input dimensions
- variance (float) – the variance of the kernel
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
class
Static
(input_dim, variance, active_dims, name)[source]¶ Bases:
GPy.kern.src.kern.Kern
-
gradients_X
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
gradients_XX
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial^2 L}{\partial X \partial X_2} = \frac{\partial L}{\partial K}\frac{\partial^2 K}{\partial X \partial X_2}\]
-
gradients_XX_diag
(dL_dKdiag, X, cov=False)[source]¶ The diagonal of the second derivative w.r.t. X and X2
-
gradients_Z_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Returns the derivative of the objective wrt Z, using the chain rule through the expectation variables.
-
gradients_qX_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Compute the gradients wrt the parameters of the variational distribution q(X), chain-ruling via the expectations of the kernel
-
-
class
White
(input_dim, variance=1.0, active_dims=None, name='white')[source]¶ Bases:
GPy.kern.src.static.Static
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶
\[\psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
-
class
WhiteHeteroscedastic
(input_dim, num_data, variance=1.0, active_dims=None, name='white_hetero')[source]¶ Bases:
GPy.kern.src.static.Static
A heteroscedastic White kernel (nugget/noise). It defines one variance (nugget) per input sample.
Prediction excludes any noise learnt by this kernel, so be careful when using it.
You can plot the errors learnt by this kernel with something like: plt.errorbar(m.X, m.Y, yerr=2*np.sqrt(m.kern.white.variance))
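A hedged end-to-end sketch (the attribute used to read the learnt noise back follows the sub-kernel's name, here the default 'white_hetero'):

import numpy as np
import GPy

X = np.linspace(0, 10, 40)[:, None]
Y = np.sin(X) + 0.2 * np.random.randn(40, 1)

kern = GPy.kern.RBF(1) + GPy.kern.WhiteHeteroscedastic(1, num_data=X.shape[0])
m = GPy.models.GPRegression(X, Y, kern)
m.optimize()

# one learnt noise variance per training point, usable as error bars as above
per_sample_std = 2 * np.sqrt(m.kern.white_hetero.variance)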
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
psi2
(Z, variational_posterior)[source]¶
\[\psi_2^{m,m'} = \sum_{i=0}^{n} E_{q(X)}[ k(Z_m, X_i) k(X_i, Z_{m'})]\]
-
psi2n
(Z, variational_posterior)[source]¶ - \[\psi_2^{n,m,m'} = E_{q(X)}[ k(Z_m, X_n) k(X_n, Z_{m'})]\]
Thus, we do not sum out n, compared to psi2
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ update the gradients of all parameters when using only the diagonal elements of the covariance matrix
-
update_gradients_expectations
(dL_dpsi0, dL_dpsi1, dL_dpsi2, Z, variational_posterior)[source]¶ Set the gradients of all parameters when doing inference with uncertain inputs, using expectations of the kernel.
The essential maths is
\[\frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial \psi_0}\frac{\partial \psi_0}{\partial \theta_i} + \frac{\partial L}{\partial \psi_1}\frac{\partial \psi_1}{\partial \theta_i} + \frac{\partial L}{\partial \psi_2}\frac{\partial \psi_2}{\partial \theta_i}\]Thus, we push the different derivatives through the gradients of the psi statistics. Be sure to set the gradients for all kernel parameters here.
-
GPy.kern.src.stationary module¶
-
class
Cosine
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Cosine')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Cosine Covariance function
\[k(r) = \sigma^2 \cos(r)\]
-
class
ExpQuad
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='ExpQuad')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
The Exponentiated quadratic covariance function.
\[k(r) = \sigma^2 \exp(-0.5 r^2)\]
Notes: This is exactly the same as the RBF covariance function, but the RBF implementation also has some features for doing variational kernels (the psi-statistics).
-
class
ExpQuadCosine
(input_dim, variance=1.0, lengthscale=None, period=1.0, ARD=False, active_dims=None, name='ExpQuadCosine')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Exponentiated quadratic multiplied by cosine covariance function (spectral mixture kernel).
\[k(r) = \sigma^2 \exp(-2\pi^2r^2)\cos(2\pi r/T)\]
-
class
Exponential
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Exponential')[source]¶
-
class
Matern32
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat32')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Matern 3/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{3} r) \exp(- \sqrt{3} r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]-
Gram_matrix
(F, F1, F2, lower, upper)[source]¶ Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.
Parameters: - F (np.array) – vector of functions
- F1 (np.array) – vector of derivatives of F
- F2 (np.array) – vector of second derivatives of F
- lower,upper (floats) – boundaries of the input domain
-
-
class
Matern52
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Mat52')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Matern 5/2 kernel:
\[k(r) = \sigma^2 (1 + \sqrt{5} r + \frac53 r^2) \exp(- \sqrt{5} r)\]-
Gram_matrix
(F, F1, F2, F3, lower, upper)[source]¶ Return the Gram matrix of the vector of functions F with respect to the RKHS norm. The use of this function is limited to input_dim=1.
Parameters: - F (np.array) – vector of functions
- F1 (np.array) – vector of derivatives of F
- F2 (np.array) – vector of second derivatives of F
- F3 (np.array) – vector of third derivatives of F
- lower,upper (floats) – boundaries of the input domain
-
-
class
OU
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='OU')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
OU kernel:
\[k(r) = \sigma^2 \exp(- r) \ \ \ \ \text{ where } r = \sqrt{\sum_{i=1}^{\text{input_dim}} \frac{(x_i-y_i)^2}{\ell_i^2} }\]
-
class
RatQuad
(input_dim, variance=1.0, lengthscale=None, power=2.0, ARD=False, active_dims=None, name='RatQuad')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Rational Quadratic Kernel
\[k(r) = \sigma^2 \bigg( 1 + \frac{r^2}{2} \bigg)^{- \alpha}\]-
to_dict
()[source]¶ Convert the object into a json serializable dictionary.
Note: It uses the private method _save_to_input_dict of the parent.
Return dict: json serializable dictionary containing the needed information to instantiate the object
-
-
class
Sinc
(input_dim, variance=1.0, lengthscale=None, ARD=False, active_dims=None, name='Sinc')[source]¶ Bases:
GPy.kern.src.stationary.Stationary
Sinc Covariance function
\[k(r) = \sigma^2 \, \mathrm{sinc}(\pi r)\]
-
class
Stationary
(input_dim, variance, lengthscale, ARD, active_dims, name, useGPU=False)[source]¶ Bases:
GPy.kern.src.kern.Kern
Stationary kernels (covariance functions).
Stationary covariance functions depend only on r, where r is defined as
\[r(x, x') = \sqrt{ \sum_{q=1}^Q (x_q - x'_q)^2 }\]The covariance function k(x, x') can then be written k(r).
In this implementation, r is scaled by the lengthscale parameter(s):
\[r(x, x') = \sqrt{ \sum_{q=1}^Q \frac{(x_q - x'_q)^2}{\ell_q^2} }.\]By default, there is only one lengthscale: separate lengthscales for each dimension can be enabled by setting ARD=True.
To implement a stationary covariance function using this class, one need only define the covariance function k(r) and its derivative, for example:
def K_of_r(self, r):
    return foo

def dK_dr(self, r):
    return bar
The lengthscale(s) and variance parameters are added to the structure automatically.
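A minimal, hedged sketch of a complete subclass following this pattern, re-implementing k(r) = \sigma^2 \exp(-r) (the exponential kernel) by defining only K_of_r and dK_dr; the constructor mirrors the Stationary signature above:

import numpy as np
from GPy.kern.src.stationary import Stationary

class MyExponential(Stationary):
    """Custom stationary kernel: k(r) = variance * exp(-r)."""
    def __init__(self, input_dim, variance=1.0, lengthscale=None,
                 ARD=False, active_dims=None, name='my_exponential'):
        super(MyExponential, self).__init__(input_dim, variance, lengthscale,
                                            ARD, active_dims, name)

    def K_of_r(self, r):
        return self.variance * np.exp(-r)

    def dK_dr(self, r):
        return -self.variance * np.exp(-r)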
Thanks to @strongh: In Stationary, a covariance function is defined in GPy as stationary when it depends only on the l2-norm |x_1 - x_2 |. However this is the typical definition of isotropy, while stationarity is usually a bit more relaxed. The more common version of stationarity is that the covariance is a function of x_1 - x_2 (See e.g. R&W first paragraph of section 4.1).
-
K
(X, X2=None)[source]¶ Kernel function applied on inputs X and X2. In the stationary case there is an inner function depending on the distances from X to X2, called r.
K(X, X2) = K_of_r((X-X2)**2)
-
dK2_drdr_diag
()[source]¶ Second order derivative of K in r_{i,i}. The diagonal entries are always zero, so we do not give it here.
-
get_one_dimensional_kernel
(dimensions)[source]¶ Specially intended for the grid regression case. For a given covariance kernel, this method returns the corresponding kernel for a single dimension. The resulting values can then be used in the algorithm for reconstructing the full covariance matrix.
-
gradients_X
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective wrt K (dL_dK), compute the derivative wrt X
-
gradients_XX
(dL_dK, X, X2=None)[source]¶ Given the derivative of the objective wrt K (dL_dK), compute the second derivative of K wrt X and X2:
Returns the full covariance matrix [QxQ] of the input dimension for each pair of vectors, so the returned array has shape [NxNxQxQ].
\[\frac{\partial^2 K}{\partial X_2^2} = -\frac{\partial^2 K}{\partial X \partial X_2}\]
Returns: dL2_dXdX2: [NxMxQxQ] in the cov=True case, or [NxMxQ] in the cov=False case, for X [NxQ] and X2 [MxQ] (X2 is X if X2 is None). Thus, we return the second derivative in X2.
-
gradients_XX_diag
(dL_dK_diag, X)[source]¶ Given the derivative of the objective dL_dK, compute the second derivative of K wrt X:
\[\frac{\partial^2 K}{\partial X \partial X}\]
Returns: dL2_dXdX: [NxQxQ]
-
input_sensitivity
(summarize=True)[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
update_gradients_diag
(dL_dKdiag, X)[source]¶ Given the derivative of the objective with respect to the diagonal of the covariance matrix, compute the derivative wrt the parameters of this kernel and store it in the <parameter>.gradient field.
See also update_gradients_full
GPy.kern.src.stationary_cython module¶
GPy.kern.src.symbolic module¶
GPy.kern.src.symmetric module¶
-
class
Symmetric
(base_kernel, transform, symmetry_type='even')[source]¶ Bases:
GPy.kern.src.kern.Kern
Symmetric kernel that models a function with even or odd symmetry:
For even symmetry we have:
\[f(x) = f(Ax)\]we then model the function as:
\[f(x) = g(x) + g(Ax)\]the corresponding kernel is:
\[k(x, x') + k(Ax, x') + k(x, Ax') + k(Ax, Ax')\]For odd symmetry we have:
\[f(x) = -f(Ax)\]it does this by modelling:
\[f(x) = g(x) - g(Ax)\]with kernel
\[k(x, x') - k(Ax, x') - k(x, Ax') + k(Ax, Ax')\]where k(x, x’) is the kernel of g(x)
Parameters: - base_kernel – kernel to make symmetric
- transform – transformation matrix describing symmetry plane, A in equations above
- symmetry_type – ‘odd’ or ‘even’ depending on the symmetry needed
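A hedged usage sketch with an RBF base kernel and a reflection about the origin as the symmetry transform (the choice A = -I is illustrative):

import numpy as np
import GPy
from GPy.kern.src.symmetric import Symmetric

base = GPy.kern.RBF(1)
A = -np.eye(1)                        # reflection through the origin
sym = Symmetric(base, A, symmetry_type='even')

X = np.random.randn(20, 1)
K = sym.K(X)                          # k(x,x') + k(Ax,x') + k(x,Ax') + k(Ax,Ax')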
-
K
(X, X2)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
GPy.kern.src.trunclinear module¶
-
class
TruncLinear
(input_dim, variances=None, delta=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Truncated Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
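A hedged construction sketch (using the module path given above; the parameter values are illustrative only):

import numpy as np
from GPy.kern.src.trunclinear import TruncLinear

k = TruncLinear(input_dim=1, variances=1.0, delta=0.5)
X = np.linspace(-2, 2, 5)[:, None]
K = k.K(X)                            # 5 x 5 covariance matrix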
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
()[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.
-
class
TruncLinear_inf
(input_dim, interval, variances=None, ARD=False, active_dims=None, name='linear')[source]¶ Bases:
GPy.kern.src.kern.Kern
Truncated Linear kernel
\[k(x,y) = \sum_{i=1}^{\text{input_dim}} \sigma^2_i \max(0, x_i y_i - \sigma_q)\]Parameters: - input_dim (int) – the number of input dimensions
- variances (array or list of the appropriate size (or float if there is only one variance parameter)) – the vector of variances \(\sigma^2_i\)
- ARD (Boolean) – Auto Relevance Determination. If False, the kernel has only one variance parameter sigma^2, otherwise there is one variance parameter per dimension.
Return type: kernel object
-
K
(X, X2=None)[source]¶ Compute the kernel function.
\[K_{ij} = k(X_i, X_j)\]Parameters: - X – the first set of inputs to the kernel
- X2 – (optional) the second set of arguments to the kernel. If X2 is None, this is passed through to the ‘part’ object, which handles this as X2 == X.
-
gradients_X
(dL_dK, X, X2=None)[source]¶
\[\frac{\partial L}{\partial X} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial X}\]
-
input_sensitivity
()[source]¶ Returns the sensitivity for each dimension of this kernel.
This is an arbitrary measurement based on the parameters of the kernel per dimension and scaling in general.
Use this as relative measurement, not for absolute comparison between kernels.