# GPy.models package

## GPy.models.bayesian_gplvm module

class BayesianGPLVM(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1, Y_metadata=None)

Bayesian Gaussian Process Latent Variable Model

Parameters:

- Y (np.ndarray | GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- input_dim (int) – latent dimensionality
- init ('PCA' | 'random') – initialisation method for the latent space

get_X_gradients(X)

Get the gradients of the posterior distribution of X in its specific form.

parameters_changed()
plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:

- which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use (the default is a custom arrow-like marker)
- kwargs – the kwargs for the scatter plots
- projection (str) – for now, a 2d or 3d projection (other projections can be implemented, see the developer documentation)

plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots

plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use; they cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots

plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure; if an int, the number of legend columns to plot
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots

set_X_gradients(X, X_grad)

Set the gradients of the posterior distribution of X in its specific form.

## GPy.models.bayesian_gplvm_minibatch module

class BayesianGPLVMMiniBatch(Y, input_dim, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', normalizer=None, missing_data=False, stochastic=False, batchsize=1)

Bayesian Gaussian Process Latent Variable Model

Parameters:

- Y (np.ndarray | GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- input_dim (int) – latent dimensionality
- init ('PCA' | 'random') – initialisation method for the latent space

parameters_changed()
plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:

- which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use (the default is a custom arrow-like marker)
- kwargs – the kwargs for the scatter plots
- projection (str) – for now, a 2d or 3d projection (other projections can be implemented, see the developer documentation)

plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots

plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use; they cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots

plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure; if an int, the number of legend columns to plot
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots

## GPy.models.bcgplvm module

class BCGPLVM(Y, input_dim, kernel=None, mapping=None)

Back constrained Gaussian Process Latent Variable Model

Parameters:

- Y (np.ndarray) – observed data
- input_dim (int) – latent dimensionality
- mapping (GPy.core.Mapping object) – mapping for back constraint

parameters_changed()

## GPy.models.dpgplvm module

class DPBayesianGPLVM(Y, input_dim, X_prior, X=None, X_variance=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='bayesian gplvm', mpi_comm=None, normalizer=None, missing_data=False, stochastic=False, batchsize=1)

Bayesian Gaussian Process Latent Variable Model with Discriminative prior

## GPy.models.gp_classification module

class GPClassification(X, Y, kernel=None, Y_metadata=None, mean_function=None)

Bases: GPy.core.gp.GP

Gaussian Process classification

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:

- X – input observations
- Y – observed values, can be None if likelihood is not None
- kernel – a GPy kernel, defaults to rbf

Note

Multiple independent outputs are allowed using columns of Y

static from_dict(input_dict, data=None)
static from_gp(gp)
save_model(output_filename, compress=True, save_data=True)
to_dict(save_data=True)

## GPy.models.gp_coregionalized_regression module

class GPCoregionalizedRegression(X_list, Y_list, kernel=None, likelihoods_list=None, name='GPCR', W_rank=1, kernel_name='coreg')

Bases: GPy.core.gp.GP

Gaussian Process model for heteroscedastic multioutput regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:

- X_list (list of numpy arrays) – list of input observations corresponding to each output
- Y_list (list of numpy arrays) – list of observed values related to the different noise models
- kernel (None | GPy.kernel defaults) – a GPy kernel ** Coregionalized, defaults to RBF ** Coregionalized
- likelihoods_list – a list of likelihoods, defaults to list of Gaussian likelihoods
- name (string) – model name
- W_rank (integer) – number of tuples of the coregionalization parameters 'W' (see coregionalize kernel documentation)
- kernel_name (string) – name of the kernel
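As a rough illustration of what W_rank controls, the sketch below builds a coregionalization matrix of the commonly used form B = W Wᵀ + diag(kappa), where W has one row per output and W_rank columns. The form of B, the name kappa, and the helper coregionalization_matrix are assumptions made for illustration; the coregionalize kernel documentation is the authority on the parameterization GPy actually uses.

```python
def coregionalization_matrix(W, kappa):
    """Build B = W W^T + diag(kappa) from a list-of-lists W (outputs x rank)."""
    n = len(W)
    return [
        [
            sum(wi * wj for wi, wj in zip(W[i], W[j]))
            + (kappa[i] if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three outputs, rank 1 (one column in W): hypothetical values.
W = [[1.0], [0.5], [-2.0]]
kappa = [0.1, 0.1, 0.1]

B = coregionalization_matrix(W, kappa)
for row in B:
    print(row)
```

A larger rank adds columns to W, which lets B capture more independent patterns of correlation between outputs.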

## GPy.models.gp_grid_regression module

class GPRegressionGrid(X, Y, kernel=None, Y_metadata=None, normalizer=None)

Gaussian Process model for grid inputs using Kronecker products

This is a thin wrapper around the models.GpGrid class, with a set of sensible defaults

Parameters:

- X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to the kron variation of SqExp
- normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).

Note

Multiple independent outputs are allowed using columns of Y

## GPy.models.gp_heteroscedastic_regression module

class GPHeteroscedasticRegression(X, Y, kernel=None, Y_metadata=None)

Bases: GPy.core.gp.GP

Gaussian Process model for heteroscedastic regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:

- X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to rbf

NB: This model does not make inference on the noise outside the training set

## GPy.models.gp_kronecker_gaussian_regression module

class GPKroneckerGaussianRegression(X1, X2, Y, kern1, kern2, noise_var=1.0, name='KGPR')

Kronecker GP regression

Take two kernels computed on separate spaces K1(X1), K2(X2), and a data matrix Y which is of size (N1, N2).

The effective covariance is np.kron(K2, K1). The effective data is vec(Y) = Y.flatten(order='F').

The noise must be iid Gaussian.
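The Kronecker structure matters because a product with np.kron(K2, K1) can be computed from the two small kernel matrices alone, using the identity (K2 ⊗ K1) vec(Y) = vec(K1 Y K2ᵀ) with column-stacking vec. The following NumPy sketch checks that identity numerically; the random matrices and the helper name kron_matvec are illustrative assumptions, and this is not GPy's implementation.

```python
import numpy as np


def kron_matvec(K1, K2, Y):
    """Compute np.kron(K2, K1) @ vec(Y) without forming the Kronecker product.

    Relies on (K2 kron K1) vec(Y) = vec(K1 @ Y @ K2.T), where vec stacks
    columns (Fortran order), matching Y.flatten(order='F').
    """
    return (K1 @ Y @ K2.T).flatten(order="F")


rng = np.random.default_rng(0)
N1, N2 = 4, 3

# Illustrative symmetric positive semi-definite stand-ins for K1(X1), K2(X2).
A1 = rng.standard_normal((N1, N1))
A2 = rng.standard_normal((N2, N2))
K1 = A1 @ A1.T
K2 = A2 @ A2.T
Y = rng.standard_normal((N1, N2))

dense = np.kron(K2, K1) @ Y.flatten(order="F")  # builds an (N1*N2) x (N1*N2) matrix
fast = kron_matvec(K1, K2, Y)                   # touches only the small matrices

print(np.allclose(dense, fast))
```

The dense route needs storage quadratic in N1*N2, while the structured route only ever holds N1 x N1, N2 x N2 and N1 x N2 arrays.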

See Stegle et al. (2011):

Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence and Karsten M. Borgwardt. Efficient inference in matrix-variate Gaussian models with iid observation noise. In Advances in Neural Information Processing Systems, pages 630–638, 2011.

log_likelihood()
parameters_changed()
predict(X1new, X2new)

Return the predictive mean and variance at a series of new points X1new, X2new. Only returns the diagonal of the predictive variance, for now.

Parameters:

- X1new (np.ndarray, Nnew x self.input_dim1) – the points at which to make a prediction
- X2new (np.ndarray, Nnew x self.input_dim2) – the points at which to make a prediction

## GPy.models.gp_multiout_regression module

class GPMultioutRegression(X, Y, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', name='GPMR')

Gaussian Process model for multi-output regression without missing data

This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai et al. 2017].

Zhenwen Dai, Mauricio A. Alvarez and Neil D. Lawrence. Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.

Parameters:

- X (numpy.ndarray) – input observations.
- Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
- Xr_dim (int) – the dimensionality of a latent space in which output dimensions are embedded
- kernel (GPy.kern.Kern or None) – a GPy kernel for the GP of individual output dimensions, defaults to RBF
- kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space, defaults to RBF
- Z (numpy.ndarray or None) – inducing inputs
- Z_row (numpy.ndarray or None) – inducing inputs for the latent space
- X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
- Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
- num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for the GP of individual output dimensions. Mr is the number of inducing points for the latent space.
- qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- init (str) – the choice of initialization: 'GP' or 'rand'. With 'rand', the model is initialized randomly. With 'GP', the model is initialized through the following protocol: (1) fit a sparse GP, (2) fit a BGPLVM based on the outcome of the sparse GP, (3) initialize the model based on the outcome of the BGPLVM.
- name (str) – the name of the model

optimize_auto(max_iters=10000, verbose=True)

Optimize the model parameters through a pre-defined protocol.

Parameters:

- max_iters (int) – the maximum number of iterations.
- verbose (boolean) – print the progress of optimization or not.

parameters_changed()

## GPy.models.gp_multiout_regression_md module

class GPMultioutRegressionMD(X, Y, indexD, Xr_dim, kernel=None, kernel_row=None, Z=None, Z_row=None, X_row=None, Xvariance_row=None, num_inducing=(10, 10), qU_var_r_W_dim=None, qU_var_c_W_dim=None, init='GP', heter_noise=False, name='GPMRMD')

Gaussian Process model for multi-output regression with missing data

This is an implementation of Latent Variable Multiple Output Gaussian Processes (LVMOGP) in [Dai et al. 2017]. This model targets the use case in which each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together correspondingly into two matrices. An extra array is used to indicate the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.
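The stacked data format described above can be sketched with NumPy as follows. The per-output arrays, their values, and the single-column shape of the stacked Y are assumptions chosen for illustration; only the idea of two stacked matrices plus an index array comes from the description.

```python
import numpy as np

# Hypothetical observations: each output dimension has its own set of inputs.
X_per_output = [
    np.array([[0.0], [0.5], [1.0]]),         # inputs for output 0
    np.array([[0.2], [0.8]]),                # inputs for output 1
    np.array([[0.1], [0.4], [0.6], [0.9]]),  # inputs for output 2
]
Y_per_output = [np.sin(x) + d for d, x in enumerate(X_per_output)]

# Stack the inputs and the outputs row-wise into two matrices.
X = np.vstack(X_per_output)
Y = np.vstack(Y_per_output)

# indexD[i] is the output dimension (0 .. D-1) that row i of X and Y belongs to.
indexD = np.concatenate(
    [np.full(len(x), d, dtype=int) for d, x in enumerate(X_per_output)]
)

print(X.shape, Y.shape, indexD)
```

Row i of X, row i of Y and entry i of indexD always describe the same observation, so the three arrays must have the same length.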

Zhenwen Dai, Mauricio A. Alvarez and Neil D. Lawrence. Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes. In NIPS, 2017.

Parameters:

- X (numpy.ndarray) – input observations.
- Y (numpy.ndarray) – output observations, each column corresponding to an output dimension.
- indexD (numpy.ndarray) – the array containing the index of the output dimension for each data point
- Xr_dim (int) – the dimensionality of a latent space in which output dimensions are embedded
- kernel (GPy.kern.Kern or None) – a GPy kernel for the GP of individual output dimensions, defaults to RBF
- kernel_row (GPy.kern.Kern or None) – a GPy kernel for the GP of the latent space, defaults to RBF
- Z (numpy.ndarray or None) – inducing inputs
- Z_row (numpy.ndarray or None) – inducing inputs for the latent space
- X_row (numpy.ndarray or None) – the initial value of the mean of the variational posterior distribution of points in the latent space
- Xvariance_row (numpy.ndarray or None) – the initial value of the variance of the variational posterior distribution of points in the latent space
- num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for the GP of individual output dimensions. Mr is the number of inducing points for the latent space.
- qU_var_r_W_dim (int) – the dimensionality of the covariance of q(U) for the latent space. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- qU_var_c_W_dim (int) – the dimensionality of the covariance of q(U) for the GP regression. If it is smaller than the number of inducing points, it represents a low-rank parameterization of the covariance matrix.
- init (str) – the choice of initialization: 'GP' or 'rand'. With 'rand', the model is initialized randomly. With 'GP', the model is initialized through the following protocol: (1) fit a sparse GP, (2) fit a BGPLVM based on the outcome of the sparse GP, (3) initialize the model based on the outcome of the BGPLVM.
- heter_noise (boolean) – whether to assume heteroscedastic noise in the model
- name (str) – the name of the model

optimize_auto(max_iters=10000, verbose=True)

Optimize the model parameters through a pre-defined protocol.

Parameters:

- max_iters (int) – the maximum number of iterations.
- verbose (boolean) – print the progress of optimization or not.

parameters_changed()

## GPy.models.gp_offset_regression module

class GPOffsetRegression(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)

Bases: GPy.core.gp.GP

Gaussian Process model for offset regression

Parameters:

- X – input observations; for this class we assume one dimension of actual inputs, and the last dimension should be the index of the cluster (so X should be Nx2)
- Y – observed values (Nx1?)
- kernel – a GPy kernel, defaults to rbf
- normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
- noise_var – the noise variance for the Gaussian likelihood, defaults to 1.

Note

Multiple independent outputs are allowed using columns of Y

dr_doffset(X, sel, delta)
parameters_changed()

## GPy.models.gp_regression module

class GPRegression(X, Y, kernel=None, Y_metadata=None, normalizer=None, noise_var=1.0, mean_function=None)

Bases: GPy.core.gp.GP

Gaussian Process model for regression

This is a thin wrapper around the models.GP class, with a set of sensible defaults

Parameters:

- X – input observations
- Y – observed values
- kernel – a GPy kernel, defaults to rbf
- normalizer (Norm) – [False] Normalize Y with the norm given. If normalizer is False, no normalization will be done. If it is None, we use GaussianNorm(alization).
- noise_var – the noise variance for the Gaussian likelihood, defaults to 1.

Note

Multiple independent outputs are allowed using columns of Y

static from_gp(gp)
save_model(output_filename, compress=True, save_data=True)
to_dict(save_data=True)

## GPy.models.gp_var_gauss module

class GPVariationalGaussianApproximation(X, Y, kernel, likelihood, Y_metadata=None)

Bases: GPy.core.gp.GP

The Variational Gaussian Approximation revisited

Manfred Opper and Cédric Archambeau. The Variational Gaussian Approximation Revisited. Neural Comput., pages 786–792, 2009.

## GPy.models.gplvm module

class GPLVM(Y, input_dim, init='PCA', X=None, kernel=None, name='gplvm')

Bases: GPy.core.gp.GP

Gaussian Process Latent Variable Model

Parameters:

- Y (np.ndarray) – observed data
- input_dim (int) – latent dimensionality
- init ('PCA' | 'random') – initialisation method for the latent space

parameters_changed()
plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters:

- which_indices ([int]) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – marker to use (the default is a custom arrow-like marker)
- kwargs – the kwargs for the scatter plots
- projection (str) – for now, a 2d or 3d projection (other projections can be implemented, see the developer documentation)

plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- scatter_kwargs – the kwargs for the scatter plots

plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- legend (bool) – whether to plot the legend on the figure
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- marker (str) – markers to use; they cycle if more labels than markers are given
- kwargs – the kwargs for the scatter plots

plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:

- labels (array-like) – a label for each data point (row) of the inputs
- which_indices ((int, int)) – which input dimensions to plot against each other
- resolution (int) – the resolution at which we predict the magnification factor
- legend (bool) – whether to plot the legend on the figure; if an int, the number of legend columns to plot
- plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
- updates (bool) – if possible, make interactive updates using the specific library you are using
- kern (Kern) – the kernel to use for prediction
- marker (str) – markers to use; they cycle if more labels than markers are given
- num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
- imshow_kwargs – the kwargs for the imshow (magnification factor)
- annotation_kwargs – the kwargs for the annotation plot
- scatter_kwargs – the kwargs for the scatter plots

## GPy.models.gradient_checker module

class GradientChecker(f, df, x0, names=None, *args, **kwargs)

Parameters:

- f – function to check the gradient for
- df – gradient of the function to check
- x0 ([array-like] | array-like | float | int) – initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!
- names – names to print when performing gradcheck. If a list was passed to x0, a list of names with the same length is expected.
- args – arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)

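Examples:

    import numpy
    import GPy
    from GPy.models import GradientChecker

    N, M, Q = 10, 5, 3

    # Sinusoid: check that cos is the gradient of sin
    X = numpy.random.rand(N, Q)
    grad = GradientChecker(numpy.sin, numpy.cos, X, 'x')
    grad.checkgrad(verbose=1)

    # Using GPy (older kernel API as in the source; newer releases use GPy.kern.Linear / GPy.kern.RBF)
    X, Z = numpy.random.randn(N, Q), numpy.random.randn(M, Q)
    kern = GPy.kern.linear(Q, ARD=True) + GPy.kern.rbf(Q, ARD=True)
    grad = GradientChecker(kern.K,
                           lambda x: 2*kern.dK_dX(numpy.ones((1,1)), x),
                           x0=X.copy(),
                           names='X')
    grad.checkgrad(verbose=1)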

log_likelihood()

class HessianChecker(f, df, ddf, x0, names=None, *args, **kwargs)

Parameters:

- f – function (only used for numerical hessian gradient)
- df – gradient of the function to check
- ddf – analytical gradient function
- x0 ([array-like] | array-like | float | int) – initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!
- names – names to print when performing gradcheck. If a list was passed to x0, a list of names with the same length is expected.
- args – arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)

checkgrad(target_param=None, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False)

Parameters:

- verbose (bool) – if True, print a "full" checking of each parameter
- step (float, default 1e-6) – the size of the step around which to linearise the objective
- tolerance (float, default 1e-3) – the tolerance allowed (see note)

Note: The gradient is considered correct if the ratio of the analytical and numerical gradients is within <tolerance> of unity.
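
The acceptance criterion in the note can be illustrated for a scalar function with a central finite difference. This is a minimal sketch of the ratio test only, not the library's code; the helper check_gradient and the test functions are assumptions, while the default step and tolerance mirror the values in the signature above.

```python
import math


def check_gradient(f, df, x, step=1e-6, tolerance=1e-3):
    """Compare an analytical gradient with a central finite difference.

    Returns (ok, ratio), where ok is True when the ratio of the numerical
    to the analytical gradient is within `tolerance` of 1.
    """
    numerical = (f(x + step) - f(x - step)) / (2.0 * step)
    analytical = df(x)
    ratio = numerical / analytical  # assumes a non-zero analytical gradient
    return abs(1.0 - ratio) < tolerance, ratio


# A correct gradient: d/dx sin(x) = cos(x).
ok_good, ratio_good = check_gradient(math.sin, math.cos, 0.3)

# A deliberately wrong gradient, off by a factor of two.
ok_bad, ratio_bad = check_gradient(math.sin, lambda x: 2.0 * math.cos(x), 0.3)

print(ok_good, ok_bad)
```

Using a ratio rather than an absolute difference makes the test insensitive to the overall scale of the gradient, at the cost of being undefined where the gradient is exactly zero.
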
checkgrad_block(analytic_hess, numeric_hess, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False)

class SkewChecker(df, ddf, dddf, x0, names=None, *args, **kwargs)

Parameters:

- df – gradient of the function
- ddf – gradient of the function to check (hessian)
- dddf – analytical gradient function (third derivative)
- x0 ([array-like] | array-like | float | int) – initial guess for inputs x (if it has a shape (a,b) this will be reflected in the parameter names). Can be a list of arrays, if f takes a list of arrays. This list will be passed to f and df in the same order as given here. If there is only one argument, make sure not to pass a list!
- names – names to print when performing gradcheck. If a list was passed to x0, a list of names with the same length is expected.
- args – arguments passed as f(x, *args, **kwargs) and df(x, *args, **kwargs)

checkgrad(target_param=None, verbose=False, step=1e-06, tolerance=0.001, block_indices=None, plot=False, super_plot=False)

Gradient checker that just checks each hessian individually

super_plot will plot the hessian wrt every parameter, plot will just do the first one

at_least_one_element(x)
flatten_if_needed(x)
get_shape(x)

## GPy.models.ibp_lfm module

class IBPLFM(X, Y, input_dim=2, output_dim=1, rank=1, Gamma=None, num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='IBP for LFM', alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, normalizer=False, variational_prior=None, **kwargs)

Indian Buffet Process for Latent Force Models

Parameters:

- Y (np.ndarray | GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood
- X (np.ndarray) – input data (np.ndarray) [X:values, X:index], index refers to the number of the output
- input_dim (int) – latent dimensionality
- rank – number of latent functions

get_Zp_gradients(Zp)

Get the gradients of the posterior distribution of Zp in its specific form.

parameters_changed()
set_Zp_gradients(Zp, Zp_grad)

Set the gradients of the posterior distribution of Zp in its specific form.

class IBPPosterior(binary_prob, tau=None, name='Sensitivity space', *a, **kw)

The IBP distribution for variational approximations.

binary_prob : the probability of including a latent function over an output.

set_gradients(grad)

class IBPPrior(rank, alpha=2.0, name='IBPPrior', **kw)

KL_divergence(variational_posterior)
update_gradients_KL(variational_posterior)

class VarDTC_minibatch_IBPLFM(batchsize=None, limit=3, mpi_comm=None)

Modifications of VarDTC_minibatch for IBP LFM

gatherPsiStat(kern, X, Z, Y, beta, Zp)
inference_likelihood(kern, X, Z, likelihood, Y, Zp)

The first phase of inference. Compute: log-likelihood, dL_dKmm.

Cached intermediate results: Kmm, KmmInv.

inference_minibatch(kern, X, Z, likelihood, Y, Zp)

The second phase of inference: computing the derivatives over a minibatch of Y. Compute: dL_dpsi0, dL_dpsi1, dL_dpsi2, dL_dthetaL. Returns a flag showing whether it reached the end of Y (isEnd).
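The end-of-data flag suggests a driver loop that keeps requesting minibatches and accumulating their contributions until one full pass over Y is complete. The sketch below shows only that control flow in plain Python; the generator minibatches and the running sum standing in for gradient accumulation are hypothetical and are not part of GPy.

```python
def minibatches(n_rows, batchsize):
    """Yield (start, end, is_end) index triples covering n_rows in order."""
    for start in range(0, n_rows, batchsize):
        end = min(start + batchsize, n_rows)
        yield start, end, end == n_rows


Y = list(range(10))   # stand-in for the rows of Y
running_total = 0     # stand-in for accumulated gradient terms
seen = []

for start, end, is_end in minibatches(len(Y), batchsize=4):
    running_total += sum(Y[start:end])
    seen.append((start, end, is_end))
    if is_end:
        break  # one full pass over Y is complete

print(seen, running_total)
```

The last batch may be shorter than batchsize, which is why the flag is derived from the end index rather than from the batch length.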

update_gradients(model, mpi_comm=None)

## GPy.models.input_warped_gp module

class InputWarpedGP(X, Y, kernel=None, normalizer=False, warping_function=None, warping_indices=None, Xmin=None, Xmax=None, epsilon=None)

Bases: GPy.core.gp.GP

Input Warped GP

This defines a GP model that applies a warping function to the Input. By default, it uses Kumar Warping (CDF of Kumaraswamy distribution)

X : array_like, shape = (n_samples, n_features) for input data

Y : array_like, shape = (n_samples, 1) for output data

kernel : object, optional
An instance of a kernel function defined in GPy.kern. Defaults to Matern 32.
warping_function : object, optional
An instance of a warping function defined in GPy.util.input_warping_functions. Defaults to KumarWarping.
warping_indices : list of int, optional
A list of indices of the features in X that should be warped. It is used in the Kumar warping function.
normalizer : bool, optional
Whether to normalize the output.
Xmin : list of float, optional
The min values for every feature in X. It is used in the Kumar warping function.
Xmax : list of float, optional
The max values for every feature in X. It is used in the Kumar warping function.
epsilon : float, optional
We normalize X to [0+e, 1-e]. If not given, the default value defined in the KumarWarping function is used.
X_untransformed : array_like, shape = (n_samples, n_features)
A copy of original input X
X_warped : array_like, shape = (n_samples, n_features)
Input data after warping
warping_function : object, optional
An instance of warping function defined in GPy.util.input_warping_functions Default to KumarWarping

Kumar warping uses the CDF of Kumaraswamy distribution. More on the Kumaraswamy distribution can be found at the wiki page: https://en.wikipedia.org/wiki/Kumaraswamy_distribution

Snoek, J.; Swersky, K.; Zemel, R. S. & Adams, R. P. Input Warping for Bayesian Optimization of Non-stationary Functions preprint arXiv:1402.0929, 2014
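As a rough illustration of the warping described above, the following sketch applies the Kumaraswamy CDF, F(u; a, b) = 1 - (1 - u^a)^b, to a single feature after rescaling it to [epsilon, 1 - epsilon]. The function name `kumar_warp`, the shape parameters a and b, and the epsilon value are assumptions made for this example; they are not taken from GPy.

```python
def kumar_warp(x, a, b, xmin, xmax, epsilon=1e-6):
    """Warp one feature value with the Kumaraswamy CDF (illustrative sketch)."""
    # Rescale x from [xmin, xmax] to [epsilon, 1 - epsilon].
    u = (x - xmin) / (xmax - xmin)
    u = epsilon + u * (1.0 - 2.0 * epsilon)
    # Kumaraswamy CDF with shape parameters a > 0 and b > 0.
    return 1.0 - (1.0 - u ** a) ** b


# Warp a few points of a feature that lives in [0, 10].
warped = [kumar_warp(x, a=2.0, b=0.5, xmin=0.0, xmax=10.0)
          for x in (0.0, 2.5, 5.0, 7.5, 10.0)]

# With a = b = 1 (and no epsilon margin) the warping is the identity on [0, 1].
identity = kumar_warp(3.0, a=1.0, b=1.0, xmin=0.0, xmax=10.0, epsilon=0.0)
```

Because the CDF is monotonic, the warping preserves the ordering of the inputs while stretching or compressing different regions of the input range.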

log_likelihood()[source]

Compute the marginal log likelihood

For input warping, just use the normal GP log likelihood

parameters_changed()[source]

Update the gradients of parameters for warping function

This method is called whenever the parameters of the warping function, the kernel, or any other parameter of a normal GP take new values.

predict(Xnew)[source]

Prediction on the new data

Xnew : array_like, shape = (n_samples, n_features)
The test data.
mean : array_like, shape = (n_samples, output.dim)
Posterior mean at the location of Xnew
var : array_like, shape = (n_samples, 1)
Posterior variance at the location of Xnew
transform_data(X, test_data=False)[source]

Apply warping_function to some Input data

X : array_like, shape = (n_samples, n_features)

test_data: bool, optional
Defaults to False. Set it to True when transforming test data.

## GPy.models.mrd module¶

class MRD(Ylist, input_dim, X=None, X_variance=None, initx='PCA', initz='permute', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihoods=None, name='mrd', Ynames=None, normalizer=False, stochastic=False, batchsize=10)[source]

!WARNING: This is bleeding edge code and still in development. Functionality may change fundamentally during development!

Apply MRD to all given datasets Y in Ylist.

Y_i in [n x p_i]

If Ylist is a dictionary, the keys of the dictionary are the names, and the values are the different datasets to compare.

The samples n in the datasets need to match up, whereas the dimensionality p_i can differ.

Parameters:
Ylist ([array-like]) – list of datasets to apply MRD on
input_dim (int) – latent dimensionality
X (array-like) – mean of starting latent space q in [n x q]
X_variance (array-like) – variance of starting latent space q in [n x q]
initx (['concat'|'single'|'random']) – initialisation method for the latent space: 'concat' - PCA on concatenation of all datasets; 'single' - concatenation of PCA on datasets, respectively; 'random' - random draw from a Normal(0,1)
initz ('permute'|'random') – initialisation method for inducing inputs
num_inducing – number of inducing inputs to use
Z – initial inducing inputs
kernel ([GPy.kernels.kernels] | GPy.kernels.kernels | None (default)) – list of kernels or kernel to copy for each output
inference_method (GPy.inference.latent_function_inference) – InferenceMethodList of inferences, or one inference method for all
likelihoods – the likelihoods to use
name (str) – the name of this model
Ynames ([str]) – the names for the datasets given, must be of equal length as Ylist or None
normalizer (bool|Norm) – how to normalize the data
stochastic (bool) – whether this model should use stochastic gradient descent over the dimensions
batchsize (bool|[bool]) – either one batchsize for all, or one batchsize per dataset

factorize_space(threshold=0.005, printOut=False, views=None)[source]

Given a trained MRD model, this function looks at the optimized ARD weights (lengthscales) and decides which part of the latent space is shared across views or private, according to a threshold. The threshold is applied after all weights are normalized so that the maximum value is 1.
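The thresholding rule described above can be sketched without GPy. In this sketch the per-view weight lists are made-up numbers, and the function name `factorize` is hypothetical; the real method reads the weights from the trained kernels.

```python
def factorize(sensitivities, threshold=0.005):
    """Split latent dimensions into shared and private ones (illustrative sketch).

    sensitivities holds one list of ARD weights per view.
    """
    retained = []
    for weights in sensitivities:
        top = max(weights)
        # Normalize so that the largest weight of the view is 1.
        normalized = [w / top for w in weights]
        retained.append({d for d, w in enumerate(normalized) if w > threshold})
    # A dimension is shared if every view retains it.
    shared = set.intersection(*retained)
    private = [sorted(dims - shared) for dims in retained]
    return sorted(shared), private


# Two views over a four-dimensional latent space (made-up weights).
shared_dims, private_dims = factorize([
    [4.0, 2.0, 0.001, 0.0],
    [3.0, 0.002, 1.5, 0.0],
])
```

Here dimension 0 is used by both views, dimensions 1 and 2 are private to one view each, and dimension 3 is switched off everywhere.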

log_likelihood()[source]
parameters_changed()[source]
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', predict_kwargs={}, scatter_kwargs=None, **imshow_kwargs)[source]

See plotting.matplot_dep.dim_reduction_plots.plot_latent. If predict_kwargs is None, the latent space of the 0th dataset (and kernel) is plotted; otherwise pass predict_kwargs=dict(Yindex='index') to plot only the latent space of the dataset with index 'index'.

plot_scales(titles=None, fig_kwargs={}, **kwargs)[source]

Plot input sensitivity for all datasets, to see which input dimensions are significant for which dataset.

Parameters: titles – titles for axes of datasets

kwargs go into plot_ARD for each kernel.

predict(Xnew, full_cov=False, Y_metadata=None, kern=None, Yindex=0)[source]

Prediction for the dataset Yindex (default 0). This predicts the output mean and variance for the dataset given in Ylist[Yindex].

## GPy.models.one_vs_all_classification module¶

class OneVsAllClassification(X, Y, kernel=None, Y_metadata=None, messages=True)[source]

Bases: object

Gaussian Process classification: One vs all

This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults

Parameters: X – input observations Y – observed values, can be None if likelihood is not None kernel – a GPy kernel, defaults to rbf

Note

Multiple independent outputs are not allowed
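The one-vs-all idea itself can be shown in a few lines: each class gets its own binary target vector, and one binary classifier would then be fitted per vector. This sketch only builds the targets; the helper name `one_vs_all_labels` is hypothetical, and how GPy fits and combines the individual classifiers is not shown.

```python
def one_vs_all_labels(labels):
    """Build one binary target list per class (illustrative sketch)."""
    classes = sorted(set(labels))
    # For class c, the target is 1 where the label equals c and 0 elsewhere.
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


binary_targets = one_vs_all_labels(["cat", "dog", "bird", "dog"])
```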

## GPy.models.one_vs_all_sparse_classification module¶

class OneVsAllSparseClassification(X, Y, kernel=None, Y_metadata=None, messages=True, num_inducing=10)[source]

Bases: object

Gaussian Process classification: One vs all

This is a thin wrapper around the models.GPClassification class, with a set of sensible defaults

Parameters: X – input observations Y – observed values, can be None if likelihood is not None kernel – a GPy kernel, defaults to rbf

Note

Multiple independent outputs are not allowed

## GPy.models.sparse_gp_classification module¶

class SparseGPClassification(X, Y=None, likelihood=None, kernel=None, Z=None, num_inducing=10, Y_metadata=None)[source]

Sparse Gaussian Process model for classification

This is a thin wrapper around the sparse_GP class, with a set of sensible defaults

Parameters:
X – input observations
Y – observed values
likelihood – a GPy likelihood, defaults to Binomial with probit link_function
kernel – a GPy kernel, defaults to rbf+white
normalize_X (False|True) – whether to normalize the input data before computing (predictions will be in original scales)
normalize_Y (False|True) – whether to normalize the output data before computing (predictions will be in original scales)
Return type: model object
class SparseGPClassificationUncertainInput(X, X_variance, Y, kernel=None, Z=None, num_inducing=10, Y_metadata=None, normalizer=None)[source]

Sparse Gaussian Process model for classification with uncertain inputs.

This is a thin wrapper around the sparse_GP class, with a set of sensible defaults

Parameters:
X (np.ndarray (num_data x input_dim)) – input observations
X_variance (np.ndarray (num_data x input_dim)) – the uncertainty in the measurements of X (Gaussian variance, optional)
Y – observed values
kernel – a GPy kernel, defaults to rbf+white
Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type: model object

Note

If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored.

Note

Multiple independent outputs are allowed using columns of Y

parameters_changed()[source]

## GPy.models.sparse_gp_coregionalized_regression module¶

class SparseGPCoregionalizedRegression(X_list, Y_list, Z_list=[], kernel=None, likelihoods_list=None, num_inducing=10, X_variance=None, name='SGPCR', W_rank=1, kernel_name='coreg')[source]

Sparse Gaussian Process model for heteroscedastic multioutput regression

This is a thin wrapper around the SparseGP class, with a set of sensible defaults

Parameters:
X_list (list of numpy arrays) – list of input observations corresponding to each output
Y_list (list of numpy arrays) – list of observed values related to the different noise models
Z_list (empty list | list of numpy arrays) – list of inducing inputs (optional)
kernel (None | GPy.kernel defaults) – a GPy kernel ** Coregionalized, defaults to RBF ** Coregionalized
likelihoods_list – a list of likelihoods, defaults to list of Gaussian likelihoods
num_inducing (integer | list of integers) – number of inducing inputs, defaults to 10 per output (ignored if Z_list is not empty)
name (string) – model name
W_rank (integer) – number of tuples of the coregionalization parameters 'W' (see coregionalize kernel documentation)
kernel_name (string) – name of the kernel

## GPy.models.sparse_gp_minibatch module¶

class SparseGPMiniBatch(X, Y, Z, kernel, likelihood, inference_method=None, name='sparse gp', Y_metadata=None, normalizer=False, missing_data=False, stochastic=False, batchsize=1)[source]

A general purpose Sparse GP model, allowing missing data and stochastics across dimensions.

This model allows (approximate) inference using variational DTC or FITC (Gaussian likelihoods) as well as non-conjugate sparse methods based on these.

Parameters:
X (np.ndarray (num_data x input_dim)) – inputs
likelihood (GPy.likelihood.(Gaussian | EP | Laplace)) – a likelihood instance, containing the observed data
kernel (a GPy.kern.kern instance) – the kernel (covariance function)
X_variance (np.ndarray (num_data x input_dim) | None) – the uncertainty in the measurements of X (Gaussian variance)
Z (np.ndarray (num_inducing x input_dim)) – inducing inputs
num_inducing (int) – number of inducing points (optional, default 10; ignored if Z is not None)
has_uncertain_inputs()[source]
optimize(optimizer=None, start=None, **kwargs)[source]
parameters_changed()[source]

## GPy.models.sparse_gp_regression module¶

class SparseGPRegression(X, Y, kernel=None, Z=None, num_inducing=10, X_variance=None, mean_function=None, normalizer=None, mpi_comm=None, name='sparse_gp')[source]

Gaussian Process model for regression

This is a thin wrapper around the SparseGP class, with a set of sensible defaults

Parameters:
X – input observations
X_variance – input uncertainties, one per input X
Y – observed values
kernel – a GPy kernel, defaults to rbf+white
Z (np.ndarray (num_inducing x input_dim) | None) – inducing inputs (optional, see note)
num_inducing (int) – number of inducing points (ignored if Z is passed, see note)
Return type: model object

Note

If no Z array is passed, num_inducing (default 10) points are selected from the data. Otherwise num_inducing is ignored.

Note

Multiple independent outputs are allowed using columns of Y
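The note on inducing inputs can be illustrated with a small sketch. It assumes the points are drawn at random from the rows of X, which is one plausible reading of "selected from the data"; the helper name `select_inducing` and the `seed` argument are hypothetical and not part of GPy.

```python
import random


def select_inducing(X, num_inducing=10, Z=None, seed=0):
    """Pick inducing inputs (illustrative sketch, random-subset assumption)."""
    # If Z is given, num_inducing is ignored.
    if Z is not None:
        return [list(z) for z in Z]
    rng = random.Random(seed)
    count = min(num_inducing, len(X))
    # Draw distinct rows of X without replacement.
    return [list(X[i]) for i in rng.sample(range(len(X)), count)]


X = [[0.1 * i] for i in range(25)]
Z_default = select_inducing(X)
Z_given = select_inducing(X, num_inducing=3, Z=[[0.0], [1.0]])
```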

parameters_changed()[source]

## GPy.models.sparse_gp_regression_md module¶

class SparseGPRegressionMD(X, Y, indexD, kernel=None, Z=None, num_inducing=10, normalizer=None, mpi_comm=None, individual_Y_noise=False, name='sparse_gp')[source]

Sparse Gaussian Process Regression with Missing Data

This model targets the use case in which there are multiple output dimensions (the dimensions are assumed to be independent and to follow the same GP prior) and each output dimension is observed at a different set of inputs. The model takes a different data format: the input and output observations of all the output dimensions are stacked together into two matrices. An extra array indicates the index of the output dimension for each data point. The output dimensions are indexed using integers from 0 to D-1, assuming there are D output dimensions.
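The stacked data format can be sketched with plain lists. The values below are made up, and the layout (one row per observation, with a single output column) follows the description above; treat it as an illustration of the format rather than a tested call into GPy.

```python
# Two output dimensions observed at different input locations (made-up data).
x_by_output = [[0.0, 1.0, 2.0], [0.5, 1.5]]
y_by_output = [[0.1, 0.9, 2.1], [-0.4, -1.6]]

X, Y, indexD = [], [], []
for d, (xs, ys) in enumerate(zip(x_by_output, y_by_output)):
    for x, y in zip(xs, ys):
        X.append([x])      # stacked inputs, one row per observation
        Y.append([y])      # stacked outputs, aligned with X row by row
        indexD.append(d)   # which output dimension this row belongs to
```

With D = 2 outputs, indexD contains only the integers 0 and 1, and its length equals the number of stacked rows.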

Parameters:
X (numpy.ndarray) – input observations
Y (numpy.ndarray) – output observations, each column corresponding to an output dimension
indexD (numpy.ndarray) – the array containing the index of the output dimension for each data point
kernel (GPy.kern.Kern or None) – a GPy kernel for the GP of individual output dimensions ** defaults to RBF **
Z (numpy.ndarray or None) – inducing inputs
num_inducing ((int, int)) – a tuple (M, Mr). M is the number of inducing points for the GP of individual output dimensions. Mr is the number of inducing points for the latent space.
individual_Y_noise (boolean) – whether individual output dimensions have their own noise variance
name (str) – the name of the model
parameters_changed()[source]

## GPy.models.sparse_gplvm module¶

class SparseGPLVM(Y, input_dim, X=None, kernel=None, init='PCA', num_inducing=10)[source]

Sparse Gaussian Process Latent Variable Model

Parameters: Y (np.ndarray) – observed data input_dim (int) – latent dimensionality init ('PCA'|'random') – initialisation method for the latent space
parameters_changed()[source]
plot_latent(labels=None, which_indices=None, resolution=50, ax=None, marker='o', s=40, fignum=None, plot_inducing=True, legend=True, plot_limits=None, aspect='auto', updates=False, predict_kwargs={}, imshow_kwargs={})[source]

## GPy.models.ss_gplvm module¶

class IBPPosterior(means, variances, binary_prob, tau=None, sharedX=False, name='latent space')[source]

The SpikeAndSlab distribution for variational approximations.

binary_prob : the probability of the distribution on the slab part.

set_gradients(grad)[source]
class IBPPrior(input_dim, alpha=2.0, name='IBPPrior', **kw)[source]
KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]
class SLVMPosterior(means, variances, binary_prob, tau=None, name='latent space')[source]

The SpikeAndSlab distribution for variational approximations.

binary_prob : the probability of the distribution on the slab part.

set_gradients(grad)[source]
class SLVMPrior(input_dim, alpha=1.0, beta=1.0, Z=None, name='SLVMPrior', **kw)[source]
KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]
class SSGPLVM(Y, input_dim, X=None, X_variance=None, Gamma=None, init='PCA', num_inducing=10, Z=None, kernel=None, inference_method=None, likelihood=None, name='Spike_and_Slab GPLVM', group_spike=False, IBP=False, SLVM=False, alpha=2.0, beta=2.0, connM=None, tau=None, mpi_comm=None, pi=None, learnPi=False, normalizer=False, sharedX=False, variational_prior=None, **kwargs)[source]

Spike-and-Slab Gaussian Process Latent Variable Model

Parameters: Y (np.ndarray| GPy.likelihood instance) – observed data (np.ndarray) or GPy.likelihood input_dim (int) – latent dimensionality init ('PCA'|'random') – initialisation method for the latent space
get_X_gradients(X)[source]

Get the gradients of the posterior distribution of X in its specific form.

input_sensitivity()[source]
parameters_changed()[source]
plot_inducing(which_indices=None, legend=False, plot_limits=None, marker=None, projection='2d', **kwargs)

Plot a scatter plot of the inducing inputs.

Parameters: which_indices ([int]) – which input dimensions to plot against each other legend (bool) – whether to plot the legend on the figure plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot marker (str) – marker to use [default is custom arrow like] kwargs – the kwargs for the scatter plots projection (str) – for now 2d or 3d projection (other projections can be implemented, see developer documentation)
plot_latent(labels=None, which_indices=None, resolution=60, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, projection='2d', scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
labels (array-like) – a label for each data point (row) of the inputs
which_indices ((int, int)) – which input dimensions to plot against each other
resolution (int) – the resolution at which we predict the magnification factor
legend (bool) – whether to plot the legend on the figure
plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
updates (bool) – if possible, make interactive updates using the specific library you are using
kern (Kern) – the kernel to use for prediction
marker (str) – markers to use; they cycle if more labels than markers are given
num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
imshow_kwargs – the kwargs for the imshow (magnification factor)
scatter_kwargs – the kwargs for the scatter plots

plot_scatter(labels=None, which_indices=None, legend=True, plot_limits=None, marker='<>^vsd', num_samples=1000, projection='2d', **kwargs)

Plot a scatter plot of the latent space.

Parameters:
labels (array-like) – a label for each data point (row) of the inputs
which_indices ((int, int)) – which input dimensions to plot against each other
legend (bool) – whether to plot the legend on the figure
plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
marker (str) – markers to use; they cycle if more labels than markers are given
kwargs – the kwargs for the scatter plots
plot_steepest_gradient_map(output_labels=None, data_labels=None, which_indices=None, resolution=15, legend=True, plot_limits=None, updates=False, kern=None, marker='<>^vsd', num_samples=1000, annotation_kwargs=None, scatter_kwargs=None, **imshow_kwargs)

Plot the latent space of the GP on the inputs. This is the density of the GP posterior as a grey scale and the scatter plot of the input dimensions selected by which_indices.

Parameters:
labels (array-like) – a label for each data point (row) of the inputs
which_indices ((int, int)) – which input dimensions to plot against each other
resolution (int) – the resolution at which we predict the magnification factor
legend (bool) – whether to plot the legend on the figure; if int, the number of legend columns to plot
plot_limits ((xmin, xmax, ymin, ymax) or ((xmin, xmax), (ymin, ymax))) – the plot limits for the plot
updates (bool) – if possible, make interactive updates using the specific library you are using
kern (Kern) – the kernel to use for prediction
marker (str) – markers to use; they cycle if more labels than markers are given
num_samples (int) – the maximum number of samples to plot. A stratified subsample is drawn from the labels if the number of samples (in X) is higher than num_samples.
imshow_kwargs – the kwargs for the imshow (magnification factor)
annotation_kwargs – the kwargs for the annotation plot
scatter_kwargs – the kwargs for the scatter plots

sample_W(nSamples, raw_samples=False)[source]

set_X_gradients(X, X_grad)[source]

Set the gradients of the posterior distribution of X in its specific form.

## GPy.models.ss_mrd module¶

The Manifold Relevance Determination model with the spike-and-slab prior

class IBPPrior_SSMRD(nModels, input_dim, alpha=2.0, tau=None, name='IBPPrior', **kw)[source]
KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]
class SSMRD(Ylist, input_dim, X=None, X_variance=None, Gammas=None, initx='PCA_concat', initz='permute', num_inducing=10, Zs=None, kernels=None, inference_methods=None, likelihoods=None, group_spike=True, pi=0.5, name='ss_mrd', Ynames=None, mpi_comm=None, IBP=False, alpha=2.0, taus=None)[source]
log_likelihood()[source]
optimize(optimizer=None, start=None, **kwargs)[source]
parameters_changed()[source]
optimizer_array

Array for the optimizer to work on. This array always lives in the space for the optimizer. Thus, it is untransformed, going from Transformations.

Setting this array makes sure that the transformed parameters of this model are set accordingly. It has to be set with an array retrieved from this method, because operations such as fixing resize the array.

The optimizer should only interfere with this array, such that transformations are secured.

class SpikeAndSlabPrior_SSMRD(nModels, pi=0.5, learnPi=False, group_spike=True, variance=1.0, name='SSMRDPrior', **kw)[source]
KL_divergence(variational_posterior)[source]
update_gradients_KL(variational_posterior)[source]

## GPy.models.state_space_main module¶

Main functionality for state-space inference.

class AddMethodToClass(func=None, tp='staticmethod')[source]

Bases: object

func: function to add

tp: string. Type of the method: normal, staticmethod, classmethod

class ContDescrStateSpace[source]

Class for continuous-discrete Kalman filter. State equation is continuous while measurement equation is discrete.

d x(t)/dt = F x(t) + L q, where q ~ N(0, Qc)

y_{t_k} = H_{k} x_{t_k} + r_{k}, where r_{k} ~ N(0, R_{k})
class AQcompute_batch_Python(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Class for calculating the matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.

It has the same interface as AQcompute_once.

It computes matrices for all time steps. This object is used when there are not too many different time steps (controlled by an internal variable) and storing all the matrices does not take too much memory.

Since all the matrices are computed together, this object can be used in the smoother without repeating the computations.

Constructor. All necessary parameters are passed here and stored in the object.

F, L, Qc, P_inf : matrices
Parameters of corresponding continuous state model
dt: array
All time steps
compute_derivatives: bool
Whether to calculate derivatives
dP_inf, dF, dQc: 3D array
Derivatives if they are required

Returns nothing.

Ak(k, m, P)[source]
Q_srk(k)[source]

Square root of the noise matrix Q

Qk(k)[source]
dAk(k)[source]
dQk(k)[source]
f_a(k, m, A)[source]

Dynamic model

reset(compute_derivatives=False)[source]

For reusing this object, e.g. in smoother computation. It makes sense because the necessary matrices have already been computed for all time steps.

return_last()[source]

Function returns last available matrices.

class AQcompute_once(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Class for calculating the matrices A, Q, dA, dQ of the discrete Kalman Filter from the matrices F, L, Qc, P_inf, dF, dQc, dP_inf of the continuous state equation. dt - time steps.

It has the same interface as AQcompute_batch.

It computes matrices for only one time step. This object is used when there are many different time steps and storing matrices for each of them would take too much memory.

Constructor. All necessary parameters are passed here and stored in the object.

F, L, Qc, P_inf : matrices
Parameters of corresponding continuous state model
dt: array
All time steps
compute_derivatives: bool
Whether to calculate derivatives
dP_inf, dF, dQc: 3D array
Derivatives if they are required

Returns nothing.

Ak(k, m, P)[source]
Q_srk(k)[source]

Square root of the noise matrix Q

Qk(k)[source]
dAk(k)[source]
dQk(k)[source]
f_a(k, m, A)[source]

Dynamic model

reset(compute_derivatives)[source]

For reusing this object e.g. in smoother computation. Actually, this object can not be reused because it computes the matrices on every iteration. But this method is written for keeping the same interface with the class AQcompute_batch.

return_last()[source]

Function returns last computed matrices.

classmethod cont_discr_kalman_filter(F, L, Qc, p_H, p_R, P_inf, X, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=0, grad_calc_params=None)[source]

This function implements the continuous-discrete Kalman Filter algorithm. These notations for the State-Space model are assumed:

d/dt x(t) = F * x(t) + L * w(t); w(t) ~ N(0, Qc)

y_{k} = H_{k} * x_{k} + r_{k}; r_{k} ~ N(0, R_{k})

Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))

1) The function generally does not modify the passed parameters; if it does, that is an error. There are several exceptions: scalars can be converted into a matrix, and in some rare cases the shapes of the derivative matrices may be changed. This is ignored for now.

2) Copies of F, L, Qc are created in memory because they may be used later in the smoother. References to the copies are kept in the “AQcomp” object return parameter.

3) The function supports “multiple time series mode”, which means that exactly the same State-Space model is used to filter several sets of measurements. In this case the third dimension of Y should contain these measurements. Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.

4) Calculation of Grad_log_likelihood is not supported if the matrices H or R change over time (with index k). (later may be changed)

5) Measurements may include missing values. In this case the update step is not done for that measurement. (later may be changed)

F: (state_dim, state_dim) matrix
F in the model.
L: (state_dim, noise_dim) matrix
L in the model.
Qc: (noise_dim, noise_dim) matrix
Q_c in the model.
p_H: scalar, matrix (measurement_dim, state_dim), 3D array
H_{k} in the model. If matrix then H_{k} = H - constant. If it is a 3D array then H_{k} = p_H[:,:, index[0,k]]
p_R: scalar, square symmetric matrix, 3D array
R_{k} in the model. If matrix then R_{k} = R - constant. If it is a 3D array then R_{k} = p_R[:,:, index[1,k]]
P_inf: (state_dim, state_dim) matrix
State variance matrix at infinity.
X: 1D array
Time points of measurements. Needed for converting the continuous problem to the discrete one.
Y: matrix or vector or 3D array
Data. If Y is a matrix then samples are along the 0-th dimension and features along the 1-st. If it is a 3D array then the third dimension corresponds to “multiple time series mode”.
index: vector
Which indices (on the 3-rd dimension) from the arrays p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_H, p_R do not change over time and indices are not needed. index[0,:] corresponds to H, index[1,:] corresponds to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
m_init: vector or matrix
Initial distribution mean. If None it is assumed to be zero. For “multiple time series mode” it is a matrix whose second dimension corresponds to the different time series. In the regular case (“one time series mode”) it is a vector.
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is a scalar then the initial covariance matrix is assumed to be the unit matrix multiplied by this scalar. If None, the unit matrix is used instead. “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
p_kalman_filter_type: string, one of (‘regular’, ‘svd’)
Which Kalman Filter is used, regular or SVD. SVD is more numerically stable; in particular, covariance matrices are guaranteed to be positive semi-definite. However, ‘svd’ is slower, especially for small data, due to the SVD call overhead.
calc_log_likelihood: boolean
Whether to calculate the marginal likelihood of the state-space model.
calc_grad_log_likelihood: boolean
Whether to calculate the gradient of the marginal likelihood of the state-space model. If true then the “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
grad_params_no: int
If the previous parameter is true, then this parameter gives the total number of parameters in the gradient.
grad_calc_params: dictionary
Dictionary with derivatives of the model matrices with respect to the parameters: “dF”, “dL”, “dQc”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None; in this case zero matrices (no dependence on parameters) are assumed. If there is only one parameter then the third dimension is added automatically.
Returns:

M: (no_steps+1, state_dim) matrix or (no_steps+1, state_dim, time_series_no) 3D array
Filter estimates of the state means. The extra step holds the initial value. In “multiple time series mode” the third dimension corresponds to the different time series.
P: (no_steps+1, state_dim, state_dim) 3D array
Filter estimates of the state covariances. The extra step holds the initial value.
log_likelihood: double or (1, time_series_no) 3D array
If the parameter calc_log_likelihood was set to true, the logarithm of the marginal likelihood of the state-space model is returned. If the parameter was false, None is returned. In “multiple time series mode” it is a vector providing the log_likelihood for each time series.
grad_log_likelihood:
If calc_grad_log_likelihood is true, the gradient of the log likelihood with respect to the parameters is returned. It is returned column-wise, so in “multiple time series mode” the gradients for each time series are in the corresponding column.
AQcomp: object
Contains some pre-computed values for converting the continuous model into the discrete one. It can be used later in the smoothing phase.
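To make the filter recursion concrete, here is a minimal scalar sketch under strong simplifying assumptions: a one-dimensional state, constant H and R, F nonzero, and missing measurements encoded as None. The function name `scalar_cd_kalman_filter` is hypothetical and the code does not mirror GPy's internals; it only shows the prediction step, the update step, and the skipped update for a missing value.

```python
import math


def scalar_cd_kalman_filter(F, L, Qc, H, R, X, Y, m_init=0.0, P_init=1.0):
    """Scalar continuous-discrete Kalman filter (illustrative sketch)."""
    m, p = m_init, P_init
    M, P = [m], [p]          # the extra first entry holds the initial value
    log_likelihood = 0.0
    t_prev = X[0]
    for t, y in zip(X, Y):
        dt = t - t_prev
        t_prev = t
        # Discretize the scalar SDE over the step dt (requires F != 0).
        A = math.exp(F * dt)
        Q = L * L * Qc * (math.exp(2.0 * F * dt) - 1.0) / (2.0 * F)
        # Prediction step.
        m, p = A * m, A * p * A + Q
        # Update step, skipped when the measurement is missing.
        if y is not None:
            v = y - H * m
            S = H * p * H + R
            K = p * H / S
            m, p = m + K * v, p - K * S * K
            log_likelihood += -0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        M.append(m)
        P.append(p)
    return M, P, log_likelihood


times = [0.0, 1.0, 2.0, 3.0]
observations = [0.5, None, 1.2, 0.9]   # the second measurement is missing
M, P, ll = scalar_cd_kalman_filter(F=-0.5, L=1.0, Qc=2.0, H=1.0, R=0.1,
                                   X=times, Y=observations)
```

At the missing measurement only the prediction step runs, so the state variance grows there instead of shrinking.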
classmethod cont_discr_rts_smoother(state_dim, filter_means, filter_covars, p_dynamic_callables=None, X=None, F=None, L=None, Qc=None)[source]

Continuous-discrete Rauch–Tung–Striebel (RTS) smoother.

This function implements the Rauch–Tung–Striebel (RTS) smoother algorithm based on the results of _cont_discr_kalman_filter_raw.

Model:

d/dt x(t) = F * x(t) + L * w(t); w(t) ~ N(0, Qc)

y_{k} = H_{k} * x_{k} + r_{k}; r_{k} ~ N(0, R_{k})

Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))

filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Results of the Kalman Filter means estimation.
filter_covars: (no_steps+1, state_dim, state_dim) 3D array
Results of the Kalman Filter covariance estimation.
p_dynamic_callables: object or None
Object from the filter phase which provides functions for computing A, Q, dA, dQ for the discrete model from the continuous model.
X, F, L, Qc: matrices
If p_dynamic_callables is None, these matrices are used to create this object from scratch.
M: (no_steps+1,state_dim) matrix
Smoothed estimates of the state means
P: (no_steps+1,state_dim, state_dim) 3D array
Smoothed estimates of the state covariances
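The backward pass of an RTS smoother can be sketched for a scalar state as follows. The inputs stand in for the filter output and for the per-step discrete matrices A and Q; the numbers are made up, and the function name `scalar_rts_smoother` is hypothetical.

```python
def scalar_rts_smoother(filter_means, filter_covars, A, Q):
    """Scalar RTS smoother (illustrative sketch).

    A[k] and Q[k] describe the transition from step k to step k + 1.
    """
    M = list(filter_means)
    P = list(filter_covars)
    # Walk backwards from the second-to-last step to the first.
    for k in range(len(M) - 2, -1, -1):
        m_pred = A[k] * filter_means[k]
        p_pred = A[k] * filter_covars[k] * A[k] + Q[k]
        G = filter_covars[k] * A[k] / p_pred          # smoother gain
        M[k] = filter_means[k] + G * (M[k + 1] - m_pred)
        P[k] = filter_covars[k] + G * (P[k + 1] - p_pred) * G
    return M, P


filter_means = [0.0, 0.4, 0.7, 0.9]
filter_covars = [1.0, 0.5, 0.4, 0.35]
A = [0.9, 0.9, 0.9]
Q = [0.2, 0.2, 0.2]
smoothed_means, smoothed_covars = scalar_rts_smoother(
    filter_means, filter_covars, A, Q)
```

The last smoothed estimate equals the last filtered one, and earlier estimates have their variances reduced by the information from later measurements.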
static lti_sde_to_descrete(F, L, Qc, dt, compute_derivatives=False, grad_params_no=None, P_inf=None, dP_inf=None, dF=None, dQc=None)[source]

Linear Time-Invariant Stochastic Differential Equation (LTI SDE):

dx(t) = F x(t) dt + L d eta, where

x(t): (vector) stochastic process
eta: (vector) Brownian motion process
F, L: (time invariant) matrices of corresponding dimensions
Qc: covariance of noise.

This function rewrites it into the corresponding state-space form:

x_{k} = A_{k} * x_{k-1} + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})

F,L: LTI SDE matrices of corresponding dimensions

Qc: matrix (n,n)
Covariance between different dimensions of the noise eta. n is the dimensionality of the noise.
dt: double or iterable
Time difference used on this iteration. If dt is iterable, then A and Q_noise are computed for every unique dt
compute_derivatives: boolean
Whether derivatives of A and Q are required.

P_inf: (state_dim, state_dim) matrix

dP_inf

dF: 3D array
Derivatives of F
dQc: 3D array
Derivatives of Qc
dR: 3D array
Derivatives of R
A: matrix
A_{k}. Because the SDE is time-invariant, only dt can make A differ between steps k.
Q_noise: matrix
Covariance matrix of (vector) q_{k-1}. Only dt can make it differ between steps k.
reconstruct_index: array
If dt was iterable, A and Q_noise are returned as three-dimensional arrays whose third dimension corresponds to the unique dt’s. reconstruct_index contains the indices of the original dt’s in the unique dt sequence. For example, A[:,:, reconstruct_index[5]] is the matrix A for the 6th dt in the original sequence (indices start from zero).
dA: 3D array
Derivatives of A
dQ: 3D array
Derivatives of Q
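To make the roles of A, Q_noise and reconstruct_index concrete, here is a minimal sketch for the scalar case only. The function name `discretize_scalar_lti_sde` is hypothetical and is not part of GPy; it uses the closed-form solution for a one-dimensional SDE and plain lists instead of the 3D arrays described above.

```python
import math

def discretize_scalar_lti_sde(F, L, Qc, dts):
    """Discretize dx = F x dt + L d(eta) for scalar F, L, Qc.

    Returns (A, Q_noise, reconstruct_index). A and Q_noise hold one entry
    per unique dt; reconstruct_index maps each original dt to its position
    in the unique dt sequence.
    """
    unique_dts = sorted(set(dts))
    position = {dt: i for i, dt in enumerate(unique_dts)}
    reconstruct_index = [position[dt] for dt in dts]

    A, Q_noise = [], []
    for dt in unique_dts:
        A.append(math.exp(F * dt))
        if F == 0.0:
            # Pure Brownian motion: the variance grows linearly in dt.
            Q_noise.append(L * Qc * L * dt)
        else:
            # Closed form of the integral of exp(F s) L Qc L exp(F s) over [0, dt].
            Q_noise.append(L * Qc * L * (math.exp(2.0 * F * dt) - 1.0) / (2.0 * F))
    return A, Q_noise, reconstruct_index

A, Q_noise, reconstruct_index = discretize_scalar_lti_sde(
    F=-0.5, L=1.0, Qc=2.0, dts=[0.1, 0.2, 0.1, 0.4])
# A[reconstruct_index[2]] is the transition coefficient for the third dt (0.1).
```

Computing the matrices once per unique dt and looking them up through the index avoids repeating the same work when many time steps share a spacing.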
class DescreteStateSpace[source]

Bases: object

This class implements state-space inference for linear and non-linear state-space models. Linear models are:

x_{k} = A_{k} * x_{k-1} + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k}; r_{k} ~ N(0, R_{k})

Nonlinear:

x_{k} = f_a(k, x_{k-1}, A_{k}) + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})
y_{k} = f_h(k, x_{k}, H_{k}) + r_{k}; r_{k} ~ N(0, R_{k})

Here f_a and f_h are some functions of k (iteration number), x_{k-1} or x_{k} (state value on a certain iteration), and A_{k} and H_{k}, the Jacobian matrices of f_a and f_h respectively. In the linear case they are exactly A_{k} and H_{k}.

Currently two nonlinear Gaussian filter algorithms are implemented: the Extended Kalman Filter (EKF) and the Statistically Linearized Filter (SLF), whose implementations are very similar.

classmethod extended_kalman_filter(p_state_dim, p_a, p_f_A, p_f_Q, p_h, p_f_H, p_f_R, Y, m_init=None, P_init=None, calc_log_likelihood=False)[source]

Extended Kalman Filter

p_state_dim: integer

p_a: if None - the function from the linear model is assumed. No non-linearity in the dynamic is assumed.

function (k, x_{k-1}, A_{k}). Dynamic function.
k: (iteration number),
x_{k-1}: (previous state),
A_{k}: Jacobian matrices of f_a. In the linear case it is exactly A_{k}.

p_f_A: matrix - in this case a function which returns this matrix is assumed. See the description of this parameter in the kalman_filter function.

function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.
k: (iteration number),
m: point where the Jacobian is evaluated,
P: parameter for the Jacobian, usually the covariance matrix.

p_f_Q: matrix - in this case a function which returns this matrix is assumed. See the description of this parameter in the kalman_filter function.

function (k). Returns the noise matrix of the dynamic model on iteration k.
k: (iteration number).

p_h: if None - the function from the linear measurement model is assumed. No nonlinearity in the measurement is assumed.

function (k, x_{k}, H_{k}). Measurement function.
k: (iteration number),
x_{k}: (current state),
H_{k}: Jacobian matrices of f_h. In the linear case it is exactly H_{k}.

p_f_H: matrix - in this case a function which returns this matrix is assumed.
function (k, m, P). Returns the Jacobian of the measurement function; it is passed into p_h.
k: (iteration number),
m: point where the Jacobian is evaluated,
P: parameter for the Jacobian, usually the covariance matrix.
p_f_R: matrix - in this case a function which returns this matrix is assumed.
function (k). Returns the noise matrix of the measurement equation on iteration k.
k: (iteration number).
Y: matrix or vector
Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. May have missing values.
m_init: vector
Initial distribution mean. If None it is assumed to be zero
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is scalar then it is assumed that initial covariance matrix is unit matrix multiplied by this scalar. If None the unit matrix is used instead.
calc_log_likelihood: boolean
Whether to calculate marginal likelihood of the state-space model.
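The callable signatures described above can be illustrated with a scalar sketch. The function `scalar_ekf` and the sine model below are hypothetical and written only for illustration; they are not GPy's implementation, and they omit the log-likelihood computation.

```python
import math

def scalar_ekf(p_a, p_f_A, p_f_Q, p_h, p_f_H, p_f_R, Y, m_init=0.0, P_init=1.0):
    """Scalar extended Kalman filter; None in Y marks a missing measurement."""
    means, covars = [m_init], [P_init]
    m, P = m_init, P_init
    for k, y in enumerate(Y):
        # Prediction: linearize the dynamics around the previous mean.
        A = p_f_A(k, m, P)
        m_pred = p_a(k, m, A)
        P_pred = A * P * A + p_f_Q(k)
        if y is None:
            # Missing measurement: keep the prediction, skip the update.
            m, P = m_pred, P_pred
        else:
            # Update: linearize the measurement around the predicted mean.
            H = p_f_H(k, m_pred, P_pred)
            S = H * P_pred * H + p_f_R(k)
            K = P_pred * H / S
            m = m_pred + K * (y - p_h(k, m_pred, H))
            P = P_pred - K * S * K
        means.append(m)
        covars.append(P)
    return means, covars

# Hypothetical model: x_k = sin(x_{k-1}) + q_{k-1}, y_k = x_k + r_k.
means, covars = scalar_ekf(
    p_a=lambda k, x, A: math.sin(x),
    p_f_A=lambda k, m, P: math.cos(m),
    p_f_Q=lambda k: 0.1,
    p_h=lambda k, x, H: H * x,
    p_f_H=lambda k, m, P: 1.0,
    p_f_R=lambda k: 0.5,
    Y=[0.3, None, 0.1],
    m_init=0.2, P_init=1.0)
```

The returned lists have one more entry than Y because the initial mean and covariance are stored as the first element.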
classmethod kalman_filter(p_A, p_Q, p_H, p_R, Y, index=None, m_init=None, P_init=None, p_kalman_filter_type='regular', calc_log_likelihood=False, calc_grad_log_likelihood=False, grad_params_no=None, grad_calc_params=None)[source]

This function implements the basic Kalman Filter algorithm. These notations for the State-Space model are assumed:

x_{k} = A_{k} * x_{k-1} + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k}; r_{k} ~ N(0, R_{k})

Returns estimated filter distributions x_{k} ~ N(m_{k}, P(k))

1) The function generally does not modify the passed parameters. If it does, it is an error. There are several exceptions: scalars can be converted into a matrix, and in some rare cases the shapes of the derivative matrices may be changed; this is ignored for now.

2) Copies of p_A, p_Q, index are created in memory to be used later in the smoother. References to the copies are kept in the “matrs_for_smoother” return parameter.

3) The function supports “multiple time series mode”, which means that exactly the same State-Space model is used to filter several sets of measurements. In this case the third dimension of Y should include these state-space measurements. Log_likelihood and Grad_log_likelihood then have the corresponding dimensions.

4) Calculation of Grad_log_likelihood is not supported if matrices A, Q, H, or R change over time. (later may be changed)

5) Measurements may include missing values. In this case the update step is not done for this measurement. (later may be changed)

p_A: scalar, square matrix, 3D array
A_{k} in the model. If matrix then A_{k} = A - constant. If it is 3D array then A_{k} = p_A[:,:, index[0,k]]
p_Q: scalar, square symmetric matrix, 3D array
Q_{k-1} in the model. If matrix then Q_{k-1} = Q - constant. If it is 3D array then Q_{k-1} = p_Q[:,:, index[1,k]]
p_H: scalar, matrix (measurement_dim, state_dim) , 3D array
H_{k} in the model. If matrix then H_{k} = H - constant. If it is 3D array then H_{k} = p_H[:,:, index[2,k]]
p_R: scalar, square symmetric matrix, 3D array
R_{k} in the model. If matrix then R_{k} = R - constant. If it is 3D array then R_{k} = p_R[:,:, index[3,k]]
Y: matrix or vector or 3D array
Data. If Y is matrix then samples are along 0-th dimension and features along the 1-st. If 3D array then the third dimension corresponds to “multiple time series mode”.
index: vector
Which indices (on 3-rd dimension) from arrays p_A, p_Q, p_H, p_R to use on every time step. If this parameter is None then it is assumed that p_A, p_Q, p_H, p_R do not change over time and indices are not needed. index[0,:] corresponds to A, index[1,:] to Q, index[2,:] to H, and index[3,:] to R. If index.shape[0] == 1, it is assumed that the indices for all matrices are the same.
m_init: vector or matrix
Initial distribution mean. If None it is assumed to be zero. For “multiple time series mode” it is a matrix whose second dimension corresponds to different time series. In the regular case (“one time series mode”) it is a vector.
P_init: square symmetric matrix or scalar
Initial covariance of the states. If the parameter is scalar then it is assumed that initial covariance matrix is unit matrix multiplied by this scalar. If None the unit matrix is used instead. “multiple time series mode” does not affect it, since it does not affect anything related to state variances.
calc_log_likelihood: boolean
Whether to calculate marginal likelihood of the state-space model.
calc_grad_log_likelihood: boolean
Whether to calculate gradient of the marginal likelihood of the state-space model. If true then the “grad_calc_params” parameter must provide the extra parameters for gradient calculation.
grad_params_no: int
If the previous parameter is true, then this parameter gives the total number of parameters in the gradient.
grad_calc_params: dict
Dictionary with derivatives of model matrices with respect to parameters “dA”, “dQ”, “dH”, “dR”, “dm_init”, “dP_init”. They can be None, in which case zero matrices (no dependence on parameters) are assumed. If there is only one parameter then the third dimension is automatically added.
M: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Filter estimates of the state means. In the extra step the initial value is included. In the “multiple time series mode” the third dimension corresponds to different time series.
P: (no_steps+1, state_dim, state_dim) 3D array
Filter estimates of the state covariances. In the extra step the initial value is included.
log_likelihood: double or (1, time_series_no) 3D array.
If the parameter calc_log_likelihood was set to true, return logarithm of marginal likelihood of the state-space model. If the parameter was false, return None. In the “multiple time series mode” it is a vector providing log_likelihood for each time series.
grad_log_likelihood: column vector or matrix
If calc_grad_log_likelihood is true, return gradient of log likelihood with respect to parameters. It is returned column-wise, so in “multiple time series mode” the gradients for each time series are in the corresponding column.
matrs_for_smoother: dict
Dictionary with model functions for smoother. The intrinsic model functions are computed in this function and they are returned to use in smoother for convenience. They are: ‘p_a’, ‘p_f_A’, ‘p_f_Q’. The dictionary contains the same fields.
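The per-step lookup through index, the skipped update for missing values and the extra initial entry in M and P can be sketched for a scalar model. The function `scalar_kalman_filter` is hypothetical and is not GPy's code; it assumes a single shared index for A and Q (the index.shape[0] == 1 case) and constant H and R.

```python
import math

def scalar_kalman_filter(p_A, p_Q, H, R, Y, index=None, m_init=0.0, P_init=1.0):
    """Scalar Kalman filter with per-step lookup of A and Q.

    p_A and p_Q are lists of unique values; index[k] selects the entry
    used at step k. With index=None the first entry is used throughout.
    """
    M, P_out, log_likelihood = [m_init], [P_init], 0.0
    m, P = m_init, P_init
    for k, y in enumerate(Y):
        i = 0 if index is None else index[k]
        # Prediction step.
        m = p_A[i] * m
        P = p_A[i] * P * p_A[i] + p_Q[i]
        if y is not None:  # a missing measurement skips the update step
            S = H * P * H + R          # innovation variance
            v = y - H * m              # innovation
            K = P * H / S              # Kalman gain
            m = m + K * v
            P = P - K * S * K
            log_likelihood += -0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        M.append(m)
        P_out.append(P)
    return M, P_out, log_likelihood

M, P, ll = scalar_kalman_filter(
    p_A=[1.0, 0.9], p_Q=[0.1, 0.2], H=1.0, R=0.5,
    Y=[1.0, None, 1.2, 0.8], index=[0, 1, 0, 1])
```

Here the second measurement is missing, so the second filtered estimate equals the prediction made with p_A[1] and p_Q[1].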
classmethod rts_smoother(state_dim, p_dynamic_callables, filter_means, filter_covars)[source]

This function implements the Rauch–Tung–Striebel (RTS) smoother algorithm based on the results of kalman_filter_raw. These notations are the same:

x_{k} = A_{k} * x_{k-1} + q_{k-1}; q_{k-1} ~ N(0, Q_{k-1})
y_{k} = H_{k} * x_{k} + r_{k}; r_{k} ~ N(0, R_{k})

Returns estimated smoother distributions x_{k} ~ N(m_{k}, P(k))

p_a: function (k, x_{k-1}, A_{k}). Dynamic function.
k: iteration number, starts at 0
x_{k-1}: state from the previous step
A_{k}: Jacobian matrices of f_a. In the linear case it is exactly A_{k}.
p_f_A: function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.
k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
p_f_Q: function (k). Returns the noise matrix of the dynamic model on iteration k.
k: iteration number, starts at 0
filter_means: (no_steps+1,state_dim) matrix or (no_steps+1,state_dim, time_series_no) 3D array
Results of the Kalman Filter means estimation.
filter_covars: (no_steps+1, state_dim, state_dim) 3D array
Results of the Kalman Filter covariance estimation.
M: (no_steps+1, state_dim) matrix
Smoothed estimates of the state means
P: (no_steps+1, state_dim, state_dim) 3D array
Smoothed estimates of the state covariances
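The backward recursion behind the smoother can be sketched for a scalar model with constant A and Q. The function `scalar_rts_smoother` and the filter output values are hypothetical, chosen only to show the shape of the computation; this is not GPy's implementation.

```python
def scalar_rts_smoother(A, Q, filter_means, filter_covars):
    """Scalar RTS smoother for x_k = A x_{k-1} + q_{k-1}, q ~ N(0, Q)."""
    M = list(filter_means)
    P = list(filter_covars)
    # Backward pass: the last filtered estimate is already the smoothed one.
    for k in range(len(M) - 2, -1, -1):
        m_pred = A * filter_means[k]
        P_pred = A * filter_covars[k] * A + Q
        G = filter_covars[k] * A / P_pred  # smoother gain
        M[k] = filter_means[k] + G * (M[k + 1] - m_pred)
        P[k] = filter_covars[k] + G * (P[k + 1] - P_pred) * G
    return M, P

# Hypothetical filter output: the initial state plus three steps.
filter_means = [0.0, 0.6, 0.9, 1.0]
filter_covars = [1.0, 0.4, 0.3, 0.28]
M, P = scalar_rts_smoother(A=1.0, Q=0.1,
                           filter_means=filter_means,
                           filter_covars=filter_covars)
```

Each smoothed variance is no larger than the corresponding filtered variance, because the smoother also uses the measurements that come after step k.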
class DescreteStateSpaceMeta[source]

Bases: type

Substitute necessary methods from cython.

After this method the class object is created

Dynamic_Callables_Class
class Dynamic_Callables_Python[source]

Bases: object

Ak(k, m, P)[source]

function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.

k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
Q_srk(k)[source]

function (k). Returns the square root of the noise matrix of the dynamic model on iteration k.

k: iteration number, starts at 0

This function is implemented to use the SVD prediction step.

Qk(k)[source]
function (k). Returns the noise matrix of the dynamic model on iteration k.
k: iteration number, starts at 0
dAk(k)[source]
function (k). Returns the derivative of A on iteration k.
k: iteration number, starts at 0
dQk(k)[source]
function (k). Returns the derivative of Q on iteration k.
k: iteration number, starts at 0
f_a(k, m, A)[source]
p_a: function (k, x_{k-1}, A_{k}). Dynamic function.
k: iteration number, starts at 0
x_{k-1}: state from the previous step
A_{k}: Jacobian matrices of f_a. In the linear case it is exactly A_{k}.
reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0).
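As an illustration of the interface listed above, the following sketch defines a plain Python object with the same method names for a scalar random walk. The class `RandomWalkCallables` and its `last_k` attribute are hypothetical; the sketch does not subclass anything from GPy and leaves out the derivative methods dAk and dQk.

```python
class RandomWalkCallables:
    """Hypothetical stand-in exposing the dynamic-callables methods."""

    def __init__(self, step_variances):
        self.step_variances = step_variances
        self.reset()

    def f_a(self, k, m, A):
        # Linear dynamics: the state is propagated by the Jacobian itself.
        return A * m

    def Ak(self, k, m, P):
        # A random walk has a constant Jacobian of 1.
        return 1.0

    def Qk(self, k):
        # Noise variance of the dynamic model on iteration k.
        self.last_k = k
        return self.step_variances[k]

    def Q_srk(self, k):
        # Square root of the (scalar) noise variance on iteration k.
        return self.step_variances[k] ** 0.5

    def reset(self, compute_derivatives=False):
        # Return to the beginning of the iteration (k equal to 0).
        self.last_k = 0
        self.compute_derivatives = compute_derivatives

callables = RandomWalkCallables([0.1, 0.4, 0.9])
A1 = callables.Ak(1, 2.0, 0.5)
m_pred = callables.f_a(1, 2.0, A1)          # predicted mean
P_pred = A1 * 0.5 * A1 + callables.Qk(1)    # predicted variance
```

A filter loop only needs to call these methods with the current step k, so time-varying models can be swapped in without changing the loop.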

Measurement_Callables_Class
class Measurement_Callables_Python[source]

Bases: object

Hk(k, m_pred, P_pred)[source]
function (k, m, P). Returns the Jacobian of the measurement function; it is passed into p_h.
k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
R_isrk(k)[source]
function (k). Returns the square root of the noise matrix of the measurement equation on iteration k.
k: iteration number, starts at 0

This function is implemented to use the SVD prediction step.

Rk(k)[source]
function (k). Returns the noise matrix of the measurement equation on iteration k.
k: iteration number, starts at 0
dHk(k)[source]
function (k). Returns the derivative of H on iteration k.
k: iteration number, starts at 0
dRk(k)[source]
function (k). Returns the derivative of R on iteration k.
k: iteration number, starts at 0
f_h(k, m_pred, Hk)[source]
function (k, x_{k}, H_{k}). Measurement function.
k: iteration number, starts at 0
x_{k}: state
H_{k}: Jacobian matrices of f_h. In the linear case it is exactly H_{k}.
reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0)

Q_handling_Class

alias of Q_handling_Python

class Q_handling_Python(Q, index, Q_time_var_index, unique_Q_number, dQ=None)[source]
Q - array with noise on various steps. The result of preprocessing the noise input.
index - for each step of Kalman filter contains the corresponding index in the array.
Q_time_var_index - another index in the array Q. Computed earlier and passed here.
unique_Q_number - number of unique noise matrices below which square roots are cached and above which they are computed each time.
dQ: 3D array[:, :, param_num]
derivative of Q. Derivative is supported only when Q does not change over time
Object which has two necessary functions:
f_R(k) inv_R_square_root(k)
Q_srk(k)[source]
function (k). Returns the square root of the noise matrix of the dynamic model on iteration k.

k: iteration number, starts at 0

This function is implemented to use the SVD prediction step.

Qk(k)[source]
function (k). Returns the noise matrix of the dynamic model on iteration k.
k: iteration number, starts at 0
dQk(k)[source]
R_handling_Class

alias of R_handling_Python

class R_handling_Python(R, index, R_time_var_index, unique_R_number, dR=None)[source]

The class handles noise matrix R.

R - array with noise on various steps. The result of preprocessing the noise input.
index - for each step of Kalman filter contains the corresponding index in the array.
R_time_var_index - another index in the array R. Computed earlier and passed here.
unique_R_number - number of unique noise matrices below which square roots are cached and above which they are computed each time.
dR: 3D array[:, :, param_num]
derivative of R. Derivative is supported only when R does not change over time
Object which has two necessary functions:
f_R(k) inv_R_square_root(k)
R_isrk(k)[source]

Function returns the inverse square root of R matrix on step k.

Rk(k)[source]
dRk(k)[source]
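The caching rule described for unique_R_number can be sketched for scalar noise variances. The class `ScalarNoiseHandler` is hypothetical, and the rule "cache when the number of unique values is below the threshold" is one reading of the description above, not a statement about GPy's internals.

```python
class ScalarNoiseHandler:
    """Hypothetical scalar analogue of the noise-handling classes."""

    def __init__(self, R, index, unique_R_number):
        self.R = R          # unique noise variances
        self.index = index  # index[k] selects the variance used at step k
        # Cache inverse square roots only when there are few unique values.
        self.use_cache = len(R) < unique_R_number
        self.cache = {}

    def Rk(self, k):
        # Noise variance on step k.
        return self.R[self.index[k]]

    def R_isrk(self, k):
        # Inverse square root of the noise variance on step k.
        i = self.index[k]
        if not self.use_cache:
            return self.R[i] ** -0.5
        if i not in self.cache:
            self.cache[i] = self.R[i] ** -0.5
        return self.cache[i]

handler = ScalarNoiseHandler(R=[0.25, 4.0], index=[0, 1, 1, 0], unique_R_number=20)
first = handler.R_isrk(0)   # inverse square root of 0.25
second = handler.R_isrk(1)  # inverse square root of 4.0
```

With only two unique variances and a threshold of 20, both inverse square roots are computed once and reused on later steps that share the same index.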
Std_Dynamic_Callables_Class
class Std_Dynamic_Callables_Python(A, A_time_var_index, Q, index, Q_time_var_index, unique_Q_number, dA=None, dQ=None)[source]
Ak(k, m_pred, P_pred)[source]
function (k, m, P). Returns the Jacobian of the dynamic function; it is passed into p_a.
k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
dAk(k)[source]
f_a(k, m, A)[source]

f_a: function (k, x_{k-1}, A_{k}). Dynamic function.
k: iteration number, starts at 0
x_{k-1}: state from the previous step
A_{k}: Jacobian matrices of f_a. In the linear case it is exactly A_{k}.

reset(compute_derivatives=False)[source]

Return the state of this object to the beginning of iteration (to k eq. 0)

Std_Measurement_Callables_Class
class Std_Measurement_Callables_Python(H, H_time_var_index, R, index, R_time_var_index, unique_R_number, dH=None, dR=None)[source]
Hk(k, m_pred, P_pred)[source]
function (k, m, P). Returns the Jacobian of the measurement function; it is passed into p_h.
k: iteration number, starts at 0
m: point where the Jacobian is evaluated
P: parameter for the Jacobian, usually the covariance matrix.
dHk(k)[source]
f_h(k, m, H)[source]
function (k, x_{k}, H_{k}). Measurement function.
k: iteration number, starts at 0
x_{k}: state
H_{k}: Jacobian matrices of f_h. In the linear case it is exactly H_{k}.
class Struct[source]

Bases: object

balance_matrix(A)[source]

Balance a matrix, i.e. find a similarity transformation of the original matrix A, A = T * bA * T^{-1}, such that the norms of the columns of bA and of the rows of bA are as close as possible. It is usually used as a preprocessing step in eigenvalue calculation routines. It is also useful for State-Space models.

[1] Beresford N. Parlett and Christian Reinsch (1969). Balancing a matrix for calculation of eigenvalues and eigenvectors. Numerische Mathematik, 13(4): 293-304.
A: square matrix
Matrix to be balanced
bA: matrix
Balanced matrix
T: matrix
Left part of the similarity transformation
T_inv: matrix
Right part of the similarity transformation.
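What balancing achieves can be seen on a small example with a hand-picked diagonal T. The helper names below are hypothetical, and the scaling is chosen by inspection rather than computed by the Parlett and Reinsch algorithm; the sketch only demonstrates the relation A = T * bA * T^{-1}.

```python
def similarity_transform_diag(A, t):
    """Return bA = T^{-1} A T for the diagonal matrix T = diag(t)."""
    n = len(A)
    return [[A[i][j] * t[j] / t[i] for j in range(n)] for i in range(n)]

def row_and_column_norms(A):
    """Return the 1-norms of the rows and of the columns of A."""
    n = len(A)
    rows = [sum(abs(A[i][j]) for j in range(n)) for i in range(n)]
    cols = [sum(abs(A[i][j]) for i in range(n)) for j in range(n)]
    return rows, cols

# Badly scaled matrix: the off-diagonal entries differ by six orders of magnitude.
A = [[1.0, 1000.0],
     [0.001, 1.0]]
bA = similarity_transform_diag(A, t=[1000.0, 1.0])
rows, cols = row_and_column_norms(bA)
```

After the transformation every row norm of bA matches the corresponding column norm, while the eigenvalues are unchanged because bA is similar to A.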
balance_ss_model(F, L, Qc, H, Pinf, P0, dF=None, dQc=None, dPinf=None, dP0=None)[source]

Balances the State-Space model for better numerical stability.

This is based on the following:

dx/dt = F x + L w
y = H x

Let T z = x, which gives

dz/dt = inv(T) F T z + inv(T) L w
y = H T z
matrix_exponent(M)[source]

The function computes the matrix exponential and handles some special cases.
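For reference, the matrix exponential is the series exp(M) = I + M + M^2/2! + ... A truncated version of that series is sketched below in plain Python. The function `matrix_exponential_taylor` is hypothetical and is not how GPy computes it; a plain Taylor sum is only adequate for small, well-scaled matrices.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_exponential_taylor(M, terms=30):
    """Approximate exp(M) by the truncated series sum_{j < terms} M^j / j!."""
    n = len(M)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds M^j / j!, starting at the identity
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        result = [[result[a][b] + term[a][b] for b in range(n)]
                  for a in range(n)]
    return result

# For a nilpotent matrix the series terminates: exp(M) = I + M.
E = matrix_exponential_taylor([[0.0, 1.0], [0.0, 0.0]])
# For a diagonal matrix the result is the elementwise exponential of the diagonal.
D = matrix_exponential_taylor([[1.0, 0.0], [0.0, -2.0]])
```

Production code usually relies on a scaling-and-squaring routine from a numerical library instead, since the plain series loses accuracy for matrices with large norm.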

## GPy.models.state_space_model module¶

class StateSpace(X, Y, kernel=None, noise_var=1.0, kalman_filter_type='regular', use_cython=False, name='StateSpace')[source]
log_likelihood()[source]
parameters_changed()[source]

Parameters have now changed

plot(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, samples_likelihood=0, lower=2.5, upper=97.5, plot_data=True, plot_inducing=True, plot_density=False, predict_kw=None, projection='2d', legend=True, **kwargs)

Convenience function for plotting the fit of a GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

If you want fine-grained control, use the specific plotting functions supplied in the model.

Parameters:
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [default:200]
plot_raw (bool) – plot the latent function (usually denoted f) only?
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
samples_likelihood (int) – the number of samples to draw from the GP and apply the likelihood noise. This is usually not what you want!
lower (float) – the lower percentile to plot
upper (float) – the upper percentile to plot
plot_data (bool) – plot the data into the plot?
plot_inducing (bool) – plot inducing inputs?
plot_density (bool) – plot density instead of the confidence interval?
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
projection ({2d|3d}) – plot in 2d or 3d?
legend (bool) – convenience, whether to put a legend on the plot or not.
plot_confidence(lower=2.5, upper=97.5, plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', label='gp confidence', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is $2.5, 97.5$. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
lower (float) – the lower percentile to plot
upper (float) – the upper percentile to plot
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [default:200]
plot_raw (bool) – plot the latent function (usually denoted f) only?
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
which_data_ycols (array-like) – which columns of the output y (!) to plot (array-like or list of ints)
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
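To see what the default percentiles mean numerically, the sketch below computes the 2.5 and 97.5 percentiles of a Gaussian with a given mean and variance. The function `gaussian_interval` is hypothetical and assumes a Gaussian predictive distribution; it does not call GPy.

```python
from statistics import NormalDist

def gaussian_interval(mean, variance, lower=2.5, upper=97.5):
    """Return the lower and upper percentiles of N(mean, variance)."""
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return dist.inv_cdf(lower / 100.0), dist.inv_cdf(upper / 100.0)

# Predictive mean 1.0 and variance 4.0 (standard deviation 2.0).
lo, hi = gaussian_interval(mean=1.0, variance=4.0)
```

For a Gaussian these bounds sit about 1.96 standard deviations on either side of the mean, which is the familiar 95% interval.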
plot_data(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **plot_kwargs)
Plot the training data.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
label (str) – the label for the plot
plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using

Returns: list of plots created.
plot_data_error(which_data_rows='all', which_data_ycols='all', visible_dims=None, projection='2d', label=None, **error_kwargs)

Plot the training data input error.

For higher dimensions than two, use fixed_inputs to plot the data points with some of the inputs fixed.

Can plot only part of the data using which_data_rows and which_data_ycols.

Parameters:
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
visible_dims (a numpy array) – an array specifying the input dimensions to plot (maximum two)
projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
label (str) – the label for the plot
plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using

Returns: list of plots created.
plot_density(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=35, label='gp density', predict_kw=None, **kwargs)

Plot the confidence interval between the percentiles lower and upper. E.g. the 95% confidence interval is $2.5, 97.5$. Note: Only implemented for one dimension!

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [default:200]
plot_raw (bool) – plot the latent function (usually denoted f) only?
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
plot_errorbars_trainset(which_data_rows='all', which_data_ycols='all', fixed_inputs=None, plot_raw=False, apply_link=False, label=None, projection='2d', predict_kw=None, **plot_kwargs)

Plot the errorbars of the GP likelihood on the training data. These are the errorbars after the appropriate approximations according to the likelihood are done.

This also works for heteroscedastic likelihoods.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
which_data_ycols – when the data has several columns (independent outputs), only plot these
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
predict_kwargs (dict) – kwargs for the prediction used to predict the right quantiles.
plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_f(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [default:200]
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
lower (float) – the lower percentile to plot
upper (float) – the upper percentile to plot
plot_data (bool) – plot the data into the plot?
plot_inducing (bool) – plot inducing inputs?
plot_density (bool) – plot density instead of the confidence interval?
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_latent(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [default:200]
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
levels (int) – the number of levels in the density (number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
samples (int) – the number of samples to draw from the GP and plot into the plot. This will always be samples from the latent function.
lower (float) – the lower percentile to plot
upper (float) – the upper percentile to plot
plot_data (bool) – plot the data into the plot?
plot_inducing (bool) – plot inducing inputs?
plot_density (bool) – plot density instead of the confidence interval?
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using
plot_mean(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=False, apply_link=False, visible_dims=None, which_data_ycols='all', levels=20, projection='2d', label='gp mean', predict_kw=None, **kwargs)

Plot the mean of the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:
plot_limits (np.array) – The limits of the plot. If 1D [xmin,xmax], if 2D [[xmin,ymin],[xmax,ymax]]. Defaults to data limits
fixed_inputs (a list of tuples) – a list of tuple [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
resolution (int) – The resolution of the prediction [defaults are 1D:200, 2D:50]
plot_raw (bool) – plot the latent function (usually denoted f) only?
apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
levels (int) – for 2D plotting, the number of contour levels to use is
projection ({'2d','3d'}) – whether to plot in 2d or 3d. This only applies when plotting two dimensional inputs!
label (str) – the label for the plot.
predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
plot_noiseless(plot_limits=None, fixed_inputs=None, resolution=None, apply_link=False, which_data_ycols='all', which_data_rows='all', visible_dims=None, levels=20, samples=0, lower=2.5, upper=97.5, plot_density=False, plot_data=True, plot_inducing=True, projection='2d', legend=True, predict_kw=None, **kwargs)

Convenience function for plotting the fit of a GP. This is the same as plot, except it plots the latent function fit of the GP!

If you want fine-grained control, use the specific plotting functions supplied in the model.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:

- plot_limits (np.array) – The limits of the plot. If 1D, [xmin,xmax]; if 2D, [[xmin,ymin],[xmax,ymax]]. Defaults to the data limits.
- fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [default: 200]
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- which_data_ycols ('all' or a list of integers) – when the data has several columns (independent outputs), only plot these
- which_data_rows ('all' or a slice object to slice self.X, self.Y) – which of the training data to plot (default all)
- visible_dims (array-like) – an array specifying the input dimensions to plot (maximum two)
- levels (int) – the number of levels in the density (a number bigger than 1, where 35 is smooth and 1 is the same as plot_confidence). You can go higher than 50 if the result is not smooth enough for you.
- samples (int) – the number of samples to draw from the GP and plot into the plot. These will always be samples from the latent function.
- lower (float) – the lower percentile to plot
- upper (float) – the upper percentile to plot
- plot_data (bool) – plot the data into the plot?
- plot_inducing (bool) – plot inducing inputs?
- plot_density (bool) – plot density instead of the confidence interval?
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
- error_kwargs (dict) – kwargs for the error plot for the plotting library you are using
- plot_kwargs (kwargs) – kwargs for the data plot for the plotting library you are using

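
To make the lower and upper percentiles concrete, the sketch below computes the band for a single test point under the assumption that the predictive distribution there is Gaussian with a known mean and variance. The function gaussian_band is a hypothetical name, and GPy itself may obtain these values differently (for example through the likelihood).

```python
from statistics import NormalDist


def gaussian_band(mean, variance, lower=2.5, upper=97.5):
    """Lower and upper percentiles of a Gaussian predictive distribution.

    Hypothetical helper for illustration; percentiles are given in percent,
    matching the lower=2.5 / upper=97.5 defaults in the signature above.
    """
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    return dist.inv_cdf(lower / 100.0), dist.inv_cdf(upper / 100.0)


lo, hi = gaussian_band(mean=1.0, variance=0.25)
print(lo, hi)  # the band is symmetric around the mean
```

With the default percentiles the band covers the central 95% of the distribution, roughly the mean plus or minus 1.96 standard deviations.
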
plot_samples(plot_limits=None, fixed_inputs=None, resolution=None, plot_raw=True, apply_link=False, visible_dims=None, which_data_ycols='all', samples=3, projection='2d', label='gp_samples', predict_kw=None, **kwargs)

Plot samples drawn from the GP.

You can deactivate the legend for this one plot by supplying None to label.

Give the Y_metadata in the predict_kw if you need it.

Parameters:

- plot_limits (np.array) – The limits of the plot. If 1D, [xmin,xmax]; if 2D, [[xmin,ymin],[xmax,ymax]]. Defaults to the data limits.
- fixed_inputs (a list of tuples) – a list of tuples [(i,v), (i,v)…], specifying that input dimension i should be set to value v.
- resolution (int) – The resolution of the prediction [defaults are 1D: 200, 2D: 50]
- plot_raw (bool) – plot the latent function (usually denoted f) only? This is usually what you want!
- apply_link (bool) – whether to apply the link function of the GP to the raw prediction.
- visible_dims (array-like) – which columns of the input X (!) to plot (array-like or list of ints)
- which_data_ycols (array-like) – which columns of y to plot (array-like or list of ints)
- predict_kw (dict) – the keyword arguments for the prediction. If you want to plot a specific kernel give dict(kern=) in here
- levels (int) – for 2D plotting, the number of contour levels to use

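
As a rough picture of what drawing samples from a GP involves, the sketch below draws functions from a zero-mean GP prior with an RBF kernel, using a Cholesky factor of the covariance matrix. This is a simplification under stated assumptions: plotting a fitted model would use the posterior mean and covariance instead, and the names rbf, cholesky and draw_samples are hypothetical, not GPy functions.

```python
import math
import random


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def draw_samples(xs, samples=3, jitter=1e-6, seed=0):
    """Draw `samples` functions from a zero-mean GP prior evaluated at xs."""
    rng = random.Random(seed)
    n = len(xs)
    # Covariance matrix with a small jitter on the diagonal for stability.
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    draws = []
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # standard normal noise
        draws.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return draws


for f in draw_samples([0.0, 0.5, 1.0, 1.5, 2.0], samples=3):
    print([round(v, 3) for v in f])
```

Each printed row is one sampled function evaluated on the grid, which corresponds to one curve in the plot.
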
predict(Xnew=None, filteronly=False, include_likelihood=True, **kw)[source]
predict_quantiles(Xnew=None, quantiles=(2.5, 97.5), **kw)[source]

## GPy.models.state_space_setup module

This module is intended for setting up the state_space_main module. It is needed because of the way the state_space_main module is connected to the Cython code.

## GPy.models.warped_gp module

class WarpedGP(X, Y, kernel=None, warping_function=None, warping_terms=3, normalizer=False)[source]

Bases: GPy.core.gp.GP

This defines a GP Regression model that applies a warping function to the output.

log_likelihood()[source]

Note that the Jacobian of the warping function is added here.

log_predictive_density(x_test, y_test, Y_metadata=None)[source]

Calculate the log predictive density. Note that the Jacobian of the warping function is added here.

Parameters:

- x_test ((Nx1) array) – test locations (x_{*})
- y_test ((Nx1) array) – test observations (y_{*})
- Y_metadata – metadata associated with the test points

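
The role of the Jacobian can be illustrated with a change of variables: if the warped value z = f(y) is Gaussian, then log p(y) = log N(f(y); mu, var) + log f'(y). The sketch below assumes a single-term tanh warping function f(y) = y + a·tanh(b·(y + c)) with fixed, made-up parameters; the function names and parameter values are illustrative assumptions and do not reproduce GPy's code.

```python
import math


def warp(y, a=0.5, b=1.0, c=0.0):
    """Monotonic warping function (assumed form, with a, b >= 0)."""
    return y + a * math.tanh(b * (y + c))


def warp_grad(y, a=0.5, b=1.0, c=0.0):
    """Derivative of warp with respect to y (the 1D Jacobian)."""
    return 1.0 + a * b * (1.0 - math.tanh(b * (y + c)) ** 2)


def log_predictive_density(y_test, mu, var, a=0.5, b=1.0, c=0.0):
    """Log density of y_test when warp(y_test) ~ N(mu, var).

    mu and var stand for the predictive mean and variance in the warped
    space at the test location (hypothetical inputs for this sketch).
    """
    z = warp(y_test, a, b, c)
    log_gauss = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - mu) ** 2 / var
    # The Jacobian term makes the result a proper density over y.
    return log_gauss + math.log(warp_grad(y_test, a, b, c))


print(log_predictive_density(0.2, mu=0.3, var=0.8))
```

Without the log f'(y) term, the density over y would not integrate to one, so values computed for different warping functions would not be comparable.
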
parameters_changed()[source]

Notice that we update the warping function gradients here.

plot_warping()[source]
predict(Xnew, kern=None, pred_init=None, Y_metadata=None, median=False, deg_gauss_hermite=20, likelihood=None)[source]

Prediction results depend on:

- The value of the self.predict_in_warped_space flag
- The median flag passed as argument

The likelihood keyword is never used; it is only there to follow the plotting API.

predict_quantiles(X, quantiles=(2.5, 97.5), Y_metadata=None, likelihood=None, kern=None)[source]

Get the predictive quantiles around the prediction at X

Parameters:

- X (np.ndarray (Xnew x self.input_dim)) – The points at which to make a prediction
- quantiles (tuple) – tuple of quantiles, default is (2.5, 97.5), which is the 95% interval

Returns: list of quantiles for each X and predictive quantiles for interval combination

Return type: [np.ndarray (Xnew x self.input_dim), np.ndarray (Xnew x self.input_dim)]

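
Because a warping function is monotonic, quantiles in the observation space can be obtained by taking Gaussian quantiles in the warped space and mapping them back through the inverse warp. The sketch below illustrates this under assumptions: it uses a single-term tanh warping function with made-up parameters and inverts it by bisection, which is chosen here for simplicity and is not claimed to be the method GPy uses. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def warp(y, a=0.5, b=1.0, c=0.0):
    """Strictly increasing warping function (assumed form, with a, b >= 0)."""
    return y + a * math.tanh(b * (y + c))


def inverse_warp(z, lo=-1e3, hi=1e3, tol=1e-10):
    """Solve warp(y) = z by bisection; valid because warp is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if warp(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def warped_quantiles(mu, var, quantiles=(2.5, 97.5)):
    """Quantiles (in percent) of y when warp(y) ~ N(mu, var)."""
    dist = NormalDist(mu, math.sqrt(var))
    return [inverse_warp(dist.inv_cdf(q / 100.0)) for q in quantiles]


print(warped_quantiles(mu=0.0, var=1.0, quantiles=(2.5, 50.0, 97.5)))
```

The 50th percentile obtained this way is the median of the predictive distribution in the observation space, which in general differs from its mean when the warping is nonlinear.
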
set_XY(X=None, Y=None)[source]
transform_data()[source]