The API

pomegranate has a minimal core API that is made possible because all models are treated as probability distributions regardless of their complexity. This point is repeated throughout the documentation because it has important consequences for how the package is designed and also for how one should think about designing probabilistic models. Although each model's documentation page has an API reference showing the full set of methods and parameters for that model, every model has the following methods:

>>> model.probability(X)

This method takes in a set of examples (either 2D or 3D depending on the model) and returns a vector of probabilities.

>>> model.log_probability(X)

This method takes in a set of examples (either 2D or 3D depending on the model) and returns a vector of log probabilities. Log probabilities are more numerically stable and, in fact, calls to model.probability just exponentiate the value returned from this call.
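As a minimal sketch of how these two calls relate (using a Normal distribution on random toy data purely for illustration), one might write:

import torch
from pomegranate.distributions import Normal

X = torch.randn(100, 3)            # 100 examples with 3 features each
dist = Normal()                    # parameters will be learned from the data
dist.fit(X)

logp = dist.log_probability(X)     # shape (100,)
p = dist.probability(X)            # shape (100,)
assert torch.allclose(p, torch.exp(logp))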

>>> model.fit(X, sample_weight=None)

This method will fit the model to the given data, which can optionally be weighted. If the model is a simple probability distribution, a Bayes classifier, or a Bayesian network with fully observed features, the method will use maximum likelihood estimates. For other models and settings, the method will use expectation-maximization to fit the model parameters. When a structure is not provided for hidden Markov models or Bayesian networks, this method will jointly learn the structure and the parameters of the model. The shape of the data should be (n, d) or (n, l, d) depending on whether there is a length dimension, where n is the number of samples, l is the length of each sequence, and d is the dimensionality. Sample weights should either be a vector of non-negative numbers with shape (n,) or a matrix with shape (n, d).
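For example, a simple distribution can be fit to weighted 2D data along the lines of the sketch below (the Exponential and the random toy data are purely illustrative):

import torch
from pomegranate.distributions import Exponential

X = torch.rand(500, 2)             # (n, d) non-negative data
w = torch.rand(500)                # one non-negative weight per example

dist = Exponential()
dist.fit(X, sample_weight=w)       # weighted maximum likelihood estimate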

>>> model.summarize(X, sample_weight=None)

This method is the first step of the two-step out-of-core learning API. It takes in data and optional weights, extracts the sufficient statistics that allow for an exact update, and adds them to the cached values. Because these sufficient statistics are additive, one can derive an exact update from multiple calls to this method without having to store the entire data set in memory.

>>> model.from_summaries()

This method is the second step in the out-of-core learning API. The method uses the extracted and aggregated sufficient statistics to derive exact parameter updates for the model. After the parameters are updated, the stored sufficient statistics will be zeroed out.
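A hedged sketch of the two-step API on batched data (the random batches below stand in for chunks read from disk):

import torch
from pomegranate.distributions import Normal

dist = Normal(covariance_type='diag')

# stream the data in chunks so the full data set never sits in memory
for _ in range(10):
    X_batch = torch.randn(1000, 5)   # stand-in for one chunk loaded from disk
    dist.summarize(X_batch)          # accumulate additive sufficient statistics

dist.from_summaries()                # exact update from the aggregated statistics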

Compositional Methods

For models that are composed of other models/distributions, e.g. mixture models, hidden Markov models, and Bayesian networks, there are additional methods that relate to inferring how the data relates to each of these distributions. For example, instead of just calculating the log probability of an example under an entire mixture model, one might want to calculate the posterior probability that the data was generated by each of the distributions. These posterior probabilities are found by applying Bayes’ rule, which connects prior probabilities and likelihoods to posterior probabilities.

>>> model.predict(X)

This method will return the most likely inferred value for each example in the data. In the case of Bayesian networks operating on incomplete data, this inferred value is the most likely value that each variable takes given the structure of the model and the observed data. For all other models, this is the most likely component that explains the data, P(M|D).

>>> model.predict_proba(X)

This returns the matrix of posterior probabilities P(M|D) directly. The predict method simply runs an argmax over this matrix.

>>> model.predict_log_proba(X)

This returns the matrix of log posterior probabilities for numerical stability.
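As a minimal sketch of these three methods on a two-component Gaussian mixture (toy data and settings, purely illustrative):

import torch
from pomegranate.distributions import Normal
from pomegranate.gmm import GeneralMixtureModel

X = torch.randn(200, 2)

model = GeneralMixtureModel([Normal(), Normal()], max_iter=10)
model.fit(X)

y_hat = model.predict(X)                     # most likely component per example, shape (200,)
posteriors = model.predict_proba(X)          # P(M|D), shape (200, 2); rows sum to one
log_posteriors = model.predict_log_proba(X)  # log of the same matrix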

API Reference

Distributions

class pomegranate.distributions.Bernoulli(probs=None, inertia=0.0, frozen=False, check_data=True)

A Bernoulli distribution object.

A Bernoulli distribution models the probability of a binary variable occurring, and has a probability parameter describing this value. This distribution assumes that each feature is independent of the others.

There are two ways to initialize this object. The first is to pass in the tensor of probability parameters, at which point they can immediately be used. The second is to not pass in the probability parameters and then call either fit or summarize + from_summaries, at which point the probability parameters will be learned from data.

probs: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The probability parameters for each feature. Default is None.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
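A hedged sketch of the two initialization routes described above (the probabilities and toy data are illustrative only):

import torch
from pomegranate.distributions import Bernoulli

# first route: pass the probability parameters directly
d1 = Bernoulli(probs=[0.1, 0.5, 0.9])

# second route: learn the probabilities from binary data
X = torch.randint(2, size=(1000, 3)).float()
d2 = Bernoulli()
d2.fit(X)

logp = d1.log_probability(X[:5])   # one log probability per example, shape (5,)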
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a Bernoulli distribution, each entry in the data must be either 0 or 1.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.Categorical(probs=None, n_categories=None, pseudocount=0.0, inertia=0.0, frozen=False, check_data=True)

A categorical distribution object.

A categorical distribution models the probability of a set of distinct values occurring. It is an extension of the Bernoulli distribution to multiple values. Sometimes it is referred to as a discrete distribution, but the distribution does not assume that the numeric values used as keys have any relationship based on their identity; permuting the keys has no effect on the calculation. This distribution assumes that the features are independent of each other.

The keys must be contiguous non-negative integers that begin at zero. Because the probabilities are represented as a single tensor, each feature must have entries for all keys up to the maximum key across all features. Specifically, if one feature has 10 keys and a second feature has only 4, the tensor must cover all 10 keys for both features, with the second feature encoding probabilities of zero for its unused keys.
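A hedged sketch of this padding behavior, assuming the learned parameters are exposed on the fitted object as probs:

import torch
from pomegranate.distributions import Categorical

# feature 0 takes values {0, 1, 2}; feature 1 only takes values {0, 1}
X = torch.tensor([[0, 0],
                  [1, 1],
                  [2, 0],
                  [1, 1],
                  [2, 0]])

dist = Categorical()
dist.fit(X)

# the learned probability tensor covers keys 0 through 2 for both features;
# key 2 of the second feature simply receives probability zero
print(dist.probs)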

probs: list, numpy.ndarray, torch.tensor or None, shape=(k, d), optional
Probabilities for each key for each feature, where k is the largest number of keys across all features. Default is None
inertia: float, (0, 1), optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a categorical distribution, each entry in the data must be an integer in the range [0, n_keys).

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.ConditionalCategorical(probs=None, n_categories=None, pseudocount=0, inertia=0.0, frozen=False, check_data=True)

Still under development.

sample(n, X)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution, conditioned on the values provided in X.

n: int
The number of samples to generate.
X: list, numpy.ndarray, torch.tensor, shape=(n, d, *self.probs.shape-1)
The values to be conditioned on when generating the samples.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
class pomegranate.distributions.JointCategorical(probs=None, n_categories=None, pseudocount=0, inertia=0.0, frozen=False, check_data=True)

A joint categorical distribution.

A joint categorical distribution models the probability of a vector of categorical values occurring without assuming that the dimensions are independent of each other. Essentially, it is a Categorical distribution without the independence assumption across dimensions.

There are two ways to initialize this object. The first is to pass in the tensor of probability parameters, at which point they can immediately be used. The second is to not pass in the probability parameters and then call either fit or summarize + from_summaries, at which point the probability parameters will be learned from data.
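A hedged sketch for the two-column binary case (the probability table is illustrative only):

import torch
from pomegranate.distributions import JointCategorical

# two binary columns: one tensor dimension per column, each of size two,
# with the entries summing to one over the whole table
probs = torch.tensor([[0.10, 0.20],
                      [0.30, 0.40]])

dist = JointCategorical(probs=probs)

X = torch.tensor([[0, 1],
                  [1, 1]])
logp = dist.log_probability(X)   # log P(x0=0, x1=1) and log P(x0=1, x1=1)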

probs: list, numpy.ndarray, torch.tensor, or None, shape=*n_categories
A tensor where each dimension corresponds to one column in the data set being modeled and the size of each dimension is the number of categories in that column, e.g., if the data being modeled is binary and has shape (5, 4), this will be a tensor with shape (2, 2, 2, 2). Default is None.
n_categories: list, numpy.ndarray, torch.tensor, or None, shape=(d,)
A vector with the maximum number of categories that each column can have. If not given, this will be inferred from the data. Default is None.
pseudocount: float, optional
A number of observations to add to each entry in the probability distribution during training. A higher value will smooth the distributions more. Default is 0.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a joint categorical distribution, each value must be an integer category that is smaller than the maximum number of categories for each feature.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.DiracDelta(alphas=None, inertia=0.0, frozen=False, check_data=True)

A Dirac delta distribution object.

A Dirac delta distribution is a probability distribution that has its entire density at zero. This distribution assumes that each feature is independent of the others. In practice, this means it will assign zero probability to any example containing a non-zero value.

There are two ways to initialize this object. The first is to pass in the tensor of alpha values, representing the probability to return for zero values, at which point they can immediately be used. The second is to not pass in the alpha values and then call either fit or summarize + from_summaries, at which point they will be learned from data.

alphas: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The probability parameters for each feature. Default is None.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

For a Dirac delta distribution, there are no updates.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

For a Dirac delta distribution, there are no statistics to extract.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.Exponential(scales=None, inertia=0.0, frozen=False, check_data=True)

An exponential distribution object.

An exponential distribution models the time between discrete events, and has a rate parameter describing the average time between occurrences. This distribution assumes that each feature is independent of the others. It can be used on any non-negative continuous data.

There are two ways to initialize this object. The first is to pass in the tensor of rate parameters, at which point they can immediately be used. The second is to not pass in the rate parameters and then call either fit or summarize + from_summaries, at which point the rate parameters will be learned from data.

scales: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The rate parameters for each feature. Default is None.
inertia: float, (0, 1), optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For an exponential distribution, the data must be non-negative.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.Gamma(shapes=None, rates=None, inertia=0.0, tol=0.0001, max_iter=20, frozen=False, check_data=True)

A gamma distribution object.

A gamma distribution is the sum of exponential distributions, and has shape and rate parameters. This distribution assumes that each feature is independent of the others.

There are two ways to initialize this object. The first is to pass in the tensor of rate and shape parameters, at which point they can immediately be used. The second is to not pass in these parameters and then call either fit or summarize + from_summaries, at which point the rate and shape parameters will be learned from data.

shapes: torch.tensor or None, shape=(d,), optional
The shape parameter for each feature. Default is None
rates: torch.tensor or None, shape=(d,), optional
The rate parameters for each feature. Default is None.
inertia: float, (0, 1), optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
tol: float, [0, inf), optional
The threshold at which to stop fitting the parameters of the distribution. Default is 1e-4.
max_iter: int, [0, inf), optional
The maximum number of iterations to run EM when fitting the parameters of the distribution. Default is 20.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a gamma distribution, the data must be non-negative.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.Normal(means=None, covs=None, covariance_type='full', min_cov=None, inertia=0.0, frozen=False, check_data=True)

A normal distribution object.

A normal distribution models the probability of a variable occurring under a bell-shaped curve. It is described by a vector of mean values and a covariance that can be zero-, one-, or two-dimensional (a scalar, a vector of variances, or a full matrix). This distribution can assume that features are independent of each other if the covariance type is ‘diag’ or ‘sphere’, but if the type is ‘full’ then the features are not independent.
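A short, hedged sketch of the two most common covariance settings on toy data:

import torch
from pomegranate.distributions import Normal

X = torch.randn(1000, 4)

d_full = Normal(covariance_type='full')   # (d, d) covariance; features may covary
d_diag = Normal(covariance_type='diag')   # (d,) variances; features treated independently

d_full.fit(X)
d_diag.fit(X)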

There are two ways to initialize this object. The first is to pass in the tensors of mean and covariance parameters, at which point they can immediately be used. The second is to not pass in these parameters and then call either fit or summarize + from_summaries, at which point the mean and covariance parameters will be learned from data.

means: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The mean values of the distributions. Default is None.
covs: list, numpy.ndarray, torch.Tensor, or None, optional
The variances and covariances of the distribution. If covariance_type is ‘full’, the shape should be (self.d, self.d); if ‘diag’, the shape should be (self.d,); if ‘sphere’, it should be (1,). Note that this is the variances or covariances in all settings, and not the standard deviation, as may be more common for diagonal covariance matrices. Default is None.
covariance_type: str, optional
The type of covariance matrix. Must be one of ‘full’, ‘diag’, or ‘sphere’. Default is ‘full’.
min_cov: float or None, optional
The minimum variance or covariance.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.Poisson(lambdas=None, inertia=0.0, frozen=False, check_data=True)

A Poisson distribution object.

A Poisson distribution models the number of occurrences of events that happen in a fixed time span, assuming that the occurrence of each event is independent. This distribution also assumes that each feature is independent of the others.

There are two ways to initialize this object. The first is to pass in the tensor of lambda parameters, at which point they can immediately be used. The second is to not pass in the lambda parameters and then call either fit or summarize + from_summaries, at which point the lambda parameters will be learned from data.

lambdas: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The lambda parameters for each feature. Default is None.
inertia: float, (0, 1), optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a Poisson distribution, each entry in the data must be non-negative.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.StudentT(dofs, means=None, covs=None, covariance_type='diag', min_cov=None, inertia=0.0, frozen=False, check_data=True)

A Student T distribution.

A Student T distribution models the probability of a variable occurring under a bell-shaped curve with heavy tails. Essentially, it is a version of the normal distribution that is more robust to outliers. It is described by a vector of mean values and a vector of variance values. This distribution can assume that features are independent of each other if the covariance type is ‘diag’ or ‘sphere’, but if the type is ‘full’ then the features are not independent.

There are two ways to initialize this object. The first is to pass in the tensors of mean and covariance parameters, at which point they can immediately be used. The second is to not pass in these parameters and then call either fit or summarize + from_summaries, at which point the mean and covariance parameters will be learned from data.

means: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The mean values of the distributions. Default is None.
covs: list, numpy.ndarray, torch.Tensor, or None, optional
The variances and covariances of the distribution. If covariance_type is ‘full’, the shape should be (self.d, self.d); if ‘diag’, the shape should be (self.d,); if ‘sphere’, it should be (1,). Note that this is the variances or covariances in all settings, and not the standard deviation, as may be more common for diagonal covariance matrices. Default is None.
covariance_type: str, optional
The type of covariance matrix. Must be one of ‘full’, ‘diag’, or ‘sphere’. Default is ‘diag’.
min_cov: float or None, optional
The minimum variance or covariance.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
class pomegranate.distributions.Uniform(mins=None, maxs=None, inertia=0.0, frozen=False, check_data=True)

A uniform distribution.

A uniform distribution models the probability of a variable occurring within a range that has uniform probability inside it and no probability outside it. It is described by vectors of minimum and maximum values for this range. This distribution assumes that the features are independent of each other.

There are two ways to initialize this object. The first is to pass in the tensors of minimum and maximum values, at which point they can immediately be used. The second is to not pass them in and then call either fit or summarize + from_summaries, at which point the ranges will be learned from data.

mins: list, numpy.ndarray, torch.Tensor or None, shape=(d,), optional
The minimum values of the range.
maxs: list, numpy.ndarray, torch.Tensor, or None, optional
The maximum values of the range.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling.
from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

log_probability(X)

Calculate the log probability of each example.

This method calculates the log probability of each example given the parameters of the distribution. The examples must be given in a 2D format. For a uniform distribution, any value outside the range between the minimum and maximum for its feature has zero probability.

Note: This differs from some other log probability calculation functions, like those in torch.distributions, because it is not returning the log probability of each feature independently, but rather the total log probability of the entire example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
logp: torch.Tensor, shape=(-1,)
The log probability of each example.
sample(n)

Sample from the probability distribution.

This method will return n samples generated from the underlying probability distribution.

n: int
The number of samples to generate.
X: torch.tensor, shape=(n, self.d)
Randomly generated samples.
summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.
class pomegranate.distributions.ZeroInflated(distribution, priors=None, max_iter=10, tol=0.1, inertia=0.0, frozen=False, check_data=False, verbose=False)

A wrapper for a zero-inflated distribution.

Some discrete distributions, e.g. Poisson or negative binomial, are used to model data that has many more zeroes in it than one would expect from the true signal itself. Potentially, this is because data collection devices fail or other gaps exist in the data. A zero-inflated distribution is essentially a mixture of these zero values and the real underlying distribution.

Accordingly, this class serves as a wrapper that can be dropped in for other probability distributions and makes them “zero-inflated”. It is similar to a mixture model between the distribution passed in and a dirac delta distribution, except that the mixture happens independently for each distribution as well as for each example.
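A hedged sketch wrapping an initialized Poisson distribution (the toy counts and parameter are illustrative only):

import torch
from pomegranate.distributions import Poisson, ZeroInflated

# counts with many more zeroes than a plain Poisson would produce
X = torch.tensor([[0.], [0.], [0.], [0.], [3.],
                  [2.], [0.], [4.], [0.], [1.]])

model = ZeroInflated(Poisson([2.0]))   # wrap an initialized count distribution
model.fit(X)                           # EM over the Dirac delta / Poisson mixture

logp = model.log_probability(X)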

distribution: pomegranate.distributions.Distribution
A pomegranate distribution object. It should probably be a discrete distribution, but does not technically have to be.
priors: tuple, numpy.ndarray, torch.Tensor, or None. shape=(2,), optional
The prior probabilities over the given distribution and the dirac delta component. Default is None.
max_iter: int, optional
The number of iterations to do in the EM step of fitting the distribution. Default is 10.
tol: float, optional
The threshold at which to stop during fitting when the improvement goes under. Default is 0.1.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
verbose: bool, optional
Whether to print the improvement and timings during training.
fit(X, sample_weight=None)

Fit the model to optionally weighted examples.

This method implements the core of the learning process. For a zero-inflated distribution, this involves performing EM until the distribution being fit converges.

This method is largely a wrapper around the summarize and from_summaries methods. Its primary contribution is to serve as a loop around these functions and to monitor convergence.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to evaluate.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.

self

from_summaries()

Update the model parameters given the extracted statistics.

This method uses calculated statistics from calls to the summarize method to update the distribution parameters. Hyperparameters for the update are passed in at initialization time.

Note: Internally, a call to fit is just a successive call to the summarize method followed by the from_summaries method.

summarize(X, sample_weight=None)

Extract the sufficient statistics from a batch of data.

This method calculates the sufficient statistics from optionally weighted data and adds them to the stored cache. The examples must be given in a 2D format. Sample weights can either be provided as one value per example or as a 2D matrix of weights for each feature in each example.

X: list, tuple, numpy.ndarray, torch.Tensor, shape=(-1, self.d)
A set of examples to summarize.
sample_weight: list, tuple, numpy.ndarray, torch.Tensor, optional
A set of weights for the examples. This can be either of shape (-1, self.d) or a vector of shape (-1,). Default is ones.

Models

class pomegranate.bayes_classifier.BayesClassifier(distributions, priors=None, inertia=0.0, frozen=False, check_data=True)

A Bayes classifier object.

A simple way to produce a classifier using probabilistic models is to plug them into Bayes’ rule. Basically, inference is the same as the ‘E’ step in EM for mixture models. However, fitting can be significantly faster because, instead of having to iteratively infer labels and learn parameters, you can just learn the parameters given the known labels. Because the learning step for most models is a simple maximum likelihood estimate, this can be done extremely quickly.

Although the most common distribution to use is a Gaussian with a diagonal covariance matrix, termed the Gaussian naive Bayes model, any probability distribution can be used. Here, you can just drop in any distribution or probabilistic model as long as it has the log_probability, summarize, and from_summaries methods implemented.

Further, the probabilistic models do not even need to be simple distributions. The distributions can be mixture models or hidden Markov models or Bayesian networks.
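Since pre-initialized distributions can simply be dropped in, a minimal hedged sketch with two fixed Gaussians and explicit priors (all values illustrative) might look like:

import torch
from pomegranate.distributions import Normal
from pomegranate.bayes_classifier import BayesClassifier

# two pre-initialized class-conditional distributions and their prior probabilities
d0 = Normal(means=[-1.0], covs=[1.0], covariance_type='diag')
d1 = Normal(means=[2.0], covs=[1.0], covariance_type='diag')
model = BayesClassifier([d0, d1], priors=torch.tensor([0.5, 0.5]))

X = torch.tensor([[-0.5], [3.0], [0.8]])
y_hat = model.predict(X)              # most likely class for each example
posteriors = model.predict_proba(X)   # P(class | x) via Bayes' rule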

distributions: tuple or list
A set of distribution objects. These objects do not need to be initialized, i.e., can be “Normal()”.
priors: tuple, numpy.ndarray, torch.Tensor, or None. shape=(k,), optional
The prior probabilities over the given distributions. Default is None.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
class pomegranate.gmm.GeneralMixtureModel(distributions, priors=None, init='random', max_iter=1000, tol=0.1, inertia=0.0, frozen=False, random_state=None, check_data=True, verbose=False)

A general mixture model.

Frequently, data is generated from multiple components. A mixture model is a probabilistic model that explicitly models data as having come from a set of probability distributions rather than a single one. Usually, the abbreviation “GMM” refers to a Gaussian mixture model, but any probability distribution or heterogeneous set of distributions can be included in the mixture, making it a “general” mixture model.

However, a mixture model itself has all the same theoretical properties as a probability distribution because it is one. Hence, it can be used in any situation that a simpler distribution could, such as an emission distribution for a HMM or a component of a Bayes classifier.

Conversely, many models that are usually thought of as composed of probability distributions but distinct from them, e.g. hidden Markov models, Markov chains, and Bayesian networks, can in theory be passed into this object and incorporated into the mixture.

If the distributions included in the mixture are not initialized, the fitting step will first initialize them by running k-means for a small number of iterations and fitting the distributions to the clusters that are discovered.
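A hedged sketch of that initialization path with three uninitialized diagonal-covariance Normal components (toy data and settings, purely illustrative):

import torch
from pomegranate.distributions import Normal
from pomegranate.gmm import GeneralMixtureModel

X = torch.randn(500, 3)

# uninitialized components: fit() first runs a few iterations of k-means to
# initialize them, then refines all parameters with EM
model = GeneralMixtureModel([Normal(covariance_type='diag') for _ in range(3)],
                            max_iter=100, verbose=False)
model.fit(X)

logp = model.log_probability(X)   # the fitted mixture is itself a distribution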

distributions: tuple or list
A set of distribution objects. These objects do not need to be initialized, i.e., can be “Normal()”.
priors: tuple, numpy.ndarray, torch.Tensor, or None. shape=(k,), optional
The prior probabilities over the given distributions. Default is None.
max_iter: int, optional
The maximum number of EM iterations to perform when fitting the distribution. Default is 1000.
tol: float, optional
The threshold at which to stop during fitting when the improvement goes under. Default is 0.1.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
verbose: bool, optional
Whether to print the improvement and timings during training.
class pomegranate.hmm.DenseHMM(distributions=None, edges=None, starts=None, ends=None, init='random', max_iter=1000, tol=0.1, sample_length=None, return_sample_paths=False, inertia=0.0, frozen=False, check_data=True, random_state=None, verbose=False)

A hidden Markov model with a dense transition matrix.

A hidden Markov model is an extension of a mixture model to sequences, formed by including a transition matrix between the elements of the mixture. Each of the algorithms for a hidden Markov model is essentially a revision of the corresponding mixture model algorithm to incorporate this transition matrix.

This object is a wrapper for a hidden Markov model with a dense transition matrix.

There are two implementations, DenseHMM and SparseHMM. Choosing between them will not affect the accuracy of the results but will change the speed at which they are calculated.

Separately, there are two ways to instantiate the hidden Markov model. The first is by passing in a set of distributions, a dense transition matrix, and optionally start/end probabilities. The second is to initialize the object without these and then to add edges using the add_edge method and to add distributions using the add_distributions method. Importantly, the way that you choose to initialize the hidden Markov model is independent of the implementation that you end up choosing. If you pass a dense transition matrix to SparseHMM, it will be converted to a sparse representation with the zero entries dropped.
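A hedged sketch of the first instantiation route, using two Gaussian states and a 3D sequence tensor of shape (n, l, d) as described in the core API (all values illustrative):

import torch
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM

# two hidden states with one-dimensional Gaussian emissions
d0 = Normal(means=[0.0], covs=[1.0], covariance_type='diag')
d1 = Normal(means=[5.0], covs=[1.0], covariance_type='diag')

model = DenseHMM(distributions=[d0, d1],
                 edges=torch.tensor([[0.9, 0.1],
                                     [0.2, 0.8]]),
                 starts=torch.tensor([0.5, 0.5]))

X = torch.tensor([[[0.1], [-0.3], [4.8], [5.2], [0.0]]])   # shape (n=1, l=5, d=1)
logp = model.log_probability(X)
states = model.predict(X)   # most likely hidden state for each observation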

distributions: tuple or list
A set of distribution objects. These objects do not need to be initialized, i.e., can be “Normal()”.
edges: numpy.ndarray, torch.Tensor, or None. shape=(k,k), optional
A dense transition matrix of probabilities describing how each node or distribution passed in connects to each other one. This can contain many zeroes; SparseHMM will drop those entries from its sparse representation.
starts: list, numpy.ndarray, torch.Tensor, or None. shape=(k,), optional
The probability of starting at each node. If not provided, assumes these probabilities are uniform.
ends: list, numpy.ndarray, torch.Tensor, or None. shape=(k,), optional
The probability of ending at each node. If not provided, assumes these probabilities are uniform.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
class pomegranate.hmm.SparseHMM(distributions=None, edges=None, starts=None, ends=None, init='random', max_iter=1000, tol=0.1, sample_length=None, return_sample_paths=False, inertia=0.0, frozen=False, check_data=True, random_state=None, verbose=False)

A hidden Markov model with a sparse transition matrix.

A hidden Markov model is an extension of a mixture model to sequences by including a transition matrix between the elements of the mixture. Each of the algorithms for a hidden Markov model are essentially just a revision of those algorithms to incorporate this transition matrix.

This object is a wrapper for a hidden Markov model with a sparse transition matrix.

There are two ways to instantiate the hidden Markov model. The first is by passing in a set of distributions, a dense transition matrix, and optionally start/end probabilities. The second is to initialize the object without these and then to add edges using the add_edge method and to add distributions using the add_distributions method. Importantly, the way that you choose to initialize the hidden Markov model is independent of the implementation that you end up choosing. If you pass in a dense transition matrix, it will be converted to a sparse representation with all of the zero entries dropped.

distributions: tuple or list
A set of distribution objects. These objects do not need to be initialized, i.e., can be “Normal()”.
edges: numpy.ndarray, torch.Tensor, or None. shape=(k,k)
A dense transition matrix of probabilities for how each node or distribution passed in connects to each other one. This can contain many zeros, which will be dropped when the matrix is converted to its sparse representation.
starts: list, numpy.ndarray, torch.Tensor, or None. shape=(k,)
The probability of starting at each node. If not provided, assumes these probabilities are uniform.
ends: list, numpy.ndarray, torch.Tensor, or None. shape=(k,)
The probability of ending at each node. If not provided, assumes these probabilities are uniform.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
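For illustration, the sketch below passes a dense matrix containing zeros to SparseHMM, which stores only the non-zero edges. The left-to-right structure, Normal emissions, and random data are assumptions made for the example.

>>> import torch
>>> from pomegranate.distributions import Normal
>>> from pomegranate.hmm import SparseHMM
>>> d1, d2, d3 = Normal(), Normal(), Normal()
>>> edges = [[0.5, 0.5, 0.0],
...          [0.0, 0.5, 0.5],
...          [0.0, 0.0, 1.0]]             # zero entries are dropped from the sparse representation
>>> starts = [0.8, 0.1, 0.1]
>>> model = SparseHMM([d1, d2, d3], edges=edges, starts=starts)
>>> X = torch.randn(25, 30, 1)            # shape (n, l, d)
>>> model.fit(X)
>>> logp = model.log_probability(X)       # one log probability per sequence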
class pomegranate.markov_chain.MarkovChain(distributions=None, k=None, n_categories=None, inertia=0.0, frozen=False, check_data=True)

A Markov chain.

A Markov chain is the simplest sequential model: it factorizes the joint probability distribution P(X_{0}, …, X_{t}) along a chain into the product of a marginal distribution P(X_{0}) and conditional distributions P(X_{1} | X_{0}), P(X_{2} | X_{1}, X_{0}), and so on, with a k-th order Markov chain using k conditional probability distributions in which each variable depends on at most the k preceding ones.

Despite sometimes being thought of as a separate type of model, Markov chains are probability distributions over sequences just like hidden Markov models. Because a Markov chain has the same theoretical properties as a probability distribution, it can be used in any situation that a simpler distribution could, such as the emission distribution of an HMM or a component of a Bayes classifier.

distributions: tuple or list or None
A set of distribution objects. These objects do not need to be initialized, i.e., can be “Categorical()”.
k: int or None
The number of conditional distributions to include in the chain, also the number of steps back to model in the sequence. This must be passed in if the distributions are not passed in.
n_categories: list, numpy.ndarray, torch.tensor, or None, shape=(d,)
A vector with the maximum number of categories that each column can have. If not given, this will be inferred from the data. Default is None.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
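As a small sketch, and assuming that sequences are shaped (n, l, d) as with the other sequence models, a second-order chain can be fit directly from categorical data; the synthetic data below is an assumption made for the example.

>>> import torch
>>> from pomegranate.markov_chain import MarkovChain
>>> X = torch.randint(3, (100, 10, 1))    # 100 sequences of length 10, one categorical feature in {0, 1, 2}
>>> model = MarkovChain(k=2)              # P(X_0) P(X_1 | X_0) P(X_t | X_{t-1}, X_{t-2})
>>> model.fit(X)
>>> logp = model.log_probability(X)       # one log probability per sequence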
class pomegranate.bayesian_network.BayesianNetwork(distributions=None, edges=None, structure=None, algorithm=None, include_parents=None, exclude_parents=None, max_parents=None, pseudocount=0.0, max_iter=20, tol=1e-06, inertia=0.0, frozen=False, check_data=True, verbose=False)

A Bayesian network object.

A Bayesian network is a probability distribution where dependencies between variables are explicitly encoded in a graph structure and the lack of an edge represents a conditional independence. These graphs are directed and typically must be acyclic, but this implementation allows for the networks to be cyclic as long as there is no assumption of convergence during inference.

Inference is done using loopy belief propagation along a factor graph representation. This is sometimes called the sum-product algorithm. It will yield exact results if the graph has a tree-like structure. Otherwise, if the graph is acyclic, it is guaranteed to converge but not necessarily to optimal results. If the graph is cyclic, there is no guarantee of convergence, but it is generally thought that the longer the loops are, the more likely one is to get good results.

Structure learning can be done using a variety of methods.

distributions: tuple or list or None
A set of distribution objects. These do not need to be initialized, i.e. can be “Categorical()”. Currently, they must be either Categorical or JointCategorical distributions. If provided, they must be consistent with the provided edges in that every conditional distribution must have at least one parent in the provided structure. Default is None.
edges: tuple or list or None, optional
A list or tuple of 2-tuples where the first element in the 2-tuple is the parent distribution object and the second element is the child distribution object. If None, then no edges. Default is None.
structure: tuple or list or None, optional
A list or tuple of the parents for each distribution with a tuple containing no elements indicating a root node. For instance, ((), (0,), (), (0, 2)) would represent a graph with four nodes, where the second distribution has the first distribution as a parent and the fourth distribution has the first and third distributions as parents. Use this only when you want new distribution objects to be created and fit when using the fit method. Default is None.
max_iter: int, optional
The number of iterations to do in the inference step. Default is 20.
tol: float, optional
The threshold for stopping during fitting: fitting stops once the improvement falls below this value. Default is 1e-6.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during fitting. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
verbose: bool, optional
Whether to print the improvement and timings during training.
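A minimal sketch of fitting a network with a fixed structure to fully observed categorical data follows; the three-variable structure and the random binary data are assumptions made for the example.

>>> import torch
>>> from pomegranate.bayesian_network import BayesianNetwork
>>> X = torch.randint(2, (1000, 3))              # fully observed data, shape (n, d)
>>> structure = ((), (0,), (0, 1))               # 0 is a root; 1 depends on 0; 2 depends on 0 and 1
>>> model = BayesianNetwork(structure=structure)
>>> model.fit(X)                                 # maximum likelihood estimates, since X is fully observed
>>> p = model.probability(X)                     # one probability per example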
class pomegranate.factor_graph.FactorGraph(factors=None, marginals=None, edges=None, max_iter=20, tol=1e-06, inertia=0.0, frozen=False, check_data=True, verbose=False)

A factor graph object.

A factor graph represents a probability distribution as a bipartite graph where marginal distributions of each dimension in the distribution are on one side of the graph and factors are on the other side. The distributions on the factor side encode probability estimates from the model, whereas the distributions on the marginal side encode probability estimates from the data.

Inference is done on the factor graph using the loopy belief propagation algorithm. This is an iterative algorithm where “messages” are passed along each edge between the marginals and the factors until the estimates for the marginals converge. In brief: each message represents what the sending node thinks its marginal distribution is with respect to the receiving node. Calculating each message involves marginalizing the sending node with respect to every other node. When the sending node is already a univariate distribution – either because it is a marginal node or a univariate factor node – no marginalization is needed and it sends itself as the message. Basically, a joint probability table will receive messages from all the marginal nodes that comprise its dimensions and, to each of those marginal nodes, it will send a message back saying what it (the joint probability table) thinks its marginal distribution is with respect to the messages from the OTHER marginals. More concretely, if a joint probability table has two dimensions with marginal node parents A and B, it will send a message to A that is itself after marginalizing out B, and will send a message to B that is itself after marginalizing out A.

.. note:: It is worth noting that this algorithm is exact when the structure is a tree. If there exist any loops in the factors, i.e., you can draw a circle beginning with a factor and then hopping between marginals and factors and make it back to the factor without crossing any edges twice, the probabilities returned are approximate.

factors: tuple or list or None
A set of distribution objects. These do not need to be initialized, i.e. can be “Categorical()”. Currently, they must be either Categorical or JointCategorical distributions. Default is None.
marginals: tuple or list or None
A set of distribution objects. These must be initialized Categorical distributions.
edges: list or tuple or None
A set of edges. Critically, the items in this list must be the distribution objects themselves, and the order of the edges must match the order of the dimensions in a multivariate distribution. Specifically, if you have a multivariate distribution, the first edge that includes it must correspond to the first dimension, the second edge must correspond to the second dimension, etc., and the total number of edges cannot exceed the number of dimensions. Default is None.
max_iter: int, optional
The number of iterations to do in the inference step as distributions are converging. Default is 20.
tol: float, optional
The threshold for stopping during fitting: fitting stops once the improvement falls below this value. Default is 1e-6.
inertia: float, [0, 1], optional
Indicates the proportion of the update to apply to the parameters during training. When the inertia is 0.0, the update is applied in its entirety and the previous parameters are ignored. When the inertia is 1.0, the update is entirely ignored and the previous parameters are kept, as if the parameters were frozen.
frozen: bool, optional
Whether all the parameters associated with this distribution are frozen. If you want to freeze individual parameters, or individual values in those parameters, you must modify the frozen attribute of the tensor or parameter directly. Default is False.
check_data: bool, optional
Whether to check properties of the data and potentially recast it to torch.tensors. This does not prevent checking of parameters but can slightly speed up computation when you know that your inputs are valid. Setting this to False is also necessary for compiling. Default is True.
verbose: bool, optional
Whether to print the improvement and timings during training.
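As a sketch only: the snippet below constructs a two-variable factor graph from one joint factor and two initialized marginals. The (marginal, factor) ordering of each edge tuple is an assumption made here by analogy with the (parent, child) edges of BayesianNetwork, and the probability values are illustrative.

>>> import torch
>>> from pomegranate.distributions import Categorical, JointCategorical
>>> from pomegranate.factor_graph import FactorGraph
>>> m1 = Categorical([[0.5, 0.5]])               # initialized marginal over one binary variable
>>> m2 = Categorical([[0.5, 0.5]])
>>> f = JointCategorical([[0.1, 0.3],
...                       [0.2, 0.4]])           # joint factor over both variables
>>> edges = [(m1, f), (m2, f)]                   # assumed (marginal, factor) order; edge order follows the factor's dimensions
>>> model = FactorGraph(factors=[f], marginals=[m1, m2], edges=edges)
>>> model.probability(torch.tensor([[0, 1], [1, 1]]))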