Bayes Classifiers and Naive Bayes¶
Bayes classifiers are simple probabilistic classification models based on Bayes' theorem. See the above tutorial for a full primer on how they work, and on the distinction between a naive Bayes classifier and a Bayes classifier. Essentially, each class is modeled by a probability distribution, and classifications are made according to which distribution fits the data best. They are a supervised version of general mixture models, in that the predict, predict_proba, and predict_log_proba methods return the same values for the same underlying distributions, but instead of using expectation-maximization to fit to new data they can use the provided labels directly.
Initialization¶
Bayes classifiers and naive Bayes can both be initialized in one of two ways, depending on whether you know the parameters of the model beforehand: (1) passing a list of pre-initialized distributions to the model, or (2) using the from_samples class method to initialize the model directly from data. For naive Bayes models on multivariate data, the pre-initialized distributions must be a list of IndependentComponentsDistribution objects, since each dimension is modeled independently of the others. For Bayes classifiers on multivariate data, a list of any type of multivariate distribution can be provided. For univariate data the two models produce identical results, and both can be passed a list of univariate distributions. For example:
from pomegranate import *
d1 = IndependentComponentsDistribution([NormalDistribution(5, 2), NormalDistribution(6, 1), NormalDistribution(9, 1)])
d2 = IndependentComponentsDistribution([NormalDistribution(2, 1), NormalDistribution(8, 1), NormalDistribution(5, 1)])
d3 = IndependentComponentsDistribution([NormalDistribution(3, 1), NormalDistribution(5, 3), NormalDistribution(4, 1)])
model = NaiveBayes([d1, d2, d3])
would create a three-class naive Bayes classifier that models data with three dimensions. Alternatively, we can initialize a Bayes classifier in the following manner:
from pomegranate import *
# NormalDistribution takes a standard deviation, so the matching variances are its squares (2**2 = 4, 3**2 = 9).
d1 = MultivariateGaussianDistribution([5, 6, 9], [[4, 0, 0], [0, 1, 0], [0, 0, 1]])
d2 = MultivariateGaussianDistribution([2, 8, 5], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
d3 = MultivariateGaussianDistribution([3, 5, 4], [[1, 0, 0], [0, 9, 0], [0, 0, 1]])
model = BayesClassifier([d1, d2, d3])
The two examples above functionally create the same model, as the Bayes classifier uses multivariate Gaussian distributions with the same means and a diagonal covariance matrix containing only the variances. However, if we were to fit these models to data later on, the Bayes classifier would learn a full covariance matrix while the naive Bayes would only learn the diagonal.
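The relationship between independent normals and a diagonal covariance matrix can be checked numerically. The following is a minimal sketch in plain Python, not pomegranate code: the helper names `normal_logpdf` and `mvn2_logpdf` are hypothetical, the example is two-dimensional for brevity, and it assumes the second parameter of a normal distribution is a standard deviation, so the matching variance is its square.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a univariate normal with mean mu and standard deviation sigma."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def mvn2_logpdf(x, mu, cov):
    """Log density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

x = (4.0, 7.0)
mu = (5.0, 6.0)
sigmas = (2.0, 1.0)

# Naive Bayes view: sum of independent per-dimension log densities.
independent = sum(normal_logpdf(xi, mi, si) for xi, mi, si in zip(x, mu, sigmas))
# Bayes classifier view with a diagonal covariance (variances = sigma ** 2).
diagonal = mvn2_logpdf(x, mu, [[sigmas[0] ** 2, 0.0], [0.0, sigmas[1] ** 2]])
# A full covariance with a non-zero off-diagonal term gives a different density.
correlated = mvn2_logpdf(x, mu, [[4.0, 1.0], [1.0, 1.0]])

print(independent, diagonal, correlated)
```

The first two values agree, while the third differs, which is the extra flexibility a Bayes classifier gains once a full covariance matrix is learned.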
If we instead wish to initialize our model directly from data, we use the from_samples class method.
from pomegranate import *
import numpy
X = numpy.load('data.npy')
y = numpy.load('labels.npy')
model = NaiveBayes.from_samples(NormalDistribution, X, y)
This would create a naive Bayes model directly from the data, with normal distributions modeling each of the dimensions and a number of components equal to the number of classes in y. Alternatively, if we wanted to create a model with different distributions for each dimension, we can do the following:
>>> model = NaiveBayes.from_samples([NormalDistribution, ExponentialDistribution], X, y)
This assumes that your data is two-dimensional and that you want to model the first dimension as a normal distribution and the second dimension as an exponential distribution.
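To make concrete what fitting from labeled data involves, here is a minimal sketch in plain Python of a Gaussian naive Bayes fit: partition the rows by label, take maximum likelihood estimates of the mean and standard deviation per dimension, and use class frequencies as priors. The function names are hypothetical and this is not pomegranate's implementation; it assumes every class has at least two distinct values per dimension so that no standard deviation is zero.

```python
import math
from collections import defaultdict

def fit_gaussian_naive_bayes(X, y):
    """Return {label: (prior, [(mean, std) per dimension])} from labeled rows."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        params = []
        for column in zip(*rows):
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / n  # MLE: divide by n
            params.append((mean, math.sqrt(var)))
        model[label] = (n / len(X), params)
    return model

def predict_proba(model, row):
    """Posterior over labels for one row, via Bayes' rule."""
    joint = {}
    for label, (prior, params) in model.items():
        log_p = math.log(prior)
        for value, (mean, std) in zip(row, params):
            log_p += -math.log(std * math.sqrt(2 * math.pi)) - (value - mean) ** 2 / (2 * std ** 2)
        joint[label] = math.exp(log_p)
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}

X = [[0], [2], [0], [1], [0], [5], [6], [5], [7], [6]]
y = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
model = fit_gaussian_naive_bayes(X, y)
posterior = predict_proba(model, [6])
print(posterior)
```

The data here is the small univariate data set used in the NaiveBayes examples of the API reference, and the posterior for the value 6 comes out close to the probabilities reported there.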
We can do pretty much the same thing with Bayes classifiers, except passing in a more complex model.
>>> model = BayesClassifier.from_samples(MultivariateGaussianDistribution, X, y)
One can use much more complex models than just a multivariate Gaussian with a full covariance matrix when using a Bayes classifier. Specifically, you can also have your distributions be general mixture models, hidden Markov models, and Bayesian networks. For example:
>>> model = BayesClassifier.from_samples(BayesianNetwork, X, y)
That would currently require that the data be discrete-valued only, and the structure learning task may take too long if not configured appropriately. However, it is possible. Currently, one cannot simply pass in GeneralMixtureModel or HiddenMarkovModel, despite them having a from_samples method, because there is a great deal of flexibility in terms of the structure or emission distributions. The easiest way to set up one of these more complex models is to build each of the components separately and then feed them into the Bayes classifier using the first initialization method.
>>> d1 = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, n_components=5, X=X[y==0])
>>> d2 = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, n_components=5, X=X[y==1])
>>> model = BayesClassifier([d1, d2])
Prediction¶
Bayes classifiers and naive Bayes support the same three prediction methods that the other models support: predict, predict_proba, and predict_log_proba. These methods return the most likely class given the data (argmax_M P(M|D)), the probability of each class given the data (P(M|D)), and the log probability of each class given the data (log P(M|D)). It is best to always pass in a 2D matrix, even for univariate data, where it would have a shape of (n, 1).
The predict method takes in samples and returns the most likely class given the data.
from pomegranate import *
import numpy as np
model = NaiveBayes([NormalDistribution(5, 2), UniformDistribution(0, 10), ExponentialDistribution(1.0)])
model.predict(np.array([[0], [1], [2], [3], [4]]))
[2, 2, 2, 0, 0]
Calling predict_proba on five samples for a naive Bayes with univariate components would look like the following.
from pomegranate import *
import numpy as np
model = NaiveBayes([NormalDistribution(5, 2), UniformDistribution(0, 10), ExponentialDistribution(1)])
model.predict_proba(np.array([[0], [1], [2], [3], [4]]))
[[ 0.00790443 0.09019051 0.90190506]
[ 0.05455011 0.20207126 0.74337863]
[ 0.21579499 0.33322883 0.45097618]
[ 0.44681566 0.36931382 0.18387052]
[ 0.59804205 0.33973357 0.06222437]]
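The posteriors above follow directly from Bayes' rule with equal priors over the three components. The sketch below recomputes them in plain Python without pomegranate; the density helpers are hypothetical names written for this illustration, and the exponential distribution is assumed to be parameterized by its rate.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def uniform_pdf(x, low, high):
    return 1.0 / (high - low) if low <= x <= high else 0.0

def exponential_pdf(x, rate):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# One density per class, in the same order as the model above.
components = [
    lambda x: normal_pdf(x, 5, 2),
    lambda x: uniform_pdf(x, 0, 10),
    lambda x: exponential_pdf(x, 1.0),
]
priors = [1 / 3, 1 / 3, 1 / 3]  # uniform priors when no weights are given

def posterior(x):
    """P(M|D) for each component: prior times likelihood, normalized."""
    joint = [p * f(x) for p, f in zip(priors, components)]
    total = sum(joint)
    return [j / total for j in joint]

rows = [posterior(x) for x in [0, 1, 2, 3, 4]]
# predict is the argmax over each row of posteriors.
predictions = [max(range(len(r)), key=r.__getitem__) for r in rows]

for r in rows:
    print(["%.8f" % p for p in r])
print(predictions)
```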
Multivariate models work the same way.
from pomegranate import *
import numpy as np
d1 = MultivariateGaussianDistribution([5, 5], [[1, 0], [0, 1]])
d2 = IndependentComponentsDistribution([NormalDistribution(5, 2), NormalDistribution(5, 2)])
model = BayesClassifier([d1, d2])
model.predict_proba(np.array([[0, 4],
                              [1, 3],
                              [2, 2],
                              [3, 1],
                              [4, 0]]))
array([[ 0.00023312, 0.99976688],
[ 0.00220745, 0.99779255],
[ 0.00466169, 0.99533831],
[ 0.00220745, 0.99779255],
[ 0.00023312, 0.99976688]])
predict_log_proba works the same way, returning the log probabilities instead of the probabilities.
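Working in log space avoids underflow when densities are very small. A minimal sketch of how normalized log posteriors can be computed with the log-sum-exp trick is shown below; it uses two univariate normal components with equal priors, and `normal_logpdf`, `logsumexp`, and this standalone `predict_log_proba` are hypothetical helpers written for the illustration, not pomegranate functions.

```python
import math

def normal_logpdf(x, mu, sigma):
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

params = [(1, 2), (0, 1)]           # (mean, standard deviation) per component
log_priors = [math.log(0.5)] * 2    # equal priors

def predict_log_proba(x):
    """log P(M|D): joint log probability minus the log of the evidence."""
    joint = [lp + normal_logpdf(x, mu, s) for lp, (mu, s) in zip(log_priors, params)]
    norm = logsumexp(joint)
    return [j - norm for j in joint]

log_post = [predict_log_proba(x) for x in [0, 1, 2, -1]]
for row in log_post:
    print(row)
```

Exponentiating each row gives the probabilities that predict_proba would return, and each exponentiated row sums to 1.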
Fitting¶
Both naive Bayes and Bayes classifiers also have a fit method that updates the parameters of the model based on new data. The major difference between these methods and the others presented is that these are supervised methods and so need to be passed labels in addition to data. This change also propagates to the summarize method, where labels are provided as well.
from pomegranate import *
import numpy as np
d1 = MultivariateGaussianDistribution([5, 5], [[1, 0], [0, 1]])
d2 = IndependentComponentsDistribution([NormalDistribution(5, 2), NormalDistribution(5, 2)])
model = BayesClassifier([d1, d2])
X = np.array([[6.0, 5.0],
              [3.5, 4.0],
              [7.5, 1.5],
              [7.0, 7.0]])
y = np.array([0, 0, 1, 1])
model.fit(X, y)
As we can see, there are four samples, with the first two samples labeled as class 0 and the last two samples labeled as class 1. Keep in mind that the training samples must match the input requirements for the models used. So if using a univariate distribution, each sample must contain one item; for a bivariate distribution, two. For hidden Markov models, the sample can be a list of observations of any length. An example using hidden Markov models would be the following.
d1 = HiddenMarkovModel...
d2 = HiddenMarkovModel...
d3 = HiddenMarkovModel...
model = BayesClassifier([d1, d2, d3])
# A plain list is used because the sequences have different lengths.
X = [list('HHHHHTHTHTTTTH'),
     list('HHTHHTTHHHHHTH'),
     list('TH'),
     list('HHHHT')]
y = np.array([2, 2, 1, 0])
model.fit(X, y)
API Reference¶

class pomegranate.NaiveBayes.NaiveBayes
¶ A naive Bayes model, a supervised alternative to GMM.
A naive Bayes classifier treats each dimension independently of the others. It is a simpler version of the Bayes classifier, which can use any distribution with any covariance structure, including Bayesian networks and hidden Markov models.
Parameters:  models : list
A list of initialized distributions.
 weights : list or numpy.ndarray or None, default None
The prior probabilities of the components. If None is passed in then defaults to the uniformly distributed priors.
Examples
>>> from pomegranate import *
>>> X = [0, 2, 0, 1, 0, 5, 6, 5, 7, 6]
>>> y = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
>>> clf = NaiveBayes.from_samples(NormalDistribution, X, y)
>>> clf.predict_proba([6])
array([[0.01973451, 0.98026549]])

>>> from pomegranate import *
>>> clf = NaiveBayes([NormalDistribution(1, 2), NormalDistribution(0, 1)])
>>> clf.predict_log_proba([[0], [1], [2], [-1]])
array([[-1.1836569 , -0.36550972],
       [-0.79437677, -0.60122959],
       [-0.26751248, -1.4493653 ],
       [-1.09861229, -0.40546511]])
Attributes:  models : list
The model objects, either initialized by the user or fit to data.
 weights : numpy.ndarray
The prior probability of each component of the model.

clear_summaries
()¶ Remove the stored sufficient statistics.
Parameters:  None
Returns:  None

copy
()¶ Return a deep copy of this distribution object.
This object will not be tied to any other distribution or connected in any form.
Parameters:  None
Returns:  distribution : Distribution
A copy of the distribution with the same parameters.

fit
()¶ Fit the Bayes classifier to the data by passing data to its components.
The fit step for a Bayes classifier with purely labeled data is a simple MLE update on the underlying distributions, grouped by the labels. However, in the semi-supervised setting the model is trained on a mixture of both labeled and unlabeled data, where the unlabeled data uses the label -1. In this setting, EM is used to train the model. The model is initialized using the labeled data, and then sufficient statistics are gathered for both the labeled and unlabeled data, combined, and used to update the parameters.
Parameters:  X : numpy.ndarray or list
The dataset to operate on. For most models this is a numpy array with columns corresponding to features and rows corresponding to samples. For Markov chains and HMMs this will be a list of variable-length sequences.
 y : numpy.ndarray or list or None
Data labels for supervised training algorithms.
 weights : array-like or None, shape (n_samples,), optional
The initial weights of each sample in the matrix. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
 inertia : double, optional
Inertia used for training the distributions.
 pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. Default is 0.
 stop_threshold : double, optional, positive
The threshold at which EM will terminate for the improvement of the model. If the model does not improve its fit of the data by at least this log probability, training terminates. Only used in semi-supervised learning. Default is 0.1.
 max_iterations : int, optional, positive
The maximum number of iterations to run EM for. If this limit is hit then it will terminate training, regardless of how well the model is improving per iteration. Only used in semi-supervised learning. Default is 1e8.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training. Only used for semi-supervised learning.
 return_history : bool, optional
Whether to return the history during training as well as the model. Only used for semi-supervised learning.
 verbose : bool, optional
Whether or not to print out improvement information over iterations. Only used in semi-supervised learning. Default is False.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
Returns:  self : object
Returns the fitted model
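The semi-supervised procedure described for fit can be sketched for one-dimensional Gaussian classes as follows. This is a simplified illustration in plain Python, not pomegranate's implementation: the function names are hypothetical, unlabeled samples are assumed to carry the label -1, every class is assumed to have at least one labeled sample, and the stopping rule mirrors the stop_threshold and max_iterations parameters.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def m_step(X, resp, n_classes, min_std=1e-3):
    """Weighted MLE of (prior, mean, std) per class from responsibilities."""
    total = sum(sum(r) for r in resp)
    params = []
    for k in range(n_classes):
        w = [r[k] for r in resp]
        wk = sum(w)
        mean = sum(wi * x for wi, x in zip(w, X)) / wk
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, X)) / wk
        params.append((wk / total, mean, max(math.sqrt(var), min_std)))
    return params

def fit_semi_supervised(X, y, n_classes, stop_threshold=0.1, max_iterations=100):
    # Initialize from the labeled samples only (one-hot responsibilities).
    labeled = [(x, c) for x, c in zip(X, y) if c != -1]
    init = [[1.0 if c == k else 0.0 for k in range(n_classes)] for _, c in labeled]
    params = m_step([x for x, _ in labeled], init, n_classes)
    previous = -math.inf
    for _ in range(max_iterations):
        resp, log_likelihood = [], 0.0
        for x, c in zip(X, y):
            joint = [prior * normal_pdf(x, mu, sigma) for prior, mu, sigma in params]
            if c == -1:
                # E-step: unlabeled samples get a posterior over the classes.
                total = sum(joint)
                resp.append([j / total for j in joint])
                log_likelihood += math.log(total)
            else:
                # Labeled samples keep their known class.
                resp.append([1.0 if c == k else 0.0 for k in range(n_classes)])
                log_likelihood += math.log(joint[c])
        # M-step on labeled and unlabeled statistics combined.
        params = m_step(X, resp, n_classes)
        if log_likelihood - previous < stop_threshold:
            break
        previous = log_likelihood
    return params

X = [0.0, 1.0, 9.0, 10.0, 0.5, 1.5, 9.5, 8.5]
y = [0, 0, 1, 1, -1, -1, -1, -1]
params = fit_semi_supervised(X, y, n_classes=2)
print(params)
```

With these well-separated clusters the unlabeled points are assigned almost entirely to the nearer class, so the fitted means end up at the averages of each cluster.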

freeze
()¶ Freeze the distribution, preventing updates from occurring.

from_json
()¶ Deserialize this object from its JSON representation.

from_samples
()¶ Create a naive Bayes classifier directly from the given dataset.
This will initialize the distributions using maximum likelihood estimates derived by partitioning the dataset using the label vector. If any labels are missing, the model will be trained using EM in a semi-supervised setting.
A homogeneous model can be defined by passing in a single distribution callable as the first parameter and specifying the number of components, while a heterogeneous model can be defined by passing in a list of callables of the appropriate type.
A naive Bayes classifier is a subset of the Bayes classifier in that the math is identical, but the distributions are independent for each feature. Simply put, one can create a multivariate Gaussian Bayes classifier with a full covariance matrix, but a Gaussian naive Bayes would require a diagonal covariance matrix.
Parameters:  distributions : array-like, shape (n_components,) or callable
The components of the model. This should either be a single callable if all components will be the same distribution, or an array of callables, one for each feature.
 X : array-like or generator, shape (n_samples, n_dimensions)
This is the data to train on. Each row is a sample, and each column is a dimension to train on.
 y : array-like, shape (n_samples,)
The labels for each sample. The labels should be integers between 0 and k-1 for a problem with k classes, or -1 if the label is not known for that sample.
 weights : array-like, shape (n_samples,), optional
The initial weights of each sample in the matrix. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
 pseudocount : double, optional, positive
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. Only affects mixture models defined over discrete distributions. Default is 0.
 stop_threshold : double, optional, positive
The threshold at which EM will terminate for the improvement of the model. If the model does not improve its fit of the data by at least this log probability, training terminates. Only used in semi-supervised learning. Default is 0.1.
 max_iterations : int, optional, positive
The maximum number of iterations to run EM for. If this limit is hit then it will terminate training, regardless of how well the model is improving per iteration. Only used in semi-supervised learning. Default is 1e8.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training.
 return_history : bool, optional
Whether to return the history during training as well as the model.
 verbose : bool, optional
Whether or not to print out improvement information over iterations. Only used in semi-supervised learning. Default is False.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. Default is 1.
Returns:  model : NaiveBayes
The fit naive Bayes model.

from_summaries
()¶ Fit the model to the collected sufficient statistics.
Fit the parameters of the model to the sufficient statistics gathered during the summarize calls. This should return an exact update.
Parameters:  inertia : double, optional
The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param*(1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.
 pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. If discrete data, will smooth both the prior probabilities of each component and the emissions of each component. Otherwise, will only smooth the prior probabilities of each component. Default is 0.
Returns:  None
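The inertia rule is a convex combination of the old and new parameter values. A minimal sketch follows, where `apply_inertia` is a hypothetical helper name and a single scalar stands in for a model parameter:

```python
def apply_inertia(old_param, new_param, inertia=0.0):
    """Blend a previous parameter value with a newly estimated one."""
    if not 0.0 <= inertia <= 1.0:
        raise ValueError("inertia must be between 0 and 1")
    return old_param * inertia + new_param * (1.0 - inertia)

old_mean, new_mean = 5.0, 3.0
ignore_old = apply_inertia(old_mean, new_mean, inertia=0.0)  # takes the new estimate
ignore_new = apply_inertia(old_mean, new_mean, inertia=1.0)  # keeps the old value
halfway = apply_inertia(old_mean, new_mean, inertia=0.5)     # midpoint of the two

print(ignore_old, ignore_new, halfway)
```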

from_yaml
()¶ Deserialize this object from its YAML representation.

log_probability
()¶ Calculate the log probability of a point under the distribution.
The probability of a point is the sum of the probabilities of each distribution multiplied by the weights. Thus, the log probability is the log of the sum, over components, of the exponentiated log probability plus log prior.
This is the python interface.
Parameters:  X : numpy.ndarray, shape=(n, d) or (n, m, d)
The samples to calculate the log probability of. Each row is a sample and each column is a dimension. If emissions are HMMs then shape is (n, m, d) where m is variable length for each observation, and X becomes an array of n (m, d)-shaped arrays.
 n_jobs : int, optional
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  log_probability : double
The log probability of the point under the distribution.

predict
()¶ Predict the most likely component which generated each sample.
Calculate the posterior P(M|D) for each sample and return the index of the component most likely to fit it. This corresponds to a simple argmax over the responsibility matrix.
This is a sklearn wrapper for the maximum_a_posteriori method.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  y : array-like, shape (n_samples,)
The predicted component which fits the sample the best.

predict_log_proba
()¶ Calculate the posterior log P(M|D) for data.
Calculate the log probability of each item having been generated from each component in the model. This returns normalized log probabilities such that the exponentiated values in each row sum to 1.
This is a sklearn wrapper for the original posterior function.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  y : array-like, shape (n_samples, n_components)
The normalized log probability log P(M|D) for each sample. This is the log probability that the sample was generated from each component.

predict_proba
()¶ Calculate the posterior P(M|D) for data.
Calculate the probability of each item having been generated from each component in the model. This returns normalized probabilities such that each row should sum to 1.
Since calculating the log probability is much faster, this is just a wrapper which exponentiates the log probability matrix.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  probability : array-like, shape (n_samples, n_components)
The normalized probability P(M|D) for each sample. This is the probability that the sample was generated from each component.

probability
()¶ Return the probability of the given symbol under this distribution.
Parameters:  symbol : object
The symbol to calculate the probability of
Returns:  probability : double
The probability of that point under the distribution.

sample
()¶ Generate a sample from the model.
First, randomly select a component weighted by the prior probability. Then, use the sample method from that component to generate a sample.
Parameters:  n : int, optional
The number of samples to generate. Defaults to 1.
 random_state : int, numpy.random.RandomState, or None
The random state used for generating samples. If set to None, a random seed will be used. If set to either an integer or a random state, will produce deterministic outputs.
Returns:  sample : array-like or object
A randomly generated sample from the model of the type modelled by the emissions. An integer if using most distributions, or an array if using multivariate ones, or a string for most discrete distributions. If n=1 return an object, if n>1 return an array of the samples.
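The two-step sampling procedure can be sketched with the standard library as follows. The function `sample_from_classifier` and the per-component sampler callables are hypothetical stand-ins for the model and its components; only the structure (pick a component by prior, then draw from it) is taken from the description above.

```python
import random

def sample_from_classifier(weights, samplers, n=1, random_state=None):
    """Draw n samples: choose a component by its prior, then sample from it."""
    rng = random.Random(random_state)
    draws = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append(samplers[k](rng))
    # Mirror the documented behaviour: a single object for n=1, a list otherwise.
    return draws[0] if n == 1 else draws

weights = [0.25, 0.75]
samplers = [
    lambda rng: rng.gauss(0.0, 1.0),    # component 0
    lambda rng: rng.gauss(100.0, 1.0),  # component 1, far from component 0
]
samples = sample_from_classifier(weights, samplers, n=1000, random_state=0)
# Fraction of draws that came from the second component.
share_second = sum(s > 50.0 for s in samples) / len(samples)
print(share_second)
```

Passing the same integer as random_state reproduces the same samples, and the share of draws from each component approaches its prior as n grows.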

score
()¶ Return the accuracy of the model on a data set.
Parameters:  X : numpy.ndarray, shape=(n, d)
The values of the data set
 y : numpy.ndarray, shape=(n,)
The labels of each value

summarize
()¶ Summarize data into stored sufficient statistics for out-of-core training.
Parameters:  X : array-like, shape (n_samples, variable)
Array of the samples, which can be either fixed size or variable depending on the underlying components.
 y : array-like, shape (n_samples,)
Array of the known labels as integers
 weights : array-like, shape (n_samples,) optional
Array of the weight of each sample, a positive float
Returns:  None
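Out-of-core training works because the maximum likelihood estimates depend on the data only through a few running sums. The sketch below illustrates the idea for one-dimensional Gaussian classes in plain Python; the class `GaussianClassSummaries` is hypothetical and only mimics the summarize and from_summaries pattern, it is not pomegranate's implementation.

```python
import math

class GaussianClassSummaries:
    """Per-class running sums for a one-dimensional Gaussian classifier."""

    def __init__(self, n_classes):
        # For each class: total weight, weighted sum of x, weighted sum of x^2.
        self.stats = [[0.0, 0.0, 0.0] for _ in range(n_classes)]

    def summarize(self, X, y, weights=None):
        weights = [1.0] * len(X) if weights is None else weights
        for x, label, w in zip(X, y, weights):
            s = self.stats[label]
            s[0] += w
            s[1] += w * x
            s[2] += w * x * x

    def from_summaries(self):
        """Return (prior, mean, std) per class and clear the stored sums."""
        total = sum(s[0] for s in self.stats)
        params = []
        for w, sx, sxx in self.stats:
            mean = sx / w
            var = max(sxx / w - mean ** 2, 0.0)
            params.append((w / total, mean, math.sqrt(var)))
        self.stats = [[0.0, 0.0, 0.0] for _ in self.stats]
        return params

X = [0.0, 2.0, 5.0, 7.0, 1.0, 6.0]
y = [0, 0, 1, 1, 0, 1]

batched = GaussianClassSummaries(2)
batched.summarize(X[:3], y[:3])   # first batch
batched.summarize(X[3:], y[3:])   # second batch
params_batched = batched.from_summaries()

full = GaussianClassSummaries(2)
full.summarize(X, y)              # all data at once
params_full = full.from_summaries()

print(params_batched)
```

Feeding the data in two batches gives the same parameters as a single pass, which is why the update from summaries can be exact.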

thaw
()¶ Thaw the distribution, reallowing updates to occur.

to_json
()¶ Serialize the model to JSON.
Parameters:  separators : tuple, optional
The two separators to pass to the json.dumps function for formatting. Default is (',', ' : ').
 indent : int, optional
The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.
Returns:  json : str
A properly formatted JSON object.
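The effect of the separators and indent arguments can be seen with the standard json module alone. In the sketch below, the dictionary layout is a made-up stand-in for a serialized model and is not pomegranate's actual JSON schema; only the json.dumps arguments correspond to the parameters described above.

```python
import json

# Hypothetical model state, used only to demonstrate the formatting options.
model_state = {
    "class": "NaiveBayes",
    "weights": [0.5, 0.5],
    "models": [
        {"name": "NormalDistribution", "parameters": [5.0, 2.0]},
        {"name": "NormalDistribution", "parameters": [2.0, 1.0]},
    ],
}

# The documented defaults: item separator ',', key separator ' : ', indent of 4.
text = json.dumps(model_state, separators=(",", " : "), indent=4)
restored = json.loads(text)

print(text)
```

The formatting choices affect only the string layout; parsing the text back yields the same data.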

to_yaml
()¶ Serialize the model to YAML for compactness.

class pomegranate.BayesClassifier.BayesClassifier
¶ A Bayes classifier, a more general form of a naive Bayes classifier.
A Bayes classifier, like a naive Bayes classifier, uses Bayes’ rule in order to calculate the posterior probability of the classes, which are used for the predictions. However, a naive Bayes classifier assumes that each of the features are independent of each other and so can be modelled as independent distributions. A generalization of that, the Bayes classifier, allows for an arbitrary covariance between the features. This allows for more complicated components to be used, up to and including even HMMs to form a classifier over sequences, or mixtures to form a classifier with complex emissions.
Parameters:  models : list
A list of initialized distribution objects to use as the components in the model.
 weights : list or numpy.ndarray or None, default None
The prior probabilities of the components. If None is passed in then defaults to the uniformly distributed priors.
Examples
>>> from pomegranate import *
>>>
>>> d1 = NormalDistribution(3, 2)
>>> d2 = NormalDistribution(5, 1.5)
>>>
>>> clf = BayesClassifier([d1, d2])
>>> clf.predict_proba([[6]])
array([[ 0.2331767, 0.7668233]])
>>> X = [[0], [2], [0], [1], [0], [5], [6], [5], [7], [6]]
>>> y = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
>>> clf.fit(X, y)
>>> clf.predict_proba([[6]])
array([[ 0.01973451, 0.98026549]])
Attributes:  models : list
The model objects, either initialized by the user or fit to data.
 weights : numpy.ndarray
The prior probability of each component of the model.

clear_summaries
()¶ Remove the stored sufficient statistics.
Parameters:  None
Returns:  None

copy
()¶ Return a deep copy of this distribution object.
This object will not be tied to any other distribution or connected in any form.
Parameters:  None
Returns:  distribution : Distribution
A copy of the distribution with the same parameters.

fit
()¶ Fit the Bayes classifier to the data by passing data to its components.
The fit step for a Bayes classifier with purely labeled data is a simple MLE update on the underlying distributions, grouped by the labels. However, in the semi-supervised setting the model is trained on a mixture of both labeled and unlabeled data, where the unlabeled data uses the label -1. In this setting, EM is used to train the model. The model is initialized using the labeled data, and then sufficient statistics are gathered for both the labeled and unlabeled data, combined, and used to update the parameters.
Parameters:  X : numpy.ndarray or list
The dataset to operate on. For most models this is a numpy array with columns corresponding to features and rows corresponding to samples. For Markov chains and HMMs this will be a list of variable-length sequences.
 y : numpy.ndarray or list or None
Data labels for supervised training algorithms.
 weights : array-like or None, shape (n_samples,), optional
The initial weights of each sample in the matrix. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
 inertia : double, optional
Inertia used for training the distributions.
 pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. Default is 0.
 stop_threshold : double, optional, positive
The threshold at which EM will terminate for the improvement of the model. If the model does not improve its fit of the data by at least this log probability, training terminates. Only used in semi-supervised learning. Default is 0.1.
 max_iterations : int, optional, positive
The maximum number of iterations to run EM for. If this limit is hit then it will terminate training, regardless of how well the model is improving per iteration. Only used in semi-supervised learning. Default is 1e8.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training. Only used for semi-supervised learning.
 return_history : bool, optional
Whether to return the history during training as well as the model. Only used for semi-supervised learning.
 verbose : bool, optional
Whether or not to print out improvement information over iterations. Only used in semi-supervised learning. Default is False.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
Returns:  self : object
Returns the fitted model

freeze
()¶ Freeze the distribution, preventing updates from occurring.

from_json
()¶ Deserialize this object from its JSON representation.

from_samples
()¶ Create a Bayes classifier directly from the given dataset.
This will initialize the distributions using maximum likelihood estimates derived by partitioning the dataset using the label vector. If any labels are missing, the model will be trained using EM in a semi-supervised setting.
A homogeneous model can be defined by passing in a single distribution callable as the first parameter and specifying the number of components, while a heterogeneous model can be defined by passing in a list of callables of the appropriate type.
A Bayes classifier is a superset of the naive Bayes classifier in that the math is identical, but the distributions used do not have to be independent for each feature. Simply put, one can create a multivariate Gaussian Bayes classifier with a full covariance matrix, but a Gaussian naive Bayes would require a diagonal covariance matrix.
Parameters:  distributions : array-like, shape (n_components,) or callable
The components of the model. This should either be a single callable if all components will be the same distribution, or an array of callables, one for each feature.
 X : array-like, shape (n_samples, n_dimensions)
This is the data to train on. Each row is a sample, and each column is a dimension to train on.
 y : array-like, shape (n_samples,)
The labels for each sample. The labels should be integers between 0 and k-1 for a problem with k classes, or -1 if the label is not known for that sample.
 weights : array-like, shape (n_samples,), optional
The initial weights of each sample in the matrix. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
 inertia : double, optional
Inertia used for training the distributions.
 pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. Default is 0.
 stop_threshold : double, optional, positive
The threshold at which EM will terminate for the improvement of the model. If the model does not improve its fit of the data by at least this log probability, training terminates. Only used in semi-supervised learning. Default is 0.1.
 max_iterations : int, optional, positive
The maximum number of iterations to run EM for. If this limit is hit then it will terminate training, regardless of how well the model is improving per iteration. Only used in semi-supervised learning. Default is 1e8.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training.
 return_history : bool, optional
Whether to return the history during training as well as the model.
 keys : list
A list of sets where each set is the keys present in that column. If there are d columns in the data set then this list should have d sets and each set should have at least two keys in it.
 verbose : bool, optional
Whether or not to print out improvement information over iterations. Only used in semi-supervised learning. Default is False.
 n_jobs : int, optional
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 **kwargs : dict, optional
Any arguments to pass into the from_samples methods of other objects that are being created such as BayesianNetworks or HMMs.
Returns:  model : BayesClassifier
The fit Bayes classifier model.

from_summaries
()¶ Fit the model to the collected sufficient statistics.
Fit the parameters of the model to the sufficient statistics gathered during the summarize calls. This should return an exact update.
Parameters:  inertia : double, optional
The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param*(1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.
 pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don't happen to occur in the data. If discrete data, will smooth both the prior probabilities of each component and the emissions of each component. Otherwise, will only smooth the prior probabilities of each component. Default is 0.
Returns:  None
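As a rough illustration of the inertia formula above, the following sketch blends a single scalar parameter. The helper blend and the example values are hypothetical and are not part of the library; the sketch only mirrors the documented formula.

```python
def blend(old_param, new_param, inertia=0.0):
    """Blend a previous parameter with a freshly estimated one.

    Hypothetical helper mirroring the documented formula:
    old_param * inertia + new_param * (1 - inertia).
    """
    if not 0.0 <= inertia <= 1.0:
        raise ValueError("inertia must be in [0, 1]")
    return old_param * inertia + new_param * (1.0 - inertia)


# Illustrative values only.
old_mean, new_mean = 5.0, 7.0
ignore_old = blend(old_mean, new_mean, inertia=0.0)  # keeps only the new estimate
ignore_new = blend(old_mean, new_mean, inertia=1.0)  # keeps only the old parameter
halfway = blend(old_mean, new_mean, inertia=0.5)     # equal weight on both
```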

from_yaml
()¶ Deserialize this object from its YAML representation.

log_probability
()¶ Calculate the log probability of a point under the distribution.
The probability of a point is the sum over components of the probability under each distribution multiplied by that component's weight. Thus, the log probability is the log of the sum, over components, of the exponentiated log probability plus log prior.
This is the Python interface.
Parameters:  X : numpy.ndarray, shape=(n, d) or (n, m, d)
The samples to calculate the log probability of. Each row is a sample and each column is a dimension. If emissions are HMMs then shape is (n, m, d) where m is variable length for each observation, and X becomes an array of n (m, d)-shaped arrays.
 n_jobs : int, optional
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  log_probability : double
The log probability of the point under the distribution.
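The weighted sum described above is usually evaluated in log space. A minimal sketch, assuming two univariate normal components with made-up parameters and priors; the function names are illustrative and do not come from the library.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Illustrative components (mu, sigma) and class priors; not taken from the source.
components = [(0.0, 1.0), (4.0, 1.0)]
priors = [0.5, 0.5]


def mixture_log_probability(x):
    """log P(x) = logsumexp over k of (log prior_k + log P(x | component k))."""
    joint = [math.log(w) + normal_logpdf(x, mu, sigma)
             for w, (mu, sigma) in zip(priors, components)]
    return logsumexp(joint)


logp = mixture_log_probability(2.0)
```

Subtracting the maximum before exponentiating keeps the sum from underflowing when all components assign a very small density to the point.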

predict
()¶ Predict the most likely component which generated each sample.
Calculate the posterior P(M|D) for each sample and return the index of the component most likely to fit it. This corresponds to a simple argmax over the responsibility matrix.
This is a sklearn wrapper for the maximum_a_posteriori method.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  y : array-like, shape (n_samples,)
The predicted component which fits the sample the best.
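The argmax step can be sketched on a small, hand-written matrix of log posteriors. The values and the helper name are illustrative assumptions, not output from the library.

```python
# Rows are samples, columns are components (illustrative scores, not library output).
log_posteriors = [
    [-0.1, -2.5, -4.0],
    [-3.0, -0.2, -2.0],
    [-1.5, -1.6, -0.6],
]


def predict_from_log_posteriors(rows):
    """Return, for each row, the index of the largest log posterior."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


y_hat = predict_from_log_posteriors(log_posteriors)
```

Because the logarithm is monotonic, taking the argmax over log posteriors selects the same component as taking it over the posteriors themselves.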

predict_log_proba
()¶ Calculate the posterior log P(M|D) for data.
Calculate the log probability of each item having been generated from each component in the model. This returns normalized log probabilities such that, once exponentiated, the probabilities in each row sum to 1.
This is a sklearn wrapper for the original posterior function.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  y : array-like, shape (n_samples, n_components)
The normalized log probability log P(M|D) for each sample. This is the log probability that the sample was generated from each component.
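The normalization can be sketched for one sample as follows. The joint values are made up for illustration, and normalize_log_joint is a hypothetical helper rather than a library function.

```python
import math


def normalize_log_joint(log_joint):
    """Turn log P(x, M_k) values into log posteriors log P(M_k | x)."""
    m = max(log_joint)
    # Log of the evidence P(x), computed stably by factoring out the maximum.
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint))
    return [v - log_evidence for v in log_joint]


# Illustrative values of log prior_k + log P(x | M_k) for a single sample.
log_joint = [math.log(0.2), math.log(0.1), math.log(0.1)]
log_post = normalize_log_joint(log_joint)

# Exponentiating the normalized log posteriors gives probabilities that sum to 1.
post = [math.exp(v) for v in log_post]
```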

predict_proba
()¶ Calculate the posterior P(M|D) for data.
Calculate the probability of each item having been generated from each component in the model. This returns normalized probabilities such that each row should sum to 1.
Since calculating the log probability is much faster, this is just a wrapper which exponentiates the log probability matrix.
Parameters:  X : array-like, shape (n_samples, n_dimensions)
The samples to do the prediction on. Each sample is a row and each column corresponds to a dimension in that sample. For univariate distributions, a single array may be passed in.
 n_jobs : int
The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.
 batch_size: int or None, optional
The size of the batches to make predictions on. Passing in None means splitting the data set evenly among the number of jobs. Default is None.
Returns:  probability : array-like, shape (n_samples, n_components)
The normalized probability P(M|D) for each sample. This is the probability that the sample was generated from each component.

probability
()¶ Return the probability of the given symbol under this distribution.
Parameters:  symbol : object
The symbol to calculate the probability of
Returns:  probability : double
The probability of that point under the distribution.

sample
()¶ Generate a sample from the model.
First, randomly select a component weighted by the prior probability. Then, use the sample method from that component to generate a sample.
Parameters:  n : int, optional
The number of samples to generate. Defaults to 1.
 random_state : int, numpy.random.RandomState, or None
The random state used for generating samples. If set to None, a random seed will be used. If set to either an integer or a numpy.random.RandomState, will produce deterministic outputs.
Returns:  sample : array-like or object
A randomly generated sample from the model of the type modelled by the emissions. An integer if using most distributions, or an array if using multivariate ones, or a string for most discrete distributions. If n=1 return an object, if n>1 return an array of the samples.
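The two-step procedure above can be sketched with the standard library. This is a minimal illustration assuming univariate normal components with made-up parameters; sample_mixture is a hypothetical helper, and unlike the method documented here it always returns a list, even for n=1.

```python
import random


def sample_mixture(priors, components, n=1, seed=None):
    """Draw n samples: pick a component by its prior, then sample from it."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    draws = []
    for _ in range(n):
        # Step 1: choose a component index weighted by the prior probabilities.
        k = rng.choices(range(len(priors)), weights=priors, k=1)[0]
        # Step 2: sample from the chosen component (a normal distribution here).
        mu, sigma = components[k]
        draws.append(rng.gauss(mu, sigma))
    return draws


# Illustrative priors and (mu, sigma) pairs; not taken from the source.
priors = [0.7, 0.3]
components = [(0.0, 1.0), (10.0, 1.0)]
first = sample_mixture(priors, components, n=5, seed=0)
second = sample_mixture(priors, components, n=5, seed=0)
```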

score
()¶ Return the accuracy of the model on a data set.
Parameters:  X : numpy.ndarray, shape=(n, d)
The values of the data set
 y : numpy.ndarray, shape=(n,)
The labels of each value
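Accuracy here is taken to mean the fraction of samples whose predicted label matches the true label. A minimal sketch under that assumption, with a hypothetical accuracy helper and made-up labels:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty data set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative labels: three of the four predictions are correct.
y_true = [0, 1, 2, 1]
y_pred = [0, 1, 1, 1]
acc = accuracy(y_true, y_pred)
```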

summarize
()¶ Summarize data into stored sufficient statistics for out-of-core training.
Parameters:  X : array-like, shape (n_samples, variable)
Array of the samples, which can be either fixed size or variable depending on the underlying components.
 y : array-like, shape (n_samples,)
Array of the known labels as integers
 weights : array-like, shape (n_samples,), optional
Array of the weight of each sample, a positive float
Returns:  None
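The idea behind out-of-core training is that each batch is reduced to a few running totals, so the raw data never has to be held in memory at once. A minimal sketch, assuming one univariate normal distribution per class; RunningNormalStats is a hypothetical class and does not reflect the library's internal representation.

```python
from collections import defaultdict


class RunningNormalStats:
    """Hypothetical per-class accumulator of weighted sufficient statistics."""

    def __init__(self):
        # For each label: [sum of weights, sum of w*x, sum of w*x*x].
        self.stats = defaultdict(lambda: [0.0, 0.0, 0.0])

    def summarize(self, X, y, weights=None):
        """Fold one batch of labeled samples into the running totals."""
        if weights is None:
            weights = [1.0] * len(X)
        for x, label, w in zip(X, y, weights):
            totals = self.stats[label]
            totals[0] += w
            totals[1] += w * x
            totals[2] += w * x * x

    def from_summaries(self):
        """Turn the totals into (mean, variance) per class, then reset them."""
        params = {}
        for label, (w, wx, wxx) in self.stats.items():
            mean = wx / w
            params[label] = (mean, wxx / w - mean ** 2)
        self.stats.clear()
        return params


running = RunningNormalStats()
running.summarize([1.0, 3.0], [0, 0])              # first batch
running.summarize([2.0, 10.0, 14.0], [0, 1, 1])    # second batch
params = running.from_summaries()
```

The resulting parameters match what a single pass over the concatenated batches would give, which is why the update can be exact.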

thaw
()¶ Thaw the distribution, reallowing updates to occur.

to_json
()¶ Serialize the model to JSON.
Parameters:  separators : tuple, optional
The two separators to pass to the json.dumps function for formatting. Default is (',', ' : ').
 indent : int, optional
The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.
Returns:  json : str
A properly formatted JSON object.
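The effect of the default separators and indent can be seen by passing them to json.dumps directly. The dictionary below is a hypothetical stand-in for a model's state; the actual schema produced by to_json is not shown in this section.

```python
import json

# Hypothetical stand-in for a serialized model; not the library's real schema.
model_state = {"class": "NaiveBayes", "weights": [0.5, 0.5]}

# Same formatting arguments as the documented defaults.
text = json.dumps(model_state, separators=(',', ' : '), indent=4)

# The formatting only affects whitespace, so the string parses back unchanged.
restored = json.loads(text)
```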

to_yaml
()¶ Serialize the model to YAML for compactness.