Bayesian Networks¶

Bayesian networks are a probabilistic model that are especially good at inference given incomplete data. Much like a hidden Markov model, they consist of a directed graphical model (though Bayesian networks must also be acyclic) and a set of probability distributions. The edges encode dependency statements between the variables, where the lack of an edge between any pair of variables indicates a conditional independence. Each node encodes a probability distribution, where root nodes encode univariate probability distributions and inner/leaf nodes encode conditional probability distributions. Bayesian networks are exceptionally flexible when doing inference, as any subset of variables can be observed, and inference done over all other variables, without needing to define these groups in advance. In fact, the set of observed variables can change from one sample to the next without needing to modify the underlying algorithm at all.

Currently, pomegranate only supports discrete Bayesian networks, meaning that the values must be categories, i.e. ‘apples’ and ‘oranges’, or 1 and 2, where 1 and 2 refer to categories, not numbers, and so 2 is not explicitly ‘bigger’ than 1.

Initialization¶

Bayesian networks can be initialized in two ways, depending on whether the underlying graphical structure is known or not: (1) the graphical structure can be built one node at a time with pre-initialized distributions set for each node, or (2) both the graphical structure and distributions can be learned directly from data. This mirrors the other models that are implemented in pomegranate. However, typically expectation maximization is used to fit the parameters of the distribution, and so initialization (such as through k-means) is typically fast whereas fitting is slow. For Bayesian networks, the opposite is the case. Fitting can be done quickly by just summing counts through the data, while initialization is hard as it requires an exponential time search through all possible DAGs to identify the optimal graph. More is discussed in the tutorials above and in the fitting section below.

Let’s take a look at initializing a Bayesian network in the first manner by quickly implementing the Monty Hall problem. The Monty Hall problem arose from the gameshow Let’s Make a Deal, where a guest had to choose which one of three doors had a prize behind it. The twist was that after the guest chose, the host, originally Monty Hall, would then open one of the doors the guest did not pick and ask if the guest wanted to switch which door they had picked. Initial inspection may lead you to believe that if there are only two doors left, there is a 50-50 chance of you picking the right one, and so there is no advantage one way or the other. However, it has been proven both through simulations and analytically that there is in fact a 66% chance of getting the prize if the guest switches their door, regardless of the door they initially went with.

Our network will have three nodes, one for the guest, one for the prize, and one for the door Monty chooses to open. The door the guest initially chooses and the door the prize is behind are uniform random processes across the three doors, but the door which Monty opens is dependent on both the door the guest chooses (it cannot be the door the guest chooses), and the door the prize is behind (it cannot be the door with the prize behind it).

from pomegranate import *

guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
monty = ConditionalProbabilityTable(
[['A', 'A', 'A', 0.0],
['A', 'A', 'B', 0.5],
['A', 'A', 'C', 0.5],
['A', 'B', 'A', 0.0],
['A', 'B', 'B', 0.0],
['A', 'B', 'C', 1.0],
['A', 'C', 'A', 0.0],
['A', 'C', 'B', 1.0],
['A', 'C', 'C', 0.0],
['B', 'A', 'A', 0.0],
['B', 'A', 'B', 0.0],
['B', 'A', 'C', 1.0],
['B', 'B', 'A', 0.5],
['B', 'B', 'B', 0.0],
['B', 'B', 'C', 0.5],
['B', 'C', 'A', 1.0],
['B', 'C', 'B', 0.0],
['B', 'C', 'C', 0.0],
['C', 'A', 'A', 0.0],
['C', 'A', 'B', 1.0],
['C', 'A', 'C', 0.0],
['C', 'B', 'A', 1.0],
['C', 'B', 'B', 0.0],
['C', 'B', 'C', 0.0],
['C', 'C', 'A', 0.5],
['C', 'C', 'B', 0.5],
['C', 'C', 'C', 0.0]], [guest, prize])

s1 = Node(guest, name="guest")
s2 = Node(prize, name="prize")
s3 = Node(monty, name="monty")

model = BayesianNetwork("Monty Hall Problem")
model.bake()

Note

The objects ‘state’ and ‘node’ are really the same thing and can be used interchangeable. The only difference is the name, as hidden Markov models use ‘state’ in the literature frequently whereas Bayesian networks use ‘node’ frequently.

The conditional distribution must be explicitly spelled out in this example, followed by a list of the parents in the same order as the columns take in the table that is provided (e.g. the columns in the table correspond to guest, prize, monty, probability.)

However, one can also initialize a Bayesian network based completely on data. As mentioned before, the exact version of this algorithm takes exponential time with the number of variables and typically can’t be done on more than ~25 variables. This is because there are a super-exponential number of directed acyclic graphs that one could define over a set of variables, but fortunately one can use dynamic programming in order to reduce this complexity down to “simply exponential.” The implementation of the exact algorithm actually goes further than the original dynamic programming algorithm by implementing an A* search to somewhat reduce computational time but drastically reduce required memory, sometimes by an order of magnitude.

from pomegranate import *
import numpy

model = BayesianNetwork.from_samples(X, algorithm='exact')

The exact algorithm is not the default, though. The default is a novel greedy algorithm that greedily chooses a topological ordering of the variables, but optimally identifies the best parents for each variable given this ordering. It is significantly faster and more memory efficient than the exact algorithm and produces far better estimates than using a Chow-Liu tree. This is set to the default to avoid locking up the computers of users that unintentionally tell their computers to do a near-impossible task.

Probability¶

You can calculate the probability of a sample under a Bayesian network as the product of the probability of each variable given its parents, if it has any. This can be expressed as $$P = \prod\limits_{i=1}^{d} P(D_{i}|Pa_{i})$$ for a sample with $d$ dimensions. For example, in the Monty Hal problem, the probability of a show is the probability of the guest choosing the respective door, times the probability of the prize being behind a given door, times the probability of Monty opening a given door given the previous two values. For example, using the manually initialized network above:

>>> print(model.probability([['A', 'A', 'A'],
['A', 'A', 'B'],
['C', 'C', 'B']]))
[ 0.          0.05555556  0.05555556]

Prediction¶

Bayesian networks are frequently used to infer/impute the value of missing variables given the observed values. In other models, typically there is either a single or fixed set of missing variables, such as latent factors, that need to be imputed, and so returning a fixed vector or matrix as the predictions makes sense. However, in the case of Bayesian networks, we can make no such assumptions, and so when data is passed in for prediction it should be in the format as a matrix with None in the missing variables that need to be inferred. The return is thus a filled in matrix where the Nones have been replaced with the imputed values. For example:

>>> print(model.predict([['A', 'B', None],
['A', 'C', None],
['C', 'B', None]]))
[['A' 'B' 'C']
['A' 'C' 'B']
['C' 'B' 'A']]

In this example, the final column is the one that is always missing, but a more complex example is as follows:

>>> print(model.predict([['A', 'B', None],
['A', None, 'C'],
[None, 'B', 'A']]))
[['A' 'B' 'C']
['A' 'B' 'C']
['C' 'B' 'A']]

Fitting¶

Fitting a Bayesian network to data is a fairly simple process. Essentially, for each variable, you need consider only that column of data and the columns corresponding to that variables parents. If it is a univariate distribution, then the maximum likelihood estimate is just the count of each symbol divided by the number of samples in the data. If it is a multivariate distribution, it ends up being the probability of each symbol in the variable of interest given the combination of symbols in the parents. For example, consider a binary dataset with two variables, X and Y, where X is a parent of Y. First, we would go through the dataset and calculate P(X=0) and P(X=1). Then, we would calculate P(Y=0|X=0), P(Y=1|X=0), P(Y=0|X=1), and P(Y=1|X=1). Those values encode all of the parameters of the Bayesian network.

API Reference¶

class pomegranate.BayesianNetwork.BayesianNetwork

A Bayesian Network Model.

A Bayesian network is a directed graph where nodes represent variables, edges represent conditional dependencies of the children on their parents, and the lack of an edge represents a conditional independence.

Parameters
namestr, optional

The name of the model. Default is None

Attributes
stateslist, shape (n_states,)

A list of all the state objects in the model

graphnetworkx.DiGraph

The underlying graph object.

Add a transition from state a to state b which indicates that B is dependent on A in ways specified by the distribution.

Add a node to the graph.

Add multiple states to the graph.

Another name for a node.

Another name for a node.

Transitions and edges are the same.

bake()

Finalize the topology of the model.

Assign a numerical index to every state and create the underlying arrays corresponding to the states and edges between the states. This method must be called before any of the probability-calculating methods. This includes converting conditional probability tables into joint probability tables and creating a list of both marginal and table nodes.

Parameters
None
Returns
None
clear_summaries()

Clear the summary statistics stored in the object.

copy()

Return a deep copy of this distribution object.

This object will not be tied to any other distribution or connected in any form.

Parameters
None
Returns
distributionDistribution

A copy of the distribution with the same parameters.

dense_transition_matrix()

Returns the dense transition matrix. Useful if the transitions of somewhat small models need to be analyzed.

edge_count()

Returns the number of edges present in the model.

fit()

Fit the model to data using MLE estimates.

Fit the model to the data by updating each of the components of the model, which are univariate or multivariate distributions. This uses a simple MLE estimate to update the distributions according to their summarize or fit methods.

This is a wrapper for the summarize and from_summaries methods.

Parameters
Xarray-like or generator, shape (n_samples, n_nodes)

The data to train on, where each row is a sample and each column corresponds to the associated variable.

weightsarray-like, shape (n_nodes), optional

The weight of each sample as a positive double. Default is None.

The inertia for updating the distributions, passed along to the distribution method. Default is 0.0.

pseudocountdouble, optional

A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only effects hidden Markov models defined over discrete distributions. Default is 0.

verbosebool, optional

Whether or not to print out improvement information over iterations. Only required if doing semisupervised learning. Default is False.

n_jobsint

The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns
selfBayesianNetwork

The fit Bayesian network object with updated model parameters.

freeze()

Freeze the distribution, preventing updates from occurring.

from_dict()

Deserialize this object from a dictionary of parameters.

from_json()

Deserialize this object from its JSON representation.

Parameters
sstr

A JSON formatted string containing the file.

Returns
modelobject

A properly initialized and baked model.

from_samples()

Learn the structure of the network from data.

There are currently two types of approaches implemented. The first, the Chow-Liu algorithm, finds a tree-like structure from symmetric mutual-information scores given a root node (the root parameter). The second type searches through structures and returns the structure that maximizes the following objective function:

P(D|M) + penalty * |M|

where P(D|M) is the probability of the data given the found model, penalty is a user-specified parameters, and |M| is the number of parameters in the model. When this penalty is log2(|D|) / 2 (the default) where |D| is the weight sum of the examples, this is equivalent to the minimum description length (MDL).

There are currently three ways that the learned structure can be controlled. The first is to increase the penalty term to increase sparsity. The second is to pass in a specified list of edges that must exist (include_edges) or cannot exist (exclude_edges). Lastly, a constraint graph can be specified where each node in the graph is a set of variables being modeled and the edges in the graph indicate which sets of variables can be parents to which other sets of variables (and where a self-loop is the normal structure learning step).

Parameters
Xarray-like or generator, shape (n_samples, n_nodes)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsarray-like, shape (n_samples), optional

The weight of each sample as a positive double. Default is None.

algorithmstr, one of ‘chow-liu’, ‘greedy’, ‘exact’, ‘exact-dp’ optional

The algorithm to use for learning the Bayesian network. Default is ‘greedy’ that greedily attempts to find the best structure, and frequently can identify the optimal structure. ‘exact’ uses DP/A* to find the optimal Bayesian network, and ‘exact-dp’ tries to find the shortest path on the entire order lattice, which is more memory and computationally expensive. ‘exact’ and ‘exact-dp’ should give identical results, with ‘exact-dp’ remaining an option mostly for debugging reasons. ‘chow-liu’ will return the optimal tree-like structure for the Bayesian network, which is a very fast approximation but not always the best network.

max_parentsint, optional

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

rootint, optional

For algorithms which require a single root (‘chow-liu’), this is the root for which all edges point away from. User may specify which column to use as the root. Default is the first column.

constraint_graphnetworkx.DiGraph or None, optional

A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about

include_edgeslist or None, optional

A list of (parent, child) tuples that are edges which must be present in the found structure. Default is None.

exclude_edgeslist or None, optional

A list of (parent, child) tuples that are edges which cannot be present in the found structure. Default is None.

pseudocountdouble, optional

A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.

state_namesarray-like, shape (n_nodes), optional

A list of meaningful names to be applied to nodes

namestr, optional

The name of the model. Default is None.

reduce_datasetbool, optional

Given the discrete nature of these datasets, frequently a user will pass in a dataset that has many identical samples. It is time consuming to go through these redundant samples and a far more efficient use of time to simply calculate a new dataset comprised of the subset of unique observed samples weighted by the number of times they occur in the dataset. This typically will speed up all algorithms, including when using a constraint graph. Default is True.

keyslist, optional

A list of sets where each set is the keys present in that column. If there are d columns in the data set then this list should have d sets and each set should have at least two keys in it. Default is None.

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

n_jobsint, optional

The number of threads to use when learning the structure of the network. If a constraint graph is provided, this will parallelize the tasks as directed by the constraint graph. If one is not provided it will parallelize the building of the parent graphs. Both cases will provide large speed gains.

Returns
modelBayesianNetwork

The learned BayesianNetwork.

from_structure()

Return a Bayesian network from a predefined structure.

Pass in the structure of the network as a tuple of tuples and get a fit network in return. The tuple should contain n tuples, with one for each node in the graph. Each inner tuple should be of the parents for that node. For example, a three node graph where both node 0 and 1 have node 2 as a parent would be specified as ((2,), (2,), ()).

Parameters
Xarray-like, shape (n_samples, n_nodes)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

structuretuple of tuples

The parents for each node in the graph. If a node has no parents, then do not specify any parents.

weightsarray-like, shape (n_nodes), optional

The weight of each sample as a positive double. Default is None.

pseudocountdouble, optional

A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.

namestr, optional

The name of the model. Default is None.

state_namesarray-like, shape (n_nodes), optional

A list of meaningful names to be applied to nodes

keyslist

A list of sets where each set is the keys present in that column. If there are d columns in the data set then this list should have d sets and each set should have at least two keys in it.

Returns
modelBayesianNetwork

A Bayesian network with the specified structure.

from_summaries()

Use MLE on the stored sufficient statistics to train the model.

This uses MLE estimates on the stored sufficient statistics to train the model.

Parameters

The inertia for updating the distributions, passed along to the distribution method. Default is 0.0.

pseudocountdouble, optional

A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.

Returns
None
from_yaml()

Deserialize this object from its YAML representation.

log_probability()

Return the log probability of samples under the Bayesian network.

The log probability is just the sum of the log probabilities under each of the components. The log probability of a sample under the graph A -> B is just P(A)*P(B|A). This will return a vector of log probabilities, one for each sample.

Parameters
Xarray-like, shape (n_samples, n_dim)

The sample is a vector of points where each dimension represents the same variable as added to the graph originally. It doesn’t matter what the connections between these variables are, just that they are all ordered the same.

check_inputbool, optional

Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.

n_jobsint

The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns
logpnumpy.ndarray or double

The log probability of the samples if many, or the single log probability.

marginal()

Return the marginal probabilities of each variable in the graph.

This is equivalent to a pass of belief propagation on a graph where no data has been given. This will calculate the probability of each variable being in each possible emission when nothing is known.

Parameters
None
Returns
marginalsarray-like, shape (n_nodes)

An array of univariate distribution objects showing the marginal probabilities of that variable.

node_count()

Returns the number of nodes/states in the model

plot()

Draw this model’s graph using pygraphviz and matplotlib.

If no filename, it uses pygraphviz to write a temporary png file, and matplotlib to imshow() it. If using jupyter or IPython, enable %matplotlib inline and this will immediately display your graph. Otherwise, per usual matplotlib convention, you have to issue a plt.show() or matplotlib.pyplot.show() to open a window with the image.

TODO: Use svg or pdf for original image. Jupyter and IPython can render SVG directly, e.g. from IPython.display import SVG and SVG(filename=…).

Parameters
filenamestr, optional

Filename for saving the .pdf graph. Default is None

Returns
None
predict()

Predict missing values of a data matrix using MLE.

Impute the missing values of a data matrix using the maximally likely predictions according to the forward-backward algorithm. Run each sample through the algorithm (predict_proba) and replace missing values with the maximally likely predicted emission.

Parameters
Xarray-like, shape (n_samples, n_nodes)

Data matrix to impute. Missing values must be either None (if lists) or np.nan (if numpy.ndarray). Will fill in these values with the maximally likely ones.

max_iterationsint, optional

Number of iterations to run loopy belief propagation for. Default is 100.

check_inputbool, optional

Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.

n_jobsint

The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns
y_hatnumpy.ndarray, shape (n_samples, n_nodes)

This is the data matrix with the missing values imputed.

predict_proba()

Returns the probabilities of each variable in the graph given evidence.

This calculates the marginal probability distributions for each state given the evidence provided through loopy belief propagation. Loopy belief propagation is an approximate algorithm which is exact for certain graph structures.

Parameters
Xdict or array-like, shape <= n_nodes

The evidence supplied to the graph. This can either be a dictionary with keys being state names and values being the observed values (either the emissions or a distribution over the emissions) or an array with the values being ordered according to the nodes incorporation in the graph (the order fed into .add_states/add_nodes) and None for variables which are unknown. It can also be vectorized, so a list of dictionaries can be passed in where each dictionary is a single sample, or a list of lists where each list is a single sample, both formatted as mentioned before.

max_iterationsint, optional

The number of iterations with which to do loopy belief propagation. Usually requires only 1. Default is 100.

check_inputbool, optional

Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.

n_jobsint, optional

The number of threads to use when parallelizing the job. This parameter is passed directly into joblib. Default is 1, indicating no parallelism.

Returns
y_hatarray-like, shape (n_samples, n_nodes)

An array of univariate distribution objects showing the probabilities of each variable.

probability()

Return the probability of the given symbol under this distribution.

Parameters
symbolobject

The symbol to calculate the probability of

Returns
probabilitydouble

The probability of that point under the distribution.

sample()

Sample the network, optionally given some evidences Use rejection to condition on non marginal nodes

Parameters
nint, optional

The number of samples to generate. Defaults to 1.

evidenceslist of dict, optional

Evidence to set constant while samples are generated.

algorithm:str, one of ‘gibbs’, ‘rejection’ optional. default ‘rejection’

Rejection sampling successively sample each node given its parents evidence. When evidences are given on non-root nodes, only draws that are compatible with evidence nodes are not rejected. Rejection sampling is a good option when evidences nodes are not far from the root nodes or when given evidence is likely. Rare evidences lead to a high rate of rejected samples, thus to significant slow down of the sampling. Gibbs sampling scheme is a Markov Chain Monte Carlo (MCMC) technique designed to speed up the sampling. It builds conditional probability of state transition of each nodes given its neighbours in its markov blanket. Works well with a lot of evidences in the network, even when they are far from the root nodes. Drawback : convergence is only guaranteed when there is a non null probability path between states. If the posterior consists of isolated islands of high probability, Gibbs sampling will stay stuck in one the island and will never transition to the others. Successive samples will have high correlation.

min_probfloat <1 optional. If algorithm == “rejection”

stop iterations when Sum P(X|Evidence) < min_prob. generated samples for a given evidence will be incomplete (<n)

initial_statedict, optional

initial state used by the Gibbs sampler. Default is to use the maximum joint-probability values, calculated with self.predict(). The default should be optimal.

scan_order: str, one ‘topological’,’random’ optional. If algorithm == “gibbs”

Scan order or the gibbs sampler. Indicate in which order nodes are sampled. Topological order is good for chain like networks (lots of successive nodes). Random order yield better results with more connected

networks. Default : ‘random’.

burninint, optional. If algorithm == “Gibbs”

Number of sample to discard at the begining of the sampling. Default is 0.

random_stateseed or seeded numpy instance (for gibbs, only seed)
Returns
a nested list of sampled states of shape [n*len(evidences),len(nodes)]

Examples

>>> network.sample(evidence = [{'HLML': '2'},{'HLML': '2','TYPL':'1'},{'NBPI': '02','TYPL':'1'}])
score()

Return the accuracy of the model on a data set.

Parameters
Xnumpy.ndarray, shape=(n, d)

The values of the data set

ynumpy.ndarray, shape=(n,)

The labels of each value

state_count()

Returns the number of states present in the model.

summarize()

Summarize a batch of data and store the sufficient statistics.

This will partition the dataset into columns which belong to their appropriate distribution. If the distribution has parents, then multiple columns are sent to the distribution. This relies mostly on the summarize function of the underlying distribution.

Parameters
Xarray-like, shape (n_samples, n_nodes)

The data to train on, where each row is a sample and each column corresponds to the associated variable.

weightsarray-like, shape (n_nodes), optional

The weight of each sample as a positive double. Default is None.

Returns
None
thaw()

Thaw the distribution, re-allowing updates to occur.

to_dict()

Serialize this object to a dictionary of parameters.

to_json()

Serialize the model to JSON.

Parameters
separatorstuple, optional

The two separators to pass to the json.dumps function for formatting. Default is (‘,’, ‘ : ‘).

indentint, optional

The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.

Returns
jsonstr

A properly formatted JSON object.

to_yaml()

Serialize the model to YAML for compactness.

class pomegranate.BayesianNetwork.ParentGraph

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single parents given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_parentstuple

A set of parents that this node must have.

exclude_parentstuple

A set of parents that this node cannot have.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_a_star()

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for a SCC which is made up of a node containing a self-loop. It uses DP/A* in order to find the optimal graph without considering all possible topological sorts. A greedy version of the algorithm can be used that massively reduces both the computational and memory cost while frequently producing the optimal graph.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.

exclude_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_component()

Find the optimal graph over a multi-node component of the constaint graph.

The general algorithm in this case is to begin with each variable and add all possible single children for that entry recursively until completion. This will result in a far sparser order graph than before. In addition, one can eliminate entries from the parent graphs that contain invalid parents as they are a fast of computational time.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

include_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.

exclude_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_dp()

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for a SCC which is made up of a node containing a self-loop. This is a reference implementation that uses the naive shortest path algorithm over the entire order graph. The ‘exact’ option uses the A* path in order to avoid considering the full order graph.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.

exclude_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_slap()

Find the optimal graph in a node with a Self Loop And Parents (SLAP).

Instead of just performing exact BNSL over the set of all parents and removing the offending edges there are efficiencies that can be gained by considering the structure. In particular, parents not coming from the main node do not need to be considered in the order graph but simply added to each entry after creation of the order graph. This is because those variables occur earlier in the topological ordering but it doesn’t matter how they occur otherwise. Parent graphs must be defined over all variables however.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.

exclude_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_with_constraints()

This returns the optimal Bayesian network given a set of constraints.

This function controls the process of learning the Bayesian network by taking in a constraint graph, identifying the strongly connected components (SCCs) and solving each one using the appropriate algorithm. This is mostly an internal function.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.

exclude_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

constraint_graphnetworkx.DiGraph

A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelized both the creation of the parent graphs for each variable and the solving of the SCCs. -1 means use all available resources. Default is 1, meaning no parallelism.

Returns
structuretuple, shape=(d,)

The parents for each variable in the network.

This is a wrapper for the function to be parallelized by joblib.

This function takes in a single task as an id and a set of parents and children and calls the appropriate function. This is mostly a wrapper for joblib to parallelize.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.

exclude_edgeslist or None

A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

A 3-tuple containing the id, the set of parents and the set of children to learn a component of the Bayesian network over. The cases represent a SCC of the following:

0 - Self loop and no parents 1 - Self loop and parents 2 - Parents and no self loop 3 - Multiple nodes

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs for each task or the finding of best parents in case 2.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_greedy()

Find the optimal graph over a set of variables with no other knowledge.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

include_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.

exclude_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

low_memorybool or None, optional

Whether to use a low-memory version of the search algorithm. This option only affects algorithm=”greedy” and algorithm=”exact”. Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm=”greedy” and disable it otherwise. Default is None.

n_jobsint

The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC

pomegranate.BayesianNetwork.generate_parent_graph()

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single parents given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters
Xnumpy.ndarray, shape=(n, d)

The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.

weightsnumpy.ndarray, shape=(n,)

The weight of each sample as a positive double. Default is None.

key_countnumpy.ndarray, shape=(d,)

The number of unique keys in each column.

iint

The column index to build the parent graph for.

include_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.

exclude_edgeslist or None

A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.

pseudocountdouble

A pseudocount to add to each possibility.

penaltyfloat or None, optional

The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.

max_parentsint

The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.

parent_settuple, default ()

The variables which are possible parents for this variable. If nothing is passed in then it defaults to all other variables, as one would expect in the naive case. This allows for cases where we want to build a parent graph over only a subset of the variables.

Returns
structuretuple, shape=(d,)

The parents for each variable in this SCC