Bayesian Networks¶

Bayesian networks are a probabilistic model that are especially good at inference given incomplete data. Much like a hidden Markov model, they consist of a directed graphical model (though Bayesian networks must also be acyclic) and a set of probability distributions. The edges encode dependency statements between the variables, where the lack of an edge between any pair of variables indicates a conditional independence. Each node encodes a probability distribution, where root nodes encode univariate probability distributions and inner/leaf nodes encode conditional probability distributions. Bayesian networks are exceptionally flexible when doing inference, as any subset of variables can be observed, and inference done over all other variables, without needing to define these groups in advance. In fact, the set of observed variables can change from one sample to the next without needing to modify the underlying algorithm at all.

Currently, pomegranate only supports discrete Bayesian networks, meaning that the values must be categories, i.e. ‘apples’ and ‘oranges’, or 1 and 2, where 1 and 2 refer to categories, not numbers, and so 2 is not explicitly ‘bigger’ than 1.

Initialization¶

Bayesian networks can be initialized in two ways, depending on whether the underlying graphical structure is known or not: (1) the graphical structure can be built one node at a time with pre-initialized distributions set for each node, or (2) both the graphical structure and distributions can be learned directly from data. This mirrors the other models that are implemented in pomegranate. However, typically expectation maximization is used to fit the parameters of the distribution, and so initialization (such as through k-means) is typically fast whereas fitting is slow. For Bayesian networks, the opposite is the case. Fitting can be done quickly by just summing counts through the data, while initialization is hard as it requires an exponential time search through all possible DAGs to identify the optimal graph. More is discussed in the tutorials above and in the fitting section below.

Let’s take a look at initializing a Bayesian network in the first manner by quickly implementing the Monty Hall problem. The Monty Hall problem arose from the gameshow Let’s Make a Deal, where a guest had to choose which one of three doors had a prize behind it. The twist was that after the guest chose, the host, originally Monty Hall, would then open one of the doors the guest did not pick and ask if the guest wanted to switch which door they had picked. Initial inspection may lead you to believe that if there are only two doors left, there is a 50-50 chance of you picking the right one, and so there is no advantage one way or the other. However, it has been proven both through simulations and analytically that there is in fact a 66% chance of getting the prize if the guest switches their door, regardless of the door they initially went with.

Our network will have three nodes, one for the guest, one for the prize, and one for the door Monty chooses to open. The door the guest initially chooses and the door the prize is behind are uniform random processes across the three doors, but the door which Monty opens is dependent on both the door the guest chooses (it cannot be the door the guest chooses), and the door the prize is behind (it cannot be the door with the prize behind it).

from pomegranate import *

guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
monty = ConditionalProbabilityTable(
        [['A', 'A', 'A', 0.0],
         ['A', 'A', 'B', 0.5],
         ['A', 'A', 'C', 0.5],
         ['A', 'B', 'A', 0.0],
         ['A', 'B', 'B', 0.0],
         ['A', 'B', 'C', 1.0],
         ['A', 'C', 'A', 0.0],
         ['A', 'C', 'B', 1.0],
         ['A', 'C', 'C', 0.0],
         ['B', 'A', 'A', 0.0],
         ['B', 'A', 'B', 0.0],
         ['B', 'A', 'C', 1.0],
         ['B', 'B', 'A', 0.5],
         ['B', 'B', 'B', 0.0],
         ['B', 'B', 'C', 0.5],
         ['B', 'C', 'A', 1.0],
         ['B', 'C', 'B', 0.0],
         ['B', 'C', 'C', 0.0],
         ['C', 'A', 'A', 0.0],
         ['C', 'A', 'B', 1.0],
         ['C', 'A', 'C', 0.0],
         ['C', 'B', 'A', 1.0],
         ['C', 'B', 'B', 0.0],
         ['C', 'B', 'C', 0.0],
         ['C', 'C', 'A', 0.5],
         ['C', 'C', 'B', 0.5],
         ['C', 'C', 'C', 0.0]], [guest, prize])

s1 = Node(guest, name="guest")
s2 = Node(prize, name="prize")
s3 = Node(monty, name="monty")

model = BayesianNetwork("Monty Hall Problem")
model.add_states(s1, s2, s3)
model.add_edge(s1, s3)
model.add_edge(s2, s3)
model.bake()

Note

The objects ‘state’ and ‘node’ are really the same thing and can be used interchangable. The only difference is the name, as hidden Markov models use ‘state’ in the literature frequently whereas Bayesian networks use ‘node’ frequently.

The conditional distribution must be explicitly spelled out in this example, followed by a list of the parents in the same order as the columns take in the tabble that is provided (e.g. the columns in the table correspond to guest, prize, monty, probability.)

However, one can also initialize a Bayesian network based completely on data. As mentioned before, the exact version of this algorithm takes exponential time with the number of variables and typically can’t be done on more than ~25 variables. This is because there are a super-exponential number of directed acyclic graphs that one could define over a set of variables, but fortunately one can use dynamic programming in order to reduce this complexity down to “simply exponential.” The implementation of the exact algorithm actually goes further than the original dynamic programing algorithm by implementing an A* search to somewhat reduce computational time but drastically reduce required memory, sometimes by an order of magnitude.

from pomegranate import *
import numpy

X = numpy.load('data.npy')
model = BayesianNetwork.from_samples(X, algorithm='exact')

The exact algorithm is not the default, though. The default is a novel greedy algorithm that greedily chooses a topological ordering of the variables, but optimally identifies the best parents for each variable given this ordering. It is significantly faster and more memory efficient than the exact algorithm and produces far better estimates than using a Chow-Liu tree. This is set to the default to avoid locking up the computers of users that unintentionally tell their computers to do a near-impossible task.

Probability¶

You can calculate the probabiity of a sample under a Bayesian network as the product of the probability of each variable given its parents, if it has any. This can be expressed as $P = \prod\limits_{i=1}^{d} P(D_{i}|Pa_{i})$ for a sample with $d$ dimensions. For example, in the Monty Hal problem, the probability of a show is the probability of the guest choosing the respective door, times the probability of the prize being behind a given door, times the probability of Monty opening a given door given the previous two values. For example, using the manually initialized network above:

>>> print model.probability([['A', 'A', 'A'],
                             ['A', 'A', 'B'],
                             ['C', 'C', 'B']])
[ 0.          0.05555556  0.05555556]

Prediction¶

Bayesian networks are frequently used to infer/impute the value of missing variables given the observed values. In other models, typically there is either a single or fixed set of missing variables, such as latent factors, that need to be imputed, and so returning a fixed vector or matrix as the predictions makes sense. However, in the case of Bayesian networks, we can make no such assumptions, and so when data is passed in for prediction it should be in the format as a matrix with None in the missing variables that need to be inferred. The return is thus a filled in matrix where the Nones have been replaced with the imputed values. For example:

>>> print model.predict([['A', 'B', None],
                         ['A', 'C', None],
                         ['C', 'B', None]])
[['A' 'B' 'C']
 ['A' 'C' 'B']
 ['C' 'B' 'A']]

In this example, the final column is the one that is always missing, but a more complex example is as follows:

>>> print model.predict([['A', 'B', None],
                 ['A', None, 'C'],
                 [None, 'B', 'A']])
[['A' 'B' 'C']
 ['A' 'B' 'C']
 ['C' 'B' 'A']]

Fitting¶

Fitting a Bayesian network to data is a fairly simple process. Essentially, for each variable, you need consider only that column of data and the columns corresponding to that variables parents. If it is a univariate distribution, then the maximum likelihood estimate is just the count of each symbol divided by the number of samples in the data. If it is a multivariate distribution, it ends up being the probability of each symbol in the variable of interest given the combination of symbols in the parents. For example, consider a binary dataset with two variables, X and Y, where X is a parent of Y. First, we would go through the dataset and calculate P(X=0) and P(X=1). Then, we would calculate P(Y=0|X=0), P(Y=1|X=0), P(Y=0|X=1), and P(Y=1|X=1). Those values encode all of the parameters of the Bayesian network.

API Reference¶

class pomegranate.BayesianNetwork.BayesianNetwork¶

A Bayesian Network Model.

A Bayesian network is a directed graph where nodes represent variables, edges represent conditional dependencies of the children on their parents, and the lack of an edge represents a conditional independence.

Parameters:	name : str, optional The name of the model. Default is None
Attributes:	states : list, shape (n_states,) A list of all the state objects in the model graph : networkx.DiGraph The underlying graph object.

add_edge()¶: Add a transition from state a to state b which indicates that B is dependent on A in ways specified by the distribution.

add_node()¶: Add a node to the graph.

add_nodes()¶: Add multiple states to the graph.

add_state()¶: Another name for a node.

add_states()¶: Another name for a node.

add_transition()¶: Transitions and edges are the same.

bake()¶

Finalize the topology of the model.

Assign a numerical index to every state and create the underlying arrays corresponding to the states and edges between the states. This method must be called before any of the probability-calculating methods. This includes converting conditional probability tables into joint probability tables and creating a list of both marginal and table nodes.

Parameters:	None
Returns:	None

clear_summaries()¶: Clear the summary statistics stored in the object.

copy()¶

Return a deep copy of this distribution object.

This object will not be tied to any other distribution or connected in any form.

Parameters:	None
Returns:	distribution : Distribution A copy of the distribution with the same parameters.

dense_transition_matrix()¶: Returns the dense transition matrix. Useful if the transitions of somewhat small models need to be analyzed.

edge_count()¶: Returns the number of edges present in the model.

fit()¶

Fit the model to data using MLE estimates.

Fit the model to the data by updating each of the components of the model, which are univariate or multivariate distributions. This uses a simple MLE estimate to update the distributions according to their summarize or fit methods.

This is a wrapper for the summarize and from_summaries methods.

Parameters:

X : array-like, shape (n_samples, n_nodes): The data to train on, where each row is a sample and each column corresponds to the associated variable.
weights : array-like, shape (n_nodes), optional: The weight of each sample as a positive double. Default is None.
inertia : double, optional: The inertia for updating the distributions, passed along to the distribution method. Default is 0.0.
pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only effects hidden Markov models defined over discrete distributions. Default is 0.
verbose : bool, optional: Whether or not to print out improvement information over iterations. Only required if doing semisupervised learning. Default is False.
n_jobs : int: The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:

self : BayesianNetwork: The fit Bayesian network object with updated model parameters.

freeze()¶: Freeze the distribution, preventing updates from occuring.

from_json()¶

Read in a serialized Bayesian Network and return the appropriate object.

Parameters:	s : str A JSON formatted string containing the file.
Returns:	model : object A properly initialized and baked model.

from_samples()¶

Learn the structure of the network from data.

Find the structure of the network from data using a Bayesian structure learning score. This currently enumerates all the exponential number of structures and finds the best according to the score. This allows weights on the different samples as well. The score that is optimized is the minimum description length (MDL).

Parameters:

X : array-like, shape (n_samples, n_nodes): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : array-like, shape (n_nodes), optional: The weight of each sample as a positive double. Default is None.
algorithm : str, one of ‘chow-liu’, ‘greedy’, ‘exact’, ‘exact-dp’ optional: The algorithm to use for learning the Bayesian network. Default is ‘greedy’ that greedily attempts to find the best structure, and frequently can identify the optimal structure. ‘exact’ uses DP/A* to find the optimal Bayesian network, and ‘exact-dp’ tries to find the shortest path on the entire order lattice, which is more memory and computationally expensive. ‘exact’ and ‘exact-dp’ should give identical results, with ‘exact-dp’ remaining an option mostly for debugging reasons. ‘chow-liu’ will return the optimal tree-like structure for the Bayesian network, which is a very fast approximation but not always the best network.
max_parents : int, optional: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
root : int, optional: For algorithms which require a single root (‘chow-liu’), this is the root for which all edges point away from. User may specify which column to use as the root. Default is the first column.
constraint_graph : networkx.DiGraph or None, optional: A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about
pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.
state_names : array-like, shape (n_nodes), optional: A list of meaningful names to be applied to nodes
name : str, optional: The name of the model. Default is None.
reduce_dataset : bool, optional: Given the discrete nature of these datasets, frequently a user will pass in a dataset that has many identical samples. It is time consuming to go through these redundant samples and a far more efficient use of time to simply calculate a new dataset comprised of the subset of unique observed samples weighted by the number of times they occur in the dataset. This typically will speed up all algorithms, including when using a constraint graph. Default is True.
n_jobs : int, optional: The number of threads to use when learning the structure of the network. If a constraint graph is provided, this will parallelize the tasks as directed by the constraint graph. If one is not provided it will parallelize the building of the parent graphs. Both cases will provide large speed gains.

Returns:

model : BayesianNetwork: The learned BayesianNetwork.

from_structure()¶

Return a Bayesian network from a predefined structure.

Pass in the structure of the network as a tuple of tuples and get a fit network in return. The tuple should contain n tuples, with one for each node in the graph. Each inner tuple should be of the parents for that node. For example, a three node graph where both node 0 and 1 have node 2 as a parent would be specified as ((2,), (2,), ()).

Parameters:

X : array-like, shape (n_samples, n_nodes): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
structure : tuple of tuples: The parents for each node in the graph. If a node has no parents, then do not specify any parents.
weights : array-like, shape (n_nodes), optional: The weight of each sample as a positive double. Default is None.
pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.
name : str, optional: The name of the model. Default is None.
state_names : array-like, shape (n_nodes), optional: A list of meaningful names to be applied to nodes
n_jobs : int: The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:

model : BayesianNetwoork: A Bayesian network with the specified structure.

from_summaries()¶

Use MLE on the stored sufficient statistics to train the model.

This uses MLE estimates on the stored sufficient statistics to train the model.

Parameters:	inertia : double, optional The inertia for updating the distributions, passed along to the distribution method. Default is 0.0. pseudocount : double, optional A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Default is 0.
Returns:	None

log_probability()¶

Return the log probability of samples under the Bayesian network.

The log probability is just the sum of the log probabilities under each of the components. The log probability of a sample under the graph A -> B is just P(A)*P(B|A). This will return a vector of log probabilities, one for each sample.

Parameters:

X : array-like, shape (n_samples, n_dim): The sample is a vector of points where each dimension represents the same variable as added to the graph originally. It doesn’t matter what the connections between these variables are, just that they are all ordered the same.
n_jobs : int: The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:

logp : numpy.ndarray or double: The log probability of the samples if many, or the single log probability.

marginal()¶

Return the marginal probabilities of each variable in the graph.

This is equivalent to a pass of belief propogation on a graph where no data has been given. This will calculate the probability of each variable being in each possible emission when nothing is known.

Parameters:	None
Returns:	marginals : array-like, shape (n_nodes) An array of univariate distribution objects showing the marginal probabilities of that variable.

node_count()¶: Returns the number of nodes/states in the model

plot()¶

Draw this model’s graph using NetworkX and matplotlib.

Note that this relies on networkx’s built-in graphing capabilities (and not Graphviz) and thus can’t draw self-loops.

See networkx.draw_networkx() for the keywords you can pass in.

Parameters:	**kwargs : any The arguments to pass into networkx.draw_networkx()
Returns:	None

predict()¶

Predict missing values of a data matrix using MLE.

Impute the missing values of a data matrix using the maximally likely predictions according to the forward-backward algorithm. Run each sample through the algorithm (predict_proba) and replace missing values with the maximally likely predicted emission.

Parameters:

X : array-like, shape (n_samples, n_nodes): Data matrix to impute. Missing values must be either None (if lists) or np.nan (if numpy.ndarray). Will fill in these values with the maximally likely ones.
max_iterations : int, optional: Number of iterations to run loopy belief propogation for. Default is 100.
n_jobs : int: The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:

X : numpy.ndarray, shape (n_samples, n_nodes): This is the data matrix with the missing values imputed.

predict_proba()¶

Returns the probabilities of each variable in the graph given evidence.

This calculates the marginal probability distributions for each state given the evidence provided through loopy belief propogation. Loopy belief propogation is an approximate algorithm which is exact for certain graph structures.

Parameters:

data : dict or array-like, shape <= n_nodes, optional: The evidence supplied to the graph. This can either be a dictionary with keys being state names and values being the observed values (either the emissions or a distribution over the emissions) or an array with the values being ordered according to the nodes incorporation in the graph (the order fed into .add_states/add_nodes) and None for variables which are unknown. If nothing is fed in then calculate the marginal of the graph. Default is {}.
max_iterations : int, optional: The number of iterations with which to do loopy belief propogation. Usually requires only 1. Default is 100.
check_input : bool, optional: Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.

Returns:

probabilities : array-like, shape (n_nodes): An array of univariate distribution objects showing the probabilities of each variable.

probability()¶

Return the probability of the given symbol under this distribution.

Parameters:	symbol : object The symbol to calculate the probability of
Returns:	probability : double The probability of that point under the distribution.

sample()¶

Return a random item sampled from this distribution.

Parameters:	n : int or None, optional The number of samples to return. Default is None, which is to generate a single sample.
Returns:	sample : double or object Returns a sample from the distribution of a type in the support of the distribution.

state_count()¶: Returns the number of states present in the model.

summarize()¶

Summarize a batch of data and store the sufficient statistics.

This will partition the dataset into columns which belong to their appropriate distribution. If the distribution has parents, then multiple columns are sent to the distribution. This relies mostly on the summarize function of the underlying distribution.

Parameters:	X : array-like, shape (n_samples, n_nodes) The data to train on, where each row is a sample and each column corresponds to the associated variable. weights : array-like, shape (n_nodes), optional The weight of each sample as a positive double. Default is None.
Returns:	None

thaw()¶: Thaw the distribution, re-allowing updates to occur.

to_json()¶

Serialize the model to a JSON.

Parameters:	separators : tuple, optional The two separaters to pass to the json.dumps function for formatting. indent : int, optional The indentation to use at each level. Passed to json.dumps for formatting.
Returns:	json : str A properly formatted JSON object.

class pomegranate.BayesianNetwork.ParentGraph¶

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single parents given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
parent_set : tuple, default (): The variables which are possible parents for this variable. If nothing is passed in then it defaults to all other variables, as one would expect in the naive case. This allows for cases where we want to build a parent graph over only a subset of the variables.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_a_star()¶

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for a SCC which is made up of a node containing a self-loop. It uses DP/A* in order to find the optimal graph without considering all possible topological sorts. A greedy version of the algorithm can be used that massively reduces both the computational and memory cost while frequently producing the optimal graph.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_component()¶

Find the optimal graph over a multi-node component of the constaint graph.

The general algorithm in this case is to begin with each variable and add all possible single children for that entry recursively until completion. This will result in a far sparser order graph than before. In addition, one can eliminate entries from the parent graphs that contain invalid parents as they are a fast of computational time.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_dp()¶

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for a SCC which is made up of a node containing a self-loop. This is a reference implementation that uses the naive shortest path algorithm over the entire order graph. The ‘exact’ option uses the A* path in order to avoid considering the full order graph.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_slap()¶

Find the optimal graph in a node with a Self Loop And Parents (SLAP).

Instead of just performing exact BNSL over the set of all parents and removing the offending edges there are efficiencies that can be gained by considering the structure. In particular, parents not coming from the main node do not need to be considered in the order graph but simply added to each entry after creation of the order graph. This is because those variables occur earlier in the topological ordering but it doesn’t matter how they occur otherwise. Parent graphs must be defined over all variables however.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_exact_with_constraints()¶

This returns the optimal Bayesian network given a set of constraints.

This function controls the process of learning the Bayesian network by taking in a constraint graph, identifying the strongly connected components (SCCs) and solving each one using the appropriate algorithm. This is mostly an internal function.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
constraint_graph : networkx.DiGraph: A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelized both the creation of the parent graphs for each variable and the solving of the SCCs. -1 means use all available resources. Default is 1, meaning no parallelism.

Returns:

structure : tuple, shape=(d,): The parents for each variable in the network.

pomegranate.BayesianNetwork.discrete_exact_with_constraints_task()¶

This is a wrapper for the function to be parallelzied by joblib.

This function takes in a single task as an id and a set of parents and children and calls the appropriate function. This is mostly a wrapper for joblib to parallelize.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
task : tuple: A 3-tuple containing the id, the set of parents and the set of children to learn a component of the Bayesian network over. The cases represent a SCC of the following:

0 - Self loop and no parents 1 - Self loop and parents 2 - Parents and no self loop 3 - Multiple nodes
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs for each task or the finding of best parents in case 2.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.discrete_greedy()¶

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for a SCC which is made up of a node containing a self-loop. It uses DP/A* in order to find the optimal graph without considering all possible topological sorts. A greedy version of the algorithm can be used that massively reduces both the computational and memory cost while frequently producing the optimal graph.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
greedy : bool, default is True: Whether the use a heuristic in order to massive reduce computation and memory time, but without the guarantee of finding the best network.
n_jobs : int: The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC

pomegranate.BayesianNetwork.generate_parent_graph()¶

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single parents given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters:

X : numpy.ndarray, shape=(n, d): The data to fit the structure too, where each row is a sample and each column corresponds to the associated variable.
weights : numpy.ndarray, shape=(n,): The weight of each sample as a positive double. Default is None.
key_count : numpy.ndarray, shape=(d,): The number of unique keys in each column.
pseudocount : double: A pseudocount to add to each possibility.
max_parents : int: The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
parent_set : tuple, default (): The variables which are possible parents for this variable. If nothing is passed in then it defaults to all other variables, as one would expect in the naive case. This allows for cases where we want to build a parent graph over only a subset of the variables.

Returns:

structure : tuple, shape=(d,): The parents for each variable in this SCC