Markov Chains¶
Markov chains are a form of structured model over sequences. They represent the probability of each symbol in the sequence as a conditional probability of the last k symbols. For example, a 3rd order Markov chain models each symbol as depending on the previous three symbols. A 0th order Markov chain is a naive predictor in which each symbol is independent of all other symbols. Currently pomegranate only supports Markov chains with discrete emissions, where each symbol is a discrete value (like 'A', 'B', 'C') rather than a continuous number (like 17.32 or 19.65).
Initialization¶
A Markov chain could almost be represented by a single conditional probability table (CPT), except that the probabilities of the first k symbols (for a k-th order Markov chain) cannot be appropriately represented without special characters. Because of this, pomegranate takes in a series of k+1 distributions: one for each of the first k symbols, plus the full CPT for the general case. For example, for a second order Markov chain:
from pomegranate import *
d1 = DiscreteDistribution({'A': 0.25, 'B': 0.75})
d2 = ConditionalProbabilityTable([['A', 'A', 0.1],
                                  ['A', 'B', 0.9],
                                  ['B', 'A', 0.6],
                                  ['B', 'B', 0.4]], [d1])
d3 = ConditionalProbabilityTable([['A', 'A', 'A', 0.4],
                                  ['A', 'A', 'B', 0.6],
                                  ['A', 'B', 'A', 0.8],
                                  ['A', 'B', 'B', 0.2],
                                  ['B', 'A', 'A', 0.9],
                                  ['B', 'A', 'B', 0.1],
                                  ['B', 'B', 'A', 0.2],
                                  ['B', 'B', 'B', 0.8]], [d1, d2])
model = MarkovChain([d1, d2, d3])
Probability¶
The probability of a sequence under the Markov chain is the probability of the first symbol under the first distribution, times the probability of the second symbol under the second distribution, and so forth. Once past the (k+1)th symbol, every remaining symbol is evaluated under the (k+1)th distribution. We can calculate the probability or log probability in the same manner as with any of the other models. Given the model shown before:
>>> model.log_probability(['A', 'B', 'B', 'B'])
-3.324236340526027
>>> model.log_probability(['A', 'A', 'A', 'A'])
-5.521460917862246
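This decomposition can be checked by hand. For the sequence A, B, B, B, the terms read directly off the tables defined above are P(A) = 0.25 from d1, P(B | A) = 0.9 from d2, and P(B | A, B) = 0.2 and P(B | B, B) = 0.8 from d3:

```python
import math

# Log probability of ['A', 'B', 'B', 'B'] as a sum of the per-symbol terms:
# P(A) = 0.25, P(B|A) = 0.9, P(B|A,B) = 0.2, P(B|B,B) = 0.8
logp = math.log(0.25) + math.log(0.9) + math.log(0.2) + math.log(0.8)
print(round(logp, 6))  # -3.324236
```

This matches the value returned by model.log_probability above.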
Fitting¶
Markov chains are simple to train. For each sequence, the appropriate symbols are sent to the appropriate distributions, and maximum likelihood estimates are used to update the parameters of the distributions. There are no latent variables, so no expectation-maximization or other iterative algorithm is needed.
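The counting procedure behind this maximum likelihood fit can be sketched in plain Python for a first-order chain. This is an illustrative sketch of the idea, not pomegranate's actual implementation; the function name and the dictionary representation are assumptions for the example:

```python
from collections import Counter

def fit_first_order(sequences):
    """Maximum likelihood estimates for a first-order Markov chain."""
    initial = Counter()
    transitions = Counter()
    for seq in sequences:
        initial[seq[0]] += 1
        # Each adjacent pair of symbols contributes one transition count.
        for prev, cur in zip(seq, seq[1:]):
            transitions[prev, cur] += 1
    total = sum(initial.values())
    p_initial = {s: c / total for s, c in initial.items()}
    # Normalize each row of transition counts by its conditioning symbol.
    row_totals = Counter()
    for (prev, _), c in transitions.items():
        row_totals[prev] += c
    p_transition = {(prev, cur): c / row_totals[prev]
                    for (prev, cur), c in transitions.items()}
    return p_initial, p_transition

p_init, p_trans = fit_first_order([['A', 'B', 'B'], ['B', 'A', 'B']])
```

Here p_init is the estimate of the first distribution and p_trans plays the role of the conditional probability table.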
API Reference¶
class pomegranate.MarkovChain.MarkovChain¶

A Markov chain.

Implemented as a series of conditional distributions, the Markov chain models P(X_i | X_i-1…X_i-k) for a k-th order Markov chain. The conditional dependencies are directly on the emissions, and not on a hidden state as in a hidden Markov model.
Parameters: - distributions : list, shape (k+1)
A list of the conditional distributions which make up the Markov chain. Begins with P(X_i), then P(X_i | X_i-1). For a k-th order Markov chain you must pass in k+1 distributions.
Examples
>>> from pomegranate import *
>>> d1 = DiscreteDistribution({'A': 0.25, 'B': 0.75})
>>> d2 = ConditionalProbabilityTable([['A', 'A', 0.33],
...                                   ['B', 'A', 0.67],
...                                   ['A', 'B', 0.82],
...                                   ['B', 'B', 0.18]], [d1])
>>> mc = MarkovChain([d1, d2])
>>> mc.log_probability(list('ABBAABABABAABABA'))
-8.9119890701808213
Attributes: - distributions : list, shape (k+1)
The distributions which make up the chain.
fit()¶

Fit the model to new data using MLE.

The underlying distributions are fed the appropriate points and weights, and their parameters are updated.
Parameters: - sequences : array-like, shape (n_samples, variable)
This is the data to train on. Each row is a sample which contains a sequence of variable length.
- weights : array-like, shape (n_samples,), optional
The initial weights of each sample. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
- inertia : double, optional
The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param*(1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.
Returns: - None
from_json()¶

Read in a serialized model and return the appropriate model object.
Parameters: - s : str
A JSON formatted string containing the file.
Returns: - model : object
A properly initialized and baked model.
from_samples()¶

Learn the Markov chain from data.

Takes in the memory of the chain (k) and learns the initial distribution and the conditional probability tables from the data.

Parameters: - X : array-like, list or numpy.array
The data to fit the structure to, as a list of sequences of variable length. Since the data is of variable length, there is no set form.
- weights : array-like, shape (n_samples,), optional
The weight of each sample as a positive double. Default is None.
- k : int, optional
The number of samples back to condition on in the model. Default is 1.
Returns: - model : MarkovChain
The learned Markov chain model.
from_summaries()¶

Fit the model to the collected sufficient statistics.
Fit the parameters of the model to the sufficient statistics gathered during the summarize calls. This should return an exact update.
Parameters: - inertia : double, optional
The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param * (1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.
Returns: - None
log_probability()¶

Calculate the log probability of the sequence under the model.

The first slices of increasing size are evaluated under the corresponding first distributions of the model until length k is reached, after which every remaining symbol is evaluated under the final distribution.
Parameters: - sequence : array-like
An array of observations
Returns: - logp : double
The log probability of the sequence under the model.
sample()¶

Create a random sample from the model.
Parameters: - length : int or Distribution
Give either the length of the sample you want to generate, or a distribution object which will be randomly sampled for the length. Continuous distributions will have their sample rounded to the nearest integer, minimum 1.
Returns: - sequence : array-like, shape = (length,)
A sequence randomly generated from the Markov chain.
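For intuition, the generative procedure for a first-order chain can be sketched in plain Python: draw the first symbol from the initial distribution, then repeatedly draw each next symbol conditioned on its predecessor. The function name and dictionary representation below are illustrative assumptions, not pomegranate's API:

```python
import random

def sample_chain(p_initial, p_transition, length, rng=random):
    """Draw a sequence of the given length from a first-order chain."""
    symbols = sorted(p_initial)
    # The first symbol comes from the initial distribution.
    seq = rng.choices(symbols, weights=[p_initial[s] for s in symbols])
    while len(seq) < length:
        prev = seq[-1]
        # Each subsequent symbol is conditioned on its predecessor.
        weights = [p_transition.get((prev, s), 0.0) for s in symbols]
        seq += rng.choices(symbols, weights=weights)
    return seq

seq = sample_chain({'A': 0.25, 'B': 0.75},
                   {('A', 'A'): 0.1, ('A', 'B'): 0.9,
                    ('B', 'A'): 0.6, ('B', 'B'): 0.4},
                   length=10, rng=random.Random(0))
```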
summarize()¶

Summarize a batch of data and store sufficient statistics.
This will summarize the sequences into sufficient statistics stored in each distribution.
Parameters: - sequences : array-like, shape (n_samples, variable)
This is the data to train on. Each row is a sample which contains a sequence of variable length.
- weights : array-like, shape (n_samples,), optional
The initial weights of each sample. If nothing is passed in then each sample is assumed to be the same weight. Default is None.
Returns: - None
to_json()¶

Serialize the model to a JSON.

Parameters: - separators : tuple, optional
The two separators to pass to the json.dumps function for formatting. Default is (',', ' : ').
- indent : int, optional
The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.
Returns: - json : str
A properly formatted JSON object.