Markov Chains

IPython Notebook Tutorial

Markov chains are form of structured model over sequences. They represent the probability of each character in the sequence as a conditional probability of the last k symbols. For example, a 3rd order Markov chain would have each symbol depend on the last three symbols. A 0th order Markov chain is a naive predictor where each symbol is independent of all other symbols. Currently pomegranate only supports discrete emission Markov chains where each symbol is a discrete symbol versus a continuous number (like ‘A’ ‘B’ ‘C’ instead of 17.32 or 19.65).

Initialization

Markov chains can almost be represented by a single conditional probability table (CPT), except that the probability of the first k elements (for a k-th order Markov chain) cannot be appropriately represented except by using special characters. Due to this pomegranate takes in a series of k+1 distributions representing the first k elements. For example for a second order Markov chain:

from pomegranate import *
d1 = DiscreteDistribution({'A': 0.25, 'B': 0.75})
d2 = ConditionalProbabilityTable([['A', 'A', 0.1],
                                      ['A', 'B', 0.9],
                                      ['B', 'A', 0.6],
                                      ['B', 'B', 0.4]], [d1])
d3 = ConditionalProbabilityTable([['A', 'A', 'A', 0.4],
                                      ['A', 'A', 'B', 0.6],
                                      ['A', 'B', 'A', 0.8],
                                      ['A', 'B', 'B', 0.2],
                                      ['B', 'A', 'A', 0.9],
                                      ['B', 'A', 'B', 0.1],
                                      ['B', 'B', 'A', 0.2],
                                      ['B', 'B', 'B', 0.8]], [d1, d2])
model = MarkovChain([d1, d2, d3])

Probability

The probability of a sequence under the Markov chain is just the probabiliy of the first character under the first distribution times the probability of the second character under the second distribution and so forth until you go past the (k+1)th character, which remains evaluated under the (k+1)th distribution. We can calculate the probability or log probability in the same manner as any of the other models. Given the model shown before:

>>> model.log_probability(['A', 'B', 'B', 'B'])
-3.324236340526027
>>> model.log_probability(['A', 'A', 'A', 'A'])
-5.521460917862246

Fitting

Markov chains are not very complicated to train. For each sequence the appropriate symbols are sent to the appropriate distributions and maximum likelihood estimates are used to update the parameters of the distributions. There are no latent factors to train and so no expectation maximization or iterative algorithms are needed to train anything.

API Reference

class pomegranate.MarkovChain.MarkovChain

A Markov Chain.

Implemented as a series of conditional distributions, the Markov chain models P(X_i | X_i-1...X_i-k) for a k-th order Markov network. The conditional dependencies are directly on the emissions, and not on a hidden state as in a hidden Markov model.

Parameters:

distributions : list, shape (k+1)

A list of the conditional distributions which make up the markov chain. Begins with P(X_i), then P(X_i | X_i-1). For a k-th order markov chain you must put in k+1 distributions.

Examples

>>> from pomegranate import *
>>> d1 = DiscreteDistribution({'A': 0.25, 'B': 0.75})
>>> d2 = ConditionalProbabilityTable([['A', 'A', 0.33],
                                     ['B', 'A', 0.67],
                                     ['A', 'B', 0.82],
                                     ['B', 'B', 0.18]], [d1])
>>> mc = MarkovChain([d1, d2])
>>> mc.log_probability(list('ABBAABABABAABABA'))
-8.9119890701808213

Attributes

distributions (list, shape (k+1)) The distributions which make up the chain.
fit()

Fit the model to new data using MLE.

The underlying distributions are fed in their appropriate points and weights and are updated.

Parameters:

sequences : array-like, shape (n_samples, variable)

This is the data to train on. Each row is a sample which contains a sequence of variable length

weights : array-like, shape (n_samples,), optional

The initial weights of each sample. If nothing is passed in then each sample is assumed to be the same weight. Default is None.

inertia : double, optional

The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param*(1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.

Returns:

None

from_json()

Read in a serialized model and return the appropriate classifier.

Parameters:

s : str

A JSON formatted string containing the file.

Returns:

model : object

A properly initialized and baked model.

from_samples()

Learn the Markov chain from data.

Takes in the memory of the chain (k) and learns the initial distribution and probability tables associated with the proper parameters.

Parameters:

X : array-like, list or numpy.array

The data to fit the structure too as a list of sequences of variable length. Since the data will be of variable length, there is no set form

weights : array-like, shape (n_nodes), optional

The weight of each sample as a positive double. Default is None.

k : int, optional

The number of samples back to condition on in the model. Default is 1.

Returns:

model : MarkovChain

The learned markov chain model.

from_summaries()

Fit the model to the collected sufficient statistics.

Fit the parameters of the model to the sufficient statistics gathered during the summarize calls. This should return an exact update.

Parameters:

inertia : double, optional

The weight of the previous parameters of the model. The new parameters will roughly be old_param*inertia + new_param * (1-inertia), so an inertia of 0 means ignore the old parameters, whereas an inertia of 1 means ignore the new parameters. Default is 0.0.

Returns:

None

log_probability()

Calculate the log probability of the sequence under the model.

This calculates the first slices of increasing size under the corresponding first few components of the model until size k is reached, at which all slices are evaluated under the final component.

Parameters:

sequence : array-like

An array of observations

Returns:

logp : double

The log probability of the sequence under the model.

sample()

Create a random sample from the model.

Parameters:

length : int or Distribution

Give either the length of the sample you want to generate, or a distribution object which will be randomly sampled for the length. Continuous distributions will have their sample rounded to the nearest integer, minimum 1.

Returns:

sequence : array-like, shape = (length,)

A sequence randomly generated from the markov chain.

summarize()

Summarize a batch of data and store sufficient statistics.

This will summarize the sequences into sufficient statistics stored in each distribution.

Parameters:

sequences : array-like, shape (n_samples, variable)

This is the data to train on. Each row is a sample which contains a sequence of variable length

weights : array-like, shape (n_samples,), optional

The initial weights of each sample. If nothing is passed in then each sample is assumed to be the same weight. Default is None.

Returns:

None

to_json()

Serialize the model to a JSON.

Parameters:

separators : tuple, optional

The two separaters to pass to the json.dumps function for formatting. Default is (‘,’, ‘ : ‘).

indent : int, optional

The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.

Returns:

json : str

A properly formatted JSON object.