# Hidden Markov Models¶

Hidden Markov models (HMMs) are a form of structured model in which a sequence of observations are labelled according to the hidden state they belong. A strength of HMMs is their ability to analyze variable length sequences whereas other models require a static set of features. This makes them extensively used in the fields of natural language processing and bioinformatics where data is routinely variable length sequences. They can be thought of as a structured form of General Mixture Models.

## Initialization¶

Transitioning from YAHMM to pomegranate is simple because the only change is that the Model class is now HiddenMarkovModel. You can port your code over by either changing Model to HiddenMarkovModel or changing your imports at the top of the file as follows:

>>> from pomegranate import *
>>> from pomegranate import HiddenMarkovModel as Model


>>> from yahmm import *


Since hidden Markov models are graphical structures, that structure has to be defined. pomegranate allows you to define this structure either through matrices as is common in other packages, or build it up state by state and edge by edge. pomegranate differs from other packages in that it offers both explicit start and end states which you must begin in or end in. Explicit end states give you more control over the model because algorithms require ending there, as opposed to in any state in the model. It also offers silent states, which are states without explicit emission distributions but can be used to significantly simplify the graphical structure.

>>> from pomegranate import *
>>> dists = [NormalDistribution(5, 1), NormalDistribution(1, 7), NormalDistribution(8,2)]
>>> trans_mat = numpy.array([[0.7, 0.3, 0.0],
[0.0, 0.8, 0.2],
[0.0, 0.0, 0.9]])
>>> starts = numpy.array([1.0, 0.0, 0.0])
>>> ends = numpy.array([0.0, 0.0, 0.1])
>>> model = HiddenMarkovModel.from_matrix(trans_mat, dists, starts, ends)


Alternatively this model could be created edge by edge and state by state. This is helpful for large sparse graphs. You must add transitions using the explicit start and end states where the sum of probabilities leaving a state sums to 1.0.

>>> from pomegranate import *
>>> s1 = State( Distribution( NormalDistribution(5, 1) ) )
>>> s2 = State( Distribution( NormalDistribution(1, 7) ) )
>>> s3 = State( Distribution( NormalDistribution(8, 2) ) )
>>> model = HiddenMarkovModel()
>>> model.bake()


Models built in this manner must be explicitly “baked” at the end. This finalizes the model topology and creates the internal sparse matrix which makes up the model. This removes “orphan” parts of the model, normalizes all transitions to make sure they sum to 1.0, stores information about tied distributions, edges, and pseudocounts, and merges unneccesary silent states in the model for computational efficiency. This can cause the bake step to take a little bit of time. If you want to reduce this overhead and are sure you specified the model correctly you can pass in merge=”None” to the bake step to avoid model checking.

## Log Probability¶

There are two common forms of the log probability which are used. The first is the log probability of the most likely path the sequence can take through the model, called the Viterbi probability. This can be calculated using model.viterbi(sequence). However, this is $$P(D|S_{ML}, S_{ML}, S_{ML})$$ not $$P(D|M)$$. In order to get $$P(D|M)$$ we have to sum over all possible paths instead of just the single most likely path, which can be calculated using model.log_probability(sequence) using the forward or backward algorithms. On that note, the full forward matrix can be returned using model.forward(sequence) and the full backward matrix can be returned using model.backward(sequence).

## Prediction¶

A common prediction technique is calculating the Viterbi path, which is the most likely sequence of states that generated the sequence given the full model. This is solved using a simple dynamic programming algorithm similar to sequence alignment in bioinformatics. This can be called using model.viterbi(sequence). A sklearn wrapper can be called using model.predict(sequence, algorithm='viterbi').

Another prediction technique is called maximum a posteriori or forward-backward, which uses the forward and backward algorithms to calculate the most likely state per observation in the sequence given the entire remaining alignment. Much like the forward algorithm can calculate the sum-of-all-paths probability instead of the most likely single path, the forward-backward algorithm calculates the best sum-of-all-paths state assignment instead of calculating the single best path. This can be called using model.predict(sequence, algorithm='map') and the raw normalized probability matrices can be called using model.predict_proba(sequence).

## Fitting¶

A simple fitting algorithm for hidden Markov models is called Viterbi training. In this method, each observation is tagged with the most likely state to generate it using the Viterbi algorithm. The distributions (emissions) of each states are then updated using MLE estimates on the observations which were generated from them, and the transition matrix is updated by looking at pairs of adjacent state taggings. This can be done using model.fit(sequence, algorithm='viterbi').

However, this is not the best way to do training and much like the other sections there is a way of doing training using sum-of-all-paths probabilities instead of maximally likely path. This is called Baum-Welch or forward-backward training. Instead of using hard assignments based on the Viterbi path, observations are given weights equal to the probability of them having been generated by that state. Weighted MLE can then be done to updates the distributions, and the soft transition matrix can give a more precise probability estimate. This is the default training algorithm, and can be called using either model.fit(sequences) or explicitly using model.fit(sequences, algorithm='baum-welch').

Fitting in pomegranate also has a number of options, including the use of distribution or edge inertia, freezing certain states, tying distributions or edges, and using pseudocounts. See the tutorial linked to at the top of this page for full details on each of these options.

## API Reference¶

class pomegranate.hmm.HiddenMarkovModel

A Hidden Markov Model

A Hidden Markov Model (HMM) is a directed graphical model where nodes are hidden states which contain an observed emission distribution and edges contain the probability of transitioning from one hidden state to another. HMMs allow you to tag each observation in a variable length sequence with the most likely hidden state according to the model.

Parameters: name : str, optional The name of the model. Default is None. start : State, optional An optional state to force the model to start in. Default is None. end : State, optional An optional state to force the model to end in. Default is None.

Examples

>>> from pomegranate import *
>>> d1 = DiscreteDistribution({'A' : 0.35, 'C' : 0.20, 'G' : 0.05, 'T' : 40})
>>> d2 = DiscreteDistribution({'A' : 0.25, 'C' : 0.25, 'G' : 0.25, 'T' : 25})
>>> d3 = DiscreteDistribution({'A' : 0.10, 'C' : 0.40, 'G' : 0.40, 'T' : 10})
>>>
>>> s1 = State( d1, name="s1" )
>>> s2 = State( d2, name="s2" )
>>> s3 = State( d3, name="s3" )
>>>
>>> model = HiddenMarkovModel('example')
>>> model.add_transition( model.start, s1, 0.90 )
>>> model.add_transition( model.start, s2, 0.10 )
>>> model.add_transition( s1, s1, 0.80 )
>>> model.add_transition( s1, s2, 0.20 )
>>> model.add_transition( s2, s2, 0.90 )
>>> model.add_transition( s2, s3, 0.10 )
>>> model.add_transition( s3, s3, 0.70 )
>>> model.add_transition( s3, model.end, 0.30 )
>>> model.bake()
>>>
>>> print model.log_probability(list('ACGACTATTCGAT'))
-4.31828085576
>>> print ", ".join( state.name for i, state in model.viterbi(list('ACGACTATTCGAT'))[1] )
example-start, s1, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s3, example-end


Attributes

 start (State) A state object corresponding to the initial start of the model end (State) A state object corresponding to the forced end of the model start_index (int) The index of the start object in the state list end_index (int) The index of the end object in the state list silent_start (int) The index of the beginning of the silent states in the state list states (list) The list of all states in the model, with silent states at the end
add_edge()

Add a transition from state a to state b which indicates that B is dependent on A in ways specified by the distribution.

add_model()

Add the states and edges of another model to this model.

Parameters: other : HiddenMarkovModel The other model to add None
add_node()

Add a node to the graph.

add_nodes()

Add multiple states to the graph.

add_state()

Add a state to the given model.

The state must not already be in the model, nor may it be part of any other model that will eventually be combined with this one.

Parameters: state : State A state object to be added to the model. None
add_states()

Add multiple states to the model at the same time.

Parameters: states : list or generator Either a list of states which are entered sequentially, or just comma separated values, for example model.add_states(a, b, c, d). None
add_transition()

Add a transition from state a to state b.

Add a transition from state a to state b with the given (non-log) probability. Both states must be in the HMM already. self.start and self.end are valid arguments here. Probabilities will be normalized such that every node has edges summing to 1. leaving that node, but only when the model is baked. Psueodocounts are allowed as a way of using edge-specific pseudocounts for training.

By specifying a group as a string, you can tie edges together by giving them the same group. This means that a transition across one edge in the group counts as a transition across all edges in terms of training.

Parameters: a : State The state that the edge originates from b : State The state that the edge goes to probability : double The probability of transitioning from state a to state b in [0, 1] pseudocount : double, optional The pseudocount to use for this specific edge if using edge pseudocounts for training. Defaults to the probability. Default is None. group : str, optional The name of the group of edges to tie together during training. If groups are used, then a transition across any one edge counts as a transition across all edges. Default is None. None
add_transitions()

Add many transitions at the same time,

Parameters: a : State or list Either a state or a list of states where the edges originate. b : State or list Either a state or a list of states where the edges go to. probabilities : list The probabilities associated with each transition. pseudocounts : list, optional The pseudocounts associated with each transition. Default is None. groups : list, optional The groups of each edge. Default is None. None

Examples

>>> model.add_transitions([model.start, s1], [s1, model.end], [1., 1.])
>>> model.add_transitions([model.start, s1, s2, s3], s4, [0.2, 0.4, 0.3, 0.9])
>>> model.add_transitions(model.start, [s1, s2, s3], [0.6, 0.2, 0.05])

backward()

Run the backward algorithm on the sequence.

Calculate the probability of each observation being aligned to each state by going backward through a sequence. Returns the full backward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the end of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.

If the sequence is impossible, will return a matrix of nans.

• Silent state handling taken from p. 71 of “Biological

Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.

• Row normalization technique explained by
Parameters: sequence : array-like An array (or list) of observations. matrix : array-like, shape (len(sequence), n_states) The probability of aligning the sequences to states in a backward fashion.
bake()

Finalize the topology of the model.

Finalize the topology of the model and assign a numerical index to every state. This method must be called before any of the probability- calculating methods.

This fills in self.states (a list of all states in order) and self.transition_log_probabilities (log probabilities for transitions), as well as self.start_index and self.end_index, and self.silent_start (the index of the first silent state).

Parameters: verbose : bool, optional Return a log of changes made to the model during normalization or merging. Default is False. merge : “None”, “Partial, “All” Merging has three options: “None”: No modifications will be made to the model. “Partial”: A silent state which only has a probability 1 transition to another silent state will be merged with that silent state. This means that if silent state “S1” has a single transition to silent state “S2”, that all transitions to S1 will now go to S2, with the same probability as before, and S1 will be removed from the model. “All”: A silent state with a probability 1 transition to any other state, silent or symbol emitting, will be merged in the manner described above. In addition, any orphan states will be removed from the model. An orphan state is a state which does not have any transitions to it OR does not have any transitions from it, except for the start and end of the model. This will iteratively remove orphan chains from the model. This is sometimes desirable, as all states should have both a transition in to get to that state, and a transition out, even if it is only to itself. If the state does not have either, the HMM will likely not work as intended. Default is ‘All’. None
clear_summaries()

Clear the summary statistics stored in the object.

Parameters: None None
concatenate()

Concatenate this model to another model.

Concatenate this model to another model in such a way that a single probability 1 edge is added between self.end and other.start. Rename all other states appropriately by adding a suffix or prefix if needed.

Parameters: other : HiddenMarkovModel The other model to concatenate suffix : str, optional Add the suffix to the end of all state names in the other model. Default is ‘’. prefix : str, optional Add the prefix to the beginning of all state names in the other model. Default is ‘’. None
copy()

Returns a deep copy of the HMM.

Parameters: None model : HiddenMarkovModel A deep copy of the model with entirely new objects.
dense_transition_matrix()

Returns the dense transition matrix.

Parameters: None matrix : numpy.ndarray, shape (n_states, n_states) A dense transition matrix, containing the log probability of transitioning from each state to each other state.
edge_count()

Returns the number of edges present in the model.

fit()

Fit the model to data using either Baum-Welch or Viterbi training.

Given a list of sequences, performs re-estimation on the model parameters. The two supported algorithms are “baum-welch” and “viterbi,” indicating their respective algorithm.

Training supports a wide variety of other options including using edge pseudocounts and either edge or distribution inertia.

Parameters: sequences : array-like An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional of the HMM supports multiple dimensions. weights : array-like or None, optional An array of weights, one for each sequence to train on. If None, all sequences are equaly weighted. Default is None. stop_threshold : double, optional The threshold the improvement ratio of the models log probability in fitting the scores. Default is 1e-9. min_iterations : int, optional The minimum number of iterations to run Baum-Welch training for. Default is 0. max_iterations : int, optional The maximum number of iterations to run Baum-Welch training for. Default is 1e8. algorithm : ‘baum-welch’, ‘viterbi’ The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Default is ‘baum-welch’. verbose : bool, optional Whether to print the improvement in the model fitting at each iteration. Default is True. transition_pseudocount : int, optional A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0. use_pseudocount : bool, optional Whether to use pseudocounts when updatiing the transition probability parameters. Default is False. inertia : double or None, optional, range [0, 1] If double, will set both edge_inertia and distribution_inertia to be that value. If None, will not override those values. Default is None. edge_inertia : bool, optional, range [0, 1] Whether to use inertia when updating the transition probability parameters. Default is 0.0. distribution_inertia : double, optional, range [0, 1] Whether to use inertia when updating the distribution parameters. Default is 0.0. n_jobs : int, optional The number of threads to use when performing training. This leads to exact updates. Default is 1. improvement : double The total improvement in fitting the model to the data
forward()

Run the forward algorithm on the sequence.

Calculate the probability of each observation being aligned to each state by going forward through a sequence. Returns the full forward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the beginning of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.

If the sequence is impossible, will return a matrix of nans.

• Silent state handling taken from p. 71 of “Biological

Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.

• Row normalization technique explained by
Parameters: sequence : array-like An array (or list) of observations. matrix : array-like, shape (len(sequence), n_states) The probability of aligning the sequences to states in a forward fashion.
forward_backward()

Run the forward-backward algorithm on the sequence.

This algorithm returns an emission matrix and a transition matrix. The emission matrix returns the normalized probability that each each state generated that emission given both the symbol and the entire sequence. The transition matrix returns the expected number of times that a transition is used.

If the sequence is impossible, will return (None, None)

• Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters: sequence : array-like An array (or list) of observations. emissions : array-like, shape (len(sequence), n_nonsilent_states) The normalized probabilities of each state generating each emission. transitions : array-like, shape (n_states, n_states) The expected number of transitions across each edge in the model.
freeze()

Freeze the distribution, preventing updates from occuring.

freeze_distributions()

Freeze all the distributions in model.

Upon training only edges will be updated. The parameters of distributions will not be affected.

Parameters: None None
from_json()

Read in a serialized model and return the appropriate classifier.

Parameters: s : str A JSON formatted string containing the file. Returns ——- model : object A properly initialized and baked model.
from_matrix()

Create a model from a more standard matrix format.

Take in a 2D matrix of floats of size n by n, which are the transition probabilities to go from any state to any other state. May also take in a list of length n representing the names of these nodes, and a model name. Must provide the matrix, and a list of size n representing the distribution you wish to use for that state, a list of size n indicating the probability of starting in a state, and a list of size n indicating the probability of ending in a state.

Parameters: transition_probabilities : array-like, shape (n_states, n_states) The probabilities of each state transitioning to each other state. distributions : array-like, shape (n_states) The distributions for each state. Silent states are indicated by using None instead of a distribution object. starts : array-like, shape (n_states) The probabilities of starting in each of the states. ends : array-like, shape (n_states), optional If passed in, the probabilities of ending in each of the states. If ends is None, then assumes the model has no explicit end state. Default is None. state_names : array-like, shape (n_states), optional The name of the states. If None is passed in, default names are generated. Default is None name : str, optional The name of the model. Default is None verbose : bool, optional The verbose parameter for the underlying bake method. Default is False. merge : ‘None’, ‘Partial’, ‘All’, optional The merge parameter for the underlying bake method. Default is All model : HiddenMarkovModel The baked model ready to go.

Examples

matrix = [ [ 0.4, 0.5 ], [ 0.4, 0.5 ] ] distributions = [NormalDistribution(1, .5), NormalDistribution(5, 2)] starts = [ 1., 0. ] ends = [ .1., .1 ] state_names= [ “A”, “B” ]

model = Model.from_matrix( matrix, distributions, starts, ends,
state_names, name=”test_model” )
from_summaries()

Fit the model to the stored summary statistics.

Parameters: inertia : double or None, optional The inertia to use for both edges and distributions without needing to set both of them. If None, use the values passed in to those variables. Default is None. transition_pseudocount : int, optional A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0. use_pseudocount : bool, optional Whether to use pseudocounts when updatiing the transition probability parameters. Default is False. edge_inertia : bool, optional, range [0, 1] Whether to use inertia when updating the transition probability parameters. Default is 0.0. distribution_inertia : double, optional, range [0, 1] Whether to use inertia when updating the distribution parameters. Default is 0.0. None
log_probability()

Calculate the log probability of a single sequence.

If a path is provided, calculate the log probability of that sequence given the path.

Parameters: sequence : array-like Return the array of observations in a single sequence of data check_input : bool, optional Check to make sure that all emissions fall under the support of the emission distributions. Default is True. logp : double The log probability of the sequence
maximum_a_posteriori()

Run posterior decoding on the sequence.

MAP decoding is an alternative to viterbi decoding, which returns the most likely state for each observation, based on the forward-backward algorithm. This is also called posterior decoding. This method is described on p. 14 of http://ai.stanford.edu/~serafim/CS262_2007/ notes/lecture5.pdf

WARNING: This may produce impossible sequences.

Parameters: sequence : array-like An array (or list) of observations. logp : double The log probability of the sequence under the Viterbi path path : list of tuples Tuples of (state index, state object) of the states along the posterior path.
node_count()

Returns the number of nodes/states in the model

plot()

Draw this model’s graph using NetworkX and matplotlib.

Note that this relies on networkx’s built-in graphing capabilities (and not Graphviz) and thus can’t draw self-loops.

See networkx.draw_networkx() for the keywords you can pass in.

Parameters: precision : int, optional The precision with which to round edge probabilities. Default is 4. **kwargs : any The arguments to pass into networkx.draw_networkx() None
predict()

Calculate the most likely state for each observation.

This can be either the Viterbi algorithm or maximum a posteriori. It returns the probability of the sequence under that state sequence and the actual state sequence.

This is a sklearn wrapper for the Viterbi and maximum_a_posteriori methods.

Parameters: sequence : array-like An array (or list) of observations. algorithm : “map”, “viterbi” The algorithm with which to decode the sequence logp : double The log probability of the sequence under the Viterbi path path : list of tuples Tuples of (state index, state object) of the states along the Viterbi path.
predict_log_proba()

Calculate the state log probabilities for each observation in the sequence.

Run the forward-backward algorithm on the sequence and return the emission matrix. This is the log normalized probability that each each state generated that emission given both the symbol and the entire sequence.

This is a sklearn wrapper for the forward backward algorithm.

• Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters: sequence : array-like An array (or list) of observations. emissions : array-like, shape (len(sequence), n_nonsilent_states) The log normalized probabilities of each state generating each emission.
predict_proba()

Calculate the state probabilities for each observation in the sequence.

Run the forward-backward algorithm on the sequence and return the emission matrix. This is the normalized probability that each each state generated that emission given both the symbol and the entire sequence.

This is a sklearn wrapper for the forward backward algorithm.

• Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters: sequence : array-like An array (or list) of observations. emissions : array-like, shape (len(sequence), n_nonsilent_states) The normalized probabilities of each state generating each emission.
probability()

Return the probability of the given symbol under this distribution.

Parameters: symbol : object The symbol to calculate the probability of probability : double The probability of that point under the distribution.
sample()

Generate a sequence from the model.

Returns the sequence generated, as a list of emitted items. The model must have been baked first in order to run this method.

If a length is specified and the HMM is infinite (no edges to the end state), then that number of samples will be randomly generated. If the length is specified and the HMM is finite, the method will attempt to generate a prefix of that length. Currently it will force itself to not take an end transition unless that is the only path, making it not a true random sample on a finite model.

WARNING: If the HMM has no explicit end state, must specify a length to use.

Parameters: length : int, optional Generate a sequence with a maximal length of this size. Used if you have no explicit end state. Default is 0. path : bool, optional Return the path of hidden states in addition to the emissions. If true will return a tuple of (sample, path). Default is False. sample : list or tuple If path is true, return a tuple of (sample, path), otherwise return just the samples.
state_count()

Returns the number of states present in the model.

summarize()

Summarize data into stored sufficient statistics for out-of-core training. Only implemented for Baum-Welch training since Viterbi is less memory intensive.

Parameters: sequences : array-like An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional of the HMM supports multiple dimensions. weights : array-like or None, optional An array of weights, one for each sequence to train on. If None, all sequences are equaly weighted. Default is None. n_jobs : int, optional The number of threads to use when performing training. This leads to exact updates. Default is 1. algorithm : ‘baum-welch’ or ‘viterbi’, optional The algorithm to use to collect the statistics, either Baum-Welch or Viterbi training. Defaults to Baum-Welch. parallel : joblib.Parallel or None, optional The joblib threadpool. Passed between iterations of Baum-Welch so that a new threadpool doesn’t have to be created each iteration. Default is None. check_input : bool, optional Check the input. This casts the input sequences as numpy arrays, and converts non-numeric inputs into numeric inputs for faster processing later. Default is True. logp : double The log probability of the sequences.
thaw()

Thaw the distribution, re-allowing updates to occur.

thaw_distributions()

Thaw all distributions in the model.

Upon training distributions will be updated again.

Parameters: None None
to_json()

Serialize the model to a JSON.

Parameters: separators : tuple, optional The two separaters to pass to the json.dumps function for formatting. indent : int, optional The indentation to use at each level. Passed to json.dumps for formatting. json : str A properly formatted JSON object.
viterbi()

Run the Viteri algorithm on the sequence.

Run the Viterbi algorithm on the sequence given the model. This finds the ML path of hidden states given the sequence. Returns a tuple of the log probability of the ML path, or (-inf, None) if the sequence is impossible under the model. If a path is returned, it is a list of tuples of the form (sequence index, state object).

This is fundamentally the same as the forward algorithm using max instead of sum, except the traceback is more complicated, because silent states in the current step can trace back to other silent states in the current step as well as states in the previous step.

pomegranate.hmm.log()