Hidden Markov Models¶

Hidden Markov models (HMMs) are a structured probabilistic model that forms a probability distribution of sequences, as opposed to individual symbols. It is similar to a Bayesian network in that it has a directed graphical structure where nodes represent probability distributions, but unlike Bayesian networks in that the edges represent transitions and encode transition probabilities, whereas in Bayesian networks edges encode dependence statements. A HMM can be thought of as a general mixture model plus a transition matrix, where each component in the general Mixture model corresponds to a node in the hidden Markov model, and the transition matrix informs the probability that adjacent symbols in the sequence transition from being generated from one component to another. A strength of HMMs is that they can model variable length sequences whereas other models typically require a fixed feature set. They are extensively used in the fields of natural language processing to model speech, bioinformatics to model biosequences, and robotics to model movement.

The HMM implementation in pomegranate is based off of the implementation in its predecessor, Yet Another Hidden Markov Model (YAHMM). To convert a script that used YAHMM to a script using pomegranate, you only need to change calls to the Model class to call HiddenMarkovModel. For example, a script that previously looked like the following:

from yahmm import *
model = Model()

would now be written as

from pomegranate import *
model = HiddenMarkovModel()

and the remaining method calls should be identical.

Initialization¶

Hidden Markov models can be initialized in one of two ways depending on if you know the initial parameters of the model, either (1) by defining both the distributions and the graphical structure manually, or (2) running the from_samples method to learn both the structure and distributions directly from data. The first initialization method can be used either to specify a pre-defined model that is ready to make predictions, or as the initialization to a training algorithm such as Baum-Welch. It is flexible enough to allow sparse transition matrices and any type of distribution on each node, i.e. normal distributions on several nodes, but a mixture of normals on some nodes modeling more complex phenomena. The second initialization method is less flexible, in that currently each node must have the same distribution type, and that it will only learn dense graphs. Similar to mixture models, this initialization method starts with k-means to initialize the distributions and a uniform probability transition matrix before running Baum-Welch.

If you are initializing the parameters manually, you can do so either by passing in a list of distributions and a transition matrix, or by building the model line-by-line. Let’s first take a look at building the model from a list of distributions and a transition matrix.

from pomegranate import *
dists = [NormalDistribution(5, 1), NormalDistribution(1, 7), NormalDistribution(8,2)]
trans_mat = numpy.array([[0.7, 0.3, 0.0],
                             [0.0, 0.8, 0.2],
                             [0.0, 0.0, 0.9]])
starts = numpy.array([1.0, 0.0, 0.0])
ends = numpy.array([0.0, 0.0, 0.1])
model = HiddenMarkovModel.from_matrix(trans_mat, dists, starts, ends)

Next, let’s take a look at building the same model line by line.

from pomegranate import *
s1 = State(Distribution(NormalDistribution(5, 1)))
s2 = State(Distribution(NormalDistribution(1, 7)))
s3 = State(Distribution(NormalDistribution(8, 2)))
model = HiddenMarkovModel()
model.add_states(s1, s2, s3)
model.add_transition(model.start, s1, 1.0)
model.add_transition(s1, s1, 0.7)
model.add_transition(s1, s2, 0.3)
model.add_transition(s2, s2, 0.8)
model.add_transition(s2, s3, 0.2)
model.add_transition(s3, s3, 0.9)
model.add_transition(s3, model.end, 0.1)
model.bake()

Initially it may seem that the first method is far easier due to it being fewer lines of code. However, when building large sparse models defining a full transition matrix can be cumbersome, especially when it is mostly 0s.

Models built in this manner must be explicitly “baked” at the end. This finalizes the model topology and creates the internal sparse matrix which makes up the model. This step also automatically normalizes all transitions to make sure they sum to 1.0, stores information about tied distributions, edges, pseudocounts, and merges unneccesary silent states in the model for computational efficiency. This can cause the bake step to take a little bit of time. If you want to reduce this overhead and are sure you specified the model correctly you can pass in merge=”None” to the bake step to avoid model checking.

The second way to initialize models is to use the from_samples class method. The call is identical to initializing a mixture model.

from pomegranate import *
model = HiddenMarkovModel.from_samples(NormalDistribution, n_components=5, X=X)

Much like a mixture model, all arguments present in the fit step can also be passed in to this method. Also like a mixture model, it is initialized by running k-means on the concatenation of all data, ignoring that the symbols are part of a structured sequence. The clusters returned are used to initialize all parameters of the distributions, i.e. both mean and covariances for multivariate Gaussian distributions. The transition matrix is initialized as uniform random probabilities. After the components (distributions on the nodes) are initialized, the given training algorithm is used to refine the parameters of the distributions and learn the appropriate transition probabilities.

Log Probability¶

There are two common forms of the log probability which are used. The first is the log probability of the most likely path the sequence can take through the model, called the Viterbi probability. This can be calculated using model.viterbi(sequence). However, this is \(P(D|S_{ML}, S_{ML}, S_{ML})\) not \(P(D|M)\). In order to get \(P(D|M)\) we have to sum over all possible paths instead of just the single most likely path. This can be calculated using model.log_probability(sequence) and uses the forward algorithm internally. On that note, the full forward matrix can be returned using model.forward(sequence) and the full backward matrix can be returned using model.backward(sequence), while the full forward-backward emission and transition matrices can be returned using model.forward_backward(sequence).

Prediction¶

A common prediction technique is calculating the Viterbi path, which is the most likely sequence of states that generated the sequence given the full model. This is solved using a simple dynamic programming algorithm similar to sequence alignment in bioinformatics. This can be called using model.viterbi(sequence). A sklearn wrapper can be called using model.predict(sequence, algorithm='viterbi').

Another prediction technique is called maximum a posteriori or forward-backward, which uses the forward and backward algorithms to calculate the most likely state per observation in the sequence given the entire remaining alignment. Much like the forward algorithm can calculate the sum-of-all-paths probability instead of the most likely single path, the forward-backward algorithm calculates the best sum-of-all-paths state assignment instead of calculating the single best path. This can be called using model.predict(sequence, algorithm='map') and the raw normalized probability matrices can be called using model.predict_proba(sequence).

Fitting¶

A simple fitting algorithm for hidden Markov models is called Viterbi training. In this method, each observation is tagged with the most likely state to generate it using the Viterbi algorithm. The distributions (emissions) of each states are then updated using MLE estimates on the observations which were generated from them, and the transition matrix is updated by looking at pairs of adjacent state taggings. This can be done using model.fit(sequence, algorithm='viterbi').

However, this is not the best way to do training and much like the other sections there is a way of doing training using sum-of-all-paths probabilities instead of maximally likely path. This is called Baum-Welch or forward-backward training. Instead of using hard assignments based on the Viterbi path, observations are given weights equal to the probability of them having been generated by that state. Weighted MLE can then be done to update the distributions, and the soft transition matrix can give a more precise probability estimate. This is the default training algorithm, and can be called using either model.fit(sequences) or explicitly using model.fit(sequences, algorithm='baum-welch').

Fitting in pomegranate also has a number of options, including the use of distribution or edge inertia, freezing certain states, tying distributions or edges, and using pseudocounts. See the tutorial linked to at the top of this page for full details on each of these options.

API Reference¶

class pomegranate.hmm.HiddenMarkovModel¶

A Hidden Markov Model

A Hidden Markov Model (HMM) is a directed graphical model where nodes are hidden states which contain an observed emission distribution and edges contain the probability of transitioning from one hidden state to another. HMMs allow you to tag each observation in a variable length sequence with the most likely hidden state according to the model.

Parameters:	name : str, optional The name of the model. Default is None. start : State, optional An optional state to force the model to start in. Default is None. end : State, optional An optional state to force the model to end in. Default is None.

Examples

>>> from pomegranate import *
>>> d1 = DiscreteDistribution({'A' : 0.35, 'C' : 0.20, 'G' : 0.05, 'T' : 40})
>>> d2 = DiscreteDistribution({'A' : 0.25, 'C' : 0.25, 'G' : 0.25, 'T' : 25})
>>> d3 = DiscreteDistribution({'A' : 0.10, 'C' : 0.40, 'G' : 0.40, 'T' : 10})
>>>
>>> s1 = State(d1, name="s1")
>>> s2 = State(d2, name="s2")
>>> s3 = State(d3, name="s3")
>>>
>>> model = HiddenMarkovModel('example')
>>> model.add_states([s1, s2, s3])
>>> model.add_transition(model.start, s1, 0.90)
>>> model.add_transition(model.start, s2, 0.10)
>>> model.add_transition(s1, s1, 0.80)
>>> model.add_transition(s1, s2, 0.20)
>>> model.add_transition(s2, s2, 0.90)
>>> model.add_transition(s2, s3, 0.10)
>>> model.add_transition(s3, s3, 0.70)
>>> model.add_transition(s3, model.end, 0.30)
>>> model.bake()
>>>
>>> print model.log_probability(list('ACGACTATTCGAT'))
-4.31828085576
>>> print ", ".join(state.name for i, state in model.viterbi(list('ACGACTATTCGAT'))[1])
example-start, s1, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s3, example-end

Attributes:

start : State: A state object corresponding to the initial start of the model
end : State: A state object corresponding to the forced end of the model
start_index : int: The index of the start object in the state list
end_index : int: The index of the end object in the state list
silent_start : int: The index of the beginning of the silent states in the state list
states : list: The list of all states in the model, with silent states at the end

add_edge()¶: Add a transition from state a to state b which indicates that B is dependent on A in ways specified by the distribution.

add_model()¶

Add the states and edges of another model to this model.

Parameters:	other : HiddenMarkovModel The other model to add
Returns:	None

add_node()¶: Add a node to the graph.

add_nodes()¶: Add multiple states to the graph.

add_state()¶

Add a state to the given model.

The state must not already be in the model, nor may it be part of any other model that will eventually be combined with this one.

Parameters:	state : State A state object to be added to the model.
Returns:	None

add_states()¶

Add multiple states to the model at the same time.

Parameters:	states : list or generator Either a list of states which are entered sequentially, or just comma separated values, for example model.add_states(a, b, c, d).
Returns:	None

add_transition()¶

Add a transition from state a to state b.

Add a transition from state a to state b with the given (non-log) probability. Both states must be in the HMM already. self.start and self.end are valid arguments here. Probabilities will be normalized such that every node has edges summing to 1. leaving that node, but only when the model is baked. Psueodocounts are allowed as a way of using edge-specific pseudocounts for training.

By specifying a group as a string, you can tie edges together by giving them the same group. This means that a transition across one edge in the group counts as a transition across all edges in terms of training.

Parameters:

a : State: The state that the edge originates from
b : State: The state that the edge goes to
probability : double: The probability of transitioning from state a to state b in [0, 1]
pseudocount : double, optional: The pseudocount to use for this specific edge if using edge pseudocounts for training. Defaults to the probability. Default is None.
group : str, optional: The name of the group of edges to tie together during training. If groups are used, then a transition across any one edge counts as a transition across all edges. Default is None.

Returns:

None

add_transitions()¶

Add many transitions at the same time,

Parameters:

a : State or list: Either a state or a list of states where the edges originate.
b : State or list: Either a state or a list of states where the edges go to.
probabilities : list: The probabilities associated with each transition.
pseudocounts : list, optional: The pseudocounts associated with each transition. Default is None.
groups : list, optional: The groups of each edge. Default is None.

Returns:

None

Examples

>>> model.add_transitions([model.start, s1], [s1, model.end], [1., 1.])
>>> model.add_transitions([model.start, s1, s2, s3], s4, [0.2, 0.4, 0.3, 0.9])
>>> model.add_transitions(model.start, [s1, s2, s3], [0.6, 0.2, 0.05])

backward()¶

Run the backward algorithm on the sequence.

Calculate the probability of each observation being aligned to each state by going backward through a sequence. Returns the full backward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the end of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.

If the sequence is impossible, will return a matrix of nans.

See also:

Silent state handling taken from p. 71 of “Biological

Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.

Row normalization technique explained by

http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf on p. 14.

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	matrix : array-like, shape (len(sequence), n_states) The probability of aligning the sequences to states in a backward fashion.

bake()¶

Finalize the topology of the model.

Finalize the topology of the model and assign a numerical index to every state. This method must be called before any of the probability- calculating methods.

This fills in self.states (a list of all states in order) and self.transition_log_probabilities (log probabilities for transitions), as well as self.start_index and self.end_index, and self.silent_start (the index of the first silent state).

Parameters:

verbose : bool, optional

Return a log of changes made to the model during normalization or merging. Default is False.

merge : “None”, “Partial, “All”

Merging has three options: “None”: No modifications will be made to the model. “Partial”: A silent state which only has a probability 1 transition

to another silent state will be merged with that silent state. This means that if silent state “S1” has a single transition to silent state “S2”, that all transitions to S1 will now go to S2, with the same probability as before, and S1 will be removed from the model.

“All”: A silent state with a probability 1 transition to any other: state, silent or symbol emitting, will be merged in the manner described above. In addition, any orphan states will be removed from the model. An orphan state is a state which does not have any transitions to it OR does not have any transitions from it, except for the start and end of the model. This will iteratively remove orphan chains from the model. This is sometimes desirable, as all states should have both a transition in to get to that state, and a transition out, even if it is only to itself. If the state does not have either, the HMM will likely not work as intended.

Default is ‘All’.

Returns:

None

clear_summaries()¶

Clear the summary statistics stored in the object.

Parameters:	None
Returns:	None

concatenate()¶

Concatenate this model to another model.

Concatenate this model to another model in such a way that a single probability 1 edge is added between self.end and other.start. Rename all other states appropriately by adding a suffix or prefix if needed.

Parameters:	other : HiddenMarkovModel The other model to concatenate suffix : str, optional Add the suffix to the end of all state names in the other model. Default is ‘’. prefix : str, optional Add the prefix to the beginning of all state names in the other model. Default is ‘’.
Returns:	None

copy()¶

Returns a deep copy of the HMM.

Parameters:	None
Returns:	model : HiddenMarkovModel A deep copy of the model with entirely new objects.

dense_transition_matrix()¶

Returns the dense transition matrix.

Parameters:	None
Returns:	matrix : numpy.ndarray, shape (n_states, n_states) A dense transition matrix, containing the log probability of transitioning from each state to each other state.

edge_count()¶: Returns the number of edges present in the model.

fit()¶

Fit the model to data using either Baum-Welch, Viterbi, or supervised training.

Given a list of sequences, performs re-estimation on the model parameters. The two supported algorithms are “baum-welch”, “viterbi”, and “labeled”, indicating their respective algorithm. “labeled” corresponds to supervised learning that requires passing in a matching list of labels for each symbol seen in the sequences.

Training supports a wide variety of other options including using edge pseudocounts and either edge or distribution inertia.

Parameters:

sequences : array-like: An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional if the HMM supports multiple dimensions.
weights : array-like or None, optional: An array of weights, one for each sequence to train on. If None, all sequences are equaly weighted. Default is None.
labels : array-like or None, optional: An array of state labels for each sequence. This is only used in ‘labeled’ training. If used this must be comprised of n lists where n is the number of sequences to train on, and each of those lists must have one label per observation. A None in this list corresponds to no labels for the entire sequence and triggers semi-supervised learning, where the labeled sequences are summarized using labeled fitting and the unlabeled are summarized using the specified algorithm. Default is None.
stop_threshold : double, optional: The threshold the improvement ratio of the models log probability in fitting the scores. Default is 1e-9.
min_iterations : int, optional: The minimum number of iterations to run Baum-Welch training for. Default is 0.
max_iterations : int, optional: The maximum number of iterations to run Baum-Welch training for. Default is 1e8.
algorithm : ‘baum-welch’, ‘viterbi’, ‘labeled’: The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Default is ‘baum-welch’. Labeled training requires that labels are provided for each observation in each sequence.
pseudocount : double, optional: A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
transition_pseudocount : double, optional: A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
emission_pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only effects hidden Markov models defined over discrete distributions. Default is 0.
use_pseudocount : bool, optional: Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not effect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
inertia : double or None, optional, range [0, 1]: If double, will set both edge_inertia and distribution_inertia to be that value. If None, will not override those values. Default is None.
edge_inertia : bool, optional, range [0, 1]: Whether to use inertia when updating the transition probability parameters. Default is 0.0.
distribution_inertia : double, optional, range [0, 1]: Whether to use inertia when updating the distribution parameters. Default is 0.0.
batch_size : int or None, optional: The number of samples in a batch to summarize on. This controls the size of the set sent to summarize and so does not make the update any less exact. This is useful when training on a memory map and cannot load all the data into memory. If set to None, batch_size is 1 / n_jobs. Default is None.
batches_per_epoch : int or None, optional: The number of batches in an epoch. This is the number of batches to summarize before calling from_summaries and updating the model parameters. This allows one to do minibatch updates by updating the model parameters before setting the full dataset. If set to None, uses the full dataset. Default is None.
lr_decay : double, optional, positive: The step size decay as a function of the number of iterations. Functionally, this sets the inertia to be (2+k)^{-lr_decay} where k is the number of iterations. This causes initial iterations to have more of an impact than later iterations, and is frequently used in minibatch learning. This value is suggested to be between 0.5 and 1. Default is 0, meaning no decay.
verbose : bool, optional: Whether to print the improvement in the model fitting at each iteration. Default is True.
n_jobs : int, optional: The number of threads to use when performing training. This leads to exact updates. Default is 1.

Returns:

improvement : double: The total improvement in fitting the model to the data

forward()¶

Run the forward algorithm on the sequence.

Calculate the probability of each observation being aligned to each state by going forward through a sequence. Returns the full forward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the beginning of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.

If the sequence is impossible, will return a matrix of nans.

See also:

Silent state handling taken from p. 71 of “Biological

Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.

Row normalization technique explained by

http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf on p. 14.

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	matrix : array-like, shape (len(sequence), n_states) The probability of aligning the sequences to states in a forward fashion.

forward_backward()¶

Run the forward-backward algorithm on the sequence.

This algorithm returns an emission matrix and a transition matrix. The emission matrix returns the normalized probability that each each state generated that emission given both the symbol and the entire sequence. The transition matrix returns the expected number of times that a transition is used.

If the sequence is impossible, will return (None, None)

See also:

Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	emissions : array-like, shape (len(sequence), n_nonsilent_states) The normalized probabilities of each state generating each emission. transitions : array-like, shape (n_states, n_states) The expected number of transitions across each edge in the model.

freeze()¶: Freeze the distribution, preventing updates from occuring.

freeze_distributions()¶

Freeze all the distributions in model.

Upon training only edges will be updated. The parameters of distributions will not be affected.

Parameters:	None
Returns:	None

from_json()¶

Read in a serialized model and return the appropriate classifier.

Parameters:	s : str A JSON formatted string containing the file. Returns ——- model : object A properly initialized and baked model.

from_matrix()¶

Create a model from a more standard matrix format.

Take in a 2D matrix of floats of size n by n, which are the transition probabilities to go from any state to any other state. May also take in a list of length n representing the names of these nodes, and a model name. Must provide the matrix, and a list of size n representing the distribution you wish to use for that state, a list of size n indicating the probability of starting in a state, and a list of size n indicating the probability of ending in a state.

Parameters:

transition_probabilities : array-like, shape (n_states, n_states): The probabilities of each state transitioning to each other state.
distributions : array-like, shape (n_states): The distributions for each state. Silent states are indicated by using None instead of a distribution object.
starts : array-like, shape (n_states): The probabilities of starting in each of the states.
ends : array-like, shape (n_states), optional: If passed in, the probabilities of ending in each of the states. If ends is None, then assumes the model has no explicit end state. Default is None.
state_names : array-like, shape (n_states), optional: The name of the states. If None is passed in, default names are generated. Default is None
name : str, optional: The name of the model. Default is None
verbose : bool, optional: The verbose parameter for the underlying bake method. Default is False.
merge : ‘None’, ‘Partial’, ‘All’, optional: The merge parameter for the underlying bake method. Default is All

Returns:

model : HiddenMarkovModel: The baked model ready to go.

Examples

matrix = [[0.4, 0.5], [0.4, 0.5]] distributions = [NormalDistribution(1, .5), NormalDistribution(5, 2)] starts = [1., 0.] ends = [.1., .1] state_names= [“A”, “B”]

model = Model.from_matrix(matrix, distributions, starts, ends,: state_names, name=”test_model”)

from_samples()¶

Learn the transitions and emissions of a model directly from data.

This method will learn both the transition matrix, emission distributions, and start probabilities for each state. This will only return a dense graph without any silent states or explicit transitions to an end state. Currently all components must be defined as the same distribution, but soon this restriction will be removed.

If learning a multinomial HMM over discrete characters, the initial emisison probabilities are initialized randomly. If learning a continuous valued HMM, such as a Gaussian HMM, then kmeans clustering is used first to identify initial clusters.

Regardless of the type of model, the transition matrix and start probabilities are initialized uniformly. Then the specified learning algorithm (Baum-Welch recommended) is used to refine the parameters of the model.

Parameters:

distribution : callable: The emission distribution of the components of the model.
n_components : int: The number of states (or components) to initialize.
X : array-like: An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional if the HMM supports multiple dimensions.
weights : array-like or None, optional: An array of weights, one for each sequence to train on. If None, all sequences are equaly weighted. Default is None.
labels : array-like or None, optional: An array of state labels for each sequence. This is only used in ‘labeled’ training. If used this must be comprised of n lists where n is the number of sequences to train on, and each of those lists must have one label per observation. A None in this list corresponds to no labels for the entire sequence and triggers semi-supervised learning, where the labeled sequences are summarized using labeled fitting and the unlabeled are summarized using the specified algorithm. Default is None.
algorithm : ‘baum-welch’, ‘viterbi’, ‘labeled’: The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Default is ‘baum-welch’. Labeled training requires that labels are provided for each observation in each sequence.
inertia : double or None, optional, range [0, 1]: If double, will set both edge_inertia and distribution_inertia to be that value. If None, will not override those values. Default is None.
edge_inertia : bool, optional, range [0, 1]: Whether to use inertia when updating the transition probability parameters. Default is 0.0.
distribution_inertia : double, optional, range [0, 1]: Whether to use inertia when updating the distribution parameters. Default is 0.0.
pseudocount : double, optional: A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
transition_pseudocount : double, optional: A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
emission_pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only effects hidden Markov models defined over discrete distributions. Default is 0.
use_pseudocount : bool, optional: Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not effect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
stop_threshold : double, optional: The threshold the improvement ratio of the models log probability in fitting the scores. Default is 1e-9.
min_iterations : int, optional: The minimum number of iterations to run Baum-Welch training for. Default is 0.
max_iterations : int, optional: The maximum number of iterations to run Baum-Welch training for. Default is 1e8.
n_init : int, optional: The number of times to initialize the k-means clustering before taking the best value. Default is 1.
init : str, optional: The initialization method for kmeans. Must be one of ‘first-k’, ‘random’, ‘kmeans++’, or ‘kmeans||’. Default is kmeans++.
max_kmeans_iterations : int, optional: The number of iterations to run k-means for before starting EM.
batch_size : int or None, optional: The number of samples in a batch to summarize on. This controls the size of the set sent to summarize and so does not make the update any less exact. This is useful when training on a memory map and cannot load all the data into memory. If set to None, batch_size is 1 / n_jobs. Default is None.
batches_per_epoch : int or None, optional: The number of batches in an epoch. This is the number of batches to summarize before calling from_summaries and updating the model parameters. This allows one to do minibatch updates by updating the model parameters before setting the full dataset. If set to None, uses the full dataset. Default is None.
lr_decay : double, optional, positive: The step size decay as a function of the number of iterations. Functionally, this sets the inertia to be (2+k)^{-lr_decay} where k is the number of iterations. This causes initial iterations to have more of an impact than later iterations, and is frequently used in minibatch learning. This value is suggested to be between 0.5 and 1. Default is 0, meaning no decay.
end_state : bool, optional: Whether to calculate the probability of ending in each state or not. Default is False.
state_names : array-like, shape (n_states), optional: The name of the states. If None is passed in, default names are generated. Default is None
name : str, optional: The name of the model. Default is None
verbose : bool, optional: Whether to print the improvement in the model fitting at each iteration. Default is True.
n_jobs : int, optional: The number of threads to use when performing training. This leads to exact updates. Default is 1.

Returns:

model : HiddenMarkovModel: The model fit to the data.

from_summaries()¶

Fit the model to the stored summary statistics.

Parameters:

inertia : double or None, optional: The inertia to use for both edges and distributions without needing to set both of them. If None, use the values passed in to those variables. Default is None.
pseudocount : double, optional: A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
transition_pseudocount : double, optional: A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
emission_pseudocount : double, optional: A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only effects hidden Markov models defined over discrete distributions. Default is 0.
use_pseudocount : bool, optional: Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not effect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
edge_inertia : bool, optional, range [0, 1]: Whether to use inertia when updating the transition probability parameters. Default is 0.0.
distribution_inertia : double, optional, range [0, 1]: Whether to use inertia when updating the distribution parameters. Default is 0.0.

Returns:

None

log_probability()¶

Calculate the log probability of a single sequence.

If a path is provided, calculate the log probability of that sequence given the path.

Parameters:	sequence : array-like Return the array of observations in a single sequence of data check_input : bool, optional Check to make sure that all emissions fall under the support of the emission distributions. Default is True.
Returns:	logp : double The log probability of the sequence

maximum_a_posteriori()¶

Run posterior decoding on the sequence.

MAP decoding is an alternative to viterbi decoding, which returns the most likely state for each observation, based on the forward-backward algorithm. This is also called posterior decoding. This method is described on p. 14 of http://ai.stanford.edu/~serafim/CS262_2007/ notes/lecture5.pdf

WARNING: This may produce impossible sequences.

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	logp : double The log probability of the sequence under the Viterbi path path : list of tuples Tuples of (state index, state object) of the states along the posterior path.

node_count()¶: Returns the number of nodes/states in the model

plot()¶

Draw this model’s graph using NetworkX and matplotlib.

Note that this relies on networkx’s built-in graphing capabilities (and not Graphviz) and thus can’t draw self-loops.

See networkx.draw_networkx() for the keywords you can pass in.

Parameters:	precision : int, optional The precision with which to round edge probabilities. Default is 4. **kwargs : any The arguments to pass into networkx.draw_networkx()
Returns:	None

predict()¶

Calculate the most likely state for each observation.

This can be either the Viterbi algorithm or maximum a posteriori. It returns the probability of the sequence under that state sequence and the actual state sequence.

This is a sklearn wrapper for the Viterbi and maximum_a_posteriori methods.

Parameters:	sequence : array-like An array (or list) of observations. algorithm : “map”, “viterbi” The algorithm with which to decode the sequence
Returns:	path : list of integers A list of the ids of states along the MAP or the Viterbi path.

predict_log_proba()¶

Calculate the state log probabilities for each observation in the sequence.

Run the forward-backward algorithm on the sequence and return the emission matrix. This is the log normalized probability that each each state generated that emission given both the symbol and the entire sequence.

This is a sklearn wrapper for the forward backward algorithm.

See also:

Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	emissions : array-like, shape (len(sequence), n_nonsilent_states) The log normalized probabilities of each state generating each emission.

predict_proba()¶

Calculate the state probabilities for each observation in the sequence.

Run the forward-backward algorithm on the sequence and return the emission matrix. This is the normalized probability that each each state generated that emission given both the symbol and the entire sequence.

This is a sklearn wrapper for the forward backward algorithm.

See also:

Forward and backward algorithm implementations. A comprehensive

description of the forward, backward, and forward-background algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	emissions : array-like, shape (len(sequence), n_nonsilent_states) The normalized probabilities of each state generating each emission.

probability()¶

Return the probability of the given symbol under this distribution.

Parameters:	symbol : object The symbol to calculate the probability of
Returns:	probability : double The probability of that point under the distribution.

sample()¶

Generate a sequence from the model.

Returns the sequence generated, as a list of emitted items. The model must have been baked first in order to run this method.

If a length is specified and the HMM is infinite (no edges to the end state), then that number of samples will be randomly generated. If the length is specified and the HMM is finite, the method will attempt to generate a prefix of that length. Currently it will force itself to not take an end transition unless that is the only path, making it not a true random sample on a finite model.

WARNING: If the HMM has no explicit end state, must specify a length to use.

Parameters:	length : int, optional Generate a sequence with a maximal length of this size. Used if you have no explicit end state. Default is 0. path : bool, optional Return the path of hidden states in addition to the emissions. If true will return a tuple of (sample, path). Default is False.
Returns:	sample : list or tuple If path is true, return a tuple of (sample, path), otherwise return just the samples.

state_count()¶: Returns the number of states present in the model.

summarize()¶

Summarize data into stored sufficient statistics for out-of-core training. Only implemented for Baum-Welch training since Viterbi is less memory intensive.

Parameters:

sequences : array-like: An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional if the HMM supports multiple dimensions.
weights : array-like or None, optional: An array of weights, one for each sequence to train on. If None, all sequences are equaly weighted. Default is None.
labels : array-like or None, optional: An array of state labels for each sequence. This is only used in ‘labeled’ training. If used this must be comprised of n lists where n is the number of sequences to train on, and each of those lists must have one label per observation. Default is None.
algorithm : ‘baum-welch’, ‘viterbi’, ‘labeled’: The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Default is ‘baum-welch’. Labeled training requires that labels are provided for each observation in each sequence.
check_input : bool, optional: Check the input. This casts the input sequences as numpy arrays, and converts non-numeric inputs into numeric inputs for faster processing later. Default is True.

Returns:

logp : double: The log probability of the sequences.

thaw()¶: Thaw the distribution, re-allowing updates to occur.

thaw_distributions()¶

Thaw all distributions in the model.

Upon training distributions will be updated again.

Parameters:	None
Returns:	None

to_json()¶

Serialize the model to a JSON.

Parameters:	separators : tuple, optional The two separaters to pass to the json.dumps function for formatting. indent : int, optional The indentation to use at each level. Passed to json.dumps for formatting.
Returns:	json : str A properly formatted JSON object.

viterbi()¶

Run the Viteri algorithm on the sequence.

Run the Viterbi algorithm on the sequence given the model. This finds the ML path of hidden states given the sequence. Returns a tuple of the log probability of the ML path, or (-inf, None) if the sequence is impossible under the model. If a path is returned, it is a list of tuples of the form (sequence index, state object).

This is fundamentally the same as the forward algorithm using max instead of sum, except the traceback is more complicated, because silent states in the current step can trace back to other silent states in the current step as well as states in the previous step.

See also:

Viterbi implementation described well in the wikipedia article

http://en.wikipedia.org/wiki/Viterbi_algorithm

Parameters:	sequence : array-like An array (or list) of observations.
Returns:	logp : double The log probability of the sequence under the Viterbi path path : list of tuples Tuples of (state index, state object) of the states along the Viterbi path.

pomegranate.hmm.log()¶: Return the natural log of the value or -infinity if the value is 0.