Hidden Markov Models
Hidden Markov models (HMMs) are structured probabilistic models that define a probability distribution over sequences, as opposed to individual symbols. An HMM is similar to a Bayesian network in that it has a directed graphical structure where nodes represent probability distributions, but unlike in a Bayesian network, where edges encode dependence statements, the edges of an HMM represent transitions and encode transition probabilities. An HMM can be thought of as a general mixture model plus a transition matrix, where each component in the general mixture model corresponds to a node in the hidden Markov model, and the transition matrix gives the probability that adjacent symbols in the sequence transition from being generated by one component to being generated by another. A strength of HMMs is that they can model variable-length sequences, whereas other models typically require a fixed feature set. They are used extensively in natural language processing to model speech, in bioinformatics to model biosequences, and in robotics to model movement.
The HMM implementation in pomegranate is based on the implementation in its predecessor, Yet Another Hidden Markov Model (YAHMM). To convert a script that used YAHMM to a script using pomegranate, you only need to change calls to the Model class to call HiddenMarkovModel. For example, a script that previously looked like the following:
from yahmm import *
model = Model()
would now be written as
from pomegranate import *
model = HiddenMarkovModel()
and the remaining method calls should be identical.
Initialization
Hidden Markov models can be initialized in one of two ways, depending on whether you know the initial parameters of the model: either (1) by defining both the distributions and the graphical structure manually, or (2) by running the from_samples method to learn both the structure and distributions directly from data. The first initialization method can be used either to specify a pre-defined model that is ready to make predictions, or as the initialization to a training algorithm such as Baum-Welch. It is flexible enough to allow sparse transition matrices and any type of distribution on each node, e.g. normal distributions on several nodes but a mixture of normals on the nodes that model more complex phenomena. The second initialization method is less flexible, in that currently each node must have the same distribution type and it will only learn dense graphs. Similar to mixture models, this initialization method starts with k-means to initialize the distributions and a uniform probability transition matrix before running Baum-Welch.
If you are initializing the parameters manually, you can do so either by passing in a list of distributions and a transition matrix, or by building the model line-by-line. Let’s first take a look at building the model from a list of distributions and a transition matrix.
import numpy
from pomegranate import *
dists = [NormalDistribution(5, 1), NormalDistribution(1, 7), NormalDistribution(8, 2)]
trans_mat = numpy.array([[0.7, 0.3, 0.0],
                         [0.0, 0.8, 0.2],
                         [0.0, 0.0, 0.9]])
starts = numpy.array([1.0, 0.0, 0.0])
ends = numpy.array([0.0, 0.0, 0.1])
model = HiddenMarkovModel.from_matrix(trans_mat, dists, starts, ends)
Next, let’s take a look at building the same model line by line.
from pomegranate import *
s1 = State(NormalDistribution(5, 1))
s2 = State(NormalDistribution(1, 7))
s3 = State(NormalDistribution(8, 2))
model = HiddenMarkovModel()
model.add_states(s1, s2, s3)
model.add_transition(model.start, s1, 1.0)
model.add_transition(s1, s1, 0.7)
model.add_transition(s1, s2, 0.3)
model.add_transition(s2, s2, 0.8)
model.add_transition(s2, s3, 0.2)
model.add_transition(s3, s3, 0.9)
model.add_transition(s3, model.end, 0.1)
model.bake()
At first the matrix-based method may seem far easier because it takes fewer lines of code. However, when building large sparse models, defining a full transition matrix can be cumbersome, especially when it is mostly 0s.
Models built line by line must be explicitly “baked” at the end. This finalizes the model topology and creates the internal sparse matrix which makes up the model. This step also automatically normalizes all transitions so that they sum to 1.0, stores information about tied distributions, edges, and pseudocounts, and merges unnecessary silent states in the model for computational efficiency. This can cause the bake step to take a little bit of time. If you want to reduce this overhead and are sure you specified the model correctly, you can pass merge="None" to the bake step to avoid model checking.
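As a rough illustration of the normalization that baking performs, the sketch below rescales the outgoing edges of each state so that they sum to 1. It is plain Python rather than pomegranate's internal code; the normalize_transitions helper and the edge dictionary layout are assumptions made for this example.

```python
def normalize_transitions(edges):
    """Rescale outgoing edge weights so each source state's edges sum to 1.

    `edges` maps (source, destination) -> non-negative weight. Every source
    is assumed to have at least one edge with positive weight.
    """
    totals = {}
    for (src, _dst), weight in edges.items():
        totals[src] = totals.get(src, 0.0) + weight
    return {(src, dst): weight / totals[src]
            for (src, dst), weight in edges.items()}


# Unnormalized weights: the edges leaving "s1" sum to 2.0, not 1.0.
edges = {("s1", "s1"): 1.4, ("s1", "s2"): 0.6, ("s2", "s2"): 1.0}
normalized = normalize_transitions(edges)
print(normalized)
```

The relative weights of the edges leaving a state are preserved; only their scale changes.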
The second way to initialize models is to use the from_samples class method. The call is identical to initializing a mixture model.
>>> from pomegranate import *
>>> model = HiddenMarkovModel.from_samples(NormalDistribution, n_components=5, X=X)
Much like a mixture model, all arguments present in the fit step can also be passed in to this method. Also like a mixture model, it is initialized by running k-means on the concatenation of all data, ignoring that the symbols are part of a structured sequence. The clusters returned are used to initialize all parameters of the distributions, e.g. both the means and covariances of multivariate Gaussian distributions. The transition matrix is initialized with uniform probabilities. After the components (distributions on the nodes) are initialized, the given training algorithm is used to refine the parameters of the distributions and learn the appropriate transition probabilities.
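The initialization described above can be sketched in plain Python for one-dimensional data. This is an illustration under simplifying assumptions, not pomegranate's implementation: the helper names are hypothetical, k-means is a bare Lloyd's loop, and the starting centers are simply the first observations rather than a k-means++ draw.

```python
import math


def assign(values, centers):
    """Group each value with its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, centers, n_iterations=20):
    """Plain Lloyd's algorithm on scalars; returns the final clusters."""
    centers = list(centers)
    for _ in range(n_iterations):
        clusters = assign(values, centers)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return assign(values, centers)


def mean_std(xs):
    """Maximum likelihood mean and standard deviation of a cluster."""
    mu = sum(xs) / len(xs)
    return mu, math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


sequences = [[0.9, 1.1, 1.0, 5.2, 4.8], [5.0, 5.1, 0.8, 1.2]]
n_components = 2

# Pool all observations; the sequence structure is ignored at this stage.
pooled = [x for seq in sequences for x in seq]
clusters = kmeans_1d(pooled, centers=pooled[:n_components])

# One (mean, std) pair per state, plus a uniform transition matrix.
emission_params = [mean_std(c) for c in clusters]
transitions = [[1.0 / n_components] * n_components for _ in range(n_components)]
print(emission_params, transitions)
```

The training algorithm would then start from these emission parameters and the uniform transition matrix.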
Log Probability
There are two common forms of the log probability. The first is the log probability of the most likely path the sequence can take through the model, called the Viterbi probability. This can be calculated using model.viterbi(sequence). However, this is \(P(D, S_{ML} \mid M)\), the joint probability of the data and the single most likely path \(S_{ML}\), not \(P(D \mid M)\). In order to get \(P(D \mid M)\) we have to sum over all possible paths instead of just the single most likely path. This can be calculated using model.log_probability(sequence), which uses the forward algorithm internally. On that note, the full forward matrix can be returned using model.forward(sequence) and the full backward matrix can be returned using model.backward(sequence), while the full forward-backward emission and transition matrices can be returned using model.forward_backward(sequence).
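To make the difference between the two quantities concrete, here is a minimal sketch on a toy two-state discrete HMM. The model parameters are made up for illustration, there are no silent or end states, and the code works with raw probabilities for readability, which would underflow on long sequences.

```python
import math

states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"A": 0.5, "B": 0.5},
        "s2": {"A": 0.1, "B": 0.9}}


def forward_log_probability(seq):
    """Log of the probability summed over all state paths (forward algorithm)."""
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
                 for s in states}
    return math.log(sum(alpha.values()))


def viterbi_log_probability(seq):
    """Log of the probability of the single most likely state path."""
    delta = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        delta = {s: emit[s][symbol] * max(delta[r] * trans[r][s] for r in states)
                 for s in states}
    return math.log(max(delta.values()))


sequence = list("ABBA")
print(forward_log_probability(sequence), viterbi_log_probability(sequence))
```

The only difference between the two functions is `sum` versus `max` in the recursion, so the Viterbi value can never exceed the forward value.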
Prediction
A common prediction technique is calculating the Viterbi path, which is the most likely sequence of states that generated the sequence given the full model. This is solved using a simple dynamic programming algorithm similar to sequence alignment in bioinformatics. This can be called using model.viterbi(sequence). A sklearn wrapper can be called using model.predict(sequence, algorithm='viterbi').
Another prediction technique is called maximum a posteriori or forward-backward, which uses the forward and backward algorithms to calculate the most likely state per observation in the sequence given the entire remaining alignment. Much like the forward algorithm can calculate the sum-of-all-paths probability instead of the most likely single path, the forward-backward algorithm calculates the best sum-of-all-paths state assignment instead of calculating the single best path. This can be called using model.predict(sequence, algorithm='map') and the raw normalized probability matrices can be called using model.predict_proba(sequence).
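The two prediction styles can be compared with a small self-contained sketch. The toy model below is hypothetical, has no silent or end states, and uses raw probabilities, so it only illustrates the idea on short sequences.

```python
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"A": 0.5, "B": 0.5},
        "s2": {"A": 0.1, "B": 0.9}}


def posteriors(seq):
    """P(state at position t | whole sequence), via forward-backward."""
    n = len(seq)
    alpha = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for t in range(1, n):
        alpha.append({s: emit[s][seq[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
                      for s in states})
    beta = [{s: 1.0 for s in states} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {s: sum(trans[s][r] * emit[r][seq[t + 1]] * beta[t + 1][r] for r in states)
                   for s in states}
    total = sum(alpha[-1].values())
    return [{s: alpha[t][s] * beta[t][s] / total for s in states} for t in range(n)]


def viterbi_path(seq):
    """Single most likely state path, recovered with backpointers."""
    delta = {s: start[s] * emit[s][seq[0]] for s in states}
    backpointers = []
    for symbol in seq[1:]:
        best_prev = {s: max(states, key=lambda r: delta[r] * trans[r][s]) for s in states}
        delta = {s: delta[best_prev[s]] * trans[best_prev[s]][s] * emit[s][symbol]
                 for s in states}
        backpointers.append(best_prev)
    path = [max(states, key=lambda s: delta[s])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1]


sequence = list("ABBA")
post = posteriors(sequence)
map_path = [max(states, key=lambda s: row[s]) for row in post]
print(map_path, viterbi_path(sequence))
```

The per-position choice maximizes each state assignment independently, so on some models it can differ from the Viterbi path or even pass through a transition that has zero probability.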
Fitting
A simple fitting algorithm for hidden Markov models is called Viterbi training. In this method, each observation is tagged with the most likely state to generate it using the Viterbi algorithm. The distributions (emissions) of each state are then updated using MLE estimates on the observations which were generated from them, and the transition matrix is updated by looking at pairs of adjacent state taggings. This can be done using model.fit(sequence, algorithm='viterbi').
However, this is not the best way to do training, and much like in the other sections there is a way of doing training using sum-of-all-paths probabilities instead of the maximally likely path. This is called Baum-Welch or forward-backward training. Instead of using hard assignments based on the Viterbi path, observations are given weights equal to the probability of them having been generated by that state. Weighted MLE can then be done to update the distributions, and the soft transition matrix can give a more precise probability estimate. This is the default training algorithm, and can be called using either model.fit(sequences) or explicitly using model.fit(sequences, algorithm='baum-welch').
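The difference between hard and soft assignments shows up directly in the emission update. The sketch below contrasts the two for the mean of a Gaussian state; the tags and responsibilities are hypothetical values standing in for what the Viterbi and forward-backward algorithms would produce.

```python
observations = [1.0, 2.0, 6.0, 7.0]

# Hypothetical Viterbi tagging: one state per observation.
tags = ["a", "a", "b", "b"]

# Hypothetical P(state | observation, sequence) from forward-backward.
responsibilities = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.6, "b": 0.4},
    {"a": 0.2, "b": 0.8},
    {"a": 0.1, "b": 0.9},
]


def hard_mean(xs, tags, state):
    """Viterbi-style MLE: only observations tagged with `state` count."""
    assigned = [x for x, tag in zip(xs, tags) if tag == state]
    return sum(assigned) / len(assigned)


def soft_mean(xs, resp, state):
    """Baum-Welch-style weighted MLE: every observation counts by its weight."""
    weights = [r[state] for r in resp]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


print(hard_mean(observations, tags, "a"))
print(soft_mean(observations, responsibilities, "a"))
```

With soft assignments, the observations 6.0 and 7.0 still pull the mean of state "a" upward a little, because they carry a small probability of having come from it.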
pomegranate also supports labeled training of hidden Markov models. This setting is where one has state labels for each observation and wishes to derive the transition matrix and emission distributions given those labels. The emissions simply become MLE estimates of the data partitioned by the labels and the transition matrix is calculated directly from the adjacency of labels. This option can be specified using model.fit(sequences, labels=labels, state_names=state_names) where labels has the same shape as sequences and state_names has the set of all possible labels.
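The labeled estimates described above amount to counting. The following sketch, which is an illustration in plain Python rather than pomegranate's code, derives a transition matrix from adjacent label pairs and an emission mean per state from the partitioned data. It ignores start and end states for simplicity, and the data is made up.

```python
from collections import Counter

sequences = [[1, 5, 6, 2], [2, 6, 5]]
labels = [["a", "b", "b", "a"], ["a", "b", "b"]]
state_names = ["a", "b"]

# Count adjacent (previous label, next label) pairs across all sequences.
pair_counts = Counter()
for tags in labels:
    pair_counts.update(zip(tags, tags[1:]))

# Normalize the counts leaving each state to get transition probabilities.
transitions = {}
for src in state_names:
    total = sum(pair_counts[(src, dst)] for dst in state_names)
    transitions[src] = {dst: pair_counts[(src, dst)] / total for dst in state_names}

# Partition observations by label and take the MLE mean of each partition.
partitions = {name: [] for name in state_names}
for seq, tags in zip(sequences, labels):
    for x, tag in zip(seq, tags):
        partitions[tag].append(x)
emission_means = {name: sum(xs) / len(xs) for name, xs in partitions.items()}

print(transitions)
print(emission_means)
```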
Note
The sequence of labels can include hidden states! However, a consequence of this is that each sequence of labels must begin with the start state, because that is where each sequence begins being aligned to the model. For instance, for the sequence of observations [1, 5, 6, 2] the corresponding labels would be ['None-start', 'a', 'b', 'b', 'a'] because the default name of a model is None and the name of the start state is {name}-start. Likewise, you will need to add the end state label at the end of each sequence if you want an explicit end state, making the labels ['None-start', 'a', 'b', 'b', 'a', 'None-end'].
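A small helper along these lines can add the boundary labels. It is only a sketch: with_boundaries is a hypothetical name, not a pomegranate function, and it simply follows the {name}-start and {name}-end naming described in the note.

```python
def with_boundaries(labels, model_name=None, include_end=False):
    """Prepend the start-state label and optionally append the end-state label."""
    tagged = ["{}-start".format(model_name)] + list(labels)
    if include_end:
        tagged.append("{}-end".format(model_name))
    return tagged


observation_labels = ["a", "b", "b", "a"]
print(with_boundaries(observation_labels))
print(with_boundaries(observation_labels, include_end=True))
```

Note that the resulting label list is one or two entries longer than the observation sequence, since the start and end states do not emit observations.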
There are a number of optional parameters that provide more control over the training process, including the use of distribution or edge inertia, freezing certain states, tying distributions or edges, and using pseudocounts. See the tutorial linked to at the top of this page for full details on each of these options.
API Reference

class pomegranate.hmm.HiddenMarkovModel
A Hidden Markov Model
A Hidden Markov Model (HMM) is a directed graphical model where nodes are hidden states which contain an observed emission distribution and edges contain the probability of transitioning from one hidden state to another. HMMs allow you to tag each observation in a variable length sequence with the most likely hidden state according to the model.
Parameters:  name : str, optional
The name of the model. Default is None.
 start : State, optional
An optional state to force the model to start in. Default is None.
 end : State, optional
An optional state to force the model to end in. Default is None.
Examples
>>> from pomegranate import *
>>> d1 = DiscreteDistribution({'A' : 0.35, 'C' : 0.20, 'G' : 0.05, 'T' : 0.40})
>>> d2 = DiscreteDistribution({'A' : 0.25, 'C' : 0.25, 'G' : 0.25, 'T' : 0.25})
>>> d3 = DiscreteDistribution({'A' : 0.10, 'C' : 0.40, 'G' : 0.40, 'T' : 0.10})
>>>
>>> s1 = State(d1, name="s1")
>>> s2 = State(d2, name="s2")
>>> s3 = State(d3, name="s3")
>>>
>>> model = HiddenMarkovModel('example')
>>> model.add_states([s1, s2, s3])
>>> model.add_transition(model.start, s1, 0.90)
>>> model.add_transition(model.start, s2, 0.10)
>>> model.add_transition(s1, s1, 0.80)
>>> model.add_transition(s1, s2, 0.20)
>>> model.add_transition(s2, s2, 0.90)
>>> model.add_transition(s2, s3, 0.10)
>>> model.add_transition(s3, s3, 0.70)
>>> model.add_transition(s3, model.end, 0.30)
>>> model.bake()
>>>
>>> print(model.log_probability(list('ACGACTATTCGAT')))
-22.73896159971087
>>> print(", ".join(state.name for i, state in model.viterbi(list('ACGACTATTCGAT'))[1]))
example-start, s1, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s2, s3, example-end
Attributes:  start : State
A state object corresponding to the initial start of the model
 end : State
A state object corresponding to the forced end of the model
 start_index : int
The index of the start object in the state list
 end_index : int
The index of the end object in the state list
 silent_start : int
The index of the beginning of the silent states in the state list
 states : list
The list of all states in the model, with silent states at the end

add_edge()
Add a transition from state a to state b, which indicates that b is dependent on a in ways specified by the distribution.

add_model()
Add the states and edges of another model to this model.
Parameters:  other : HiddenMarkovModel
The other model to add
Returns:  None

add_node()
Add a node to the graph.

add_nodes()
Add multiple states to the graph.

add_state()
Add a state to the given model.
The state must not already be in the model, nor may it be part of any other model that will eventually be combined with this one.
Parameters:  state : State
A state object to be added to the model.
Returns:  None

add_states()
Add multiple states to the model at the same time.
Parameters:  states : list or generator
Either a list of states which are entered sequentially, or just comma separated values, for example model.add_states(a, b, c, d).
Returns:  None

add_transition()
Add a transition from state a to state b.
Add a transition from state a to state b with the given (non-log) probability. Both states must be in the HMM already. self.start and self.end are valid arguments here. Probabilities will be normalized such that the edges leaving every node sum to 1, but only when the model is baked. Pseudocounts are allowed as a way of using edge-specific pseudocounts for training.
By specifying a group as a string, you can tie edges together by giving them the same group. This means that a transition across one edge in the group counts as a transition across all edges in terms of training.
Parameters:  a : State
The state that the edge originates from
 b : State
The state that the edge goes to
 probability : double
The probability of transitioning from state a to state b in [0, 1]
 pseudocount : double, optional
The pseudocount to use for this specific edge if using edge pseudocounts for training. If None, the probability is used as the pseudocount. Default is None.
 group : str, optional
The name of the group of edges to tie together during training. If groups are used, then a transition across any one edge counts as a transition across all edges. Default is None.
Returns:  None

add_transitions()
Add many transitions at the same time.
Parameters:  a : State or list
Either a state or a list of states where the edges originate.
 b : State or list
Either a state or a list of states where the edges go to.
 probabilities : list
The probabilities associated with each transition.
 pseudocounts : list, optional
The pseudocounts associated with each transition. Default is None.
 groups : list, optional
The groups of each edge. Default is None.
Returns:  None
Examples
>>> model.add_transitions([model.start, s1], [s1, model.end], [1., 1.])
>>> model.add_transitions([model.start, s1, s2, s3], s4, [0.2, 0.4, 0.3, 0.9])
>>> model.add_transitions(model.start, [s1, s2, s3], [0.6, 0.2, 0.05])

backward()
Run the backward algorithm on the sequence.
Calculate the probability of each observation being aligned to each state by going backward through a sequence. Returns the full backward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the end of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.
If the sequence is impossible, will return a matrix of nans.
See also:
 Silent state handling taken from p. 71 of “Biological Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.
 Row normalization technique explained by http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf on p. 14.
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  matrix : array-like, shape (len(sequence), n_states)
The probability of aligning the sequences to states in a backward fashion.

bake()
Finalize the topology of the model.
Finalize the topology of the model and assign a numerical index to every state. This method must be called before any of the probability-calculating methods.
This fills in self.states (a list of all states in order) and self.transition_log_probabilities (log probabilities for transitions), as well as self.start_index and self.end_index, and self.silent_start (the index of the first silent state).
Parameters:  verbose : bool, optional
Return a log of changes made to the model during normalization or merging. Default is False.
 merge : “None”, “Partial”, “All”
Merging has three options:
 “None”: No modifications will be made to the model.
 “Partial”: A silent state which only has a probability 1 transition to another silent state will be merged with that silent state. This means that if silent state “S1” has a single transition to silent state “S2”, all transitions to S1 will now go to S2, with the same probability as before, and S1 will be removed from the model.
 “All”: A silent state with a probability 1 transition to any other state, silent or symbol emitting, will be merged in the manner described above. In addition, any orphan states will be removed from the model. An orphan state is a state which does not have any transitions to it OR does not have any transitions from it, except for the start and end of the model. This will iteratively remove orphan chains from the model. This is sometimes desirable, as all states should have both a transition in to get to that state, and a transition out, even if it is only to itself. If the state does not have either, the HMM will likely not work as intended.
Default is “All”.
Returns:  None

clear_summaries()
Clear the summary statistics stored in the object.
Parameters:  None
Returns:  None

concatenate()
Concatenate this model to another model.
Concatenate this model to another model in such a way that a single probability 1 edge is added between self.end and other.start. Rename all other states appropriately by adding a suffix or prefix if needed.
Parameters:  other : HiddenMarkovModel
The other model to concatenate
 suffix : str, optional
Add the suffix to the end of all state names in the other model. Default is ‘’.
 prefix : str, optional
Add the prefix to the beginning of all state names in the other model. Default is ‘’.
Returns:  None

copy()
Returns a deep copy of the HMM.
Parameters:  None
Returns:  model : HiddenMarkovModel
A deep copy of the model with entirely new objects.

dense_transition_matrix()
Returns the dense transition matrix.
Parameters:  None
Returns:  matrix : numpy.ndarray, shape (n_states, n_states)
A dense transition matrix, containing the log probability of transitioning from each state to each other state.

edge_count()
Returns the number of edges present in the model.

fit()
Fit the model to data using either Baum-Welch, Viterbi, or supervised training.
Given a list of sequences, performs re-estimation on the model parameters. The three supported algorithms are “baum-welch”, “viterbi”, and “labeled”, indicating their respective algorithm. “labeled” corresponds to supervised learning that requires passing in a matching list of labels for each symbol seen in the sequences.
Training supports a wide variety of other options including using edge pseudocounts and either edge or distribution inertia.
Parameters:  sequences : array-like
An array of some sort (list, numpy.ndarray, tuple, ...) of sequences, where each sequence is a numpy array, which is one-dimensional if the HMM models one-dimensional observations, or multidimensional if the HMM supports multiple dimensions.
 weights : array-like or None, optional
An array of weights, one for each sequence to train on. If None, all sequences are equally weighted. Default is None.
 labels : array-like or None, optional
An array of state labels for each sequence. This is only used in ‘labeled’ training. If used, this must be comprised of n lists where n is the number of sequences to train on, and each of those lists must have one label per observation. A None in this list corresponds to no labels for the entire sequence and triggers semi-supervised learning, where the labeled sequences are summarized using labeled fitting and the unlabeled are summarized using the specified algorithm. Default is None.
 stop_threshold : double, optional
Training stops once the improvement in the model's log probability between iterations falls below this threshold. Default is 1e-9.
 min_iterations : int, optional
The minimum number of iterations to run Baum-Welch training for. Default is 0.
 max_iterations : int, optional
The maximum number of iterations to run Baum-Welch training for. Default is 1e8.
 algorithm : ‘baum-welch’, ‘viterbi’, ‘labeled’
The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Labeled training requires that labels are provided for each observation in each sequence. Default is ‘baum-welch’.
 pseudocount : double, optional
A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
 transition_pseudocount : double, optional
A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
 emission_pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent 0. probability symbols if they don’t happen to occur in the data. Only affects hidden Markov models defined over discrete distributions. Default is 0.
 use_pseudocount : bool, optional
Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not affect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
 inertia : double or None, optional, range [0, 1]
If double, will set both edge_inertia and distribution_inertia to be that value. If None, will not override those values. Default is None.
 edge_inertia : double, optional, range [0, 1]
The inertia to use when updating the transition probability parameters. Default is 0.0.
 distribution_inertia : double, optional, range [0, 1]
The inertia to use when updating the distribution parameters. Default is 0.0.
 batches_per_epoch : int or None, optional
The number of batches in an epoch. This is the number of batches to summarize before calling from_summaries and updating the model parameters. This allows one to do mini-batch updates by updating the model parameters before seeing the full dataset. If set to None, uses the full dataset. Default is None.
 lr_decay : double, optional, positive
The step size decay as a function of the number of iterations. Functionally, this sets the inertia to be (2+k)^{-lr_decay} where k is the number of iterations. This causes initial iterations to have more of an impact than later iterations, and is frequently used in mini-batch learning. This value is suggested to be between 0.5 and 1. Default is 0, meaning no decay.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training.
 return_history : bool, optional
Whether to return the history during training as well as the model.
 verbose : bool, optional
Whether to print the improvement in the model fitting at each iteration. Default is True.
 n_jobs : int, optional
The number of threads to use when performing training. This leads to exact updates. Default is 1.
Returns:  improvement : double
The total improvement in fitting the model to the data
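One way to picture how the inertia and lr_decay parameters interact is sketched below. Two things here are assumptions rather than statements about pomegranate's code: that inertia blends the old parameter with the new estimate as a weighted average, and that the factor (2 + k) ** -lr_decay scales the step size 1 - inertia, a reading that is consistent with lr_decay = 0 meaning no decay. The helper names are hypothetical.

```python
def effective_inertia(inertia, iteration, lr_decay):
    """Inertia after shrinking the step size (1 - inertia) by (2 + k) ** -lr_decay."""
    step_size = (1.0 - inertia) * (2 + iteration) ** -lr_decay
    return 1.0 - step_size


def blend(old, estimate, inertia):
    """Assumed update rule: keep `inertia` of the old value, take the rest from the estimate."""
    return inertia * old + (1.0 - inertia) * estimate


# With decay, later iterations move the parameter less and less.
for k in range(4):
    w = effective_inertia(inertia=0.0, iteration=k, lr_decay=0.5)
    print(k, round(w, 3), round(blend(old=0.0, estimate=1.0, inertia=w), 3))
```

Under these assumptions, an inertia of 0 with no decay replaces the old parameter entirely, while an inertia of 1 leaves it unchanged.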

forward()
Run the forward algorithm on the sequence.
Calculate the probability of each observation being aligned to each state by going forward through a sequence. Returns the full forward matrix. Each index i, j corresponds to the sum-of-all-paths log probability of starting at the beginning of the sequence, and aligning observations to hidden states in such a manner that observation i was aligned to hidden state j. Uses row normalization to dynamically scale each row to prevent underflow errors.
If the sequence is impossible, will return a matrix of nans.
See also:
 Silent state handling taken from p. 71 of “Biological Sequence Analysis” by Durbin et al., and works for anything which does not have loops of silent states.
 Row normalization technique explained by http://www.cs.sjsu.edu/~stamp/RUA/HMM.pdf on p. 14.
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  matrix : array-like, shape (len(sequence), n_states)
The probability of aligning the sequences to states in a forward fashion.
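The row normalization idea can be sketched in a few lines of plain Python. This is an illustration on a hypothetical toy model without silent states, not pomegranate's implementation: each forward row is rescaled to sum to 1, and the log probability of the sequence is recovered as the sum of the logs of the scaling factors.

```python
import math


def scaled_forward(seq, states, start, trans, emit):
    """Forward pass with per-row rescaling.

    Returns the normalized rows and the log probability of the sequence.
    An impossible sequence (a row summing to 0) is not handled here.
    """
    rows, log_probability, previous = [], 0.0, None
    for t, symbol in enumerate(seq):
        if t == 0:
            row = {s: start[s] * emit[s][symbol] for s in states}
        else:
            row = {s: emit[s][symbol] * sum(previous[r] * trans[r][s] for r in states)
                   for s in states}
        scale = sum(row.values())
        previous = {s: value / scale for s, value in row.items()}
        rows.append(previous)
        log_probability += math.log(scale)
    return rows, log_probability


states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"A": 0.5, "B": 0.5},
        "s2": {"A": 0.1, "B": 0.9}}

# An unscaled product over 2000 symbols would underflow to 0.0 here.
long_rows, long_log_probability = scaled_forward(list("AB" * 1000), states, start, trans, emit)
print(long_log_probability)
```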

forward_backward()
Run the forward-backward algorithm on the sequence.
This algorithm returns an emission matrix and a transition matrix. The emission matrix returns the normalized probability that each state generated that emission given both the symbol and the entire sequence. The transition matrix returns the expected number of times that a transition is used.
If the sequence is impossible, will return (None, None)
See also:
 Forward and backward algorithm implementations. A comprehensive description of the forward, backward, and forward-backward algorithm is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  emissions : array-like, shape (len(sequence), n_nonsilent_states)
The normalized probabilities of each state generating each emission.
 transitions : array-like, shape (n_states, n_states)
The expected number of transitions across each edge in the model.
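The two returned quantities can be computed by hand on a small example. The sketch below uses a hypothetical toy model with no silent, start, or end states and raw (unscaled) probabilities, so it illustrates the definitions rather than reproducing pomegranate's output format.

```python
states = ["s1", "s2"]
start = {"s1": 0.6, "s2": 0.4}
trans = {"s1": {"s1": 0.7, "s2": 0.3},
         "s2": {"s1": 0.4, "s2": 0.6}}
emit = {"s1": {"A": 0.5, "B": 0.5},
        "s2": {"A": 0.1, "B": 0.9}}


def forward_backward_counts(seq):
    """Return (per-position state posteriors, expected transition counts)."""
    n = len(seq)
    alpha = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for t in range(1, n):
        alpha.append({s: emit[s][seq[t]] * sum(alpha[t - 1][r] * trans[r][s] for r in states)
                      for s in states})
    beta = [{s: 1.0 for s in states} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = {s: sum(trans[s][r] * emit[r][seq[t + 1]] * beta[t + 1][r] for r in states)
                   for s in states}
    total = sum(alpha[-1].values())  # probability of the whole sequence

    emissions = [{s: alpha[t][s] * beta[t][s] / total for s in states} for t in range(n)]
    transitions = {r: {s: 0.0 for s in states} for r in states}
    for t in range(n - 1):
        for r in states:
            for s in states:
                # Posterior probability of using edge r -> s between t and t + 1.
                transitions[r][s] += (alpha[t][r] * trans[r][s] * emit[s][seq[t + 1]]
                                      * beta[t + 1][s] / total)
    return emissions, transitions


emissions, transitions = forward_backward_counts(list("ABBA"))
print(emissions)
print(transitions)
```

Each row of the emission matrix sums to 1, and the expected transition counts sum to the number of adjacent pairs in the sequence.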

freeze()
Freeze the distribution, preventing updates from occurring.

freeze_distributions()
Freeze all the distributions in model.
Upon training only edges will be updated. The parameters of distributions will not be affected.
Parameters:  None
Returns:  None

from_json()
Read in a serialized model and return the appropriate classifier.
Parameters:  s : str
A JSON formatted string containing the file.
Returns:  model : object
A properly initialized and baked model.

from_matrix()
Create a model from a more standard matrix format.
Takes a 2D matrix of floats of size n by n containing the transition probabilities from each state to each other state. You must provide the matrix, a list of size n giving the distribution to use for each state, and a list of size n giving the probability of starting in each state. You may also provide a list of size n giving the probability of ending in each state, a list of length n giving the names of the states, and a model name.
Parameters:  transition_probabilities : array-like, shape (n_states, n_states)
The probabilities of each state transitioning to each other state.
 distributions : array-like, shape (n_states)
The distributions for each state. Silent states are indicated by using None instead of a distribution object.
 starts : array-like, shape (n_states)
The probabilities of starting in each of the states.
 ends : array-like, shape (n_states), optional
If passed in, the probabilities of ending in each of the states. If ends is None, then assumes the model has no explicit end state. Default is None.
 state_names : array-like, shape (n_states), optional
The name of the states. If None is passed in, default names are generated. Default is None.
 name : str, optional
The name of the model. Default is None.
 verbose : bool, optional
The verbose parameter for the underlying bake method. Default is False.
 merge : ‘None’, ‘Partial’, ‘All’, optional
The merge parameter for the underlying bake method. Default is ‘All’.
Returns:  model : HiddenMarkovModel
The baked model ready to go.
Examples
matrix = [[0.4, 0.5], [0.4, 0.5]]
distributions = [NormalDistribution(1, .5), NormalDistribution(5, 2)]
starts = [1., 0.]
ends = [.1, .1]
state_names = ["A", "B"]
model = HiddenMarkovModel.from_matrix(matrix, distributions, starts, ends, state_names, name="test_model")

from_samples
()¶ Learn the transitions and emissions of a model directly from data.
This method will learn both the transition matrix, emission distributions, and start probabilities for each state. This will only return a dense graph without any silent states or explicit transitions to an end state. Currently all components must be defined as the same distribution, but soon this restriction will be removed.
If learning a multinomial HMM over discrete characters, the initial emisison probabilities are initialized randomly. If learning a continuous valued HMM, such as a Gaussian HMM, then kmeans clustering is used first to identify initial clusters.
Regardless of the type of model, the transition matrix and start probabilities are initialized uniformly. Then the specified learning algorithm (BaumWelch recommended) is used to refine the parameters of the model.
Parameters:  distribution : callable
The emission distribution of the components of the model.
 n_components : int
The number of states (or components) to initialize.
 X : arraylike or generator
An array of some sort (list, numpy.ndarray, tuple..) of sequences, where each sequence is a numpy array, which is 1 dimensional if the HMM is a one dimensional array, or multidimensional if the HMM supports multiple dimensions. Alternatively, a data generator object that yields sequences.
 weights : arraylike or None, optional
An array of weights, one for each sequence to train on. If None, all sequences are equally weighted. Default is None.
 labels : arraylike or None, optional
An array of state labels for each sequence. This is only used in ‘labeled’ training. If used this must be comprised of n lists where n is the number of sequences to train on, and each of those lists must have one label per observation. A None in this list corresponds to no labels for the entire sequence and triggers semisupervised learning, where the labeled sequences are summarized using labeled fitting and the unlabeled are summarized using the specified algorithm. Default is None.
 algorithm : ‘baumwelch’, ‘viterbi’, ‘labeled’
The training algorithm to use. BaumWelch uses the forwardbackward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses hard assignments of observations to states using that. Default is ‘baumwelch’. Labeled training requires that labels are provided for each observation in each sequence.
 inertia : double or None, optional, range [0, 1]
If double, will set both edge_inertia and distribution_inertia to be that value. If None, will not override those values. Default is None.
 edge_inertia : bool, optional, range [0, 1]
Whether to use inertia when updating the transition probability parameters. Default is 0.0.
 distribution_inertia : double, optional, range [0, 1]
Whether to use inertia when updating the distribution parameters. Default is 0.0.
 pseudocount : double, optional
A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
 transition_pseudocount : double, optional
A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
 emission_pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don’t happen to occur in the data. Only affects hidden Markov models defined over discrete distributions. Default is 0.
 use_pseudocount : bool, optional
Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not affect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
 stop_threshold : double, optional
The threshold on the improvement ratio of the model’s log probability, below which training stops. Default is 1e-9.
 min_iterations : int, optional
The minimum number of iterations to run Baum-Welch training for. Default is 0.
 max_iterations : int, optional
The maximum number of iterations to run Baum-Welch training for. Default is 1e8.
 n_init : int, optional
The number of times to initialize the k-means clustering before taking the best value. Default is 1.
 init : str, optional
The initialization method for k-means. Must be one of ‘first-k’, ‘random’, ‘kmeans++’, or ‘kmeans||’. Default is ‘kmeans++’.
 max_kmeans_iterations : int, optional
The number of iterations to run k-means for before starting EM.
 initialization_batch_size : int or None, optional
The number of batches to use to initialize the model. None means use the entire data set. Default is None.
 batches_per_epoch : int or None, optional
The number of batches in an epoch. This is the number of batches to summarize before calling from_summaries and updating the model parameters. This allows one to do mini-batch updates by updating the model parameters before seeing the full dataset. If set to None, uses the full dataset. Default is None.
 lr_decay : double, optional, positive
The step size decay as a function of the number of iterations. Functionally, this sets the inertia to be (2+k)^{-lr_decay} where k is the number of iterations. This causes initial iterations to have more of an impact than later iterations, and is frequently used in mini-batch learning. This value is suggested to be between 0.5 and 1. Default is 0, meaning no decay.
 end_state : bool, optional
Whether to calculate the probability of ending in each state or not. Default is False.
 state_names : array-like, shape (n_states), optional
The names of the states. If None is passed in, default names are generated. Default is None.
 name : str, optional
The name of the model. Default is None.
 keys : list
A list of sets where each set is the keys present in that column. If there are d columns in the data set then this list should have d sets and each set should have at least two keys in it.
 random_state : int, numpy.random.RandomState, or None
The random state used for generating samples. If set to None, a random seed will be used. If set to either an integer or a numpy.random.RandomState, will produce deterministic outputs.
 callbacks : list, optional
A list of callback objects that describe functionality that should be undertaken over the course of training.
 return_history : bool, optional
Whether to return the history during training as well as the model.
 verbose : bool, optional
Whether to print the improvement in the model fitting at each iteration. Default is True.
 n_jobs : int, optional
The number of threads to use when performing training. This leads to exact updates. Default is 1.
Returns:  model : HiddenMarkovModel
The model fit to the data.
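As a rough illustration of the lr_decay parameter described above, the following sketch tabulates the factor (2+k)^{-lr_decay} for the first few iterations. It is not pomegranate's internal code: the helper name `decay_factor` is hypothetical, and treating k as a 0-based iteration index is an assumption.

```python
def decay_factor(k, lr_decay):
    """Return (2 + k) ** -lr_decay for iteration index k.

    Assumption: k counts iterations starting from 0.
    """
    return (2 + k) ** -lr_decay


# With lr_decay = 0 the factor stays at 1 (no decay); larger values
# make the factor shrink faster as the iteration count grows.
for lr_decay in (0.0, 0.5, 1.0):
    factors = [round(decay_factor(k, lr_decay), 3) for k in range(4)]
    print(lr_decay, factors)
```

The suggested range of 0.5 to 1 corresponds to factors that shrink steadily without vanishing after only a handful of batches.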

from_summaries
()¶ Fit the model to the stored summary statistics.
Parameters:  inertia : double or None, optional
The inertia to use for both edges and distributions without needing to set both of them. If None, use the values passed in to those variables. Default is None.
 pseudocount : double, optional
A pseudocount to add to both transitions and emissions. If supplied, it will override both transition_pseudocount and emission_pseudocount in the same way that specifying inertia will override both edge_inertia and distribution_inertia. Default is None.
 transition_pseudocount : double, optional
A pseudocount to add to all transitions to add a prior to the MLE estimate of the transition probability. Default is 0.
 emission_pseudocount : double, optional
A pseudocount to add to the emission of each distribution. This effectively smooths the states to prevent zero-probability symbols if they don’t happen to occur in the data. Only affects hidden Markov models defined over discrete distributions. Default is 0.
 use_pseudocount : bool, optional
Whether to use the pseudocounts defined in the add_edge method for edge-specific pseudocounts when updating the transition probability parameters. Does not affect the transition_pseudocount and emission_pseudocount parameters, but can be used in addition to them. Default is False.
 edge_inertia : double, optional, range [0, 1]
The inertia to use when updating the transition probability parameters. Default is 0.0.
 distribution_inertia : double, optional, range [0, 1]
The inertia to use when updating the distribution parameters. Default is 0.0.
Returns:  None
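The effect of a transition pseudocount and of inertia on a single state's outgoing transition probabilities can be sketched as follows. This is a minimal plain-Python illustration, not the library's implementation: the function name `update_transitions` is hypothetical, and it assumes the usual convention that inertia weights the old parameters while (1 - inertia) weights the new estimate.

```python
def update_transitions(old_probs, counts, pseudocount=0.0, inertia=0.0):
    """Blend smoothed maximum-likelihood estimates with the old values.

    old_probs and counts are lists over the outgoing edges of one state.
    """
    # Add the pseudocount to every edge before normalizing.
    smoothed = [c + pseudocount for c in counts]
    total = sum(smoothed)
    if total == 0:
        # No evidence and no pseudocount: keep the old probabilities.
        return list(old_probs)
    mle = [s / total for s in smoothed]
    # Assumed convention: inertia = 0 ignores the old parameters,
    # inertia = 1 ignores the new estimate.
    return [inertia * old + (1.0 - inertia) * new
            for old, new in zip(old_probs, mle)]


old = [0.5, 0.5]
counts = [3, 0]  # the second edge was never observed
print(update_transitions(old, counts))                   # no smoothing
print(update_transitions(old, counts, pseudocount=1.0))  # smoothed estimate
print(update_transitions(old, counts, inertia=0.5))      # halfway to old values
```

Without a pseudocount the unobserved edge drops to probability zero; either a pseudocount or a nonzero inertia keeps it reachable.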

from_yaml
()¶ Deserialize this object from its YAML representation.

log_probability
()¶ Calculate the log probability of a single sequence.
If a path is provided, calculate the log probability of that sequence given the path.
Parameters:  sequence : array-like
The array of observations in a single sequence of data.
 check_input : bool, optional
Check to make sure that all emissions fall under the support of the emission distributions. Default is True.
Returns:  logp : double
The log probability of the sequence
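For intuition, the log probability of a sequence can be computed with the forward algorithm in log space. The sketch below is a simplified stand-in, not pomegranate's code: it assumes a small discrete HMM with no silent states and no explicit end state, and the names `forward_log_probability`, `start`, `trans`, and `emit` are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_probability(sequence, start, trans, emit):
    """Log probability of a non-empty sequence, summed over all state paths."""
    n = len(start)
    # alpha[i] = log P(observations so far, current state = i)
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(sequence[0], 0.0))
             for i in range(n)]
    for symbol in sequence[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j].get(symbol, 0.0))
            for j in range(n)
        ]
    return logsumexp(alpha)


# Toy two-state model (hypothetical values).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}]

print(forward_log_probability(["A", "B", "B"], start, trans, emit))
print(forward_log_probability(["C"], start, trans, emit))  # unsupported symbol
```

A symbol outside the support of every emission distribution yields -inf in this sketch, which is the situation the check_input option is meant to catch early.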

maximum_a_posteriori
()¶ Run posterior decoding on the sequence.
MAP decoding is an alternative to Viterbi decoding. Based on the forward-backward algorithm, it returns the most likely state for each observation. This is also called posterior decoding. This method is described on p. 14 of http://ai.stanford.edu/~serafim/CS262_2007/notes/lecture5.pdf
WARNING: This may produce impossible sequences.
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  logp : double
The log probability of the sequence under the posterior (MAP) path.
 path : list of tuples
Tuples of (state index, state object) of the states along the posterior path.

node_count
()¶ Returns the number of nodes/states in the model

plot
()¶ Draw this model’s graph using NetworkX and matplotlib.
Note that this relies on networkx’s built-in graphing capabilities (and not Graphviz) and thus can’t draw self-loops.
See networkx.draw_networkx() for the keywords you can pass in.
Parameters:  precision : int, optional
The precision with which to round edge probabilities. Default is 4.
 **kwargs : any
The arguments to pass into networkx.draw_networkx()
Returns:  None

predict
()¶ Calculate the most likely state for each observation.
This can be either the Viterbi algorithm or maximum a posteriori. It returns the probability of the sequence under that state sequence and the actual state sequence.
This is a sklearn wrapper for the Viterbi and maximum_a_posteriori methods.
Parameters:  sequence : array-like
An array (or list) of observations.
 algorithm : “map”, “viterbi”
The algorithm with which to decode the sequence.
Returns:  path : list of integers
A list of the ids of states along the MAP or the Viterbi path.

predict_log_proba
()¶ Calculate the state log probabilities for each observation in the sequence.
Run the forward-backward algorithm on the sequence and return the emission matrix. This is the log normalized probability that each state generated that emission given both the symbol and the entire sequence.
This is a sklearn wrapper for the forward-backward algorithm.
 See also:
 Forward and backward algorithm implementations. A comprehensive description of the forward, backward, and forward-backward algorithms is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  emissions : array-like, shape (len(sequence), n_nonsilent_states)
The log normalized probabilities of each state generating each emission.

predict_proba
()¶ Calculate the state probabilities for each observation in the sequence.
Run the forward-backward algorithm on the sequence and return the emission matrix. This is the normalized probability that each state generated that emission given both the symbol and the entire sequence.
This is a sklearn wrapper for the forward-backward algorithm.
 See also:
 Forward and backward algorithm implementations. A comprehensive description of the forward, backward, and forward-backward algorithms is here: http://en.wikipedia.org/wiki/Forward%E2%80%93backward_algorithm
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  emissions : array-like, shape (len(sequence), n_nonsilent_states)
The normalized probabilities of each state generating each emission.
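The normalized state probabilities described above can be sketched with a small forward-backward pass. This is an illustrative plain-Python version, not the library's implementation: it assumes a discrete HMM without silent states, works in probability space (so it is only suitable for short sequences), and the names `posterior_probabilities`, `start`, `trans`, and `emit` are hypothetical.

```python
def posterior_probabilities(sequence, start, trans, emit):
    """Return a len(sequence) x n_states matrix of P(state_t = i | sequence)."""
    n, T = len(start), len(sequence)
    # Forward pass: fwd[t][i] = P(x_1..x_t, state_t = i)
    fwd = [[start[i] * emit[i][sequence[0]] for i in range(n)]]
    for t in range(1, T):
        fwd.append([
            sum(fwd[t - 1][i] * trans[i][j] for i in range(n))
            * emit[j][sequence[t]]
            for j in range(n)
        ])
    # Backward pass: bwd[t][i] = P(x_{t+1}..x_T | state_t = i)
    bwd = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = [
            sum(trans[i][j] * emit[j][sequence[t + 1]] * bwd[t + 1][j]
                for j in range(n))
            for i in range(n)
        ]
    total = sum(fwd[-1])  # P(sequence)
    return [[fwd[t][i] * bwd[t][i] / total for i in range(n)]
            for t in range(T)]


# Toy two-state model (hypothetical values).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}]

posteriors = posterior_probabilities(["A", "B", "B"], start, trans, emit)
for row in posteriors:
    print([round(p, 4) for p in row])

# Posterior (MAP) decoding picks the most probable state at each position.
print([max(range(len(row)), key=row.__getitem__) for row in posteriors])
```

Each row sums to 1. Because every position is decoded independently, the resulting state sequence is not guaranteed to be a valid path through the model.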

probability
()¶ Return the probability of the given symbol under this distribution.
Parameters:  symbol : object
The symbol to calculate the probability of
Returns:  probability : double
The probability of that point under the distribution.

sample
()¶ Generate a sequence from the model.
Returns the sequence generated, as a list of emitted items. The model must have been baked first in order to run this method.
If a length is specified and the HMM is infinite (no edges to the end state), then that number of samples will be randomly generated. If the length is specified and the HMM is finite, the method will attempt to generate a prefix of that length. Currently it will force itself to not take an end transition unless that is the only path, making it not a true random sample on a finite model.
WARNING: If the HMM has no explicit end state, you must specify a length to use.
Parameters:  n : int or None, optional
The number of samples to generate. If None, return only one sample.
 length : int, optional
Generate a sequence with a maximal length of this size. Used if you have no explicit end state. Default is 0.
 path : bool, optional
Return the path of hidden states in addition to the emissions. If true will return a tuple of (sample, path). Default is False.
 random_state : int, numpy.random.RandomState, or None
The random state used for generating samples. If set to None, a random seed will be used. If set to either an integer or a numpy.random.RandomState, will produce deterministic outputs.
Returns:  sample : list or tuple
If path is true, return a tuple of (sample, path), otherwise return just the samples.
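The sampling behavior for an infinite HMM (one with no end state) can be sketched as below. This is a simplified stand-in rather than the library's sampler: it assumes a discrete HMM with no silent states, uses Python's `random.Random` in place of a numpy random state, and the names `sample_sequence`, `start`, `trans`, and `emit` are hypothetical.

```python
import random


def sample_sequence(start, trans, emit, length, seed=None):
    """Draw `length` emissions and the hidden path that generated them."""
    rng = random.Random(seed)  # a fixed seed gives reproducible output
    states = list(range(len(start)))
    sample, path = [], []
    state = rng.choices(states, weights=start)[0]
    for _ in range(length):
        symbols = list(emit[state])
        weights = [emit[state][s] for s in symbols]
        sample.append(rng.choices(symbols, weights=weights)[0])
        path.append(state)
        # Move to the next hidden state according to the transition row.
        state = rng.choices(states, weights=trans[state])[0]
    return sample, path


# Toy two-state model (hypothetical values).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}]

sample, path = sample_sequence(start, trans, emit, length=5, seed=0)
print(sample)
print(path)
```

Since the toy model has no end state, a length must be supplied, mirroring the warning above.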

score
()¶ Return the accuracy of the model on a data set.
Parameters:  X : numpy.ndarray, shape=(n, d)
The values of the data set
 y : numpy.ndarray, shape=(n,)
The labels of each value

state_count
()¶ Returns the number of states present in the model.

summarize
()¶ Summarize data into stored sufficient statistics for out-of-core training. Only implemented for Baum-Welch training since Viterbi is less memory intensive.
Parameters:  sequences : array-like
An array of some sort (list, numpy.ndarray, tuple, ...) of sequences, where each sequence is a numpy array that is one-dimensional if the HMM models one-dimensional observations, or multidimensional if the HMM supports multiple dimensions.
 weights : array-like or None, optional
An array of weights, one for each sequence to train on. If None, all sequences are equally weighted. Default is None.
 labels : array-like or None, optional
An array of state labels for each sequence. This is only used in ‘labeled’ training. If used, this must consist of n lists, where n is the number of sequences to train on, and each of those lists must have one label per observation. Default is None.
 algorithm : ‘baum-welch’, ‘viterbi’, ‘labeled’
The training algorithm to use. Baum-Welch uses the forward-backward algorithm to train using a version of structured EM. Viterbi iteratively runs the sequences through the Viterbi algorithm and then uses the resulting hard assignments of observations to states. Labeled training requires that labels are provided for each observation in each sequence. Default is ‘baum-welch’.
 check_input : bool, optional
Check the input. This casts the input sequences as numpy arrays, and converts non-numeric inputs into numeric inputs for faster processing later. Default is True.
Returns:  logp : double
The log probability of the sequences.

thaw
()¶ Thaw the distribution, re-allowing updates to occur.

thaw_distributions
()¶ Thaw all distributions in the model.
Upon training distributions will be updated again.
Parameters:  None
Returns:  None

to_json
()¶ Serialize the model to JSON.
Parameters:  separators : tuple, optional
The two separators to pass to the json.dumps function for formatting.
 indent : int, optional
The indentation to use at each level. Passed to json.dumps for formatting.
Returns:  json : str
A properly formatted JSON object.
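Since separators and indent are passed through to json.dumps, their effect can be previewed with the standard library alone. The dictionary below is a hypothetical stand-in for a serialized model, not the actual structure pomegranate writes.

```python
import json

# Hypothetical stand-in for a model's serializable contents.
model_like = {"class": "HiddenMarkovModel", "name": "toy", "states": ["s0", "s1"]}

# Human-readable output: one key per line, indented four spaces.
pretty = json.dumps(model_like, separators=(",", " : "), indent=4)
# Compact output: no extra whitespace, a single line.
compact = json.dumps(model_like, separators=(",", ":"), indent=None)

print(pretty)
print(compact)
```

Both strings decode to the same object; only the whitespace differs.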

to_yaml
()¶ Serialize the model to YAML for compactness.

viterbi
()¶ Run the Viterbi algorithm on the sequence.
Run the Viterbi algorithm on the sequence given the model. This finds the ML path of hidden states given the sequence. Returns a tuple of the log probability of the ML path and the path itself, or (-inf, None) if the sequence is impossible under the model. If a path is returned, it is a list of tuples of the form (state index, state object).
This is fundamentally the same as the forward algorithm using max instead of sum, except the traceback is more complicated, because silent states in the current step can trace back to other silent states in the current step as well as states in the previous step.
 See also:
 Viterbi implementation described well in the Wikipedia article
Parameters:  sequence : array-like
An array (or list) of observations.
Returns:  logp : double
The log probability of the sequence under the Viterbi path
 path : list of tuples
Tuples of (state index, state object) of the states along the Viterbi path.
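The "forward algorithm with max instead of sum" idea can be sketched as follows. This is a simplified illustration, not pomegranate's implementation: it assumes a discrete HMM with no silent states and no end state (so the more complicated traceback mentioned above does not arise), it returns plain state indices rather than (state index, state object) tuples, and all names are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(sequence, start, trans, emit):
    """Return (log probability, path) of the most likely state path."""
    n = len(start)
    score = [safe_log(start[i]) + safe_log(emit[i].get(sequence[0], 0.0))
             for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for symbol in sequence[1:]:
        pointers, new_score = [], []
        for j in range(n):
            # Max over predecessors replaces the sum of the forward algorithm.
            best = max(range(n), key=lambda i: score[i] + safe_log(trans[i][j]))
            pointers.append(best)
            new_score.append(score[best] + safe_log(trans[best][j])
                             + safe_log(emit[j].get(symbol, 0.0)))
        back.append(pointers)
        score = new_score
    last = max(range(n), key=lambda i: score[i])
    if score[last] == -math.inf:
        return -math.inf, None  # the sequence is impossible under the model
    # Trace the back-pointers from the final state to the first.
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Toy two-state model (hypothetical values).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [{"A": 0.5, "B": 0.5}, {"A": 0.1, "B": 0.9}]

print(viterbi(["A", "B", "B"], start, trans, emit))
print(viterbi(["A", "C"], start, trans, emit))  # unsupported symbol
```

Working in log space keeps long sequences from underflowing, and an impossible sequence surfaces as (-inf, None), matching the return convention described above.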