CSLU Toolkit Packages


Statenet
Version: 1.0
Created: 25 May 2003
Modified: 10 October 2005

Overview
The Statenet package creates a finite-state network of linked states, specifically for use as a Hidden Markov Model for speech recognition.


Synopsis
package require Statenet
statenet create  grammar  className [-startToken startTokenName] [-collapseIdentical collapseValue] [-selfLoops selfLoopsValue] [-oneLevelExpansion oneLevelValue]
statenet add  statenetObject  grammar  className [-collapseIdentical collapseValue] [-selfLoops selfLoopsValue] [-oneLevelExpansion oneLevelValue]
statenet addSpec  statenetObject recognizerSpecification  className   [-collapseIdentical collapseValue] [-selfLoops selfLoopsValue]  [-format stateFormat]
statenet prune statenetObject
statenet specUpdate  statenetObject  recognizerSpecification
statenet info statenetObject [-name returnInfo] [-numStates returnInfo] [-enter returnInfo] [-exit returnInfo] [-collapse returnInfo] [-selfLoops returnInfo] [-durModel returnInfo] [-sampFreq returnInfo] [-frameSize returnInfo] [-featuresURI returnInfo] [-contextURI returnInfo]    
statenet writeCDRules  parts cdRulesFile
statenet print  statenetObject

Parameters
Parameter Name
Type
Description
cdRulesFile
string
A file that specifies context-dependent rules for a grammar.  The format, which is a modification of ABNF, is described in the grammar format section.
className
string
The name of the grammar class.  The grammar class name is typically the type of expansion performed by the grammar.  For example, a top-level class name would usually be "grammar", the lexicon file would have the "word" grammar class, and the expansion of phonemes into states would have the "phoneme" grammar class.  The "word" grammar class is particularly important, as this class name will determine where in the state network word boundaries are specified.  The use of "phoneme" as a grammar class name will determine where in the state network phoneme boundaries are specified.  Other class names may be chosen according to the user's preference.
collapseValue
integer
If collapseValue is 1, then if a context-independent node in the state network expands to context-dependent nodes that have the same name, these context-dependent nodes will be collapsed into a single node with that name.  This may greatly reduce the size of the state network, and for most applications will work fine.  However, the collapsed state network is not guaranteed to be equivalent to the non-collapsed network in all cases.  If non-equivalence is suspected, then set collapseValue to 0 to not perform this collapsing of nodes.  The default value of collapseValue is 1.
grammar
string
The value of grammar may be either a filename or a Tcl list.  The Statenet package automatically determines if grammar is a filename or Tcl list.  The format for a grammar file or list is described in the grammar format section.
parts
string
The value of parts may be either a filename or a Tcl list.  The Statenet package automatically determines if parts is a filename or a Tcl list.  The format for a parts file or a parts list is described in the parts format section.  
recognizerSpecification
string
The value of recognizerSpecification may be either a filename or a Tcl list.  The Statenet package automatically determines if recognizerSpecification is a filename or Tcl list.  The format for a specification file or list is described in the recognizer specification format section.  
returnInfo
integer
This value should be 1 if information about a parameter should be returned, or 0 if information about a parameter should not be returned.  The parameters are:
-name
The name of the top-level grammar
-numStates
The current number of states in the state network
-enter
The state ID of the enter node (usually 0)
-exit
The state ID of the exit node (usually 1)
-collapse
Whether identical context-dependent nodes are collapsed.  See collapseValue for a description.
-selfLoops
Whether self-loops will be added to each state during creation of new states.  This is specified in the statenet create, statenet add, and statenet addSpec commands.
-durModel
The type of duration model.  Valid types are "exponential", "gamma", and "minmax", although currently only "minmax" is supported.  This parameter is specified in recognizerSpecification.
-sampFreq
The sampling frequency used by the recognizer.  This parameter is specified in recognizerSpecification.  There is no default value.  
-frameSize
The frame size used by the recognizer.  This parameter is specified in recognizerSpecification. There is no default value.
-featuresURI
The URI that specifies the location and filename of Tcl code used by the recognizer to compute features for recognition.  This parameter is specified in recognizerSpecification.  Currently, the URI specification is restricted to local files only.
-contextURI
The URI that specifies the location and filename of Tcl code used by the recognizer to compute a context window of features for recognition.  This parameter is specified in recognizerSpecification.  Currently, the URI specification is restricted to local files only.
selfLoopsValue
integer
If selfLoopsValue is 1, then any state created by the statenet create, statenet add, or statenet addSpec commands will automatically receive a self-loop.  If selfLoopsValue is 0, then states do not receive self-loops.  In typical usage, selfLoopsValue is 0 when specifying the grammar and lexicon, and is set to 1 only at the very last expansion, when recognizer-specific HMM states are created.  The default value of selfLoopsValue is 0.
startTokenName
string
startTokenName specifies the name of the root token that is the "top" of the grammar.  This name must exist in the grammar specification as a token for a context-independent rule.  The default value is "$grammar".
stateFormat
string
This parameter specifies the format used to describe states in the recognizerSpecification.  Currently, only one format is supported, "multistatebiphone".  This format allows for context-independent single-state monophones as well as context-dependent biphones specified in two or three states.  This format is described in the state specification format.  In the future, other formats, such as context-dependent triphones, are expected.  Everything done by the statenet addSpec command can also be done by a statenet add command using context-dependent grammar rules; the statenet addSpec command simply processes known formats (e.g. multi-state biphones) faster than statenet add.
oneLevelValue
integer
If oneLevelValue is 0 (the default), then rules are applied repeatedly until no more rules can be applied.  If oneLevelValue is 1, then only one "level" of rules is applied to a token within a single call to statenet create or statenet add.  For example, consider the case in which a lexicon contains the words "I" and "did", the pronunciation of "I" is (Worldbet) aI, and the pronunciation of "did" is dc d I dc [d].  When expanding a grammar, all occurrences of "I" will be expanded to aI, and all occurrences of "did" will be expanded to dc d I dc [d].  If oneLevelValue is 0 (default), then the I in dc d I dc [d] will be further expanded with the rule I -> aI, yielding dc d aI dc [d].  If oneLevelValue is 1, then this second "level" of applying rules is not performed, and the pronunciation of "did" remains dc d I dc [d].  Because lexicons generally require only one level of rule application, oneLevelValue defaults to 1 for a lexicon grammar that is specified using the lexicon keyword within a higher-level grammar.
Any value of oneLevelValue specified as part of a statenet create or statenet add command will override the default value.
statenetObject
statenet object
This is the object returned by statenet create and used or modified by other Statenet commands as well as Viterbi search commands.
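
The difference between the two oneLevelValue settings described above can be illustrated with a toy token-rewriting sketch in plain Tcl.  This sketch does not call the Statenet package and is not how the package is implemented; the procedure names expandOnce and expandFully and the rules array are hypothetical, and the two rules are the "I" and "did" pronunciations from the oneLevelValue description.

```tcl
# Toy rewrite rules from the oneLevelValue description:
#   I -> aI      did -> dc d I dc [d]
# Illustration only; this is NOT the Statenet implementation.
array set rules {
    I   {aI}
    did {dc d I dc [d]}
}

# Apply the rules once to every token (comparable to oneLevelValue 1).
proc expandOnce {tokens} {
    global rules
    set out {}
    foreach tok $tokens {
        if {[info exists rules($tok)]} {
            foreach t $rules($tok) { lappend out $t }
        } else {
            lappend out $tok
        }
    }
    return $out
}

# Re-apply the rules until nothing changes (comparable to oneLevelValue 0).
# Assumes the rule set is not recursive, otherwise this would not terminate.
proc expandFully {tokens} {
    while 1 {
        set next [expandOnce $tokens]
        if {[string equal [join $next " "] [join $tokens " "]]} {
            return $next
        }
        set tokens $next
    }
}

puts [join [expandOnce  {I did}] " "]   ;# aI dc d I dc [d]
puts [join [expandFully {I did}] " "]   ;# aI dc d aI dc [d]
```

With one level of expansion the I inside the pronunciation of "did" is left alone; with full expansion it is rewritten to aI, which is usually not what a lexicon intends.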


Description
The Statenet package creates a finite-state network of linked states, specifically for use as a Hidden Markov Model.  The grammar is specified in a modified form of ABNF, which is described in the grammar format section.  Both context-independent and context-dependent grammar rules are allowed.  Once the grammar has been specified to the level of classifier categories, the statenet object is updated with duration model parameters, sampling rate, frame size, and other recognizer-specific aspects of a Hidden Markov Model.  This object, which contains the state network as well as the recognizer-specific information, is then used in the probability estimation and Viterbi search process.

statenet create creates an initial state network based on a grammar specification.  Typically, the grammar is specified in terms of words and connections between words.

statenet add adds to an existing statenet object using the specified grammar.  Typically, the statenet add command is used to expand a state network from words to phonemes.

statenet addSpec performs expansion of phonemes into context-dependent states for use by a recognizer, based on the information in a recognizer specification.  The result of statenet addSpec can be duplicated using a statenet add command with context-dependent rules, but statenet addSpec tends to be faster because the context-dependent rules are hard-coded.

statenet prune removes unnecessary nodes from a statenet object.  Pruning is done automatically after every statenet create, statenet add, and statenet addSpec command, and so the statenet prune command does not typically need to be used.  It is provided here for debugging purposes.

statenet specUpdate takes information from a recognizer specification and updates the state information in a statenet object.  
State-specific information is updated, such as the recognizer category index associated with each state and the duration model parameters.  In addition, global information such as the frame size, the sampling frequency, and the URIs that specify code for computing features and context windows is added to the statenet object at this time.
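
The four commands described so far are normally called in a fixed order, which can be sketched as a single helper procedure.  This is a minimal sketch, not part of the Statenet package: the procedure name buildStateNet is hypothetical, only commands and options listed in the Synopsis are used, and it assumes that the statenet commands raise an ordinary Tcl error on failure, which this page does not state.

```tcl
# Hypothetical helper (not part of the Statenet package).  Builds a state
# network from a grammar, a lexicon, and a recognizer specification, in the
# order create -> add -> addSpec -> specUpdate.
proc buildStateNet {grammar lexicon spec} {
    package require Statenet

    # Words and the connections between words.
    set net [statenet create $grammar "grammar" -startToken {$grammar}]

    # Expand words to phonemes, phonemes to recognizer states, then attach
    # recognizer-specific information.  Assumption: failures are reported
    # as Tcl errors, so catch can intercept them.
    if {[catch {
        statenet add        $net $lexicon "word"
        statenet addSpec    $net $spec    "phoneme" -selfLoops 1
        statenet specUpdate $net $spec
    } msg]} {
        nuke $net
        error "could not build state network: $msg"
    }
    return $net
}
```

A call such as buildStateNet digit.grammar digit.lexicon digit.force.spec would return the statenet object, which the caller is then responsible for removing with nuke.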

statenet info returns various parameter values in the statenet object.

statenet writeCDRules takes a small set of rules for creating context-dependent phonemes and generates a much longer list of context-dependent grammar rules that can be used by the statenet add command.  Typically, statenet writeCDRules is not used, as statenet addSpec provides the same functionality.  This function is provided primarily for debugging purposes.

statenet print writes information about the entire state network to stdout.  This function is provided for debugging purposes.


Example
The following example of Tcl code performs all steps in the recognition of a continuous digit string using the Statenet package.  The code and other data necessary to run this example are in the following files:

statenetExample.tcl
the following Tcl code
features.tcl
Tcl code for computing PLP features with RASTA
digit.grammar
the digits grammar, allowing any number of digits separated by optional pause or garbage
digit.lexicon
the specification of each word in terms of phonemes
digit.force.spec
the specification of all context-dependent states in the recognizer, as well as other information used during recognition
fanet.28
the neural network used for recognition
NU-78.zipcode.wav
the file containing the test waveform


# load in packages necessary for recognition
foreach package {Statenet Nnrun Wave TrainLibrary Garbage Feature \
    Encode Context Prep Mx Viterbi} {
    package require $package
    }

# these variables are global... used by the trainLibrary package
# to specify user-specific feature computation and context window
# code
global UserComputeFeatures
global UserComposeVector

# set the necessary parameters
set nnet_file    fanet.28
set wav_file     NU-78.zipcode.wav
set grammar_file "digit.grammar"
set lexicon_file "digit.lexicon"
set spec_file    "digit.force.spec"
set garbage      5

# create the state network for the digits task
set stateNet [statenet create $grammar_file "grammar" \
    -startToken {$grammar}]
statenet add $stateNet $lexicon_file "word"
statenet addSpec $stateNet $spec_file "phoneme" -selfLoops 1
statenet specUpdate $stateNet $spec_file

# set user-specific code for feature computation and context window
set featuresURI [lindex [statenet info $stateNet -featuresURI] 0]
if {$featuresURI != ""} {
    regexp {(.+)#(.+)} $featuresURI whole features_file features_proc
    source $features_file
    set UserComputeFeatures $features_proc
    }
set contextURI [lindex [statenet info $stateNet -contextURI] 0]
if {$contextURI != ""} {
    regexp {(.+)#(.+)} $contextURI whole context_file context_proc
    source $context_file
    set UserComposeVector $context_proc
    }

# set the sampling rate based on information in state network
set sampling_rate [expr {int([lindex [statenet info $stateNet -sampFreq] 0])}]

# load in the neural network, compute features and context window,
# compute observation probabilities, and add in garbage estimation.
set nnet [nnet optload $nnet_file]
set wave [wave read $wav_file]
get_features $wave feat $sampling_rate
compose_feature_vector $wave $feat $sampling_rate myNetIn
set myNetOut [nnet x $nnet $myNetIn]
set myNetOutG [garbage median -N $garbage $myNetOut]

# do viterbi search and get answer
set vObj [viterbi init $stateNet]
viterbi search $vObj $stateNet $myNetOutG
set answer [viterbi answer $vObj $stateNet]

# write answer to stdout
foreach item [lindex $answer 1] {
    puts -nonewline "[lindex $item 2] "
    }
puts ""

# remove objects no longer needed
nuke $feat $wave
nuke $myNetIn $myNetOut $myNetOutG
nuke $stateNet
nuke $vObj
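
The example above splits featuresURI and contextURI with regexp but does not check whether the pattern matched, so a malformed URI would only surface later as an undefined variable.  The following plain-Tcl sketch adds that check.  The procedure name splitCodeURI and the URI value are hypothetical, and the file#procName form is inferred from the regular expression used in the example rather than taken from the recognizer specification format.

```tcl
# Hypothetical helper: split a URI of the assumed form file#procName into
# its two parts, and fail loudly if the URI does not have that form.
proc splitCodeURI {uri} {
    if {![regexp {^(.+)#(.+)$} $uri whole fileName procName]} {
        error "URI \"$uri\" is not of the form file#procName"
    }
    return [list $fileName $procName]
}

# Hypothetical URI value, used only for illustration.
foreach {fileName procName} [splitCodeURI "features.tcl#myFeatures"] break
puts "source file: $fileName, procedure: $procName"
```

In the recognition script, the returned file name would be passed to source and the procedure name assigned to UserComputeFeatures or UserComposeVector, as in the example.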


Returns
statenet create returns a statenet object.
statenet info returns a Tcl list containing information about the specified parameters.
statenet print writes to stdout but does not return anything.
Other Statenet commands do not return anything.

See Also
The Viterbi package
The Specfeat package
The TTP package

Author
John-Paul Hosom, hosom@{cslu, bme, cse}.ogi.edu