candlewill/kaldi.md

## kaldi.md

      
Kaldi nnet3 Tutorial: Data Types in nnet3

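Introduction

nnet3 aims to support more general network structures: the hope is that complex networks (LSTMs, RNNs) can be built from a simple configuration file. Like nnet2, nnet3 supports training on multiple machines with multiple GPUs.

Data types in nnet3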

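Goals and background

nnet1 and nnet2 build networks out of Component objects; stacking several Components defines a multi-layer network. To build a more complex structure, nnet1 requires nesting Components, and to implement an LSTM, nnet1 provides a dedicated Component written in C++. nnet2 introduced the concept of a time index into the network definition in order to support splicing features across time steps, which makes networks such as TDNNs possible.

nnet3 continues to support the ways of defining networks used in nnet1 and nnet2, and adds more. With config-file-based network definitions, many complex networks or new ideas can be implemented without writing a single line of C++.

Overview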

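An nnet3 network definition consists of two parts:

- A set of Components, each with a name; their order does not matter.
- A definition of the network graph, which acts as the "glue" that determines how the Components are connected.

The network graph refers to the Components it needs by name, which makes certain kinds of parameter sharing possible. This "glue" can express an RNN by stating that time t depends on time t-1, and it can also express special handling of particular edges at particular times. An example configuration file follows (explained in detail later):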
```
# First the components
component name=affine1 type=NaturalGradientAffineComponent input-dim=48 output-dim=65
component name=relu1 type=RectifiedLinearComponent dim=65
component name=affine2 type=NaturalGradientAffineComponent input-dim=65 output-dim=115
component name=logsoftmax type=LogSoftmaxComponent dim=115

# Next the nodes
input-node name=input dim=12
component-node name=affine1_node component=affine1 input=Append(Offset(input, -1), Offset(input, 0), Offset(input, 1), Offset(input, 2))
component-node name=nonlin1 component=relu1 input=affine1_node
component-node name=affine2 component=affine2 input=nonlin1
component-node name=output_nonlin component=logsoftmax input=affine2
output-node name=output input=output_nonlin
```
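The configuration above has two parts, the Components and the graph. Together with the inputs and outputs, they define a computation graph: an acyclic graph whose nodes each represent an operation on vectors.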
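
As a sanity check on the example config, the dimensions can be traced by hand: the input is 12-dimensional and `Append` concatenates four time-shifted copies, giving 4 × 12 = 48, which matches `input-dim=48` on `affine1`. The Python sketch below does the same arithmetic for the whole chain. The helpers `append_dim` and `check_chain` are hypothetical names used for illustration and are not part of Kaldi.

```python
# Hypothetical helpers, not a Kaldi API: trace vector dimensions through
# the example config to confirm that adjacent layers agree.

def append_dim(input_dim, offsets):
    """Append(...) concatenates one copy of the input per time offset."""
    return input_dim * len(offsets)

def check_chain(first_dim, chain, components):
    """Return the final output dim; raise if a layer's input dim mismatches."""
    dim = first_dim
    for name in chain:
        in_dim, out_dim = components[name]
        if in_dim != dim:
            raise ValueError(f"{name}: expected input dim {in_dim}, got {dim}")
        dim = out_dim
    return dim

input_dim = 12
offsets = [-1, 0, 1, 2]  # the Offset(input, k) terms feeding affine1_node

# (input dim, output dim) for each component, copied from the config.
components = {
    "affine1": (48, 65),
    "relu1": (65, 65),
    "affine2": (65, 115),
    "logsoftmax": (115, 115),
}
# Order in which the component-nodes feed into each other.
chain = ["affine1", "relu1", "affine2", "logsoftmax"]

spliced_dim = append_dim(input_dim, offsets)
output_dim = check_chain(spliced_dim, chain, components)
print(spliced_dim, output_dim)  # 48 115
```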
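The workflow for using a neural network (for training or decoding) is:

1. The user supplies a ComputationRequest, which states which inputs are available at which positions (indexes) and which outputs are requested.
2. The ComputationRequest and a neural network (any kind of network, including RNNs and LSTMs) are compiled into a sequence of commands called an NnetComputation.
3. The NnetComputation is then optimized for speed (somewhat like gcc's -O flag).
4. The NnetComputer class takes the input as matrices, evaluates the NnetComputation, and provides the output as matrices.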
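
To make the first step more concrete: in the example config, each output frame t reads input frames t-1 through t+2, so requesting outputs for a range of frames implies a slightly wider range of input frames. The sketch below computes that range in plain Python. `required_input_times` is a hypothetical helper that only mimics the bookkeeping a ComputationRequest implies; it does not use Kaldi's API.

```python
# Hypothetical helper, not a Kaldi API.
def required_input_times(output_times, offsets):
    """Input time indexes needed to compute the requested output times."""
    needed = {t + k for t in output_times for k in offsets}
    return sorted(needed)

offsets = [-1, 0, 1, 2]   # time offsets spliced by affine1_node
requested = [0, 1, 2, 3]  # output frames the user asks for
inputs = required_input_times(requested, offsets)
print(inputs)  # [-1, 0, 1, 2, 3, 4, 5]
```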

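Basic data structures in nnet3

Indexes

An Index is a tuple (n, t, x), where n is the index within a minibatch, t is the time index, and x is a placeholder index that may be used in the future; for now it is simply set to zero.
For example, to train a feed-forward network, the indexes might vary only in the n dimension: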
 [ (0, 0, 0)  (1, 0, 0)  (2, 0, 0) ... ]

On the other hand, to decode when the input is a single utterance, the indexes would vary only in the t dimension:
 [ (0, 0, 0)  (0, 1, 0)  (0, 2, 0) ... ]
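
The two index lists above can be reproduced with a small Python sketch. The `Index` class here is an illustrative stand-in with the same three fields, not Kaldi's C++ `Index` struct.

```python
from typing import NamedTuple

class Index(NamedTuple):
    """Illustrative stand-in for the (n, t, x) tuple; not Kaldi's own type."""
    n: int = 0  # position within the minibatch
    t: int = 0  # time index
    x: int = 0  # placeholder, zero for now

# Feed-forward training: only n varies.
train_indexes = [Index(n=i) for i in range(3)]
# Decoding a single utterance: only t varies.
decode_indexes = [Index(t=i) for i in range(3)]

print([tuple(i) for i in train_indexes])   # [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
print([tuple(i) for i in decode_indexes])  # [(0, 0, 0), (0, 1, 0), (0, 2, 0)]
```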

Cindexes

A Cindex is a pair (int32, Index), similar to a two-element tuple in Python; the int32 is the index of a node in the neural network.
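
A minimal Python sketch of the pair follows. The node numbering in `node_names` is an assumption made for illustration (the actual numbering Kaldi assigns is not specified here), and `make_cindex` is a hypothetical helper.

```python
from typing import NamedTuple, Tuple

class Index(NamedTuple):
    """Illustrative stand-in for the (n, t, x) tuple; not Kaldi's own type."""
    n: int = 0
    t: int = 0
    x: int = 0

Cindex = Tuple[int, Index]  # (node index, Index)

# Assumed numbering of nodes, for illustration only.
node_names = ["input", "affine1_node", "nonlin1"]

def make_cindex(node_name: str, index: Index) -> Cindex:
    """Pair a node's integer index with an Index."""
    return (node_names.index(node_name), index)

c = make_cindex("nonlin1", Index(n=0, t=5))
print(c)  # (2, Index(n=0, t=5, x=0))
```

Because the pair is hashable, it can also serve as a key when looking up a value computed for a particular node at a particular Index.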
Components

Config file

```
# First the components
component name=affine1 type=NaturalGradientAffineComponent input-dim=48 output-dim=65
component name=relu1 type=RectifiedLinearComponent dim=65
component name=affine2 type=NaturalGradientAffineComponent input-dim=65 output-dim=115
component name=logsoftmax type=LogSoftmaxComponent dim=115
# Next the nodes
input-node name=input dim=12
component-node name=affine1_node component=affine1 input=Append(Offset(input, -1), Offset(input, 0), Offset(input, 1), Offset(input, 2))
component-node name=nonlin1 component=relu1 input=affine1_node
component-node name=affine2 component=affine2 input=nonlin1
component-node name=output_nonlin component=logsoftmax input=affine2
output-node name=output input=output_nonlin
```
Descriptors

```
# caution, this is a simplification that overgenerates descriptors.
<descriptor>  ::=   <node-name>      ;; node name of kInput or kComponent node.
<descriptor>  ::=   Append(<descriptor>, <descriptor> [, <descriptor> ... ] )
<descriptor>  ::=   Sum(<descriptor>, <descriptor>)
<descriptor>  ::=   Const(<value>, <dimension>)    ;; e.g. Const(1.0, 512)
<descriptor>  ::=   Scale(<scale>, <descriptor>)   ;; e.g. Scale(-1.0, tdnn2)
;; Failover or IfDefined might be useful for time t=-1 in a RNN, for instance.
<descriptor>  ::=   Failover(<descriptor>, <descriptor>)   ;; 1st arg if computable, else 2nd
<descriptor>  ::=   IfDefined(<descriptor>)     ;; the arg if defined, else zero.
<descriptor>  ::=   Offset(<descriptor>, <t-offset> [, <x-offset> ] ) ;; offsets are integers
;; Switch(...) is intended to be used in clockwork RNNs or similar schemes.  It chooses
;; one argument based on the value of t (in the requested Index) modulo the number of
;; arguments
<descriptor>  ::=   Switch(<descriptor>, <descriptor> [, <descriptor> ...])
;; For use in clockwork RNNs or similar, Round() rounds the time-index t of the
;; requested Index to the next-lowest multiple of the integer <t-modulus>,
;; and evaluates the input argument for the resulting Index.
<descriptor>  ::=   Round(<descriptor>, <t-modulus>)  ;; <t-modulus> is an integer
;; ReplaceIndex replaces some <variable-name> (t or x) in the requested Index
;; with a fixed integer <value>.  E.g. might be useful when incorporating
;; iVectors; iVector would always have time-index t=0.
<descriptor>  ::=   ReplaceIndex(<descriptor>, <variable-name>, <value>)
```
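
Several of these descriptors (Offset, Round, ReplaceIndex, Switch) can be read as a transformation of the requested Index before the input is looked up. The Python sketch below illustrates that reading only. All function names are hypothetical, the `Index` class is a stand-in for Kaldi's type, and the handling of negative t (Python floor division and non-negative modulo) is an assumption of this sketch rather than a statement about Kaldi's implementation.

```python
from typing import NamedTuple

class Index(NamedTuple):
    """Illustrative stand-in for the (n, t, x) tuple; not Kaldi's own type."""
    n: int = 0
    t: int = 0
    x: int = 0

def offset(index, t_offset, x_offset=0):
    """Offset(d, t-offset [, x-offset]): shift t, and optionally x."""
    return index._replace(t=index.t + t_offset, x=index.x + x_offset)

def round_index(index, t_modulus):
    """Round(d, t-modulus): round t down to a multiple of t_modulus."""
    return index._replace(t=(index.t // t_modulus) * t_modulus)

def replace_index(index, variable_name, value):
    """ReplaceIndex(d, t|x, value): overwrite t or x with a fixed value."""
    if variable_name not in ("t", "x"):
        raise ValueError("variable_name must be 't' or 'x'")
    return index._replace(**{variable_name: value})

def switch_choice(index, num_args):
    """Switch(d0, d1, ...): position of the argument chosen by t mod num_args."""
    return index.t % num_args

i = Index(n=0, t=7, x=0)
print(offset(i, -1))             # Index(n=0, t=6, x=0)
print(round_index(i, 3))         # Index(n=0, t=6, x=0)
print(replace_index(i, "t", 0))  # Index(n=0, t=0, x=0)
print(switch_choice(i, 3))       # 1
```

For instance, `replace_index(i, "t", 0)` mirrors the iVector case mentioned in the grammar comments, where the input is always read at t=0 regardless of the requested frame.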