1 Baidu CTC
https://github.com/baidu-research/warp-ctc/blob/master/README.zh_cn.md
Advantage: it is much faster.
2 CTC Explained
In short, the idea is to design a loss that does not require the labels to be aligned to the input frames; by minimizing this loss, accurate recognition can be achieved (that is, the output can still be decoded at the end without frame-level label alignment). The benefits and advantages are especially clear in speech recognition. (To be continued.)
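As a concrete illustration of how a CTC output can be decoded without frame-level alignment, here is a minimal greedy (best-path) decoding sketch in NumPy. The per-frame probabilities and the choice of index 0 as the blank label are assumptions made for illustration only; they are not taken from warp-ctc.

import numpy as np

# Hypothetical per-frame probabilities, shape (T, alphabet_size); index 0 is the blank.
frame_probs = np.array([
    [0.10, 0.70, 0.10, 0.10],   # frame 0 -> label 1
    [0.10, 0.70, 0.10, 0.10],   # frame 1 -> label 1 again (repeat)
    [0.80, 0.10, 0.05, 0.05],   # frame 2 -> blank
    [0.10, 0.10, 0.10, 0.70],   # frame 3 -> label 3
], dtype=np.float32)

def greedy_ctc_decode(probs, blank=0):
    # Best-path decoding: take the argmax per frame, collapse repeats, drop blanks.
    best_path = probs.argmax(axis=1)
    decoded, prev = [], None
    for p in best_path:
        if p != prev and p != blank:
            decoded.append(int(p))
        prev = p
    return decoded

print(greedy_ctc_decode(frame_probs))  # -> [1, 3]

The loss that warp-ctc minimizes is the negative log probability of the label sequence summed over all such alignments, which is what makes frame-level alignment unnecessary during training.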
3 Interpreting the Baidu warp-ctc parameters and examples
1 The ctc function
ctc(activations, flat_labels, label_lengths, input_lengths, blank_label=0)

    Computes the CTC loss between a sequence of activations and a ground
    truth labeling.

    Args:
        activations: A 3-D Tensor of floats. The dimensions should be
            (t, n, a), where t is the time index, n is the minibatch index,
            and a indexes over activations for each symbol in the alphabet.
        flat_labels: A 1-D Tensor of ints, a concatenation of all the labels
            for the minibatch.
        label_lengths: A 1-D Tensor of ints, the length of each label for
            each example in the minibatch.
        input_lengths: A 1-D Tensor of ints, the number of time steps for
            each sequence in the minibatch.
        blank_label: int, the label value/index that the CTC calculation
            should use as the blank label.

    Returns:
        1-D float Tensor, the cost of each example in the minibatch
        (as negative log probabilities).

    * This class performs the softmax operation internally.
    * The label reserved for the blank symbol should be label 0.
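To make the argument layout concrete, the sketch below shows how a call might look. It assumes the TensorFlow binding of warp-ctc is importable as warpctc_tensorflow and that the old session-style TensorFlow API is in use; treat both as assumptions about the environment rather than verified facts.

import numpy as np
import tensorflow as tf
import warpctc_tensorflow  # assumed module name of the warp-ctc TensorFlow binding

# (t, n, a) = (2 time steps, 1 example in the minibatch, alphabet of 5 symbols)
activations = tf.constant(np.random.rand(2, 1, 5).astype(np.float32))
flat_labels = tf.constant(np.asarray([1, 2], dtype=np.int32))   # all labels, concatenated
label_lengths = tf.constant(np.asarray([2], dtype=np.int32))    # one label of length 2
input_lengths = tf.constant(np.asarray([2], dtype=np.int32))    # one sequence of 2 frames

costs = warpctc_tensorflow.ctc(activations, flat_labels,
                               label_lengths, input_lengths, blank_label=0)

with tf.Session() as sess:
    print(sess.run(costs))  # 1-D vector: one negative log probability per example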
2 Basic test: interpreting the _test_basic inputs
import numpy as np

# activations starts with shape (2, 5)
activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)

alphabet_size = 5

# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
# activations now has shape (2, 1, 5), i.e. (t, batch_size, dims)
activations = np.expand_dims(activations, 1)

# labels
labels = np.asarray([1, 2], dtype=np.int32)

# length of each example's label in the minibatch
label_lengths = np.asarray([2], dtype=np.int32)

# number of time steps in each input sequence
input_lengths = np.asarray([2], dtype=np.int32)
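Because this example has exactly T = 2 time steps and the label [1, 2] also has length 2, there is no room to insert blanks, so the only valid CTC alignment is the label itself and the loss reduces to the negative log of the product of the two softmax probabilities. The quick NumPy check below (my own sketch, not part of the original test) gives a cost of roughly 2.46:

import numpy as np

acts = np.array([[0.1, 0.6, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.6, 0.1, 0.1]], dtype=np.float32)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# probability of emitting label 1 at t=0 times that of label 2 at t=1
p = softmax(acts[0])[1] * softmax(acts[1])[2]
print(-np.log(p))  # roughly 2.46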
3 Multi-batch test: interpreting the inputs
# activations starts with shape (2, 5)
activations = np.array([
    [0.1, 0.6, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1]
], dtype=np.float32)

alphabet_size = 5

# dimensions should be t, n, p: (t timesteps, n minibatches,
# p prob of each alphabet). This is one instance, so expand
# dimensions in the middle
# _activations now has shape (2, 1, 5), i.e. (t, batch_size, dims)
_activations = np.expand_dims(activations, 1)

# duplicate the single example along the batch axis;
# activations now has shape (2, 2, 5), i.e. (t, batch_size, dims)
activations = np.concatenate([_activations, _activations[...]], axis=1)

# flat labels: each example's labels, concatenated into one 1-D array
labels = np.asarray([1, 2, 1, 2], dtype=np.int32)

# label length of each example in the minibatch, concatenated
label_lengths = np.asarray([2, 2], dtype=np.int32)

# input time-sequence length of each example, also concatenated
input_lengths = np.asarray([2, 2], dtype=np.int32)
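Since the second batch entry is just a copy of the first, the ctc op should return a length-2 cost vector whose entries are both roughly the 2.46 value from the single-example case. Continuing from the activations array built above, a small sanity check along the batch axis (again my own sketch, not part of the original test):

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# activations has shape (t, batch_size, dims) = (2, 2, 5); each example's label is [1, 2]
for b in range(activations.shape[1]):
    p = softmax(activations[0, b])[1] * softmax(activations[1, b])[2]
    print(b, -np.log(p))  # both examples print roughly 2.46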