import torch.nn as nn
import torch
 
rnn = nn.GRU(10, 20, 2, bidirectional=True)  # input_size=10, hidden_size=20, num_layers=2
input = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

# initial hidden state: (num_layers * num_directions, batch, hidden_size) = (2 * 2, 3, 20)
h0 = input.new_zeros(4, 3, 20)
x, h_n = rnn(input, h0)
x.shape    # torch.Size([5, 3, 40])
h_n.shape  # torch.Size([4, 3, 20])

nn.GRU(input_dim, hidden_dim, layer_num, ..., bidirectional)

  • Input format: (seq_len, batch, input_size) (the default layout, i.e. batch_first=False)
  • The forward pass returns two outputs (not one per layer):
    • x, the output sequence: for each position, a new vector produced by re-encoding the input with its context, taken from the last layer. With bidirectional=True the feature dimension doubles to 2 * hidden_dim, so the shape is (seq_len, batch, hidden_dim * 2).
    • h_n, the hidden state left after the whole sequence has been consumed, one per layer, with feature dimension hidden_dim. Its first dimension is the layer count layer_num, but with bidirectional=True the per-direction states are interleaved as forward 1, backward 1, forward 2, backward 2, ..., giving a final shape of (layer_num * 2, batch, hidden_dim). See the check right after this list.
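
One way to confirm this layout, reusing rnn, input, x, and h_n from the snippet at the top (hidden_dim = 20, so x keeps the forward features in [:20] and the backward features in [20:]):

# h_n rows are [layer-1 fwd, layer-1 bwd, layer-2 fwd, layer-2 bwd]
assert torch.allclose(h_n[2], x[-1, :, :20])  # layer-2 forward == forward half of x at the last step
assert torch.allclose(h_n[3], x[0, :, 20:])   # layer-2 backward == backward half of x at the first step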

To merge the two directions of each layer, you can use:

def combine_bidir(outs, num_layers, batch_size):
    # outs is the h_n from above, shaped (num_layers * 2, batch, hidden_dim)
    # with forward/backward states interleaved per layer. Regroup them as
    # (num_layers, 2, batch, hidden_dim), move the direction axis next to
    # the features, then flatten the two directions into one vector.
    out = outs.view(num_layers, 2, batch_size, -1).transpose(1, 2).contiguous()
    return out.view(num_layers, batch_size, -1)

which concatenates the per-direction hidden states, turning h_n into (layer_num, batch, hidden_dim * 2).
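
A quick sanity check with the tensors from the snippet at the top (num_layers = 2, batch = 3, hidden_dim = 20):

combined = combine_bidir(h_n, num_layers=2, batch_size=3)
combined.shape  # torch.Size([2, 3, 40])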