Dropout

Srivastava et al. (2014) applied dropout to feedforward neural networks and RBMs, and noted that a dropout probability of around 0.5 for hidden units and 0.2 for inputs worked well across a variety of tasks.

Reference: A review of Dropout as applied to RNNs

In my experience, 0.5 for hidden units and 0.2 for inputs works well, but this does not hold for the decoder. In the decoder, I would suggest not using dropout at all.
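
A minimal PyTorch sketch of this setup, with 0.2 dropout on the inputs and 0.5 on the hidden units of the encoder, and no dropout in the decoder. The architecture and dimensions here are assumptions for illustration, not taken from the papers above:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Hypothetical encoder-decoder; dropout only on the encoder side."""
    def __init__(self, input_dim=128, hidden_dim=256, output_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Dropout(p=0.2),                  # input dropout
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(p=0.5),                  # hidden-unit dropout
            nn.Linear(hidden_dim, hidden_dim),
        )
        # The decoder deliberately has no dropout layers.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```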

Batch Size

I suggest a batch size of 32 or less; see Efficient Mini-batch Training for Stochastic Optimization and this RNN study.
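
In PyTorch this is just the `batch_size` argument of the `DataLoader`. A toy example with made-up data shapes:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset; the point is simply keeping batch_size <= 32.
dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```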

Accumulation_steps

Gradient accumulation serves the same function as a larger batch size, but it can be used when GPU memory is not enough: gradients are accumulated over several small batches, and the optimizer steps only once per `accumulation_steps` batches.
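
A minimal sketch of a training loop with gradient accumulation, reusing the `EncoderDecoder` model and `loader` from the sketches above (the model, loss, optimizer, and `accumulation_steps=4` are all assumptions for illustration):

```python
import torch
import torch.nn as nn

model = EncoderDecoder(input_dim=128, hidden_dim=256, output_dim=10)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

accumulation_steps = 4  # effective batch size = 32 * 4 = 128

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y)
    # Scale the loss so the accumulated gradient matches one large batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()      # update once every accumulation_steps batches
        optimizer.zero_grad()
```

This trades extra forward/backward passes for memory: each small batch fits on the GPU, while the weight update behaves like one large batch.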