August 6, 2016

Link: LSTM

Written Memories: Understanding, Deriving and Extending the LSTM - R2RT

In this post, we do a few things: We’ll define and describe RNNs generally, focusing on the limitations of vanilla RNNs that led to the development of the LSTM. We’ll describe the intuitions behind the LSTM architecture, which will enable us to build up to and derive the LSTM. Along the way we will derive the GRU. We’ll also derive a pseudo LSTM, which we’ll see is better in principle and performance to the standard LSTM. We’ll then extend these intuitions to show how they lead directly to a few recent and exciting architectures: highway and residual networks, and Neural Turing Machines.

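I haven't been able to keep up with neural NLP at all, so I need to study it... (What interests me is not the high accuracy or the fact that it's trendy, but the novelty of being able to solve composite tasks.)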

©2011-2018 tuxedocat