When ReLU’s extrapolation capabilities aren’t sufficient
Neural networks are known to be good approximators for any function, at least as long as we don't move too far away from our dataset. Let us see what that means. Here is some data:
It doesn't only look like a sine wave, it actually is one, with some noise added. We can now train a classic feed-forward neural network with one hidden layer of 1000 neurons and ReLU activation.
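A minimal sketch of such a setup could look like the following. Note that the synthetic dataset, input range, and training parameters are assumptions on my part to mirror the plots; the article itself only specifies the architecture, and the framework choice (Keras) is mine as well.

```python
import numpy as np
import tensorflow as tf

# Assumed synthetic dataset: a sine wave with Gaussian noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(-10.0, 10.0, size=(1000, 1))
y = np.sin(X) + 0.1 * rng.normal(size=(1000, 1))

# Feed-forward network: one hidden layer with 1000 ReLU neurons.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1000, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=200, batch_size=32, verbose=0)
```

We get the following fit: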
It looks quite decent, apart from the edges. We could fix this by adding more neurons to the hidden layer, in line with Cybenko's universal approximation theorem. But I want to point you to something else:
We could argue now that this extrapolation behavior is bad if we expect the wave pattern to continue outside of the observed range. But if there is no domain knowledge or additional data we can resort to, it would just be that: an assumption.
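You can reproduce this extrapolation behavior by querying the model from the sketch above outside of the range it was trained on; the evaluation range below is again an assumption to match the plot:

```python
# Evaluate the model well outside the assumed training range of [-10, 10].
X_wide = np.linspace(-20.0, 20.0, 400).reshape(-1, 1)
y_pred = model.predict(X_wide, verbose=0)
# Outside the training range, every ReLU unit is either always active or
# always inactive, so the prediction degenerates into a straight line on
# each side instead of continuing the wave.
```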
However, in the remainder of this article, we will assume that any periodic pattern we can pick up within the data continues outside of it as well. This is a common assumption when doing time series modeling, where we naturally want to extrapolate into the future. We assume that any observed seasonality in the training data will just continue like that, because what else can we say without any additional information? In this article, I want to show you how using sine-based activation functions helps bake this assumption into the model.
But before we go there, let us quickly dive deeper into how ReLU-based neural networks extrapolate in general, and why we should not use them for time series forecasting as is.