Topics in machine learning#

  • KEYWORDS: autograd

Choice of activation functions in neural networks#

The activation function in a neural network provides the nonlinearity in the model. We previously learned that one interpretation of the activation function is that it is a basis function in which you can expand the data to find a functional representation that fits the data.
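For the single-hidden-layer network used below, this interpretation is explicit: the prediction is \(y(x) = b^{(2)} + \sum_j w^{(2)}_j \,\sigma\!\left(w^{(1)}_j x + b^{(1)}_j\right)\), i.e. a weighted sum of the activation function \(\sigma\) evaluated at shifted and scaled copies of the input.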

Today we explore the impact of the activation function on the fitting and extrapolation behavior of neural networks. The following code sets up a neural network and initializes the parameters with random numbers.

layer_sizes = [1, 3, 1]
list(zip(layer_sizes[:-1], layer_sizes[1:]))
[(1, 3), (3, 1)]
import autograd.numpy as np
import autograd.numpy.random as npr

def nn(params, inputs, activation=np.tanh):
    """a neural network.
    params is a list of (weights, bias) for each layer.
    inputs goes into the nn. Each row corresponds to one output label.
    activation is the nonlinear activation function.
    """
    for W, b in params[:-1]:
        outputs = np.dot(inputs, W) + b
        inputs = activation(outputs)
    # no activation on the last layer
    W, b = params[-1]
    return np.dot(inputs, W) + b

def init_random_params(scale, layer_sizes, rs=npr.RandomState(0)):
    """Build a list of (weights, biases) tuples, one for each layer."""
    return [(rs.randn(insize, outsize) * scale,   # weight matrix
             rs.randn(outsize) * scale)           # bias vector
            for insize, outsize in zip(layer_sizes[:-1], layer_sizes[1:])]

init_random_params(0.1, (1, 3, 1))

As before, we are going to consider this dataset so we can evaluate fitting and extrapolation.

# Some generated data
X = np.linspace(0, 1)
Y = X**(1. / 3.)


import matplotlib.pyplot as plt
plt.plot(X, Y, 'b.')
plt.xlabel('x')
plt.ylabel('y');

tanh#

First we review the case of tanh, which is a classic activation function. The tanh function is “active” between about ±2.5, and outside that window it saturates. That means the derivative of this function becomes close to zero outside that window. So if you have large input values, you should scale them to avoid this issue.

xt = np.linspace(-10, 10)
plt.plot(xt, np.tanh(xt))
plt.xlabel('x')
plt.ylabel('y');
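Since tanh is only active in a narrow window, a common preprocessing step is to standardize the inputs before they go into the network. This is a minimal sketch of that idea (xraw is a hypothetical unscaled input, not part of the dataset used below):

xraw = np.linspace(0, 1000)                  # hypothetical raw inputs with large values
xscaled = (xraw - xraw.mean()) / xraw.std()  # zero mean, unit variance, well inside the active window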
def objective1(params, step=None):
    # adam passes the iteration number as a second argument,
    # so we accept (and ignore) a step argument.
    pred = nn(params, np.array([X]).T)
    err = np.array([Y]).T - pred
    return np.mean(err**2)

from autograd.misc.optimizers import adam
from autograd import grad

params1 = init_random_params(0.1, layer_sizes=[1, 3, 1])

N = 50
MAX_EPOCHS = 500

for i in range(MAX_EPOCHS):
    params1 = adam(grad(objective1), params1,
                   step_size=0.01, num_iters=N)
    if i % 100 == 0:  # print every 100th step
        print(f'Step {i}: {objective1(params1)}')
    if objective1(params1) < 2e-5:
        print('Tolerance reached, stopping')
        break
params1

Now we can examine the fit and extrapolation.

X2 = np.linspace(-2, 10)
Y2 = X2**(1/3)  # fractional powers of negative x give nan, so only x >= 0 is plotted
Z2 = nn(params1, X2.reshape([-1, 1]))

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(X2, Z2, label='NN')
plt.fill_between([0, 1], 0, 1.4, facecolor='gray', alpha=0.5)  # shade the training region
plt.xlabel('x')
plt.ylabel('y');

For large enough \(x\), each tanh unit saturates at \(\pm 1\) (the sign depends on its weight), so the neural network output also approaches a constant value for large \(x\).
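A quick numerical check of this, assuming the params1 fit above: evaluate the network at two well-separated large inputs and see that the outputs are essentially the same.

# For large x every tanh unit is saturated, so the output stops changing.
print(nn(params1, np.array([[50.0]])), nn(params1, np.array([[500.0]])))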

Exercise: can you work out from the NN math what the saturated values should be?

relu#

A common activation function in deep learning is the Relu:

def relu(x):
    return x * (x > 0)

plt.plot(X2, relu(X2));

This is popular because it is very fast to compute, and the derivatives are constant. For positive \(x\) there is no saturation. For negative \(x\), however, the neuron is “dead”: both its output and its gradient are zero.
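We can check the gradient behavior directly with autograd's elementwise_grad (a quick sketch using the relu defined above): the derivative is 0 for negative inputs and 1 for positive inputs.

from autograd import elementwise_grad

drelu = elementwise_grad(relu)
print(drelu(np.array([-2.0, -0.5, 0.5, 2.0])))  # expect [0. 0. 1. 1.]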

def objective2(par, step=None):
    pred = nn(par, np.array([X]).T, activation=relu)
    err = np.array([Y]).T - pred
    return np.mean(err**2)

from autograd.misc.optimizers import adam
from autograd import grad

params2 = init_random_params(0.01, layer_sizes=[1, 3, 1])

N = 50
MAX_EPOCHS = 500

for i in range(MAX_EPOCHS):
    params2 = adam(grad(objective2), params2,
                   step_size=0.01, num_iters=N)
    if i % 100 == 0:  # print every 100th step
        print(f'Step {i}: {objective2(params2)}')
    if objective2(params2) < 2e-5:
        print('Tolerance reached, stopping')
        break
X2 = np.linspace(0., 1)
Y2 = X2**(1/3)
Z2 = nn(params2, X2.reshape([-1, 1]), activation=relu)

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(X2, Z2, label='NN')
plt.xlabel('x')
plt.ylabel('y');
params2

Notes:

  1. The fit is not very good.

  2. We have piecewise linear fits here.

  3. There are negative weights, which means there are some “dead neurons”. Other initial guesses might improve this; a quick sketch of trying a few random initializations follows this list.
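A small sketch of that idea, assuming the objective2, init_random_params, adam, and grad defined above: refit from a few random seeds and compare the final losses.

# Refit from several random initializations and compare the final loss.
for seed in [0, 1, 2]:
    p = init_random_params(0.01, layer_sizes=[1, 3, 1], rs=npr.RandomState(seed))
    p = adam(grad(objective2), p, step_size=0.01, num_iters=1000)
    print(f'seed {seed}: loss = {objective2(p)}')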

Let’s look at the extrapolating behavior.

X2 = np.linspace(0, 1)
Y2 = X2**(1/3)

xf = np.linspace(-2, 2)
Z2 = nn(params2, xf.reshape([-1, 1]), activation=relu)

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(xf, Z2, label='NN')
plt.fill_between([0, 1], 0, 1.4, facecolor='gray', alpha=0.5)  # shade the training region
plt.xlabel('x')
plt.ylabel('y');

Note this extrapolates linearly on the right, and is constant on the left. These are properties of the Relu.

Gaussian (radial basis function)#

Finally we consider the Gaussian activation function.

def rbf(x):
    return np.exp(-x**2)

x3 = np.linspace(-3, 3)
plt.plot(x3, rbf(x3));

Now we fit the data.

def objective3(pars, step=None):
    pred = nn(pars, np.array([X]).T, activation=rbf)
    err = np.array([Y]).T - pred
    return np.mean(err**2)

from autograd.misc.optimizers import adam
from autograd import grad

params3 = init_random_params(0.1, layer_sizes=[1, 3, 1])

N = 50
MAX_EPOCHS = 500

for i in range(MAX_EPOCHS):
    params3 = adam(grad(objective3), params3,
                   step_size=0.01, num_iters=N)
    if i % 100 == 0:  # print every 100th step
        print(f'Step {i}: {objective3(params3)}')
    if objective3(params3) < 2e-5:
        print('Tolerance reached, stopping')
        break
X2 = np.linspace(0., 1)
Y2 = X2**(1/3)
Z2 = nn(params3, X2.reshape([-1, 1]), activation=rbf)

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(X2, Z2, label='NN')
plt.xlabel('x')
plt.ylabel('y');

The fit is reasonably smooth over the training region; with a Gaussian activation there are no piecewise linear segments. Now let's look at the extrapolating behavior.

X2 = np.linspace(-2.5, 5)
Y2 = X2**(1/3)  # nan for negative x
Z2 = nn(params3, X2.reshape([-1, 1]), activation=rbf)

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(X2, Z2, label='NN')
plt.fill_between([0, 1], 0, 1.4, facecolor='gray', alpha=0.5)  # shade the training region
plt.xlabel('x')
plt.ylabel('y');

Note this extrapolates to zero when you are far from the data. It fits reasonably well in the region it was trained on. “If your function is nonlinear enough, somewhere the nonlinearity matches your data.” (Z. Ulissi). As one last example, let's try \(\sin\) as the activation function.

def objective33(pars, step=None):
    pred = nn(pars, np.array([X]).T, activation=np.sin)
    err = np.array([Y]).T - pred
    return np.mean(err**2)

from autograd.misc.optimizers import adam
from autograd import grad

params33 = init_random_params(0.1, layer_sizes=[1, 3, 1])

N = 50
MAX_EPOCHS = 500

for i in range(MAX_EPOCHS):
    params33 = adam(grad(objective33), params33,
                    step_size=0.01, num_iters=N)
    if i % 100 == 0:  # print every 100th step
        print(f'Step {i}: {objective33(params33)}')
    if objective33(params33) < 2e-5:
        print('Tolerance reached, stopping')
        break
X2 = np.linspace(-15, 5)
Y2 = X2**(1/3)  # nan for negative x
Z2 = nn(params33, X2.reshape([-1, 1]), activation=np.sin)

plt.plot(X2, Y2, 'b.', label='analytical')
plt.plot(X2, Z2, label='NN')
plt.fill_between([0, 1], 0, 1.4, facecolor='gray', alpha=0.5)  # shade the training region
plt.xlabel('x')
plt.ylabel('y');

Exercise: how many neurons do you need to get a better fit with sin as the activation function?

Summary#

We can think of single-layer neural networks as partial expansions in the activation function space. That means the extrapolation behavior will follow the dominant feature of the activation function, e.g. relu extrapolates like a line, tanh saturates at large \(x\), and Gaussians effectively go to zero. Unexpected things can happen at the edges of the data, so at intermediate extrapolations you do not always know what will happen.

Train/test splits on data#

So far we have not considered how to split your data when fitting. This becomes important for a few reasons:

  1. We need to be able to tell if we are overfitting. One way to do this is to compare fitting errors to prediction errors.

This means we need a way to split a dataset into a train set and a test set. Then, we can do training on the train set, and testing on the test set.

Let’s start by remembering what our dataset is.

X = np.linspace(0, 1)
Y2 = X**(1/3)
X, Y2

The way we split the data is with indexing. We start by making an array of integers.

ind = np.arange(len(X))
ind

Next, we randomly shuffle the array of integers.

pind = np.random.permutation(ind)
pind

Next, we decide on the train/test split. A common choice is 80/20. We find the integer index closest to 80% of the length of the data.

split = int(0.8 * len(pind))
split
train_ind = pind[:split]
test_ind = pind[split:]
print(len(train_ind), len(test_ind))
test_ind

We check that this is a reasonable split by plotting the two sets.

train_x = X[train_ind]
train_y = Y2[train_ind]

test_x = X[test_ind]
test_y = Y2[test_ind]
plt.plot(test_x, test_y, 'ro')
plt.plot(train_x, train_y, 'bo')
plt.legend(['test','train']);
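As an aside, scikit-learn has a convenience function that does this shuffling and splitting in one call. A sketch, assuming scikit-learn is installed (the variable names here are just for illustration):

from sklearn.model_selection import train_test_split

# 80/20 split; random_state makes the shuffle reproducible
train_x2, test_x2, train_y2, test_y2 = train_test_split(X, Y2, test_size=0.2, random_state=0)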

Now, we train on the train data.

def objective10(params, step=None):
    pred = nn(params, np.array([train_x]).T)
    err = np.array([train_y]).T - pred
    return np.mean(err**2)

from autograd.misc.optimizers import adam
from autograd import grad

params10 = init_random_params(0.1, layer_sizes=[1, 3, 1])

N = 50
MAX_EPOCHS = 500

for i in range(MAX_EPOCHS):
    params10 = adam(grad(objective10), params10,
                    step_size=0.01, num_iters=N)
    if i % 100 == 0:  # print every 100th step
        print(f'Step {i}: {objective10(params10)}')
    if objective10(params10) < 2e-5:
        print('Tolerance reached, stopping')
        break

As usual, we should check the fit on the train data. This is a little trickier than before, because the points are out of order.

Z2 = nn(params10, train_x.reshape([-1, 1]))
plt.plot(train_x, Z2, 'bo', label='NN train')
plt.plot(train_x, train_y, 'r+', label='analytical train')
plt.xlabel('x')
plt.ylabel('y')

plt.plot(test_x, nn(params10, test_x.reshape([-1, 1])), 'go', label='NN test')
plt.plot(test_x, test_y, 'y*', label='analytical test')
plt.legend();
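If you prefer a connected line for the training predictions, one option (a small sketch using the arrays above) is to sort by x before plotting:

srt = np.argsort(train_x)                         # order the shuffled training points by x
plt.plot(train_x[srt], Z2.flatten()[srt], 'b-');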
rmse_train = np.sqrt(np.mean((train_y - nn(params10, train_x.reshape([-1, 1])).flatten())**2))
rmse_test = np.sqrt(np.mean((test_y - nn(params10, test_x.reshape([-1, 1])).flatten())**2))

print(f'''RMSE train = {rmse_train:1.3f}
RMSE test = {rmse_test:1.3f}''')

Here, the test RMSE is a little higher than the train RMSE. This suggests some overfitting, but not much. It may also partly reflect extrapolation error: after shuffling, some test points can lie at the edges of (or outside) the range spanned by the training data. For the train/test split to be meaningful, it is important that the two sets have similar distributions of values.

Summary#

Today we reviewed the role of activation functions in neural networks, and observed that the choice often does not matter much in general, although the details always matter in individual cases. The mathematical form of the activation function determines how the model extrapolates, which can be important depending on your application.

We then explored how to split a dataset into a train set and a test set so that overfitting can be evaluated. This becomes increasingly important when you plan to explore many models (choices of hyperparameters); then you typically split the data three ways (train, validate, and test).
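A minimal sketch of such a three-way split, following the same permutation-indexing approach used above (the 60/20/20 fractions are a common choice, not a rule):

pind = np.random.permutation(np.arange(len(X)))
i1 = int(0.6 * len(pind))   # end of the train block
i2 = int(0.8 * len(pind))   # end of the validate block
train_ind, val_ind, test_ind = pind[:i1], pind[i1:i2], pind[i2:]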