"

17 Supervised Models: Neural networks

Learning Objectives

  • Explain neural network models
  • Explain neural network model adaptations
  • Explain adaptations for different purposes
  • Explain common tuning parameters for neural network models
  • Use R packages for neural network models

Chapter content

Neural networks are another type of machine learning model; they get their name from the neurons in the human brain. The basic idea of neural networks is to take `x` variables and predict a `y` variable, but the `x` variables can be combined in many different ways to predict `y`. A neural network is defined by an input layer (the `x` variables), an output layer (the `y` variable), and hidden layers (the pieces that define how the `x` variables are combined to predict `y`). Each piece within the network is called a node.

Beginning with a single node in the hidden layer builds an understanding of how a neural network functions. A hidden layer node has two components: the weights applied to the input variables and the activation function. Applying weights to the input variables is a linear function:

 z = w_1x_1 + w_2x_2 + \ldots + w_nx_n

Weights are unknown and estimated by the model. This looks a lot like a linear regression; the difference is that the weights are estimated as part of the full network. The combination of the weights and the input variables is then passed through an activation function, which transforms the weighted inputs into a new variable. There are different activation functions, but they are all some form of non-linear transformation. A common choice is the sigmoid function:

 \sigma(z) = \frac{1}{1+e^{-z}}

The sigmoid function takes any real value and transforms it to a value between 0 and 1.
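To make the node computation concrete, here is a minimal sketch in R (the input values and weights are made up for illustration; in practice the weights are estimated by the model):

sigmoid <- function(z) 1 / (1 + exp(-z))

x <- c(0.5, -1.2, 2.0)   # input variables for one observation (made up)
w <- c(0.4,  0.1, -0.3)  # weights for this node (made up)

z <- sum(w * x)          # z = w1*x1 + w2*x2 + ... + wn*xn
a <- sigmoid(z)          # node output, a value between 0 and 1
a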

The same process is repeated for each node in the first hidden layer. The outputs from the first layer then become the inputs to the next hidden layer, and so forth until the last hidden layer. The number of layers is the depth of the neural network and is a choice that we have to make. A deep neural network, or deep learning model, has many hidden layers. The more hidden layers and the more nodes in each layer, the more complex the model. As with tree-based models, the more complex the model, the more closely it can fit the data, but the higher the likelihood that it will overfit the data.

The output from the last hidden layer is then passed to the output layer, which is a linear combination of the last hidden layer's outputs and produces the prediction of the dependent variable. Fitting the model is done in a similar way to the other models and uses a measure of model fit, for example, the mean squared error. An optimization routine then searches through weights to find the best fit of the model. Because there are a large number of weights that need to be estimated, the model is computationally intensive.
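For a sample of n observations, the mean squared error is:

 MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2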

There are many resources for learning more about neural networks. For example, a description of neural networks is available here: https://www.freecodecamp.org/news/deep-learning-neural-networks-explained-in-plain-english/ and a short video introduction is available here: https://www.youtube.com/watch?v=jmmW0F0biz0.

The algorithm for estimating a neural network is as follows (a short R sketch appears after the steps):

1. Supply features and propose a neural network structure. This includes the number of hidden layers, the number of nodes in each hidden layer, and the activation function.

2. Estimate the weights in the network by minimizing the error in the prediction of the dependent variable. The error is typically the mean squared error.

3. The model is then used to predict the dependent variable.
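As a sketch of these three steps, the code below fits a small feedforward network with the `nnet` package (not the package used in the example later in this chapter; the single hidden layer with five nodes is an arbitrary illustrative choice). It assumes `train` and `test` data frames like those created in the example below, with the tidyverse loaded.

library(nnet)

set.seed(1)
# Step 1: propose a structure - one hidden layer with 5 nodes
#         (nnet uses a sigmoid activation in the hidden layer).
# Step 2: estimate the weights by minimizing squared error.
fit <- nnet(Incr_ld1 ~ ., data = train %>% select(Incr_ld1, CurRat:NiCf),
            size = 5, linout = TRUE, decay = 0.01, maxit = 200)

# Step 3: predict the dependent variable and check the fit.
preds <- predict(fit, newdata = test)
mean((test$Incr_ld1 - preds)^2)  # mean squared error on the test sample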

The simple description of the algorithm above belies the complexity of estimating a neural network. The weights are estimated by a process called backpropagation. Backpropagation is a method that adjusts the weights in the network to minimize the error in the prediction. (A description of backpropagation is available here: https://www.youtube.com/watch?v=Ilg3gGewQ5U).
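As a rough illustration of the idea (a sketch only, with made-up numbers, not the full algorithm), the loop below repeats the core step that backpropagation performs across an entire network, here for a single sigmoid node: compute the prediction, compute the gradient of the error with respect to each weight, and adjust the weights in the direction that reduces the error.

sigmoid <- function(z) 1 / (1 + exp(-z))

x <- c(0.5, -1.2, 2.0)   # one observation's features (made up)
y <- 1                   # observed outcome
w <- rep(0, 3)           # starting weights
lr <- 0.1                # learning rate (step size)

for (i in 1:100) {
  a <- sigmoid(sum(w * x))            # forward pass: prediction
  grad <- (a - y) * a * (1 - a) * x   # gradient of squared error wrt weights
  w <- w - lr * grad                  # move weights against the gradient
}
w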

There are also different types of neural networks. The different types have been developed for different tasks and to address different problems. The feedforward neural network was described above.

A recurrent neural network (RNN) does not require that the input to a node come only from the previous layer. Connections can form loops, so a node's output can feed back as an input at a later step, which lets the network retain information about earlier inputs in a sequence. RNNs have been highly successful with audio and text.

A convolutional neural network (CNN) is designed for image data and changes how the input is processed so that the network captures spatial relationships. Other types of networks have other properties. More details about neural networks are beyond the purposes of this textbook.

Neural network hyperparameters

Neural networks also have hyperparameters that require tuning. Some of the most common are the number of hidden layers, the number of nodes in each layer, the activation function, the number of training passes (epochs), and regularization settings such as L1/L2 penalties and dropout.
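As a sketch of how these hyperparameters appear in code, h2o's `h2o.deeplearning()` function exposes them directly as arguments. The values below are illustrative only, and the code assumes h2o is initialized and `trn.h2o` is the h2o training frame created in the example below.

mdl <- h2o.deeplearning(y = "Incr_ld1",
                        training_frame = trn.h2o,
                        hidden = c(50, 50),   # two hidden layers, 50 nodes each
                        activation = "Tanh",  # activation function
                        epochs = 20,          # passes over the training data
                        l1 = 1e-5,            # L1 regularization on the weights
                        seed = 1)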

Example in R

h2o has a feedforward neural network algorithm that allows for many hidden layers (a deep neural network). The algorithm is applied similarly to other algorithms in h2o. Note that although there is only one algorithm, its hyperparameters mean that many different models can be estimated.

The sample code below uses the data and dependent variables from the prior chapters.

The csv file for this chapter is available here: https://www.dropbox.com/scl/fi/jrrnvo9xeyud863q63cpr/dEarningsPred.csv?rlkey=nqv9in8ukx4xxhf9cr78h0wxp&dl=0. The data is from Compustat – annual financial statement information. The two variables we will be trying to model, i.e., predict, are `DIncr_ld1` and `Incr_ld1`. The first is a binary variable indicating whether year t+1 earnings will be higher than year t earnings. The second is the percentage change in earnings from year t to year t+1 scaled by the company’s market value of equity at the end of fiscal year t. The other features are financial statement analysis variables from the papers linked in the prior chapters. The features have been winsorized and standardized each fiscal year.

Assuming that the data is imported into R as “df”, the following code initializes the h2o environment, creates training and test data sets, and moves the training data to the h2o environment.

library(h2o)       # neural network estimation
library(DALEX)     # explainable AI (used when extending the example)
library(DALEXtra)
library(tidyverse)
h2o.init(nthreads = 8)

# Split into training (before 2010) and test (after 2010) samples
train <- df %>%
   filter(fyear < 2010)
test <- df %>%
   filter(fyear > 2010)
rm(df)

# Keep the continuous outcome and the features, then move to h2o
tmp <- train %>%
   select(Incr_ld1, CurRat:NiCf)
trn.h2o <- as.h2o(tmp)

# AutoML restricted to deep learning (feedforward neural network) models
mdlres <- h2o.automl(y = "Incr_ld1",
                     training_frame = trn.h2o,
                     include_algos = c("DeepLearning"),
                     max_models = 25,
                     seed = 1)

# View the leaderboard of estimated models
lb <- h2o.get_leaderboard(mdlres, extra_columns = "ALL")
print(lb, n = nrow(lb))

# Extract and inspect the best model
bmdl <- h2o.get_best_model(mdlres)
bmdl

# Variable importance for the best model
vi <- h2o.varimp(bmdl)
vi

h2o.shutdown()

As noted in the last chapter, the code could be extended for binary dependent variables, prediction outputs, and explainable AI.
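As a sketch of the binary extension (assuming the same objects as above and run before `h2o.shutdown()`; converting `DIncr_ld1` to a factor tells h2o to treat the task as classification):

# Classification version: factor outcome DIncr_ld1
tmp2 <- train %>%
   select(DIncr_ld1, CurRat:NiCf) %>%
   mutate(DIncr_ld1 = as.factor(DIncr_ld1))
trn2.h2o <- as.h2o(tmp2)

mdlres2 <- h2o.automl(y = "DIncr_ld1",
                      training_frame = trn2.h2o,
                      include_algos = c("DeepLearning"),
                      max_models = 25,
                      seed = 1)

# Predictions for the test sample
tst2.h2o <- as.h2o(test %>% select(DIncr_ld1, CurRat:NiCf))
preds <- h2o.predict(h2o.get_best_model(mdlres2), tst2.h2o)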

Tutorial video

Conclusion

Review

Mini-case video

References
