Hyperparameter tuning of a neural network is the process of finding the hyperparameter values that help the model produce the best possible results. When building neural networks, we don’t know in advance how many hidden layers, how many nodes, what epoch value, or which optimization function will work best for the given dataset. To find good values in such cases, we use hyperparameter tuning. In this article, we will discuss why hyperparameter tuning is important, how to do hyperparameter tuning of neural networks, and will implement various ways to find optimum parameters for neural networks.
Check out hyperparameter tuning of the linear regression and KNN models as well.
What is Hyperparameter Tuning of Neural Networks?
Before getting to the hyperparameter tuning of neural networks, let us first understand hyperparameters themselves. As you may have seen in the full architecture of a neural network, while building one we specify the different layers (input layer, output layer, and hidden layers), the number of nodes, the loss function, the number of epochs, the learning rate, etc. All of these are known as hyperparameters of the model. The performance of the model depends heavily on these hyperparameters, and if we can find their optimum values, we will be able to build the optimum model.
So, hyperparameter tuning is the process of finding the optimum values for the hyperparameters of the model.
What Does Hyperparameter Tuning of Neural Network Do?
Hyperparameter tuning takes a range of values for each hyperparameter, trains and tests the model across those values, and returns the best-performing combination. For example, if we specify a range of 50-500 for the number of epochs, the tuning process will test values from that range and return the best one it finds.
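Conceptually, the process is just a loop over candidate values. Here is a minimal sketch in plain Python, where train_and_score is a hypothetical stand-in for training the model and returning its validation accuracy:
# a hypothetical stand-in: a real version would train the model with the
# given epoch count and return its validation accuracy
def train_and_score(epochs):
    return 0.80 + 0.0001 * epochs  # placeholder score, for illustration only
# try each candidate epoch value and keep the best one
best_epochs, best_score = None, 0.0
for epochs in range(50, 501, 50):
    score = train_and_score(epochs)
    if score > best_score:
        best_epochs, best_score = epochs, score
print('Best epochs:', best_epochs)
Real tuners are smarter about which candidates they try, but the idea of "evaluate candidates, keep the best" is the same.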
How Does Hyperparameter Tuning of Neural Network Work?
There are various algorithms that find the optimum hyperparameters in different ways, which we will discuss in the upcoming sections. Here, we will describe how hyperparameter tuning works in general.
One thing to make clear is that hyperparameter tuning is not magic that you simply apply to get optimum parameter values. We still have to provide candidate values for the parameters: we specify ranges of values, the tuning process tries them (depending on the tuning algorithm that we choose), and it returns the best combination of hyperparameter values it found.
How to Avoid Overfitting While Doing Hyperparameter Tuning of Neural Network?
There are various ways to make sure that our model does not overfit during hyperparameter tuning, and they usually depend on the model or algorithm we are training. For example, in the case of LightGBM, we can restrict the features used per tree, add regularization, fix the number of iterations, etc. In general, we can specify a target accuracy (e.g. 85%) and stop further parameter tuning whenever the model achieves it, as in the sketch below.
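As a minimal sketch of that idea (the 85% threshold and the callback name are illustrative, not part of any library), a custom Keras callback can stop training once the target accuracy is reached:
# a sketch: stop training once a target accuracy (here 85%) is reached;
# 'accuracy' must be among the compiled metrics for this key to exist
import tensorflow as tf
class StopAtTargetAccuracy(tf.keras.callbacks.Callback):
    def __init__(self, target=0.85):
        super().__init__()
        self.target = target
    def on_epoch_end(self, epoch, logs=None):
        if logs is not None and logs.get('accuracy', 0) >= self.target:
            self.model.stop_training = True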
One of the important features of hyperparameter tuning is early stopping: the training or tuning process is halted once the monitored metric stops improving or the specified target is achieved. We will use this in an upcoming section.
Important Parameters to Know About Neural Network
Now, we will discuss some of the important parameters that play a huge role in the model’s predictions, and later we will use hyperparameter tuning to find the optimum values as well.
The very first thing to consider is how many hidden layers to build into the neural network and how many nodes each layer should contain.
The next important parameter is the learning rate. The learning rate controls how much we adjust the weights of our network with respect to the loss gradient. The lower the value, the slower we travel along the downward slope of the loss surface.
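A quick numeric sketch makes this concrete: with the standard update rule w_new = w - learning_rate * gradient, a smaller learning rate takes a smaller step (the numbers here are made up purely for illustration):
# illustrative weight update: w_new = w - lr * gradient
weight, gradient = 0.9, 0.4
for lr in (0.1, 0.001):
    print(lr, weight - lr * gradient)  # 0.1 -> 0.86, 0.001 -> 0.8996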
Other important parameters are the optimization and activation functions. Often we don’t know which optimizer or activation function will perform better on our dataset, so we use parameter tuning to choose them as well.
An epoch is one complete pass of the whole training dataset through the model so that it can learn the important trends. Finding an optimum epoch value is always challenging for machine learning developers, but nearly optimum results can be achieved with parameter tuning.
Hyperparameter Tuning of Neural Network
Now we will practically implement hyperparameter tuning and find out the optimum parameters for the model. You can get access to the source code from my GitHub account.
We will consider the following ways of tuning neural networks:
- Bayesian Optimization
- Random Search
- Grid Search (GridSearchCV)
- Hyperband
In this article, we will use the Keras Tuner to tune the neural network, which comes with the Random Search, Bayesian Optimization, and Hyperband algorithms.
We will also use the Fashion-MNIST sample dataset. We recommend going through Classification using a neural network to see how to create a deep learning model for CIFAR images.
Let us first explore the dataset and then will apply the Keras tuning method to find the optimum parameters.
# importing the tensorflow module
import tensorflow as tf
# importing the training and testing dataset
(x_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# Normalize pixels to values between 0 and 1
x_train = x_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
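As a quick sanity check on the data, you can print the shapes and the label range (Fashion-MNIST has 60,000 training and 10,000 test images of 28x28 pixels, with labels 0-9):
# checking the shapes of the training and testing sets
print(x_train.shape, X_test.shape)   # (60000, 28, 28) (10000, 28, 28)
# labels are integers from 0 to 9, one per clothing class
print(y_train.min(), y_train.max())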
We will not go into the details of exploring the dataset or of building and training neural networks, because we assume you already have enough knowledge of those topics.
Building a Baseline Deep Neural Model
We will now build a simple neural network model with random values for the parameters; later we will see that finding better values by hand is time-consuming even for such a small and simple model. Let us initialize the model and create the neural network.
# Building model
model = tf.keras.Sequential()
# flattening the layer
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
# defining hidden layer with 200 nodes
model.add(tf.keras.layers.Dense(units=200, activation='relu', name='dense_1'))
# adding a drop out in the hidden layer
model.add(tf.keras.layers.Dropout(0.2))
# adding the output layer
model.add(tf.keras.layers.Dense(10, activation='softmax'))
As you can see, we have only one hidden layer with 200 units, and the dropout rate is 0.2. These values were selected at random to build a baseline model. Later, we will learn how to find optimum values for these parameters.
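If you want to verify the architecture before training, Keras can print a layer-by-layer summary with parameter counts:
# printing the layer-by-layer architecture and parameter counts
model.summary()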
Let us now compile the model by specifying the loss function and optimizer.
# compiling the model with a learning rate of 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
# loss function is categorical cross entropy
loss=tf.keras.losses.SparseCategoricalCrossentropy(),
metrics=['accuracy'])
Now our model is ready, and we will use the training dataset to train it. We will train for 10 epochs and also use an EarlyStopping callback to stop training if the validation loss does not improve for 4 consecutive epochs.
# early stopping with a patience of 4 epochs
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4)
# Training the model using 10 epochs
model.fit(x_train, y_train, epochs=10, validation_split=0.2, callbacks=[stop_early], verbose=2)
Once the training is complete, we can write a helper function that evaluates the model on the testing data and returns the accuracy score.
# importing the pandas module
import pandas as pd
# defining the evaluation function
def evaluate_model(model, X_test, y_test):
    # evaluating the model on the testing dataset
    eval_dict = model.evaluate(X_test, y_test, return_dict=True)
    # converting the evaluation (accuracy and loss) to a pandas dataframe
    result = pd.DataFrame([list(eval_dict.values())], columns=list(eval_dict.keys()))
    # returning the dataframe
    return result
Let us now call this function and find the accuracy and loss.
# calling the function
results = evaluate_model(model, X_test, y_test)
# Display results
results.head()
Output:
      loss  accuracy
0  0.34886    0.8783
This is the accuracy score for the randomly selected learning rate, hidden layers, nodes, and epoch values. Manually finding the optimum values for these parameters is time-consuming and not really feasible, so we will use the Keras Tuner to find them.
Basically, there are four main steps to finding the best parameters using the Keras Tuner:
- Defining the model
- Specifying the parameters to be tuned
- Specifying the search space
- Specifying the algorithm for tuning
Let us now follow these four steps for the hyperparameter tuning.
Defining Model and Search Space for Hyperparameter Tuning of Neural Networks
The model that we will define for hyperparameter tuning is known as a hypermodel. Let us now create a function that defines the model and creates the search space.
# creating the model-building function
def build_model(hyper_parameter):
    # building the model
    model = tf.keras.Sequential()
    # flattening the images
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    # tuning the number of hidden layers (1-3) and units per layer (50-300)
    for i in range(1, hyper_parameter.Int("num_layers", 2, 4)):
        model.add(
            tf.keras.layers.Dense(
                units=hyper_parameter.Int("units_" + str(i), min_value=50, max_value=300, step=30),
                activation="relu")
        )
        # dropout layer with rates from 0 to 0.3
        model.add(tf.keras.layers.Dropout(hyper_parameter.Float("dropout_" + str(i), 0, 0.3, step=0.1)))
    # adding the output layer
    model.add(tf.keras.layers.Dense(units=10, activation="softmax"))
    # tuning the learning rate over 0.01, 0.001, and 0.0001
    hyper_parameter_learning_rate = hyper_parameter.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])
    # defining the optimizer, loss, and metrics
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hyper_parameter_learning_rate),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model
As you can see, we have specified a range of values for each of the parameters. The next step is to specify the search algorithm to find the best values from the range of parameters’ values.
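Before starting the search, it can help to sanity-check the function by building one model with the default hyperparameter values. A quick sketch, assuming the keras-tuner package is installed (pip install keras-tuner):
# building one model with the default hyperparameter values as a sanity check
import keras_tuner as kt
hp = kt.HyperParameters()
test_model = build_model(hp)
test_model.summary()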
Defining the Search Algorithm for the Hyperparameter Tuning of Neural Networks
There are several search algorithms available in the Keras Tuner, for example Random Search, Bayesian Optimization, and Hyperband. In this section, we will be using Hyperband to find the optimum parameters.
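For reference, the other two algorithms are instantiated in much the same way with our build_model function; a minimal sketch (the max_trials values here are illustrative, not recommendations):
# importing the Keras Tuner module
import keras_tuner as kt
# Random Search: tries max_trials random combinations from the search space
random_tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# Bayesian Optimization: uses previous trials to pick promising combinations
bayesian_tuner = kt.BayesianOptimization(build_model, objective="val_accuracy", max_trials=10)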
The concept behind Hyperband is straightforward: it samples a large number of models with random hyperparameter combinations from the search space and organizes them into brackets. Each model is trained for a few epochs, and only the top-performing half advances to the following round.
Let us now instantiate the tuner.
# importing the Keras Tuner module (install it with pip install keras-tuner)
import keras_tuner as kt
# Instantiate the tuner
tuner = kt.Hyperband(build_model,
objective="val_accuracy",
max_epochs=20,
factor=3,
hyperband_iterations=10,
directory="kt_dir",
project_name="kt_hyperband",)
As you can see, the Hyperband tuner takes several parameters (as defined above). You can also print a summary of the tuner's search space by running the following command.
# summary of tuner
tuner.search_space_summary()
Output:
Search space summary
Default search space size: 4
num_layers (Int)
{'default': None, 'conditions': [], 'min_value': 2, 'max_value': 4, 'step': 1, 'sampling': 'linear'}
units_1 (Int)
{'default': None, 'conditions': [], 'min_value': 50, 'max_value': 300, 'step': 30, 'sampling': 'linear'}
dropout_1 (Float)
{'default': 0.0, 'conditions': [], 'min_value': 0.0, 'max_value': 0.3, 'step': 0.1, 'sampling': 'linear'}
learning_rate (Choice)
{'default': 0.01, 'conditions': [], 'values': [0.01, 0.001, 0.0001], 'ordered': True}
Let us first initialize the EarlyStopping callback.
# early stop function for hyperparameter tuning of neural network
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
Now, we are ready to start the search and find out the optimum parameters.
# tuner search
tuner.search(x_train, y_train, epochs=20, validation_split=0.2, callbacks=[stop_early], verbose=2)
This search will take some time, so be patient and wait for the results. In my case, it took nearly 3 hours and 30 minutes.
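If you just want to experiment, the search can be made much cheaper by lowering max_epochs and hyperband_iterations, at the cost of search quality; a sketch, using a separate project_name so it does not reload the previous results:
# a cheaper search configuration for quick experiments
quick_tuner = kt.Hyperband(build_model,
                           objective="val_accuracy",
                           max_epochs=10,
                           factor=3,
                           hyperband_iterations=1,
                           directory="kt_dir",
                           project_name="kt_hyperband_quick")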
Once the search is complete, we can train the model with the optimum parameter values.
# get the optimum values for the parameters
best_hps = tuner.get_best_hyperparameters()[0]
# build the model with the optimum parameters
optimum_model = tuner.hypermodel.build(best_hps)
# Train the hypertuned model
optimum_model.fit(x_train, y_train, epochs=10, validation_split=0.2, callbacks=[stop_early], verbose=2)
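Before evaluating, it can be useful to see which values the tuner actually selected; the chosen hyperparameters are available on the best_hps object:
# the full dictionary of selected hyperparameter values
print(best_hps.values)
# a single value can also be read by name
print(best_hps.get('learning_rate'))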
Now we can evaluate the model by calling the evaluation function that we created earlier.
# get the optimum model's results
hyper_df = evaluate_model(optimum_model, X_test, y_test)
# adding the result to the dataframe created earlier
results = pd.concat([results, hyper_df], ignore_index=True)
# printing the results
results
You should see a small increase in the accuracy of the model.
Now, we will check for the optimum epoch value for our neural network.
# creating the model with the optimum parameters
optimum_model = tuner.hypermodel.build(best_hps)
# training and recording the history of the model
history = optimum_model.fit(x_train, y_train, epochs=200, validation_split=0.2)
# finding the optimum number of epochs from the validation accuracy
val_acc_per_epoch = history.history['val_accuracy']
# printing the best epoch value
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch value is:', best_epoch)
Output:
Best epoch value is: 70
This step will also take some time, so be patient. In our case, we get 70 as the best epoch value, so let us now train the model using it.
# rebuilding a fresh model so training starts from scratch
optimum_model = tuner.hypermodel.build(best_hps)
# train the hypertuned model with the optimum epoch value of 70
optimum_model.fit(x_train, y_train, epochs=70, validation_split=0.2, callbacks=[stop_early], verbose=2)
Once the training is complete, we will go through the evaluation part and call the function.
# get the optimum model's results
hyper_df = evaluate_model(optimum_model, X_test, y_test)
# printing the accuracy
hyper_df
As you can see, this time we get an improved accuracy score.
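Finally, if you want to reuse the tuned model later without retraining it, you can save and reload it (the file name here is illustrative; the .keras format requires a recent TensorFlow version, and older versions can use HDF5 instead):
# saving the tuned model to disk
optimum_model.save('fashion_mnist_tuned.keras')
# reloading it later
restored_model = tf.keras.models.load_model('fashion_mnist_tuned.keras')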
Summary
Hyperparameter tuning of neural networks is the process of finding optimum values for the model's hyperparameters from specified ranges. The process can be time-consuming, but it is far better than searching for the optimum values manually. In this article, we discussed how to use the Keras Tuner to find optimum parameter values for a neural network.