Coding a Recurrent Neural Network from Scratch in R
This is an example of how to build a Recurrent Neural Network in R. We will generate two sets of random numbers, a and b, and sum them; the sum will be stored in the variable c.
We then convert a and b into 8-bit binary values. For example, 5 becomes 0 0 0 0 0 1 0 1.
We then sum the binary-encoded a and b bit by bit (i.e. column by column). If the two bits we are summing are both 1, the result will be 0, with a 1 carried over to the next column (i.e. the column to the left). It is this step of carrying the 1 that requires a memory state. This is the recurrent component of the recurrent neural network, which is what sets it apart from a traditional feed-forward neural network, such as the one we built in the post Coding a Neural Network from Scratch in R.
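To make the carrying step concrete, here is a plain (non-neural) implementation of column-by-column binary addition. The add_binary helper below is purely illustrative and not part of the network we are about to build; its carry variable is exactly the piece of memory that the RNN's hidden state will have to learn.
# plain binary addition, summing column by column from right to left;
# 'carry' is the memory that must persist between columns
add_binary = function(a_bits, b_bits) {
  n = length(a_bits)
  result = integer(n)
  carry = 0
  for (i in n:1) {
    s = a_bits[i] + b_bits[i] + carry
    result[i] = s %% 2   # bit written in this column
    carry = s %/% 2      # bit carried to the column on the left
  }
  result
}
add_binary(c(0,0,0,0,0,1,0,1), c(0,0,0,0,0,0,1,1))  # 5 + 3 = 8, i.e. 0 0 0 0 1 0 0 0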
We begin by setting the seed of the pseudo-random number generator, so that all ‘random’ values generated can be replicated by you on your own machine.
# Set the seed to obtain identical random values
set.seed(0)
We now define a sigmoid function to be used as the activation function. In this case, we use the logistic function, and we will simply call it sigmoid.
# compute sigmoid nonlinearity
sigmoid = function(x)
  1 / (1 + exp(-x))
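As a quick check (this snippet is illustrative and not part of the original code), the logistic function maps 0 to 0.5 and squashes large inputs towards 0 and 1:
sigmoid(0)    # 0.5
sigmoid(10)   # ~0.9999546
sigmoid(-10)  # ~4.539787e-05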
Next, we define a function that takes the output of the logistic sigmoid function (which we called sigmoid) and computes its derivative. We will call this second function sigmoid_output_to_derivative.
# convert output of sigmoid function to its derivative
sigmoid_output_to_derivative = function(output)
  output * (1 - output)
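This works because, for the logistic function, the derivative can be written in terms of the output alone: if out = sigmoid(x), then d(out)/dx = out * (1 - out). A quick numerical check (illustrative only):
x = 0.7
out = sigmoid(x)
sigmoid_output_to_derivative(out)               # analytic:  ~0.2217
(sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6  # numerical: ~0.2217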
Next, we define the length of the binary encoding to be 8 bits in binary_dim. From that, we compute largest_number = 2^binary_dim = 256; the largest value representable in 8 bits is therefore 255 (i.e. the range of possible values is 0-255).
# training dataset generation
binary_dim = 8
largest_number = 2^binary_dim
# convert an integer to its binary_dim-bit representation (most significant bit first)
int2binary = function(x)
  tail(rev(as.integer(intToBits(x))), binary_dim)
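As a quick sanity check, int2binary(5) reproduces the 8-bit encoding from the introduction, with the most significant bit first:
int2binary(5)
## [1] 0 0 0 0 0 1 0 1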
We now define the hyperparameters needed for training: the learning rate alpha, the input dimension (one bit from a and one from b per time step), the size of the hidden layer, and the output dimension (one bit of the sum per time step).
# define the hyperparameters
alpha = 0.1       # learning rate
input_dim = 2     # one bit from a and one bit from b per time step
hidden_dim = 16   # size of the hidden (recurrent) layer
output_dim = 1    # one bit of the sum per time step
Next, we initialize the weights of the neural network using random values (which will be identical to those shown here due to the use of set.seed(0) at the beginning of this post). We also create matching zero matrices in which the weight updates will be accumulated during training.
# initialize neural network weights
synapse_0 = matrix(runif(n = input_dim*hidden_dim, min = -1, max = 1), nrow = input_dim)    # input -> hidden
synapse_1 = matrix(runif(n = hidden_dim*output_dim, min = -1, max = 1), nrow = hidden_dim)  # hidden -> output
synapse_h = matrix(runif(n = hidden_dim*hidden_dim, min = -1, max = 1), nrow = hidden_dim)  # hidden -> hidden (recurrent)
# accumulators for the weight updates
synapse_0_update = matrix(0, nrow = input_dim, ncol = hidden_dim)
synapse_1_update = matrix(0, nrow = hidden_dim, ncol = output_dim)
synapse_h_update = matrix(0, nrow = hidden_dim, ncol = hidden_dim)
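Before training, it can be worth confirming the shapes of the weight matrices (an illustrative check, not in the original code):
dim(synapse_0)  # 2 16  (input -> hidden)
dim(synapse_1)  # 16 1  (hidden -> output)
dim(synapse_h)  # 16 16 (hidden -> hidden)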
Now we run the training logic: for each of 10,000 iterations we generate a random addition problem, run a forward pass over the 8 bit positions, backpropagate the error through time, and update the weights.
# training logic
for (j in 1:10000) {

  # generate a simple addition problem (a + b = c)
  a_int = sample(1:(largest_number/2), 1)  # int version
  a = int2binary(a_int)                    # binary encoding
  b_int = sample(1:(largest_number/2), 1)  # int version
  b = int2binary(b_int)                    # binary encoding

  # true answer
  c_int = a_int + b_int
  c = int2binary(c_int)

  # where we'll store our best guess (binary encoded)
  d = matrix(0, nrow = 1, ncol = binary_dim)

  overallError = 0

  layer_2_deltas = matrix(0)
  layer_1_values = matrix(0, nrow = 1, ncol = hidden_dim)

  # moving along the positions in the binary encoding,
  # from the least significant bit to the most significant bit
  for (position in 0:(binary_dim-1)) {

    # generate input and output
    X = cbind(a[binary_dim - position], b[binary_dim - position])
    y = c[binary_dim - position]

    # hidden layer (input + prev_hidden)
    layer_1 = sigmoid((X %*% synapse_0) + (layer_1_values[dim(layer_1_values)[1], ] %*% synapse_h))

    # output layer (new binary representation)
    layer_2 = sigmoid(layer_1 %*% synapse_1)

    # did we miss?... if so, by how much?
    layer_2_error = y - layer_2
    layer_2_deltas = rbind(layer_2_deltas, layer_2_error * sigmoid_output_to_derivative(layer_2))
    overallError = overallError + abs(layer_2_error)

    # decode estimate so we can print it out
    d[binary_dim - position] = round(layer_2)

    # store hidden layer so we can use it in the next time step
    layer_1_values = rbind(layer_1_values, layer_1)
  }

  # backpropagate through time, from the last position to the first
  future_layer_1_delta = matrix(0, nrow = 1, ncol = hidden_dim)

  for (position in 0:(binary_dim-1)) {

    X = cbind(a[position+1], b[position+1])
    layer_1 = layer_1_values[dim(layer_1_values)[1] - position, ]
    prev_layer_1 = layer_1_values[dim(layer_1_values)[1] - (position+1), ]

    # error at output layer
    layer_2_delta = layer_2_deltas[dim(layer_2_deltas)[1] - position, ]

    # error at hidden layer
    layer_1_delta = (future_layer_1_delta %*% t(synapse_h) + layer_2_delta %*% t(synapse_1)) *
      sigmoid_output_to_derivative(layer_1)

    # accumulate the weight updates so we can apply them after the loop
    synapse_1_update = synapse_1_update + matrix(layer_1) %*% layer_2_delta
    synapse_h_update = synapse_h_update + matrix(prev_layer_1) %*% layer_1_delta
    synapse_0_update = synapse_0_update + t(X) %*% layer_1_delta

    future_layer_1_delta = layer_1_delta
  }

  # apply the accumulated weight updates, scaled by the learning rate
  synapse_0 = synapse_0 + (synapse_0_update * alpha)
  synapse_1 = synapse_1 + (synapse_1_update * alpha)
  synapse_h = synapse_h + (synapse_h_update * alpha)

  # reset the update accumulators
  synapse_0_update = synapse_0_update * 0
  synapse_1_update = synapse_1_update * 0
  synapse_h_update = synapse_h_update * 0

  # print out progress every 1000 iterations
  if (j %% 1000 == 0) {
    print(paste("Error:", overallError))
    print(paste("Pred:", paste(d, collapse = " ")))
    print(paste("True:", paste(c, collapse = " ")))

    # decode the predicted binary digits back into an integer
    out = 0
    for (x in 1:length(d)) {
      out[x] = rev(d)[x] * 2^(x-1)
    }
    print(paste(a_int, "+", b_int, "=", sum(out)))
    print("----------------")
  }
}
This gives the following output. Note how the error decreases and the predicted bit patterns converge to the true ones as training progresses.
## [1] "Error: 3.81649822582355"
## [1] "Pred: 1 1 1 1 1 1 1 1"
## [1] "True: 1 1 0 1 1 0 0 1"
## [1] "101 + 116 = 255"
## [1] "----------------"
## [1] "Error: 4.14830099040879"
## [1] "Pred: 0 0 0 1 1 0 0 0"
## [1] "True: 0 1 1 0 0 0 1 1"
## [1] "71 + 28 = 24"
## [1] "----------------"
## [1] "Error: 3.66757955972057"
## [1] "Pred: 1 0 1 0 1 1 0 0"
## [1] "True: 1 1 0 0 1 1 0 0"
## [1] "86 + 118 = 172"
## [1] "----------------"
## [1] "Error: 4.17065609215719"
## [1] "Pred: 0 0 1 1 1 0 1 1"
## [1] "True: 0 1 0 0 0 0 1 0"
## [1] "53 + 13 = 59"
## [1] "----------------"
## [1] "Error: 3.0150689961351"
## [1] "Pred: 1 1 0 1 0 1 0 1"
## [1] "True: 1 1 0 1 0 1 0 1"
## [1] "114 + 99 = 213"
## [1] "----------------"
## [1] "Error: 1.57097151104127"
## [1] "Pred: 1 0 1 1 1 0 1 0"
## [1] "True: 1 0 1 1 1 0 1 0"
## [1] "61 + 125 = 186"
## [1] "----------------"
## [1] "Error: 1.19476602300567"
## [1] "Pred: 1 0 0 1 0 0 1 0"
## [1] "True: 1 0 0 1 0 0 1 0"
## [1] "35 + 111 = 146"
## [1] "----------------"
## [1] "Error: 0.788522354842796"
## [1] "Pred: 0 0 1 1 1 1 1 1"
## [1] "True: 0 0 1 1 1 1 1 1"
## [1] "40 + 23 = 63"
## [1] "----------------"
## [1] "Error: 0.369834653195442"
## [1] "Pred: 1 0 1 0 0 1 1 0"
## [1] "True: 1 0 1 0 0 1 1 0"
## [1] "66 + 100 = 166"
## [1] "----------------"
## [1] "Error: 0.315997944714675"
## [1] "Pred: 0 1 1 0 0 1 0 1"
## [1] "True: 0 1 1 0 0 1 0 1"
## [1] "37 + 64 = 101"
## [1] "----------------"