Learn a multi-layer perceptron with Julien
Table of Contents
- 1. Introduction
- 2. First example: the SUM function!
- 3. Second example: the absolute difference
- 4. Library API
- 4.1. struct julien
- 4.2. struct julien *julien_alloc (size_t dim_input, size_t dim_output, size_t n_hidden, const size_t *restrict dim_hidden)
- 4.3. void julien_free (struct julien *julien)
- 4.4. const double *julien_data (const struct julien *julien, size_t *n_parameters)
- 4.5. void julien_set_data (struct julien *julien, size_t start, size_t n, const double *restrict parameters)
- 4.6. double julien_learn (struct julien *julien, size_t dim_input, const double *restrict input, size_t dim_output, const double *restrict output, double learning_rate)
- 4.7. size_t julien_predict (struct julien *julien, size_t dim_input, const double *restrict input, size_t max_output, size_t start_output, double *restrict output)
1 Introduction
A multi-layer perceptron is a machine learning model that can learn to approximate useful functions from examples. In this package, the outputs are real-valued.
2 First example: the SUM function!
Let us take an example! Suppose that we want to learn the sum function. The function takes two inputs, each of which can be either 0 or 1, and returns their sum. So the dataset is in table 1 (and, as a sketch, in the C array below).
Input x | Input y | Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 2 |
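If it helps to see table 1 as data rather than as a table, here is one possible way to hold it in a C array. The array name and layout are purely illustrative; the library itself does not require any particular representation of the dataset:

/* Table 1 as a plain C array: one row per example,
 * columns are { input x, input y, expected output }. */
static const double sum_dataset[4][3] = {
  { 0, 0, 0 },
  { 0, 1, 1 },
  { 1, 0, 1 },
  { 1, 1, 2 },
};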
So, how do we learn this? Let us write our first julien-enabled program! Write program 1 to a file named main_sum.c.
 1: /* Ensure that julien.h is in the current directory */
 2: #include "julien.h"
 3: #include <stdio.h>
 4: #include <stdlib.h>
 5: #include <assert.h>
 6:
 7: /* Print the model internals as a CSV row */
 8: static void dump_model (const struct julien *jl);
 9:
10: int
11: main ()
12: {
13:   /* We have two inputs, and one output.  We do not use the other
14:    * arguments for now. */
15:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
16:   /* Keep track of how much we updated our model */
17:   double update;
18:   /* Keep learning! */
19:   do
20:     {
21:       double input_data[2];
22:       update = 0;
23:       /* Iterate over the dataset */
24:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
25:         {
26:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
27:             {
28:               /* We want to learn to produce the sum! */
29:               double expected_output = input_data[0] + input_data[1];
30:               /* Call the learning function (see the commentary) */
31:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
32:             }
33:         }
34:       /* Visualize our progress! */
35:       fprintf (stderr, "%s:%d: mean update is %g\n",
36:                __FILE__, __LINE__, update / 4);
37:     }
38:   while (update > 1e-8);
39:   dump_model (jl);
40:   /* Each alloc() comes with its associated free() */
41:   julien_free (jl);
42:   return 0;
43: }
44:
45: static void
46: dump_model (const struct julien *jl)
47: {
48:   size_t n_parameters;
49:   /* This function gets the internal parameters of the perceptron */
50:   const double *data = julien_data (jl, &n_parameters);
51:   size_t i;
52:   for (i = 0; i < n_parameters; i++)
53:     {
54:       if (i != 0)
55:         {
56:           printf (",");
57:         }
58:       printf ("%f", data[i]);
59:     }
60:   printf ("\n");
61: }
This example demonstrates all there is to know to start learning a model. We first call the allocation function, passing the number of inputs and the number of outputs, in line 15. Then, we repeatedly loop over our dataset, and learn from it in line 31. The function expects us to specify the number of inputs and outputs, so that it will not overflow our input_data array or our expected_output pointer. The function returns the mean amount of change applied to our model; over time, this value decreases.
The last parameter in line 31, here set to 1, is called the learning rate. When it is close to 0, the model learns very little at each step, so the number of steps required increases. When it is too high, the model over-reacts to every data point, which also increases the number of steps, or may even prevent convergence altogether. Unfortunately, its value can only be found through trial and error: start with a relatively low value, and increase it as long as the number of iterations keeps decreasing, as sketched below.
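As a sketch of that trial-and-error process (the helper name, the candidate rates and the iteration cap below are our own choices, not part of the library), one could count how many passes over the dataset each learning rate needs:

#include "julien.h"

/* Train a fresh 2-input, 1-output perceptron on the SUM dataset with the
 * given learning rate, and return how many passes over the dataset were
 * needed before the mean update fell below 1e-8.  The cap guards against
 * learning rates that never converge. */
static unsigned long
count_iterations (double learning_rate)
{
  struct julien *jl = julien_alloc (2, 1, 0, NULL);
  unsigned long iterations = 0;
  double update;
  do
    {
      double input_data[2];
      update = 0;
      for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
        {
          for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
            {
              double expected_output = input_data[0] + input_data[1];
              update += julien_learn (jl, 2, input_data, 1,
                                      &expected_output, learning_rate);
            }
        }
      iterations++;
    }
  while (update > 1e-8 && iterations < 1000000);
  julien_free (jl);
  return iterations;
}

Calling this helper with a few candidates such as 0.1, 0.5 and 1, and keeping the value that needs the fewest iterations, is usually enough for a toy problem like this one.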
In line 41, we do not forget to discard our perceptron to avoid memory leaks. For each call to julien_alloc, there should be a call to julien_free.
Finally, once we have trained our model, we call the julien_data function to retrieve the learnt parameters. This is done in line 50. For this example, the sole output is simply computed as a weighted sum of the inputs; the weights are the two values in the data.
To compile, run the following command in the same directory as both julien.h and julien.c:
gcc -o learn-sum main_sum.c julien.c
When we run this program, we get:
./learn-sum
1.000000,1.000000
This means that the output is the first input multiplied by 1, plus the second input multiplied by 1. We successfully learnt how to sum two numbers!
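We can double-check this against table 1 by hand. The learnt model computes

    output = 1.000000 * x + 1.000000 * y

so it returns 0 for (0, 0), 1 for (0, 1), 1 for (1, 0), and 2 for (1, 1): exactly the four rows of the dataset.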
3 Second example: the absolute difference
Suppose now that we want to learn how to compute the absolute difference between both inputs. Our dataset is now in table 2.
Input x | Input y | Output |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
We will follow the same procedure as for the previous example. This gives us program 2, main_absdiff.c. It is exactly the same code as program 1, except that we changed what we want to learn.
 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5:
 6: static void dump_model (const struct julien *jl);
 7:
 8: int
 9: main ()
10: {
11:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
12:   double update;
13:   do
14:     {
15:       double input_data[2];
16:       update = 0;
17:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
18:         {
19:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
20:             {
21:               /* Now, we want to compute the absolute difference between inputs. */
22:               double expected_output = input_data[0] - input_data[1];
23:               if (expected_output < 0)
24:                 {
25:                   expected_output = -expected_output;
26:                 }
27:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
28:             }
29:         }
30:       fprintf (stderr, "%s:%d: mean update is %g\n",
31:                __FILE__, __LINE__, update / 4);
32:     }
33:   while (update > 1e-8);
34:   dump_model (jl);
35:   julien_free (jl);
36:   return 0;
37: }
38:
39: static void
40: dump_model (const struct julien *jl)
41: {
42:   size_t n_parameters;
43:   const double *data = julien_data (jl, &n_parameters);
44:   size_t i;
45:   for (i = 0; i < n_parameters; i++)
46:     {
47:       if (i != 0)
48:         {
49:           printf (",");
50:         }
51:       printf ("%f", data[i]);
52:     }
53:   printf ("\n");
54: }
So, let us compile it and run it:
gcc -o learn-absdiff main_absdiff.c julien.c || exit 1
timeout 3 ./learn-absdiff || echo "Failed :("
Failed :(
What happened? We did not get a response within 3 seconds! Changing the learning rate does not help; you can check for yourself.
In the previous example, we learnt the sum function, and we ended up learning the following behavior: multiply the first input by 1, multiply the second by 1, and add the two results. This is summed up in the diagram in figure 1.
Figure 1: How we learnt to add two numbers
When learning the absdiff function, we are looking for values to replace the 1.0 weights. Unfortunately, no such values exist. Suppose that we had w1 and w2 such that, for all x and y that are each either 0 or 1,

    |x - y| = w1 x + w2 y.

Let a and b be two distinct values (one is 0, the other is 1).
Then, by the property of the abs function,

    w1 a + w2 b = |a - b| = |b - a|.

Since the pair (b, a) still satisfies the input requirements (two values, each either 0 or 1), we can also write:

    |b - a| = w1 b + w2 a.

So by combining the last two equations, we get:

    w1 a + w2 b = w1 b + w2 a.

Factoring the a's and b's, we get:

    (w1 - w2) (a - b) = 0.

Since we explicitly chose two distinct values, a - b is not 0, so w1 = w2.
Let us call this common value w.
Now, what happens if we look at x = y = 1?

    0 = |1 - 1| = w * 1 + w * 1 = 2 w,

so w = 0, and therefore w1 = w2 = 0. So we do not actually learn our absdiff function; we only learn a model that always outputs zero.
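If you prefer a mechanical check over the algebra, a tiny standalone program (independent of julien; the grid bounds and step below are arbitrary choices of ours) can scan candidate weights and confirm that none of them reproduces all four rows of table 2:

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* Try every pair (w1, w2) on a coarse grid and keep the smallest total
   * error over the four rows of table 2. */
  double best = INFINITY;
  double w1, w2;
  for (w1 = -2; w1 <= 2; w1 += 0.01)
    {
      for (w2 = -2; w2 <= 2; w2 += 0.01)
        {
          double error = 0;
          int x, y;
          for (x = 0; x <= 1; x++)
            for (y = 0; y <= 1; y++)
              error += fabs (w1 * x + w2 * y - fabs ((double) (x - y)));
          if (error < best)
            best = error;
        }
    }
  /* The smallest total error stays around 2, never anywhere near 0. */
  printf ("smallest total error: %g\n", best);
  return 0;
}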
This demonstration shows that we cannot learn the absdiff function with the same kind of model as the one we used for the sum. Fortunately, we can reuse part of what we have. If we were to write the absolute difference between x and y as a sum of two terms, what would it be?
There are two cases: either x >= y, in which case we could sum x - y and 0, or x < y, in which case we could sum 0 and y - x. Let us define h(t), for all t, as t if t >= 0 and 0 otherwise. Then, the function would be:

    |x - y| = h(x - y) + h(y - x),

which has the nice property of being symmetric in x and y. If we wanted to represent this operation as a graph, we would get the diagram in figure 2.
Figure 2: How we should compute the absolute difference
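Before involving julien at all, we can check that this decomposition is correct with a few lines of standalone C (h is the keep-it-if-non-negative function defined above):

#include <math.h>
#include <stdio.h>

/* h(t) is t when t >= 0, and 0 otherwise. */
static double
h (double t)
{
  return t >= 0 ? t : 0;
}

int
main (void)
{
  double x, y;
  for (x = 0; x <= 1; x++)
    for (y = 0; y <= 1; y++)
      printf ("|%g - %g| = %g and h(x - y) + h(y - x) = %g\n",
              x, y, fabs (x - y), h (x - y) + h (y - x));
  return 0;
}

Every line prints the same value twice, as expected.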
The yellow elements are called hidden neurons. Fortunately, julien already lets you use them. So let us revise our code (program 3) to add this hidden layer of two neurons. We also need to reduce the learning rate.
 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5:
 6: static void dump_model (const struct julien *jl);
 7:
 8: int
 9: main ()
10: {
11:   size_t n_hidden_layers = 1;
12:   size_t hidden_sizes[1] = { 2 };
13:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
14:   double update;
15:   do
16:     {
17:       double input_data[2];
18:       update = 0;
19:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
20:         {
21:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
22:             {
23:               /* Now, we want to compute the absolute difference between inputs. */
24:               double expected_output = input_data[0] - input_data[1];
25:               if (expected_output < 0)
26:                 {
27:                   expected_output = -expected_output;
28:                 }
29:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 0.5);
30:             }
31:         }
32:       fprintf (stderr, "%s:%d: mean update is %g\n",
33:                __FILE__, __LINE__, update / 4);
34:     }
35:   while (update > 1e-8);
36:   dump_model (jl);
37:   julien_free (jl);
38:   return 0;
39: }
40:
41: static void
42: dump_model (const struct julien *jl)
43: {
44:   size_t n_parameters;
45:   const double *data = julien_data (jl, &n_parameters);
46:   size_t i;
47:   for (i = 0; i < n_parameters; i++)
48:     {
49:       if (i != 0)
50:         {
51:           printf (",");
52:         }
53:       printf ("%f", data[i]);
54:     }
55:   printf ("\n");
56: }
In line 13, we use the last two parameters of the allocation function to reserve space for one hidden layer of 2 neurons. Otherwise, nothing has changed.
gcc -o learn-absdiff main_absdiff_fixed.c julien.c || exit 1
./learn-absdiff
-0.879890,0.878679,0.878679,-0.879890,1.138072,1.138072
This is rather surprising; we expected the absolute value of every weight to be 1. Is it correct, though? Let us check with a small ad-hoc program. Write program 4 as predict_absdiff.c.
 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5:
 6: int
 7: main ()
 8: {
 9:   size_t n_hidden_layers = 1;
10:   size_t hidden_sizes[1] = { 2 };
11:   double data[] =
12:     {
13:       -0.879890, 0.878679, 0.878679, -0.879890, 1.138072, 1.138072
14:
15:     };
16:   size_t n_data = sizeof (data) / sizeof (data[0]);
17:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
18:   double input_data[2];
19:   julien_set_data (jl, 0, n_data, data);
20:   for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
21:     {
22:       for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
23:         {
24:           double expected_output = input_data[0] - input_data[1];
25:           double actual_output;
26:           size_t n_outputs;
27:           if (expected_output < 0)
28:             {
29:               expected_output = -expected_output;
30:             }
31:           n_outputs = julien_predict (jl, 2, input_data, 1, 0, &actual_output);
32:           assert (n_outputs == 1);
33:           printf ("| %.1f | %.1f | %.1f | %.1f | %g |\n",
34:                   input_data[0], input_data[1], actual_output, expected_output,
35:                   expected_output - actual_output);
36:         }
37:     }
38:   julien_free (jl);
39:   return 0;
40: }
Since we know the data of the perceptron, we can skip the training phase and use it directly, by calling julien_set_data (in line 19). This function takes four arguments: the perceptron, the number of leading parameters to leave untouched (0, since we set all the data at once), the number of parameters that we set (all of them), and an array of the appropriate size.
In line 31, we call the last useful function of the library, the one that makes a prediction. It takes the perceptron, the number of inputs (any missing inputs are padded with 0), the input array, the maximum number of outputs we care about, the number of leading outputs to skip, and where to store the result. The function returns the total number of outputs.
We get the results in table 3. We can check that the learnt function is indeed correct; the residual errors are explained by the fact that we only kept the first few digits of the perceptron data.
x1 | x2 | output | expected | error |
---|---|---|---|---|
0.0 | 0.0 | 0.0 | 0.0 | 0 |
0.0 | 1.0 | 1.0 | 1.0 | 3.3112e-08 |
1.0 | 0.0 | 1.0 | 1.0 | 3.3112e-08 |
1.0 | 1.0 | 0.0 | 0.0 | 0 |
4 Library API
4.1 struct julien
This opaque structure holds everything required to predict with and train a perceptron. It is allocated by julien_alloc and discarded with julien_free. The perceptron data are initialized with a very weak pseudo-random number generator; you may want to initialize them yourself with julien_set_data.
4.2 struct julien *julien_alloc (size_t dim_input, size_t dim_output, size_t n_hidden, const size_t *restrict dim_hidden)
Allocate and initialize a new perceptron with dim_input inputs, dim_output outputs, and n_hidden hidden layers (with respective dimensions given by dim_hidden). The returned object must be freed with julien_free.
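For instance (the sizes below are purely illustrative), a perceptron with 3 inputs, 2 outputs, and two hidden layers of 4 and 3 neurons would be allocated and released like this:

/* Two hidden layers: the first with 4 neurons, the second with 3. */
size_t hidden_sizes[2] = { 4, 3 };
struct julien *jl = julien_alloc (3, 2, 2, hidden_sizes);
/* ... train and use jl ... */
julien_free (jl);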
4.3 void julien_free (struct julien *julien)
Free the resources used by julien, as allocated by julien_alloc.
4.4 const double *julien_data (const struct julien *julien, size_t *n_parameters)
Return a pointer to the internal data of julien, and set n_parameters (if it is not NULL) to the number of elements in the returned array.
4.5 void julien_set_data (struct julien *julien, size_t start, size_t n, const double *restrict parameters)
Skip the first start parameters of julien, then set the following n parameters from the parameters array.
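Together with julien_data, this allows, for instance, copying the learnt parameters from one perceptron into another of the same shape; a minimal sketch (the helper name is ours, not part of the library):

/* Copy every parameter from src into dst.  Both perceptrons are assumed
 * to have been allocated with identical dimensions, so that their
 * parameter counts match. */
static void
copy_parameters (const struct julien *src, struct julien *dst)
{
  size_t n_parameters;
  const double *data = julien_data (src, &n_parameters);
  julien_set_data (dst, 0, n_parameters, data);
}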
4.6 double julien_learn (struct julien *julien, size_t dim_input, const double *restrict input, size_t dim_output, const double *restrict output, double learning_rate)
Teach julien to produce the output array (of dimension dim_output) when seeing the input array (of dimension dim_input). Learning is faster as learning_rate grows, but if it is too high there is a risk that julien forgets too fast to learn anything.
Neither dim_input nor dim_output needs to match the dimensions given at allocation time to julien_alloc. If a dimension is larger, only the first elements are considered; if an array is smaller, the missing elements are treated as 0.
Return the mean update to the internal weights. If this value is close to 0, julien is no longer learning and you can stop training.
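As an illustration of the dimension rule above (the numbers are made up), a perceptron allocated with 3 inputs can still be trained with a 2-element input array; the missing third input is then treated as 0:

struct julien *jl = julien_alloc (3, 1, 0, NULL);
double input_data[2] = { 0.5, 0.25 };   /* the third input defaults to 0 */
double expected_output = 0.75;
double update = julien_learn (jl, 2, input_data, 1, &expected_output, 0.1);
julien_free (jl);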
4.7 size_t julien_predict (struct julien *julien, size_t dim_input, const double *restrict input, size_t max_output, size_t start_output, double *restrict output)
Use julien to predict the outputs for input. More specifically, skip the first start_output output elements, then store up to the following max_output elements into output, and return the total number of output elements.
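For example, with a perceptron that has more than one output, start_output and max_output let you read only a slice of the outputs; a sketch with made-up dimensions:

/* Assume jl was allocated with 2 inputs and 4 outputs. */
double input[2] = { 1, 0 };
double middle[2];
size_t n_outputs;
/* Skip the first output, then store at most the 2 following ones. */
n_outputs = julien_predict (jl, 2, input, 2, 1, middle);
/* n_outputs is the total number of outputs, as described above. */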