Learn a multi-layer perceptron with Julien

1 Introduction

A multi-layer perceptron is a machine learning model that can learn useful functions from examples. In this package, the outputs are real-valued.

2 First example: the SUM function!

Let us take an example! Suppose that we want to learn the sum function. The function takes two variables, each being either 0 or 1, and returns their sum. The dataset is given in table 1.

Table 1: The "SUM" function
| Input x | Input y | Output |
|---------+---------+--------|
|       0 |       0 |      0 |
|       0 |       1 |      1 |
|       1 |       0 |      1 |
|       1 |       1 |      2 |

So, how do we learn this? Let us write our first julien-enabled program! Write program 1 to a file, main_sum.c.

 1: /* Ensure that julien.h is in the current directory */
 2: #include "julien.h"
 3: #include <stdio.h>
 4: #include <stdlib.h>
 5: #include <assert.h>
 6: 
 7: /* Print the model internals as a CSV row */
 8: static void dump_model (const struct julien *jl);
 9: 
10: int
11: main ()
12: {
13:   /* We have two inputs, and one output.  We do not use the other
14:    * arguments for now. */
15:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
16:   /* Keep track of how much we updated our model */
17:   double update;
18:   /* Keep learning! */
19:   do
20:     {
21:       double input_data[2];
22:       update = 0;
23:       /* Iterate over the dataset */
24:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
25:         {
26:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
27:             {
28:               /* We want to learn to produce the sum! */
29:               double expected_output = input_data[0] + input_data[1];
30:               /* Call the learning function (see the commentary) */
31:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
32:             }
33:         }
34:       /* Visualize our progress! */
35:       fprintf (stderr, "%s:%d: mean update is %g\n",
36:                __FILE__, __LINE__, update / 4);
37:     }
38:   while (update > 1e-8);
39:   dump_model (jl);
40:   /* Each alloc() comes with its associated free() */
41:   julien_free (jl);
42:   return 0;
43: }
44: 
45: static void
46: dump_model (const struct julien *jl)
47: {
48:   size_t n_parameters;
49:   /* This function gets the internal parameters of the perceptron */
50:   const double *data = julien_data (jl, &n_parameters);
51:   size_t i;
52:   for (i = 0; i < n_parameters; i++)
53:     {
54:       if (i != 0)
55:         {
56:           printf (",");
57:         }
58:       printf ("%f", data[i]);
59:     }
60:   printf ("\n");
61: }

This example demonstrates all there is to know to start learning a model. We first call the allocation function, passing the number of inputs and the number of outputs, in line 15. Then, we repeatedly loop over our dataset, and learn from it in line 31. The function expects us to specify the number of inputs and outputs, so that it will not overflow our input_data array or our expected_output pointer. The function returns the mean amount of change for our model. Over time, this value decreases.

The last parameter in line 31, which is set to 1 for now, is called the learning rate. When it is close to 0, the model learns very little at each step, so the number of steps required increases. When it is too high, the model over-reacts to every data point, so the number of steps increases again, or the model may even fail to converge. Unfortunately, its value can only be found through trial and error. Start with a relatively low value, and increase it as long as the number of iterations decreases.
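Since this tuning is empirical, here is a minimal sketch of that trial-and-error loop, reusing only the API calls from program 1. (The epoch counts it prints are illustrative; they depend on the random initial weights.)

#include "julien.h"
#include <stdio.h>

int
main ()
{
  double rates[] = { 0.1, 0.5, 1.0 };
  size_t r;
  for (r = 0; r < sizeof (rates) / sizeof (rates[0]); r++)
    {
      /* Re-train the SUM model from scratch with each candidate rate */
      struct julien *jl = julien_alloc (2, 1, 0, NULL);
      unsigned long n_epochs = 0;
      double update;
      do
        {
          double input_data[2];
          update = 0;
          for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
            {
              for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
                {
                  double expected = input_data[0] + input_data[1];
                  update += julien_learn (jl, 2, input_data, 1, &expected,
                                          rates[r]);
                }
            }
          n_epochs++;
        }
      while (update > 1e-8);
      printf ("learning rate %g: %lu epochs\n", rates[r], n_epochs);
      julien_free (jl);
    }
  return 0;
}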

In line 41, we do not forget to discard our perceptron to avoid memory leaks. For each call to julien_alloc, there should be a call to julien_free.

Finally, once we have trained our model, we call the julien_data function to retrieve the learnt function. This is done in line 50. For this example, the sole output is simply computed as a weighted sum of all the inputs. The weights are the two values in the data.
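In other words, writing $w_1$ and $w_2$ for the two values printed by dump_model, the learnt model computes:

$$\text{output} = w_1 x + w_2 y$$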

To compile, run the following command in the same directory as both julien.h and julien.c:

gcc -o learn-sum main_sum.c julien.c

When we run this program, we get:

./learn-sum
1.000000,1.000000

This means that the output is exactly 1 times the first input, plus 1 times the second input. We successfully learnt how to sum two numbers!

3 Second example: the absolute difference

Suppose now that we want to learn how to compute the absolute difference between both inputs. Our dataset is now in table 2.

Table 2: The "ABSDIFF" function
| Input x | Input y | Output |
|---------+---------+--------|
|       0 |       0 |      0 |
|       0 |       1 |      1 |
|       1 |       0 |      1 |
|       1 |       1 |      0 |

We will follow the same procedure as for the previous example. This gives us program 2, main_absdiff.c. It is exactly the same code as program 1, except that we changed what we want to learn.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: static void dump_model (const struct julien *jl);
 7: 
 8: int
 9: main ()
10: {
11:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
12:   double update;
13:   do
14:     {
15:       double input_data[2];
16:       update = 0;
17:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
18:         {
19:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
20:             {
21:               /* Now, we want to compute the absolute difference between inputs. */
22:               double expected_output = input_data[0] - input_data[1];
23:               if (expected_output < 0)
24:                 {
25:                   expected_output = -expected_output;
26:                 }
27:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
28:             }
29:         }
30:       fprintf (stderr, "%s:%d: mean update is %g\n",
31:                __FILE__, __LINE__, update / 4);
32:     }
33:   while (update > 1e-8);
34:   dump_model (jl);
35:   julien_free (jl);
36:   return 0;
37: }
38: 
39: static void
40: dump_model (const struct julien *jl)
41: {
42:   size_t n_parameters;
43:   const double *data = julien_data (jl, &n_parameters);
44:   size_t i;
45:   for (i = 0; i < n_parameters; i++)
46:     {
47:       if (i != 0)
48:         {
49:           printf (",");
50:         }
51:       printf ("%f", data[i]);
52:     }
53:   printf ("\n");
54: }

So, let us compile it and run it:

gcc -o learn-absdiff main_absdiff.c julien.c || exit 1
timeout 3 ./learn-absdiff || echo "Failed :("
Failed :(

What happened? We did not get a response within 3 seconds! Changing the learning rate does not help; you can check for yourself.

In the previous example, we learnt the sum function, and we ended up learning the following behavior: multiply the first input by 1, multiply the second by 1, and add the two results. This is summed up in the diagram in figure 1.

explain-network-sum.png

Figure 1: How we learnt to add two numbers

When learning the absdiff function, we are looking for values to replace the 1.0 weights. Unfortunately, no such values exist. Suppose that we had $w_1$ and $w_2$ such that, for all $x$ and $y$ that are each either 0 or 1:

$$w_1 x + w_2 y = |x - y|$$

Let $x$ and $y$ be two distinct values (one is 0, the other is 1). Then, by the symmetry of the absolute value,

$$|x - y| = |y - x|$$

Since $(y, x)$ still satisfies the input requirements (two values, each either 0 or 1), we can also write:

$$w_1 y + w_2 x = |y - x|$$

So, by combining the last two equations, we get:

$$w_1 x + w_2 y = w_1 y + w_2 x$$

Factoring the $x$'s and $y$'s, we get:

$$(w_1 - w_2)(x - y) = 0$$

Since we explicitly chose two distinct values, $x - y \neq 0$, so:

$$w_1 = w_2$$

Let us call this value $w$.

Now, what happens if we look at $x = y = 1$?

$$w \cdot 1 + w \cdot 1 = |1 - 1| = 0$$

So $w = 0$. So we do not actually learn our absdiff function; we can only learn the constant zero function.
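To double-check this conclusion numerically, here is a small standalone sketch (independent of julien) that scans a grid of candidate weight pairs and records the best worst-case error on the ABSDIFF dataset; the minimum stays around 2/3, nowhere near 0.

/* Scan weight pairs (w1, w2) and record the best worst-case error on
 * the ABSDIFF dataset.  The minimum is about 2/3, reached near
 * w1 = w2 = 1/3. */
#include <stdio.h>
#include <math.h>

int
main ()
{
  double best = 1e9;
  double w1, w2;
  for (w1 = -2; w1 <= 2; w1 += 0.01)
    {
      for (w2 = -2; w2 <= 2; w2 += 0.01)
        {
          double worst = 0;
          int x, y;
          for (x = 0; x <= 1; x++)
            {
              for (y = 0; y <= 1; y++)
                {
                  double target = (x != y) ? 1.0 : 0.0;
                  double err = fabs (w1 * x + w2 * y - target);
                  if (err > worst)
                    worst = err;
                }
            }
          if (worst < best)
            best = worst;
        }
    }
  printf ("best worst-case error: %g\n", best);
  return 0;
}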

This demonstrates that we cannot learn the absdiff function with the same kind of model as the one we used for the sum. Fortunately, we can reuse a part of what we have. If we were to write the absolute difference between $x$ and $y$ as a sum of two terms, what would it be?

There are two cases: either $x \geq y$, in which case we could sum $x - y$ and 0, or $x \leq y$, in which case we could sum 0 and $y - x$. Let us define, for all $t$, $\mathrm{relu}(t)$ as $t$ if $t \geq 0$ and 0 otherwise. Then, the function would be:

$$|x - y| = \mathrm{relu}(x - y) + \mathrm{relu}(y - x)$$

which has the nice property of being symmetric. If we wanted to represent this operation as a graph, we would get the diagram in figure 2.

explain-network-absdiff.png

Figure 2: How we should compute the absolute difference
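Before wiring this into julien, we can check the identity itself with a tiny standalone program (again independent of julien):

/* Check that |x - y| = relu(x - y) + relu(y - x) on our dataset */
#include <stdio.h>
#include <math.h>

static double
relu (double t)
{
  return t >= 0 ? t : 0;
}

int
main ()
{
  int x, y;
  for (x = 0; x <= 1; x++)
    {
      for (y = 0; y <= 1; y++)
        {
          printf ("|%d - %d| = %g = relu(%d) + relu(%d) = %g\n",
                  x, y, fabs ((double) (x - y)), x - y, y - x,
                  relu (x - y) + relu (y - x));
        }
    }
  return 0;
}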

In figure 2, the yellow elements are called hidden neurons. Fortunately, julien already lets you use them. So let us revise our code (program 3, main_absdiff_fixed.c) to add this hidden layer of two neurons. We also need to reduce the learning rate.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: static void dump_model (const struct julien *jl);
 7: 
 8: int
 9: main ()
10: {
11:   size_t n_hidden_layers = 1;
12:   size_t hidden_sizes[1] = { 2 };
13:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
14:   double update;
15:   do
16:     {
17:       double input_data[2];
18:       update = 0;
19:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
20:         {
21:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
22:             {
23:               /* Now, we want to compute the absolute difference between inputs. */
24:               double expected_output = input_data[0] - input_data[1];
25:               if (expected_output < 0)
26:                 {
27:                   expected_output = -expected_output;
28:                 }
29:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 0.5);
30:             }
31:         }
32:       fprintf (stderr, "%s:%d: mean update is %g\n",
33:                __FILE__, __LINE__, update / 4);
34:     }
35:   while (update > 1e-8);
36:   dump_model (jl);
37:   julien_free (jl);
38:   return 0;
39: }
40: 
41: static void
42: dump_model (const struct julien *jl)
43: {
44:   size_t n_parameters;
45:   const double *data = julien_data (jl, &n_parameters);
46:   size_t i;
47:   for (i = 0; i < n_parameters; i++)
48:     {
49:       if (i != 0)
50:         {
51:           printf (",");
52:         }
53:       printf ("%f", data[i]);
54:     }
55:   printf ("\n");
56: }

In line 13, we use the last two parameters of the allocation function to reserve some space for one hidden layer of 2 neurons. Otherwise, nothing changed.
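More generally, the third and fourth arguments of julien_alloc describe the whole stack of hidden layers. As a purely illustrative sketch (this configuration is hypothetical and not needed for absdiff), two hidden layers of 4 and 3 neurons would be declared as:

#include "julien.h"
#include <stdlib.h>

int
main ()
{
  /* Hypothetical configuration, only to illustrate the API shape:
   * two hidden layers, of 4 and 3 neurons respectively. */
  size_t hidden_sizes[] = { 4, 3 };
  struct julien *jl = julien_alloc (2, 1, 2, hidden_sizes);
  julien_free (jl);
  return 0;
}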

gcc -o learn-absdiff main_absdiff_fixed.c julien.c || exit 1
./learn-absdiff
-0.879890,0.878679,0.878679,-0.879890,1.138072,1.138072

This is rather surprising; we expected all the weights to have an absolute value of 1. Is it correct, though? Let us check with a small ad-hoc program. Write program 4 as predict_absdiff.c.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: int
 7: main ()
 8: {
 9:   size_t n_hidden_layers = 1;
10:   size_t hidden_sizes[1] = { 2 };
11:   double data[] =
12:     {
13:      -0.879890,0.878679,0.878679,-0.879890,1.138072,1.138072
14:
15:     };
16:   size_t n_data = sizeof (data) / sizeof (data[0]);
17:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
18:   double input_data[2];
19:   julien_set_data (jl, 0, n_data, data);
20:   for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
21:     {
22:       for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
23:         {
24:           double expected_output = input_data[0] - input_data[1];
25:           double actual_output;
26:           size_t n_outputs;
27:           if (expected_output < 0)
28:             {
29:               expected_output = -expected_output;
30:             }
31:           n_outputs = julien_predict (jl, 2, input_data, 1, 0, &actual_output);
32:           assert (n_outputs == 1);
33:           printf ("| %.1f | %.1f | %.1f | %.1f | %g |\n",
34:                   input_data[0], input_data[1], actual_output, expected_output,
35:                   expected_output - actual_output);
36:         }
37:     }
38:   julien_free (jl);
39:   return 0;
40: }

Since we know the data of the perceptron, we can skip the training phase and use it directly, by calling julien_set_data (in line 19). This function takes four arguments: the perceptron, the number of leading parameters to leave untouched (0, since we set all of them at once), the number of parameters to set (all of them), and an array of that size.

In line 31, we call the last useful function of the library, to make a prediction. This function takes the perceptron, the number of inputs (missing ones are padded with 0), the input array, the maximum number of outputs we care about, the number of leading outputs to skip, and where to store the outputs. The function returns the total number of outputs.

We get the results in table 3. We can check that the learnt function is actually correct, up to the rounding introduced by keeping only six decimal places of the perceptron data.

Table 3: Checking that the strange function that we've just learnt is actually correct
|  x1 |  x2 | output | expected | error      |
|-----+-----+--------+----------+------------|
| 0.0 | 0.0 |    0.0 |      0.0 | 0          |
| 0.0 | 1.0 |    1.0 |      1.0 | 3.3112e-08 |
| 1.0 | 0.0 |    1.0 |      1.0 | 3.3112e-08 |
| 1.0 | 1.0 |    0.0 |      0.0 | 0          |
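We can also replay one row by hand. Assuming the parameter layout suggested by figure 2 (the four input-to-hidden weights first, then the two hidden-to-output weights) and the relu activation defined earlier (both are assumptions about julien's internals, but they reproduce the observed numbers), the prediction for the input (0, 1) is:

$$h_1 = \mathrm{relu}(-0.879890 \cdot 0 + 0.878679 \cdot 1) = 0.878679$$
$$h_2 = \mathrm{relu}(0.878679 \cdot 0 - 0.879890 \cdot 1) = 0$$
$$\text{output} = 1.138072 \cdot h_1 + 1.138072 \cdot h_2 \approx 0.99999997$$

which is 1 up to the 3.3112e-08 error reported in table 3.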

4 Library API

4.1 struct julien

This opaque structure holds everything required to predict and learn a perceptron. It is allocated by julien_alloc, and discarded with julien_free. The perceptron data are initialized with a very weak pseudo-random number generator; maybe you will want to initialize it yourself with julien_set_data.

4.2 struct julien * julien_alloc (size_t dim_input, size_t dim_output, size_t n_hidden, const size_t *restrict dim_hidden)

Allocate and initialize a new perceptron with dim_input inputs, dim_output outputs, and n_hidden hidden layers (with respective dimensions dim_hidden). The returned object must be freed with julien_free.

4.3 void julien_free (struct julien * julien)

Free the resources used by julien, as allocated by julien_alloc.

4.4 const double * julien_data (const struct julien * julien, size_t * n_parameters)

Return a pointer to the internal data of julien, and set n_parameters (if it is not NULL) to the size of the return value.

4.5 void julien_set_data (struct julien * julien, size_t start, size_t n, const double *restrict parameters)

Skip the first start parameters in julien, then initialize the following n parameters.

4.6 double julien_learn (struct julien * julien, size_t dim_input, const double *restrict input, size_t dim_output, const double *restrict output, double learning_rate)

Teach julien to produce the output array (of dimension dim_output) when seeing the input array (of dimension dim_input). Learning is faster as learning_rate grows, but if it is too high, there is a risk that julien forgets too fast to learn anything.

Neither dim_input nor dim_output needs to match the dimension given at allocation time to julien_alloc. If a dimension is too high, only the first elements are considered. If an array is too small, the missing elements are treated as 0.

Return the mean update to the internal weights. If this value is close to 0, julien is not learning anymore and you can stop the training.
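For example, here is a minimal sketch of the padding behaviour described above: we allocate a 2-input perceptron but pass a single input value, so the second input is treated as 0.

#include "julien.h"
#include <stdio.h>

int
main ()
{
  struct julien *jl = julien_alloc (2, 1, 0, NULL);
  double x = 1.0;
  double expected = 1.0;        /* SUM (1, <padded 0>) = 1 */
  /* dim_input is 1, so the perceptron's second input is padded with 0 */
  double update = julien_learn (jl, 1, &x, 1, &expected, 0.5);
  printf ("mean update: %g\n", update);
  julien_free (jl);
  return 0;
}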

4.7 size_t julien_predict (struct julien * julien, size_t dim_input, const double *restrict input, size_t max_output, size_t start_output, double *restrict output)

Use julien to predict the output given input. More specifically, discard the first start_output elements, then store up to max_output of the following elements into output, and return the total number of output elements.
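To tie the whole API together, here is a compact end-to-end sketch: it rebuilds the SUM model of section 2 directly from its known parameters, then queries it.

#include "julien.h"
#include <stdio.h>
#include <assert.h>

int
main ()
{
  struct julien *jl = julien_alloc (2, 1, 0, NULL);
  double weights[] = { 1.0, 1.0 };      /* the parameters learnt in section 2 */
  double input[2] = { 1.0, 1.0 };
  double output;
  size_t n_outputs;
  /* Skip training: install the known parameters directly */
  julien_set_data (jl, 0, 2, weights);
  n_outputs = julien_predict (jl, 2, input, 1, 0, &output);
  assert (n_outputs == 1);
  printf ("%g + %g = %g\n", input[0], input[1], output);
  julien_free (jl);
  return 0;
}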

Author: Vivien Kraus

Created: 2019-11-25 Mon 19:28
