Learn a multi-layer perceptron with Julien

1 Introduction

A multi-layer perceptron is a machine learning model that can learn useful functions from examples. In this package, the outputs are real-valued.

2 First example: the SUM function!

Let us take an example! Suppose that we want to learn the sum function. The function takes two variables, each being either 0 or 1, and returns their sum. The dataset is given in table 1.

Table 1: The "SUM" function
| Input x | Input y | Output |
|---------+---------+--------|
|       0 |       0 |      0 |
|       0 |       1 |      1 |
|       1 |       0 |      1 |
|       1 |       1 |      2 |

So, how do we learn this? Let us write our first julien-enabled program! Write program 1 to a file, main_sum.c.

 1: /* Ensure that julien.h is in the current directory */
 2: #include "julien.h"
 3: #include <stdio.h>
 4: #include <stdlib.h>
 5: #include <assert.h>
 6: 
 7: /* Print the model internals as a CSV row */
 8: static void dump_model (const struct julien *jl);
 9: 
10: int
11: main ()
12: {
13:   /* We have two inputs, and one output.  We do not use the other
14:    * arguments for now. */
15:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
16:   /* Keep track of how much we updated our model */
17:   double update;
18:   /* Keep learning! */
19:   do
20:     {
21:       double input_data[2];
22:       update = 0;
23:       /* Iterate over the dataset */
24:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
25:         {
26:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
27:             {
28:               /* We want to learn to produce the sum! */
29:               double expected_output = input_data[0] + input_data[1];
30:               /* Call the learning function (see the commentary) */
31:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
32:             }
33:         }
34:       /* Visualize our progress! */
35:       fprintf (stderr, "%s:%d: mean update is %g\n",
36:                __FILE__, __LINE__, update / 4);
37:     }
38:   while (update > 1e-8);
39:   dump_model (jl);
40:   /* Each alloc() comes with its associated free() */
41:   julien_free (jl);
42:   return 0;
43: }
44: 
45: static void
46: dump_model (const struct julien *jl)
47: {
48:   size_t n_parameters;
49:   /* This function gets the internal parameters of the perceptron */
50:   const double *data = julien_data (jl, &n_parameters);
51:   size_t i;
52:   for (i = 0; i < n_parameters; i++)
53:     {
54:       if (i != 0)
55:         {
56:           printf (",");
57:         }
58:       printf ("%f", data[i]);
59:     }
60:   printf ("\n");
61: }

This example demonstrates all there is to know to start learning a model. We first call the allocation function, passing the number of inputs and the number of outputs, in line 15. Then, we repeatedly loop over our dataset, and learn from it in line 31. The function expects us to specify the number of inputs and outputs, so that it will not overflow our input_data array or our expected_output pointer. The function returns the mean amount of change for our model. Over time, this value decreases.

The last parameter in line 31, which is set to 1 for now, is called the learning rate. When it is close to 0, the model learns very little at each step, so the number of steps required increases. When it is too high, the model over-reacts to every data point, so the number of steps increases again, or the model may even fail to converge. Unfortunately, its value can only be found through trial and error. Start with a relatively low value, and increase it as long as the number of iterations decreases.
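Since this tuning is empirical, here is a minimal sketch of that trial-and-error loop, reusing only the API calls from program 1. (The epoch counts it prints are illustrative; they depend on the random initial weights.)

#include "julien.h"
#include <stdio.h>

int
main ()
{
  double rates[] = { 0.1, 0.5, 1.0 };
  size_t r;
  for (r = 0; r < sizeof (rates) / sizeof (rates[0]); r++)
    {
      /* Re-train the SUM model from scratch with each candidate rate */
      struct julien *jl = julien_alloc (2, 1, 0, NULL);
      unsigned long n_epochs = 0;
      double update;
      do
        {
          double input_data[2];
          update = 0;
          for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
            {
              for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
                {
                  double expected = input_data[0] + input_data[1];
                  update += julien_learn (jl, 2, input_data, 1, &expected,
                                          rates[r]);
                }
            }
          n_epochs++;
        }
      while (update > 1e-8);
      printf ("learning rate %g: %lu epochs\n", rates[r], n_epochs);
      julien_free (jl);
    }
  return 0;
}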

In line 41, we do not forget to discard our perceptron to avoid memory leaks. For each call to julien_alloc, there should be a call to julien_free.

Finally, once we have trained our model, we call the julien_data function to retrieve the learnt function. This is done in line 50. For this example, the sole output is simply computed as a weighted sum of all the inputs. The weights are the two values in the data.
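In other words, writing $w_1$ and $w_2$ for the two values printed by dump_model, the learnt model computes:

$$\text{output} = w_1 x + w_2 y$$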

To compile, run the following command in the same directory as both julien.h and julien.c:

gcc -o learn-sum main_sum.c julien.c

When we run this program, we get:

./learn-sum
1.000000,1.000000

This means that the output is exactly 1 times the first input, plus 1 times the second input. We successfully learnt how to sum two numbers!

3 Second example: the absolute difference

Suppose now that we want to learn how to compute the absolute difference between both inputs. Our dataset is now in table 2.

Table 2: The "ABSDIFF" function
| Input x | Input y | Output |
|---------+---------+--------|
|       0 |       0 |      0 |
|       0 |       1 |      1 |
|       1 |       0 |      1 |
|       1 |       1 |      0 |

We will follow the same procedure as for the previous example. This gives us program 2, main_absdiff.c. It is exactly the same code as program 1, except that we changed what we want to learn.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: static void dump_model (const struct julien *jl);
 7: 
 8: int
 9: main ()
10: {
11:   struct julien *jl = julien_alloc (2, 1, 0, NULL);
12:   double update;
13:   do
14:     {
15:       double input_data[2];
16:       update = 0;
17:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
18:         {
19:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
20:             {
21:               /* Now, we want to compute the absolute difference between inputs. */
22:               double expected_output = input_data[0] - input_data[1];
23:               if (expected_output < 0)
24:                 {
25:                   expected_output = -expected_output;
26:                 }
27:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 1);
28:             }
29:         }
30:       fprintf (stderr, "%s:%d: mean update is %g\n",
31:                __FILE__, __LINE__, update / 4);
32:     }
33:   while (update > 1e-8);
34:   dump_model (jl);
35:   julien_free (jl);
36:   return 0;
37: }
38: 
39: static void
40: dump_model (const struct julien *jl)
41: {
42:   size_t n_parameters;
43:   const double *data = julien_data (jl, &n_parameters);
44:   size_t i;
45:   for (i = 0; i < n_parameters; i++)
46:     {
47:       if (i != 0)
48:         {
49:           printf (",");
50:         }
51:       printf ("%f", data[i]);
52:     }
53:   printf ("\n");
54: }

So, let us compile it and run it:

gcc -o learn-absdiff main_absdiff.c julien.c || exit 1
timeout 3 ./learn-absdiff || echo "Failed :("
Failed :(

What happened? We did not get a response within 3 seconds! Changing the learning rate does not help; you can check for yourself.

In the previous example, we learnt the sum function, and we ended up learning the following behavior: multiply the first input by 1, multiply the second by 1, and add the two results. This is summed up in the diagram in figure 1.

explain-network-sum.png

Figure 1: How we learnt to add two numbers

When learning the absdiff function, we are looking for values to replace the 1.0 weights. Unfortunately, no such values exist. Suppose that we had $w_1$ and $w_2$ such that, for all $x$ and $y$ that are each either 0 or 1:

$$w_1 x + w_2 y = |x - y|$$

Let $x$ and $y$ be two distinct values (one is 0, the other is 1). Then, by the symmetry of the absolute value,

$$|x - y| = |y - x|$$

Since $(y, x)$ still satisfies the input requirements (two values, each either 0 or 1), we can also write:

$$w_1 y + w_2 x = |y - x|$$

So, by combining the last two equations, we get:

$$w_1 x + w_2 y = w_1 y + w_2 x$$

Factoring the $x$'s and $y$'s, we get:

$$(w_1 - w_2)(x - y) = 0$$

Since we explicitly chose two distinct values, $x - y \neq 0$, so:

$$w_1 = w_2$$

Let us call this value $w$.

Now, what happens if we look at $x = y = 1$?

$$w \cdot 1 + w \cdot 1 = |1 - 1| = 0$$

So $w = 0$. So we do not actually learn our absdiff function; we can only learn the constant zero function.
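To double-check this conclusion numerically, here is a small standalone sketch (independent of julien) that scans a grid of candidate weight pairs and records the best worst-case error on the ABSDIFF dataset; the minimum stays around 2/3, nowhere near 0.

/* Scan weight pairs (w1, w2) and record the best worst-case error on
 * the ABSDIFF dataset.  The minimum is about 2/3, reached near
 * w1 = w2 = 1/3. */
#include <stdio.h>
#include <math.h>

int
main ()
{
  double best = 1e9;
  double w1, w2;
  for (w1 = -2; w1 <= 2; w1 += 0.01)
    {
      for (w2 = -2; w2 <= 2; w2 += 0.01)
        {
          double worst = 0;
          int x, y;
          for (x = 0; x <= 1; x++)
            {
              for (y = 0; y <= 1; y++)
                {
                  double target = (x != y) ? 1.0 : 0.0;
                  double err = fabs (w1 * x + w2 * y - target);
                  if (err > worst)
                    worst = err;
                }
            }
          if (worst < best)
            best = worst;
        }
    }
  printf ("best worst-case error: %g\n", best);
  return 0;
}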

This demonstrates that we cannot learn the absdiff function with the same kind of model as the one we used for the sum. Fortunately, we can reuse a part of what we have. If we were to write the absolute difference between $x$ and $y$ as a sum of two terms, what would it be?

There are two cases: either $x \geq y$, in which case we could sum $x - y$ and 0, or $x \leq y$, in which case we could sum 0 and $y - x$. Let us define, for all $t$, $\mathrm{relu}(t)$ as $t$ if $t \geq 0$ and 0 otherwise. Then, the function would be:

$$|x - y| = \mathrm{relu}(x - y) + \mathrm{relu}(y - x)$$

which has the nice property of being symmetric. If we wanted to represent this operation as a graph, we would get the diagram in figure 2.

explain-network-absdiff.png

Figure 2: How we should compute the absolute difference
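Before wiring this into julien, we can check the identity itself with a tiny standalone program (again independent of julien):

/* Check that |x - y| = relu(x - y) + relu(y - x) on our dataset */
#include <stdio.h>
#include <math.h>

static double
relu (double t)
{
  return t >= 0 ? t : 0;
}

int
main ()
{
  int x, y;
  for (x = 0; x <= 1; x++)
    {
      for (y = 0; y <= 1; y++)
        {
          printf ("|%d - %d| = %g = relu(%d) + relu(%d) = %g\n",
                  x, y, fabs ((double) (x - y)), x - y, y - x,
                  relu (x - y) + relu (y - x));
        }
    }
  return 0;
}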

In figure 2, the yellow elements are called hidden neurons. Fortunately, julien already lets you use them. So let us revise our code (program 3, main_absdiff_fixed.c) to add this hidden layer of two neurons. We also need to reduce the learning rate.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: static void dump_model (const struct julien *jl);
 7: 
 8: int
 9: main ()
10: {
11:   size_t n_hidden_layers = 1;
12:   size_t hidden_sizes[1] = { 2 };
13:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
14:   double update;
15:   do
16:     {
17:       double input_data[2];
18:       update = 0;
19:       for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
20:         {
21:           for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
22:             {
23:               /* Now, we want to compute the absolute difference between inputs. */
24:               double expected_output = input_data[0] - input_data[1];
25:               if (expected_output < 0)
26:                 {
27:                   expected_output = -expected_output;
28:                 }
29:               update += julien_learn (jl, 2, input_data, 1, &expected_output, 0.5);
30:             }
31:         }
32:       fprintf (stderr, "%s:%d: mean update is %g\n",
33:                __FILE__, __LINE__, update / 4);
34:     }
35:   while (update > 1e-8);
36:   dump_model (jl);
37:   julien_free (jl);
38:   return 0;
39: }
40: 
41: static void
42: dump_model (const struct julien *jl)
43: {
44:   size_t n_parameters;
45:   const double *data = julien_data (jl, &n_parameters);
46:   size_t i;
47:   for (i = 0; i < n_parameters; i++)
48:     {
49:       if (i != 0)
50:         {
51:           printf (",");
52:         }
53:       printf ("%f", data[i]);
54:     }
55:   printf ("\n");
56: }

In line 13, we use the last two parameters of the allocation function to reserve some space for one hidden layer of 2 neurons. Otherwise, nothing changed.
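More generally, the third and fourth arguments of julien_alloc describe the whole stack of hidden layers. As a purely illustrative sketch (this configuration is hypothetical and not needed for absdiff), two hidden layers of 4 and 3 neurons would be declared as:

#include "julien.h"
#include <stdlib.h>

int
main ()
{
  /* Hypothetical configuration, only to illustrate the API shape:
   * two hidden layers, of 4 and 3 neurons respectively. */
  size_t hidden_sizes[] = { 4, 3 };
  struct julien *jl = julien_alloc (2, 1, 2, hidden_sizes);
  julien_free (jl);
  return 0;
}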

gcc -o learn-absdiff main_absdiff_fixed.c julien.c || exit 1
./learn-absdiff
-0.879890,0.878679,0.878679,-0.879890,1.138072,1.138072

This is rather surprising; we expected all the weights to have an absolute value of 1. Is it correct, though? Let us check with a small ad-hoc program. Write program 4 as predict_absdiff.c.

 1: #include "julien.h"
 2: #include <stdio.h>
 3: #include <stdlib.h>
 4: #include <assert.h>
 5: 
 6: int
 7: main ()
 8: {
 9:   size_t n_hidden_layers = 1;
10:   size_t hidden_sizes[1] = { 2 };
11:   double data[] =
12:     {
13:      -0.879890,0.878679,0.878679,-0.879890,1.138072,1.138072
14:
15:     };
16:   size_t n_data = sizeof (data) / sizeof (data[0]);
17:   struct julien *jl = julien_alloc (2, 1, n_hidden_layers, hidden_sizes);
18:   double input_data[2];
19:   julien_set_data (jl, 0, n_data, data);
20:   for (input_data[0] = 0; input_data[0] <= 1; input_data[0]++)
21:     {
22:       for (input_data[1] = 0; input_data[1] <= 1; input_data[1]++)
23:         {
24:           double expected_output = input_data[0] - input_data[1];
25:           double actual_output;
26:           size_t n_outputs;
27:           if (expected_output < 0)
28:             {
29:               expected_output = -expected_output;
30:             }
31:           n_outputs = julien_predict (jl, 2, input_data, 1, 0, &actual_output);
32:           assert (n_outputs == 1);
33:           printf ("| %.1f | %.1f | %.1f | %.1f | %g |\n",
34:                   input_data[0], input_data[1], actual_output, expected_output,
35:                   expected_output - actual_output);
36:         }
37:     }
38:   julien_free (jl);
39:   return 0;
40: }

Since we know the data of the perceptron, we can skip the training phase and use it directly, by calling julien_set_data (in line 19). This function takes four arguments: the perceptron, the number of leading parameters to leave untouched (0, since we set all of them at once), the number of parameters to set (all of them), and an array of that size.

In line 31, we call the last useful function of the library, to make a prediction. This function takes the perceptron, the number of inputs (missing ones are padded with 0), the input array, the maximum number of outputs we care about, the number of leading outputs to skip, and where to store the outputs. The function returns the total number of outputs.

We get the results in table 3. We can check that the learnt function is actually correct, up to the rounding introduced by keeping only six decimal places of the perceptron data.

Table 3: Checking that the strange function that we've just learnt is actually correct
|  x1 |  x2 | output | expected | error      |
|-----+-----+--------+----------+------------|
| 0.0 | 0.0 |    0.0 |      0.0 | 0          |
| 0.0 | 1.0 |    1.0 |      1.0 | 3.3112e-08 |
| 1.0 | 0.0 |    1.0 |      1.0 | 3.3112e-08 |
| 1.0 | 1.0 |    0.0 |      0.0 | 0          |
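We can also replay one row by hand. Assuming the parameter layout suggested by figure 2 (the four input-to-hidden weights first, then the two hidden-to-output weights) and the relu activation defined earlier (both are assumptions about julien's internals, but they reproduce the observed numbers), the prediction for the input (0, 1) is:

$$h_1 = \mathrm{relu}(-0.879890 \cdot 0 + 0.878679 \cdot 1) = 0.878679$$
$$h_2 = \mathrm{relu}(0.878679 \cdot 0 - 0.879890 \cdot 1) = 0$$
$$\text{output} = 1.138072 \cdot h_1 + 1.138072 \cdot h_2 \approx 0.99999997$$

which is 1 up to the 3.3112e-08 error reported in table 3.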

4 Library API

4.1 struct julien

This opaque structure holds everything required to predict and learn a perceptron. It is allocated by julien_alloc, and discarded with julien_free. The perceptron data are initialized with a very weak pseudo-random number generator; maybe you will want to initialize it yourself with julien_set_data.

4.2 struct julien * julien_alloc (size_t dim_input, size_t dim_output, size_t n_hidden, const size_t *restrict dim_hidden)

Allocate and initialize a new perceptron with dim_input inputs, dim_output outputs, and n_hidden hidden layers (with respective dimensions dim_hidden). The returned object must be freed with julien_free.

4.3 void julien_free (struct julien * julien)

Free the resources used by julien, as allocated by julien_alloc.

4.4 const double * julien_data (const struct julien * julien, size_t * n_parameters)

Return a pointer to the internal data of julien, and set n_parameters (if it is not NULL) to the size of the return value.

4.5 void julien_set_data (struct julien * julien, size_t start, size_t n, const double *restrict parameters)

Skip the first start parameters in julien, then initialize the following n parameters.

4.6 double julien_learn (struct julien * julien, size_t dim_input, const double *restrict input, size_t dim_output, const double *restrict output, double learning_rate)

Teach julien to produce the output array (of dimension dim_output) when seeing the input array (of dimension dim_input). Learning is faster as learning_rate grows, but if it is too high, there is a risk that julien forgets too fast to learn anything.

Neither dim_input nor dim_output needs to match the dimension given at allocation time to julien_alloc. If a dimension is too high, only the first elements are considered. If an array is too small, the missing elements are treated as 0.

Return the mean update to the internal weights. If this value is close to 0, julien is not learning anymore and you can stop the training.
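For example, here is a minimal sketch of the padding behaviour described above: we allocate a 2-input perceptron but pass a single input value, so the second input is treated as 0.

#include "julien.h"
#include <stdio.h>

int
main ()
{
  struct julien *jl = julien_alloc (2, 1, 0, NULL);
  double x = 1.0;
  double expected = 1.0;        /* SUM (1, <padded 0>) = 1 */
  /* dim_input is 1, so the perceptron's second input is padded with 0 */
  double update = julien_learn (jl, 1, &x, 1, &expected, 0.5);
  printf ("mean update: %g\n", update);
  julien_free (jl);
  return 0;
}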

4.7 size_t julien_predict (struct julien * julien, size_t dim_input, const double *restrict input, size_t max_output, size_t start_output, double *restrict output)

Use julien to predict the output given input. More specifically, discard the first start_output elements, then store up to max_output of the following elements into output, and return the total number of output elements.
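To tie the whole API together, here is a compact end-to-end sketch: it rebuilds the SUM model of section 2 directly from its known parameters, then queries it.

#include "julien.h"
#include <stdio.h>
#include <assert.h>

int
main ()
{
  struct julien *jl = julien_alloc (2, 1, 0, NULL);
  double weights[] = { 1.0, 1.0 };      /* the parameters learnt in section 2 */
  double input[2] = { 1.0, 1.0 };
  double output;
  size_t n_outputs;
  /* Skip training: install the known parameters directly */
  julien_set_data (jl, 0, 2, weights);
  n_outputs = julien_predict (jl, 2, input, 1, 0, &output);
  assert (n_outputs == 1);
  printf ("%g + %g = %g\n", input[0], input[1], output);
  julien_free (jl);
  return 0;
}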

Author: Vivien Kraus

Created: 2019-11-25 Mon 19:28
