A logic gate network is a neural network whose neurons are logic gates. A traditional neural network consists of the following components:

  • neurons
  • connections (weights)

Every neuron has properties like a bias and weights. During neural network training, these weights are learned and updated at every layer during backpropagation.

In a logic gate network, instead of 32-bit floating-point weights, we have a network which is weightless. During training, instead of updating weights, we select a logic gate for every neuron.

So, training a logic gate network → not learning weights via backprop → choosing the appropriate logic gate for each neuron out of the total number of possible gate operations.

In our example, there are 16 possible gate operations (all 2^4 truth tables of a two-input gate). During training, the preferred gate for each neuron is learned. After training, we have a hardcoded network of logic gates connected to the logic gates of the subsequent layer (we will get to connections a little later). This hardcoded logic network can be easily flashed onto an FPGA's storage units, its lookup tables (LUTs), given we have an adequate number of LUTs on our FPGA. Every logic gate can be broken down into inputs, outputs, and an operation, and these can be represented in a truth table which can be stored in a LUT.
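The per-neuron gate choice can be made differentiable by keeping a probability distribution over the 16 gates and evaluating real-valued relaxations of each gate. Below is a minimal pure-Python sketch of this idea; the gate ordering, names, and softmax mixture are our illustrative conventions, not the exact training code.

```python
import math

# The 16 two-input gates as real-valued relaxations on a, b in [0, 1]
# (e.g. AND -> a*b, OR -> a+b-a*b). Ordering here is illustrative.
GATES = [
    lambda a, b: 0.0,                    # FALSE
    lambda a, b: a * b,                  # AND
    lambda a, b: a - a * b,              # a AND NOT b
    lambda a, b: a,                      # a
    lambda a, b: b - a * b,              # NOT a AND b
    lambda a, b: b,                      # b
    lambda a, b: a + b - 2 * a * b,      # XOR
    lambda a, b: a + b - a * b,          # OR
    lambda a, b: 1 - (a + b - a * b),    # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),# XNOR
    lambda a, b: 1 - b,                  # NOT b
    lambda a, b: 1 - b + a * b,          # b implies a
    lambda a, b: 1 - a,                  # NOT a
    lambda a, b: 1 - a + a * b,          # a implies b
    lambda a, b: 1 - a * b,              # NAND
    lambda a, b: 1.0,                    # TRUE
]

def neuron_forward(logits, a, b):
    """Soft neuron: softmax over the 16 gates, mixture of their outputs."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return sum(e / z * g(a, b) for e, g in zip(exps, GATES))

# After training, each neuron is hardened to its most probable gate:
logits = [0.0] * 16
logits[1] = 10.0                         # suppose training favoured AND
hard = GATES[max(range(16), key=lambda k: logits[k])]
print(hard(1.0, 1.0), hard(1.0, 0.0))    # 1.0 0.0
```

Hardening replaces the mixture with a single argmax gate, which is what makes the trained network weightless and trivially mappable to hardware.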

Example: AND gate truth table

a | b | out
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

The AND gate’s truth table can be implemented as a 4-bit LUT in an FPGA. The inputs a and b form a 2-bit address, and the output is looked up as follows:

Address (ab) | Output
00 | 0
01 | 0
10 | 0
11 | 1

This LUT stores the binary sequence 0001, which defines the AND operation. In our logic gate network, each neuron’s gate (like AND) is mapped to such a LUT.
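In software, the same LUT can be sketched as a 4-bit integer mask indexed by the input pair. The bit ordering below is our own convention for illustration: with the address `2*a + b` selecting the output bit, the written sequence 0001 (addresses 00..11, left to right) becomes the mask `0b1000`.

```python
# A 2-input gate's truth table fits in a 4-bit mask; the address 2*a + b
# selects the output bit. Any of the 16 gates is just a different mask.
def lut_eval(mask, a, b):
    return (mask >> (2 * a + b)) & 1

AND_MASK = 0b1000
for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut_eval(AND_MASK, a, b))   # only (1, 1) yields 1
```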

Assuming we have the required number of LUTs on our FPGA, we can flash this weightless network onto the FPGA and perform ML inference in "nanoseconds" (ideally).

Here, the number of inputs of a logic gate layer → number of logic gates × 2 (because every logic gate has 2 inputs). Say our network consists of 5 neurons (logic gates) per layer; then the total number of inputs for one logic gate layer = 5 × 2 = 10, and the total number of outputs equals the number of logic gates = 5.

a →
b →   (Neuron 1) → out1

a →
b →   (Neuron 2) → out2

a →
b →   (Neuron 3) → out3

a →
b →   (Neuron 4) → out4

a →
b →   (Neuron 5) → out5
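A forward pass through the layer above can be sketched as follows: 5 LUT masks consume 10 input bits in fixed (a, b) pairs and emit 5 output bits. The masks, their gate assignments, and the bit ordering (`2*a + b` addresses the mask) are illustrative.

```python
# One layer: 5 gates, each consuming its own (a, b) pair from the 10 inputs.
def layer_forward(gate_masks, inputs):
    """gate_masks: LUT masks; inputs: bits paired as (a0, b0, ..., a4, b4)."""
    assert len(inputs) == 2 * len(gate_masks)
    return [(m >> (2 * inputs[2 * i] + inputs[2 * i + 1])) & 1
            for i, m in enumerate(gate_masks)]

# AND, OR, XOR, NAND, XNOR as 4-bit masks (illustrative choice of gates):
masks = [0b1000, 0b1110, 0b0110, 0b0111, 0b1001]
print(layer_forward(masks, [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]))  # [1, 1, 1, 0, 1]
```

Note how 10 inputs go in and 5 outputs come out, exactly matching the counting argument above.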

The above is the first layer. Now, the challenge is how these layers are connected → how are the neurons at every layer connected to the neurons of the subsequent layer? → Are the connections random or unique?

These connections are set at training time and follow an algorithm (more details on that later). To perform faster inference on various accelerators, the PyTorch-trained network can be converted to C code and then compiled by a C compiler into a shared object file (.so) which can be called from outside during inference. This enables faster inference on CPU.
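One common scheme for such fixed wiring, and the assumption in this sketch, is a seeded pseudo-random pairing: each gate in a layer draws its two input indices from the previous layer's outputs before training, and the same seed reproduces the wiring exactly at inference time.

```python
import random

# Illustrative wiring scheme (an assumption, not the exact algorithm):
# each gate in the next layer is assigned two input indices drawn from
# the previous layer's outputs, fixed by a seed.
def make_connections(n_prev_outputs, n_gates, seed=0):
    rng = random.Random(seed)
    return [(rng.randrange(n_prev_outputs), rng.randrange(n_prev_outputs))
            for _ in range(n_gates)]

wiring = make_connections(n_prev_outputs=5, n_gates=5, seed=42)
print(wiring)  # same seed -> identical wiring at train and inference time
```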

For GPUs, the same network is written from scratch in CUDA and uses a packbits tensor scheme during inference to leverage the GPU architecture for faster inference.
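The packbits idea can be sketched with plain Python integers: pack one bit per sample into a machine word, and a single bitwise instruction then evaluates a gate for all packed samples at once (the real kernels do this over packed tensors on the GPU; this is a simplified sketch).

```python
# Pack 8 samples' bits into one word, then evaluate an AND gate for all
# 8 samples with a single bitwise operation.
def pack(bits):
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word

def unpack(word, n):
    return [(word >> i) & 1 for i in range(n)]

a = pack([0, 0, 1, 1, 0, 1, 0, 1])   # input a across 8 samples
b = pack([0, 1, 0, 1, 1, 1, 0, 0])   # input b across 8 samples
out = unpack(a & b, 8)               # one op, 8 AND results
print(out)  # [0, 0, 0, 1, 0, 1, 0, 0]
```

With 32- or 64-bit words (or wider GPU registers), each bitwise op evaluates that many samples in parallel, which is where the speedup comes from.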

Our contribution:

For FPGAs, we created a Verilog generator that takes a PyTorch logic gate network and generates a Verilog (RTL) implementation of the network. Further, we synthesised the generated Verilog and ran it on a simulator (iverilog). Our main goal is to run inference using the generated bitstream of the above network on the Vaaman FPGA board, which has a Trion 120 FPGA with 1.2k LUTs.
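A minimal sketch of what such a generator might emit for one layer, mapping each hardened gate to a continuous `assign` statement (the function, signal names, and mask-to-operator table here are illustrative, not our generator's exact output):

```python
# Map a few 4-bit gate masks (address = 2*a + b) to Verilog operators.
VERILOG_OPS = {0b1000: "&", 0b1110: "|", 0b0110: "^"}  # AND, OR, XOR

def emit_layer(gate_masks, wiring, layer):
    """Emit one assign per gate, wiring it to the previous layer's outputs."""
    lines = []
    for i, (mask, (ia, ib)) in enumerate(zip(gate_masks, wiring)):
        op = VERILOG_OPS[mask]
        lines.append(f"assign l{layer}_out[{i}] = "
                     f"l{layer - 1}_out[{ia}] {op} l{layer - 1}_out[{ib}];")
    return "\n".join(lines)

print(emit_layer([0b1000, 0b0110], [(0, 1), (2, 3)], layer=1))
```

Because every gate becomes one combinational assign, the synthesiser can pack each directly into a LUT, which is what makes the LGN-to-FPGA mapping so natural.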

Experimental results:

Inference on AMD Ryzen 5 4000 Series CPU

• Dataset: CIFAR-10-3-thresholds
• Performance: 5.9131 seconds for 10k test images
• Latency: ∼590 microseconds/image
• Accuracy: 57%

Inference on Intel i5 9300H CPU

• Dataset: MNIST 20x20
• Latency: 270 μs/image

Inference on Trion 120 FPGA

• Dataset: MNIST 20x20
• Clock Frequency: 20 MHz
• Logic Elements: 7,588
• Latency: 5.5 ms/image
• Breakdown:
  – Pre-processing: 600 μs
  – Inference: 3.5 ms
  – Post-processing: 98 μs

Experiments on NVIDIA GTX 1650 GPU:

  • Inference: 97.03% accuracy

Link to inference pipeline code: https://github.com/b1shtream/conv-differentiable-logic-gate-networks

Conclusion: Logic Gate Networks (LGNs)

Problem with FP32 Baseline Models:

• Require a large number of DSP slices.
• FPGAs have limited DSP resources, making this inefficient.

Solution:

• Use 1-bit logic gate networks.

Work Done:

  1. LGN training on MNIST 20x20 and MNIST 28x28 using CPU (Python).
  2. LGN training using GPU (CUDA).
  3. Verilog generator for LGN.
  4. FPGA synthesis of Verilog.
  5. Flashing on Vaaman board.
  6. Inference using UART-based communication.