Using Logic Gates as Neurons - Deep Differentiable Logic Gate Networks!

14,326 views

John Tan Chong Min

1 day ago

What if we could do away with all the complexities of a neuron and just model neural networks with logic gates? Fundamentally, logic gates are not differentiable, but with some modifications we can make them differentiable. We can also make the network learn which logic gate to use via a differentiable categorical distribution. This interesting NeurIPS 2022 paper shows that with logic gates we can get much faster inference times and accuracy similar to that of neural networks. Scaling it up, though, is an issue, and we discuss some ways that could potentially help in the next phase of improvements.
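To make the idea concrete, here is a minimal PyTorch-style sketch of a single differentiable logic gate neuron (my own illustration, not the official difflogic code): the two inputs are relaxed to real values in [0, 1], each of the 16 two-input gates is written in real-valued (probabilistic) logic, and a learnable softmax over the 16 gates decides which gate the neuron behaves as.

```python
import torch
import torch.nn as nn

# Real-valued relaxations of all 16 two-input logic gates (inputs a, b in [0, 1]).
# For hard 0/1 inputs these reduce to the usual truth tables.
GATES = [
    lambda a, b: torch.zeros_like(a),        # FALSE
    lambda a, b: a * b,                      # AND
    lambda a, b: a - a * b,                  # A AND NOT B
    lambda a, b: a,                          # A
    lambda a, b: b - a * b,                  # NOT A AND B
    lambda a, b: b,                          # B
    lambda a, b: a + b - 2 * a * b,          # XOR
    lambda a, b: a + b - a * b,              # OR
    lambda a, b: 1 - (a + b - a * b),        # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),    # XNOR
    lambda a, b: 1 - b,                      # NOT B
    lambda a, b: 1 - b + a * b,              # A OR NOT B
    lambda a, b: 1 - a,                      # NOT A
    lambda a, b: 1 - a + a * b,              # NOT A OR B
    lambda a, b: 1 - a * b,                  # NAND
    lambda a, b: torch.ones_like(a),         # TRUE
]

class DiffLogicNeuron(nn.Module):
    """One logic-gate 'neuron': a learnable categorical choice over the 16 gates."""
    def __init__(self):
        super().__init__()
        # Logits of the categorical distribution over the 16 gates,
        # initialised from a normal distribution (cf. the comment at 44:12).
        self.logits = nn.Parameter(torch.randn(16))

    def forward(self, a, b):
        probs = torch.softmax(self.logits, dim=0)
        outs = torch.stack([g(a, b) for g in GATES])     # shape (16, batch)
        return (probs.unsqueeze(-1) * outs).sum(dim=0)   # soft mixture used in training
```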
Some references:
Paper: arxiv.org/pdf/2210.08277
DiffLogic Code Implementation: github.com/Felix-Petersen/dif...
Slides: github.com/tanchongmin/Tensor...
De Morgan's Laws: en.wikipedia.org/wiki/De_Morg...
Universal Logic Gates: www.electronics-tutorials.ws/....
Gated Linear Units (GLU): arxiv.org/abs/1908.07442
/ glu-gated-linear-unit
~~~~~~~~~~~~~~~~~~~~~~~~~~~
0:00 Introduction
1:48 Perceptron and Logic Gates
16:08 Differences between Perceptron and Logic Gates
20:10 What Logic Gates to model?
23:26 Logic Gates Network Overall Architecture
36:02 Difficulty in training Logic Gates
37:17 Relaxation 1: Real-valued Logics
38:33 Relaxation 2: Distribution over the choice of parameter
43:55 Training Setup
45:05 Configuring output for classification
49:21 Results
59:04 Exponential Growth of Gates
59:44 Limitations
1:01:43 My thoughts on how to model biological neurons
1:08:40 Discussion
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain them in a simple and relatable way. Also, I am an avid game creator.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Discord: / discord
Online AI blog: delvingintotech.wordpress.com/.
LinkedIn: / chong-min-tan-94652288
Twitch: / johncm99
Twitter: / johntanchongmin
Try out my games here: simmer.io/@chongmin

COMMENTS: 22
@johntanchongmin 1 year ago
44:12 The normal distribution initialisation is used not just for neural network weights, but also for the weights (logits) of the categorical distribution that chooses which logic gate to use.
@johntanchongmin 1 year ago
40:10 Note that during training, the logic gate neuron's output is still the weighted sum based on the softmax. Only during inference is the gate with the highest probability selected, so that the network can be implemented on hardware devices such as Field-Programmable Gate Arrays (FPGAs).
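Building on the hypothetical DiffLogicNeuron/GATES sketch above, the train/inference split could look like this: training uses the full softmax mixture, while inference keeps only the single most probable gate (a hard argmax), which is what makes an FPGA mapping possible.

```python
def infer_gate(neuron: DiffLogicNeuron, a, b):
    """Inference-time discretisation: keep only the most probable gate."""
    with torch.no_grad():
        best = torch.argmax(neuron.logits).item()
    return GATES[best](a, b)   # one fixed gate per neuron -> maps directly to hardware
```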
@PRINCE-fj6iu 1 year ago
You are underrated, sir.
@johntanchongmin 1 year ago
28:54 I actually wanted to refer to Gated Linear Units (GLU), instead of GeLU. This is actually implemented in TabNet (arxiv.org/abs/1908.07442). The idea is to use a sigmoid gate as a "volume control": its output is multiplied with the original output of the neuron to control how much of the actual output flows through to the next layer.
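A minimal sketch of that gating idea (generic GLU-style gating, not TabNet's exact block; the class and parameter names are illustrative): a sigmoid "volume control" computed from the input scales the main linear output elementwise.

```python
import torch
import torch.nn as nn

class GatedLinearUnit(nn.Module):
    """GLU-style gating: output = (W1 x) * sigmoid(W2 x)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.value = nn.Linear(d_in, d_out)   # the "actual" output
        self.gate = nn.Linear(d_in, d_out)    # the sigmoid "volume control"

    def forward(self, x):
        return self.value(x) * torch.sigmoid(self.gate(x))
```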
@johntanchongmin 1 year ago
1:03:33 Do note that capping the output at 1 can also lead to vanishing gradients if we are at the saturation point of an activation function like sigmoid or tanh. ReLU was actually designed to help with vanishing gradients, but it can cause exploding gradients instead because it passes the entire gradient from the next layer down to the earlier layers. Overall, vanishing/exploding gradients are really a result of backpropagation through many layers, but they can be worsened by larger/smaller weights. There is still some merit to limiting the output to prevent a large-magnitude output from causing excessive weight change via backpropagation. The link between capping the output at 1 and solving vanishing and exploding gradients is not as direct as I intended, and would also need a fundamental relook at backpropagation.
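As a quick numeric illustration of the saturation point mentioned above (my own example, not from the video): the sigmoid derivative sigmoid(x) * (1 - sigmoid(x)) peaks at 0.25 and collapses towards zero for large |x|, so stacking many saturated layers shrinks the gradient multiplicatively.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (0.0, 2.0, 5.0, 10.0):
    s = sigmoid(x)
    print(f"x={x:>4}: sigmoid={s:.5f}, derivative={s * (1 - s):.5f}")
# Expected derivatives: 0.25000, 0.10499, 0.00665, 0.00005
```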
@andrewferguson6901 1 year ago
The obvious solution is to model the architecture like a biological system and use competing feedback loops to find a tunable equilibrium
@andrewferguson6901 1 year ago
Or find an algorithm that can jump through relative maxima in n-dimensional space to find a lower state on the other side, à la quantum tunneling.
@SarveRadhaNaam 1 year ago
aah quality content
@Jd4x711 1 year ago
With this information I can now understand Minecraft redstone :)
@johntanchongmin 1 year ago
Haha, how are you planning to do this?
@Shahawir 1 year ago
Interesting work, well done. Those who are asking are just annoying. Let the man finish his idea, then ask; most of them are commenting, not asking.
@johntanchongmin 1 year ago
Haha thanks :) No worries, these people are my friends; they are just very eager to clarify.
@johntanchongmin 1 year ago
5:10 The universal gates are actually NAND and NOR. Refer to www.electronics-tutorials.ws/logic/universal-gates.html. Also, De Morgan's Laws just help to simplify boolean/logical expressions; they do not guarantee expressivity. It is actually the universal gates that guarantee any boolean expression can be built from just a single fixed gate type.
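A small illustration of why NAND alone is universal (a standard construction, not from the video): NOT, AND, and OR can all be built from NAND, and by De Morgan's Laws OR is just NAND applied to the negated inputs.

```python
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

def not_(a: bool) -> bool:
    return nand(a, a)                 # NOT A = A NAND A

def and_(a: bool, b: bool) -> bool:
    return not_(nand(a, b))           # AND = NOT(NAND)

def or_(a: bool, b: bool) -> bool:
    return nand(not_(a), not_(b))     # De Morgan: A OR B = (NOT A) NAND (NOT B)

# Quick truth-table check against Python's built-in operators.
for a in (False, True):
    for b in (False, True):
        assert and_(a, b) == (a and b)
        assert or_(a, b) == (a or b)
```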
@marvin.marciano 1 year ago
Genius
@Whom1337 1 year ago
Are neurons accessed the same way a logic gate would be? I assume gates would need to be accessed through the initial source of the chain, whereas neurons seem to be accessed independently. Maybe I misunderstand; English is my second language.
@johntanchongmin 1 year ago
Thanks for the question. A neuron has inputs, just like a logic gate. The only difference is that the neuron has a learnable function that maps inputs to outputs, y = learnable_fn(x). A logic gate has a fixed function, y = fixed_fn(x). The paper tries to make the function "learnable" by choosing one of the 16 logic gates, which I feel may be the cause of training instability, because switching between logic gates can be very drastic.
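A tiny sketch of that contrast (illustrative names, not from the paper's code): a perceptron-style neuron learns its mapping via weights, while a plain logic gate's mapping is hard-wired; the paper's trick is to make the choice of gate the learnable part, as in the DiffLogicNeuron sketch earlier.

```python
import torch
import torch.nn as nn

class PerceptronNeuron(nn.Module):
    """y = learnable_fn(x): weights and bias are adjusted by gradient descent."""
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = nn.Linear(n_inputs, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

def and_gate(a, b):
    """y = fixed_fn(x): nothing to learn, the truth table is fixed."""
    return a * b
```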
@Whom1337 1 year ago
I see! Thank you for the response! This is all very fascinating.
@qwertasd7 1 year ago
An interesting topic. Eventually it is about the math that most closely matches fast operations on hardware (like multiply-add). In this problem scenario, it might perhaps not be about the gate types and what they can do, but rather the connections themselves, i.e. where an axon gets the most reward (think of it from the perspective of the connecting lines, not the neurons). Evolve a network based upon the idea of self-connecting axons, maybe guided as in "food-ant" simulations (those pixel ants leaving trails to where food can be found), which is a kind of evolutionary road mapping. (In such a network, generic neurons of all types could be added, e.g. AND, OR, NAND, XOR, delays, even bit/byte/int array shifters.)
@johntanchongmin 1 year ago
I agree. Modifying the strength of connections, rather than changing the gates, makes more sense to me. The ant simulation idea is interesting; I wonder if that could also work for traditional neural network training.
@qwertasd7 1 year ago
@johntanchongmin Well, maybe it can mimic the growing structure of the brain too, though I wouldn't know how to code your novel ideas.
@andrewferguson6901 1 year ago
@qwertasd7 You break it down into simple steps, then break it down further into even simpler steps. Execute in order. Congrats, you've done it.
@someguyfromafrica5158 1 year ago
not A and not B = not (A or B)
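That identity is one of De Morgan's Laws; a one-liner check over all input combinations:

```python
assert all(((not a) and (not b)) == (not (a or b))
           for a in (False, True) for b in (False, True))
```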