Neural networks are an advanced technique for building artificially intelligent machines. I will define "artificially intelligent" more specifically as describing machines that perform at least as well as humans on a specific task. There are many questions we cannot definitively answer with our current technology or methodology. Our knowledge of the brain is what has led us to the design of neural networks.
Increasing knowledge of brain physiology is helping us come closer, at least in theory, to building effective neural networks. Hardware limitations aside, we can theoretically use neural networks to learn how to mimic the brain's intelligence artificially. The application of chaos theory to AI and neural networks is another interesting and very important consideration I will make.
In contrast to the processing of a conventional computer, neural networks exhibit four major differences that vastly change how the computer processes information. The first difference is adaptive learning. A neural network can react to different sets of input without being told how to react to every imaginable set of input, as conventional computing requires. Thus it can use general properties of the input data to make a decision. As it receives new input over time, it revises what it "knows," so in essence it becomes more intelligent in its computations. It adapts to new information about its input as it receives it.
The second major feature is self-organization. The neural network organizes its own structure in response to its input data sets. In most neural network types this is simply the strength of certain links between neurons. This leads to the network reflecting and organizing itself based on the type of information it receives. The third difference is error tolerance. The neural network can generalize over the input data sets it has received, so if there are slight errors in the data, whether accidental or intentional, the network will be rather forgiving in its decisions. Finally, there is parallel information processing. Neural networks have the inherent ability to process input and output data in parallel.
Conventional computers follow a preprogrammed algorithmic approach. A computer follows exact instructions step by step, and it must know everything it needs at that time to solve the problem at hand. Neural networks, on the other hand, solve problems by learning from different examples and adjusting their network in response. The disadvantage of this compared with conventional computers is that their learning is unpredictable. We may think they are learning something one way when in fact they have learned something else unexpected from the input data. We also have little control over what they learn.
A simple example: suppose I gave a neural network a plain white background as input one, a white background with a blue square as input two, a white background with a purple circle as input three, and a white background with a green triangle as input four. My goal is for the neural network to detect whether there is a circle. From this example we can see that it could be trained to detect any object that is purple, or any object that is a circle regardless of color, or only circles that are purple. We don't know which it has really learned at this point; the data set is limited. No matter which of the three rules it has learned, the output (solution) will be the same for this example. We have no control over what it has learned in this situation. Giving it a wide range of varied examples will help steer it toward learning what we want it to learn. It is not easy to know when it is learning exactly what we want it to learn, since there can be more than one pattern in a group of data sets.
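The ambiguity can be made concrete in a short sketch. The (color, shape) feature encoding and the three candidate rules below are my own hypothetical reconstruction of the example above, not an actual trained network:

```python
# Hypothetical encoding of the four training inputs: each image reduced
# to (color, shape) features; (None, None) is the empty white background.
training_set = [
    ((None, None), False),          # input 1: blank white background
    (("blue", "square"), False),    # input 2: blue square
    (("purple", "circle"), True),   # input 3: purple circle
    (("green", "triangle"), False), # input 4: green triangle
]

# Three rules the network could have internalized; all fit the data.
def rule_color(color, shape):
    return color == "purple"

def rule_shape(color, shape):
    return shape == "circle"

def rule_both(color, shape):
    return color == "purple" and shape == "circle"

# Every rule reproduces the training labels exactly...
for (color, shape), label in training_set:
    assert rule_color(color, shape) == label
    assert rule_shape(color, shape) == label
    assert rule_both(color, shape) == label

# ...yet they disagree on unseen input, e.g. a red circle:
print(rule_color("red", "circle"), rule_shape("red", "circle"))  # False True
```

All three rules are indistinguishable on the limited data set, which is exactly why we cannot tell which one the network has "really" learned.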
An artificial network uses a node to represent a neuron in the brain. Structurally the network looks like a graph, and the links between nodes carry weights just like the edges of a weighted graph. A node has weighted inputs that cause its output to fire (in computer science terms, return a Boolean value of 1) only when their sum has reached some predetermined threshold. Weights can in theory be positive (excitatory) or negative (inhibitory). There are usually "layers" of nodes. Each neuron in a layer works in parallel, and the inputs and outputs of the network are also processed in parallel. These ideas come from the neuron in the brain, where a certain threshold of firings from the dendrites (inputs) can cause the axon (the neuron's output) to fire. Nodes are connected to other nodes in various ways depending on the type of net.
There are four possible types of connections between nodes. Feed-forward connections bring weighted output from a node in one layer to the nodes of the next layer. Feedback connections bring weighted output back to a lower layer. Lateral connections carry weighted output between nodes within their own layer. Finally, there are time-delayed connections, which add a time element. None of these connections has to be global: if one node has a feed-forward connection to a node in the next layer, it is possible that not all of the nodes in that next layer are connected to it.
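The threshold behavior described above can be sketched in a few lines. The weights and threshold here are arbitrary values of my own for illustration, not taken from any particular network:

```python
def fire(inputs, weights, threshold):
    """Threshold node: weighted inputs are summed, and the node fires
    (returns 1) only when the sum reaches the threshold.  Positive
    weights are excitatory, negative weights inhibitory."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two excitatory inputs and one inhibitory input, threshold 1.0:
weights = [0.6, 0.6, -0.5]
print(fire([1, 1, 0], weights, 1.0))  # 0.6 + 0.6 = 1.2 >= 1.0 -> 1 (fires)
print(fire([1, 1, 1], weights, 1.0))  # 1.2 - 0.5 = 0.7 < 1.0 -> 0 (inhibited)
```

Note how the third, inhibitory input is enough to keep the node below threshold even when both excitatory inputs are active.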
Some common but relatively simple networks are the perceptron network, the Hopfield net, the Adaptive Resonance Theory (ART) network, and the Self-Organizing Map (SOM) of Kohonen. The perceptron is a multi-layered feed-forward network with no lateral or feedback connections. The Hopfield net is a one-layer network with lateral connections among all the nodes of that layer. ART has bi-directional connections between layers; a layer "resonates" a certain number of times before it propagates to the next layer. In the SOM of Kohonen, each node has a feature vector that is compared to the input of the network until the closest match is found. That node's vector is then updated, and the neighboring nodes are moved closer to or farther from it depending on the implementation.
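As a rough sketch of how a Hopfield net's all-lateral connections let it settle back into a stored pattern, here is a minimal version using the standard Hebbian weight rule. The six-node pattern and the node states of +1/-1 are my own arbitrary illustration:

```python
def train(patterns):
    """Hebbian weights: every node is laterally connected to every other."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, sweeps=10):
    """Repeatedly update each node from its lateral inputs until stable."""
    n = len(state)
    state = list(state)
    for _ in range(sweeps):
        for i in range(n):
            total = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if total >= 0 else -1
    return state

stored = [1, 1, -1, -1, 1, -1]       # one stored pattern of node states
w = train([stored])
noisy = [1, -1, -1, -1, 1, -1]       # same pattern with one node flipped
print(recall(w, noisy) == stored)    # True: the net settles on the pattern
```

This settling toward a stored equilibrium is exactly the "relaxation to an energy minimum" behavior that Freeman's quote later in this paper contrasts with the brain.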
There are various algorithms for the network to learn from its data sets. Some are better for certain types of networks. Every neural network mentioned except the SOM of Kohonen uses a supervised learning algorithm. This means the actual output is compared to the expected output, and adjustments are then made (usually weight changes on connections) until the network has correctly learned the appropriate output for the input. This corresponds to the theory that neurons in the brain strengthen their synapses to influence the interaction between neurons. The simplest example is the perceptron. It learns with the backpropagation algorithm, which is the standard algorithm and the easiest to understand. When the network produces output, that output is compared to what is expected for the input, and the weights of the links between the nodes of the outermost layer are increased or decreased to reduce the output error. Then the layer before that is adjusted, and so on, back to the input layer. You must cycle through the whole range of sample data types before going back for another cycle. If you have the network learn one input type, then move on and never return to that type, it will forget what it learned from that earlier input. For example, if the input is the alphabet, you would cycle through a to z over and over instead of cycling through the letter a many times before moving on to the letter b. We want the network to learn an average of weight values over the given set of data inputs. This keeps things general rather than too specific to one particular input, which would cause it to remember only the most recent input type.

The SOM of Kohonen is an example of unsupervised learning, which is not driven by comparing expected output against a given input, but only by the input data sets themselves. It learns the input data and clusters it in a self-organizing process that reflects the properties of the data. An example would be a large data set of numbers whose properties we want to know more about. There is no expected output to calculate, so the only way to learn about the data's properties is for the network to organize itself internally based on its input. A pattern of organization emerges as the network learns from the data set. The pattern can be displayed as a tree structure showing how the data relate to one another (see http://odur.let.rug.nl/~kleiweg/kohonen/kohonen.html).
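To illustrate the cycling idea, here is a minimal single-layer sketch of the supervised loop described above, learning logical OR. For a single layer, the backpropagation weight update reduces to the simple error-correction rule used here; the learning rate of 0.1 and the 20 epochs are arbitrary choices of mine:

```python
# Supervised learning: compare actual output to expected output, nudge
# the weights, and sweep the WHOLE data set each epoch (like cycling
# a..z) rather than drilling one example repeatedly.
data = [  # learn logical OR: (inputs, expected output)
    ((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1),
]
weights, bias, rate = [0.0, 0.0], 0.0, 0.1

def predict(x):
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else 0

for epoch in range(20):          # each epoch is one full cycle of the data
    for x, expected in data:     # never repeat one input type in isolation
        error = expected - predict(x)
        bias += rate * error
        for i, xi in enumerate(x):
            weights[i] += rate * error * xi

print(all(predict(x) == y for x, y in data))  # True once converged
```

Because every epoch revisits every example, the final weights reflect the whole data set instead of only the most recently seen input type.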
There are many ways to build a neural network; not all neural networks are created equal. Some of the practical application areas for neural networks are pattern recognition (for a simple walk-through example of image pattern recognition see http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Firing rules), filtering, data segmentation and compression, and optimization. Some networks have been engineered toward particular tasks, but there are no hard and fast rules about what a network can and cannot do.
While some of the generic networks mentioned above may seem, on the surface, sufficient to solve a variety of problems, they may be inadequate for many kinds of nontrivial problems that AI usually encompasses. Expecting these simple models to handle them would be like having a monkey do linear algebra: while a monkey may do other tasks fine, it cannot do certain complex things. The basic models described earlier do not accurately model a complex system like the brain. They can perform some tasks intelligently, but only to a very limited extent. If they performed them as well as humans, that would be the end of the story and all new research would halt. That is not the case, however, and there is research today into making them more like the brain. We must look further into advanced neurobiology to see whether and how we can improve on these kinds of models. The study of chaos in conjunction with neurobiology is only roughly ten years old, an extremely young field relatively speaking. It appears chaos has been discovered in the brain at both the microscopic (neural) and macroscopic levels, and much new research on neural networks involves learning how to simulate brain activity at both levels. To understand what we mean by chaos and its possible relevance to neural network construction, we will have to examine it in more detail.
Chaos theory grew slowly for most of the 20th century and has started to receive serious attention from scientists only in the last couple of decades. It is a new scientific paradigm, and it seems it may hold some answers, or at least, in some areas, a new way to view the world around us.
Henri Poincaré was the first to suggest anything like modern chaos theory. At the end of the 19th century, while working on the three-body problem for a prize competition sponsored by King Oscar II of Sweden, he was the first to arrive at the revolutionary idea of long-term unpredictability emerging from deterministic mathematical formulas. This means that if I give a mathematical formula a slight change in its input, it can lead to vastly different output; what will happen with each slightly different input is unpredictable. This is only possible because of the nonlinearity inherent in the formula. Even though it is deterministic (not random), it is unpredictable for a given set of inputs.
This is unlike classical Newtonian mechanics, where prediction was relatively easy using formulas and there were no "surprises," only consequences of the initial conditions. Nonlinearity is necessary for chaos, but not every unpredictable or deterministic system is chaotic, and deterministic chaos is not the only explanation for random, noisy, or unpredictable behavior. Since it is deterministic, there is no randomness built into deterministic chaos, yet through repeated iteration it becomes totally unpredictable over the long term. A graph of a chaotic expression's movement is cyclical in fashion, but what the graph will look like for a given input is not predictable. A function with a modulo in it demonstrates this conceptually: you can start with inputs that differ by only a very small amount, and after a while the graphs show totally different patterns. There was no way to predict one from the other even though they started with very similar inputs.
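A concrete version of that modulo idea, using the simple iteration x → 2x mod 1 (a standard textbook chaotic map of my choosing, not one named in the sources):

```python
# Iterating x -> 2x mod 1 is fully deterministic: each step doubles x
# and keeps only the fractional part.  Yet two starting points about
# one billionth apart end up far from each other after 40 steps.
def doubling_map(x, steps):
    for _ in range(steps):
        x = (2 * x) % 1.0
    return x

a = doubling_map(0.123456789, 40)
b = doubling_map(0.123456790, 40)   # initial difference: about 1e-9
print(abs(a - b))                   # the gap has grown to roughly 0.5
```

Each iteration doubles the tiny initial difference, and the modulo wraps the result back into [0, 1), so after a few dozen steps the two trajectories no longer resemble each other at all, even though nothing random ever happened.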
In the 1960s, Edward Lorenz accidentally entered a different value into his weather simulation program and got unpredictable results. His input varied only by a tiny amount, since he had mistyped a single digit, but he still got wildly different results. The Lorenz attractor, a solution to Lorenz's differential equations for the atmosphere, consists of two lobes between which the trajectory jumps back and forth in an unpredictable fashion.
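The same sensitivity can be reproduced numerically. The sketch below integrates the Lorenz equations with a crude forward-Euler step and the classic parameters (sigma = 10, rho = 28, beta = 8/3); the step size and starting points are arbitrary choices of mine:

```python
def lorenz_step(x, y, z, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz differential equations."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dt * dx, y + dt * dy, z + dt * dz

def run(x, y, z, steps):
    # Iterate the Euler step to follow one trajectory.
    for _ in range(steps):
        x, y, z = lorenz_step(x, y, z)
    return x, y, z

a = run(1.0, 1.0, 1.0, 20000)          # 20 time units of simulation
b = run(1.0 + 1e-6, 1.0, 1.0, 20000)   # same start, perturbed by 1e-6
print(a)
print(b)                               # the two endpoints no longer agree
```

This mirrors Lorenz's accident: a perturbation in the sixth decimal place of one coordinate is enough to put the two runs on visibly different parts of the attractor.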
We do not fully understand chaos theory, or the universe as a whole, right now. This makes it difficult to put chaos into practical applications such as neural networks. Determining whether something is truly chaotic is also hard. There are "tests" for chaos, but they do not diagnose it with absolute certainty. For example, we once thought we had tests proving that the heart was chaotic; when research was done on dogs, the heart turned out not to be chaotic in the mathematical sense. The tests gave a false conclusion. If the tests fail, it is hard to say what chaos is exactly and when we are actually seeing it in nature. It is hard to devise tests for something we obviously do not completely understand. We cannot simply tell at face value whether behavior is just random noise or has some underlying deterministic structure.
Brain physiologists had observed apparently random activity for a long time, but they originally dismissed it as irrelevant noise. Now some believe this is not noise at all, but true chaotic activity essential to brain functioning. Others believe that chaotic behavior is a side effect rather than an integral part of a well-functioning brain. Gail Carpenter and Stephen Grossberg, founders of adaptive resonance theory, believe that what chaos provides can be achieved in artificial neural networks in other ways, and there has been research into more advanced ART (adaptive resonance theory) networks since then. Dr. Walter Freeman, a UC Berkeley neurophysiologist who has written many articles on chaos in the brain, disagrees strongly. He believes chaos is integral to neural network and brain functioning, and there seems to be quite a bit of evidence he may be right. In comparing artificial neural networks to the brain, Freeman and his co-researchers said:
"…pattern recognition systems based on the perceptron... operate by relaxation to one of a collection of equilibrium states, constituting the minimization of an energy function," while on the other hand "…biological pattern recognition systems do not go to equilibrium and do not minimize an energy function. Instead, they maintain continuing oscillatory activity, sometimes nearly periodic but most commonly chaotic." 1
Artificial networks try to capture the brain's capabilities, and hence researchers try to model what happens down to the level of the neuron. An interesting theory is that chaos may be a necessity in neural networks for advanced intelligence, since it may be a requirement for true consciousness in the human brain. Freeman believes that chaos is the difference between the brain and an artificially intelligent machine that can act only in a controlled environment. Thus, if Freeman is right, chaos may be a necessity if we want to build an artificially intelligent machine with human-like intelligence through neural networks.
It is unknown whether microscopic (neuron-level) chaos is critical to macroscopic chaotic behavior, because chaotic behavior can be modeled at the macroscopic level with more traditional models. According to research, macroscopic chaos was found to be an unintentional by-product of neural networks with special properties based on feed-forward and feedback nodes, such as a Hopfield net. If these types of nets have both inhibitory and excitatory links, they can display chaotic behavior macroscopically. It also cropped up surprisingly when the nodes themselves were given the task of excitation or inhibition instead of being neutral as usual. This was done to simulate the "Dale hypothesis" that each neuron in the brain has either an excitatory or an inhibitory nature. This is profound, since the brain also has inhibitory and excitatory connections, and it explains the up-and-down motion of EEGs. When chaos was added to a Hopfield-type net, the net was able to engage in selective learning: it could recognize specific classes of stimulus while ignoring the rest. A neural network built to identify four different types of industrial parts did not identify defective and non-defective parts as well as the same network with chaos added.
Some examples of structures that could be added to a neural network to produce important chaotic behavior at the macroscopic level are: interlayer and intralayer connections; excitatory and inhibitory links; weights on links that can switch between negative and positive; neurodes that can display individually chaotic behavior; and nodes that are either excitatory or inhibitory instead of neutral. This chaotic behavior could lead to selective memorization, faster recognition of learned patterns, recognition of new patterns, creation of new categories for those newly found patterns, and better pattern recognition overall.
Widespread use of chaos in artificial neural networks has not happened; most of it has been demonstrated in limited, specific applications. Still, the evidence seems to point to the notion that Freeman may be right: chaos may be a necessity in building advanced neural networks, and in brain processing, rather than just a side effect or one route among many.
There is also research into many other dynamics intended to mimic more closely how the brain may think intelligently. One example is fractal neural networks, which are organized hierarchically to process information in a modular, hierarchical fashion (see http://www.comp.nus.edu.sg/~inns/IvoWidjaja/report.htm). It is hard to say which approach is most effective in making a network more intelligent. In time, through their use in practical applications, perhaps we will find out.
References:

Freeman, Walter J. (1991) The Physiology of Perception, Scientific American, Vol. 264, (2), pp. 78-85. http://sulcus.berkeley.edu/FreemanWWW/manuscripts/IE1/91.html

Stergiou, Christos and Siganos, Dimitrios, Neural Networks. http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Why%20use%20neural%20networks

http://www.gc.ssr.upm.es/inves/neural/ann1/concepts/structnn.htm

http://koti.mbnet.fi/~phodju/nenet/NeuralNetworks/NeuralNetworks.html

Gross, David, The Importance of Chaos Theory in the Development of Artificial Neural Systems. http://www.geocities.com/CapeCanaveral/Lab/3765/chaos/neuro1.html

Bradley, Stewart, Chaos Theory. http://students.bath.ac.uk/ma2bs/chaostheoryhomepage.html

Chaos in the Solar System, ICIC Center for Mathematical Sciences. http://www.icmsstephens.com/chaos1.htm

Extended Kohonen Maps. http://odur.let.rug.nl/~kleiweg/kohonen/kohonen.html#alg

Bowles, Richard, Richard Bowles' Idiot Guide to Neural Networks. http://richardbowles.tripod.com/neural/hopfield/hopfield.htm

Cambel, A.B. (1993) Applied Chaos Theory, Chapter 11.

Widjaja, Ivo, Fractal Neural Network. http://www.comp.nus.edu.sg/~inns/IvoWidjaja/report.htm