By,
M. V. Vinod and M Nikhil Chakravarthy,
Final Year BE (Comp Sci), NIE, Mysore.
1.0 Introduction
The majority of information processing today is carried out by digital computers. This has led to a widely held misperception that information processing is dependent on digital computers. However, if we look at cybernetics and the other disciplines that form the basis of information science, we see that information processing originates with living creatures in their struggle to survive in their environments, and that the information being processed by computers today accounts for only a small part - the automated portion - of this. Viewed in this light, we can begin to consider the possibility of information processing devices that differ from conventional computers.
The fundamental structure of digital computers is based on the principle of sequential processing, which has little if anything in common with the human nervous system. The human nervous system consists of an extremely large number of nerve cells, or neurons, which operate in parallel to process various types of information. By taking a hint from the structure of the human nervous system, we should be able to build a new type of advanced parallel information processing device.
Artificial Neural Networks (ANN) are computational models broadly inspired by the organization of the human brain. The most important features of a Neural Network are its abilities to learn, to associate, and to be error-tolerant. Unlike conventional problem-solving algorithms, Neural Networks can be trained to perform a particular task. The Neural Network can even recognize incomplete or noisy data - an important feature that is often used for prediction, diagnosis or control purposes.
An Example Neural Network:
Imagine a highly experienced bank manager who must decide which customers qualify for a loan. His decision is based on a completed application form that contains ten questions. Each question is answered by a number from 1 to 5 (some responses may be subjective in nature).
Early attempts at "Artificial Intelligence" took a simplistic view of this problem. The Knowledge Engineer would interview the bank manager(s) and decide that question one is worth 30 points, question two is worth 10 points, question three is worth 15 points,...etc. Simple arithmetic was used to determine the applicant's total rating. A hurdle value was set for successful applicants. This approach helped to give artificial intelligence a bad name.
The problem is that most real-life problems are non-linear in nature. Response #2 may be meaningless if both response #8 and #9 are high. Response #5 should be the sole criterion if both #7 and #8 are low.
Our ten-question application has almost 10 million possible responses. The bank manager's brain contains a Neural Network that allows him to use INTUITION. Intuition allows the bank manager to recognize certain similarities and patterns that his brain has become attuned to. He may never have seen this exact pattern before, but his intuition can detect similarities, as well as deal with the non-linearities.
If we had a large number of loan applications as input, along with the manager's decision as output, a Neural Network could be TRAINED on these patterns. The inner workings of the Neural Network have enough mathematical sophistication to reasonably simulate the expert's intuition.
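To make the idea concrete, here is a minimal sketch in Python, not taken from the paper: a single sigmoid neuron trained on invented loan applications, where the "manager's decision" is a made-up stand-in rule (approve when the ten answers sum above 30) used only to generate training data.

```python
import math
import random

random.seed(0)

# Hypothetical loan data: each application is ten answers scored 1-5,
# plus the manager's decision (1 = approve, 0 = reject). The decision
# rule below is invented purely so there is something to learn.
def manager_decision(answers):
    return 1 if sum(answers) > 30 else 0

applications = [[random.randint(1, 5) for _ in range(10)] for _ in range(200)]
decisions = [manager_decision(a) for a in applications]

# A single sigmoid neuron trained with a simple error-correction rule.
weights = [0.0] * 10
bias = 0.0
rate = 0.05

def predict(answers):
    s = bias + sum(w * x for w, x in zip(weights, answers))
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid output in (0, 1)

for epoch in range(50):
    for answers, target in zip(applications, decisions):
        out = predict(answers)
        err = target - out              # compare output to the decision
        for j in range(10):
            weights[j] += rate * err * answers[j]
        bias += rate * err

correct = sum((predict(a) > 0.5) == bool(d)
              for a, d in zip(applications, decisions))
```

After training on the input/output pairs, the neuron reproduces most of the manager's decisions, which is the sense in which the network "simulates" the expert's intuition.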
2.0 Analogy to the Brain
The Neuron is the fundamental cellular unit of the nervous system. The Human Brain consists of billions of such processing units. The Neuron receives and combines signals transmitted by many other Neurons through input structures called Dendrites. If the combined input signal is strong enough, an output is triggered along a component called an Axon. This transfer of information is chemical in nature but has electrical side effects, which can be measured.
The Axon of a Neuron splits up and connects to the Dendrites of other Neurons through a junction referred to as a Synapse. The transmission across this junction is chemical in nature, and the strength of the signal depends on the amount of chemicals released by the Axon and received by the Dendrite. This synaptic efficiency is what is modified when the Brain learns.
The above figure pictorially represents the actual structure of the Neuron. We do not pay much attention to how we hear, see, speak etc. When we process speech we are able to detect and often correct errors in Pronunciation, Grammar, and Meaning. This kind of reasoning by the brain is termed Pre-Attentive Processing. This processing occurs in real time and is prevalent in all forms of perception like smell, taste, touch etc.
The above picture shows the Kanizsa Square, which goes a long way toward explaining how the human brain perceives vision. Basically, the illusion of a square is produced as light photons bounce off the ink spots and stimulate our surface receptors, in this case the Retinal Neurons. These calculations on perception performed by the Brain are taken for granted by us. Internally this is a High Speed, Distributed, Nonlinear, and Parallel Network of enormous magnitude and proportion. One does not realize this until one tries to simulate this environment, as is the case in Neural Networks.

The transmission of signals from one neuron to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending end of the junction. The effect is to raise or to lower the electrical potential inside the body of the receiving cell. If the potential reaches a threshold, a pulse is sent down the axon - we then say the cell has "fired".
In a simplified mathematical model of the neuron, the effects of the synapses are represented by "weights" which modulate the associated input signals, and the nonlinear characteristics exhibited by neurons are represented by a transfer function, usually the sigmoid function. The neuron impulse is then computed as the weighted sum of the input signals, transformed by the transfer function. The learning capability of an artificial neuron is achieved by adjusting the weights in accordance with the chosen learning algorithm, usually by a small amount ΔWj(t) = η·δ·Xj + α·ΔWj(t-1), where η is called the learning rate and α the momentum rate.
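The model neuron just described can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the paper; the input values, weights, and the error signal delta are all assumed for the example.

```python
import math

def sigmoid(s):
    # The usual transfer function: squashes any weighted sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

# The neuron impulse: weighted sum of the inputs, passed through the
# transfer function.
inputs  = [0.5, 0.8, 0.2]
weights = [0.4, -0.6, 0.9]
output = sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# Weight adjustment with a learning rate and a momentum term:
#   dW_j(t) = eta * delta * X_j + alpha * dW_j(t-1)
eta, alpha = 0.1, 0.9    # learning rate and momentum rate
delta = 0.05             # error signal for this neuron (assumed given here)
prev_dw = [0.0, 0.0, 0.0]
dw = [eta * delta * x + alpha * p for x, p in zip(inputs, prev_dw)]
weights = [w + d for w, d in zip(weights, dw)]
```

The momentum term simply adds a fraction of the previous update, which smooths the trajectory of the weights over successive learning steps.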
3.0 Artificial Neural Network
An Artificial Neural Network is an information processing system that is non-algorithmic, non-digital and intensely parallel. It is not a computer in the sense we think of them today, nor is it programmed like a computer. Instead it consists of a number of very simple and highly interconnected processors called neurons, which are the analogs of the biological neural cells, or neurons, in the Human Brain.
Basically speaking, an Artificial Neural Network is the man-made equivalent of a Biological Neural Network (BNN), and all ANNs, early and later, were based on the amazing and intensely complex BNN that is the Human Brain. Let us now analyze the various terms used in defining an Artificial Neural Network.
Non-algorithmic:
In traditional computer programming, the steps to be executed by the computer were written as an algorithm, coded in one of the many available programming languages, and fed to the computer as executable code. If, while executing these instructions, the computer encountered a situation the programmer had not foreseen, the program would crash. ANNs, in contrast, do not use algorithms or procedures in the literal sense.
Here knowledge is not made explicit in the form of rules; rather, ANNs generate their own rules by learning from examples they are shown. The learning process is achieved through a learning rule, which adapts in response to the inputs. During training the weights of the Neural Network are adjusted. Depending on the type of the Neural Network and on the problem it is going to solve, either a supervised or an unsupervised method can be used for adapting the weights. In both cases, however, every training step starts with a recall, where the input is propagated through the Neural Network and all its neurons change their activity accordingly.
The phase when a Neural Network applies the information acquired during the learning phase is called the recall phase. The recall always starts by applying an input pattern to the input layer of the Neural Network. Each of the input neurons holds a specific component of the input pattern and normally does not process it, but simply sends it directly to all the connected neurons. However, before their output can reach the succeeding neurons, it is modified by the weight on the connection. All the neurons of the second layer then receive modified (i.e. weighted) input values and process them. Afterwards these neurons send their output to succeeding neurons of the next layer. This procedure is repeated till the neurons of the output layer finally produce an output, which is the Neural Network's answer to the presented input pattern.
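The recall phase described above amounts to a plain forward pass through the layers. The sketch below illustrates it for a 3-4-2 network; the layer sizes and weight values are invented purely for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_matrix):
    # Each neuron of the layer receives every input modified by a weight,
    # sums them, and processes the sum through the sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_matrix]

# Illustrative weights for a 3-4-2 network (values chosen arbitrarily).
w_hidden = [[0.2, -0.5, 0.1],
            [0.7,  0.3, -0.2],
            [-0.4, 0.9, 0.5],
            [0.1,  0.1, 0.8]]
w_output = [[0.3, -0.7, 0.5, 0.2],
            [-0.6, 0.4, 0.1, 0.9]]

pattern = [1.0, 0.5, -0.3]                  # pattern applied to the input layer
hidden_activity = layer(pattern, w_hidden)  # second layer processes weighted input
answer = layer(hidden_activity, w_output)   # the network's answer to the pattern
```

Note that the input neurons do no processing of their own: the pattern goes straight into the first weighted layer, exactly as the text describes.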
Supervised training is typically chosen when you want the Neural Network to map input patterns to output patterns. This requires that you know the output for a given input. After the recall phase, the output of the Neural Network is compared to what should be the resulting output pattern. The observed difference is used to adapt the weights. The adaptation of the weights starts at the output neurons and continues downward toward the input layer. The weight adaptation for one pattern often does not correct the Neural Network's faulty response completely, but improves it.
Methods of Learning:
The learning rule alters the internal architecture of the network by adjusting the values of the connecting weights between the fundamental processing units, the Neurons. This alteration is based on the input at the Input Buffer and the desired output for the corresponding inputs. The two methods of learning are:
Supervised Learning:
In Supervised Learning the input is presented to the input buffer and the desired output is presented to the ANN at the output buffer. The alteration of the internal architecture by the learning rule is then done in order to achieve the desired output. Two common learning rules are:
Hebbian Learning: Here the connecting weight on an input path to a processing element is incremented if both the input and the desired output are high. Biologically speaking, a neural pathway is strengthened each time the activation on both sides of the synapse is correlated.
Delta Rule Learning: The error between the actual output of a processing element and its desired output is reduced by modifying the incoming connection weights.
Unsupervised Learning:
In Unsupervised Learning only the input stimuli are given to the ANN, and the alteration of the internal architecture is performed by the Network itself in such a way that each hidden processing element responds strongly to a different set of input stimuli or to a closely related group of stimuli.
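The two supervised rules above can each be written as a one-line weight update. These are generic textbook forms with an assumed learning rate, sketched here for illustration rather than taken from any particular implementation.

```python
# Hebbian rule: strengthen a weight when its input and the desired
# output are active together (correlated activity across the synapse).
def hebbian_update(weights, inputs, desired, rate=0.1):
    return [w + rate * x * desired for w, x in zip(weights, inputs)]

# Delta rule: reduce the error between actual and desired output by
# moving each incoming weight in proportion to the error and its input.
def delta_update(weights, inputs, actual, desired, rate=0.1):
    err = desired - actual
    return [w + rate * err * x for w, x in zip(weights, inputs)]

# Tiny worked example: only the active input path is strengthened.
hebb_w = hebbian_update([0.0, 0.0], [1, 0], desired=1)
delta_w = delta_update([0.5], [1.0], actual=0.8, desired=1.0)
```

In the Hebbian case the weight grows whenever input and target coincide; in the delta case the update vanishes once the actual output matches the desired one.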
Non-digital:
This means that an ANN is not restricted to inputs and outputs of zero or one. Based on its internal architecture, an ANN can give Binary (1 and 0), Bipolar (-1 and 1), or output corresponding to values of any Differentiable Curve.
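For illustration, the three kinds of output can be written as three activation functions. This is a generic sketch, not from the text:

```python
import math

# Three output styles an ANN unit can produce, depending on its
# activation function.
def binary(s):
    return 1 if s >= 0 else 0            # Binary: outputs 1 and 0

def bipolar(s):
    return 1 if s >= 0 else -1           # Bipolar: outputs -1 and 1

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))    # Differentiable curve: values in (0, 1)
```

Only the last form has a derivative everywhere, which is why sigmoid-like curves are preferred whenever gradient-based learning rules are used.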
Intensely Parallel:
The ANN consists of many processing elements joined together to form a network. Processing elements are usually organized into groups called layers or slabs. A typical ANN consists of a sequence of layers or slabs with full or random connections between successive layers. The idea behind this is to have processing occur simultaneously, or in parallel, at each individual processing element. The two basic layers are an input buffer, where data is presented to the ANN, and an output buffer, where the response to the given input is presented to the outside world.
4.0 Building A Neural Network
Since 1958, when psychologist Frank Rosenblatt proposed the "Perceptron," a pattern recognition device with learning capabilities, the hierarchical neural network has been the most widely studied form of network structure. A hierarchical neural network is one that links multiple neurons together hierarchically. The special characteristic of this type of network is its simple dynamics. That is, when a signal is input into the input layer, it is propagated to the next layer by the interconnections between the neurons. Simple processing is performed on this signal by the neurons of the receiving layer prior to its being propagated on to the next layer. This process is repeated until the signal reaches the output layer, completing the processing of that signal.
The manner in which the various neurons in the intermediary (hidden) layers process the input signal determines the kind of output signal it becomes (how it is transformed). As you can see, then, hierarchical network dynamics are determined by the weight and threshold parameters of each of their units. If input signals can be transformed to the proper output signals by adjusting these values (parameters), then hierarchical networks can be used effectively to perform information processing.

Since it is difficult to accurately determine multiple parameter values, a learning method is employed. This involves creating a network that randomly determines parameter values. This network is then used to carry out input-to-output transformations for actual problems. The correct final parameters are obtained by properly modifying the parameters in accordance with the errors that the network makes in the process. Quite a few such learning methods have been proposed. One of these is the error back-propagation learning method, which has played a major role in the recent neurocomputing boom.

The back-propagation paradigm has been tested in numerous applications including bond rating, mortgage application evaluation, protein structure determination, backgammon playing, and handwritten digit recognition. Choosing the right methodology, or backpropagation algorithm, is another important consideration. In working with financial applications, many have found that the back-propagation algorithm can be very slow. Without advanced learning techniques to speed the process up, it is hard to apply backpropagation effectively to real-world problems.

Overfitting of a neural network model is another area that can cause beginners difficulty. Overfitting happens when an ANN model is trained on one set of data and learns that data too well.
Namely, at some point the Neural Network starts to memorize exactly the training examples with their inherent noise and later on it will not be able to generalize from the trained examples to new patterns presented during recall.
This may cause the model to have poor generalization abilities - it may give quite poor results on other sets of data. It is important to define the point at which training is terminated, to prevent this over-training of the Neural Network.
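The back-propagation procedure described above can be sketched on the classic XOR problem, a small demonstration chosen here for illustration (a 2-2-1 network with random initial parameters, none of it drawn from the applications mentioned in the text).

```python
import math
import random

random.seed(1)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# A 2-2-1 network: each row of w_h is [w1, w2, bias]; w_o likewise.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
rate = 0.5

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

start_error = total_error()
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # Error signal at the output neuron ...
        d_o = (t - o) * o * (1 - o)
        # ... propagated backward through the output weights.
        d_h = [d_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
        for i in range(2):
            w_o[i] += rate * d_o * h[i]
        w_o[2] += rate * d_o
        for i in range(2):
            w_h[i][0] += rate * d_h[i] * x[0]
            w_h[i][1] += rate * d_h[i] * x[1]
            w_h[i][2] += rate * d_h[i]
end_error = total_error()
```

Adaptation starts at the output neuron and moves back toward the input layer, exactly as described in the text; monitoring an error measure like total_error on held-out data is also how one would choose the stopping point that prevents over-training.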
5.0 Real-World Applications
ANNs can be regarded, in one respect, as multivariate nonlinear analytical tools, and are known to be very good at recognizing patterns from noisy, complex data, and estimating their nonlinear relationships. Many studies have shown that ANNs have the capability to learn the underlying mechanics of the time series, or, in the case of trading applications, the market dynamics. An example is the experimental work on applying neural network technology to the learning and recognition of stock price chart patterns for use in stock price forecasting.
Mitsubishi Electric has combined neural network technology with optical technology to achieve the world's first basic optical neurocomputer system capable of recognizing the 26 letters of the alphabet. The system comprises a set of light-emitting diodes (LEDs) that output letter patterns as optical signals, optical fibers, liquid crystal displays (LCDs) that display letter patterns, and light receiving devices that read these letters. When letter data is input into this system, light emitted from the LEDs is input to the light receiving devices through the LCDs. The light receiving devices that receive the light, as well as the strength of the light they receive, are determined by the manner in which that light passes through the LCDs. The letter in question is delineated by the light receiving devices that receive the strongest light. This system is capable of 100% letter recognition even when slightly misshapen handwritten letters are input.
Another example is a development project for a facilities diagnosis system that employs a neural network system. This project is attracting considerable attention, as it is the first time research has been carried out on applying neural network systems to facilities diagnosis. Initially, the project will be aimed at developing a diagnosis system for pump facilities that employs vibration analysis.
Bond rating is another successful ANN application. Bond rating refers to the process by which a particular bond is assigned a label that categorizes the ability of the bond's issuer to repay the coupon and par value of that bond. The problem here is that there is no hard and fast rule for determining these ratings. Rating agencies must consider a vast spectrum of factors before assigning a rating to an issuer. Some of these factors, such as sales, assets, liabilities, and the like, might be well defined. Others such as willingness to repay are quite nebulous.
6.0 Conclusion
At the most abstract level, an ANN can be thought of as a black box: data is fed in on one side and processed by the ANN, which then produces an output according to the supplied input. Even though the digital revolution has dominated the information processing field for so long, slowly but steadily ANNs have become a force to reckon with when it comes to solving problems having a non-linear solution space. Basically speaking, these networks have been applied in the fields of natural language processing, voice recognition, handwriting recognition, video image recognition, stock prediction etc.
Our brain performs functions involved in reasoning, decision-making, and planning under uncertainty. We reason with scant evidence, vague concepts and recollections, rules of thumb, hunches, suspicions, beliefs, estimates, and guesses - basically intuitions. A fashionable trend has been to denigrate this uncertainty calculus as illogical and unscientific. It has also been called fuzzy reasoning or fuzzy thinking. This concept is being used to overcome some of the drawbacks of ANNs, such as the inability to predict the response of the network for a given input pattern. So Expert Systems or Neuro-Fuzzy systems, i.e. rule-based systems, can offer a better choice. Also, finding the optimal number of neurons can be a time-consuming trial-and-error process. To optimize an ANN there exist methods like Genetic Algorithms.
7.0 Acknowledgement
The authors of this paper thank Dr. M. S. Shiva Kumar, Head, Department of Computer Science & Engineering, The National Institute of Engineering, Mysore for his valuable guidance in writing this paper.