- Classification :
" Assigning Data cases to one of the fixed number of classes (Represented by Nominal Numbers) "
" Classification is the separation of ordering of classes "
- Prediction :
" It is one type of process with the use of predictors in the prediction of one or more than one variable "
- Applications of Classification and Prediction :
- Weather Analysis
- Prediction of Null space
- Cryptography
- Target Marketing
- Medical Diagnosis
- Fraud Detection
- Performance Prediction
- Manufacturing
- Five criteria for evaluation of Classification and Prediction :
- Predictive Accuracy
- Computational Speed
- Robustness
- Scalability
- Interpretability
- Classification by Decision Tree Induction
- It is the learning of decision trees from class-labeled training tuples.
- A decision tree is a flowchart-like structure, where each non-leaf (internal) node denotes a test on an attribute.
- Each branch represents an outcome of the test, and each leaf node holds a class label.
- The topmost node in the tree is the root node.
- How are decision trees used for classification ? Given a tuple X whose class label is unknown, the attribute values of X are tested against the tree; a path is traced from the root to a leaf node, which holds the class prediction.
- Why are decision tree classifiers so popular ?
- Decision trees can handle high-dimensional data.
- The learned trees are generally easy for humans to interpret.
- The induction process is simple and fast.
- Decision tree classifiers have good accuracy.
- Several algorithms have been developed for decision tree induction; the most popular are ID3 and C4.5.
- C4.5 is the successor of ID3. CART (Classification and Regression Trees) generates binary trees.
- Most algorithms for decision tree induction follow a similar top-down, recursive approach.
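As a small illustration of this flowchart structure (a minimal sketch assuming scikit-learn is available; the weather-style toy data and feature names are invented), a fitted tree can be printed as its sequence of attribute tests and leaf class labels :

```python
# Sketch: fit a small decision tree and print its flowchart-like structure.
# Assumes scikit-learn is installed; the toy data is invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [outlook (0 = sunny, 1 = rain), humidity]
X = [[0, 85], [0, 90], [1, 70], [1, 95], [0, 60]]
y = ["no", "no", "yes", "no", "yes"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity"]))
# Each indented line is a test on an attribute (an internal node);
# the "class:" lines are leaf nodes holding class labels.
```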
- Basic algorithm for Decision Tree Induction (Generate_decision_tree) :
- Create a node N.
- If the tuples in D are all of the same class C, then
- return N as a leaf node labeled with the class C.
- If attribute_list is empty, then
- return N as a leaf node labeled with the majority class in D.
- Apply attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion.
- Label node N with the splitting_criterion.
- If the splitting attribute is discrete-valued and
- multiway splits are allowed, then // not restricted to binary trees
- attribute_list <- attribute_list - splitting_attribute; // remove the splitting attribute
- For each outcome j of the splitting_criterion // partition the tuples and grow subtrees for each partition
- let Dj be the set of data tuples in D satisfying outcome j; // a partition
- if Dj is empty then
- attach a leaf labeled with the majority class in D to node N;
- else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
- end for
- Return N;
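A plain-Python sketch of this procedure is given below (an illustration only, not the textbook code : tuples are represented as (attribute-dict, label) pairs, only multiway splits on discrete attributes are handled, and the attribute selection measure is passed in as a function by the caller).

```python
# Minimal sketch of the Generate_decision_tree procedure above (illustrative only).
# D is a list of (attribute_dict, class_label) pairs; select_attribute is any
# attribute selection measure, e.g. information gain, supplied by the caller.
from collections import Counter

def majority_class(D):
    """Most common class label among the tuples in D."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, select_attribute):
    classes = {label for _, label in D}
    # all tuples belong to the same class -> leaf labeled with that class
    if len(classes) == 1:
        return {"leaf": classes.pop()}
    # no attributes left to split on -> leaf labeled with the majority class
    if not attribute_list:
        return {"leaf": majority_class(D)}

    # choose the "best" splitting attribute and remove it from the list
    A = select_attribute(D, attribute_list)
    remaining = [a for a in attribute_list if a != A]

    # multiway split: one branch (subtree) per observed value of A
    node = {"attribute": A, "branches": {}}
    for value in {x[A] for x, _ in D}:
        Dj = [(x, label) for x, label in D if x[A] == value]
        node["branches"][value] = generate_decision_tree(Dj, remaining, select_attribute)
    return node

# Example usage with a trivial selector that just picks the first attribute:
data = [({"outlook": "sunny", "windy": "no"},  "no"),
        ({"outlook": "rain",  "windy": "yes"}, "no"),
        ({"outlook": "rain",  "windy": "no"},  "yes")]
print(generate_decision_tree(data, ["outlook", "windy"], lambda D, attrs: attrs[0]))
```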
- Attribute Selection Measures
- It is a heuristic for selecting the splitting criterion that "best" separates a given data partition D
- of class-labeled training tuples into individual classes.
- Ideally, the best splitting criterion is the one that results in the purest partitions.
- " Attribute selection measures are also known as splitting rules because they determine how the tuples at a given node are to be split "
- The attribute having the best score for the measure is chosen as the splitting attribute.
- Three such measures are mainly used :
- Information Gain
- Gain Ratio
- Gini Index
- Information Gain :
- ID3 uses it as its attribute selection measure.
- The attribute with the highest information gain is chosen as the splitting attribute for node N.
- This attribute minimizes the information needed to classify the tuples in the resulting partitions, i.e., it minimizes their impurity.
- It also minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (though not necessarily the simplest) tree is found.
- The expected information needed to classify a tuple in D is given by
- Info(D) = - Σ (i = 1 to m) pi log2(pi)
- here pi is the probability that an arbitrary tuple in D belongs to class Ci,
- and it is estimated by |Ci,D| / |D|.
- A log function to base 2 is used because the information is encoded in bits.
- " Info(D) is just the average amount of information needed to identify the class label of tuple in D "
- How much more information would we still need (after partitioning on attribute A) in order to arrive at an exact classification ?
- This amount is measured by InfoA(D) = Σ (j = 1 to v) ( |Dj| / |D| ) × Info(Dj)
- The smaller the information required the greater the purity of the partition.
- Information gain is defined as the difference between the original information requirement and the new requirement.
- Gain(A) = Info(D) - InfoA(D)
- Gain(A) tells us how much would be gained by branching on A.
- The attribute A with the highest information gain is chosen as the splitting attribute at node N.
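A small worked sketch of these formulas in plain Python (the toy data set and attribute names are invented for illustration, not taken from these notes) :

```python
# Worked sketch of Info(D), InfoA(D) and Gain(A) for one discrete attribute.
from math import log2
from collections import Counter

def info(labels):
    """Expected information (entropy) needed to classify a tuple: -sum pi*log2(pi)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr_index):
    """Information gain of splitting on the attribute at attr_index."""
    n = len(labels)
    # group class labels by each value of the attribute (one partition Dj per value)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    info_a = sum((len(part) / n) * info(part) for part in partitions.values())
    return info(labels) - info_a

# rows: [age_group, income]; labels: buys_computer (invented data)
rows = [["youth", "high"], ["youth", "low"], ["senior", "low"],
        ["senior", "high"], ["youth", "high"], ["senior", "low"]]
labels = ["no", "no", "yes", "yes", "no", "yes"]

print(gain(rows, labels, 0))   # gain of splitting on age_group
print(gain(rows, labels, 1))   # gain of splitting on income
```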
- Gain Ratio :
- The information gain measure is biased toward attributes having a large number of values.
- Partitioning on an attribute with many unique values is usually not a good idea.
- Ex : Suppose we partition on the attribute Product_Id; the split produces a large number of branches, each of them pure but containing only a single tuple.
- Such a split is useless for further classification, so this type of attribute should not be chosen as the splitting criterion.
- The C4.5 algorithm uses an extension of information gain known as gain ratio.
- It overcomes this bias by applying a normalization to information gain using a "split information" value.
- SplitInfoA(D) = - Σ (j = 1 to v) ( |Dj| / |D| ) × log2( |Dj| / |D| )
- The above equation represents the potential information generated by splitting the training data set D into v partitions.
- Then GainRatio(A) = Gain(A) / SplitInfoA(D)
- The attribute with the maximum gain ratio is selected as the splitting attribute.
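Continuing the sketch (again plain Python with invented data), the normalization penalizes an attribute such as Product_Id whose values are all unique :

```python
# Sketch of SplitInfoA(D) and GainRatio(A): gain normalized by split information.
from math import log2
from collections import Counter

def info(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr_index):
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    # Gain(A) = Info(D) - InfoA(D)
    info_a = sum((len(p) / n) * info(p) for p in partitions.values())
    gain = info(labels) - info_a
    # SplitInfoA(D) = -sum (|Dj|/|D|) * log2(|Dj|/|D|)
    split_info = -sum((len(p) / n) * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# rows: [age_group, product_id-like attribute with all-unique values] (invented)
rows = [["youth", "p1"], ["youth", "p2"], ["senior", "p3"],
        ["senior", "p4"], ["youth", "p5"], ["senior", "p6"]]
labels = ["no", "no", "yes", "yes", "no", "yes"]

# The unique-valued attribute has a large SplitInfo, so its gain ratio is
# penalized compared to age_group even though its raw gain is maximal.
print(gain_ratio(rows, labels, 0))   # age_group
print(gain_ratio(rows, labels, 1))   # product-id-like attribute
```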
- Gini Index :
- It is used in the CART (Classification and Regression Trees) method.
- It measures the impurity of D, a data partition or set of training tuples :
- Gini(D) = 1 - Σ (i = 1 to m) pi², where pi is the probability that a tuple in D belongs to class Ci.
- The Gini index considers a binary split for each attribute.
- If A has v possible values, then there are 2^v possible subsets of those values.
- When considering binary split, we compute a weighted sum of the impurity of each resulting partition.
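A minimal sketch of Gini(D) and the weighted Gini index of a candidate binary split (plain Python; the data is invented for illustration) :

```python
# Sketch of Gini(D) and the weighted Gini for a binary split, as used by CART.
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum pi^2 over the m classes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(labels_left, labels_right):
    """Weighted Gini impurity of a binary split D -> (D1, D2)."""
    n = len(labels_left) + len(labels_right)
    return (len(labels_left) / n) * gini(labels_left) + \
           (len(labels_right) / n) * gini(labels_right)

D = ["yes", "yes", "no", "no", "no", "yes"]
print(gini(D))                                   # impurity before splitting
# one candidate binary split of D (e.g. income in {low} vs income in {medium, high})
print(gini_split(["yes", "yes", "no"], ["no", "no", "yes"]))
# the split with the lowest weighted Gini (largest impurity reduction) is chosen
```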
- Tree Pruning :
- When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers.
- Tree pruning addresses this problem of overfitting the data.
- There are two common approaches to tree pruning :
- 1. Prepruning (halt tree construction early) 2. Postpruning (remove subtrees from a fully grown tree)
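The two approaches can be sketched with scikit-learn (assumed installed; the synthetic data, parameter values, and the use of cost-complexity pruning for the postpruning step are illustrative choices, not from these notes) :

```python
# Prepruning halts tree growth early (e.g. a depth limit); postpruning grows
# the full tree and then cuts branches back (cost-complexity pruning here).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Prepruning: stop splitting once the tree reaches a fixed depth
pre_pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Postpruning: grow fully, then remove branches via cost-complexity pruning
post_pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
# pruned trees end up with far fewer leaves than the fully grown tree
print(full.get_n_leaves(), pre_pruned.get_n_leaves(), post_pruned.get_n_leaves())
```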
- Back Propagation :
- Backpropagation is a neural network learning algorithm.
- It works by propagating the error at the output backward through the network and adjusting the weights accordingly.
- It is also widely used for prediction.
- The input values are typically normalized to the range 0.0 to 1.0 before training.
- A neural network can be used for both classification and prediction :
- Classification (predict the class label of a given tuple)
- Prediction (predict a continuous-valued output)
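A minimal sketch of the idea (NumPy assumed available; the tiny network size, learning rate, and toy data are arbitrary illustrative choices) : inputs are min-max normalized to 0.0 - 1.0, the error is computed at the output layer, and it is propagated backward to update the weights.

```python
# Minimal backpropagation sketch: one hidden layer, sigmoid units, squared error.
import numpy as np

rng = np.random.default_rng(0)

# toy data: two raw features on different scales, binary class label (invented)
X_raw = np.array([[2.0, 100.0], [2.0, 900.0], [9.0, 100.0], [9.0, 900.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# min-max normalization to the 0.0 - 1.0 range
X = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# weights: 2 inputs -> 3 hidden units -> 1 output
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # backward pass: propagate the error from the output layer back
    err_out = (output - y) * output * (1 - output)        # output-layer delta
    err_hid = (err_out @ W2.T) * hidden * (1 - hidden)    # hidden-layer delta

    # update weights in the direction that reduces the squared error
    W2 -= lr * hidden.T @ err_out
    b2 -= lr * err_out.sum(axis=0)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0)

print(output.round(3))   # network outputs for the four training tuples after training
```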