An Integrated Study on Decision Tree Induction Algorithms in Data Mining


There are many alternatives for representing classifiers, and the decision tree is probably the most widely used. Originally it was studied in the fields of decision theory and statistics, but it has proved effective in other disciplines such as data mining, machine learning, and pattern recognition, and decision trees are implemented in many real-world applications. Given the long history of and intense interest in this approach, it is not surprising that several surveys on decision trees are available in the literature. Nevertheless, this survey provides a thorough but concise description of issues related specifically to top-down construction of decision trees, which is the most popular construction approach. This paper aims to organize all significant methods developed into a coherent and unified reference.


A decision tree (or tree diagram) is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. Another use of decision trees is as a descriptive means for calculating conditional probabilities. In data mining and machine learning, a decision tree is a predictive model; that is, a mapping from observations about an item to conclusions about its target value. More descriptive names for such tree models are classification tree (discrete outcome) or regression tree (continuous outcome). In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. The machine learning technique for inducing a decision tree from data is called decision tree learning, or (colloquially) decision trees.


The decision tree induction algorithm has been used broadly for many years. It approximates discrete-valued functions, can yield many useful rules, and is one of the most important methods for classification. The algorithm's terminology follows the "tree" metaphor: it has a root, which is the first split point of the data attributes when building a decision tree, and it has leaves, so that every path from root to leaf forms a rule that is easily understood. Since the decision tree is built from the given data, the values and characteristics of that data matter greatly. For example, the amount of data will affect the result of the tree-building procedure, and the types of attribute values will affect the tree model. Decision trees need two kinds of data: training and testing.

Training data, which are usually the larger part of the data, are used for constructing trees; the more training data collected, the higher the accuracy of the results. The other group of data, the testing data, is used to obtain the accuracy rate and misclassification rate of the decision tree. Many decision-tree algorithms have been developed. One of the most famous is ID3 (Quinlan 1986, 1983), whose choice of split attribute is based on information entropy. C4.5 is an extension of ID3 (Prather et al. 1997); it improves computing efficiency, deals with continuous values, handles attributes with missing values, avoids overfitting, and performs other functions.
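As a small illustration of how the testing data are used, the sketch below computes the accuracy and misclassification rates from a tree's predictions on held-out records. The function names and the toy labels are invented for illustration, not taken from the paper.

```python
# Sketch: evaluating a classifier on held-out testing data.

def accuracy_rate(predictions, actual_labels):
    """Fraction of test examples the tree classified correctly."""
    correct = sum(p == a for p, a in zip(predictions, actual_labels))
    return correct / len(actual_labels)

def misclassification_rate(predictions, actual_labels):
    """Fraction of test examples the tree got wrong."""
    return 1.0 - accuracy_rate(predictions, actual_labels)

# Toy example: the tree's predictions on 5 test records vs. the true classes.
preds = ["yes", "no", "yes", "yes", "no"]
truth = ["yes", "no", "no",  "yes", "no"]
print(accuracy_rate(preds, truth))           # 0.8
print(misclassification_rate(preds, truth))  # ~0.2
```

The two rates always sum to one, so in practice only one of them needs to be reported.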

CART (Classification and Regression Trees) is a data-exploration and prediction algorithm similar to C4.5, and likewise a tree-construction algorithm. Breiman et al. (1984) summarized the classification and regression tree. Instead of information entropy, it introduces measures of node impurity. It has been used on a variety of problems, such as the detection of chlorine from the data contained in a mass spectrum. Although decision trees may not deliver the best classification accuracy, even people who are unfamiliar with them find them easy to use and understand. Figure 1 shows a binary decision tree. It uses a circle for each decision node and a square for each terminal node. Each decision node has a condition represented by a function F, whose parameter is the split point of the split attribute. Each terminal node has a class label C, whose value represents a class. It is easy to interpret a decision tree as a set of rules, from which we can perform analysis, and easy to interpret it as the representation of a nonlinear input-output mapping (Jang 1994).

Figure 1: A typical binary decision tree

Many works address methods for choosing splitting nodes and for optimizing tree size, but less attention has been given to the weights of the data attributes. In this study, we use a system-reconstruction analysis method to obtain the weight of each attribute, which we use to reform the raw data. After that, we use the decision-tree algorithms mentioned above to build a decision tree, from which we can find the decision-accuracy and misclassification rates.


The ID3 algorithm can be summarized as follows:

Take all unused attributes and compute their entropy with respect to the test samples.

Choose the attribute for which entropy is minimum (equivalently, for which information gain is maximum).

Make a node containing that attribute.

According to Gestwicki, the Iterative Dichotomiser 3 algorithm, better known as the ID3 algorithm, was first introduced by J. R. Quinlan in the late 1970s. The algorithm "learned" from a relatively small training set of data how to organize and process very large data sets. Ballard stated that the ID3 algorithm is a greedy algorithm that selects the next attribute based on the information gain associated with the attributes. Information gain is measured by entropy, an idea first introduced by Claude Shannon in 1948.

The ID3 algorithm prefers that the generated tree be shorter, with the attributes of lower entropy placed near the top of the tree. This preference reflects the idea of Occam's Razor, which states that "one should not increase, beyond what is necessary, the number of entities required to explain anything"; that is, one should not make more assumptions than the minimum needed. Hild described the basic technique for implementing the ID3 algorithm, which is shown below.

For each uncategorized attribute, its entropy is calculated with respect to the categorized attribute, or conclusion, and the attribute with the lowest entropy is selected. The data are then divided into sets according to the attribute's values. For example, if the attribute 'Size' is chosen, and the values for 'Size' are 'big', 'medium' and 'small', three sets are created, divided by these values. A tree with branches that represent the sets is constructed; for the above example, three branches are created, the first for 'big', the second for 'medium' and the third for 'small'. The first step is then repeated for each branch, with the already selected attribute removed and only the data in that branch's set used. The process stops when there are no more attributes to consider or when all the data in a set share the same conclusion, for example, all records having 'Result' = yes.
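The selection step described above can be sketched in Python. This is a minimal illustration, not the original implementation; the attribute names ('Size', 'Color', 'Result') and the data rows are invented to match the example in the text.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Shannon entropy of the target attribute over the given rows."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def attribute_entropy(rows, attr, target):
    """Weighted entropy of the target after splitting the rows on attr."""
    total = len(rows)
    result = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        result += (len(subset) / total) * entropy(subset, target)
    return result

def choose_attribute(rows, attrs, target):
    """ID3 selection step: pick the attribute with the lowest weighted
    entropy (equivalently, the highest information gain)."""
    return min(attrs, key=lambda a: attribute_entropy(rows, a, target))

# Toy data echoing the 'Size' example; 'Result' is the conclusion.
data = [
    {"Size": "big",    "Color": "red",  "Result": "yes"},
    {"Size": "big",    "Color": "blue", "Result": "yes"},
    {"Size": "medium", "Color": "red",  "Result": "no"},
    {"Size": "small",  "Color": "blue", "Result": "no"},
]
print(choose_attribute(data, ["Size", "Color"], "Result"))  # Size
```

Here 'Size' separates the conclusions perfectly (weighted entropy 0), while 'Color' leaves each subset evenly mixed (weighted entropy 1), so 'Size' is chosen as the split attribute.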

The ID3 algorithm has been used and implemented in many fields. One of its earliest implementations was in a chess game, built by the artificial intelligence researcher Ivan Bratko. According to Gestwicki, Bratko supplied the ID3 program with several pages of textbook recommendations for playing the chess endgame of white king and rook versus black king and knight. He built the rules around the idea of "knight's side lost in at most n moves". The results show that the ID3 algorithm is efficient in both time and space, as the feature vector of the games and the decision tree are small compared to the training instances.

In a study by Gestwicki, an experiment was conducted to predict greyhound races. The experiment compared the net profit gained by the ID3 algorithm with that of three greyhound-racing experts. In this experiment, the system was trained with 200 training races and 1600 dogs. The results show that there were 26 races on which ID3 did not place any bet. This shows that the system was restricted from making illogical choices, unlike humans, who may gamble without logic in the hope of greater winnings.


At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. Its criterion is the normalized information gain (the gain ratio): the reduction in entropy that results from choosing an attribute to split the data, normalized by the entropy of the split itself. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sublists. The algorithm has a few base cases.

All the samples in the list belong to the same class. When this happens, C4.5 simply creates a leaf node for the decision tree saying to choose that class.

None of the features provides any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.

An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.

In pseudocode, the algorithm is:

1. Check for base cases.
2. For each attribute a, find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node.

Improvements from the ID3 algorithm

C4.5 made a number of improvements to ID3. Some of these are:

Handling both continuous and discrete attributes – in order to handle continuous attributes, C4.5 creates a threshold and then splits the list into those records whose attribute value is above the threshold and those whose value is less than or equal to it.

Handling training data with missing attribute values – C4.5 allows attribute values to be marked as missing. Missing attribute values are simply not used in gain and entropy calculations.

Handling attributes with differing costs.

Pruning trees after creation – C4.5 goes back through the tree once it has been created and attempts to remove branches that do not help by replacing them with leaf nodes.
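Two of these ideas can be sketched together: the gain-ratio criterion and the threshold trick for continuous attributes. This is a minimal illustration under common textbook assumptions, not Quinlan's actual C4.5 code, and the temperature data below is invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalized by the
    entropy of the split itself (the 'split information')."""
    total = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for v2, lab in zip(values, labels) if v2 == v]
        gain -= (len(subset) / total) * entropy(subset)
    split_info = entropy(values)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """For a continuous attribute, try a threshold between each pair of
    adjacent sorted values and keep the one with the highest gain ratio."""
    candidates = sorted(set(values))
    best_t, best_gr = None, -1.0
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        binary = ["<=" if v <= t else ">" for v in values]
        gr = gain_ratio(binary, labels)
        if gr > best_gr:
            best_t, best_gr = t, gr
    return best_t

temps = [64, 65, 68, 69, 70, 71]
play  = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(temps, play))  # 68.5
```

In this toy example, the threshold 68.5 separates the classes perfectly, so it achieves the maximum gain ratio of 1.0.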


Classification and regression trees (CART) is a non-parametric technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively. Trees are formed by a collection of rules based on values of certain variables in the modeling data set.

Rules are selected based on how well splits on the variables' values can differentiate observations with respect to the dependent variable.

Once a rule is selected and splits a node into two, the same logic is applied to each "child" node (i.e., the procedure is recursive).

Splitting stops when CART detects that no further gain can be made, or some pre-set stopping rules are met.

Each branch of the tree ends in a terminal node

Each observation falls into one and exactly one terminal node.

Each terminal node is uniquely defined by a set of rules.

The basic idea of tree growing is to choose, at each node, a split among all the possible splits so that the resulting child nodes are the "purest". In this algorithm, only univariate splits are considered; that is, each split depends on the value of only one predictor variable. The set of possible splits at a node is therefore the union of the possible splits on each predictor.
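The idea of choosing the "purest" univariate split can be sketched with the Gini index, a node-impurity measure commonly used in CART implementations. This is an illustrative sketch, and the data below is invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def split_impurity(left, right):
    """Weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def best_binary_split(values, labels):
    """Among thresholds on one predictor, pick the binary split whose
    children have the lowest weighted Gini impurity."""
    best_t, best_imp = None, float("inf")
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        imp = split_impurity(left, right)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t

x = [1, 2, 3, 10, 11, 12]
y = ["a", "a", "a", "b", "b", "b"]
print(best_binary_split(x, y))  # 3
```

Splitting at 3 puts all "a" observations in one child and all "b" observations in the other, so both children are perfectly pure (weighted impurity 0).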


Algorithm designers have had much success with greedy, divide-and-conquer approaches to building class descriptions. The decision-tree learners made popular by ID3, C4.5 (Quinlan 1986) and CART (Breiman, Friedman, Olshen, and Stone 1984) were chosen for this survey because they are relatively fast and typically produce competitive classifiers. In fact, the decision-tree generator C4.5, a successor to ID3, has become a standard of comparison in machine learning research because it produces good classifiers quickly. For non-numeric datasets, the run time of ID3 (and C4.5) grows linearly with the number of examples.

The practical run-time complexity of C4.5 has been determined empirically to be worse than O(e^2) on some datasets, where e is the number of examples. One possible explanation is based on the observation of Oates and Jensen (1998) that the size of C4.5 trees increases linearly with the number of examples. One factor in C4.5's run-time complexity corresponds to the tree depth, which cannot be larger than the number of attributes. Tree depth is related to tree size, and thereby to the number of examples. Compared with C4.5, the run-time complexity of CART is satisfactory.


The decision-tree algorithm is one of the most effective classification methods, and the data determine its efficiency and accuracy. This survey examined the decision-tree algorithms ID3, C4.5 and CART with respect to their data-processing steps and run-time complexity. The inductive learning algorithms successfully recognized and generalized the rules contained in the given training data. The accuracies of the algorithms were also very high, which means the systems produced reliable results. This also shows that inductive learning can be successfully applied in a complex problem domain, and is therefore very useful for real-world problems. A second conclusion is that the algorithms can learn new rules and therefore adapt to change. Finally, it can be concluded that among the three algorithms, CART performs best in terms of the rules generated and accuracy: it produced fewer rules yet was more accurate than the other two algorithms. This shows that CART is better at induction and rule generalization than the ID3 and C4.5 algorithms.


First, I would like to thank the Almighty for His blessings towards the successful completion of this survey paper. I would like to extend my thanks to my research guide, Dr. (Mrs.) M. Punithavalli, Director, Dept. of Computer Science, Sri Rama Krishna College for Women, Coimbatore, for her valuable assistance, help and guidance during the research process. I would also like to extend my gratitude to my husband, Mr. M. S. Raja Sekaran, for his moral support and co-operation.


[1] S. R. Safavian and D. Landgrebe. A survey of decision tree classifier methodology. IEEE Trans. on Systems, Man and Cybernetics, 21(3):660-674, 1991.

[2] S. K. Murthy. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.

[3] R. Kohavi and J. R. Quinlan. Decision-tree discovery. In Will Klosgen and Jan M. Zytkow, editors, Handbook of Data Mining and Knowledge Discovery, chapter 16.1.3, pages 267-276. Oxford University Press, 2002.

[4] S. Grumbach and T. Milo. Towards Tractable Algebras for Bags. Journal of Computer and System Sciences, 52(3):570-588, 1996.

[5] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth Int. Group, 1984.

[6] J.R. Quinlan, Simplifying decision trees, International Journal of Man-Machine Studies, 27, 221-234, 1987.

[7] T. R. Hancock, T. Jiang, M. Li, and J. Tromp. Lower Bounds on Learning Decision Lists and Trees. Information and Computation, 126(2):114-122, 1996.

[8] L. Hyafil and R. L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, 1976.

[9] H. Zantema and H. L. Bodlaender, Finding Small Equivalent Decision Trees is Hard, International Journal of Foundations of Computer Science, 11(2):343-354, 2000.

[10] G.E. Naumov. NP-completeness of problems of construction of optimal decision trees. Soviet Physics: Doklady, 36(4):270-271, 1991.

[11] J.R. Quinlan, Induction of decision trees, Machine Learning 1, 81-106, 1986.

