Machine learning is a branch



Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in the diagnosis and detection of many diseases.

The health care industry collects a huge amount of data that is neither properly analyzed nor put to optimal use, so the hidden patterns and relationships in these data often go unexplored. This research focuses on this aspect of medical diagnosis: learning patterns from the data collected for various diseases and developing intelligent medical decision support systems to help physicians. Here I explore various machine learning techniques that can support better diagnosis in health care.

Keywords: Machine learning, risk, prediction, diagnosis, detection, medical diagnosis, C4.5 algorithm, ID3 algorithm.


Machine learning is a branch of artificial intelligence research that employs a variety of statistical, probabilistic and optimization tools to "learn" from past examples and to use that knowledge to classify new data, identify new patterns or predict novel trends. Machine learning, like statistics, is used to analyze and interpret data. However, unlike statistics, machine learning methods can employ Boolean logic (AND, OR, NOT), absolute conditionality (IF, THEN, ELSE), conditional probabilities (the probability of X given Y) and unconventional optimization strategies to model data or classify patterns. These latter methods resemble the approaches humans typically use to learn and classify. Machine learning still draws heavily from statistics and probability, but it is fundamentally more powerful because it allows inferences or decisions to be made that could not otherwise be made using conventional methods. Machine learning is not new to disease research: artificial neural networks (ANNs) and decision trees (DTs) have been used in cancer detection and diagnosis for nearly 20 years [3,4]. Today machine learning methods are used in a wide range of applications, from detecting and classifying tumors via X-ray and CRT images [5,6] to classifying malignancies from proteomic and genomic (microarray) assays. The major challenge facing the healthcare industry is the provision of quality services at affordable costs.

A quality service implies diagnosing patients correctly and treating them effectively. Poor clinical decisions can lead to disastrous results, which are unacceptable. Machine learning can be used to automatically infer diagnostic information from past experience, help reduce errors in treatment, and provide helpful and reliable support to the personnel involved. Decision-making tasks are vital in a health care system, so a Clinical Decision Support System (CDSS) is intended to assist physicians and other health professionals with decision-making tasks. A CDSS can also be defined as "a computer system that uses two or more items of patient data to generate case-specific or encounter-specific advice" [7]. Most CDSSs comprise three parts: the knowledge base, the inference engine, and a mechanism to communicate [8].

The knowledge base consists of compiled information that is often, but not always, in the form of if–then rules. The inference engine contains the formulas for combining the rules or associations in the knowledge base with actual patient data. A communication mechanism then brings the patient data into the system and supplies the system's output to the user, who makes the actual decision.

A typical user of a CDSS is a physician, nurse or other paramedical service provider. The CDSS gathers the patient health information (PHI) entered by the user and, using predetermined algorithms or rules, provides clinically relevant information and conclusions. The rules used in the system can be configured by the administrator, and the security of each patient's personal record must be ensured [9].

Literature Review:
Up to now, several studies have focused on medical diagnosis.

These studies have applied different approaches to the problem and achieved high classification accuracies, of 77% or higher, using data sets taken from the UCI Machine Learning Repository [10]:

1. Gamboa et al. [11] used fuzzy support vector clustering to identify heart disease. This algorithm applied a kernel-induced metric to assign each piece of data, and experimental results were obtained using a well-known heart disease benchmark. For ischemic heart disease (IHD), support vector machines serve as excellent classifiers and predictors and can do so with high accuracy; in one such approach, a tree-based classifier uses non-linear proximal support vector machines (PSVM).

2. Campos-Delgado et al. developed a fuzzy-based controller that incorporates expert knowledge to regulate the blood glucose level. Magni and Bellazzi devised a stochastic model to extract variability from a self-monitoring blood sugar level time series [12].

Also, Markos G. Tsipouras et al. (2007) presented a methodology for the automated development of fuzzy expert systems (FES). The proposed methodology was tested by applying it to problems related to cardiovascular diseases. The FES showed significant improvement over the initial classification system, which is based on expert knowledge and has the form of a set of rules, and the results obtained are also fully interpretable [13].

Mrudula Gudadhe et al. (2010) presented a DSS for heart disease classification based on SVM. They classified the heart disease data into two classes, indicating the presence or absence of heart disease, with 80.41% accuracy [14].

Machine Learning Methods:

1. Bayesian Belief Network:
A Bayesian network is a knowledge-based graphical representation of a set of variables, such as diseases and symptoms, and the probabilistic relationships between them. It is used to find the probability that each possible disease is present given the observed symptoms. Its advantage is that it can encode the knowledge and conclusions of experts in the form of probabilities, but it is not practical for large, complex systems with many symptoms. Iliad, developed by the Department of Medical Informatics at the University of Utah School of Medicine, is a CDSS based on a Bayesian network that applies Bayesian reasoning to calculate the posterior probabilities of possible diagnoses from the symptoms provided. Iliad was developed primarily for diagnosis in internal medicine and now covers about 1500 diagnoses in this domain, based on several thousand findings [15]. DXplain is another CDSS that uses a modified form of Bayesian logic. It produces a ranked list of diagnoses associated with the symptoms, is usable by physicians with no computer expertise, and also serves as a clinical reference with a searchable database of diseases and clinical manifestations [16].

2. Neural Network:
Neural networks underlie non-knowledge-based CDSSs, which are adaptive: instead of relying on a knowledge base of expert rules, the system learns from past examples and experience. A neural network has three main kinds of layer: the input layer, the hidden layer(s) and the output layer.

A neural network is made of nodes called neurons, with weighted connections between nodes of different layers that transfer signals between the nodes. A neural network can work with incomplete data, making educated guesses about missing values, and it can keep improving with use because it continues to learn from new examples.

P. A. Kharat and S. V. Dudul (2011) proposed a clinical decision support system based on a Jordan/Elman neural network for the diagnosis of epilepsy and obtained high overall accuracy: 99.83% on the training data and 99.92% on the cross-validation and testing data [17]. R. R. Janghel et al. (2009) developed a CDSS using an artificial neural network to predict whether a fetal delivery should be normal or by surgical procedure.

In that system, they trained the neural network with three different algorithms (back-propagation, radial basis function networks and learning vector quantization networks) and achieved accuracies of 93.75%, 99% and 87.5%, respectively [18].

3. Decision Tree:
The decision tree is one of the most frequently used data analysis techniques. It is applied to classify records into the proper class. In the medical field, a decision tree determines the sequence in which attributes are tested. First, a set of solved cases is assembled; the whole set is then divided into a training set and a testing set. The training set is used to induce the decision tree, and the testing set is used to measure the accuracy of the resulting solution. A. Y. Al-Hyari et al. (2013) developed a CDSS for diagnosing patients with chronic renal failure (CRF) using various classification methods, including neural networks, naïve Bayes and decision trees. They reported that the decision tree algorithm was the most accurate CRF classifier (92.2%) of all the algorithms and implementations in their study [19].

3a. ID3 algorithm:
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan to generate a decision tree from a data set. ID3 is the precursor to the C4.5 algorithm and is typically used in the machine learning and natural language processing domains [20].

Algorithm:
The ID3 algorithm begins with the original set S as the root node. On each iteration, it iterates through every unused attribute of S and calculates the information gain IG(S, A) of that attribute A, that is, how much splitting S on A reduces the entropy H(S).

It then selects the attribute with the smallest resulting entropy (equivalently, the largest information gain). The set S is then split, or partitioned, by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based on the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes that have never been selected before. Recursion on a subset may stop in one of these cases:

- Every element in the subset belongs to the same class. In this case, the node is turned into a leaf node and labeled with the class of the examples.
- There are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labeled with the most common class of the examples in the subset.
- There are no examples in the subset, which happens when no example in the parent set matches a specific value of the selected attribute, for example when there is nobody in the population aged over 100 years. In this case, a leaf node is created and labeled with the most common class of the examples in the parent node's set.

Throughout the algorithm, the decision tree is constructed with each non-terminal (internal) node representing the attribute on which the data was split, and each terminal (leaf) node representing the class label of the final subset of its branch.

3b. C4.5 algorithm:
C4.5 is an algorithm developed by Ross Quinlan to generate a decision tree. It is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier. The authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date" [21].

Algorithm:
C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set $S = \{s_1, s_2, \ldots\}$ of already classified samples. Each sample $s_i$ consists of a p-dimensional vector $(x_{1,i}, x_{2,i}, \ldots, x_{p,i})$, where the $x_{j,i}$ represent attribute values or features of the sample, together with the class into which $s_i$ falls. At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision, and the C4.5 algorithm then recurses on the partitioned sublists.

This algorithm has a few base cases:

- All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class.
- None of the features provides any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.
- An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.
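To make the two splitting criteria concrete, here is a minimal Python sketch of ID3-style tree induction, with a C4.5-style gain ratio available as an alternative criterion. It uses H(S) = -Σ_c p_c log2 p_c, IG(S, A) = H(S) - Σ_v (|S_v|/|S|) H(S_v), and gain ratio = IG(S, A) / SplitInfo(A), where SplitInfo(A) = -Σ_v (|S_v|/|S|) log2(|S_v|/|S|). It assumes purely categorical attributes and a small hypothetical data set, and the function names are illustrative. Parts of the full C4.5 algorithm such as thresholds for continuous attributes, missing-value handling and pruning are deliberately left out.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S), in bits, of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """ID3 criterion: IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    total = len(labels)
    remainder = sum(len(sub) / total * entropy(sub)
                    for sub in partition(rows, labels, attr).values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """C4.5-style normalized gain: IG(S, A) divided by the split information of A."""
    total = len(labels)
    split_info = -sum(len(sub) / total * log2(len(sub) / total)
                      for sub in partition(rows, labels, attr).values())
    return 0.0 if split_info == 0 else information_gain(rows, labels, attr) / split_info


def majority(labels):
    """Most common class label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attributes, domains, criterion=information_gain,
               parent_labels=()):
    """Grow a tree recursively. Leaves are class labels; internal nodes are
    {attribute: {value: subtree}} dictionaries."""
    if not labels:                      # no examples: parent's majority class
        return majority(parent_labels)
    if len(set(labels)) == 1:           # pure subset: leaf with that class
        return labels[0]
    if not attributes:                  # no attributes left: majority class
        return majority(labels)
    best = max(attributes, key=lambda a: criterion(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in domains[best]:         # one branch per possible value
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs], [y for _, y in pairs],
                                     remaining, domains, criterion, labels)
    return {best: branches}


def classify(tree, row):
    """Follow the branches matching `row` until a leaf label is reached."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree


# Hypothetical toy records (not clinical data).
rows = [
    {"age": "<50", "smoker": "no"}, {"age": "<50", "smoker": "yes"},
    {"age": "50-100", "smoker": "no"}, {"age": "50-100", "smoker": "yes"},
    {"age": "50-100", "smoker": "yes"}, {"age": "50-100", "smoker": "no"},
]
labels = ["healthy", "healthy", "healthy", "disease", "disease", "healthy"]
domains = {"age": ["<50", "50-100", ">100"], "smoker": ["no", "yes"]}

id3_tree = build_tree(rows, labels, ["age", "smoker"], domains)
c45_tree = build_tree(rows, labels, ["age", "smoker"], domains, criterion=gain_ratio)
print(id3_tree)
print(classify(id3_tree, {"age": ">100", "smoker": "yes"}))
```

On this toy data, splitting on smoker reduces entropy more than splitting on age, so both criteria place smoker at the root. The ">100" age branch receives no training examples, so it falls back to the most common class of its parent node, which is the third stopping case of ID3.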

4. Fuzzy Set Approach:
Fuzzy set theory is useful for data mining systems that perform rule-based classification. It provides operations for combining fuzzy measurements. Fuzzy logic is a type of multi-valued logic, derived from fuzzy set theory, that deals with approximate reasoning, and fuzzy rule-based classifiers have been reported to be very effective, with a high positive predictive value and diagnostic accuracy. Aniele C. Ribeiro et al. (2014) proposed a fuzzy breast cancer system that maps two controlled and two non-controlled input variables to the risk of breast cancer occurrence. It is intended to help women and health authorities estimate the risk of developing breast cancer, with the aim of reducing both adverse outcomes and mortality [22]. Chang-Shing Lee et al. (2011) presented a novel five-layer fuzzy ontology to model domain knowledge with uncertainty and extended the fuzzy ontology to the diabetes domain. They reported that the proposed method works more effectively for the diabetes application than previously developed ones [23].
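Rule-based fuzzy classification can be illustrated with a small Python sketch. Everything specific here is hypothetical and is not taken from the systems in [22] or [23]: the two inputs `a` and `b` (assumed already scaled to [0, 1]), the low/medium/high membership functions, and the nine rules with their risk values. The sketch uses zero-order Takagi-Sugeno inference (minimum for fuzzy AND, weighted average of rule outputs for defuzzification), which is only one of several common inference schemes.

```python
def low(x):
    """Membership in 'low': 1 at x = 0, falling linearly to 0 at x = 0.5."""
    return max(0.0, min(1.0, (0.5 - x) / 0.5))


def medium(x):
    """Membership in 'medium': triangle peaking at x = 0.5, zero at 0 and 1."""
    return max(0.0, 1.0 - abs(x - 0.5) / 0.5)


def high(x):
    """Membership in 'high': 0 up to x = 0.5, rising linearly to 1 at x = 1."""
    return max(0.0, min(1.0, (x - 0.5) / 0.5))


TERMS = {"low": low, "medium": medium, "high": high}

# Hypothetical rule base: IF a is <term> AND b is <term> THEN risk = <constant>.
RULES = {
    ("low", "low"): 0.1, ("low", "medium"): 0.3, ("low", "high"): 0.5,
    ("medium", "low"): 0.3, ("medium", "medium"): 0.5, ("medium", "high"): 0.7,
    ("high", "low"): 0.5, ("high", "medium"): 0.7, ("high", "high"): 0.9,
}


def fuzzy_risk(a, b):
    """Zero-order Sugeno inference: weighted average of the fired rules' outputs."""
    numerator = denominator = 0.0
    for (term_a, term_b), risk in RULES.items():
        # Fuzzy AND: the rule fires to the degree of its weaker condition.
        strength = min(TERMS[term_a](a), TERMS[term_b](b))
        numerator += strength * risk
        denominator += strength
    # For any real input at least one term has positive membership,
    # so at least one rule fires and the denominator is never zero.
    return numerator / denominator


print(fuzzy_risk(0.25, 0.0))
```

For `a = 0.25` the input is half "low" and half "medium", while `b = 0.0` is fully "low", so two rules fire with equal strength and the output is the average of their risks (0.1 and 0.3), about 0.2. Small changes in the inputs produce gradual changes in the output, which is what distinguishes this from a crisp if-then rule set.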

5. Rough Set Approach:
A rough set is defined by a lower and an upper approximation of a set. Rough set theory provides mathematical tools to discover hidden patterns in data that can be used in data mining. The lower and upper approximations depend on the attributes selected, so the approach may not be applicable to some applications. It does not need any preliminary or additional information about the data.

6. K-Nearest Neighbor:
The k-nearest neighbor (k-NN) algorithm classifies an item based on the closest training examples in the feature space. It is a type of instance-based, or lazy, learning, in which the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms, but its accuracy can be degraded by noisy or irrelevant features [24]. For both classification and regression, it can be useful to weight the contributions of the neighbors so that nearer neighbors contribute more to the average than more distant ones [25]. The neighbors are taken from a set of objects for which the class (for k-NN classification) or the property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required [26].

Proposed Model:
I intend to continue this work by developing a model that performs the following functions:
a) Check for the presence of a disease, such as cancer or heart disease, in an individual.
b) Predict mortality rates in a certain age group or geographic location.
c) Predict birth rates in a certain age group or geographic location.
This can be achieved by applying supervised learning methods and classification techniques and algorithms to openly available data sets.
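The distance-weighted k-NN method described under Machine Learning Methods can be sketched in a few lines of Python (3.8 or later, for `math.dist`). The sketch assumes Euclidean distance and 1/d weights, which is one common weighting scheme among several. The data points and function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest(train_X, query, k):
    """(distance, index) pairs for the k training points closest to `query`."""
    return sorted((dist(x, query), i) for i, x in enumerate(train_X))[:k]


def inverse_distance(d, eps=1e-9):
    """Weight 1/d; eps avoids division by zero when the query equals a training point."""
    return 1.0 / (d + eps)


def knn_classify(train_X, train_y, query, k=3, weighted=True):
    """Vote among the k nearest neighbors, optionally weighting each vote by 1/distance."""
    votes = defaultdict(float)
    for d, i in nearest(train_X, query, k):
        votes[train_y[i]] += inverse_distance(d) if weighted else 1.0
    return max(votes, key=votes.get)


def knn_regress(train_X, train_y, query, k=3):
    """Distance-weighted average of the k nearest neighbors' target values."""
    neighbors = nearest(train_X, query, k)
    weights = [inverse_distance(d) for d, _ in neighbors]
    values = [train_y[i] for _, i in neighbors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical 2-D feature vectors, each with a class label and a numeric target.
X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
classes = ["A", "B", "B"]
targets = [10.0, 20.0, 20.0]
query = (0.5, 0.5)

print(knn_classify(X, classes, query, k=3, weighted=False))  # plain majority vote: B
print(knn_classify(X, classes, query, k=3))                  # distance-weighted vote: A
print(knn_regress(X, targets, query, k=3))
```

With k = 3 all three training points vote. The plain majority vote returns "B" (two votes to one), while the distance-weighted vote returns "A", because the single A point lies much closer to the query than either B point. Likewise, the weighted regression estimate (about 13.57) is pulled toward the nearby target of 10. There is no explicit training step: all computation happens at query time, as expected of lazy learning.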


[1]. Mitchell T. 1997. Machine Learning. New York: McGraw Hill.

[2]. Duda RO, Hart PE, Stork DG. 2001. Pattern Classification (2nd edition). New York: Wiley.

[3]. Simes RJ. 1985. Treatment selection for cancer patients: application of statistical decision theory to the treatment of advanced ovarian cancer. J Chronic Dis, 38:171-86.

[4]. Maclin PS, Dempsey J, Brooks J, et al. 1991. Using neural networks to diagnose cancer. J Med Syst, 15:11-9.

[5]. Petricoin EF, Liotta LA. 2004. SELDI-TOF-based serum proteomic pattern diagnostics for early detection of cancer. Curr Opin Biotechnol, 15:24-30.

[6]. Bocchi L, Coppini G, Nori J, Valli G. 2004. Detection of single and clustered microcalcifications in mammograms using fractals models and neural networks. Med Eng Phys, 26:303-12.

[7]. Wyatt JC, Liu JL. 2002. Basic concepts in medical informatics. J Epidemiol Community Health, 56(11):808-812.

[8]. Characteristics of Clinical Decision Support System,

[9]. Dipak V. Patil, R. S. Bichkar, "Issues in Optimization of Decision Tree Learning: A Survey", International Journal of Applied Information Systems (IJAIS), New York, USA, Volume 3, No. 5, July 2012.

[10]. UCI Machine Learning Repository

[11]. A. L. Gamboa, M.G.M., J. M. Vargas, N. H. Gress, and R. E. Orozco, "Hybrid Fuzzy-SV Clustering for Heart Disease Identification," in Proceedings of CIMCA-IAWTIC'06, 2006.

[12]. P. Magni and R. Bellazzi, "A stochastic model to assess the variability of blood glucose time series in diabetic patients self-monitoring," IEEE Trans. Biomed. Eng., vol. 53, no. 6, pp. 977–985, Jun. 2006.

[13]. S. U. Amin, K. Agarwal and R. Beg, IEEE 2013, "Genetic neural network based data mining in prediction of heart disease using risk factors"

[14]. Mrudula Gudadhe, Kapil Wankhade and Snehlata Dongre, IEEE 2010, "Decision Support System for heart disease using support vector machine and artificial neural network"

[15]. Decision Support System, Iliad,

[16]. Barnett GO, Cimino JJ, Hupp JA, Hoffer EP. JAMA, July 3, 1987. "DXplain. An evolving diagnostic decision-support system".

[17]. P. A. Kharat, S. V. Dudul, IEEE 2011, "Clinical Decision Support based on Jordan/Elman Network"

[18]. R. R. Janghel, Anupam Shukla and Ritu Tiwari, IEEE 2009, "Clinical Decision Support System for Fetal Delivery using artificial neural network"

[19]. A. Y. Al-Hyari, A. M. Al-Taee and M. A. Al-Taee, IEEE 2013, "Clinical Decision Support for diagnosis and management of Chronic Renal Failure"



[22]. Aniele C. Ribeiro, Deborha P. Silva and Ernesto Araujo, IEEE 2014, "Fuzzy Breast Cancer Risk Assessment"

[23]. Chang-Shing Lee and Mei-Hui Wang, IEEE 2011, "A Fuzzy Expert System for Diabetes Decision Support Application"