Friday, March 29, 2019
Application Survey on Data Mining and Data Warehousing
Aishwarya.R
Survey Report on Bank-Loan Risk Prediction

Introduction
Data mining has been one of the most explored topics of the past decade and has given rise to several new enhancements and techniques in several industries. One such thought-provoking field of high interest is Credit Risk Analysis, or simply bank-loan risk prediction. Many banks now have a pressing need to perform a Credit Risk Analysis to make sure that the money they lend to customers, as a loan or in any other form, goes to a rightful customer who is capable of repaying, and to avoid fraudulent scenarios. Several data mining techniques have been explored to analyze a customer's creditworthiness, and a few are examined and emphasized in the following sections.

Discussion on Selected Papers
In this section, I have listed the journal and IEEE papers referenced for my study and analysis of bank-loan risk prediction and categorized various factors for each in Table 1.

Table 1. Sources used that focus on bank-loan risk prediction using different data mining techniques
Reference | Objective and data mining techniques employed | Authors | Number of citations
1 | SAS Enterprise Miner 5.3, logistic regression model and decision tree used in credit scoring models for assessing credit risk | Bee Wah Yap, Seng Huat Ong, Nor Huselina Mohamed Husain | 77
2 | Decision tree model for credit assessments in a bank | I Gusti Ngurah Narindra Mandala, Catharina Badra Nawangpalupi, Fransiscus Rian Praktikto | 15
3 | Predictive modelling technique and Naive Bayes algorithm for loan risk assessment | Rob Gerritsen | 34
4 | Multilayer Feed Forward Neural Network, Support Vector Machines, Genetic Programming, Logistic Regression, Group Method of Data Handling and Probabilistic Neural Network techniques for financial fraud assessment | P. Ravisankar, V. Ravi, G. Raghava Rao, I. Bose | 147

Expert Systems with Applications: Using data mining to improve assessment of credit worthiness via credit scoring models

Problem Description
Bee Wah Yap et al. [1] found that a recreational club had been facing difficulties in identifying defaulters who do not pay their periodic subscription fee, which made it hard for the club to manage its funds effectively and to allocate money for further activities or events.
The management decided to evaluate the creditworthiness of the club members by using past members' data as a data set, analyzed with three different data mining techniques in order to determine which fits best [1].

Solution Technology
Bee Wah Yap et al. [1] used a credit scorecard model, a logistic regression model and a decision tree model in SAS Enterprise Miner, a versatile tool for applying several data mining techniques, in order to identify the potential defaulters in the club.

Solution Evaluation
In the credit scorecard model, Bee Wah Yap et al. [1] identified the factors that characterize a defaulter based on age, number of dependents, number of cars, district of address and, most importantly, the classification into defaulters and non-defaulters based on payment status. They then computed the Information Value of each attribute from the difference between the probability of a good attribute (applicable values from the old dataset taken for prediction) and the probability of a bad attribute (values from the old dataset that add no value to the prediction), summed over the attribute's values, and treated Information Values greater than 0.02 as admissible for inclusion on the scorecard.
They then identified the stepwise selection method as the most suitable of the logistic regression models and drew a wide range of information and conclusions on the type of defaulters.
Finally, they applied the decision tree algorithm to split the large dataset into smaller segments using if-then rules and obtained the profile of defaulters.
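The Information Value screening described above can be sketched as follows. This is a minimal sketch, not the authors' SAS procedure: it assumes the common textbook definition, in which each difference between the good and bad shares is weighted by the log of their ratio, and the function names and the per-category count format are hypothetical. Only the 0.02 cut-off comes from the survey text.

```python
import math


def information_value(bins):
    """Information Value of one attribute.

    bins is a list of (good_count, bad_count) pairs, one pair per
    category of the attribute (hypothetical input format).
    Assumed definition: sum of (p_good - p_bad) * ln(p_good / p_bad).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    if total_good == 0 or total_bad == 0:
        raise ValueError("need at least one good and one bad record")

    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # The log ratio is undefined for empty cells; this sketch
            # simply skips them instead of applying smoothing.
            continue
        p_good = good / total_good
        p_bad = bad / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def keep_for_scorecard(bins, threshold=0.02):
    """Apply the 0.02 admission cut-off mentioned in the survey."""
    return information_value(bins) > threshold
```

An attribute whose categories have identical good and bad shares gets an Information Value of zero and is dropped, while an attribute that separates the two groups is kept.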
Based on the results obtained from the above three techniques, they identified the decision tree as the better approach for prediction, although the three show no big difference, and concluded that a credit scoring model without adequate and proper data sets and old data could never perform well in prediction.

Further Enhancements
The study employed several techniques in order to justify a better prediction model as a substitute for the credit scoring model, but it overlooked the fact that the data sets used throughout come from past customers, which may or may not be a legitimate basis for prediction. It is therefore not a sensible way to conclude that the decision tree is better than credit scoring: not all of the arguments are valid, and the results may vary when a large amount of current, real-time data is used to predict future defaulters.

Assessing Credit Risk: an Application of Data Mining in a Rural Bank

Problem Description
I Gusti Ngurah Narindra Mandala et al. [2] felt that for rural banks to stay healthy, a benchmark has to be set on many factors, among which the non-performing loan (NPL) factor plays an important role. They identified that the lower the NPL rate, the better the health of the rural bank. To achieve this, they proposed that banks should approve only the right applicants and thereby increase their profit and credibility and support the improvement of the local community where such banks are most used.
They affirmed that banks with less than 5% NPL are in better condition than banks with a higher NPL value.

Solution Technology
I Gusti Ngurah Narindra Mandala et al. [2] chose the decision tree technique for a rural bank in Bali and scrutinized the factors that are currently taken into consideration when lending to a customer.

Solution Evaluation
I Gusti Ngurah Narindra Mandala et al. [2] found that the current NPL value of the rural bank in Bali is 11.99%, much higher than the value expected for a well-performing bank. They used 84% of a sample data set of 1028 records for evaluation and determined approximately 13 parameters of consideration for evaluating the NPL customers. They developed a decision tree based on the existing parameters but reordered it so that the collateral value became the determining factor, and obtained an NPL of 3%, below the 5% benchmark for a bank in good condition.

Further Enhancements
Although the above assessment and conclusion about a healthy bank seem appealing, the authors could have placed further emphasis on other factors that also contribute to a healthy bank and to NPL, and could have predicted credibility further using other predictive and descriptive modeling techniques, which may offer better analysis and a better solution for the given scenario than what was obtained.

Assessing Loan Risks: A Data Mining Case Study

Problem Description
Rob Gerritsen [3] identified that if customers who cannot repay their bank loans could be predicted before lending, using data mining techniques, the information would be worthwhile. He found that the US Department of Agriculture's (USDA) Rural Housing Service has been lending money to people in rural areas, and USDA realized that the huge number of applicants being approved for loans may or may not be capable of repaying the amount.
So USDA decided to apply data mining techniques in order to gather the information and predict the vulnerabilities of the customers [3].

Solution Technology
Rob Gerritsen [3] decided to use predictive modeling techniques along with the Naive Bayes algorithm to come up with a solution for the above problem.

Solution Evaluation
Rob Gerritsen [3] was given sample data of 12,000 records based on existing single-family mortgages and had to train the model on this data set and then predict future scenarios. He first classified the dataset and applied the Naive Bayes binning algorithm to divide the customers based on the loan amounts to be paid by each.
Initially, he found this ineffective: because the bin ranges were uniform across a continuous distribution, a huge number of people fell into a single bin, which made it difficult to identify the defaulters precisely.
He then reorganized the bin ranges and built a decision tree from the results to conclude the major factors behind defaults.

Further Enhancements
Rob Gerritsen [3] himself identified that the data set was too small to conclude the results, and that a wider dataset, along with further factors of consideration, has to be used for USDA to obtain a verified solution to its problem.

Decision Support Systems: Detection of financial statement fraud and feature selection using data mining techniques

Problem Description
P. Ravisankar et al. [4] conducted a study on 202 Chinese companies using a variety of data mining techniques to conclude whether financial statements, income statements, cash flow and various other factors, taken together, could give a better picture of the companies, and also to decide whether a loan should be given to customers based on the results.

Solution Technology
P. Ravisankar et al. [4] employed a variety of data mining techniques, namely Support Vector Machines (SVM), Group Method of Data Handling (GMDH), Genetic Programming (GP), Logistic Regression (LR), Multilayer Feed Forward Neural Network (MLFF) and Probabilistic Neural Network (PNN). They applied all of these techniques to the same dataset in order to identify the best solution for the above problem.

Solution Evaluation
P. Ravisankar et al. [4] identified that among the 202 Chinese companies taken as a data set, 101 were fraudulent and the remaining were non-fraudulent.
They then applied Genetic Programming to find the fitness function, SVM to obtain the permissible support vectors, GMDH to classify and obtain a feed-forward network model (polynomial model), and PNN, each with and without feature selection, in order to obtain the features of fraudulent companies.
They observed that, among the several techniques used, the main factor to consider is that the amount of data should match the capability of the technique, with less time spent on training and on obtaining results from the dataset.

Further Enhancements
I agree with the conclusion of P. Ravisankar et al. [4] about classifying with if-then rules on the dataset, and with adopting other hybrid data mining techniques in order to further enhance the solutions.

REFERENCES
[1] Yap, B. W., Ong, S. H., Husain, N. H. M. (2011). Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Systems with Applications, 38, 13274-13283.
[2] Mandala, I G. N. N., Nawangpalupi, C. B., Praktikto, F. R. (2012). Assessing Credit Risk: an Application of Data Mining in a Rural Bank. Procedia Economics and Finance, 4, 406-412.
[3] Gerritsen, R. (1999). Assessing loan risks: a data mining case study. IEEE IT Professional, 16-21.
[4] Ravisankar, P., Ravi, V., Rao, G. R., Bose, I. (2011). Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems, 50(2), 491-500.

Questions and Answers

Why are DM and DW technologies becoming important tools for today's business world?
Today's business world is a competitive environment where the right decisions need to be taken at the right time, by knowing the answers to what has happened and by predicting what will happen in the future.
Data warehousing helps us to identify answers to questions like what, which and how through aggregations.
Data mining, also known as KDD, helps us to predict what can happen in the future. This is done by discovering and analyzing hidden patterns. Both DM and DW results are processed from large sets of data records, from either the same or different data sources.

What are the main differences between data mining, traditional statistical data analysis, and information retrieval?
Data mining is the process of deriving or discovering new information from existing information by observing the data, identifying patterns and obtaining meaningful analytics that can be used in business.
Traditional statistical data analysis is a method of testing a proposed phenomenon or hypothesis in order to validate it and provide statistically significant evidence for accepting the outcome.
Information retrieval, in simple terms, is the process of collecting or retrieving required data from existing information available in any form.

How is the data warehouse model different from a relational database model?
Why is DW technology more advanced in supporting business management?

Relational Database Model
- Used for Online Transaction Processing (OLTP)
- Data stored are generally facts in a single operational database
- Tables are normalized
- SQL is used to query

Data Warehouse Model
- Used for Online Analytical Processing (OLAP)
- Data stored in a DW are generally consolidated data (aggregations) from multiple databases or sources
- Tables are de-normalized
- OLAP tools are used to query

The key difference between the DW model and the relational database model is that a DW is a layer on top of other databases, whereas a relational database is a database itself.
DW technology is more advanced in supporting business management because it provides quick answers to questions like WHAT, WHICH and HOW, which helps management act accordingly when making decisions; i.e. it is much faster at generating reports that answer management queries.

What are the main differences between using OLAP on a DW and using SQL on a traditional database for supporting business decision making?
The main difference is that complex questions involving multiple aggregations can be answered in ad-hoc environments (i.e. with data from different sources) considerably faster using OLAP on a DW.
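To make "questions involving multiple aggregations" concrete, here is a minimal sketch of a roll-up over a small de-normalized fact table. It uses Python's built-in sqlite3 module purely as a stand-in, since SQLite is not an OLAP engine. The table name, column names and loan records are all made up for illustration.

```python
import sqlite3

# Hypothetical, made-up loan records held in one de-normalized fact table:
# (region, quarter, product, amount, defaulted)
ROWS = [
    ("North", "Q1", "home", 100.0, 0),
    ("North", "Q1", "auto", 50.0, 1),
    ("North", "Q2", "home", 200.0, 0),
    ("South", "Q1", "home", 80.0, 1),
    ("South", "Q2", "auto", 120.0, 0),
]

DIMENSIONS = ("region", "quarter", "product")


def build_fact_table():
    """Create an in-memory database holding the sample fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loan_facts ("
        "region TEXT, quarter TEXT, product TEXT, "
        "amount REAL, defaulted INTEGER)"
    )
    conn.executemany("INSERT INTO loan_facts VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def rollup_by(conn, *dimensions):
    """Total loan amount and number of defaults per chosen dimensions."""
    if not dimensions or any(d not in DIMENSIONS for d in dimensions):
        raise ValueError("unknown dimension")
    # Column names are checked against DIMENSIONS before being interpolated.
    cols = ", ".join(dimensions)
    sql = (
        f"SELECT {cols}, SUM(amount), SUM(defaulted) "
        f"FROM loan_facts GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()
```

Calling `rollup_by(conn, "region")` answers "which region lent the most", and `rollup_by(conn, "region", "quarter")` drills down one level. A dedicated OLAP tool typically answers such questions quickly by precomputing aggregates like these across sources.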