
Monday, April 1, 2019

Data Mining techniques

ABSTRACT

Competitive advantage requires abilities. Abilities are built through knowledge. Knowledge comes from data. The process of extracting knowledge from data is called data mining. Data mining, the extraction of hidden predictive information from large databases, is an advanced technique that helps companies highlight the most important information in their data warehouses. Data mining tools predict future trends and behaviors, and they can answer business questions that traditionally were too time consuming to resolve. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.

A data warehouse is a platform that contains all of an organization's data in one place, in a centralized and normalized form, for deployment to users. It serves needs ranging from simple reporting to complicated analysis, decision support, and executive-level reporting and archiving. Physically, a data warehouse is a repository of information that businesses need to thrive in the information age. Analytically, a data warehouse is a modern reporting environment that provides users direct access to their data. In the information age, data warehousing is a powerful strategic weapon. Not only does it let organizations compete across time, it is also a rising tide strategy that can elevate the strategic acumen of all employees in a field.

This paper gives an overview of data mining and data warehousing: their basic definitions, how they are implemented, and their pros and cons.
DATA WAREHOUSING

In today's competitive global business environment, it is critical for organizations to understand and manage enterprise-wide information in order to make timely decisions and respond to changing business conditions. With the receding economy, enterprises have shifted their business focus towards customer orientation to remain competitive. Consequently, CRM tops their agenda, and more companies are realizing the business advantage of leveraging one of their key assets: data. Many research reports indicate that the amount of data in a given organization doubles every five years.

The most fundamental factor in the successful functioning of a business enterprise is the quality of the decisions taken by its management. The principal input to these decisions is business-critical information. This information can only be reliable and accurate if all the business-related data is properly analyzed, and a thorough analysis is only possible if all the data affecting the enterprise is present in one place. The solution is a data warehouse.

A data warehouse is a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context. Today, data warehousing is one of the most talked-about business technologies in the corporate world.

DATA MINING

Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. It discovers information within the data that queries and reports cannot effectively reveal. The amount of raw data stored in corporate databases is exploding.
From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. Raw data by itself, however, does not provide much information. In today's fiercely competitive business environment, companies need to rapidly turn these terabytes of raw data into significant insights into their customers and markets to guide their marketing and investment decisions.

Fig: Data explosion

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find where the value resides.

Frequently, the data to be mined is first extracted from an enterprise data warehouse into a data mining database or data mart. The data mining database may be a logical rather than a physical subset of the data warehouse.

DATA WAREHOUSING

1. DEFINITION

A data warehouse (DW) is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making. A data warehouse is typically a relational database management system (RDBMS) that lets an organization gather and store enterprise information in a single conceptual enterprise repository, and it is designed specifically for query and analysis rather than for transaction processing.
Data warehousing deals with organizing and collecting data into databases that can be searched and mined for information through the use of business intelligence solutions.

2. CHARACTERISTICS OF A DATA WAREHOUSE

1) Subject-oriented: The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together.
2) Time-variant: The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
3) Non-volatile: Data in the database is never overwritten or deleted. Once committed, the data is static and read-only, and is retained for future reporting.
4) Integrated: The database contains data from most or all of an organization's operational applications, and this data is made consistent.

3. ARCHITECTURE OF DATA WAREHOUSE

The architecture for a data warehouse is given below. Building this architecture requires four basic steps:

1) Data are extracted from the various internal and external source system files and databases. In a large organization there may be dozens or even hundreds of such files and databases.
2) The data from the various source systems are transformed and integrated before being loaded into the data warehouse. Transactions may be sent to the source systems to correct errors discovered in data staging.
3) The data warehouse is a database organized for decision support. It contains both detailed and summary data.
4) Users access the data warehouse by means of a variety of query languages and analytical tools. Results (e.g. predictions, forecasts) may be fed back to the data warehouse and operational databases.
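The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the two source systems (CRM_ROWS and POS_ROWS), the sales table, and every function name are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source systems with different schemas (illustrative data only).
CRM_ROWS = [
    {"cust": "Alice ", "amount_usd": "120.50", "date": "2019-03-01"},
    {"cust": "bob", "amount_usd": "80", "date": "2019-03-02"},
]
POS_ROWS = [
    {"customer_name": "ALICE", "total_cents": 4550, "sold_on": "2019-03-02"},
    {"customer_name": "Carol", "total_cents": None, "sold_on": "2019-03-03"},
]


def extract():
    """Step 1: pull raw rows from each source, tagged with their origin."""
    for row in CRM_ROWS:
        yield "crm", row
    for row in POS_ROWS:
        yield "pos", row


def transform(source, row):
    """Step 2: map a source row onto the integrated schema.

    Returns None when the row fails a basic quality check in staging.
    """
    if source == "crm":
        name, amount, day = row["cust"], float(row["amount_usd"]), row["date"]
    else:
        if row["total_cents"] is None:
            return None  # missing amount: reject in staging
        name = row["customer_name"]
        amount = row["total_cents"] / 100
        day = row["sold_on"]
    # Make names and amounts consistent across sources.
    return (name.strip().title(), round(amount, 2), day, source)


def load(conn, records):
    """Step 3: append cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn):
    """Run extract, transform, and load; return the rows rejected in staging."""
    cleaned, rejected = [], []
    for source, row in extract():
        record = transform(source, row)
        if record is None:
            rejected.append((source, row))
        else:
            cleaned.append(record)
    load(conn, cleaned)
    return rejected


def sales_by_customer(conn):
    """Step 4: a user query against the warehouse (summary per customer)."""
    return conn.execute(
        "SELECT customer, ROUND(SUM(amount), 2) FROM sales "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    rejected = run_etl(warehouse)
    print(sales_by_customer(warehouse))  # [('Alice', 166.0), ('Bob', 80.0)]
    print("rejected in staging:", len(rejected))
```

The rejected rows are returned rather than silently dropped, which mirrors the point in step 2 that errors discovered in staging may need to be corrected back at the source system.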
Fig: Architecture of a typical data warehouse, with querying and data-analysis support. Information is integrated in advance and stored in the warehouse for direct querying and analysis.

Architecture in abstract view:

Single-layer: Every data element is stored once only (a virtual warehouse).
Two-layer: Real-time data plus derived data. This is the most commonly used approach in industry today.
Three-layer: The transformation of real-time data to derived data really requires two steps.

4. ISSUES IN BUILDING A WAREHOUSE

1) When and how to gather data: In a source-driven architecture for gathering data, the data sources transmit new information. In a destination-driven architecture, the data warehouse periodically sends requests for new data to the data sources.
2) What schema to use: Data sources that have been constructed independently are likely to have different schemas. Part of the task of a data warehouse is schema integration, converting data to the integrated schema before they are stored. As a result, data stored in the warehouse are not just a copy of the data at the sources.
3) Data cleansing: The task of correcting and preprocessing data is called data cleansing. Data sources often deliver data with numerous minor inconsistencies that can be corrected.
4) How to propagate updates: Updates on relations at the data sources must be propagated to the data warehouse. If the relations at the data warehouse are exactly the same as those at the data source, propagation is straightforward.
5) What to summarize: The raw data generated by a transaction-processing system may be too large to store online. Instead, we can maintain summary data obtained by aggregation on a relation.

5. DATA WAREHOUSE MODEL

Data warehousing is the process of extracting and transforming operational data into informational data and loading it into a central data store or warehouse. Once the data is loaded, it is accessible to decision makers via desktop query and analysis tools. The data warehouse model is illustrated in the following figure.
The materialized views contain summary data compiled from several data sources. The auxiliary views in the picture are not mandatory; they hold additional information needed to keep the materialized views synchronized with the data sources.

Fig: Data warehouse model

The data within the actual warehouse itself has a distinct structure, with the emphasis on different levels of summarization, as shown in the figure below.

Fig: Structure of data warehouse

6. STAGES IN IMPLEMENTATION

A DW implementation requires the integration of many products. The steps of implementation are as follows:

Step 1: Collect and analyze the business requirements.
Step 2: Create a data model and physical design for the DW.
Step 3: Define the data sources.
Step 4: Choose the DBMS and software platform for the DW.
Step 5: Extract the data from the operational data sources, transform it, clean it, and load it into the DW model or data mart.
Step 6: Choose the database access and reporting tools.
Step 7: Choose the database connectivity software.
Step 8: Choose the data analysis and presentation software.
Step 9: Keep refreshing the data warehouse periodically.

7. DATA MARTS

A data warehouse is the sum of all its data marts. A data mart is a complete pie-wedge of the overall data warehouse pie: a restriction of the data warehouse to a single business process, or to a group of related business processes, targeted toward a particular business group. Data marts can be customized for the end users and can present data in different formats for the end users' benefit. Data marts can employ OLAP (online analytical processing), an approach to organizing data that speeds up access, especially when querying the data or viewing it from many different aspects.

DATA MINING

1. DEFINITION

Data mining, also known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.

Data mining refers to using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data, and extracting these in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation. The data is often voluminous but, as it stands, of low value, since no direct use can be made of it. It is the hidden information in the data that is useful.

Data mining has also been defined as "a new discipline lying at the interface of statistics, database technology, pattern recognition, and machine learning, and concerned with secondary analysis of large databases in order to find previously unsuspected relationships, which are of interest or value to their owners."

2. PROCESS

The data mining process can be divided into the following stages: data selection, data preprocessing, data transformation, data mining, and interpretation and evaluation.

Fig: Process used in data mining

3. WORKING

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer preferences.
For example, data can be mined to identify market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper example, the often-cited observation that shoppers who buy diapers also tend to buy beer, is an example of associative mining.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

4. MODELS RELATED TO DATA MINING

There are two types of model, or modes of operation, which may be used to discover information of interest to the user.

1) Verification model: The verification model takes a hypothesis from the user and tests its validity against the data. The emphasis is on the user, who is responsible for formulating the hypothesis and issuing the query on the data to affirm or negate it.
2) Discovery model: The discovery model differs in its emphasis in that the system automatically discovers important information hidden in the data. The data is sifted in search of frequently occurring patterns, trends, and generalizations without intervention or guidance from the user.

5. TECHNIQUES USED IN DATA MINING

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1).
This is sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.

6. TWO STYLES OF DATA MINING

There are two styles of data mining. Directed data mining is a top-down approach, used when we know what we are looking for. This often takes the form of predictive modeling, where we know exactly what we want to predict. Undirected data mining is a bottom-up approach that lets the data speak for itself. It finds patterns in the data and leaves it up to the user to determine whether or not these patterns are important.

7. POTENTIAL APPLICATIONS

Data mining has many and varied fields of application, some of which are listed below.

Marketing: Identify purchasing patterns from customers; market basket analysis.
Banking: Detect patterns of fraudulent credit card use; identify loyal customers.
Insurance and health care: Claims analysis; predict which customers will buy new policies; identify fraudulent behavior.
Transportation: Determine distribution schedules; analyze loading patterns.

CONCLUSION

Organizations today are under tremendous pressure to compete in an environment of tight deadlines and reduced profits. Legacy business processes that require data to be extracted and manipulated prior to use will no longer be acceptable. Instead, enterprises need rapid decision support based on the analysis and forecasting of predictive behavior. Data warehousing and data mining techniques provide this capability.

A data warehouse is a modern reporting environment that provides users direct access to their data. A data warehouse is the sum of all its data marts. A data warehousing strategy allows organizations to move from a defensive to an offensive decision-making position.
The purpose of a data warehouse is to consolidate and integrate data from a variety of sources and to format those data in a context for making accurate business decisions.

Data mining offers firms in many industries the ability to discover hidden patterns in their data, patterns that can help them understand customer behavior and market trends. The advent of parallel processing and new software technology enables customers to capitalize on the benefits of data mining more effectively than had been possible previously.
