Wednesday, December 12, 2007

Data Mining - Theory

Data Mining

`We are drowning in information but starved for knowledge.'
John Naisbitt

What is data mining?

Data mining sits at the interface between statistics, computer science, artificial intelligence, pattern recognition, machine learning, database management and data visualisation (to name some of the fields).

Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately comprehensible patterns or models in data to make crucial decisions. Data mining is not a product that can be bought. Data mining is a discipline and process that must be mastered - a whole problem solving cycle.

The main part of data mining is concerned with the analysis of data and the use of software techniques for finding patterns and regularities in sets of data. The idea is that it is possible to strike gold in unexpected places, as the data mining software extracts patterns that were either not previously discernible or so obvious that no-one had noticed them before. The analysis process starts with a set of data and uses a methodology to develop an optimal representation of the structure of that data; knowledge is acquired along the way. Once acquired, this knowledge can be extended to larger sets of data, on the assumption that the larger data set has a structure similar to the sample data. This is analogous to a mining operation, where large amounts of low grade material are sifted through in order to find something of value.
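The step from sample to larger data set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the variable names (`spend`, `responded`) are hypothetical, and the "pattern" is the simplest one imaginable, a single cut-off value. Real data mining tools search a far richer space of models, but the cycle is the same: learn the structure on a sample, then check that it carries over.

```python
import random


def learn_threshold(sample):
    """Return the cut-off t for which the rule `spend >= t` makes
    the fewest mistakes on the sample of (spend, responded) pairs."""
    candidates = sorted({spend for spend, _ in sample})
    best_threshold, best_errors = None, len(sample) + 1
    for t in candidates:
        errors = sum((spend >= t) != responded for spend, responded in sample)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def accuracy(data, threshold):
    """Share of records for which the rule `spend >= threshold` is right."""
    hits = sum((spend >= threshold) == responded for spend, responded in data)
    return hits / len(data)


# Synthetic "large" data set (an assumption for this sketch): customers
# who spend 60 or more tend to respond, with 10% of labels flipped as noise.
rng = random.Random(0)
population = []
for _ in range(10_000):
    spend = rng.uniform(0, 100)
    responded = spend >= 60
    if rng.random() < 0.1:
        responded = not responded
    population.append((spend, responded))

# Acquire knowledge on a small sample ...
sample = rng.sample(population, 200)
threshold = learn_threshold(sample)

# ... then extend it to the larger data set.
print(f"rule learned from {len(sample)} records: respond if spend >= {threshold:.1f}")
print(f"accuracy on the sample: {accuracy(sample, threshold):.2f}")
print(f"accuracy on all {len(population)} records: {accuracy(population, threshold):.2f}")
```

If the accuracy on the larger set falls well below the accuracy on the sample, the assumption that both share the same structure does not hold, and the rule should not be trusted.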

Is data mining `statistical déjà vu'?

Whereas statistical analysis traditionally concerns itself with analysing primary data that has been collected to check specific research hypotheses (`primary data analysis'), data mining can also concern itself with secondary data collected for other reasons (`secondary data analysis'). Furthermore, statistical data can be experimental (perhaps the result of an experiment which randomly allocates the statistical units to different kinds of treatment), whereas in data mining the data is typically observational.

Data warehousing provides the enterprise with a memory

Companies are collecting data on seemingly everything. For example, a customer-focused enterprise regards every record of an interaction with a client or prospect (e.g. each call to customer support, each point-of-sale transaction, each catalogue order, each visit to a company web site) as a learning opportunity. But learning requires more than simply gathering data. In fact, many companies gather hundreds of gigabytes of data without learning anything. Typically, data are gathered because they are needed for some operational purpose, such as inventory control or billing. Once the data have served that purpose, they languish on tape or get discarded, and their hidden value goes largely untapped. For learning to take place, data from many sources (e.g. billing records, scanner data, registration forms, applications, call records, coupon redemption, surveys, manufacturing data) must first be gathered together and organised in a consistent and useful way - in a way that facilitates the retrieval of information for analytic purposes. This is called data warehousing. Data warehousing allows the enterprise to remember what it has noticed about its customers. Data warehousing provides the enterprise with a memory.
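As a rough illustration of what "gathered together and organised in a consistent way" means in practice, the Python sketch below merges three small extracts into one record per customer. Everything in it is an assumption made for the example: the source systems, the field names (`customer_id`, `cust`, `CustomerID`) and the records themselves are invented, and a real data warehouse involves far more than a dictionary in memory.

```python
from collections import defaultdict

# Hypothetical extracts from three operational systems. Each system
# names and formats the customer identifier in its own way.
billing = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "C002", "amount": 40.0},
    {"customer_id": "C001", "amount": 80.0},
]
support_calls = [
    {"cust": "c001 ", "topic": "invoice"},
    {"cust": "C003", "topic": "delivery"},
]
web_visits = [
    {"CustomerID": "c002", "pages": 5},
    {"CustomerID": "C001", "pages": 2},
]


def normalise_id(raw):
    """Bring identifiers from all sources into one consistent form."""
    return raw.strip().upper()


def build_customer_view(billing, support_calls, web_visits):
    """Combine the three sources into a single record per customer."""
    view = defaultdict(
        lambda: {"total_billed": 0.0, "support_calls": 0, "page_views": 0}
    )
    for row in billing:
        view[normalise_id(row["customer_id"])]["total_billed"] += row["amount"]
    for row in support_calls:
        view[normalise_id(row["cust"])]["support_calls"] += 1
    for row in web_visits:
        view[normalise_id(row["CustomerID"])]["page_views"] += row["pages"]
    return dict(view)


customer_view = build_customer_view(billing, support_calls, web_visits)
for customer, record in sorted(customer_view.items()):
    print(customer, record)
```

The point of the consolidated view is that a question spanning several systems (for instance, whether customers who call support also spend more) can now be answered from one place.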

Data mining provides the enterprise with intelligence

Memory is of little use without intelligence. That is where data mining comes in. Intelligence allows us to comb through our memories noticing patterns, devising rules, coming up with new ideas to try, and making predictions about the future. The data must be analysed, understood and turned into actionable information. Data mining tools and techniques add intelligence to the data warehouse. With them you can exploit the vast mountains of data generated, for example, by interactions with your customers and prospects, in order to get to know them better. Typical customer-focused business questions are:
Which customers are most likely to respond to a mailing?
Are there groups (or segments) of customers with similar characteristics or behavior?
Are there interesting relationships between customer characteristics?
Who is likely to remain a loyal customer and who is likely to jump ship?
Where should the next branch be located?
What is the next product or service this customer will want?
Answers to questions like these lie buried in your corporate data, but it takes powerful data mining tools to get at them, i.e. to dig the gold out of your customer data. Data mining provides the enterprise with intelligence. Companies can use data mining findings for more profitable, proactive decision making and competitive advantage.
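To make the first of these questions concrete, here is a minimal Python sketch that ranks customer segments by how often they responded to past mailings. The segments (age bands) and the mailing history are invented for the example, and a simple response rate per segment stands in for the predictive models a real project would use.

```python
from collections import defaultdict

# Hypothetical history of a past mailing: (age band, responded?)
history = [
    ("18-34", True), ("18-34", False), ("18-34", False), ("18-34", False),
    ("35-54", True), ("35-54", True), ("35-54", False), ("35-54", False),
    ("55+", True), ("55+", True), ("55+", True), ("55+", False),
]


def response_rates(history):
    """Return the share of mailed customers who responded, per segment."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in history:
        sent[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {segment: responded[segment] / sent[segment] for segment in sent}


rates = response_rates(history)

# Segments with the highest past response rate come first.
ranked = sorted(rates, key=rates.get, reverse=True)
for segment in ranked:
    print(f"{segment}: {rates[segment]:.0%} responded")
```

On this toy history the next mailing would go to the best-responding segment first. Whether past response rates carry over to the next campaign is exactly the kind of assumption that has to be checked against fresh data.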

With data mining, companies can, for example, analyze customers' past behaviors in order to make strategic decisions for the future. Keep in mind, however, that the data mining techniques and tools are equally applicable in fields ranging from law enforcement to radio astronomy, medicine, and industrial process control.

Data mining myths versus realities

A great deal of what is said about data mining is incomplete, exaggerated, or wrong. Data mining has taken the business world by storm, but as with many new technologies, there seems to be a direct relationship between its potential benefits and the quantity of (often contradictory) claims, or myths, about its capabilities and weaknesses. When you undertake a data mining project, avoid a cycle of unrealistic expectations followed by disappointment. Understand the facts instead, and your data mining efforts are far more likely to succeed.

Data mining cannot be ignored - the data is there, the methods are numerous, and the advantages that knowledge discovery brings are tremendous. Companies whose data mining efforts are guided by `mythology' will find themselves at a serious competitive disadvantage to those organizations taking a measured, rational approach based on facts.



Will Dwinnell said...

It's worth noting that while data warehouses can be excellent sources of development data for data mining, they are not, strictly speaking, necessary. I have worked on successful data mining projects based on extracts from production relational databases and (separately) simple flat files.

-Will Dwinnell
Data Mining in MATLAB
