What is data mining? A definition, an overview of data mining tools, the process & applications

Author: Herman van Dellen
Data Mining Expert

Data mining has taken off thanks to the big data revolution. The concepts of data mining and big data are like Siamese twins. Data mining now has successful applications, particularly in the (bio)medical and financial sectors. But data mining software is also successfully used in retail, municipalities, defense, and even sports. However, big data mining is not entirely uncontroversial. Without a basic knowledge of statistics, you can quickly draw wrong conclusions. What is data mining? What kind of data mining tools are there? And what is the difference between process mining vs. data mining? Our data mining experts can give you answers.

Organize your thoughts before considering data mining tools

To understand the importance and impact of the field of data mining, consulting some definitions is essential. Moreover, reading this overview article on data mining will help you organize your thoughts before you start a project. If you really want to get started with data mining, call in the help of the data mining experts at Passionned Group.

What is data mining?

Data mining, artificial intelligence, and big data are often mentioned or cited together. The same goes for data mining, deep learning, and predictive analytics. Although all these concepts have much in common, there are significant differences. To avoid confusion of concepts and to clarify the meaning of data mining, the Passionned Academy training programs and the Data Science book work with an unambiguous data mining definition. This also applies to our data mining course.

Data mining is finding connections, patterns, and correlations in structured data using machine learning, statistics, and database techniques.

The goal of data mining is to gain new insights that are “hidden” in the data and to acquire new knowledge. This knowledge can then be used to make better decisions and to further improve and/or innovate processes.

Figure 1: What is data mining? Combining data from different sources, understanding, analyzing, mining, and clearly visualizing the results with the goal of making better decisions.

Data mining meaning: a little bit of history

The roots of data mining go back to the 18th and early 19th centuries, when Bayes' theorem, probability theory, and regression analysis made their appearance as components of mathematics.

However, the term data mining as a concept only surfaced within database circles in 1990. While at first Knowledge Discovery in Databases, abbreviated as KDD, was still formally used, later on, the term data mining became more and more popular.

Given this history and the investigative nature of data mining, it is not surprising that data discovery is a generally accepted synonym for data mining. Data science is the collective term for the field in the broadest sense. Practitioners in industry and researchers at universities who are engaged in data mining are, logically, called data scientists.


Process mining and data mining

Process mining and data mining are very often mixed up in practice, or worse, lumped together. Nevertheless, the number of differences between the two concepts is greater than the number of similarities. One thing is certain. Process mining is not data mining and vice versa.

Let’s start with the most important similarities. Both data mining and process mining fall under the broad umbrella of Business Intelligence & Data Analytics. Moreover, both are increasingly using algorithms to discover hidden patterns, (causal) relationships, and irregularities. Now let’s talk about the differences.

Differences between data mining and process mining

Data mining and process mining overlap, but they also differ substantially. The illustration below clarifies this difference.

The focus in data mining is explicitly on the patterns within the data. For example, the American chain Pizza Hut tries to discover patterns in customer behavior. By using artificial intelligence in data mining settings, the company wants to recommend pizzas to customers based on the current weather and on where customers live or where they want to eat their pizza. The assumption is that a certain weather pattern goes with a preference for a specific pizza. Process mining, by contrast, focuses on the process itself: why is the pizza not delivered to the customer on time, and how are differences in baking time explained? The data sources used in process mining include event logs, audit trails, and time stamps.
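To make the process-oriented questions about delivery and baking time more tangible, here is a minimal sketch of how an event log with time stamps can be analyzed. The log entries, the activity names, and the `step_durations` helper are all hypothetical and invented for illustration; real process mining tools work on far larger logs and offer much richer analyses.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, time stamp). All values are made up.
EVENT_LOG = [
    ("order-1", "order received", "2024-01-05 18:00"),
    ("order-1", "baking started", "2024-01-05 18:05"),
    ("order-1", "baking finished", "2024-01-05 18:17"),
    ("order-1", "delivered", "2024-01-05 18:40"),
    ("order-2", "order received", "2024-01-05 18:10"),
    ("order-2", "baking started", "2024-01-05 18:30"),
    ("order-2", "baking finished", "2024-01-05 18:41"),
    ("order-2", "delivered", "2024-01-05 19:20"),
]


def step_durations(event_log):
    """Return, per case, the minutes spent between consecutive activities."""
    cases = defaultdict(list)
    for case_id, activity, stamp in event_log:
        cases[case_id].append((datetime.strptime(stamp, "%Y-%m-%d %H:%M"), activity))

    durations = {}
    for case_id, events in cases.items():
        events.sort()  # order the events of one case by time stamp
        steps = {}
        for (start, activity), (end, next_activity) in zip(events, events[1:]):
            minutes = (end - start).total_seconds() / 60
            steps[f"{activity} -> {next_activity}"] = minutes
        durations[case_id] = steps
    return durations


durations = step_durations(EVENT_LOG)
for case_id, steps in durations.items():
    slowest = max(steps, key=steps.get)
    print(f"{case_id}: slowest step is '{slowest}' ({steps[slowest]:.0f} min)")
```

In this made-up log, baking takes roughly the same time for both orders, while the hand-over to delivery is what slows things down. That is the kind of "why" question an event log can help answer.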

Figure 2: Process Mining lies at the intersection of Business Process Management (BPM) and data mining.

ICS: a textbook example of data mining

Data mining has already helped many organizations and companies to work smarter. ICS is perhaps not the best-known example, but it is a very good one. Every part of the process, from bringing in new customers to retaining them, from acceptance to stimulating the use of credit cards, is underpinned by data mining algorithms. These greatly increase the effectiveness of the process components. For example, the number of fraudulent transactions fell by as much as 50% in one year and card use increased by 20%. This was not only in the interest of ICS but also of the customer.

Another difference is that data mining traditionally works with more or less static tables of data, while process mining is now able to monitor business processes in real-time. In data mining, moreover, chance plays a major role, whereas with process mining you can also analyze a predefined problem. Data mining expressly looks for general patterns, while process mining looks for causal connections. Or as the platform TechTarget describes it: “Data mining is more concerned with the what – that is, the patterns themselves – while process mining seeks to answer the why.”

Text mining vs data mining

The field of data mining is considerably better known than that of text mining, also known as text analytics. With the appointment of Prof. Jan C. Scholtes, the first professor of Text Mining in the Netherlands, this changed. In his inaugural lecture, he highlighted the difference between data mining and text mining.

Figure 3: The different steps in the text mining process.

Data mining, according to Scholtes, is the analysis of transactional data contained in relational databases. Think of credit card payments or debit card transactions. Various additional characteristics can be attached to such transactions: date, location, age of credit card holder, salary, and so on. “Using the combination of this data, you can then determine patterns of interest or behavior. Text mining is about analyzing unstructured information and extracting relevant patterns and characteristics. Then, using those patterns and characteristics, you can search better, analyze data more deeply, and get insights faster that would otherwise often remain hidden.”

Finding without knowing exactly what you’re looking for

Text mining, then, is finding connections, patterns, and correlations in unstructured data such as text. As with data mining, the goal here is also to gain new insights and knowledge. “Finding, without knowing exactly what you are looking for, or finding what doesn’t seem to be there,” is how Scholtes sums up his field in his inaugural lecture.

Figure 4: Text mining use cases in various industries.

One of the first successful commercial applications of text mining within business, according to Scholtes, is analyzing warranty issues in the automotive and consumer electronics industries. The application consists of analyzing repair reports from dealers to discover early recurring patterns of warranty problems. Other examples of applications of text mining are in the broad area of:

  • Fraud detection
  • Crime detection
  • Intelligence analysis
  • Sentiment measurement on social media
  • BI applications, such as competitive intelligence and customer analytics
  • Clinical research and other biomedical applications
  • Spam filters
  • E-discovery
  • Due diligence investigations
  • Compliance investigations by regulators
  • Bankruptcy investigations
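The warranty example mentioned above can be illustrated with a very small sketch: counting which word pairs recur across free-text repair reports. The reports, the stop word list, and the helper names are hypothetical and chosen purely for illustration. Real text mining involves considerably more linguistic processing than this.

```python
import re
from collections import Counter

# Hypothetical free-text repair reports; invented for illustration only.
REPAIR_REPORTS = [
    "Customer reports rattling noise, replaced left window motor.",
    "Window motor failed again after two months, replaced under warranty.",
    "Battery drained overnight, replaced battery.",
    "Left window motor stuck, customer unhappy.",
]

# A tiny, hand-picked stop word list (an assumption, not a standard list).
STOP_WORDS = {"a", "after", "again", "the", "under", "two", "customer", "reports"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def recurring_word_pairs(reports, min_reports=2):
    """Count neighboring word pairs (after stop word removal) per report.

    Each pair is counted at most once per report, so the result shows in how
    many different reports a pair occurs.
    """
    counts = Counter()
    for report in reports:
        words = tokenize(report)
        counts.update(set(zip(words, words[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_reports}


recurring = recurring_word_pairs(REPAIR_REPORTS)
for pair, n in sorted(recurring.items(), key=lambda item: -item[1]):
    print(" ".join(pair), "appears in", n, "reports")
```

Even on four invented reports, the pair "window motor" stands out as a recurring theme without anyone having searched for it explicitly.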

See also the section below describing some examples and applications of data mining. Also follow our annual trend article on the most important developments in BI, big data, data mining, machine learning, and data science. If you are serious about data mining or text mining, enlist the help of the text and data mining experts at Passionned Group.


Business intelligence vs data mining

How can you put data mining in a business intelligence perspective? Data mining is a distinct field within the business intelligence manager’s domain. While the definitions, purpose, scope, and focus differ, ideally the two (BI manager and data scientist) work together as a team.

Data mining focuses on exploring and formatting data, while business intelligence focuses on interpreting and presenting data to support managers in their decisions.

There is another difference: data mining is focused on finding new KPIs while business intelligence measures, monitors and visualizes the progress of existing KPIs. Data mining uses specific data sets to explore unstructured data, while the point of leverage for business intelligence is the relational databases and the structured data stored in them.

Difference between data warehouse and data mining

Professionals who do not have a business intelligence background sometimes confuse the terms data warehousing and data mining, even though there is a substantial difference. Data warehousing is a process of storing structured data from one or more sources in a data warehouse (a central repository). Data mining, on the other hand, is a process of distilling meaningful data and valuable business insights from a database or data warehouse. In other words, you can only get started with data mining if there is a well-integrated large database or data warehouse.

Data mining vs data science

People sometimes wonder about the content of artificial intelligence in data mining. The answer to this question is not so easy to give. What is certain is that data mining makes use of advanced algorithms that try to discover patterns in the background. Some suppliers of data mining software claim to have put hundreds of algorithms to work. In this sense, algorithms, data mining, machine learning, and deep learning always have something to do with each other directly or indirectly. The same goes for artificially created contradictions like data science vs data mining and data analytics vs data mining. There are always points of contact.

Data mining: more technique than science?

Data science is a scientific field, while data mining is more of a technique to support the business. The two are strongly interdependent and do have similarities, but this does not justify mixing up all the terms, as you sometimes see with data mining analytics. If you’re looking for clarification, ask for an inspiration or moderation session at Passionned Group or follow one of our courses.

Difference between big data and data mining

Contrasting big data and data mining is not really helpful for a better understanding of data mining. Big data is simply a raw material for data mining. Nothing more and nothing less.

Different types of data mining tools

All major enterprise software vendors, such as SAP, Oracle, and IBM, offer different data mining software tools, also called data discovery tools. Software vendors specializing in Business Intelligence also offer such tools on a modular basis. For a current, comparative product review of data discovery tools and an R data mining tool, consult the Business Intelligence & Data Analytics Guide™ 2024.

Consider open-source data mining software

In addition to Software-as-a-Service vendors, there are also several providers of open-source data mining software. Some vendors specialize in specific data mining software, such as text mining, or in certain data mining techniques, such as classification, clustering, regression, association, outlier detection, and so on. Either way, recognizing patterns in big data is key.
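As an illustration of one of the techniques listed above, outlier detection, the following minimal sketch flags unusual values using the median and the median absolute deviation (MAD). The transaction amounts are invented, and the constant 0.6745 and threshold 3.5 are commonly quoted rule-of-thumb values for this "modified z-score", used here as assumptions rather than as settings taken from any particular data mining tool.

```python
from statistics import median

# Hypothetical card transaction amounts in euros; invented for illustration.
AMOUNTS = [12.5, 9.9, 14.2, 11.0, 13.7, 10.4, 250.0, 12.1]


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score is based on the median and the median absolute
    deviation (MAD), which are less distorted by extreme values than the
    mean and the standard deviation.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the values equal the median; the score is undefined.
        return []
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


outliers = mad_outliers(AMOUNTS)
print("Flagged as unusual:", outliers)
```

A flagged value is only a candidate for closer inspection. Whether it really indicates fraud or an error still requires domain knowledge and some statistical care.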

Figure 5: Different techniques that support data mining tools

Data mining tools are also becoming more user-friendly. Vendors claim that users without any programming experience can also achieve appealing results with data mining software. Features advertised as “no code” or “low code”, along with drag-and-drop functionality, are said to promote that user-friendliness.

Some data mining software vendors specialize in a particular sector, for example, agriculture, industry, or education. Specific market knowledge of the software supplier can be helpful in quickly understanding the business problem that you want to solve with a data mining tool.

Selection of data mining software: 5 tips

Providers of data mining software sometimes advertise with compound terms like “predictive data mining software” or “data mining predictive analytics” in order to broaden their scope and optically increase their market exposure. However, the lack of clear definitions in vendors’ product and service portfolios makes the data mining market less transparent, clouds substantive discussions, and disrupts an orderly, objective process of vendor and tool selection.

Definitions, therefore, do matter. Data mining, the detection of patterns, is very different from predicting patterns and processes as done in data analytics and process mining respectively. After pattern detection, data analytics is usually the next step in the process of making business processes more predictable.

Clear definitions help to select the right vendor and data mining tool to solve your specific business problem. The following 5 tips for doing business with data mining software vendors can save you from a gaffe or miscalculation:

  1. Don’t fall for the familiar sales tricks. Software suppliers often work with different basic and premium versions. Text mining, for example, can only be used if you purchase a premium version. Be aware of this.
  2. Use your negotiating power. Realize that software vendors are almost always willing to offer discounts, especially when purchasing large numbers of licenses. After all, sellers of data mining software also have to meet their targets. So-called street prices always deviate from the official list prices or brochure prices.
  3. Be critical of the number of licenses. Remember that every extra module, functionality, or feature usually has a price tag. Not every user of data mining software needs to have all the functionalities and plug-ins at their disposal. Saving on licensing costs is always an option.
  4. Beware of sharp offers. There are quite a few providers on the market offering so-called free versions of data mining software, sometimes in English. Although this may seem attractive at first, you should always be aware that behind these offers there is usually a subscription model that involves upgrading and/or premium versions.
  5. Get the most out of your investment. Remember that good documentation of the data mining software and data mining is very important and is often lacking in freeware. But still: you can’t learn data mining from a book. Practice makes perfect. So follow a relevant, supporting training course in data mining.

How does the data mining process work?

When it comes to setting up your data mining process, data mining clustering, or creating a data mining process diagram, you can hardly ignore the so-called CRISP-DM standard. Since the late 1990s, this has been the widely accepted de facto standard for data mining.

Figure 6: The Cross-industry standard process for data mining (CRISP-DM).

The CRISP-DM protocol is not built on a theoretical, academic foundation or based on purely technical principles, but is grafted onto everyday practice. The standard was not developed from the ivory tower, but describes in detail how to carry out data mining projects.

The Cross-industry standard process for data mining (CRISP-DM) describes the standard process of data mining clearly in the following six steps.

  1. Getting to the bottom of the business question. During this first step, you will clearly formulate the objectives of the data mining project and translate the requirements from a business perspective. The result is a problem statement and a preliminary plan of action aimed at goal realization. This is where the business consultant plays a crucial role.
  2. Understanding the data. The second step of data mining is all about collecting the data. You further engage in activities aimed at becoming completely familiar with the data. You recognize data quality problems and gain initial insights. You discover interesting subsets of data and formulate hypotheses about hidden information.
  3. Data preparation. Based on the first raw data, you work towards a final dataset that will serve as input for the data mining model. You perform various preparatory tasks, such as selecting tables, records, and attributes, and transforming and cleaning data. You repeat the tasks as necessary in any order. This step is typically the responsibility of the data analyst.
  4. Modeling. In this phase, various modeling techniques are selected and applied. You calibrate the parameters toward optimal values. Typically, there are several techniques for the same data mining problem. Some techniques have specific requirements for the shape of the data. Therefore, it is often necessary to return to stage 3 of data preparation. The data scientist creates the model.
  5. Evaluation. A data mining model has now been built that appears to be of high quality from a data analysis point of view. The task now is to thoroughly evaluate the model step by step. Are you sure you are going to achieve the business goals with this model or is revision necessary? Have you not overlooked any important issues? Finally, you make a decision about how to use the data mining results.
  6. Implementation in production. Usually, it is the customer, not the data analyst, who carries out the implementation steps. But even if the analyst does perform the implementation, it is important for the customer to understand in advance what actions they must perform to actually use the data mining model created. Depending on the requirements, the implementation phase can be as simple as generating a report, or as complex as implementing a repeatable data mining process throughout the organization.

The creation of the model is generally not the end of the project. Even if the goal of the model is to increase knowledge of the data, you will need to organize and present the knowledge gained in a way that the client can use. This often involves applying “live” models within an organization’s decision-making processes, for example, personalizing web pages in real-time or repeatedly scoring marketing databases.
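The middle steps of the CRISP-DM process described above (data preparation, modeling, and evaluation) can be sketched in a few lines. This is only a toy illustration under stated assumptions: the records, the choice of a straight-line model, and the `MAX_ACCEPTABLE_ERROR` business threshold are all hypothetical, and CRISP-DM itself does not prescribe any particular technique.

```python
# Hypothetical raw records: (advertising spend, units sold). None marks a
# missing value. All numbers are invented for illustration.
RAW = [(1.0, 12.1), (2.0, 13.8), (None, 15.0), (3.0, 16.2), (4.0, 17.9),
       (5.0, None), (6.0, 22.3), (7.0, 23.8)]

# Hypothetical business requirement agreed on in step 1.
MAX_ACCEPTABLE_ERROR = 1.0


def prepare(records):
    """Step 3, data preparation: drop records with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Step 4, modeling: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Step 5, evaluation: average absolute prediction error."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in points) / len(points)


clean = prepare(RAW)
train, holdout = clean[:4], clean[4:]  # simple split; real projects usually shuffle
model = fit_line(train)
error = mean_abs_error(model, holdout)  # measured on records the model has not seen

decision = "use the model" if error <= MAX_ACCEPTABLE_ERROR else "revise the model"
print(f"mean absolute error on holdout: {error:.2f} -> {decision}")
```

The point of the sketch is the order of the steps and the final check against a business goal, not the model itself. In practice you often loop back from modeling to data preparation several times.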


Some examples and applications of data mining

The data mining examples or practical use cases for data mining have been there for the taking, so to speak, since their emergence in the 1990s. Organizations do not always publicize them, because they are competitively sensitive.

The most appealing data mining example undoubtedly comes from the film Moneyball, where data mining turned the traditional baseball world upside down. In the extensive series of articles and books on data mining, a number of classic use cases of data mining applied in business and government are now well documented.

We briefly summarize some of the applications of data mining below:

  • Data mining algorithms identify (credit card) fraud in financial services by detecting (anomalous) patterns in payment behavior and human behavior.
  • Supermarket chains and web stores use data mining methods such as associative techniques to discover consumer patterns: for example, which items are often bought together.
  • Within the framework of precision agriculture and so-called smart farming, weather patterns and data on crop growth and fertilization, among other things, are analyzed using data mining techniques.
  • Why does a specific machine part wear out much faster than other parts? In factories, data mining methods are used to detect certain patterns and errors in the manufacturing process and to plan preventive maintenance.
  • Abuse and fraud in healthcare are uncovered when you use data mining software and look for anomalous patterns in patient behavior, medication use, and billing behavior, and compare them to protocols.
  • By deploying data mining and big data analytics in hospitals on a large scale, disease patterns, for example in hereditary diseases, come to light and the quality of care can eventually be raised to a higher level.
  • In telecom, a data mining model has been used for decades by telecom providers to analyze patterns in calling behavior and switching behavior of telecom customers with the aim of retaining customers longer.
  • Data mining software is used by the government to monitor and analyze the tax behavior of citizens.
  • Within the public space, governments deploy data mining software and a data mining dashboard to discover patterns in the context of crowd control, traffic regulation, and crime prevention.
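The supermarket example in the list above (which items are often bought together) can be sketched with a simple pair count. The baskets and the `min_support` threshold are invented for illustration, and the code only looks at pairs of items. Dedicated association rule algorithms handle larger item sets and far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; invented for illustration.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "butter", "chips"},
    {"milk", "chips"},
]


def pair_rules(baskets, min_support=0.4):
    """Return rules (a, b, support, confidence) for frequently co-bought pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets that contain both items
        if support < min_support:
            continue
        # Confidence of "a -> b": of the baskets with a, the share that also has b.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


rules = pair_rules(BASKETS)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Support tells you how common a combination is overall, while confidence tells you how often the second item accompanies the first. Both are needed before drawing conclusions from a pattern.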

Data mining consultancy

Do you have big data and want to set up a data mining model, but don’t know where to start? Then the data mining consultancy branch of Passionned Group can help you. We can deliver an interim data mining expert on short notice who can support you in every phase of the data mining process. We can also organize an in-company data mining course for a select group of colleagues who need to get started.

Would you like more information about deploying one or more data mining consultants or our Data Mining in R training course? Feel free to contact the consultants of the Passionned Group. We would love to help you out.

About Passionned Group

Passionned Group is a professional company specializing in data mining & data science. Our consultants help smaller and larger organizations with the digital transformation into intelligent, data-driven organizations. Every other year we present the Dutch BI & Data Science Award™ to the Smartest Organization of the Netherlands.
