What is Machine Learning: algorithms, models, applications, and techniques including a roadmap

Author: Daan van Beek
Machine Learning expert and author of the Data Science Book

Machine learning is firmly in the spotlight because machine learning applications offer organizations great potential for improvement. At the same time, the explainability of machine learning algorithms in the private and public domain is under discussion. For years, BI managers used dashboards, reports, and data analytics to uncover the large and relatively simple connections within organizations. Machine learning models now uncover, in addition to those big connections, the smaller, more subtle, and more complex patterns. But what is machine learning in essence? And how do you build a reliable, working machine learning model? Building one is labor-intensive, challenging, and time-consuming. Passionned Group recommends the right machine learning tools, implements them, provides a customized Machine Learning course, and delivers interim support.

What is machine learning?

After analyzing different machine learning definitions, we formulated our own conclusive definition, which is simultaneously our main takeaway from this page:

Machine learning is a collection of self-learning algorithms that are able to independently discover patterns based on structured and unstructured data and make reliable predictions that support organizations’ decisions.

In this article, we provide a clear explanation and a practical definition of machine learning, an overview of the available machine learning models and applications, the most important machine learning trends, and, last but not least, suggestions for the most suitable basic course in Machine Learning as a stepping stone to a master’s in Data Science & Machine Learning.

Self-learning Machine Learning Algorithms

By now machine learning has become a common collective term, an umbrella term for so-called self-learning algorithms. But what are (learning) algorithms anyway? What is the link between business intelligence and machine learning? What is the difference between AI and machine learning?

Figure 1: Machine learning models are created by training them with algorithms. These are self-learning computer programs that preserve the optimal end result in a machine learning model.

Machine learning is booming because companies and institutions have access to more and more data. To understand a digital society dominated by big data and machine learning, an overview of some clear machine learning definitions is indispensable.


Some machine learning definitions

Machine learning was originally defined in the 1950s as “a field that lets computer algorithms learn without having to explicitly program them” (Samuel, 1959). Another common definition is the following: “Machine Learning is the study of computer algorithms that allow computer programs to improve automatically through experience.” (Mitchell, 1997).

Both definitions may sound somewhat dated, but in essence they do justice to the self-learning nature that characterizes both older and modern, advanced algorithms. Algorithms change and can keep developing. Machine learning is a dynamic learning process: the algorithms learn by recognizing patterns in the data.

According to one of the largest software companies in the world, machine learning (ML) is a form of artificial intelligence (AI) aimed at building systems that can learn from processed data or use data to perform better. This definition is practical, dynamic, performance-oriented, and also emphasizes the learning nature, only lacking the predictive capabilities of algorithms.

Big data, data science, machine learning, and deep learning are often mentioned together. The same goes for data analytics and machine learning. However, lumping all these terms together, without specifying the differences and/or similarities between AI, data science, machine learning, and BI, quickly creates confusion.

What is the similarity between BI and machine learning?

The similarity between machine learning and business intelligence is that both fields make intensive use of data. BI is the continuous process by which organizations collect and record big data in a focused way, combine it, analyze it, and consistently use the resulting (predictive) information and knowledge to make better decisions. Machine learning algorithms have that same focus, but dig deeper and are able to also discover the more complex patterns that remain invisible to the human eye. Passionned Group’s senior consultants are able to build and successfully implement these kinds of algorithms.

What is the difference between machine learning and AI?

Artificial intelligence is the umbrella field that deals with developing intelligence outside the human brain. Machine learning, on the other hand, is a specialized part of AI that deals with developing self-learning algorithms and techniques that enable software systems to learn independently.

What is the difference between data science & machine learning?

The difference between data science and machine learning can be explained as follows. Data science, according to Wikipedia, is an interdisciplinary research field that uses scientific methods, processes, and systems to extract knowledge and insights from both structured and unstructured data. In this sense, you can equate data science and BI. Data science, however, also draws on machine learning techniques such as classification and cluster analysis, alongside related fields such as data mining, databases, and data visualization.

What is the link between machine learning and data analytics?

Data analytics is the field that deals with analyzing large amounts of data within organizations. Applying machine learning to data analytics allows you to better inform organizational decisions. Four main streams can be distinguished within the field of data analytics: descriptive, diagnostic, predictive, and prescriptive analytics. Machine learning, in this context, is primarily concerned with predictive analytics, that is, with predictions.

Let the definitions sink in, but in the meantime remember this sequence from data scientist David Robinson (2017): data science delivers insights, machine learning delivers predictions, and artificial intelligence delivers guidelines for autonomous actions.

3 different types of machine learning models

Machine learning models fall into three main categories, based on the learning process that the self-learning algorithms go through.

Supervised learning

Supervised learning occurs when the algorithm learns under the supervision of a teacher or expert. The teacher labels or tags the input data and tells the algorithm which target variable to look for. They give away the correct answer in advance.

In a classification of vacation photos, for example, the supervisor indicates which photos contain a church and which do not. Slowly, the algorithm starts to recognize the photos with churches. You keep this up until the algorithm has been sufficiently trained and achieves high scores. The result is captured in a machine learning model, which can then assess new photos, in real time or otherwise.

Figure 2: A schematic representation of a supervised learning process to recognize fruit.
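As a minimal sketch of supervised learning, the Python snippet below uses a 1-nearest-neighbor classifier, which is only one of many possible algorithms. The two numeric features per photo and all values in TRAINING_DATA are hypothetical; in practice they would be extracted from the images.

```python
import math

# Hypothetical labeled training data: (feature vector, label).
# The supervisor has provided the correct answer (the label) for each example.
TRAINING_DATA = [
    ((0.9, 0.8), "church"),
    ((0.8, 0.9), "church"),
    ((0.7, 0.7), "church"),
    ((0.1, 0.2), "no church"),
    ((0.2, 0.1), "no church"),
    ((0.3, 0.3), "no church"),
]


def predict(features, training_data=TRAINING_DATA):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# Assess two new, unlabeled photos.
print(predict((0.85, 0.75)))
print(predict((0.15, 0.25)))
```

The labels play the role of the teacher: without them, the function would have nothing to compare a new photo against.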

Unsupervised learning

In unsupervised learning the algorithm receives a set of data and, without supervision (i.e. without a teacher or expert), looks for associations, categories, and clusters. You do not give it a target variable. The algorithm automatically tries to find structure in the data set and can extract features, for example in which neighborhood a house is located.

The outcomes of this category of machine learning models can be less predictable (there is no right answer) than with supervised learning. Therefore, the success of this type of algorithm is also often difficult to measure. Nevertheless, the next breakthrough in machine learning is expected to come from this angle.

Figure 3: Unsupervised learning looks for clusters or patterns by itself. All you have to do is offer the data.
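To make clustering without a target variable concrete, here is a small sketch of the k-means algorithm in plain Python. The house data and the choice of two starting centroids are assumptions made for illustration; k-means is one clustering technique among several.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical houses as (living area in m2, price in thousands). No labels are given.
HOUSES = [(50, 150), (55, 160), (60, 155), (150, 480), (160, 500), (155, 510)]

centroids, clusters = k_means(HOUSES, centroids=[HOUSES[0], HOUSES[3]])
print(centroids)
print(clusters)
```

Nobody told the algorithm which houses belong together; the two groups emerge from the distances in the data alone.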

Reinforcement learning

In this form of machine learning you stimulate the creativity of a so-called “agent”, which consists of an algorithm or set of algorithms that each perform a specific task or have been given a task as a whole. The designer gives the agent only the rules of the game. Through clever trial and error (learning from mistakes), the agent seeks its way to a solution through the mountain of data. After all, many roads lead to Rome.

When the agent takes a piece of the right road, it gets a “reward”; when it takes an unfavorable turn, it gets a “punishment”. This form of machine learning is used, for example, in self-driving cars that must find the optimum between speed, safety, and driving comfort in each specific situation, and in controlling the arms and legs of a robot.

In addition to supervised and unsupervised learning, there is also a hybrid intermediate form: semi-supervised learning. This type of learning combines supervised and unsupervised learning. Do you also want to work with (un)supervised learning models or reinforcement learning? Then contact us here.

Figure 4: With reinforcement learning, you combine in one environment the data, a machine learning model, decision-making and actions. With this, you design and build a complete, self-learning, and powerful feedback loop.
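The reward and punishment loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The corridor environment, the reward values, and the learning parameters below are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5                 # corridor positions 0..4; position 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount factor (assumed values)


def step(state, action):
    """Rules of the game: move one position; reward at the goal, small penalty otherwise."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True    # reward for reaching the goal
    return next_state, -0.01, False     # small punishment for every extra step


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # pure trial and error (random exploration)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned policy: the action with the highest estimated value in each position.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that walking right is the solution. It only receives rewards and penalties, and the preference for "right" follows from the values it accumulates.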

Key features of machine learning models

Machine learning models can be recognized by a number of common characteristics. The most important characteristics are summarized here:

  • there is a larger collection of data with many different characteristics or big data
  • the discovery of complex connections and patterns that are often not visible to a human being (at a glance)
  • the problem definition is predetermined, but the data needed for the problem is not
  • sometimes the target variable is known (supervised) and sometimes not (unsupervised)
  • good machine learning models are computation and knowledge-intensive

Difference between deep learning and machine learning

Deep learning is a subcategory of machine learning. It is loosely modeled on the workings of the human brain. The term “deep” refers to the number of layers in the neural network, that is, its depth. Artificial neural networks are thus inspired by the biological neural networks within the human brain.

The applications of deep learning are diverse. They are successfully used in areas such as image and speech recognition, in translation tasks, or in the assessment of medical scans. For example, a deep learning algorithm is able to recognize different forms of dementia early.
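What "layers" means can be shown with a tiny network of two stacked layers. This is only a sketch: the weights below are hand-picked for illustration rather than learned, and real deep learning models have many more layers and neurons and learn their weights from data.

```python
def step_activation(x):
    """A neuron 'fires' (outputs 1) when its weighted input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation, per neuron."""
    return [
        step_activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, chosen only for illustration.
HIDDEN_WEIGHTS, HIDDEN_BIASES = [[1, 1], [-1, -1]], [-0.5, 1.5]  # an OR and a NAND neuron
OUTPUT_WEIGHTS, OUTPUT_BIASES = [[1, 1]], [-1.5]                 # an AND neuron


def xor_network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)   # layer 1
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]    # layer 2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

A single layer of this kind cannot compute the exclusive-or function; stacking a second layer makes it possible, which hints at why depth adds expressive power.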

10 popular machine learning applications

You can deploy a machine learning model to support and improve critical business decisions in marketing, operations, HR, finance, and sales. Also, machine learning applications are used in general to improve customer relationships, predict purchasing patterns, reduce waiting times, or to improve social processes in the public domain. Below we illustrate five concrete applications of machine learning in business situations and five examples of machine learning in the public domain.


  1. Recommendations. The algorithm that recommends movies on Netflix, or products on Amazon, friend suggestions on Facebook, and contact suggestions on LinkedIn is well-known. But web shops also make intensive use of machine learning for product personalization, contextual search results, chatbots, virtual assistants, and artificially generated photo models.
  2. Irregularity detection. This form of machine learning focuses on exceptions in the broad sense of the word. For example, you can use irregularity detection to detect fraud or to filter spam messages in mailboxes. You look specifically for statistical irregularities in your data, the so-called outliers.
  3. Dynamic pricing. With dynamic pricing, machine learning algorithms automatically vary the price. That price depends, among other things, on competitors’ prices, the time of day, week, or month, the demand, and the available supply. Dynamic pricing is widely used in the tourism industry by airlines, hotels, and so on.
  4. Predictive maintenance. Predictive maintenance is one of the most rational and appealing machine learning applications. It prevents unplanned downtime of expensive machines and installations in industry, in the maritime sector, within the civil engineering sector, the energy sector, the oil and gas sector, and so on. You also save on unnecessary maintenance, because you do it exactly at the right time.
  5. Process mining. In process mining, specialized algorithms, such as the exotic-sounding alpha miner, fuzzy miner, heuristics miner, transition system miners, and genetic algorithms, are applied to data from event logs. The goal is to discover process anomalies and to better predict future process behavior by simulating it with machine learning software.
  6. Law enforcement. In the public sphere, the police use camera surveillance and machine learning data, such as the data from intelligent cameras and microphones, to detect crimes in real time and to map crime hotspots, and predict where and when crime will occur (predictive policing). Also, consider automatic recognition of drivers with a phone in their hand using deep learning.
  7. Traffic congestion control. Adaptive signal control is a system for automatically adjusting traffic lights according to traffic density. The technology works on the basis of classical machine learning algorithms. In addition, as part of the smart city concept, several municipalities are experimenting with smart lampposts and pedestrian crossings that light up. The concept is equipped with 5G technology, sensors, charging stations, and lighting scenarios.
  8. Algorithmic Decisions. All ministries, administrative bodies, and implementing organizations use algorithms and machine learning applications to a greater or lesser extent for their decisions. In fact, some legal regulations can only be implemented with the application of decision rules to data. This is especially true for the implementing bodies that carry out financial regulations, such as the Tax and Customs Administration when it comes to property taxes, real estate valuations, or motor vehicle taxes. But also consider the automatic registration of traffic violations (for example, mobile phone use in the car) and the automatic handling of traffic fines.
  9. Robotic jurisprudence. Machine learning allows you to search jurisprudence online and discover patterns in it. This may one day be a stepping stone to a robot judge passing sentences completely automatically. Because it is difficult for litigants to gain insight into how the algorithm works, this will remain in the future for the time being. However, experiments are underway in the US with machine learning software that estimates the risk of recidivism when a prisoner or suspect is released on bail.
  10. Roboticized services. Municipalities are deploying both physical robots and chatbots to improve services to citizens. For example, Robotic Process Automation (RPA) eases the administrative workload of municipal officials, and physical robots in municipal offices show citizens the way. RPA and machine learning are made for each other.

The list is certainly not exhaustive. If you are curious about the possible machine learning applications in your sector, or are interested in operating your own machine learning business model, please contact us.
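To illustrate one of the applications above, irregularity detection, the sketch below flags statistical outliers with a simple z-score rule. The transaction amounts and the threshold of 2.0 standard deviations are assumptions; production fraud detection typically relies on far richer features and models.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts; one of them is clearly unusual.
TRANSACTIONS = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
print(find_outliers(TRANSACTIONS))
```

Note that a single extreme value also inflates the mean and the standard deviation, which is one reason a modest threshold is used on such a small sample.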
The Artificial Intelligence handbook: Machine learning and algorithms are covered extensively in this all-new edition of the AI book. Learn how these powerful techniques can contribute to a more intelligent and agile organization and how to implement machine learning in order to make faster and better decisions. Over 25,000 copies of this book have already been sold.

How machine learning works: an 8-step plan

Building, implementing, training, testing, fine-tuning, evaluating, and bringing machine learning models, such as a predictive model, into production involves a lot. In the following roadmap, we have summarized the eight most important steps for you. This practical roadmap is based on our data scientists’ years of experience with clients in various industries.

The step-by-step plan is not set in stone but is more intended as a checklist to ensure that you have not skipped any crucial steps. If you are in doubt about the right approach or have detailed questions, please contact one of our data science specialists directly.

Figure 5: A roadmap for developing machine learning models and putting them into production.

  1. Formulate a clear problem statement or business case. Preferably do this in the form of a research question, for example: how will the prices of computer chips develop in the next two years? And what explanatory variables play a role in this? Another example: how quickly does a (cold) virus spread across certain parts of the world? Or how can a real estate company predict and plan the necessary maintenance of houses?
  2. Collect as much historical data as possible on the research question. Do not limit yourself to internal company data, but also involve external data suppliers in the data collection, such as Statistics Netherlands (CBS), the Chambers of Commerce, sector organizations, and so on. Consult as many different internal and external sources as possible to build up a data collection that is as representative as possible. Ask yourself what data you might still be missing. In this way, you can avoid biases in the machine learning model at this crucial stage.
  3. Prepare your data and make it suitable for modeling. Before you can use the collected data for machine learning, you need to transform the raw data into training data. Data arrives in different file formats, and you usually have to deal with issues of data integrity, consistency, and logic. A processing step (data cleaning) is therefore necessary to bring all collected data to the same level.
  4. Choose the right algorithm. Choosing and optimizing the right algorithm or an ensemble of algorithms for your research question or business case is difficult, especially for laymen. Algorithms are classified into different categories, also called families, such as recommendation systems, classification systems, clustering, anomaly detection, regression models, and text analysis. Each algorithm is designed to solve a different type of research question. Still, it can pay to have different types of algorithms compete with each other to ultimately achieve optimal prediction results through trial and error.
  5. Train the machine learning model. Thanks to the cleanup of the raw data, you can now get to work with your training data. A common way to train models is to use a training script. During this training process, an algorithm is presented with the training data and independently searches for a way to arrive at a correct answer or solution. In other words, there is an input-output model.
  6. Test the machine learning model in practice. To ensure that the machine learning model predicts correctly, it must be tested. A separate test dataset, split off from the data beforehand, is therefore used to check the operation and accuracy of the model: how accurately can the model predict the outcome? Both overfitting and underfitting can occur, which means the model predicts outcomes less accurately than expected. Don’t accept a bad outcome.
  7. Evaluate and improve the machine learning model. After calculating the accuracy, you can use this step to improve the model further. Use proven statistical techniques such as K-fold cross-validation to evaluate the performance of the algorithms. You split the training data into K subsets (“folds”). You train the model on K-1 folds and validate it on the remaining fold, and you repeat this K times so that each fold serves as the validation set exactly once. Finally, you measure the performance of the resulting model on the final test data, which has been kept apart until this point.
  8. Publish the machine learning model and explain it. After you have trained, tested, evaluated, and improved your machine learning model, it is time to roll it out to an acceptance and production environment. Only then will your work pay off. When doing this, don’t forget to explain how the algorithm was (broadly) created. For many colleagues without mathematical training, a machine learning model is probably a black box that they will greet with some suspicion.
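Steps 6 and 7 of the roadmap above can be sketched in a few lines of Python. The data set is synthetic, and the "model" is deliberately trivial (it always predicts the mean target value) so that the focus stays on how the data is split; the helper names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) pairs; each fold validates exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def train_model(rows):
    """Toy 'model': the mean target value of the training rows."""
    return sum(target for _, target in rows) / len(rows)


def mean_absolute_error(model, rows):
    return sum(abs(target - model) for _, target in rows) / len(rows)


# Hypothetical (feature, target) data set.
rng = random.Random(42)
data = [(x, 2 * x + rng.uniform(-1, 1)) for x in range(20)]
rng.shuffle(data)

# Step 6: hold out a final test set that is never used during training.
train_rows, test_rows = data[:16], data[16:]

# Step 7: K-fold cross-validation on the training rows only (K = 4 here).
scores = []
for train_idx, val_idx in k_fold_indices(len(train_rows), k=4):
    model = train_model([train_rows[i] for i in train_idx])
    scores.append(mean_absolute_error(model, [train_rows[i] for i in val_idx]))

cv_score = sum(scores) / len(scores)
final_model = train_model(train_rows)
test_score = mean_absolute_error(final_model, test_rows)
print(round(cv_score, 2), round(test_score, 2))
```

The cross-validation score guides improvements to the model, while the held-out test score is computed only once at the end, as an honest estimate of performance on unseen data.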

Learn about machine learning with AutoML

The most striking development in the field of machine learning tools is the rise of AutoML. Besides Google, Microsoft offers AutoML from its cloud via the Microsoft Azure platform. In addition, several larger and smaller software vendors offer platforms and machine learning tools for AutoML.

What is automated machine learning (AutoML)?

Automated machine learning, usually abbreviated to AutoML, is basically a do-it-yourself package for building your own machine learning model. Developing a machine learning model manually is time-consuming because a number of steps and tasks recur repeatedly. Also, comparing dozens of models and algorithms requires a lot of time and deep domain knowledge in practice. AutoML largely automates that process, allowing you as a user to experiment endlessly with the different types of parameters and algorithms. You can also train the machine learning model iteratively until it is ready for production. With AutoML, can you now leave machine learning entirely to laypeople? Far from it, because machine learning is and remains a real profession. It is a mathematically oriented study that is taught at universities and training academies, such as Passionned Academy, and assumes a certain basic knowledge.
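The core idea of comparing candidate models automatically can be sketched as follows. This is not the API of any AutoML product: the two candidate models, the data, and the function names are all hypothetical, and real AutoML tools search over far more algorithms and parameters.

```python
def fit_mean(rows):
    """Candidate 1: always predict the mean target value."""
    mean_y = sum(y for _, y in rows) / len(rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate 2: a straight line fitted with ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return lambda x: mean_y + slope * (x - mean_x)


CANDIDATES = {"mean baseline": fit_mean, "linear regression": fit_line}


def select_best(train, validation, candidates=CANDIDATES):
    """Train every candidate and keep the one with the lowest validation error."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(train)
        scores[name] = sum(abs(y - model(x)) for x, y in validation) / len(validation)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data that follows y = 2x + 1 exactly.
train = [(x, 2 * x + 1) for x in range(0, 10)]
validation = [(x, 2 * x + 1) for x in range(10, 14)]

best_name, scores = select_best(train, validation)
print(best_name, scores)
```

The loop is mechanical, which is why it can be automated. Deciding which candidates and which error measure make sense for the business problem still requires expertise.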

Benefits of AutoML

For now, mainly experienced data scientists, analysts, and developers can benefit from automated machine learning management. But non-experts may also be able to use machine learning models and techniques in the future. This is a major shift in the way we have used machine learning so far. AutoML shows the user a range of choices, such as selecting a training dataset, choosing a type of algorithm, training the model, and how to apply the model. For example, based on the trained model, AutoML will look for interesting relationships based on the data in the dashboard or report itself and present them to the user. The AutoML system also presents a report on how the different machine learning models chosen by the user are performing. AutoML can work as a standalone machine learning tool, but also as a component within data discovery tools.

Overfitting and underfitting of machine learning models

Tuning machine learning models properly is precise work. Both overfitting and underfitting can occur. Both phenomena cause the machine learning model to perform sub-optimally and fail to produce the desired reliable predictions.

Figure 6: Overfitting and underfitting illustrated for regression and classification models.

The two phenomena show up differently. An overfitted model makes correct and even highly accurate predictions or classifications on the training dataset, but fails when it is unleashed on a test dataset or actual data. An underfitted model already performs poorly on the training dataset itself:

  • Overfitting. The algorithm learned to see correlations in the training dataset that are not there at all in new data. This indicates overfitting or overtraining. The model was made too specific. It contains too many parameters and therefore cannot generalize well or deviate from previously found patterns. The model “remembers” the training data, for example when there are as many parameters as observations, rather than having learned to generalize. You can counter overfitting by feeding and training the algorithm with more (diverse) data, or by manually eliminating parameters.
  • Underfitting. In underfitting, the machine learning model is too general. It is too simple to capture the patterns in the data and cannot make specific predictions. In case of underfitting, you could extend the training period of the algorithm or allow the model more parameters.

Figure 7: Somewhere in the middle is a “sweet spot” where the outcomes of the machine learning model are most reliable.
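The two extremes can be made visible with a k-nearest-neighbor regression, used here purely as an illustration. With k = 1 the model memorizes every training point; with k equal to the size of the training set it always predicts the overall mean. The synthetic data and the values of k are assumptions.

```python
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, rows, k):
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in rows) / len(rows)


rng = random.Random(1)
# Hypothetical data: a linear trend plus random noise.
train = [(x, 3 * x + rng.gauss(0, 4)) for x in range(0, 20, 2)]  # x = 0, 2, ..., 18
test = [(x, 3 * x + rng.gauss(0, 4)) for x in range(1, 20, 2)]   # x = 1, 3, ..., 19

for k in (1, 3, len(train)):
    train_error = mean_squared_error(train, train, k)
    test_error = mean_squared_error(train, test, k)
    print(f"k={k:2d}  training error={train_error:8.1f}  test error={test_error:8.1f}")
```

With k = 1 the training error is exactly zero because each point is its own nearest neighbor, so only the test error reveals how well the model generalizes. With k equal to the full training set the model ignores x altogether and is wrong on training and test data alike. Comparing training and test error across settings is how you search for the sweet spot.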

Would you like help optimizing your machine learning model? Then contact one of the machine learning experts at Passionned Group.

Machine learning trends 2024

In addition to the AutoML megatrend mentioned above, we have collected the following machine learning trends for you. We will only touch on them briefly here. If you have any questions or would like more information about our Machine Learning training, please contact us now.


  • Calls for legislation and regulation such as the creation of a National Algorithm Registry to enable effective comparison and assessment of machine learning initiatives and algorithms will grow louder.
  • More and more voices will be raised to introduce a seal of approval for algorithms, or even more extreme, a so-called “data kill” button. This is with a view to the desired transparency of machine learning and explainability of algorithms.
  • The government will try to stop the so-called brain drain of machine learning researchers and experts by improving their (working) conditions.
  • With the advance of the Internet of Things and embedded microcontrollers, a new subcategory of machine learning models is emerging: TinyML, which allows you to analyze sensor data on devices with extremely low power consumption.
  • So-called Generative Adversarial Networks (GANs) produce new content, such as non-existent faces, human voices, texts, news reports, and audio recordings. Governments will try to curb the dark side of GANs, the so-called deepfakes.

Passionned Group, thanks to its extensive network, is able to deliver qualified data scientists on short notice to help you develop and implement machine learning models and applications. Click here to inquire about possibilities and conditions.


Machine learning techniques

Machine learning models use, among other things, different basic techniques (such as regression and clustering) and different types of algorithms. The algorithms range from simple functions, business rules, decision trees, and cluster analysis, to linear regression, logistic regression, so-called probabilistic or Bayesian networks, and genetic algorithms. A complete overview and description of all types and flavors of algorithms can be found in the Artificial Intelligence book.

Figure 8: The different types of machine learning techniques.
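As a small example of one of these techniques, linear regression, the sketch below fits a straight line with ordinary least squares in plain Python. The data points (machine age versus maintenance cost) are hypothetical and were chosen to lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: machine age in years and yearly maintenance cost (in thousands).
ages = [1, 2, 3, 4, 5]
costs = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ages, costs)
predicted_cost = slope * 6 + intercept  # prediction for a six-year-old machine
print(slope, intercept, predicted_cost)
```

Real data never lies exactly on a line; the same formula then returns the line that minimizes the squared prediction errors.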

We can additionally distinguish a number of more generic machine learning techniques such as data mining, text mining, and natural language processing.

  • Data mining is a technique for finding connections, patterns, and correlations in structured data using machine learning, statistics, and database techniques.
  • Text mining is finding connections, patterns, and correlations in unstructured data such as text. Again, the goal is to gain new insights and knowledge.
  • Natural Language Processing is the field that deals with computers reading, understanding, and producing human language.
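A very basic form of text mining is counting which terms occur most often in a set of documents. The sketch below does this with the Python standard library; the customer reviews and the short stop word list are made up for illustration, and real text mining usually adds steps such as stemming and weighting.

```python
import re
from collections import Counter

# A deliberately short, hypothetical stop word list.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "was", "but"}


def top_terms(documents, n=3):
    """Count the words across all documents, ignoring case and stop words."""
    counts = Counter()
    for document in documents:
        tokens = re.findall(r"[a-z]+", document.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical unstructured data: free-text customer reviews.
REVIEWS = [
    "The delivery was late and the package was damaged.",
    "Late delivery, but friendly support.",
    "Support solved the delivery problem quickly.",
]

print(top_terms(REVIEWS))
```

Even this simple count turns unstructured text into structured data (terms and frequencies) that can feed a dashboard or a machine learning model.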

5 machine learning tips

  1. As Stephen Covey advised, “Begin with the end in mind”. Always start machine learning with a specific end goal in mind. Machine learning projects have the highest chance of success if you focus on a specific, urgent problem that needs a solution. Focus on a problem with impact on your bottom line. Keep a sharp focus on that end solution. KPIs are the perfect opportunity for this.
  2. Bring different knowledge domains together to best define and frame the problem to be solved. Developing machine learning models is labor-intensive, challenging, time-consuming, and requires collaboration from experts across disciplines. Start with a simple model and temper expectations if they are too high.
  3. Because machine learning is already very computationally intensive, you should perform as few additional operations as possible. If you are working with structured data, prepare all the necessary data in one so-called “flat” file. This allows you to train the algorithm faster and come up with results more quickly.
  4. Don’t try to reinvent the wheel. Software vendors have often already developed the necessary machine learning tools, cheat sheets, and libraries, for example for the Python programming language. Make use of the available platforms, off-the-shelf libraries, and AutoML, so you don’t have to develop the machine learning tools yourself.
  5. Don’t forget to factor in the context of the historical training data. The independent variable you put into a machine learning model may be colored by cultural factors, allowing biases or prejudices to creep into the training data. Discrimination against certain groups is always lurking when defining algorithms.
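The "flat" file from tip 3 can be sketched as follows: data from two related sources is combined into one row per customer and written out as CSV. The customer and order records, the column names, and the helper functions are hypothetical.

```python
import csv
import io

# Hypothetical source data from two separate systems.
CUSTOMERS = {
    1: {"name": "Alice", "segment": "retail"},
    2: {"name": "Bob", "segment": "business"},
}
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 300.0},
]


def to_flat_rows(customers, orders):
    """Build one row per customer, with the order data aggregated into columns."""
    rows = []
    for customer_id, customer in customers.items():
        amounts = [o["amount"] for o in orders if o["customer_id"] == customer_id]
        rows.append({
            "customer_id": customer_id,
            "segment": customer["segment"],
            "order_count": len(amounts),
            "total_amount": sum(amounts),
        })
    return rows


def write_flat_file(rows, file_obj):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(file_obj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


flat_rows = to_flat_rows(CUSTOMERS, ORDERS)
buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_flat_file(flat_rows, buffer)
print(buffer.getvalue())
```

Because every training example sits on a single row with all its features, the training algorithm can read the data in one pass without further joins or lookups.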


Hire a freelance machine learning expert

Developing machine learning models is labor intensive and requires the collaboration of experts from different disciplines in the organization. Hiring an external freelance machine learning specialist may be wise to get everyone on the same page. Contact us now if you are considering hiring a data scientist, machine learning expert, or AI consultant.

Master of Data Science

Learn all the ins and outs of machine learning, big data, and data-driven organizations in our 10-day Master of Data Science training course. Take steps towards success in implementing machine learning and data science and enroll today. Passionned Academy’s training responds to the growing need among (business) analysts, aspiring data scientists, and other professionals to start experimenting with AI, data science, machine learning, algorithms, and everything related to it. Contact us to know more.

More information

Technical books on machine learning abound. But if you’re looking for a practical book that explains machine learning models and algorithms in an accessible, understandable way, then The Data Science Book is an absolute must. Order the book now or take our Machine Learning course as a stepping stone to a Master of Science degree in Machine Learning. If you want advice, please contact us.

About Passionned Group

Passionned Group is a leading specialist in Data Science & machine learning solutions. Our experienced and passionate consultants help smaller and larger organizations transform into intelligent, data-driven organizations. Every other year we organize the Dutch BI & Data Science Award.


Become a customer now

Do you also want to become a customer of ours? We are happy to help you with machine learning and AI or other things that will make you smarter.
