Hello, and welcome!
Yeah, I know what you might be thinking, "Yet another introduction to Machine Learning blog post🤦🏽♂️!", but wait, it's not what you think it is. Or maybe it is—but different!
There are a lot of "Guide to," "Intro to," and "How to do Machine Learning" blog posts out there. Those are great, but most rarely focus on what is needed to learn machine learning successfully: actual industrial applications of what is learned, including simulating the challenges a data scientist or machine learning engineer faces while working on business-facing problems.
In this series of blog posts (1/5), we are going to introduce you to what we term "Practical Machine Learning," as told by frustrated Data Scientists. (When I say "we," I mean there are going to be different authors sharing their knowledge with you over the next couple of days.)
So join us on day 1 of 5 as we introduce you to Machine Learning through a practical, applied approach as we walk through an actual real-world problem that is easily accessible by someone with the needed prerequisites.
Speaking of real-world problems and prerequisites, the problem we will solve as you get your Machine Learning career started is one faced by an actual local real estate agent in your city. To get the most out of this series, you need the following prerequisites:
Knowledge of the Python programming language (dictionaries, loops, lists; enough that you can read the basic syntax).
NumPy, Pandas, and Matplotlib Python libraries.
Keep this glossary of standard machine learning terminologies in hand throughout the entire series as we will refer to it a lot.
Of course, we will try to explain how some things work in detail, but it would be helpful if you have these prerequisites in hand.
Without further ado (hope I spelt that right! 😅), let's jump right into this blog post's outline.
Table of Contents
Conventions Used In This Post
- "📺Video:": We complement this blog post with embedded videos on specific topics to help you blend your learning. The videos serve as complements to the articles that follow them.
- "❗Main point": If you are in a hurry, you should read the text under this convention.
- "💡Insight:": If you want to learn more about that specific topic, you can dive deep into the text under this convention.
What We Expect You To Know By The End of This Post.
By the end of this post, you should:
Know what machine learning is and why it works.
Know what makes machine learning different from the traditional programming paradigm.
Know the various ways a computer can learn and the different types of machine learning systems.
Know the common practical machine learning challenges and concerns in the industry.
An Introduction to Machine Learning
What is machine learning, and what does it mean for a machine to learn?
George stated Tom Mitchell's somewhat complicated definition of Machine Learning as;
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997
If you struggled to understand that definition just like I did when starting out, don't worry, I got you. What you know as machine learning is simply:
A technology that enables computers to write rules (instructions) to solve problems or answer questions, based on the knowledge they gain from data related to those questions.
There are several things we need to clarify;
Computers writing rules to solve problems
Computers learning (or gaining knowledge) from data related to some questions.
Can computers write their own rules? If yes, how?
If you want to perform a task, you would typically write a program for the computer to execute when some specific conditions are met because you know how that particular task should be done (if X happens, do Y). But with machine learning, the computer constructs its own rules on how to perform a task based on some examples of how that task has been carried out before.
To be clear, programmers write the learning algorithms (statistical algorithms), while the computer applies these algorithms to construct its own set of rules on how to do certain things. The "doing" part is what is not programmed.
For years, we have known computers to be programmable devices that need the programmer's explicit (precise) instructions before they can execute a task. A program may be explicitly written as "if X happens, then do Y." This is typically known as traditional programming.
For example, you might want to build software that detects whether a transaction is fraudulent. You would need to learn what makes a transaction fraudulent and explicitly write programs that check whether a transaction meets a set of conditions to be classed as fraudulent. You can imagine this getting very complex, because you would have to account for all the various ways a transaction can be fraudulent (or not fraudulent), and that will be very difficult to manage.
Another way to build such software could be to get an expert and have them do the classification. They'd curate a long list of conditions for what makes a transaction fraudulent or not fraudulent and have the programmer explicitly write them into the software. George referred to this in the video above as "expert systems," and although they are classed as a form of AI, they are old-fashioned and have been refined to newer systems such as heuristic reasoning.
Both of those methods have their flaws: the system may not be robust enough to capture all the possible scenarios in which a transaction can be classed as fraudulent (or not), or it may be so complex that the resources are vastly underused. But there is a solution: enter machine learning!
With machine learning, computers are given the ability to write their own instructions by learning simple and complex relationships and patterns from data well enough to provide insights or answers to some questions. This means that you do not have to explicitly write programs to perform tasks. You just feed the computer a bunch of examples of how that task has been done before as data, and it learns to recognize the underlying patterns and similarities common across the examples.
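To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the transaction amounts, the labels, and the idea that a single amount threshold is enough to spot fraud (real fraud systems are far richer). It simply puts a hand-written rule next to a rule the computer picks from examples.

```python
# Hypothetical toy data, invented for illustration: (transaction amount, is_fraud).
examples = [(20, 0), (35, 0), (50, 0), (80, 0), (900, 1), (1200, 1), (1500, 1)]


def handwritten_rule(amount):
    """Traditional programming: a human guesses the threshold up front."""
    return 1 if amount > 1000 else 0


def learn_threshold(data):
    """A tiny 'learning algorithm': try candidate thresholds and keep the
    one that classifies the most examples correctly."""
    amounts = sorted(amount for amount, _ in data)
    # Candidates are the midpoints between consecutive amounts.
    candidates = [(low + high) / 2 for low, high in zip(amounts, amounts[1:])]

    def accuracy(threshold):
        correct = sum((amount > threshold) == bool(label) for amount, label in data)
        return correct / len(data)

    return max(candidates, key=accuracy)


threshold = learn_threshold(examples)


def learned_rule(amount):
    """The rule the computer constructed from the examples."""
    return 1 if amount > threshold else 0


print(threshold)              # 490.0
print(handwritten_rule(900))  # 0: the hand-picked threshold misses this fraud example
print(learned_rule(900))      # 1
```

Notice that nobody typed 490 into the program. The programmer only wrote the procedure for searching, and the data decided the rule.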
What does it mean for a machine to learn?
Machines learn much like the way we humans learn: through experience. This is why machine learning is also a form of artificial intelligence (AI). But since machines (an old term for computers) can only process data, they learn through data instead, so they can use that new-found knowledge to answer certain questions, solve specific problems, or more generally make predictions about future events.
If you want to use machine learning in your organization, you have to think about the way the machines learn. With this insight, you can begin to gather the most relevant data that will allow the program to get the most useful insights to improve your customers' experience more specifically or solve problems generally.
You can best answer this question by thinking of how you learn. You would agree with me that, generally, humans learn through experience and doing. For example, say you want to cook jollof rice. How would you do this? Well, there are two possible scenarios:
- This is your first time cooking jollof rice.
- You have had some experience cooking jollof rice before.
Consider the first scenario: If this is your first time cooking jollof rice, you might take either of the following approaches. With your prior cooking experience, you might decide to dabble with various techniques until you figure it out and get jollof rice.
You can also get someone to teach you how to make jollof rice, gain knowledge from them, and then try to make it yourself based on the experience they give you. But you know your learning isn't complete until you receive feedback from them, they correct you, and you try again until you get it right based on their evaluation of your result (the jollof rice).
Consider the second scenario: If you have cooked jollof rice before, you probably know there are some steps to take, like going to the market, getting the needed ingredients, and so on until you get to the kitchen and are ready to cook. But wait! If you made the mistake of not buying tomatoes the last time you cooked jollof rice (which, of course, probably did not turn out well), you would definitely avoid the same mistake by getting tomatoes this time so you can have an improved result (the cooked jollof rice).
In both of the scenarios above there is a common theme:
You have a problem you want to solve.
You create instructions/rules to solve the problem based on your experiences related to it.
You apply the rules you have created to solve the problem so you can have some results.
You then get feedback on your results (whether you cooked a delicious jollof rice or not).
Then you adjust your approach based on the feedback you got or the mistakes you made.
Machine learning works the same way. The only difference is that humans pose the problem (or questions) and then pass data related to the problem to the computer, so it can learn. Here is how it goes:
The computer starts by learning simple correlations (or patterns) in the data before learning more complex relationships between the inputs of the data (by input, I mean the variables/values that make up the data).
- The computer learns these patterns through the use of statistical algorithms (or learning algorithms).
The computer then creates its own rules to understand the relationship between the inputs of the data.
It then applies this newly formed rule to other inputs of the data to check if it gives the right results or outcomes.
It then takes the feedback it got on its outcome to improve itself. Once it learns something new, it updates its memory (or database) until the iteration for the learning period is over.
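The loop described above (apply the current rule, get feedback, adjust, repeat) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the data is invented so that the hidden pattern is y = 3 * x, the "rule" is a single number w, and the update used is a basic gradient-descent step, which is one common learning algorithm among many.

```python
# Toy data where the hidden pattern is y = 3 * x (values invented for illustration).
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

w = 0.0               # the "rule" starts out knowing nothing
learning_rate = 0.01  # how strongly each piece of feedback changes the rule

for step in range(200):                  # the learning period
    for x, y in data:
        prediction = w * x               # apply the current rule to an input
        error = prediction - y           # feedback: how far off was the outcome?
        w -= learning_rate * error * x   # adjust the rule using that feedback

print(round(w, 3))  # 3.0
```

Nobody told the program that the answer was 3. It got there by repeatedly checking its own outcomes against the data.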
I guess you can see why machine learning is also a type of artificial intelligence.
To conclude, both you and the computer came out of the learning process with added expertise. You learned how to cook Jollof rice better, and the computer learned much more about the data.
Why use machine learning now? Why does it even work?
Our world is getting increasingly complex, and so are the users of your product. You need a system that can directly learn the behaviors of the users of your product or service so that you can provide a better product or service experience for them.
Machine learning works because lots of data is being generated by users and systems every day, learning algorithms keep improving, and computing resources for the algorithms to learn are available.
Any organization that has lots of data and is looking for better ways to understand and utilize it can benefit quite well from this technology.
According to Aurélien Géron's book, you should use Machine Learning now because it is great for;
Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better. (For example, building a fraudulent transaction classification software or predicting the likely product a user might want to buy.)
Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution. (For example, recognizing a person's voice or speech.)
Fluctuating environments: a Machine Learning system can adapt to new data. (For example, analyzing the stock market to predict the rise or fall of a stock, or predicting what the weather will be like tomorrow or next week.)
Getting insights about complex problems and large amounts of data. (For example, discovering new types of molecular combinations to create new drugs or design new types of materials.)
Machine learning works now for 4 primary reasons:
An explosion of data: Data is now generated at an unprecedented rate because most activities are now digital and have gone online. Companies now have more data than ever before, but that is not enough. They have to find a way to make sense of the data in an automated fashion, and this is where machine learning thrives.
Availability of computing resources: Compared to a decade or so back, there have been a lot of improvements in computing hardware, which has also become cheaper, following the already slowing Moore's law (the observation that the number of transistors on a chip doubles roughly every two years). Cheap cloud computing services are available from vendors such as Google, Amazon, and Microsoft. There have also been advancements in hardware accelerators such as GPUs (graphics processing units) and specialized accelerators like TPUs (tensor processing units).
Improvement in learning algorithms: Machine learning has been around for a long time (which is why it uses the old-fashioned term "machine"), and various techniques and algorithms have been used to make the technology work. But recent advancements in the field are owed to significant improvements in neural networks, a type of machine learning algorithm that can find intricate patterns in lots of data.
Accessible machine learning tools and frameworks: Machine learning used to be for geeks and nerds, but with libraries such as scikit-learn, H2O.ai, and Keras, and frameworks such as TensorFlow and PyTorch, it has become easier to implement the technology. Tools for handling big data, such as Hadoop and its ecosystem of libraries, have also become accessible for individuals and companies to use for big data storage and processing.
You will learn more about how each of these contributes to successful machine learning projects for businesses in the next series of blog posts.
To make things clear here (again): humans write the algorithms the machines use to learn, but what the machines generate from their new knowledge of the data (which is called a model) is written not by a programmer but by the machine itself.
Different machine learning systems and how they learn.
We will yet again refer to Aurélien Géron's book to help you understand. There are so many different types of Machine Learning systems that it is useful to classify them in broad categories based on:
- Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised and Reinforcement Learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by merely comparing new data points to known data points, or instead, detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning).
To think about the various types of machine learning systems is to think about the multiple ways you learn. We noted this earlier in the example of preparing jollof rice: you either get a tutor to teach you how to cook it and you acquire the knowledge, or you try to do it yourself based on prior knowledge of how to cook other types of meals.
You can even try a combination of the different approaches above where the tutor gives you the necessary instructions needed to cook jollof rice, and you go home and try the recipes to see what works and what doesn't and then improve yourself.
With supervised learning, you know a lot about the data you will feed to the machine learning algorithm.
In supervised learning as a type of machine learning system, you show the computer the connection between different inputs in the data and their corresponding output values.
As George mentioned in the video, with supervised learning, you feed the machine labeled data, which includes an input variable (or feature as he called it) and its corresponding output.
In simpler terms, you have a bunch of questions (say X) and answers to those questions (say Y) as data. In supervised learning, you feed this data to the machine learning algorithm to learn. It then maps the relationship between the input variable X (independent variable) to the output variable Y (dependent variable), so it can discover patterns in how they both relate.
You then evaluate the model's performance (a model is a machine learning algorithm that has learned from data) by passing it newer questions that are similar but not the same as the ones it learned.
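Here is a small sketch of that idea in plain Python: labeled questions and answers go in, and the model is then checked on new questions it has not seen. The customer data is hypothetical, and the learning algorithm used (1-nearest-neighbour, which answers with the label of the closest known example) is just one simple choice for illustration, not the only way to do supervised learning.

```python
# Hypothetical labeled data, invented for illustration.
# Input X = (number of visits, minutes on site); output Y = bought (1) or not (0).
X_train = [(1, 2), (2, 1), (2, 3), (8, 9), (9, 7), (10, 10)]
y_train = [0, 0, 0, 1, 1, 1]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def predict(x_new):
    """1-nearest-neighbour: answer with the label of the closest known example."""
    distances = [squared_distance(x_new, x) for x in X_train]
    return y_train[distances.index(min(distances))]


# New "questions" the model has not seen, with the true answers kept aside.
X_new = [(1, 1), (9, 9), (3, 2)]
y_true = [0, 1, 0]

predictions = [predict(x) for x in X_new]
accuracy = sum(p == t for p, t in zip(predictions, y_true)) / len(y_true)

print(predictions)  # [0, 1, 0]
print(accuracy)     # 1.0
```

On a toy dataset this clean, a perfect score is expected. Real business data will rarely be this kind.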
Of course, business data is, in fact, a whole lot more complex than just X and Y variables. You can have a lot of inputs X that define their various outputs Y. For example, details on customer transactions can go beyond just their names and include how long they took to purchase, the number of times they purchased the item, and so on. Learn more about it in the video below:
George mentioned something about the "training" dataset and "testing" dataset. You might be familiar with these terms if you went through the glossary.
When creating a machine learning model for supervised learning, you would need to split your entire dataset into a training set and a test set; sometimes, you might consider splitting into a validation set as well depending on the circumstance.
Splitting will help you measure the success of your model and how well it can predict an output based on the input it hasn't been trained on. We typically call this generalization, and when a model generalizes well, it is called a robust model.
The training set is where the machine learning (ML) algorithm gets to, literally, train on the dataset so that it can discover relationships between the input variables and the corresponding output variables.
The ML algorithm starts with small portions of the data, trains on them, and then evaluates itself against other parts of the training set, gradually updating its knowledge of the patterns it has discovered from the relationships until the training iteration is complete.
It is best practice to use a validation set for your business projects. This represents your first attempt to understand how your model will generalize to business data it hasn't been trained on before.
The validation set is used to select the best model that has learned from the training dataset.
The general approach is to train several different models on the training data and then select which one of them generalizes best on unseen data.
In machine learning, there is typically a lot of experimentation before we arrive at a satisfactory model that generalizes. The validation set is where all of that model tweaking (hyperparameter settings) occurs so that the optimal model can be selected.
The test set is used to provide an unbiased evaluation of how the model will perform in the real world when your customers begin to interface with the software.
The test set is where you apply the model (trained algorithm) so you can evaluate how it would perform on data it was not trained on. What you are trying to do is have the model apply the rules it has created to a newer set of the problem it hasn't been trained on, to see if it can give the right output Y for every input X.
Splitting into the above categories will help you evaluate your models and perform model selection based on unbiased results (that is, select models that can generalize accurately to new data they haven't seen before).
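As a rough sketch of how the three sets work together, the plain-Python example below trains a few candidate models, picks one using the validation set, and only then scores it on the test set. The data is invented (with two deliberately wrong training labels to act as noise), and the candidate models are k-nearest-neighbour classifiers with different values of k, chosen here only because k is an easy hyperparameter to demonstrate.

```python
# Hypothetical 1-D data, invented for illustration: (feature value, label).
# Two training labels are deliberately "noisy" (5 -> 1 and 15 -> 0).
train = [(1, 0), (3, 0), (5, 1), (7, 0), (9, 0),
         (11, 1), (13, 1), (15, 0), (17, 1), (19, 1)]
validation = [(4.6, 0), (5.4, 0), (8.2, 0), (11.8, 1), (14.6, 1), (15.4, 1)]
test = [(2.2, 0), (6.4, 0), (13.6, 1), (17.8, 1)]


def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0


def accuracy(dataset, k):
    return sum(knn_predict(x, k) == label for x, label in dataset) / len(dataset)


# Model selection: compare the candidates on the validation set only.
candidate_ks = [1, 3, 5]
validation_scores = {k: accuracy(validation, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: validation_scores[k])

# Final check: touch the test set once, with the chosen model.
test_accuracy = accuracy(test, best_k)

print(best_k)         # 3
print(test_accuracy)  # 1.0
```

With k = 1 the model memorizes the two noisy labels and stumbles on the validation set, which is exactly the kind of problem the validation set exists to catch before the test set is ever used.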
The ratio for splitting typically depends on the amount of data you have. If you have a large dataset, you might want to perform a 60:20:20 split (that is, 60% training set, 20% test set, 20% validation set).
Similarly, if you have a smaller dataset, you might want a ratio similar to 70:20:10.
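A split like this takes only a few lines of plain Python. The sketch below is one possible way to do it: the function name `split_dataset`, its parameters, and the fixed seed are assumptions for illustration, and the rows are just the numbers 0 to 99 standing in for real records. Libraries such as scikit-learn also offer ready-made helpers for splitting.

```python
import random


def split_dataset(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle the rows, then cut them into training, validation and test sets.
    Whatever is left after the first two cuts becomes the test set."""
    rows = list(rows)                   # copy, so the caller's data is untouched
    random.Random(seed).shuffle(rows)   # shuffle so each set is a fair mix
    n_train = round(len(rows) * train)
    n_validation = round(len(rows) * validation)
    return (rows[:n_train],
            rows[n_train:n_train + n_validation],
            rows[n_train + n_validation:])


rows = list(range(100))  # stand-in for 100 real records
train_set, validation_set, test_set = split_dataset(rows)

print(len(train_set), len(validation_set), len(test_set))  # 60 20 20
```

Calling `split_dataset(rows, train=0.7, validation=0.2)` would give the 70:20:10 variant.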
When working with unsupervised learning systems, you do not show the ML algorithm the output Y (answers). Instead, you pass it a bunch of inputs X and ask it to learn the relationship between the variables to form segments or clusters (as George mentioned in the video).
For example, say you are trying to determine which customers will buy certain products. You may have the purchase details and characteristics of previous customers, but you do not have labels stating whether they actually bought a product from your company or not. That's what unsupervised learning algorithms try to help you figure out.
You get this system to work by feeding it massive amounts of data to help it effectively learn the relationships within the data. The larger and more relevant the data is to the problem you are trying to solve, the better the chance of getting a useful segmentation.
Note that unsupervised learning is a sort of partnership between you and the machine. The machine creates the clusters, and then you will decide whether or not this segment or cluster is relevant to the problem you are trying to solve.
In the video, you also learned some types of unsupervised learning algorithms, and although we will not go in-depth into them here, we have picked some of the most relevant ones to explain here briefly.
Clustering algorithms are applied to solve a variety of problems, including offering the same marketing ads to specific clusters (or groups of customers) based on similar behaviors, developing curricula for a segment of students, or developing treatments for related medical groups.
Standard clustering algorithms include;
Hierarchical clustering analysis
k-means and group centroid models
These are very useful for putting an organization's data together. Any one of them, used alone or in tandem, can be valuable. Just remember that as with any analysis, these algorithms exist to help you decide how you want to do things. So make sure you understand the problem before selecting a path you want to follow.
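To give a feel for how k-means forms its groups, here is a bare-bones sketch in plain Python. The spending figures are invented, the data has only one feature to keep things readable, and the starting centroids are simply the first two points (real implementations choose them more carefully), so treat it as an illustration of the idea rather than production code.

```python
def k_means(points, centroids, iterations=10):
    """Repeat two steps: assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are given.
spend = [12, 15, 14, 90, 95, 88, 13, 92]
centroids, clusters = k_means(spend, centroids=spend[:2])

print(centroids)  # [13.5, 91.25]
print(clusters)   # [[12, 15, 14, 13], [90, 95, 88, 92]]
```

The algorithm found a "low spender" and a "high spender" group on its own. Deciding whether those two segments actually matter for your problem is still your job.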
Unsupervised Transformations (Dimensionality Reduction)
As George mentioned in the video, dimensionality reduction is the technique of representing data points (input variables) in fewer dimensions while preserving as much about their structure as possible—although it is almost impossible to retain the structure of the data fully.
You are going to be using this technique a lot to process data points in your real-world business big data because it is a commonly used approach in feature preprocessing to remove noise from data.
It will also be useful for managing all your massive data points, because it deals efficiently with the numerous dimensions that your data brings about, allowing the computer to process the data faster and making it take up less space on disk and in memory.
The rationale behind dimensionality reduction includes:
It reduces the number of errors in your dataset because this technique compresses features (or input variables) into fewer dimensions rather than the initial high dimensions of the data points.
It improves the generalizability of your model by reducing the noise in your data.
It, of course, improves speed as stated earlier as low dimensions of data points will help your computer process things faster.
Principal Component Analysis (PCA) is perhaps the most common algorithm used to implement dimensionality reduction. This technique combines multiple variables that are related to each other into a smaller set of components, which is much easier to deal with.
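For the curious, here is a hand-rolled sketch of PCA on just two related features, written in plain Python so every step is visible. The numbers are invented, and the closed-form eigenvector shortcut only works for this two-feature case, so this is an illustration of the idea and not how you would do it on real data (a library implementation is the usual route).

```python
import math

# Hypothetical data, invented for illustration: two related features per customer.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
n = len(data)

# 1. Center each feature on its mean.
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
centered = [(x - mean_x, y - mean_y) for x, y in data]

# 2. Build the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(x * x for x, _ in centered) / (n - 1)
syy = sum(y * y for _, y in centered) / (n - 1)
sxy = sum(x * y for x, y in centered) / (n - 1)

# 3. Largest eigenvalue and its eigenvector (closed form for a 2x2 symmetric
#    matrix; this form assumes sxy is not zero).
largest = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
vx, vy = sxy, largest - sxx
length = math.hypot(vx, vy)
direction = (vx / length, vy / length)

# 4. Project every 2-D point onto that direction: one number per point.
reduced = [x * direction[0] + y * direction[1] for x, y in centered]

# Share of the original variance kept by the single component.
explained = largest / (sxx + syy)

print(len(reduced))     # 4
print(explained > 0.99) # True
```

Two columns became one, and almost none of the variation in the data was lost because the two features were so strongly related to begin with.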
Anomaly detection has a focus on outliers—data points that are different from other data points in a multi-dimensional space.
Some machine learning algorithms are trained on the data points (input variables) in your dataset, and when they see a new data point, they can tell whether it is similar to what they have been trained on or different (hence the term "anomaly").
Anomaly detection can be applied in various ways;
To detect equipment failures in a manufacturing plant.
To detect whether a software system is going to fail under a particular load.
To explore new market opportunities for your business that can potentially offer new value.
To detect if an employee has a heightened risk of burnout or leaving a company.
Fraud detection is perhaps the most important area of application, where the system can stop fraudulent transactions before they are carried out.
The algorithm can take in data like the purchase characteristics of the customer (how much they withdraw on average) and biometric data (such as how they hold their phone, type on their keyboard, or use their mouse). These could serve as indicators of whether a certain transaction is fraudulent or not.
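One very simple way to flag an outlier is to ask how many standard deviations a new value sits from what is normal for that customer. The sketch below does this with Python's built-in `statistics` module. The withdrawal amounts are invented, and the cut-off of 3 standard deviations is a common rule of thumb assumed here for illustration. Real fraud systems combine many more signals.

```python
from statistics import mean, stdev

# Hypothetical past withdrawals for one customer, invented for illustration.
past_withdrawals = [40, 55, 60, 45, 50, 52, 48, 58]

mu = mean(past_withdrawals)      # what is "normal" for this customer
sigma = stdev(past_withdrawals)  # how much they usually vary


def is_anomaly(amount, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from normal."""
    return abs(amount - mu) / sigma > threshold


print(is_anomaly(54))   # False
print(is_anomaly(900))  # True
```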
Semi-supervised learning is a crossover that takes advantage of both supervised and unsupervised learning.
You train these algorithms with some labeled data and usually a lot of unlabeled data. They are very useful for supervised learning tasks that may require a lot of granular labeling, such as speech-to-text systems where you talk to Siri or Google Assistant and they transcribe your speech to text and perform the correct actions too.
On a high level, you typically train these algorithms by first passing some labeled data to the computer, then having it train on the rest of the dataset, this time without the labels.
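The sketch below shows one simple flavor of this, often called self-training, in plain Python: learn from the few labeled points, let the model label the unlabeled ones, then retrain on everything. The data, the labels "low" and "high", and the nearest-centroid model are all assumptions chosen to keep the example short. It is not a description of how any particular voice assistant is built.

```python
# Hypothetical data, invented for illustration.
labeled = [(1.0, "low"), (9.0, "high")]               # a few labeled examples
unlabeled = [1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5]  # many unlabeled examples


def fit_centroids(examples):
    """'Train' by computing the average value of each label's examples."""
    groups = {}
    for x, label in examples:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def predict(centroids, x):
    """Answer with the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Step 1: learn from the labeled data only.
centroids = fit_centroids(labeled)

# Step 2: let the model guess labels for the unlabeled data (pseudo-labels).
pseudo_labeled = [(x, predict(centroids, x)) for x in unlabeled]

# Step 3: retrain on the labeled and pseudo-labeled data together.
centroids = fit_centroids(labeled + pseudo_labeled)

print(centroids)                # {'low': 2.0, 'high': 8.0}
print(predict(centroids, 6.5))  # high
```

The unlabeled points pulled the two centroids toward where the data really sits, something the two labeled points alone could not reveal.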
Reinforcement learning is quite different from other types of machine learning systems. With reinforcement learning, you are, no surprises, trying to reinforce the way the computer learns to reach a particular goal using rewards and penalties rather than its knowledge of data points.
Reinforcement learning algorithms are responsible for the technology behind computers being able to master various strategic games such as chess and Go, as well as StarCraft II. They are also responsible for recent advances in self-driving cars and some other autonomous transportation methods.
Reinforcement learning algorithms reinforce the learning of the agent (what you have known until now as the machine learning model) through trial and error: the agent observes the state of its environment (the massive amounts of data around it) and performs an action to reach its goal. If it takes the right step towards solving a problem, it gets a reward; otherwise it gets punished (a penalty). Over time it develops the best strategy (also known as a policy) to reach the final goal or solve the problem.
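Here is a tiny sketch of that reward-and-penalty loop using Q-learning, one well-known reinforcement learning algorithm, in plain Python. The environment is a made-up corridor of five squares with the goal at the far right, and the reward values, learning rate, and number of episodes are all assumptions picked for illustration.

```python
import random

N_STATES = 5          # squares 0..4 in a corridor; square 4 is the goal
ACTIONS = [-1, 1]     # the agent can step left or right
alpha, gamma = 0.5, 0.9   # learning rate and discount factor (assumed values)
rng = random.Random(0)    # fixed seed so the run is repeatable

# Q[state][action index] holds the agent's current estimate of how good
# it is to take that action in that state.
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        a = rng.randrange(2)  # trial and error: pick a random action
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        # Reward for reaching the goal, small penalty for every other step.
        reward = 1.0 if next_state == N_STATES - 1 else -0.1
        # Nudge the estimate toward the reward plus the best future value.
        target = reward + gamma * max(Q[next_state])
        Q[state][a] += alpha * (target - Q[state][a])
        state = next_state

# The learned policy: the best-looking action in each non-goal square.
policy = [ACTIONS[row.index(max(row))] for row in Q[:-1]]
print(policy)  # [1, 1, 1, 1]
```

The agent was never told "go right". It worked that out from rewards and penalties alone, which is the heart of reinforcement learning.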
They are instrumental in problems with massive amounts of data that we, as humans, do not yet know how to solve effectively, such as designing new materials or predicting the way a protein will fold, as well as other problems such as predicting the energy consumption pattern of heavy-duty data centers and, more recently, stock price prediction.
As we progress in the course, we will treat other types of ML systems where relevant.
Getting Started With Practical Machine Learning.
Why practical machine learning
There is a disconnect between what people learn about machine learning and how it works out there. This disconnect is understandable because machine learning is mostly taught through an academic and "structured" approach.
Practical machine learning is all about how one can effectively solve real-world industrial problems with machine learning, that is, how machine learning is applied. The best way to learn practical machine learning is perhaps to do two things:
Follow good methodologies.
Relate every problem directly to real-world use-cases and case studies.
Learning through use-cases and case studies is perhaps the best approach to preparing learners for the troubling world of applied machine learning in the industry. The focus of practical machine learning is on productive implementations and use-cases that are both relevant to business growth and accessible to stakeholders.
Use-cases focus on getting the job done, balanced against the need to build AI in a responsible and accountable way.
Throughout this course, you will learn practical machine learning by following proper methodologies and using use-cases to complete different sets of projects in the syllabus. By the end of the course, you should have a solid grasp of the challenges ML engineers and data scientists face, even if it's your first time in the industry.
Considerations in doing practical machine learning
Let's set the record straight: successful implementation of machine learning takes a lot of experimentation, a whole lot. You almost never get it right the first time, from framing the plan to executing on it and then monitoring deliverables.
Learning ML practically helps you skip years of experimentation by having you learn machine learning through the applied projects you will work on and the use-cases you will follow.
The ability to take a problem, estimate how best to solve it, build a plan to tackle it with ML, and confidently execute on the said plan will help you a lot, and that's what practical machine learning does.
There are several considerations when looking to apply machine learning, and therefore when doing practical machine learning:
Consider if the project you are working on requires machine learning. You learned in an earlier section in this post what machine learning technologies are perfect for. As we go on, you will continuously tune your skills on performing needs analysis for various projects.
Consider if you have the available resources to do ML. ML is resource-intensive, from computational power to data storage devices to model serving and scaling in production. Practical machine learning most of the time involves a lot of experimentation that will take up resources, so make sure you have the needed resources to run the project from start to finish (what is called an "end-to-end" workflow or pipeline).
Consider the business or project objective. This is perhaps the most important consideration, according to our conversation with Aurélien Géron (former YouTube product manager). Aurélien stated in the interview: "I think the worst error I saw software engineers make in the ML projects I worked on (and at least once I was guilty of this myself) was to lose focus of the final goal (or never really define it clearly) and to waste time improving the system based on the wrong metric. You don't want to spend several months working on a project that is unlikely to improve your key business metrics significantly."
Consider the ethical implications of the project. As an engineer, it is becoming increasingly crucial for you to build and use machine learning projects responsibly, and this should be a prime consideration. You can check standard AI and machine learning design principles, but the idea is to incorporate a human-centered design approach for your projects. This is why learning design thinking is crucial for you as a machine learning engineer.
Consider the quality of your data. Data is an essential fuel for your machine learning engine. Successful machine learning projects revolve around having data not just in large amounts (quantity) but also of high quality. This means you should have data that is a good representation of the problem you are trying to solve and is well related to that problem, as well as data that is labeled appropriately and free from biases. It is essential as an engineer that you understand the limits of your dataset so you can effectively build around them without introducing biases.
Consider good methodologies. This means taking into account your end-to-end workflow in a project from problem framing to launching and monitoring your solutions.
Consider explainability and interpretability as a core part of the entire ML project workflow. Understanding trained models and being able to explain them are becoming increasingly crucial for machine learning projects to be successful. This goes back to considering the various ethical implications of the project and building to make sure every part of the project that affects the business objective or user experience is explainable.
Throughout our machine learning course, we will take into account these considerations and build alongside them.
Also, we will often refer back to lessons from our class on design thinking to follow our end-to-end project workflows and practices.
If you need an excellent resource on how design thinking integrates with machine learning, you can check out this article by Stephen Oba, one of the students of the course.
Common practical machine learning challenges.
The common challenges with practical machine learning are very much the pointers you need to look out for when working on machine learning projects. As you will see, they are just echoes of the considerations we treated in the last section.
You can class the common challenges in applying machine learning into:
Lack of data is a huge challenge, and so is a lack of data relevant to the business problem.
Big data is also expensive to tame, and that could bring about problems for businesses that want to apply ML to their processes.
Data without labels (outputs or targets) or with incorrect labels also poses a challenge to ML application.
Poor-quality data that is noisy or incomplete poses practical challenges to applying machine learning, as you might spend more time cleaning data than actually using it.
Problem framing and clarity:
Most times it is very difficult to translate business problems into data and then models that actually achieve a business objective.
When the right metrics are not agreed upon between you and your stakeholders, it becomes difficult to determine what performance threshold is acceptable.
Working on the wrong business problem is also very common when trying to apply ML for such purposes. A business might want to work on a project on speech recognition when there can be better ROI in recommendation systems for better customer experience.
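To see why agreeing on the right metric matters, consider this small plain-Python sketch. The numbers are invented: 100 transactions of which only 2 are fraudulent, and a lazy "model" that never flags anything. The two metrics shown (accuracy and the share of frauds caught, often called recall) are just two common choices used here for illustration.

```python
# Hypothetical labels, invented for illustration: 1 = fraudulent, 0 = legitimate.
y_true = [1] * 2 + [0] * 98   # only 2 frauds among 100 transactions
y_pred = [0] * 100            # a "model" that never flags anything

# Metric 1: accuracy, the share of all predictions that were correct.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Metric 2: recall, the share of actual frauds that the model caught.
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = frauds_caught / sum(y_true)

print(accuracy)  # 0.98
print(recall)    # 0.0
```

A stakeholder looking only at accuracy would celebrate a 98% score, while the business goal of stopping fraud is not being met at all.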
As stated earlier, applying ML can be resource-intensive, so infrastructure such as storage devices to store big data, as well as computing resources to process the data and train models, can be road-blocks.
The inability of the system to integrate with existing solutions in your company can pose a challenge.
ML systems need to be scalable in production and handling the operations side of machine learning can be quite difficult.
ML systems need to be monitored and updated with fresh data, and automating this process can pose a challenge as models can easily degrade in quality if not trained with fresh data.
Depending on the problem, some models can take days or weeks to train even when they have enough computing power to leverage. This can make it challenging to estimate the length of the project.
A model can also take too long to return predictions. For example, a system taking too much time to load recommendations to users might render the entire project unsuccessful.
- Models that are too complex to interpret or explain can pose a challenge so companies would have to look for ways to balance the trade-off of complexity and explainability.
In this blog post, we introduced you to machine learning and its concepts, as well as practical machine learning. We aimed to get you to understand what machine learning is and why we are opting for practical machine learning. A lot of industry problems need practical and workable solutions, but there are considerations to take on before doing just that.
We will explore proper machine learning project methodologies, as well as how to conduct a needs analysis for machine learning projects in the next blog post in the series.
Let us know if you have any feedback on this post (typos we missed, claims that are wrong, and so on) in the comment section. We take critique and compliments well. :)
Till next time, stay safe. 💚💚
If you also enjoyed this post, do leave a reaction 🔥 to the story, hit the like button 👍🏽, and share 📩 it with your friends that may be interested in learning. See you soon!