Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959, gives “computers the ability to learn without being explicitly programmed.”
Let’s decode what this definition means.
Although not specifically mentioned in the definition, the first and most important concept to grasp is self-learning. This refers to the application of statistical models to detect patterns and improve performance based on data and empirical information, all without direct programming commands. This is what Arthur Samuel describes as “the ability to learn without being explicitly programmed.”
Samuel doesn’t propose that machines formulate decisions without any programming. On the contrary, machine learning is heavily dependent on computer programming. Samuel’s observation is that machines don’t need a direct input command in order to perform a set task, but rather, input data.
An example of an input command would be typing ‘2 + 2’ into a Python interpreter and hitting ‘Enter.’
This represents a direct command with a direct answer.
Input data, however, is different. Data is fed to the machine, an algorithm is selected, hyperparameters (settings) are adjusted, and the machine is instructed to conduct its own analysis. The machine proceeds to decipher patterns found in the data through the process of trial and error. The machine’s hypothesis, formed from analyzing the data, can then be utilized to predict future values.
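This workflow can be sketched in a few lines of Python. The dataset and the choice of algorithm below are my own illustrative assumptions (a toy hours-studied vs. exam-score table fitted with simple linear regression), not examples from the text; the point is that we supply observations rather than a direct command, and the machine infers the pattern itself:

```python
# Input data, not an input command: supply observations and let a
# simple least-squares routine infer the underlying relationship.
# (Toy data and the choice of linear regression are illustrative only.)

def fit_line(xs, ys):
    """Ordinary least squares for one feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

slope, intercept = fit_line(hours, scores)

def predict(x):
    """Use the learned hypothesis to estimate a future value."""
    return slope * x + intercept

print(round(predict(6), 1))
```

Nowhere above do we tell the machine what score corresponds to six hours of study; the hypothesis (the slope and intercept) is derived from the data, and the prediction follows from that hypothesis.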
Although there is a strong relationship between the programmer and the machine, they stand a layer further apart than in traditional programming. This is because the machine formulates decisions based on its own experience and the data at hand, mimicking the process of human decision-making.
As an example, let’s say the machine identifies a pattern among data scientists who like watching cat videos, or a relationship between the physical traits of baseball players and their likelihood of winning the season’s Most Valuable Player (MVP) award.
In both cases, the machine is programmed to conduct a task: examining the YouTube browsing habits of data scientists in the first case, and assessing the physical features of previous baseball MVPs in the second. However, in neither case was the machine programmed to predict a specific outcome. The programmer fed in the input data, configured the computing power, and nominated the algorithms, but the final decision is determined by the machine’s self-learning process.
The difference between machine learning and traditional programming may seem trivial at first, but it will make more sense as you work through examples and come to appreciate the special power of self-learning.
The second most important thing to take away from this post is how machine learning fits into the broader landscape of data and computer science. This means understanding how machine learning interrelates with parent fields and sister disciplines. This is important, as these are the terms you will see time and again when searching for relevant study materials and hear mentioned ad nauseam in machine learning courses. Relevant disciplines can also be difficult and confusing to tell apart at first glance, such as ‘machine learning’ and ‘data mining.’
The lineage of machine learning can be understood by first examining its forefathers. Machine learning derives first from computer science; this relationship was introduced by Arthur Samuel at the start of this post. Computer science encompasses everything related to the design and use of computers.
Within the all-encompassing space of computer science is the next broad field of data science. Narrower than computer science, data science comprises methods and systems to extract knowledge and insights from data through the use of computers.
Nested within computer science and data science as the third matryoshka doll is artificial intelligence. Artificial intelligence, or AI, encompasses the ability of machines to perform intellectual and cognitive tasks. Comparable to the way the Industrial Revolution gave birth to machines that could simulate physical activities, AI is now driving the development of machines capable of simulating cognitive tasks.
While still broad but dramatically more honed than computer science and data science, AI contains numerous subfields that are popular today. These subfields include search and planning, reasoning and knowledge representation, perception, natural language processing (NLP), and of course, machine learning. Machine learning bleeds into other fields of AI, including NLP and perception, through the shared use of self-learning algorithms.
For students with an interest in AI, machine learning provides an excellent starting point. Given the conceptual ambiguity of AI, machine learning offers a narrow and practical domain of study. Algorithms taught in machine learning can be applied across other disciplines, including perception and natural language processing. In addition, a Master’s degree is adequate to develop a certain level of expertise in machine learning, but you would need a PhD to make any significant progress in AI.
As mentioned earlier, machine learning also overlaps with data mining. Data mining is a sister discipline that focuses on discovering and unearthing patterns in large datasets.
Popular self-learning algorithms such as k-means clustering, association analysis, and regression analysis are used in both data mining and machine learning to analyze data. But where machine learning focuses on the incremental process of self-learning, data mining concentrates its efforts on combing through large datasets to uncover valuable new insights.
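To make one of these shared algorithms concrete, here is a compact sketch of k-means clustering in pure Python. The one-dimensional toy values and the choice of k = 2 are my own assumptions for illustration, not from the text; real data mining and machine learning work would typically use a library implementation on multi-dimensional data:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 1-D points into k groups by repeatedly assigning each
    point to its nearest centroid and recomputing the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two visibly separated groups of toy values; k-means should settle on
# one centroid near each group.
data = [1.0, 1.5, 2.0, 2.5, 9.0, 9.5, 10.0, 11.0]
print(kmeans(data, k=2))
```

The assign-then-update loop is the trial-and-error, incremental refinement that both disciplines rely on: each pass uses the results of the previous pass to improve the grouping.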
The difference between data mining and machine learning can be explained through an analogy to two teams of archaeologists.
The first team of archaeologists focuses on removing debris that lies in the way of valuable items hidden from sight. The team’s primary goal is to excavate the area, find new valuable discoveries, and then pack up its equipment and leave. A day later, the team flies to another exotic destination to start a new project with no connection to the site excavated the day before.
The second team is also in the business of excavating historical sites but their approach is different. They hold off from excavating the main pit for many weeks. In that time, they visit other relevant archaeological sites in the region and examine how each site was excavated. After returning to the site of their own project, they apply this knowledge to excavate smaller pits surrounding the main pit.
The archaeologists then analyze the results. After reflecting on their experience excavating one pit, they improve their efforts at tackling the next. This includes predicting the amount of time needed to excavate a pit, understanding the variance and patterns found in the local terrain, and developing new strategies to improve the accuracy of their work. From this experience, they are able to optimize their approach and form a strategic model that they will implement to excavate the main pit.
If it is not already clear, the first team subscribes to data mining and the second team to machine learning.
At a micro level, both data mining and machine learning appear similar and use many of the same tools. Both teams make a living excavating historical sites to discover valuable items. But in practice, their methodologies differ. The machine learning team focuses on self-learning and on improving future predictions based on previous experience. Meanwhile, the data mining team concentrates on excavating the target area as effectively as possible before moving on to the next clean-up job.