What’s the relationship between machine learning and data mining?
Any academic discipline usually has three kinds of work: theory, methods, and applications. Projects and researchers usually span one or two of these types of work, though there are also renaissance-man researchers who straddle all three effortlessly. In the space of data, there are usually two kinds of researchers:
Data Science or Data Mining (Methods+Applications)
Data Mining is a cross-disciplinary field that focuses on discovering the properties of data sets. (Forget its old definition as the analysis step of “knowledge discovery in databases” (KDD); that may have been true years ago, but it no longer is.)
There are different approaches to discovering the properties of data sets. Machine Learning is one of them. Another one is simply looking at the data sets using visualization techniques or Topological Data Analysis.
Researchers here focus on devising new methods and on the empirical effectiveness and real-world impact of applying them, rather than on fundamental statistical questions such as what the worst-case performance of a learning algorithm is and under what conditions it occurs. Work ranges from simple applications of existing machine learning algorithms to investigating the peculiarities of particular datasets and application domains, and customizing machine learning methods so that they work well on a given dataset or domain.
Unlike machine learning, which draws its inspiration from shortcomings in existing theories or methods, data science draws its inspiration from a deep understanding of the problem domain and the assumptions baked into typical datasets in that domain. Example domains include recommender systems, image classification, and machine translation.
It is clear, then, that machine learning can be used for data mining. However, data mining can also use other techniques instead of, or on top of, machine learning.
Machine Learning or Statistical Learning (Theory+Methods)
Machine Learning is a sub-field of data science that focuses on designing algorithms that can learn from data and make predictions on it. It includes both Supervised Learning and Unsupervised Learning methods. Unsupervised methods start from unlabeled data sets, so, in a way, they are directly concerned with uncovering unknown properties of those data sets (e.g. clusters or rules).
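To make the unsupervised case concrete, here is a minimal sketch of one such method, k-means clustering via Lloyd's algorithm, which recovers groups from points that carry no labels. The choice of k-means, the function names `kmeans` and `_assign`, the random initialisation from input points and the toy data are all illustrative assumptions, not anything the text prescribes.

```python
import math
import random

def _assign(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, max_iters=100, seed=0):
    """Group unlabeled points into k clusters with Lloyd's algorithm (basic k-means)."""
    rng = random.Random(seed)
    # Initialise the centroids by sampling k distinct input points.
    centroids = rng.sample(points, k)
    for _ in range(max_iters):
        labels = _assign(points, centroids)  # assignment step
        new_centroids = []
        for c in range(k):  # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Leave the centroid of an empty cluster where it is.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return _assign(points, centroids), centroids

# Toy unlabeled data: two well-separated groups the algorithm is never told about.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

The printed labels put the first three points in one cluster and the last three in the other; which group gets index 0 depends on the random initialisation. No label was supplied, so the two-group structure is a property of the data that the method uncovered.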
Work of this theory-and-methods kind looks at shortcomings of current methods, or at the assumptions made by the statistical theory that explains them. It therefore leads to refinements of existing methods or to generalizations of the theories behind them.
Examples of this kind of work are a faster sampling algorithm for Bayesian networks, investigation of the convergence rate of different kinds of gradient descent, research on the tightness of convex relaxations for certain nonconvex objectives, etc.
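As a small illustration of the convergence-rate question, the sketch below empirically compares two gradient-descent variants on an ill-conditioned quadratic: plain gradient descent with step 1/L, and Polyak's heavy-ball momentum with its classical tuning for quadratics. The test function, curvatures, tolerance and helper names are hypothetical choices for illustration. The rates quoted in the comments are standard textbook results for strongly convex quadratics, not findings of this article.

```python
import math

# Hypothetical test problem: f(x) = 0.5 * sum(a_i * x_i**2), minimised at x = 0.
CURVATURES = (1.0, 10.0)  # mu = 1, L = 10, so the condition number kappa = 10
MU, L = min(CURVATURES), max(CURVATURES)

def grad(x):
    """Gradient of the diagonal quadratic: a_i * x_i in each coordinate."""
    return [a * xi for a, xi in zip(CURVATURES, x)]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def steps_to_tolerance(step, momentum, x0=(1.0, 1.0), tol=1e-8, max_iters=10_000):
    """Run x_next = x - step*grad(x) + momentum*(x - x_prev); momentum=0 is plain GD.
    Return the iteration count at which ||x|| (distance to the minimiser) first
    drops below tol, or max_iters if it never does."""
    x_prev = list(x0)
    x = list(x0)
    for t in range(1, max_iters + 1):
        g = grad(x)
        x_next = [xi - step * gi + momentum * (xi - pi) for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
        if norm(x) < tol:
            return t
    return max_iters

# Plain gradient descent, step 1/L: error contracts by (1 - 1/kappa) per step.
gd_steps = steps_to_tolerance(step=1 / L, momentum=0.0)

# Heavy ball, classical tuning: rate about (sqrt(kappa) - 1) / (sqrt(kappa) + 1).
sqrt_kappa = math.sqrt(L / MU)
hb_step = 4 / (math.sqrt(L) + math.sqrt(MU)) ** 2
hb_momentum = ((sqrt_kappa - 1) / (sqrt_kappa + 1)) ** 2
hb_steps = steps_to_tolerance(step=hb_step, momentum=hb_momentum)

print(f"gradient descent: {gd_steps} iterations, heavy ball: {hb_steps} iterations")
```

The momentum variant reaches the tolerance in far fewer iterations than plain gradient descent, consistent with its rate depending on the square root of the condition number rather than on the condition number itself. Research of the kind described above asks when such rates are tight and how they change under weaker assumptions.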