Big Data & Analytics

Author

Carlos Barge

Any academic discipline usually has three kinds of work: theory, methods, and applications. Projects and researchers usually span one or two of these, though some revolutionary, renaissance-style researchers straddle all three effortlessly. In the data space, researchers usually fall into two groups:

Data Science or Data Mining (Methods+Applications)

Data Mining is a cross-disciplinary field that focuses on discovering the properties of data sets. (Forget the older definition of it as the analysis step of “knowledge discovery in databases”, or KDD. That may have been true years ago, but it no longer is.)

There are different approaches to discovering the properties of data sets. Machine Learning is one of them. Another one is simply looking at the data sets using visualization techniques or Topological Data Analysis.

Researchers here focus on devising new methods and on the empirical effectiveness and real-world impact of applying them, rather than on fundamental statistical questions such as the worst-case performance of a learning algorithm and the conditions under which it occurs. The work ranges from simple applications of existing machine learning algorithms to investigating the peculiarities of certain datasets and application domains and customizing machine learning methods so that they work well on that particular dataset or domain.

Unlike machine learning, which derives its inspiration from shortcomings in existing theories or methods, data science derives its inspiration from a deep understanding of the problem domain and the assumptions baked into typical datasets in that domain. Example domains include recommender systems, image classification, and machine translation.

It is clear then that machine learning can be used for data mining. However, data mining can use other techniques besides or on top of machine learning.

Machine Learning or Statistical Learning (Theory+Methods)

Machine Learning is a sub-field of data science that focuses on designing algorithms that can learn from and make predictions on the data. Machine learning includes Supervised Learning and Unsupervised Learning methods. Unsupervised methods actually start off from unlabeled data sets, so, in a way, they are directly related to finding out unknown properties in them (e.g. clusters or rules).
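To make the idea of finding clusters in unlabeled data concrete, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The data points, the choice of two clusters, and the starting centroids are assumptions made up for illustration; they do not come from any particular dataset.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

# Starting centroids are chosen by hand here so that the run is deterministic.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)     # cluster index assigned to each point
print(centroids)  # learned cluster centers
```

No labels are supplied anywhere: the grouping emerges from the distances between points alone, which is the sense in which unsupervised methods uncover unknown properties of a data set.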

This kind of work looks at shortcomings of current methods or assumptions made by current statistical theory explaining these methods. Therefore, this kind of research leads to the refinement of existing methods or generalization of theories explaining existing methods.

Examples of this kind of work include a faster sampling algorithm for Bayesian networks, an investigation of the convergence rates of different kinds of gradient descent, and research on the tightness of convex relaxations for certain nonconvex objectives.
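As a rough illustration of what comparing the convergence rates of gradient descent variants can look like in practice, the sketch below counts how many iterations plain gradient descent and heavy-ball momentum need on a small, badly conditioned quadratic. The objective, the learning rate of 0.01, the momentum coefficient of 0.9, and the tolerance are all assumptions picked for the example, not values from any specific study.

```python
import math

# Toy objective: f(x) = 0.5 * sum(c_i * x_i**2), with condition number 100.
CURVATURES = (1.0, 100.0)

def gradient(x):
    """Gradient of the toy quadratic at point x."""
    return [c * xi for c, xi in zip(CURVATURES, x)]

def steps_to_converge(lr, beta=0.0, tol=1e-6, max_steps=10_000):
    """Count iterations until the gradient norm drops below tol.

    beta = 0 gives plain gradient descent; beta > 0 adds heavy-ball momentum.
    """
    x = [1.0, 1.0]
    velocity = [0.0, 0.0]
    for step in range(max_steps):
        g = gradient(x)
        if math.hypot(*g) < tol:
            return step
        # Momentum update: keep a decaying sum of past gradient steps.
        velocity = [beta * v - lr * gi for v, gi in zip(velocity, g)]
        x = [xi + v for xi, v in zip(x, velocity)]
    return max_steps

plain_steps = steps_to_converge(lr=0.01)
momentum_steps = steps_to_converge(lr=0.01, beta=0.9)
print("plain gradient descent:", plain_steps, "iterations")
print("with momentum:", momentum_steps, "iterations")
```

On this particular problem the momentum variant reaches the tolerance in noticeably fewer iterations. Theoretical work of the kind described above aims to explain and bound such differences in general, rather than observe them on a single example.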
