What is Machine Learning?
Other names include “statistical learning” and “shallow machine learning.”
“Machine learning is aptly named, because once you choose the model to use and tune it (a.k.a. improve it through adjustments), the machine will use the model to learn the patterns in your data. Then, you can input new conditions (observations) and it will predict the outcome!” — An explanation for a five-year-old, by Megan Dibble
A slightly more sophisticated explanation is “Machine-learning algorithms use statistics to find patterns in massive* amounts of data. And data, here, encompasses a lot of things—numbers, words, images, clicks, what have you. If it can be digitally stored, it can be fed into a machine-learning algorithm.” — by Karen Hao
What is deep learning?
Deep learning models use neural networks that are modeled after how the neurons in our brains work. These models work to find patterns in the dataset; sometimes they find patterns that humans might never recognize.
Neural networks work well with complex data like images and audio. They are behind lots of software functionality that we see all the time these days, from facial recognition (stop being creepy, Facebook) to text classification. Neural networks can be used with data that is labeled (i.e., supervised learning) or unlabeled (unsupervised learning).
— An explanation for a five-year-old, by Megan Dibble
If you want to know more, here is a mathematical explanation: http://www.iro.umontreal.ca/~pift6266/H10/notes/mlintro.html
Q1: What is supervised learning?
Outcome variables have known values; for example, the correct classifications are known (“apple” or “not apple”). Machines are trained on known data to predict unknown responses.
Q2: What is unsupervised learning?
When outcome variables do not have known values, or the correct classes of the training data are not known, the machine learning is unsupervised.
Q3: What is semi-supervised learning?
It is a combination of supervised and unsupervised techniques: typically a small amount of labeled data is used together with a large amount of unlabeled data.
Q4: What are those different models?
- Regression (Supervised learning): Use existing known values to build a regression model, then use the model to predict values for new observations.
Regression and correlation tutorial: https://algobeans.com/2016/01/31/regression-correlation-tutorial/
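The idea above can be sketched in a few lines of plain Python using the closed-form least-squares solution for a straight line; the data here is invented purely for illustration.

```python
# A minimal simple-linear-regression sketch: fit a line to known
# (x, y) pairs, then predict the value for a new observation.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Known values (training data): y is roughly 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 9.0, 11.1]
slope, intercept = fit_line(xs, ys)

# Predict the value for a new observation, x = 6
prediction = slope * 6 + intercept
```

In practice you would reach for a library such as scikit-learn, but the mechanics are the same: estimate the line from known values, then apply it to new ones.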
- Classification (Supervised learning): Predict the class of a new value based on known classifications.
K-Nearest Neighbors (KNN): A simple algorithm that stores all available cases and classifies a new case based on a similarity measure (e.g., a distance function). The distances from the new value to a certain number k of the nearest values (the “neighbors”) in the known data determine the membership of the new value; the researcher sets k. KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s.
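KNN is simple enough to write from scratch. Below is a minimal sketch in plain Python; the fruit data (weight in grams, sweetness score) is invented for illustration.

```python
# Classify a new point by majority vote among its k nearest
# training points, using Euclidean distance as the similarity measure.
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """train: list of (features, label) pairs."""
    dists = sorted(
        (math.dist(features, new_point), label) for features, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [
    ((150, 7), "apple"), ((160, 8), "apple"), ((140, 6), "apple"),
    ((300, 3), "not apple"), ((320, 2), "not apple"), ((280, 4), "not apple"),
]
label = knn_predict(train, (155, 7), k=3)
```

Because the new point sits among the light, sweet fruits, all three nearest neighbors vote “apple.”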
Naive Bayes Classifier: applies Bayes’ theorem with the “naive” assumption that features are independent of one another given the class. Despite that simplification, it works surprisingly well for tasks like text classification.
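A toy multinomial Naive Bayes sketch makes the idea concrete: count word frequencies per class, then score a new document with Bayes’ theorem under the independence assumption. The training sentences are invented for illustration.

```python
# Train: record class priors and per-class word counts.
# Predict: log prior + sum of smoothed log likelihoods per word.
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (word_list, label)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in docs:
        word_counts[label].update(words)
    return class_counts, word_counts

def predict_nb(class_counts, word_counts, words):
    vocab = {w for c in word_counts.values() for w in c}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)      # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:                            # add-one smoothing
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [
    (["sweet", "red", "crunchy"], "apple"),
    (["sweet", "crisp", "red"], "apple"),
    (["yellow", "long", "soft"], "banana"),
    (["yellow", "sweet", "soft"], "banana"),
]
counts = train_nb(docs)
label = predict_nb(*counts, ["red", "crunchy"])
```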
Support Vector Machines (SVM): Instead of using distances between values, this approach uses training data (known values) to draw a boundary between categories that maximizes the margin, i.e., the distance to the closest points belonging to each category.
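One common way to fit a linear SVM without a solver library is sub-gradient descent on the hinge loss; here is a minimal sketch with invented, linearly separable 2-D points. The learning rate, regularization strength, and epoch count are arbitrary choices for this toy data.

```python
# Fit a maximum-margin line w.x + b = 0 by sub-gradient descent on
# the regularized hinge loss. Labels must be +1 or -1.

def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=500):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: move the boundary
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:           # correctly classified: only shrink weights
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

points = [(1, 1), (2, 1), (1, 2), (5, 5), (6, 5), (5, 6)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)

# The sign of w.x + b classifies a new observation.
side = w[0] * 5.5 + w[1] * 5.5 + b
```

Library implementations (e.g., scikit-learn’s SVC) solve the margin-maximization problem far more carefully and also support non-linear kernels.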
- Clustering (Unsupervised): group observations into “meaningful” groups
K-means clustering: if you have seen k-means clustering in SPSS, this is the same (remember the Silhouette coefficient). It runs iteratively, alternating between these two steps:
- For each cluster, compute the cluster centroid by taking the mean vector of points in the cluster
- Assign each data point to the cluster for which the centroid is the closest
K-Means Clustering Tutorial: https://algobeans.com/2015/11/30/k-means-clustering-laymans-tutorial/
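The two-step loop above can be sketched in a few lines of plain Python; the 1-D data and the naive initialization (first k points as starting centroids) are just for illustration.

```python
# Bare-bones k-means: repeatedly assign points to the nearest
# centroid, then recompute each centroid as its cluster's mean.

def kmeans(points, k, iters=10):
    centroids = points[:k]  # naive initialization
    for _ in range(iters):
        # Assign each data point to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of the points in its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
centroids, clusters = kmeans(points, k=2)
```

On this data the centroids settle near the two obvious group centers (about 1.0 and 9.07).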
Principal Component Analysis (PCA): the same as PCA in SPSS. It is used to reduce dimensions and identify latent variables.
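For 2-D data, the first principal component can be computed by hand: center the data, build the 2x2 covariance matrix, and take its leading eigenvector. The points below are invented and deliberately lie close to the line y = 2x.

```python
# From-scratch PCA for 2-D points: the leading eigenvector of the
# covariance matrix is the direction of greatest variance.
import math

def first_principal_component(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance matrix [[cxx, cxy], [cxy, cyy]]
    cxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    cyy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix (quadratic formula)
    mean_eig = (cxx + cyy) / 2
    spread = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    lam = mean_eig + spread
    # Corresponding eigenvector, normalized to unit length
    vx, vy = lam - cyy, cxy
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
vx, vy = first_principal_component(points)
```

Since the points roughly follow y = 2x, the recovered direction has a slope close to 2; for higher dimensions a library eigendecomposition (e.g., NumPy’s) would replace the 2x2 formula.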
Q5: What is natural language processing?
Classic natural language processing is supervised learning that uses a tagged corpus as the training data set. This chart shows the steps:
Tokenization: breaking a stream of text into words, phrases, symbols, or other meaningful elements, which are called tokens.
Before tokenization: Mango, banana, pineapple and apple all are fruits.
After tokenization: Mango | banana | pineapple | and | apple | all | are | fruits
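A minimal tokenizer for the example above can be written with the standard library’s regular expressions; real NLP toolkits use much more sophisticated rules (for contractions, hyphens, URLs, etc.).

```python
# Split text into word tokens: \w+ keeps runs of letters/digits
# and drops punctuation such as commas and the final period.
import re

def tokenize(text):
    return re.findall(r"\w+", text)

tokens = tokenize("Mango, banana, pineapple and apple all are fruits.")
```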
Part of speech tagging: tagging the part of speech of each token.
Stemming: a technique for stripping suffixes and grouping inflections of the same stem into the same category.
For example, after stemming, the category of “wait” includes “waited,” “waits,” and “waiting.”
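A toy suffix-stripping stemmer in the spirit of the “wait” example looks like this; real stemmers (e.g., the Porter algorithm) use far richer rule sets and conditions.

```python
# Strip a few common English suffixes, keeping at least a
# three-letter stem so short words are left alone.

def crude_stem(word):
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [crude_stem(w) for w in ["waited", "waits", "waiting", "wait"]]
```

All four inflections collapse into the single category “wait.”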
Lemmatization: an alternative to stemming whose advantage is more accurate stripping of grammatical inflections. It first determines the part of speech of a word and then applies different stripping rules for each part of speech, returning the dictionary form (the lemma).
Named entity recognition (NER): the subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as names of persons, organizations, and locations, expressions of time, quantities, etc.
N-gram (uni-gram, bi-gram, tri-gram, etc.): a technique for analyzing larger semantic units (phrases instead of single words) to capture context. N is the number of words in each phrase.
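Extracting n-grams from a token list is a one-liner; the sentence below is an invented example.

```python
# Slide a window of size n over the token list; each window is one n-gram.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["machine", "learning", "is", "fun"]
bigrams = ngrams(tokens, 2)
```

With n = 2 the four tokens yield three bi-grams, each carrying a little of its neighbors’ context.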
Q6: What is topic modeling?
“Topic Modeling in NLP seeks to find hidden semantic structure in documents. They are probabilistic models that can help you comb through massive amounts of raw text and cluster similar groups of documents together in an unsupervised way.” — by Marc Kelechava