1. What is machine learning
1) Machine Learning : Finding Regularity in massive datasets
2) Regularities : Knowledge forms (rules, decision trees)
- Machine Learning usually uses inductive knowledge to make predictions.
- The procedure of ML : Data -> Finding regularity -> Representation in diverse forms -> Prediction
3) Machine Learning (Compared to traditional programming)
- ML : Input (data) -> ML -> Knowledge forms (the rules are learned from the data)
- Traditional programming : Rule based (the rules are written by hand)
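The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not a standard algorithm from any library: the function names, the spending figures, and the hand-picked cutoff of 1000 are all hypothetical.

```python
# Traditional programming: a person writes the rule by hand.
def rule_based_is_vip(spend):
    return spend >= 1000  # cutoff chosen by a human (hypothetical value)


# Machine learning: the rule (here, a single threshold) is found from data.
def learn_threshold(spends, labels):
    """Return the threshold with the fewest errors on the training data."""
    candidates = sorted(set(spends))
    best_t, best_errors = candidates[0], len(spends) + 1
    for t in candidates:
        # Count samples where "spend >= t" disagrees with the true label.
        errors = sum((s >= t) != y for s, y in zip(spends, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical training data: customer spend and whether the customer is a VIP.
spends = [120, 300, 450, 800, 950, 1200]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(spends, labels)
print("learned threshold:", threshold)
```

In the first function the knowledge lives in the source code; in the second it is extracted from the samples, so different data would yield a different rule.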
4) Applications of ML
- Web search, Computational biology, Finance, E-commerce, Space exploration, Robotics, Social networks, etc.
5) Machine learning is the study of algorithms that improve their performance P at some task T with experience E
- T : Tasks
- P : Performance
- E : Experience
a) Example 1 : Autonomous driving
* T : Driving on four-lane highways using vision sensors
* P : average distance traveled before an error
* E : Sequence of images and steering commands recorded while observing a human driver
b) Example 2 : Semiconductor manufacturing process (predicting normal or fault)
* T : predict process result (normal or fault) of a semiconductor manufacturing tool
* P : Prediction accuracy (proportion of wafers that are correctly classified)
* E : Sequences of process monitoring data (Sensor data)
c) Example 3 : Credit system (bank)
* T : predict profitable customers
* P : prediction accuracy
* E : sequences of credit records kept in a bank
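The performance measure P used in Examples 2 and 3 (prediction accuracy, the proportion of correctly classified samples) can be computed as in the sketch below. The wafer labels are made-up values for illustration only, and `accuracy` is a hypothetical helper name.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical wafer results: true outcome vs. the model's prediction.
actual = ["normal", "normal", "fault", "normal", "fault"]
predicted = ["normal", "fault", "fault", "normal", "fault"]

acc = accuracy(actual, predicted)
print("P (prediction accuracy):", acc)
```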
6) Notation of ML : Input, Output, Target function, training data, Test data
- X : input samples, Y : output samples, f : target function
- Training data : samples used to learn (generalize) the target function
- Test data : held-out samples used to estimate how well the learned function predicts the output for new samples in the future
* Input is also referred to as : predictor, independent variable, attribute, feature, explanatory variable, covariate, or regressor
* Output is also referred to as : response, dependent variable
* F_hat is learned from the training data (X_training, y_training) : y_hat = F_hat(X_training)
* F_hat(X_test) = y_hat -> compare y_hat vs y -> if the difference is small, the performance is good -> Generalized well
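The notation above can be made concrete with a small sketch: fit F_hat on training data, predict y_hat for the test inputs, and compare y_hat with y. The code assumes a straight-line model fitted by ordinary least squares and uses mean squared error as the "difference"; both choices, the helper names, and the numbers are illustrative assumptions, not something the notes prescribe.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept  # this is F_hat


def mean_squared_error(ys, y_hats):
    """Average squared difference between true and predicted outputs."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / len(ys)


# Hypothetical samples: training data to learn from, test data held out.
x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
x_test, y_test = [5.0, 6.0], [10.1, 11.9]

f_hat = fit_line(x_train, y_train)        # learn F_hat from training data
y_hat = [f_hat(x) for x in x_test]        # y_hat = F_hat(X_test)
test_mse = mean_squared_error(y_test, y_hat)  # y_hat vs y
print("test MSE:", test_mse)
```

A small error on the held-out test samples suggests that the learned function generalizes beyond the samples it was fitted on.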
7) Functional Categorization of ML
- Regression : Training data consist of <input, real-valued output>
/ Task is to predict outputs of new samples <input, ?>
- Classification : Training data consist of <input, labeled output>
/ Task is to classify class labels of new samples <input, ?>
- Clustering : Training data consist of <attribute values (inputs)>
/ Task is to group samples such that each group contains samples with similar attributes values
=> By using clustering, we can label input data (Ex. Shopping / VIP or Normal customers)
- Association : Given an item set I, a training sample (also called a transaction) consists of the items in I that were bought together (Ex. Market basket data)
/ Training data consist of such transactions
/ Task is to find association rules of the form X -> Y, where X and Y are subsets of I
(In this case, there is no fixed distinction between inputs and outputs)
Ex. Diapers in baskets -> Beer in baskets : over 90% -> Find Regularities
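As a sketch of how a rule such as diapers -> beer could be scored, the code below computes support and confidence, two commonly used measures for association rules that the notes do not name explicitly. The transactions are toy data invented for illustration, so the resulting number is not the 90% mentioned above.

```python
# Toy market basket data (hypothetical transactions).
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    return support(x | y, transactions) / support(x, transactions)


conf = confidence({"diapers"}, {"beer"}, transactions)
print("confidence(diapers -> beer):", conf)
```

Either item set can appear on the left or right side of a rule, which is why association tasks have no fixed input and output.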
8) Categorization based on types of training samples
- Supervised Learning : Training data includes desired outputs (ex. Regression & Classification)
- Unsupervised Learning : Training data does not include desired outputs (ex. Clustering)
- Semi-supervised Learning : Training data includes a few desired outputs (most of the data do not have outputs) (ex. Autoencoder-based DL)
- Reinforcement learning : Learning from rewards obtained through a sequence of actions in an environment (Agent in Environment -> Action -> Reward -> State transition)
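The agent/environment cycle in the reinforcement learning item can be sketched as a simple loop. This shows only the interaction (Action -> Reward -> State transition), not a learning algorithm: the corridor environment, the `step` function, and the fixed `policy` are hypothetical and exist purely for illustration.

```python
GOAL = 4  # hypothetical corridor with positions 0..4; the goal is position 4


def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    done = next_state == GOAL
    return next_state, reward, done


def policy(state):
    """Agent: a fixed policy that always moves right (nothing is learned)."""
    return 1


state, total_reward, steps, done = 0, 0.0, 0, False
while not done:
    action = policy(state)                      # Agent chooses an action
    state, reward, done = step(state, action)   # Reward and state transition
    total_reward += reward
    steps += 1

print("steps:", steps, "total reward:", total_reward)
```

A real reinforcement learning method would use the observed rewards to improve the policy over many such episodes.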