[Machine Learning] What is machine learning? What is ML?

1. What is machine learning

1) Machine Learning : Finding regularities in massive datasets

2) Regularities : Knowledge forms (rules, decision trees)

- Machine Learning usually relies on induction : it generalizes from observed samples to make predictions.

- The procedure of ML : Data -> Finding regularities -> Representation in diverse forms -> Prediction

3) Machine Learning (Compared to traditional programming)

- ML : Input (data) -> ML -> Knowledge forms (the rules are learned from the data)

(Traditional programming : Rule based, the rules are written by hand)

4) Applications of ML

- Web search, Computational biology, Finance, E-commerce, Space exploration, Robotics, Social networks, etc.

5) Machine learning is the study of algorithms that improve their performance P at some task T with experience E

- T : Tasks

- P : Performance

- E : Experience

a) Example 1 : Autonomous driving

* T : Driving on four-lane highways using vision sensors

* P : average distance traveled before an error

* E : Sequence of images and steering commands recorded while observing a human driver

b) Example 2 : Semiconductor manufacturing process (predicting normal or fault)

* T : predict process result (normal or fault) of a semiconductor manufacturing tool

* P : Prediction accuracy (proportion of wafers that are correctly classified)

* E : Sequences of process monitoring data (Sensor data)

c) Example 3 : Credit system (bank)

* T : predict profitable customers

* P : prediction accuracy

* E : sequences of credit records kept in a bank
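The performance measure P in Examples 2 and 3 is prediction accuracy, the proportion of samples that are classified correctly. A minimal sketch of that calculation follows; the wafer labels are made-up illustration data, not real process results, and the function name `accuracy` is my own.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical wafers: what actually happened vs. what a model predicted.
actual = ["normal", "normal", "fault", "normal", "fault"]
predicted = ["normal", "fault", "fault", "normal", "fault"]

wafer_accuracy = accuracy(actual, predicted)
print(wafer_accuracy)  # 4 of the 5 wafers are classified correctly
```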

6) Notation of ML : Input, Output, Target function, training data, Test data

- X : input samples, Y : output samples, f : target function

- Training data : samples used to learn the function, so that it generalizes beyond those samples

- Test data : held-out samples used to estimate how well outputs will be predicted for new samples in the future

* Input is also referred to as : predictor, independent variable, attribute, feature, explanatory variable, covariate, or regressor

* Output is also referred to as : response, dependent variable

* F_hat (the estimate of f) is learned from the training data : y_hat = F_hat(X_training) should be close to y_training

* y_hat = F_hat(X_test) -> compare y_hat vs y -> if the difference is small, the performance is good -> the model generalizes well
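The train/test idea above can be sketched with a very small example: fit F_hat on the training data, then compare y_hat with y on the test data. This is only an illustration under my own assumptions: the data are made up (roughly y = 2x + 1), F_hat is a straight line fitted by least squares, and the difference is measured with mean squared error. None of these choices come from the notes themselves.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(ys, y_hats):
    """Average squared difference between true and predicted outputs."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / len(ys)


# Made-up samples, split into training data and test data.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.1, 2.9, 5.2, 6.8, 9.1]
x_test = [5.0, 6.0]
y_test = [11.0, 13.1]

# Learn F_hat from the training data only.
a, b = fit_line(x_train, y_train)


def f_hat(x):
    return a * x + b


# Evaluate on the test data: y_hat vs y.
y_hat_test = [f_hat(x) for x in x_test]
test_mse = mean_squared_error(y_test, y_hat_test)
print(a, b, test_mse)
```

A small test error suggests that F_hat generalizes to samples it has not seen; a small training error alone does not show that.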

7) Functional Categorization of ML

- Regression : Training data consist of <input, real-valued output>

/ Task is to predict outputs of new samples <input, ?>

- Classification : Training data consist of <input, class-label output>

/ Task is to predict class labels of new samples <input, ?>
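To make the difference between the two tasks concrete, here is a small sketch that fills in the "?" of a new sample for both cases. It uses k-nearest neighbors, which is my own choice for illustration and is not mentioned in these notes; the data and function names are hypothetical.

```python
from collections import Counter


def nearest_outputs(train, x_new, k):
    """Outputs of the k training samples whose (1-D) input is closest to x_new."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x_new))
    return [output for _, output in ranked[:k]]


def predict_regression(train, x_new, k=3):
    """Real-valued output: average the outputs of the nearest samples."""
    outputs = nearest_outputs(train, x_new, k)
    return sum(outputs) / len(outputs)


def predict_classification(train, x_new, k=3):
    """Class-label output: majority vote among the nearest samples."""
    outputs = nearest_outputs(train, x_new, k)
    return Counter(outputs).most_common(1)[0][0]


# Made-up training data: <input, real-valued output> and <input, label>.
regression_train = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (8.0, 80.0), (9.0, 90.0)]
classification_train = [
    (1.0, "normal"), (2.0, "normal"), (3.0, "normal"),
    (8.0, "fault"), (9.0, "fault"),
]

predicted_value = predict_regression(regression_train, 2.2)
predicted_label = predict_classification(classification_train, 8.5)
print(predicted_value, predicted_label)
```

The inputs look the same in both cases; only the type of the output (a number or a label) decides whether the task is regression or classification.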

- Clustering : Training data consist of <attribute values (inputs)>

/ Task is to group samples such that each group contains samples with similar attribute values

=> By using clustering, we can label input data (Ex. Shopping / VIP or Normal customers)
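As a sketch of the shopping example, the code below groups customers by a single attribute and then names the groups. It assumes k-means as the clustering method, a hypothetical attribute (monthly spend), made-up values, and hand-picked initial centers; the notes do not prescribe any of these.

```python
def k_means_1d(values, initial_centers, max_iterations=10):
    """Plain k-means on 1-D data, starting from the given initial centers."""
    centers = list(initial_centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers, clusters


# Made-up monthly spend per customer (no labels are given in advance).
monthly_spend = [12, 15, 18, 20, 22, 480, 510, 530]
centers, clusters = k_means_1d(monthly_spend, initial_centers=[0, 100])

# After clustering, a person can name the groups, which labels the inputs.
segment_names = ["Normal", "VIP"]
labeled = {
    value: segment_names[index]
    for index, members in enumerate(clusters)
    for value in members
}
print(centers, labeled)
```

The algorithm only finds the groups; calling them "Normal" and "VIP" is an interpretation added afterwards.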

- Association : Given an item set I, a training sample (also called a transaction) consists of the items in I that were bought together (Ex. Market basket data) / Training data consist of such samples / Task is to find association rules of the form X -> Y, where X and Y are subsets of I

(In this case, there is no fixed distinction between inputs and outputs)

Ex. Diapers in baskets -> Beers in baskets : over 90% -> Find Regularities
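A rule such as "diapers -> beer" is usually scored with support (how often the items appear together) and confidence (how often Y appears in the baskets that contain X). The sketch below computes both; the baskets are made up for illustration, so the numbers do not reproduce the 90% figure above.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing X (antecedent), the fraction that also contain Y."""
    support_x = support(transactions, antecedent)
    if support_x == 0:
        raise ValueError("antecedent never occurs in the transactions")
    support_xy = support(transactions, set(antecedent) | set(consequent))
    return support_xy / support_x


# Made-up market basket data: each set is one transaction.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "beer", "bread"},
    {"diapers", "bread"},
    {"milk", "bread"},
]

rule_support = support(baskets, {"diapers", "beer"})
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Here 3 of the 5 baskets contain both items, and 3 of the 4 baskets with diapers also contain beer.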

8) Categorization based on types of training samples

- Supervised Learning : Training data includes desired outputs (ex. Regression & Classification)

- Unsupervised Learning : Training data does not include desired outputs (ex. Clustering)

- Semi-supervised Learning : Training data includes only a few desired outputs (most of the data have no outputs) ex. Autoencoder based DL

- Reinforcement learning : Rewards from a sequence of actions against the environment (Agent in Environment -> Action -> Rewards -> State transition)
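The agent/environment loop of reinforcement learning can be sketched as follows. This only shows the interaction cycle (state -> action -> reward -> state transition) with a fixed policy; no learning algorithm is included. The `Corridor` environment, its reward values, and the `always_right` policy are hypothetical and exist only for this illustration.

```python
class Corridor:
    """Hypothetical environment: positions 0..4, the goal is the last position."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (next_state, reward, done)."""
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 10 if done else -1  # assumed rewards: small cost per step, bonus at goal
        return self.state, reward, done


def always_right(state):
    """A fixed policy: ignore the state and always move right."""
    return +1


def run_episode(env, policy, max_steps=20):
    """Run one episode and return the visited states and the total reward."""
    state = env.state
    visited = [state]
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment returns reward and next state
        visited.append(state)
        total_reward += reward
        if done:
            break
    return visited, total_reward


visited, total_reward = run_episode(Corridor(), always_right)
print(visited, total_reward)
```

A learning agent would use the collected rewards to change its policy over many episodes, instead of keeping it fixed as in this sketch.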
