[EXPLORATION] 2. Training scikit-learn's Built-in Classification Models


Altiora Petamus


SSAC X AIffel/EXPLORATION_SSAC


현석종 2021. 1. 23. 21:21

In this post, I train a simple classification model on one of scikit-learn's built-in datasets and write up the key concepts along with explanations of the code.

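1. scikit-learn

- A Python library widely used for machine learning

- Provides a wide range of machine learning algorithms within a convenient, consistent framework

- Ships with a variety of built-in datasets and models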

Reference: 7. Dataset loading utilities, scikit-learn 0.24.1 documentation (scikit-learn.org/stable/datasets.html)
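As a quick sketch of what "built-in datasets" means in practice, the loaders below all live in sklearn.datasets. Passing return_X_y=True (an option not used elsewhere in this post) returns the feature array and the label array directly instead of the Bunch object used in the next section. The wine, breast cancer, and digits loaders are extra examples picked only for illustration.

```python
from sklearn.datasets import load_iris, load_wine, load_breast_cancer, load_digits

# return_X_y=True returns (data, target) directly instead of a Bunch object
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# A few other small toy datasets bundled with scikit-learn
for loader in (load_wine, load_breast_cancer, load_digits):
    X_other, y_other = loader(return_X_y=True)
    print(loader.__name__, X_other.shape, y_other.shape)
```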

2. iris dataset classification model

2-1. EDA

from sklearn.datasets import load_iris

iris = load_iris()  # load the iris dataset

print(dir(iris))
# dir(obj) returns the names of the attributes and methods that the object has

# [output] (the exact list depends on the scikit-learn version)
# ['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']

# access attributes with dot notation: object.attribute
iris_data = iris.data  # feature data
iris_label = iris.target  # labels

print("shape of data : ", iris_data.shape)  # shape of data :  (150, 4)
print("shape of label : ", iris_label.shape)  # shape of label :  (150,)
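The dir() output above only lists attribute names; a few more lines of EDA make the feature names, class names, and class balance explicit. This is a minimal sketch that reuses load_iris and adds numpy's bincount (not part of the original code) to count the samples in each class.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Names of the four feature columns and the three classes
print(iris.feature_names)
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']

# Number of samples per class (labels are 0, 1, 2)
class_counts = np.bincount(iris.target)
print(class_counts)  # [50 50 50]

# Per-feature mean, a quick first look at the value ranges
print(iris.data.mean(axis=0))
```

The classes are perfectly balanced (50 samples each), so plain accuracy is a reasonable headline metric for this dataset.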

2-2. Split

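Use scikit-learn's built-in train_test_split() function.

-> train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)

- X : feature data to split (passed as the first array)

- y : labels, split alongside X so that rows stay aligned

- test_size : fraction of the data assigned to the test set (between 0 and 1), e.g. test_size=0.2 -> train set = 80%, test set = 20%

- random_state : seed that controls the shuffling; the same value reproduces the same split

- shuffle : whether to shuffle the data before splitting (default: True)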

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris_data,
                                                    iris_label,
                                                    test_size=0.2,
                                                    random_state=7)

print("shape of X_train : ",X_train.shape)# shape of X_train :  (120, 4)
print("shape of X_test : ",X_test.shape)# shape of X_test :  (30, 4)
print("shape of y_train : ",y_train.shape)# shape of y_train :  (120,)
print("shape of y_test : ",y_test.shape)# shape of y_test :  (30,)
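To make the parameter list above concrete, here is a small sketch comparing three splitting behaviours on the same data: a fixed random_state, shuffle=False, and stratify=y. The stratify parameter is a real train_test_split option that the original post does not cover; it is included only as an illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same random_state -> identical split on every run
a = train_test_split(X, y, test_size=0.2, random_state=7)
b = train_test_split(X, y, test_size=0.2, random_state=7)
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True

# shuffle=False keeps the original order: the last 20% becomes the test set.
# iris is stored sorted by class, so this test set contains only class 2.
_, _, _, y_test_ordered = train_test_split(X, y, test_size=0.2, shuffle=False)
print(np.unique(y_test_ordered))  # [2]

# stratify=y keeps the class ratio the same in the train and test sets
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2,
                                         random_state=7, stratify=y)
print(np.bincount(y_test_strat))  # [10 10 10]
```

Because the iris data is sorted by class, turning shuffling off would leave the model being tested on a class it has never seen, which is why the default shuffle=True matters here.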

2-3. Train Model and Validation

Use one of scikit-learn's built-in models (Decision Tree).

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

decision_tree = DecisionTreeClassifier(random_state=32)  # random_state fixes the random seed
decision_tree.fit(X_train, y_train)  # train on the training set
y_pred = decision_tree.predict(X_test)  # predict labels for the test data with the trained model

print(classification_report(y_test, y_pred))
# classification_report: treats each class in turn as the positive class, computes its precision,
# recall, and F1 score, then averages them (macro avg / weighted avg) to summarize overall performance

[output]

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         7
           1       0.91      0.83      0.87        12
           2       0.83      0.91      0.87        11

    accuracy                           0.90        30
   macro avg       0.91      0.91      0.91        30
weighted avg       0.90      0.90      0.90        30
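To show what classification_report computes, the sketch below derives per-class precision, recall, and F1 from a confusion matrix by hand and then cross-checks them against precision_recall_fscore_support. The label vectors are small made-up examples, not the iris predictions above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Small hypothetical label vectors, used only to illustrate the arithmetic
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # TP / (TP + FP): column sums = predicted counts
recall = tp / cm.sum(axis=1)     # TP / (TP + FN): row sums = true counts (support)
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, f1)
print(f1.mean())                                # macro avg: plain mean over classes
print(np.average(f1, weights=cm.sum(axis=1)))   # weighted avg: weighted by support

# Cross-check against scikit-learn's own computation
p, r, f, support = precision_recall_fscore_support(y_true, y_pred)
print(np.allclose(p, precision), np.allclose(r, recall), np.allclose(f, f1))  # True True True
```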

2-4. Other Models

1) Decision Tree

decision_tree = DecisionTreeClassifier(random_state=32)

https://ratsgo.github.io/machine%20learning/2017/03/26/tree/

2) Random Forest

from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(random_state=32)

medium.com/@deepvalidation/title-3b0e263605de

3) SVM (Support Vector Machine)

from sklearn import svm
svm_model = svm.SVC()

medium.com/@deepvalidation/title-3b0e263605de

4) SGD Classifier

from sklearn.linear_model import SGDClassifier
sgd_model = SGDClassifier()

scikit-learn.org/stable/modules/sgd.html

5) Logistic Regression

from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()

hleecaster.com/ml-logistic-regression-concept/
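Because all five models share the same fit / predict interface, they can be compared in a single loop. A minimal sketch, assuming the same split as above (test_size=0.2, random_state=7); random_state=32 on SGDClassifier and max_iter=1000 on LogisticRegression are my own additions (for reproducibility and to avoid a convergence warning), not settings from the original post. Accuracy values can vary across scikit-learn versions, so no expected output is given.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# Every estimator exposes the same fit / predict interface
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=32),
    "Random Forest": RandomForestClassifier(random_state=32),
    "SVM": SVC(),
    "SGD Classifier": SGDClassifier(random_state=32),          # random_state: assumption
    "Logistic Regression": LogisticRegression(max_iter=1000),  # max_iter: assumption
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2f}")
```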

Summary

I implemented a simple classifier using one of the built-in datasets that the scikit-learn library provides.
The next post will cover the underlying theory in more depth.
