Recommendation Basic03


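surprise is a well-known library for recommender systems. Famous as it is, I have only used it once and my memory of it is already hazy,
so I am writing these notes down. Basic 01 has the background, and this post continues from it.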

This post uses collaborative filtering (CF) and latent matrices (matrix factorization).

Medium susanLi

SusanLi github

GitHub of someone who seems to be quite good at recommendation

Naver Lab researcher (a real expert)

import pandas as pd
import numpy as np

Link to the official surprise documentation

## on_bad_lines='skip' needs pandas >= 1.3; on older versions use error_bad_lines=False instead.
user = pd.read_csv('D:/★2020_ML_DL_Project/Alchemy/dataset/BX-Users.csv', sep=';', on_bad_lines='skip', encoding="latin-1")
user.columns = ['userID', 'Location', 'Age']
rating = pd.read_csv('D:/★2020_ML_DL_Project/Alchemy/dataset/BX-Book-Ratings.csv', sep=';', on_bad_lines='skip', encoding="latin-1")
rating.columns = ['userID', 'ISBN', 'bookRating']

df = pd.merge(user, rating, on='userID', how='inner') ## inner join, but nothing seems to have been dropped or changed
df.drop(['Location', 'Age'], axis=1, inplace=True)

print(df.shape)
df.head(3) ## no real difference; the rating data alone (1,149,779 rows) would have been enough

(1149779, 3)

	userID	ISBN	bookRating
0	2	0195153448	0
1	7	034542252	0
2	8	0002005018	5

EDA in earnest
The rating distribution was already covered in the previous post, so I skip it here.

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(style="white", context="talk")

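Roughly 62% of the ratings are 0. Does a 0 mean the user gave the book the worst possible score, or that no score was given at all? How should it be read? (The Book-Crossing dataset description treats 0 as an implicit rating, that is, an interaction without an explicit score, which points to the second reading.)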
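To put a number on the question above, one can measure the share of 0 ratings and look at the explicit ratings separately. This is only a minimal sketch on a made-up toy Series; the values are hypothetical and not taken from the BX data.

```python
import pandas as pd

# Toy ratings standing in for the bookRating column (values are made up).
ratings = pd.Series([0, 0, 0, 5, 8, 0, 10, 0], name="bookRating")

# Share of rows whose rating is exactly 0.
zero_share = (ratings == 0).mean()

# Distribution of the explicit (non-zero) ratings only.
explicit_counts = ratings[ratings != 0].value_counts().sort_index()

print(f"share of 0 ratings: {zero_share:.3f}")
print(explicit_counts)
```

On the real data the same two lines would be applied to df['bookRating'].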

Looking at which books have the most ratings…

min_book_ratings = 50
filter_books = df['ISBN'].value_counts() > min_book_ratings  ## True for ISBNs with more than 50 ratings
filter_books = filter_books[filter_books].index.tolist() ## list of books with more than 50 ratings (> is strict, so exactly 50 is excluded)

min_user_ratings = 50
filter_users = df['userID'].value_counts() > min_user_ratings
filter_users = filter_users[filter_users].index.tolist()

df_new = df[(df['ISBN'].isin(filter_books)) & (df['userID'].isin(filter_users))] ## keep only the rows that meet both minimums
print('The original data frame shape:\t{}'.format(df.shape))
print('The new data frame shape:\t{}'.format(df_new.shape))

The original data frame shape:	(1149779, 3)
The new data frame shape:	(140516, 3)

df_new.head()

	userID	ISBN	bookRating
394	243	0060915544	10
395	243	0060977493	7
397	243	0156006529	0
400	243	0316096199	0
401	243	0316601950	9

Actually using the Surprise library

from surprise import Reader
from surprise import Dataset
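## Dataset.load_from_df takes a Reader instance as a parameter.
reader = Reader(rating_scale=(0, 9)) ## The data actually ranges 0-10, so why did Susan Li use 0-9? Odd, but a rating of 10 is still stored as 10.
## Using rating_scale=(0, 10) makes little difference here, though estimates are clipped to rating_scale, so (0, 10) is the safer choice.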
s_data = Dataset.load_from_df(df_new[['userID', 'ISBN', 'bookRating']], reader)
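Why the rating scale can still matter: surprise clips each estimate into rating_scale by default, so with (0, 9) no prediction can exceed 9 even when the true rating is 10. The snippet below is a plain-Python illustration of that clipping, not surprise's own code; clip_to_scale and the numbers are hypothetical.

```python
def clip_to_scale(estimate, rating_scale):
    """Clamp a raw estimate into the (low, high) rating scale."""
    low, high = rating_scale
    return min(high, max(low, estimate))


raw_estimate = 9.6   # hypothetical unclipped model output
true_rating = 10

est_0_9 = clip_to_scale(raw_estimate, (0, 9))     # capped at 9
est_0_10 = clip_to_scale(raw_estimate, (0, 10))   # left unchanged

print(est_0_9, abs(true_rating - est_0_9))
print(est_0_10, abs(true_rating - est_0_10))
```

The capped estimate carries a larger error for ratings of 10, which is why the effect on RMSE is small but not zero.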

## Try splitting into train and test sets
from surprise import model_selection
s_data_train, s_data_test = model_selection.train_test_split(data=s_data,test_size=0.2,random_state=42,shuffle=True)

from surprise import accuracy

We use rmse as our accuracy metric for the predictions.
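As a reminder of what the metric measures, here is a small stand-alone sketch of RMSE over (true rating, estimate) pairs. The function name and the toy pairs are made up for illustration; surprise's accuracy.rmse computes the same quantity from its Prediction objects.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(r - est) ** 2 for r, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy predictions: errors of 3, 4 and 0.
toy = [(10.0, 7.0), (0.0, 4.0), (5.0, 5.0)]
toy_rmse = rmse(toy)
print(toy_rmse)
```

Because the errors are squared before averaging, a few large misses (for example predicting about 0 for a true 10) dominate the score.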

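Detailed explanations: https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html

From here on several surprise algorithms appear. I could not follow them from Susan Li's explanations alone, so I am writing up each one myself.

I covered this in the Basic01 post.
Below is an explanation of the formula notation.

I suddenly could not remember this, so I looked it up… Source: https://goofcode.github.io/similarity-measure
Quick trivia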

df_new.bookRating.value_counts().sort_index(ascending=False)[0:5]

10     8778
9      7966
8     10381
7      6694
6      2917
Name: bookRating, dtype: int64

Basic algorithms

NormalPredictor / BaselineOnly
See the previous post.

k-NN algorithms

KNNBasic / KNNWithMeans /KNNBaseline
See the previous post.

An excellent link for the academic background

Matrix Factorization-based algorithms

SVD

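The SVD algorithm, when baselines are not used, is equivalent to Probabilistic Matrix Factorization (http://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf)
It is the best-known algorithm, popularized by Simon Funk during the Netflix Prize.
It is reputed to give the best performance of any single algorithm.

The theory is well known but hard for me, so for details I recommend looking up a separate reference. For my own record: the formula above, r̂_ui = μ + b_u + b_i + q_iᵀp_u, comes from the surprise documentation.
That alone does not make immediate sense, though. It is better to first understand the actual SVD (Singular Value Decomposition) and approach it from there.
Link: understanding eigenvalues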

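R ≈ P·Qᵀ (if the original matrix is R, it can be factored into P and Q).

Definition of eigendecomposition: a square matrix A is written as A = PΛP⁻¹.

Mathematical definition of SVD: A = UΣVᵀ.

By analogy, think of A as the m×n user-item rating matrix.

To unpack this: A is a rectangular matrix, so it cannot be eigendecomposed directly. Instead, AAᵀ or AᵀA, which are square,
are eigendecomposed.
(※ Not every square matrix can be eigendecomposed, but a symmetric matrix always can, and it can moreover be diagonalized by an orthogonal matrix.)
The eigendecomposition then goes through as above, and U, V, and Σ are defined in the process.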

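U : m×m orthogonal matrix whose columns are the left singular vectors of A [= eigenvectors of AAᵀ]

Eigendecomposing AAᵀ as in equation (7) gives AAᵀ = U(ΣΣᵀ)Uᵀ, where U holds the eigenvectors, ΣΣᵀ is the diagonal matrix of eigenvalues, and Uᵀ is the inverse of U (U is orthogonal, so U⁻¹ = Uᵀ). In the generic notation PΛP⁻¹, P holds the eigenvectors of AAᵀ and Λ is the diagonal matrix of eigenvalues.

V : n×n orthogonal matrix whose columns are the right singular vectors of A [= eigenvectors of AᵀA]

Σ : m×n rectangular diagonal matrix whose diagonal entries (the singular values of A) are the square roots of the eigenvalues of AᵀA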
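The definitions of U, V, and Σ can be checked numerically. This is a minimal NumPy sketch on a random 5×3 toy matrix (a stand-in, not the rating data): it rebuilds A from the full SVD and confirms that the singular values are the square roots of the eigenvalues of AᵀA.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.normal(size=(m, n))            # toy stand-in for a ratings matrix

# Full SVD: U is m x m, Vt is n x n, s holds the n singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n rectangular diagonal matrix Sigma from s.
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)
reconstruction_ok = np.allclose(U @ Sigma @ Vt, A)

# Eigenvalues of the symmetric matrix A^T A (eigh returns them ascending).
eigvals, _ = np.linalg.eigh(A.T @ A)
singular_from_eig = np.sqrt(eigvals[::-1])

# U and V are orthogonal: their transposes are their inverses.
u_orthogonal = np.allclose(U.T @ U, np.eye(m))
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(n))

print(reconstruction_ok, u_orthogonal, v_orthogonal)
print(s, singular_from_eig)
```

The two printed vectors should agree up to floating-point error, which is the relation between Σ and the eigenvalues stated above.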

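Decomposing A completely according to the mathematical definition above is called full SVD, but in practice, as in surprise, full SVD is not used.
Instead, only the part of the decomposition that belongs to the nonzero (or largest) singular values is kept:
Full SVD -> thin SVD -> Compact SVD -> Truncated SVD, each smaller than the last, so that the solution approximates A by A' rather than reproducing A exactly.
Strictly speaking, surprise is closest to truncated SVD: it keeps only k factors, although it learns them by SGD on the observed ratings rather than by computing a decomposition.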
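A small NumPy sketch of the truncation step on a random toy matrix: keep the k largest singular values and measure how far the rank-k matrix A' is from A. The Frobenius error equals the root of the summed squares of the dropped singular values (the Eckart-Young result). The matrix and k are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))

# Thin SVD: U is 6 x 4, s has 4 values, Vt is 4 x 4.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of singular values to keep
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated (rank-k) approximation

err = np.linalg.norm(A - A_k)                 # Frobenius norm of the residual
expected = np.sqrt(np.sum(s[k:] ** 2))        # energy in the dropped singular values

print(np.linalg.matrix_rank(A_k), err, expected)
```

A larger k lowers the error but keeps more parameters, which is the trade-off behind choosing the number of factors.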

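Coming back to surprise and what P and Q mean:
my impression is that P plays the role of UΣ and Qᵀ the role of Vᵀ (Σ can be absorbed into either side), which fits the shapes R = n×m, P = n×k, Q = m×k, so that R ≈ P·Qᵀ.
In the SVD class below, n_factors is this k. In other words, it decides how many of the diagonal (singular-value) entries to keep, that is, how far to reduce the dimension.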
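To make P, Q, and k concrete, here is a rough NumPy sketch of a biased matrix factorization trained by SGD, following the update rule described in the surprise documentation. It is not surprise's implementation; fit_biased_mf, predict, train_rmse, and the toy ratings are hypothetical names and data chosen for illustration.

```python
import numpy as np


def fit_biased_mf(ratings, n_users, n_items, n_factors=2, n_epochs=200,
                  lr=0.01, reg=0.02, seed=0):
    """SGD on (user, item, rating) triples; returns (mu, bu, bi, P, Q)."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))        # global mean
    bu = np.zeros(n_users)                                  # user biases
    bi = np.zeros(n_items)                                  # item biases
    P = rng.normal(0.0, 0.1, size=(n_users, n_factors))     # user factors (n x k)
    Q = rng.normal(0.0, 0.1, size=(n_items, n_factors))     # item factors (m x k)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()                            # use old p_u when updating q_i
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])
    return mu, bu, bi, P, Q


def predict(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + Q[i] @ P[u]


def train_rmse(model, ratings):
    errors = [(r - predict(model, u, i)) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(errors)))


# Toy data: 4 users, 3 items, ratings on a 0-10 scale (made up).
toy = [(0, 0, 9), (0, 1, 8), (0, 2, 1), (1, 0, 10), (1, 2, 2),
       (2, 1, 2), (2, 2, 9), (3, 0, 1), (3, 2, 8), (3, 1, 3)]
model = fit_biased_mf(toy, n_users=4, n_items=3, n_factors=2)

mean_only_rmse = float(np.sqrt(np.mean([(r - model[0]) ** 2 for _, _, r in toy])))
fitted_rmse = train_rmse(model, toy)
print(mean_only_rmse, fitted_rmse)
```

Only the observed ratings enter the loop, so the missing cells of R never have to be filled in; that is the main practical difference from a classical SVD.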

from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

from surprise import SVD,SVDpp,NMF

algo = SVD(n_factors=150,
           n_epochs=20,
           biased=True, ## If False, the model is equivalent to Probabilistic Matrix Factorization (no bias terms).
                       ## If True, the baseline (bias) terms are added to the objective function.
           init_mean=0, ## The mean of the normal distribution for factor vectors initialization. Default is 0.
           init_std_dev=0.1, ## The standard deviation of the normal distribution for factor vectors initialization. Default is 0.1.
           lr_all=0.005, ## The learning rate for all parameters. Default is 0.005.
           reg_all=0.02, ## The regularization term for all parameters. Default is 0.02.
           lr_bu=None, ## The learning rate for bu. Takes precedence over lr_all if set. Default is None.
           lr_bi=None, ##  The learning rate for bi. Takes precedence over lr_all if set. Default is None.
           lr_pu=None, ## The learning rate for pu. Takes precedence over lr_all if set. Default is None.
           lr_qi=None, ## The learning rate for qi. Takes precedence over lr_all if set. Default is None.
           reg_bu=0.001, ##  The regularization term for bu. Takes precedence over reg_all if set. Default is None.
           reg_bi=0.001, ## The regularization term for bi. Takes precedence over reg_all if set (it overrides reg_all). Default is None.
           reg_pu=0.001, ## The regularization term for pu. Takes precedence over reg_all if set. Default is None.
           reg_qi=0.001, ## The regularization term for qi. Takes precedence over reg_all if set. Default is None.
           random_state=None,verbose=False,)

results = cross_validate(algo, s_data, measures=['RMSE'], cv=3, verbose=False)     ## RMSE: root-mean-square error
print(results['test_rmse'])

[3.5485304  3.52974876 3.52158998]

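The surprise implementation above is combined with the baseline model, so there are many hyperparameters to tune.
To use it in the purely mathematical sense, set biased=False. According to the paper, though, that overfits heavily and is not the preferred approach, so I follow the default.
Unsurprisingly, the result with the regularization terms set as above is not great. With the defaults it was somewhat better…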

SVDpp

The SVDpp algorithm is an extension of SVD that takes into account implicit ratings.
I do not really understand the formula yet…
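One way to read the formula is to compute it by hand. In the surprise documentation the SVD++ estimate is μ + b_u + b_i + q_iᵀ(p_u + |I_u|^(-1/2) Σ_j y_j), where the y_j are extra item factors for every item j the user has rated, whatever the rating value. The sketch below evaluates that expression for made-up numbers; svdpp_estimate and all values are hypothetical.

```python
import numpy as np


def svdpp_estimate(mu, b_u, b_i, p_u, q_i, y_rated):
    """mu + b_u + b_i + q_i . (p_u + |I_u|^(-1/2) * sum of y_j over rated items)."""
    p_u = np.asarray(p_u, dtype=float)
    q_i = np.asarray(q_i, dtype=float)
    y_rated = np.asarray(y_rated, dtype=float)
    n_rated = y_rated.shape[0]
    if n_rated:
        implicit = y_rated.sum(axis=0) / np.sqrt(n_rated)
    else:
        implicit = np.zeros_like(p_u)      # user with no history: plain SVD estimate
    return mu + b_u + b_i + float(q_i @ (p_u + implicit))


# The user rated four items; each row is that item's implicit factor y_j.
y = [[0.2, 0.4], [0.2, 0.0], [0.2, 0.2], [0.2, 0.2]]
est = svdpp_estimate(mu=3.0, b_u=0.5, b_i=-0.2,
                     p_u=[1.0, 0.0], q_i=[0.5, 0.5], y_rated=y)
est_no_history = svdpp_estimate(3.0, 0.5, -0.2, [1.0, 0.0], [0.5, 0.5], [])
print(est, est_no_history)
```

The only difference from plain SVD is the implicit term added to p_u, so the fact that a user rated an item shifts the user vector regardless of the score given.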

algo = SVDpp(n_factors=20,
             n_epochs=20,
             init_mean=0,
             init_std_dev=0.1,
             lr_all=0.007,
             reg_all=0.01,
             lr_bu=None,
             lr_bi=None,
             lr_pu=None,
             lr_qi=None,
             lr_yj=None,
             reg_bu=None,
             reg_bi=None,
             reg_pu=None,
             reg_qi=None,
             reg_yj=None, ## – The regularization term for yj. Takes precedence over reg_all if set. Default is None.
             random_state=None,verbose=False)

results = cross_validate(algo, s_data, measures=['RMSE'], cv=3, verbose=False)     ## RMSE: root-mean-square error
print(results['test_rmse'])

[3.83129776 3.83604476 3.84379468]

NMF

NMF is a collaborative filtering algorithm based on Non-negative Matrix Factorization. It is very similar to SVD.
Skipped here.

Slope One

Slope One is a straightforward implementation of the SlopeOne algorithm. (https://arxiv.org/abs/cs/0702144)
It is the simplest form of CF algorithm: what plays the role of similarity is the average difference in ratings between two items.
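The idea fits in a few lines of plain Python. This sketch follows the formula given in the surprise documentation (the user's mean rating plus the average deviation between the target item and each item the user rated), but it is an illustrative rewrite, not the library's code; the function name and the toy ratings are made up.

```python
def slope_one_predict(ratings, user, item):
    """ratings: dict mapping user -> {item: rating}. Returns the estimate for (user, item)."""
    user_ratings = ratings[user]
    mu_u = sum(user_ratings.values()) / len(user_ratings)   # user's mean rating

    deviations = []
    for j in user_ratings:
        if j == item:
            continue
        # Average of (r_item - r_j) over users who rated both items.
        diffs = [r[item] - r[j] for r in ratings.values() if item in r and j in r]
        if diffs:
            deviations.append(sum(diffs) / len(diffs))

    if not deviations:          # no co-rated items: fall back to the user's mean
        return mu_u
    return mu_u + sum(deviations) / len(deviations)


toy = {
    "a": {"x": 5, "y": 3, "z": 2},
    "b": {"x": 3, "y": 4},
    "c": {"y": 2, "z": 5},
}
estimate = slope_one_predict(toy, "c", "x")
print(estimate)
```

For user c the deviations are 0.5 (x versus y) and 3.0 (x versus z), so the estimate is the user's mean 3.5 plus 1.75.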

from surprise import SlopeOne
from surprise import CoClustering
algo = SlopeOne()
results = cross_validate(algo, s_data, measures=['RMSE'], cv=3, verbose=False)     ## RMSE: root-mean-square error
print(results['test_rmse'])

[3.47530464 3.47201827 3.46743209]

Co-clustering

Co-clustering is a collaborative filtering algorithm based on co-clustering (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.113.6458&rep=rep1&type=pdf). We use rmse as our accuracy metric for the predictions.
It uses clustering to decide the neighbors.

algo = CoClustering( n_cltr_u=5,n_cltr_i=5,
                    n_epochs=20,random_state=None,verbose=False)
results = cross_validate(algo, s_data, measures=['RMSE'], cv=3, verbose=False)     ## RMSE: root-mean-square error
print(results['test_rmse'])

[3.53167786 3.53866095 3.53000255]

I have now gone through the methods, skipping a few. To decide which one suits the current data, I run them all in a for loop as follows.

## All classes and functions used here come from the surprise package.
from surprise import NormalPredictor
from surprise import KNNBasic
from surprise import KNNWithMeans
from surprise import KNNWithZScore
from surprise import KNNBaseline
from surprise import SVD
from surprise import BaselineOnly
from surprise import SVDpp
from surprise import NMF
benchmark = []
## NMF() raised an error on this data, so it is left out of the list below.
## Note: %time on a line by itself times an empty statement (hence "Wall time: 0 ns");
## put %%time on the first line of the cell to time the whole loop.
%time
# Iterate over all algorithms
for algorithm in [ SVD(), SVDpp(), SlopeOne(), NormalPredictor(), KNNBaseline(), KNNBasic(), KNNWithMeans(), KNNWithZScore(), BaselineOnly(), CoClustering()]:
    # Perform cross validation
    print(algorithm)
    results = cross_validate(algorithm, s_data, measures=['RMSE'], cv=3, verbose=False)     ## RMSE: root-mean-square error

    # Get results & append algorithm name
    tmp = pd.DataFrame.from_dict(results).mean(axis=0)
    ## pd.concat is used because Series.append was removed in pandas 2.0
    name = pd.Series([str(algorithm).split(' ')[0].split('.')[-1]], index=['Algorithm'])
    benchmark.append(pd.concat([tmp, name]))

Wall time: 0 ns
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x0000017421E6C9E8>
<surprise.prediction_algorithms.matrix_factorization.SVDpp object at 0x0000017421E6CAC8>
<surprise.prediction_algorithms.slope_one.SlopeOne object at 0x0000017421E6CA90>
<surprise.prediction_algorithms.random_pred.NormalPredictor object at 0x0000017421E6CB00>
<surprise.prediction_algorithms.knns.KNNBaseline object at 0x0000017421E6CB38>
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
Estimating biases using als...
Computing the msd similarity matrix...
Done computing similarity matrix.
<surprise.prediction_algorithms.knns.KNNBasic object at 0x0000017421E6CB70>
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
<surprise.prediction_algorithms.knns.KNNWithMeans object at 0x0000017421E6CBA8>
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
<surprise.prediction_algorithms.knns.KNNWithZScore object at 0x0000017421E6CBE0>
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
<surprise.prediction_algorithms.baseline_only.BaselineOnly object at 0x0000017421E6CC18>
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
<surprise.prediction_algorithms.co_clustering.CoClustering object at 0x0000017421E6CC50>

Surprisingly, the baseline, which is among the simplest algorithms, gave the best result…
The latent-matrix methods came out somewhat weaker than the baseline.
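For reference, the baseline model only estimates μ + b_u + b_i. The sketch below is a plain-Python version of the ALS-style bias updates described in the surprise documentation (item biases first, then user biases, each shrunk by a regularization constant). It is an assumption-laden illustration with made-up data, not the library's implementation; fit_baseline_als and baseline_estimate are hypothetical names.

```python
def fit_baseline_als(ratings, n_epochs=5, reg_u=12, reg_i=5):
    """ratings: list of (user, item, rating). Returns (mu, user_bias, item_bias)."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = dict.fromkeys(users, 0.0)
    bi = dict.fromkeys(items, 0.0)
    for _ in range(n_epochs):
        # Item biases: shrunk average residual after removing mu and user biases.
        for i in items:
            resid = [r - mu - bu[u] for u, it, r in ratings if it == i]
            bi[i] = sum(resid) / (reg_i + len(resid))
        # User biases: shrunk average residual after removing mu and item biases.
        for u in users:
            resid = [r - mu - bi[it] for uu, it, r in ratings if uu == u]
            bu[u] = sum(resid) / (reg_u + len(resid))
    return mu, bu, bi


def baseline_estimate(model, user, item):
    mu, bu, bi = model
    # Unknown users or items contribute a bias of 0.
    return mu + bu.get(user, 0.0) + bi.get(item, 0.0)


toy = [("u1", "a", 10), ("u1", "b", 6), ("u2", "a", 4), ("u2", "b", 0)]
model = fit_baseline_als(toy, n_epochs=1, reg_u=2, reg_i=2)
print(model)
print(baseline_estimate(model, "u1", "a"))
```

With so many 0 ratings in this data, a model that mostly learns per-user and per-book offsets is hard to beat, which may be part of why BaselineOnly ranks first here.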

surprise_results = pd.DataFrame(benchmark).set_index('Algorithm').sort_values('test_rmse')
surprise_results

	test_rmse	fit_time	test_time
Algorithm
BaselineOnly	3.374667	0.192807	0.196814
CoClustering	3.472809	1.557530	0.258480
SlopeOne	3.476510	0.551533	3.308080
KNNWithMeans	3.482859	0.514802	4.464338
KNNBaseline	3.496288	0.654103	5.382493
KNNWithZScore	3.505984	0.609689	4.832343
SVD	3.543556	4.425603	0.290065
KNNBasic	3.727995	0.487108	4.100230
SVDpp	3.798871	105.043733	4.350798
NormalPredictor	4.681728	0.120685	0.270594

print('Using ALS')
bsl_options = {'method': 'als',
               'n_epochs': 5,
               'reg_u': 12,
               'reg_i': 5
               }
algo = BaselineOnly(bsl_options=bsl_options)
cross_validate(algo, s_data, measures=['RMSE'], cv=3, verbose=False)

trainset, testset = train_test_split(s_data, test_size=0.25)
algo = BaselineOnly(bsl_options=bsl_options)
predictions = algo.fit(trainset).test(testset)
accuracy.rmse(predictions)

Using ALS
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
RMSE: 3.3756

3.3755915082844634

trainset = algo.trainset ## stored on the algorithm when fit() is called
print(algo.__class__.__name__)

BaselineOnly

To analyze the results, I run the same steps as Susan Li:

def get_II(uid): ## number of books this user rated in the trainset
    try:
        return len(trainset.ur[trainset.to_inner_uid(uid)])
    except ValueError: # user was not part of the trainset
        return 0

def get_Ui(iid): ## number of ratings this book received in the trainset
    try:
        return len(trainset.ir[trainset.to_inner_iid(iid)])
    except ValueError: # item was not part of the trainset
        return 0

print(type(predictions[0]),'\t',predictions[20])
## A Prediction cannot be used like a dictionary; treat it as a (named) tuple of values and convert the list to a DataFrame.

<class 'surprise.prediction_algorithms.predictions.Prediction'> 	 user: 10819      item: 0671003755 r_ui = 0.00   est = 1.99   {'was_impossible': False}

df = pd.DataFrame(predictions, columns=['uid', 'iid', 'rui', 'est', 'details'])
df['Ii_cnt'] = df.uid.apply(get_II) ## number of books this user rated in the trainset
df['Ui_cnt'] = df.iid.apply(get_Ui) ## number of ratings this book received in the trainset
df['err'] = abs(df.est - df.rui)

df.head()

	uid	iid	rui	est	details	Ii_cnt	Ui_cnt	err
0	31315	0802139256	0.0	2.564510	{'was_impossible': False}	246	48	2.564510
1	242073	0446608890	0.0	1.405207	{'was_impossible': False}	45	68	1.405207
2	227447	0446609617	0.0	0.384353	{'was_impossible': False}	315	38	0.384353
3	196077	0440204887	0.0	1.472857	{'was_impossible': False}	286	31	1.472857
4	49842	0553582747	0.0	2.458348	{'was_impossible': False}	19	63	2.458348

df.loc[df.rui!=0,:].head() ## this is the test set, so every row has both the actual rating and est

	uid	iid	rui	est	details	Ii_cnt	Ui_cnt	err
7	116758	0312195516	10.0	5.627643	{'was_impossible': False}	19	264	4.372357
15	18082	0440221471	8.0	2.612378	{'was_impossible': False}	25	171	5.387622
22	123981	0425121259	8.0	2.001490	{'was_impossible': False}	225	42	5.998510
25	165183	0060976845	10.0	4.094378	{'was_impossible': False}	25	187	5.905622
30	100459	037570504X	8.0	2.692422	{'was_impossible': False}	90	70	5.307578

best_predictions = df.sort_values(by='err')[:10] ## the 10 smallest errors
worst_predictions = df.sort_values(by='err')[-10:] ## the 10 largest errors (the last 10 when sorted ascending)

best_predictions

	uid	iid	details	Ii_cnt	Ui_cnt
9837	79942	0971880107	{'was_impossible': False}	23	627
7849	225810	0394742117	{'was_impossible': False}	214	30
28966	82926	0345386108	{'was_impossible': False}	40	65
28951	102967	0425172996	{'was_impossible': False}	396	45
7923	234623	0553585118	{'was_impossible': False}	242	29
28916	35050	0446607711	{'was_impossible': False}	126	81
7945	98741	0064405842	{'was_impossible': False}	239	30
8018	78783	0440224845	{'was_impossible': False}	356	38
8074	73394	0345447840	{'was_impossible': False}	234	58
8091	102967	0440201926	{'was_impossible': False}	396	47

worst_predictions

	uid	iid	rui	est	details	Ii_cnt	Ui_cnt	err
33276	245827	0451183665	10.0	0.304337	{'was_impossible': False}	122	90	9.695663
12093	245864	0345409876	10.0	0.293552	{'was_impossible': False}	28	34	9.706448
31016	69697	0425183971	10.0	0.269747	{'was_impossible': False}	149	45	9.730253
10868	245963	0425170349	10.0	0.267484	{'was_impossible': False}	128	49	9.732516
6584	77940	0671027581	10.0	0.259088	{'was_impossible': False}	52	43	9.740912
17718	14521	0553269631	10.0	0.219905	{'was_impossible': False}	174	27	9.780095
31077	227447	0515132268	10.0	0.163739	{'was_impossible': False}	315	32	9.836261
12162	238120	0385413041	10.0	0.000000	{'was_impossible': False}	327	31	10.000000
8109	227447	055356773X	10.0	0.000000	{'was_impossible': False}	315	48	10.000000
20402	55548	0553278398	10.0	0.000000	{'was_impossible': False}	134	24	10.000000

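The overall RMSE is known to be 3.374667, so individual errors should be judged against that.

When a predicted rating is completely off, can we really say the model is wrong? For example, book 0515132268 received 0 far more often than anything else,
yet user 227447 gave it a 10… Perhaps this user simply has unusual taste, which makes the rating feel a bit like an outlier.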

print(df.loc[df.iid=='0515132268',:].shape)
df.loc[df.iid=='0515132268',:].rui.value_counts()

(15, 8)

0.0     11
10.0     1
9.0      1
8.0      1
5.0      1
Name: rui, dtype: int64

df.loc[df.iid=='0515132268',:]

	uid	iid	rui	est	details	Ii_cnt	Ui_cnt	err
350	11676	0515132268	9.0	4.399784	{'was_impossible': False}	1132	32	4.600216
4498	94923	0515132268	0.0	0.881753	{'was_impossible': False}	45	32	0.881753
5570	159033	0515132268	8.0	0.631298	{'was_impossible': False}	155	32	7.368702
6053	70065	0515132268	0.0	1.186498	{'was_impossible': False}	28	32	1.186498
7656	136382	0515132268	0.0	1.875635	{'was_impossible': False}	83	32	1.875635
7930	165319	0515132268	0.0	1.732530	{'was_impossible': False}	74	32	1.732530
12488	234597	0515132268	0.0	1.992210	{'was_impossible': False}	48	32	1.992210
18132	123095	0515132268	5.0	3.708091	{'was_impossible': False}	33	32	1.291909
21573	135351	0515132268	0.0	3.109227	{'was_impossible': False}	22	32	3.109227
22122	55492	0515132268	0.0	0.175138	{'was_impossible': False}	374	32	0.175138
25194	51450	0515132268	0.0	1.163447	{'was_impossible': False}	82	32	1.163447
26347	151589	0515132268	0.0	0.381658	{'was_impossible': False}	39	32	0.381658
26768	8936	0515132268	0.0	0.311826	{'was_impossible': False}	89	32	0.311826
31077	227447	0515132268	10.0	0.163739	{'was_impossible': False}	315	32	9.836261
32133	243930	0515132268	0.0	2.573211	{'was_impossible': False}	35	32	2.573211

inner_uid = trainset.to_inner_uid(11676)
len(trainset.ur[inner_uid]) ## return (item_inner_id, rating)

I plan to update this post from time to time.
End
