Introduction to the NTUST 2018 data analysis course


This post covers the three homework assignments from the Introduction to Data Science course at National Taiwan University of Science and Technology (NTUST), spring semester 2018: classification (a supervised method), clustering (an unsupervised method), and association analysis. The results were meant to be uploaded to the corresponding Kaggle competitions, but since those competitions are not open to the public, I will not post the URLs.



The first assignment uses basic information about a bank's customers to predict whether each customer will ultimately subscribe to a term deposit.



The training data contains 16 attributes describing each client, including age, marital status, education level, and so on (see training_data.csv for details), plus 1 output attribute (yes or no: whether the client subscribes to the deposit).

The result is a CSV file with 4523 rows and 2 columns. The first column is id, the bank customer's number (note: id starts at 0). The second column is ans, indicating whether the customer will subscribe to the term deposit (use 1 for yes and 0 for no).
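The submission format described above can be sketched with Python's standard csv module. This is only a minimal illustration: the function name write_submission, the presence of a header row, and the sample predictions are assumptions, not part of the original assignment files.

```python
import csv
import io


def write_submission(predictions, out):
    """Write one (id, ans) row per customer to the file-like object `out`.

    `predictions` is a sequence of "yes"/"no" labels (hypothetical input).
    Ids start at 0; "yes" is written as 1 and "no" as 0.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "ans"])  # header row is an assumption
    for customer_id, label in enumerate(predictions):
        writer.writerow([customer_id, 1 if label == "yes" else 0])


# Made-up predictions for three customers, written to an in-memory buffer
buffer = io.StringIO()
write_submission(["no", "yes", "no"], buffer)
print(buffer.getvalue())
```

To produce a real file, the same function can be given a handle from open("result.csv", "w", newline="").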

Attribute Introduction



This assignment may have been adapted from this program, with some additional attributes. Our goal is to use a clustering method (K-means) to find which Pokémon are most similar to each other.
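As a rough illustration of the K-means idea, here is a small pure-Python sketch of Lloyd's algorithm. The two-value stat vectors are made up for the example and are not taken from the assignment's dataset; in practice a library implementation would likely be used instead.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as a start
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


# Made-up (stat_1, stat_2) vectors: three "weak" and three "strong" entries
stats = [(45, 49), (50, 52), (48, 50), (100, 120), (105, 125), (98, 118)]
centroids, labels = kmeans(stats, k=2)
print(labels)
```

Entries that end up with the same label are the ones the algorithm considers most alike under Euclidean distance, which is one way to read "most similar" here.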


A highly upvoted post on the Reddit subreddit /r/dataisbeautiful by /u/nvvknvvk charts the height vs. weight of the original 151 Pokémon.

Anh Le of Duke University posted a cluster analysis of the original 151 Pokémon using principal component analysis (PCA), compressing the 6 primary Pokémon stats into 2 dimensions. However, those visualizations are narrow in scope and cover only a small subset of Pokémon. This dataset can also be reduced to three dimensions for analysis and research.
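To make the "6 stats into 2 dimensions" step concrete, the following is a minimal pure-Python sketch of PCA using power iteration with deflation. It is not Anh Le's code, the stat rows are made up, and the helper names (top_component, pca) are hypothetical; a library routine such as scikit-learn's PCA would normally be preferable.

```python
import math
import random


def top_component(cov, steps=500, seed=0):
    """Power iteration: approximate the dominant eigenvector of `cov`."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(0.5, 1.0) for _ in range(d)]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    for _ in range(steps):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d)
    )
    return v, eigenvalue


def pca(rows, n_components=2):
    """Project rows onto their first `n_components` principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    components = []
    for _ in range(n_components):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the direction just found from the covariance
        cov = [
            [cov[i][j] - lam * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    return [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]


# Made-up rows of 6 stats each (column order is an assumption)
stats = [
    [40, 50, 45, 60, 55, 50],
    [60, 65, 60, 80, 75, 60],
    [80, 85, 80, 100, 95, 80],
    [35, 70, 50, 30, 40, 90],
    [95, 40, 110, 45, 90, 30],
]
for point in pca(stats, n_components=2):
    print(point)
```

Passing n_components=3 gives the three-dimensional variant mentioned above.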



This dataset comes from a retailer's transaction records. We want to find which items are commonly purchased together by shoppers in countries such as the United Kingdom and France.
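One simple way to look for items that are bought together is to count item pairs across baskets and compute support and confidence. The sketch below does this in plain Python; the baskets, the function name pair_rules, and the 0.5 support threshold are all made up for illustration and do not come from the retailer's data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for frequent pairs.

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Made-up baskets; real data would be grouped by invoice and country
baskets = [
    ["tea", "cup"],
    ["tea", "cup", "spoon"],
    ["tea", "spoon"],
    ["cup"],
]
for rule, (support, confidence) in sorted(pair_rules(baskets).items()):
    print(rule, round(support, 2), round(confidence, 2))
```

Running the same function separately on the baskets of each country would allow the frequent pairs to be compared between, say, the United Kingdom and France.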



I may not have time to do this assignment.

Attribute Introduction

result.csv Introduction