[Probability] python, scipy, statsmodels, hist, empirical rule, z-score, stats.zscore(), outlier

Olivia-BlackCherry 2023. 7. 17. 13:01

1. Statistics packages
- 1) Scipy
- 2) Statsmodels
2. hist() histogram
3. Checking the empirical rule
4. z-score
- 1) stats.zscore()
- 2) Finding outliers

1. Statistics packages

To do statistical work in Python, we install two packages.

1) Scipy

2) Statsmodels

from scipy import stats
import statsmodels.api as sm

Data: the dataset shows literacy rates. OVERALL_LI is the literacy rate of each district.

2. hist() histogram
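
A minimal sketch of drawing the histogram with pandas. The original dataset is not included in this post, so the DataFrame below is a hypothetical stand-in filled with synthetic values (the mean of 74, SD of 10, and 634 rows are taken from the figures quoted later in this post); `bins=20` and the axis labels are also my own choices.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: 634 synthetic literacy rates.
rng = np.random.default_rng(0)
education_districtwise = pd.DataFrame({"OVERALL_LI": rng.normal(74, 10, 634)})

# Series.hist() draws the histogram with matplotlib and returns the Axes.
ax = education_districtwise["OVERALL_LI"].hist(bins=20)
ax.set_xlabel("OVERALL_LI")
ax.set_ylabel("Number of districts")
# In a script, call matplotlib.pyplot.show() or ax.figure.savefig(...) to view it.
```

A roughly bell-shaped histogram is what makes the empirical rule in the next section a reasonable thing to check.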

3. Checking the empirical rule

About 68%, 95%, and 99.7% of values lie within 1 SD, 2 SD, and 3 SD of the mean.

1) Compute the mean and standard deviation.
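
This step can be sketched as follows. The variable names `mean_overall_li` and `std_overall_li` match the ones used in this post; the DataFrame itself is a hypothetical synthetic stand-in, since the real file is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: 634 synthetic literacy rates.
rng = np.random.default_rng(0)
education_districtwise = pd.DataFrame({"OVERALL_LI": rng.normal(74, 10, 634)})

mean_overall_li = education_districtwise["OVERALL_LI"].mean()
# pandas .std() returns the sample standard deviation (ddof=1) by default.
std_overall_li = education_districtwise["OVERALL_LI"].std()

print(f"mean = {mean_overall_li:.2f}, std = {std_overall_li:.2f}")
```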

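2) 1SD

With a mean of about 74 and an SD of about 10:

74 - 10 ~ 74 + 10

= 64 ~ 84

Let's check whether about 68% of the data falls in this interval.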

lower_limit = mean_overall_li - 1 * std_overall_li
upper_limit = mean_overall_li + 1 * std_overall_li

This is the interval from roughly 64 to 84.

(education_districtwise['OVERALL_LI'] >= lower_limit) & (education_districtwise['OVERALL_LI'] <= upper_limit)

Of the 634 rows in total, 421 are True. As a percentage that is 66.4%, which is close to 68%.

The code above can be written more concisely as follows.

((education_districtwise['OVERALL_LI'] >= lower_limit) & (education_districtwise['OVERALL_LI'] <= upper_limit)).mean()

Likewise, comparing 2 SD and 3 SD shows that the empirical rule holds closely for this data.
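
The 2 SD and 3 SD checks can be sketched with a loop over the same expression. This is an illustration on hypothetical synthetic data, not the real dataset, so the printed shares will differ from the post's figures; the `shares` dict is a helper name of my own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: 634 synthetic literacy rates.
rng = np.random.default_rng(0)
education_districtwise = pd.DataFrame({"OVERALL_LI": rng.normal(74, 10, 634)})

values = education_districtwise["OVERALL_LI"]
mean_overall_li = values.mean()
std_overall_li = values.std()

shares = {}  # k -> share of rows within k SD of the mean
for k in (1, 2, 3):
    lower_limit = mean_overall_li - k * std_overall_li
    upper_limit = mean_overall_li + k * std_overall_li
    # The mean of a boolean Series is the fraction of True values.
    shares[k] = ((values >= lower_limit) & (values <= upper_limit)).mean()
    print(f"within {k} SD: {shares[k]:.1%}")
```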

4. z-score

A z-score tells you where a value sits in the distribution: how many standard deviations it lies above or below the mean.

1) stats.zscore()

Python computes it all for us.

education_districtwise['Z_SCORE'] = stats.zscore(education_districtwise['OVERALL_LI'])
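
For reference, the z-score of a value x is z = (x - mean) / SD. The sketch below compares a hand-computed version with `stats.zscore()` on hypothetical synthetic data. One detail worth knowing: `stats.zscore()` uses the population standard deviation (`ddof=0`) by default, while pandas `.std()` defaults to the sample version (`ddof=1`), so the manual calculation passes `ddof=0` to match. The name `manual_z` is my own.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the real dataset: 634 synthetic literacy rates.
rng = np.random.default_rng(0)
education_districtwise = pd.DataFrame({"OVERALL_LI": rng.normal(74, 10, 634)})

values = education_districtwise["OVERALL_LI"]

# Manual z-score using the population SD (ddof=0), like stats.zscore's default.
manual_z = (values - values.mean()) / values.std(ddof=0)

education_districtwise["Z_SCORE"] = stats.zscore(values)

# Check that both approaches agree up to floating-point error.
print(np.allclose(manual_z, education_districtwise["Z_SCORE"]))
```

With 634 rows the difference between `ddof=0` and `ddof=1` is tiny, but it can matter for small samples.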

2) Finding outliers

For roughly normal data, z-scores usually fall between -3 and 3. To find outliers, look for values with a z-score below -3 or above 3.

education_districtwise[(education_districtwise['Z_SCORE'] > 3) | (education_districtwise['Z_SCORE'] < -3)]
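
A self-contained sketch of the same filter, assuming hypothetical synthetic data with two artificially low values (25.0 and 30.0) appended so that the filter has something to find. Neither the data nor those two values come from the real dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 632 synthetic literacy rates plus two injected low values.
rng = np.random.default_rng(0)
rates = np.concatenate([rng.normal(74, 10, 632), [25.0, 30.0]])
education_districtwise = pd.DataFrame({"OVERALL_LI": rates})

education_districtwise["Z_SCORE"] = stats.zscore(education_districtwise["OVERALL_LI"])

# Keep only rows more than 3 standard deviations away from the mean.
outliers = education_districtwise[
    (education_districtwise["Z_SCORE"] > 3) | (education_districtwise["Z_SCORE"] < -3)
]
print(outliers)
```

The same condition can also be written as `education_districtwise["Z_SCORE"].abs() > 3`.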
