Table of contents
1. Import libraries
import numpy as np
import pandas as pd
from scipy import stats
2. sample()
sampled_data = education_districtwise.sample(n=50, replace=True, random_state=31208)
sampled_data
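The notes do not show how education_districtwise is loaded, so the sketch below uses a made-up stand-in DataFrame (the DISTNAME column and all values are hypothetical) to show what the sample() arguments do: n sets the sample size, replace=True samples with replacement, and random_state makes the draw reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for education_districtwise (values are made up).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({
    "DISTNAME": [f"district_{i}" for i in range(200)],
    "OVERALL_LI": rng.uniform(40, 95, size=200).round(2),
})

# Draw 50 rows with replacement; the same random_state gives the same rows.
sample_a = demo_df.sample(n=50, replace=True, random_state=31208)
sample_b = demo_df.sample(n=50, replace=True, random_state=31208)

print(sample_a.shape)             # (50, 2)
print(sample_a.equals(sample_b))  # True

# With replace=True the same row may be drawn more than once.
print(sample_a.index.duplicated().any())
```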
3. Construct a 95% confidence interval
If the sample size is 30 or more (that is, the sample is large enough), use the code below.
scipy.stats.norm.interval(alpha, loc, scale)
★ alpha: The confidence level (this keyword was renamed to confidence in SciPy 1.9)
95% -> 0.95
★ loc: The sample mean
sample_mean = sampled_data['OVERALL_LI'].mean()
★ scale: The sample standard error
- std()
sampled_data['OVERALL_LI'].std()
- Dividing std() by sqrt(50), the square root of the sample size, gives the standard error.
estimated_standard_error = sampled_data['OVERALL_LI'].std() / np.sqrt(sampled_data.shape[0])
- Compute the confidence interval.
stats.norm.interval(0.95, loc=sample_mean, scale=estimated_standard_error)
(71.42241096968617, 77.02478903031381)
The width of the confidence interval is about 5.6.
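One detail behind the standard error step: pandas computes .std() with ddof=1 (the sample standard deviation), while NumPy's np.std defaults to ddof=0. The sketch below uses made-up numbers, not the data from these notes, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for sampled_data['OVERALL_LI'].
values = pd.Series([70.0, 72.0, 74.0, 76.0, 78.0])

pandas_std = values.std()                             # ddof=1 by default
numpy_std_default = np.std(values.to_numpy())         # ddof=0 by default
numpy_std_sample = np.std(values.to_numpy(), ddof=1)  # matches pandas

print(pandas_std, numpy_std_default, numpy_std_sample)

# Standard error of the mean, based on the sample standard deviation.
standard_error = pandas_std / np.sqrt(len(values))
print(standard_error)
```

Mixing the two defaults changes the standard error slightly, and the gap grows as the sample gets smaller.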
4. Construct a 99% confidence interval
stats.norm.interval(0.99, loc=sample_mean, scale=estimated_standard_error)
(70.54221358373107, 77.90498641626891)
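The width of the confidence interval is about 7.4, wider than the 95% interval because a higher confidence level requires a wider range.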
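The two intervals are linked through the z critical value. As a rough check, and assuming both intervals in these notes were built from the same sample mean and standard error, the sketch below recovers those two quantities from the 95% interval and rebuilds the 99% interval. The variable names are illustrative.

```python
from scipy import stats

# 95% interval reported in the notes.
lower_95, upper_95 = 71.42241096968617, 77.02478903031381

# Recover the sample mean (midpoint) and the standard error (half-width / z).
sample_mean = (lower_95 + upper_95) / 2
z_95 = stats.norm.ppf(0.975)
standard_error = (upper_95 - lower_95) / 2 / z_95

# Rebuild the 99% interval. The confidence level is passed positionally
# so the call does not depend on the keyword name.
lower_99, upper_99 = stats.norm.interval(0.99, loc=sample_mean, scale=standard_error)

print(upper_95 - lower_95)  # width of the 95% interval
print(upper_99 - lower_99)  # width of the 99% interval
print(lower_99, upper_99)
```

The rebuilt bounds should agree with the 99% interval printed above, which confirms that only the critical value changes between the two confidence levels.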
5. Data visualization
1) Box plot
import seaborn as sns
sns.boxplot(x=aqi_rre['state_name'], y=aqi_rre['aqi'])
2) Line plot
import matplotlib.pyplot as plt
plt.plot(aqi_mean)
6. Practice constructing a confidence interval
1) Construct the sample statistics
# Use groupby to compute the mean, count, and std aggregates, then rename the columns
aqi_rre_agg=aqi_rre.groupby(['state_name']).agg({'aqi':'mean', 'state_name':'count'})
aqi_rre_agg.rename(columns={'aqi':'mean', 'state_name':'count'}, inplace=True)
aqi_rre_agg['std'] = aqi_rre.groupby(['state_name'])['aqi'].std()
aqi_rre_agg
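To see what this summary table holds, here is a tiny example that can be checked by hand. The data are made up (aqi_rre itself is not shown in these notes), and it uses pandas named aggregation as a compact alternative that produces the same three columns in one call.

```python
import pandas as pd

# Tiny made-up stand-in for aqi_rre.
demo = pd.DataFrame({
    "state_name": ["A", "A", "A", "B", "B"],
    "aqi": [10.0, 12.0, 14.0, 3.0, 5.0],
})

# One row per state with the mean, the number of observations,
# and the sample standard deviation (ddof=1) of aqi.
demo_agg = demo.groupby("state_name").agg(
    mean=("aqi", "mean"),
    count=("aqi", "count"),
    std=("aqi", "std"),
)
print(demo_agg)
```

State A has mean 12, count 3, and std 2; state B has mean 4, count 2, and std of about 1.414.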
2) Choose a confidence level
confidence_level = 0.95
3) Find the margin of error (ME)
margin of error = z * standard error
sample_mean = aqi_rre_agg['mean']
sample_std = aqi_rre_agg['std']
sample_standard_error = sample_std / np.sqrt(aqi_rre_agg['count'])
z_score = 1.96
margin_of_error = z_score * sample_standard_error
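The value 1.96 is the rounded two-sided critical value of the standard normal distribution for a 95% confidence level. A minimal sketch of how it can be derived for any level follows; the helper name z_for_confidence is hypothetical, not part of scipy.

```python
from scipy import stats

def z_for_confidence(confidence_level):
    """Two-sided z critical value: (1 - confidence_level) / 2 is left in each tail."""
    return stats.norm.ppf(1 - (1 - confidence_level) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```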
4) Calculate interval
upper = sample_mean + margin_of_error
lower = sample_mean - margin_of_error
for i, j in zip(lower, upper):
    print(i, j)
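Printing the zipped bounds drops the state labels. One option, sketched here on a made-up summary table with the same column names as aqi_rre_agg, is to store the bounds as new columns so each interval stays next to its state.

```python
import numpy as np
import pandas as pd

# Made-up per-state summary (same column names as aqi_rre_agg).
summary = pd.DataFrame(
    {"mean": [12.0, 4.0], "count": [36, 25], "std": [6.0, 2.5]},
    index=pd.Index(["State A", "State B"], name="state_name"),
)

z_score = 1.96
standard_error = summary["std"] / np.sqrt(summary["count"])
margin_of_error = z_score * standard_error

# Keep the bounds in the table, indexed by state.
summary["lower"] = summary["mean"] - margin_of_error
summary["upper"] = summary["mean"] + margin_of_error
print(summary[["lower", "upper"]])
```

Here State A gets (10.04, 13.96) and State B gets (3.02, 4.98).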
5) Do it all in one step with scipy
stats.norm.interval(confidence_level, loc=sample_mean, scale=sample_standard_error)
(array([10.3597514 , 4.12463552, 5.98293645, 2.59628747, 1.86907633,
2.1120108 ]),
array([13.88267284, 6.87536448, 10.23928577, 4.07037919, 3.93092367,
3.2879892 ]))
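The result is a tuple of two arrays: all lower bounds first, then all upper bounds. The sketch below, again on made-up numbers with illustrative names, puts the two arrays back into a labeled table and compares them with the hand calculation, which differs slightly because it uses the rounded z = 1.96.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up per-state summary (same column names as aqi_rre_agg).
summary = pd.DataFrame(
    {"mean": [12.0, 4.0], "count": [36, 25], "std": [6.0, 2.5]},
    index=pd.Index(["State A", "State B"], name="state_name"),
)
standard_error = summary["std"] / np.sqrt(summary["count"])

# interval() broadcasts over arrays and returns (lower_bounds, upper_bounds).
lower, upper = stats.norm.interval(
    0.95, loc=summary["mean"].to_numpy(), scale=standard_error.to_numpy()
)
intervals = pd.DataFrame({"lower": lower, "upper": upper}, index=summary.index)
print(intervals)

# Largest gap between scipy's upper bounds and the z = 1.96 hand calculation.
manual_upper = summary["mean"] + 1.96 * standard_error
print((manual_upper - intervals["upper"]).abs().max())
```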