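4.1 Regression Analysis
4.1.1 What Regression Analysis Is
"Analysis of linear relationships"
In statistics, linear regression is a regression technique that models the linear relationship between a dependent variable y and one or more independent (explanatory) variables X. With a single explanatory variable it is called simple linear regression; with two or more explanatory variables it is called multiple linear regression.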
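The distinction between simple and multiple linear regression can be sketched on simulated data. Everything below (the variable names x, x2, y and the true relationship y = 2 + 3x + noise) is an assumption made for illustration, not part of the DirectMarketing data.

```r
# Simulated data (hypothetical): the true relationship is y = 2 + 3*x + noise
set.seed(1)
n  <- 200
x  <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)              # a second variable unrelated to y
y  <- 2 + 3 * x + rnorm(n, sd = 1)

simple_fit   <- lm(y ~ x)          # simple linear regression: one explanatory variable
multiple_fit <- lm(y ~ x + x2)     # multiple linear regression: two explanatory variables

coef(simple_fit)                   # the slope for x should land near 3
coef(multiple_fit)                 # one coefficient per explanatory variable, plus the intercept
```

Because x2 plays no role in generating y, its estimated coefficient should be close to zero, while the slope for x stays near the true value.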
4.2 Data Practice (Review)
4.2.1 Data preparation: creating the DM, DM1, and DM2 data frames
# Read the direct-marketing data and print it
library(readr)
DM <- read_csv("~/DM/DirectMarketing.csv")
DM
# Use recode() from the car package to turn category labels into numeric codes
library(plyr)
library(car)
DM$AgeN <- recode(DM$Age,'"Young"=1;"Middle"=2;"Old"=3')
DM$GenderN <- recode(DM$Gender,'"Female"=1;"Male"=2')
DM$OwnHomeN <- recode(DM$OwnHome,'"Own"=1;"Rent"=2')
DM$MarriedN <- recode(DM$Married,'"Single"=1;"Married"=2')
DM$LocationN <- recode(DM$Location,'"Close"=1;"Far"=2')
DM$HistoryN <- recode(DM$History,'"High"=3;"Medium"=2;"Low"=1')
# Collect the analysis variables; the columns are named automatically as DM.AmountSpent, DM.Salary, ...
DM1<-data.frame(DM$AmountSpent, DM$Salary, DM$AgeN, DM$Children, DM$GenderN, DM$OwnHomeN, DM$MarriedN, DM$LocationN, DM$HistoryN, DM$Catalogs)
# DM2 is DM1 with every row that contains a missing value removed
DM2<-na.omit(DM1)
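The three preparation steps (recoding labels, building a new data frame, dropping incomplete rows) can be tried on a small stand-in. This is a minimal sketch: the toy data frame and its values are hypothetical, and a base-R named-vector lookup is swapped in for car's recode() so that the block runs without extra packages.

```r
# Hypothetical toy data; the real DM data frame comes from DirectMarketing.csv
toy <- data.frame(
  Spent   = c(120, 340, 560, 80),
  History = c("Low", NA, "High", "Medium"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for car::recode(): look each label up in a named vector
codes <- c(Low = 1, Medium = 2, High = 3)
toy$HistoryN <- unname(codes[toy$History])   # a missing label stays NA

# Passing toy$... expressions to data.frame() yields names such as "toy.Spent"
toy1 <- data.frame(toy$Spent, toy$HistoryN)
names(toy1)

# na.omit() drops every row that has at least one NA
toy2 <- na.omit(toy1)
nrow(toy1)
nrow(toy2)
```

The automatic naming is why the regression output later in this section shows terms such as DM1$DM.Salary: the DM. prefix comes from building DM1 out of DM$... expressions.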
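4.2.2 Practicing the attach command
4.2.2.1 attach
- A command that adds a data frame to the search path so that its columns can be referenced by name alone
mean(AmountSpent) : fails before attach, because AmountSpent exists only as a column of DM.
mean(DM$AmountSpent) : works, but the data frame has to be named explicitly every time.
- Usage
attach(DM)
mean(AmountSpent)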
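The behavior described above can be checked on a small data frame. This is a sketch under assumed names: toy and its values are hypothetical. It also shows detach(), which undoes attach(), and with(), which gives the same convenience without changing the search path.

```r
# Hypothetical stand-in for DM
toy <- data.frame(AmountSpent = c(100, 250, 400),
                  Salary      = c(30000, 52000, 81000))

before <- exists("AmountSpent")      # FALSE: the column is not visible by its bare name yet

attach(toy)                          # put the columns of toy on the search path
m_attached <- mean(AmountSpent)      # now the bare column name resolves
detach(toy)                          # remove toy from the search path again

m_with <- with(toy, mean(AmountSpent))   # same result, no attach needed

before
m_attached
m_with
```

One thing to keep in mind is that attach() works on a copy: changes made to toy after attaching it are not reflected in the attached columns, which is a common source of confusion.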
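4.3.1 Regression practice
4.3.2 The lm command
(Example)
lm(y ~ x + x1 + x2, data = dataset)
Here dataset is a data frame that contains the columns y, x, x1, and x2. The argument is named data; lm has no data_frame argument.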
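A runnable instance of the syntax above, using mtcars, a data set that ships with R. The choice of mpg, wt, and hp is only for illustration and has nothing to do with the DirectMarketing data.

```r
# mtcars is built into R; mpg, wt and hp are columns of that data frame
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)        # prints the same kind of report as the output shown in this section
names(coef(fit))    # "(Intercept)" "wt" "hp"
nobs(fit)           # number of observations used in the fit
```

Because the variables are looked up inside the data frame passed as data, the formula stays short and the coefficient labels are plain column names.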
(Practice)
attach(DM1)
m1<-lm(DM1$DM.AmountSpent~DM1$DM.Salary+DM1$DM.Children+DM1$DM.AgeN+DM1$DM.GenderN+DM1$DM.OwnHomeN+DM1$DM.MarriedN+DM1$DM.LocationN+DM1$DM.HistoryN, data_frame("DM1"))
summary(m1)
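In the call above every term is written as DM1$..., so the variables are taken directly from DM1 and the second argument plays no part in finding them. The sketch below, on a hypothetical data frame d with made-up columns, suggests that the more common data = style fits the same model and differs only in how the coefficients are labeled.

```r
# Hypothetical data; not the DirectMarketing values
set.seed(42)
d <- data.frame(spent  = rnorm(50, mean = 1000, sd = 300),
                salary = rnorm(50, mean = 50000, sd = 10000))

fit_dollar <- lm(d$spent ~ d$salary)          # style used in the practice call
fit_data   <- lm(spent ~ salary, data = d)    # same model, variables looked up in d

names(coef(fit_dollar))   # "(Intercept)" "d$salary"
names(coef(fit_data))     # "(Intercept)" "salary"

# The estimates themselves agree
all.equal(unname(coef(fit_dollar)), unname(coef(fit_data)))
```

A practical reason to prefer the data = style is that predict() can then match the columns of a newdata data frame by name.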
Output

Call:
lm(formula = DM1$DM.AmountSpent ~ DM1$DM.Salary + DM1$DM.Children +
    DM1$DM.AgeN + DM1$DM.GenderN + DM1$DM.OwnHomeN + DM1$DM.MarriedN +
    DM1$DM.LocationN + DM1$DM.HistoryN, data = data_frame("DM1"))

Residuals:
     Min       1Q   Median       3Q      Max
-1978.48  -311.59   -34.35   230.72  3081.90

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)       -7.754e+02  1.669e+02  -4.645 4.08e-06 ***
DM1$DM.Salary      1.871e-02  1.362e-03  13.742  < 2e-16 ***
DM1$DM.Children   -2.798e+02  2.731e+01 -10.243  < 2e-16 ***
DM1$DM.AgeN       -3.866e+01  3.535e+01  -1.094    0.274
DM1$DM.GenderN    -4.865e+01  4.365e+01  -1.115    0.265
DM1$DM.OwnHomeN   -2.816e+01  4.794e+01  -0.587    0.557
DM1$DM.MarriedN   -1.240e+01  5.441e+01  -0.228    0.820
DM1$DM.LocationN   6.906e+02  5.009e+01  13.788  < 2e-16 ***
DM1$DM.HistoryN    2.276e+02  4.935e+01   4.612 4.75e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 535.5 on 688 degrees of freedom
  (303 observations deleted due to missingness)
Multiple R-squared:  0.7167, Adjusted R-squared:  0.7134
F-statistic: 217.6 on 8 and 688 DF,  p-value: < 2.2e-16
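The line "(303 observations deleted due to missingness)" means that rows with a missing value in any model variable were left out of the fit. The numbers are consistent: 688 residual degrees of freedom plus 9 estimated coefficients gives 697 rows used, and 697 + 303 = 1000. The sketch below reproduces the mechanism on hypothetical toy data, assuming R's default na.action setting (na.omit).

```r
# Hypothetical data with three missing values in one predictor
set.seed(7)
toy <- data.frame(y = rnorm(20), x = rnorm(20), h = rnorm(20))
toy$h[c(3, 8, 15)] <- NA

fit_with_h    <- lm(y ~ x + h, data = toy)   # rows with NA in h are dropped
fit_without_h <- lm(y ~ x, data = toy)       # h is not in the model, so every row is used

nobs(fit_with_h)           # rows actually used in the fit
nobs(fit_without_h)
df.residual(fit_with_h)    # rows used minus number of estimated coefficients
```

Since the set of rows changes with the variables in the model, fit statistics from two such models describe different samples and should be compared with care.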
Output of the same model refit without DM1$DM.HistoryN. With that variable dropped, no observations are deleted for missingness and all 1,000 rows are used:

Call:
lm(formula = DM1$DM.AmountSpent ~ DM1$DM.Salary + DM1$DM.Children +
    DM1$DM.AgeN + DM1$DM.GenderN + DM1$DM.OwnHomeN + DM1$DM.MarriedN +
    DM1$DM.LocationN, data = data_frame("DM1"))

Residuals:
     Min       1Q   Median       3Q      Max
-2315.80  -327.63   -67.88   251.72  3071.70

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -4.416e+02  1.463e+02  -3.019   0.0026 **
DM1$DM.Salary     2.284e-02  9.517e-04  24.003   <2e-16 ***
DM1$DM.Children  -2.389e+02  1.881e+01 -12.699   <2e-16 ***
DM1$DM.AgeN      -7.328e+00  3.207e+01  -0.229   0.8193
DM1$DM.GenderN   -2.867e+01  3.916e+01  -0.732   0.4642
DM1$DM.OwnHomeN  -4.598e+01  4.383e+01  -1.049   0.2944
DM1$DM.MarriedN  -2.787e+01  5.041e+01  -0.553   0.5804
DM1$DM.LocationN  5.945e+02  4.067e+01  14.617   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 582.9 on 992 degrees of freedom
Multiple R-squared:  0.6347, Adjusted R-squared:  0.6322
F-statistic: 246.3 on 7 and 992 DF,  p-value: < 2.2e-16
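The adjusted R-squared values in the two outputs can be checked by hand from the printed R-squared, the number of predictors p, and the number of rows used n (residual degrees of freedom + p + 1, so 688 + 9 = 697 and 992 + 8 = 1000). The helper name adj_r2 below is hypothetical; the formula is the standard one, 1 - (1 - R^2)(n - 1)/(n - p - 1).

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_first  <- adj_r2(0.7167, n = 697,  p = 8)   # model with HistoryN
adj_second <- adj_r2(0.6347, n = 1000, p = 7)   # model without HistoryN

adj_first
adj_second
```

The results should agree with the reported 0.7134 and 0.6322 to about three decimal places; small differences in the last digit are expected because the printed R-squared values are themselves rounded.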