2  Regresi Linear (Ordinary Least Squares)

Tahapan Regresi

  • Spesifikasi Model (termasuk: Identifikasi Masalah)
  • Pengumpulan Data
    • Cross Section (Individu: n, periode: 1)
    • Time Series (Individu: 1, periode: t)
    • Pooled (Individu: n, periode: t)
  • Estimasi Parameter
  • Pengujian Hipotesis, Asumsi, dan Kelayakan Model
  • Analisis/Interpretasi/Peramalan
Show the code
# Packages Library yang digunakan
library(bookdown)
library(PoEdata)
library(knitr)
library(xtable)
library(printr)
library(stargazer)
library(rmarkdown)
library(lmtest) #for coeftest() and bptest().
library(broom) #for glance() and tidy()
library(PoEdata) #for PoE4 datasets
library(car) #for hccm() robust standard errors
library(sandwich)
library(knitr)

2.1 Spesifikasi Model Umum

Model Regresi Linier mengasumsikan bahwa terdapat hubungan linier antara variabel terikat dan variabel bebas.

\[y_i =\beta_0 +\beta_1 x_{1i}+\beta_2 x_{2i}+...+\beta_k x_{ki}+\varepsilon_i\]

Dimana:

  • \(y_i=\) variabel tidak bebas individu -i
  • \(x_{ki}=\) variabel bebas ke-k individu -i
  • \(\varepsilon_{i}=\) residual individu -i

2.1.1 Spesifikasi Model Regresi OLS

  • Linear: \(y=\beta_0+\beta_1 X\)
  • Semi-Log/Log-Lin: \(log(y)=\beta_0+\beta_1 X\)
  • Semi-Log/Lin-Log: \(log(y)=\beta_0+\beta_1 log(X)\)
  • Double-Log/Log-Log: \(log(y)=\beta_0+\beta_1 log(X)\)
  • etc

2.1.2 Spesifikasi Model Contoh: Hubungan Pendapatan terhadap Pengeluaran Makanan

Show the code
library(PoEdata)
data(food)
head(food)
food_exp income
115.22 3.69
135.98 4.39
119.34 4.75
114.96 6.03
187.05 12.47
243.92 12.98

Data berjumlah 40 observasi. Variabel yang diamati ada 2. Data selengkapnya dapat diakses pada link yang disediakan.

Show the code
data("food", package="PoEdata")
plot(food$income, food$food_exp, 
     ylim=c(0, max(food$food_exp)),
     xlim=c(0, max(food$income)),
     xlab="weekly income in $100", 
     ylab="weekly food expenditure in $", 
     type = "p",
     main="Scatter plot untuk pendapatan terhadap pengeluaran makanan")

\[\text{Food Expenditure}_i=\beta_0+\beta_1 \text{Income}_i +\varepsilon_i\] \[\varepsilon_i \sim \text{Normal}(0,\sigma^2) \]

2.2 Estimasi Parameter dengan Regresi

Show the code
mod1 <- lm(food_exp ~ income, data = food)
b0 <- coef(mod1)[[1]]
b1 <- coef(mod1)[[2]]
smod1 <- summary(mod1)
smod1

Call:
lm(formula = food_exp ~ income, data = food)

Residuals:
     Min       1Q   Median       3Q      Max 
-223.025  -50.816   -6.324   67.879  212.044 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   83.416     43.410   1.922   0.0622 .  
income        10.210      2.093   4.877 1.95e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 89.52 on 38 degrees of freedom
Multiple R-squared:  0.385, Adjusted R-squared:  0.3688 
F-statistic: 23.79 on 1 and 38 DF,  p-value: 1.946e-05
Show the code
plot(food$income, food$food_exp, 
     xlab="weekly income in $100", 
     ylab="weekly food expenditure in $", 
     ylim=c(0, max(food$food_exp)),
     xlim=c(0, max(food$income)),
     type = "p")
abline(b0,b1)

2.2.1 Estimasi Selang Kepercayaan

Show the code
ci <- confint(mod1)
print(ci)
                2.5 %    97.5 %
(Intercept) -4.463279 171.29528
income       5.972052  14.44723

2.3 Pengujian Hipotesis, Asumsi, dan Kelayakan Model

2.3.1 Overall Test

  • Gunakan p value dari F-Statistik

\(H_0 :\) Tidak ada variabel yang signifikan dalam model

\(H_1 :\) Minimal ada 1 variabel yang signifikan dalam model

\(\alpha=0.05\)

Keputusan: Tolak \(H_0\), karena \(p-value <\alpha\)

Kesimpulan: Dengan tingkat keyakinan 95%, cukup bukti untuk menyatakan bahwa minimal ada 1 variabel yang signifikan dalam model.

2.3.2 Partial Test

  • Gunakan p value dari t-Statistik

\(H_0 : \beta_1=0\) variabel tidak signifikan dalam model

\(H_1 : \beta_1 \neq 0\) variabel signifikan dalam model

\(\alpha=0.05\)

Statistik Uji: \[t=\frac{\hat{\beta_1}-\beta_1}{se(\hat{\beta_1})}\]

Keputusan: Tolak \(H_0\), karena \(p-value <\alpha\)

Kesimpulan: Dengan tingkat keyakinan 95%, cukup bukti untuk menyatakan bahwa variabel signifikan dalam model.

2.3.3 Pengecekan Asumsi/Diagnosa Residual-Residual Diagnostic

2.3.3.1 Linearitas

2.3.3.2 Independensi

2.3.3.3 Normalitas

Show the code
hist(mod1$residuals)

2.3.3.4 Homoskedastisitas

Show the code
plot(mod1)

2.3.4 Kelayakan Model

  • Substansi
  • Statistik, contohnya: \((R^2)\)

2.3.4.1 Koefisien Determinasi (\(R^2\))

\[SST=SSR+SSE\]

Show the code
anova1=anova(mod1)
anova1
Df Sum Sq Mean Sq F value Pr(>F)
income 1 190627.0 190626.984 23.78884 1.95e-05
Residuals 38 304505.2 8013.294 NA NA
Show the code
r2=anova1$"Sum Sq"[1]/sum(anova1$"Sum Sq")

\(SST\) = 4.951322^{5}

\(R^2\) = 0.385

Artinya: 38.5 % proporsi variasi y yang dapat dijelaskan oleh model/variabel x.

2.3.5 Contoh Pengecekan Asumsi yang sesuai/Ideal

Show the code
set.seed(12345)   #sets the seed for the random number generator
x <- runif(300, 0, 10)
e <- rnorm(300, 0, 1)
y <- 1+x+e
mod3 <- lm(y~x)
ehat <- resid(mod3)
plot(x,ehat, xlab="x", ylab="residuals")

Show the code
hist(ehat)

Show the code
plot(mod3)

2.4 Analisis/Interpretasi/Peramalan

2.4.1 Peramalan

Show the code
newx <- data.frame(income = c(20, 25, 27))
yhat <- predict(mod1, newx)
names(yhat) <- c("income=$2000", "$2500", "$2700") 
yhat  # prints the result
income=$2000        $2500        $2700 
    287.6089     338.6571     359.0764 

2.4.2 Interpretasi

Show the code
mod1

Call:
lm(formula = food_exp ~ income, data = food)

Coefficients:
(Intercept)       income  
      83.42        10.21  
  • Hati-hati menginterpretasikan Estimasi \(\beta_0\)

  • Interpretasi dari Estimasi \(\beta_1\)

Setiap Terjadi Kenaikan Income sebesar USD 100 (Kenaikannya sebesar 1 satuan, tetapi satuan x dalam hal ini income adalah USD 100), terjadi kenaikan pengeluaran makanan sebesar USD 10,21 (satuan y dalam hal ini adalah USD).

Extras: Model Log-log

Show the code
# Calculating log-log demand for chicken
data("newbroiler", package="PoEdata")
mod6 <- lm(log(q)~log(p), data=newbroiler)
b1 <- coef(mod6)[[1]]
b2 <- coef(mod6)[[2]]
smod6 <- summary(mod6)
tbl <- data.frame(xtable(smod6))
kable(tbl, caption="Model Hubungan Harga terhadap Permintaan Ayam")
Model Hubungan Harga terhadap Permintaan Ayam
Estimate Std..Error t.value Pr…t..
(Intercept) 3.716944 0.0223594 166.23619 0
log(p) -1.121358 0.0487564 -22.99918 0
Show the code
# Drawing the fitted values of the log-log equation
ngrid <- 20 # number of drawing points 
xmin <- min(newbroiler$p)
xmax <- max(newbroiler$p)
step <- (xmax-xmin)/ngrid # grid dimension
xp <- seq(xmin, xmax, step)
predicty=exp(b1+b2*log(xp))
plot(newbroiler$p, newbroiler$q, ylim=c(10,60),
     xlab="price", ylab="quantity")
lines(predicty~xp, lty=1, col="black")

Extras 2: Model dengan Asumsi Homoskedastisitas yang terlanggar

Karena adanya heteroskedastisitas membuat kesalahan standar kuadrat terkecil menjadi keliru, maka diperlukan metode lain untuk menghitung Regresi.

Show the code
library(car)
foodeq <- lm(food_exp~income,data=food)
kable(tidy(foodeq),caption="Regular standard errors in the 'food' equation")
Regular standard errors in the ‘food’ equation
term estimate std.error statistic p.value
(Intercept) 83.41600 43.410163 1.921578 0.0621824
income 10.20964 2.093263 4.877381 0.0000195
Show the code
cov1 <- hccm(foodeq, type="hc1") #needs package 'car'
food.HC1 <- coeftest(foodeq, vcov.=cov1)
kable(tidy(food.HC1),caption="Robust (HC1) standard errors in the 'food' equation")
Robust (HC1) standard errors in the ‘food’ equation
term estimate std.error statistic p.value
(Intercept) 83.41600 27.463748 3.037313 0.0042989
income 10.20964 1.809077 5.643566 0.0000018
Show the code
w <- 1/food$income
food.wls <- lm(food_exp~income, weights=w, data=food)
vcvfoodeq <- coeftest(foodeq, vcov.=cov1)
kable(tidy(foodeq), 
  caption="OLS estimates for the 'food' equation")
OLS estimates for the ‘food’ equation
term estimate std.error statistic p.value
(Intercept) 83.41600 43.410163 1.921578 0.0621824
income 10.20964 2.093263 4.877381 0.0000195
Show the code
kable(tidy(food.wls),
  caption="WLS estimates for the 'food' equation" )
WLS estimates for the ‘food’ equation
term estimate std.error statistic p.value
(Intercept) 78.68408 23.788722 3.307621 0.0020641
income 10.45101 1.385891 7.541002 0.0000000
Show the code
kable(tidy(vcvfoodeq),caption=
"OLS estimates for the 'food' equation with robust standard errors" )
OLS estimates for the ‘food’ equation with robust standard errors
term estimate std.error statistic p.value
(Intercept) 83.41600 27.463748 3.037313 0.0042989
income 10.20964 1.809077 5.643566 0.0000018
Show the code
data("food", package="PoEdata")
food.ols <- lm(food_exp~income, data=food)
ehatsq <- resid(food.ols)^2
sighatsq.ols  <- lm(log(ehatsq)~log(income), data=food)
vari <- exp(fitted(sighatsq.ols))
food.fgls <- lm(food_exp~income, weights=1/vari, data=food)
Show the code
stargazer(food.ols, food.HC1, food.wls, food.fgls, type="text",
          column.labels=c("OLS","HC1","WLS","FGLS"),
#          single.row = TRUE, 
#          report = "vc*", 
      header = FALSE,
  dep.var.labels.include = FALSE,
  model.numbers = FALSE,
  dep.var.caption="Dependent variable: 'food expenditure'",
  model.names=FALSE
#          df=FALSE, 
#          digits=2
)

=======================================================================
                               Dependent variable: 'food expenditure'  
                              -----------------------------------------
                                 OLS        HC1       WLS       FGLS   
-----------------------------------------------------------------------
income                        10.210***  10.210*** 10.451*** 10.633*** 
                               (2.093)    (1.809)   (1.386)   (0.972)  
                                                                       
Constant                       83.416*   83.416*** 78.684*** 76.054*** 
                               (43.410)  (27.464)  (23.789)   (9.713)  
                                                                       
-----------------------------------------------------------------------
Observations                      40                  40         40    
R2                              0.385                0.599     0.759   
Adjusted R2                     0.369                0.589     0.753   
Residual Std. Error (df = 38)   89.517              18.750     1.547   
F Statistic (df = 1; 38)      23.789***            56.867*** 119.799***
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01
Show the code
stargazer(food.ols, food.HC1, food.wls, food.fgls,
  header=FALSE, 
  title="Comparing various 'food' models",
  type=.stargazertype, # "html" or "latex" (in index.Rmd) 
  keep.stat="n",  # what statistics to print
  omit.table.layout="n",
  star.cutoffs=NA,
  digits=3, 
#  single.row=TRUE,
  intercept.bottom=FALSE, #moves the intercept coef to top
  column.labels=c("OLS","HC1","WLS","FGLS"),
  dep.var.labels.include = FALSE,
  model.numbers = FALSE,
  dep.var.caption="Dependent variable: 'food expenditure'",
  model.names=FALSE,
  star.char=NULL) #supresses the stars

Challenge 2

  • Lakukan Estimasi Regresi Linear OLS untuk data Pengeluaran Makanan dan Income
  • Coba untuk melakukan Estimasi Regresi dari Isu/Permasalahan yang ditemukan dari Challenge 1