Regression 
Warning: package 'wiesbaden' was built under R version 4.1.2 
 
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ── 
 
✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.5     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1 
 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag() 
 
 
Wiesbaden package to access data from the Genesis database from Statistisches Bundesamt (located in Wiesbaden).
The Credentials to access the API are saved already, test login follows here.
test_login (genesis= c (db= "regio" )) #test login  
[1] "Sie wurden erfolgreich an- und abgemeldet." 
 
 
 wahl_lohn <-  readRDS ("data/wahl_lohn.Rds" ) 
 
Simple Regression 
\[
\text{afd}_k = \beta_0 + \beta_1 \times \text{min}_k + \epsilon
\] 
\(\text{afd}_k\)  AfD Vote Share in Landkreis \(k\)  
\(\text{min}_k\)  Share of Minimum Wage Receivers in Landkreis \(k\)  
\(\beta_0\)  Intercept 
\(\beta_1\)  Slope 
\(\epsilon\)  Error 
 
 model_basic <-  lm ( 
     afd_prozent ~  lohn_prozent,  
     data =  wahl_lohn 
   ) 
summary (model_basic) 
Call:
lm(formula = afd_prozent ~ lohn_prozent, data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.113618 -0.023871 -0.004185  0.021197  0.119255 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.026312   0.005823  -4.519 8.21e-06 ***
lohn_prozent  0.714426   0.028455  25.107  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.03638 on 398 degrees of freedom
Multiple R-squared:  0.613, Adjusted R-squared:  0.612 
F-statistic: 630.4 on 1 and 398 DF,  p-value: < 2.2e-16 
 
 
 
Ost West 
 länder <-  wahl_lohn$ land %>%  unique () 
 ost_names <-  c ("BB" , "BE" , "MV" , "SN" , "ST" , "TH" ) 
 
 wahl_lohn <-  wahl_lohn %>%  
   mutate ( 
     ost =  ifelse (land %in%  ost_names, 1 , 0 ) 
   ) 
 
Dummy for ost
 model_ost <-  lm ( 
     afd_prozent ~  lohn_prozent +  ost,  
     data =  wahl_lohn 
   ) 
summary (model_ost) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost, data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.094829 -0.019186 -0.001373  0.016590  0.098788 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.039565   0.006889   5.743 1.85e-08 ***
lohn_prozent 0.291299   0.039392   7.395 8.48e-13 ***
ost          0.086182   0.006420  13.425  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.03021 on 397 degrees of freedom
Multiple R-squared:  0.7338,    Adjusted R-squared:  0.7325 
F-statistic: 547.2 on 2 and 397 DF,  p-value: < 2.2e-16 
 
 
 
Unemployment Rate 
from Regionalstatistik 13211-02-05-4 
 d <-  retrieve_datalist (tableseries= "13211*" , genesis= c (db= "regio" )) 
subset (d, grepl ("Kreise" , description))  
 
 data <-  retrieve_data (tablename= "13211KJ009" , genesis= c (db= "regio" )) 
head (data) 
 
 arbeit_data <-  data %>%   
   filter (JAHR ==  2021 ) %>%  
   select (kreis =  KREISE, arbeitslosenquote =  ERWP10_val) %>%  
   # remove rows with more than 5 digits in kreis  
   mutate (arbeitslosenquote =  arbeitslosenquote /  100 ) %>%  
   filter (nchar (kreis) ==  5 ) 
 
Merge
 wahl_lohn <-  wahl_lohn %>%  
   left_join (arbeit_data, by =  "kreis" ) 
 
 model_arbeit <-  lm ( 
     afd_prozent ~  lohn_prozent +  ost+  arbeitslosenquote,  
     data =  wahl_lohn 
   ) 
summary (model_arbeit) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost + arbeitslosenquote, 
    data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.090252 -0.018952 -0.001568  0.015493  0.100010 
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.050509   0.007319   6.902 2.05e-11 ***
lohn_prozent       0.308381   0.038939   7.920 2.43e-14 ***
ost                0.087937   0.006322  13.910  < 2e-16 ***
arbeitslosenquote -0.279143   0.071078  -3.927 0.000101 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02968 on 396 degrees of freedom
Multiple R-squared:  0.7438,    Adjusted R-squared:  0.7419 
F-statistic: 383.2 on 3 and 396 DF,  p-value: < 2.2e-16 
 
 
 
GDP per Capita 
Regionalstatistik 82111 
 d <-  retrieve_datalist (tableseries= "82111*" , genesis= c (db= "regio" )) 
subset (d, grepl ("Kreise" , description))  
 
 data <-  retrieve_data (tablename= "82111KJ008" , genesis= c (db= "regio" )) 
head (data) 
 
 gdp_data <-  data %>%   
   filter (JAHR ==  2021 ) %>%  
   select (kreis =  KREISE, gdp =  BIP804_val) %>%  
   # remove rows with more than 5 digits in kreis  
   filter (nchar (kreis) ==  5 ) 
 
 gdp_data %>%  
   filter (! kreis %in%  wahl_lohn$ kreis) %>%  
   select (kreis) %>%  
   arrange (kreis) 
 
16056 (Eisenach) not present in Wahl_Lohn Data…, because Eisenach Kreis doesnt exist anymore, it is now Wartburgkreis (16063)
Merge
 wahl_lohn <-  wahl_lohn %>%  
   left_join (gdp_data, by =  "kreis" ) 
 
Model
 model_gdp <-  lm ( 
     afd_prozent ~  lohn_prozent +  ost+  arbeitslosenquote +  log (gdp),  
     data =  wahl_lohn 
   ) 
summary (model_gdp) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost + arbeitslosenquote + 
    log(gdp), data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.089117 -0.018857 -0.001454  0.015734  0.100196 
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.071078   0.059875   1.187 0.235898    
lohn_prozent       0.301457   0.043815   6.880 2.35e-11 ***
ost                0.088321   0.006425  13.746  < 2e-16 ***
arbeitslosenquote -0.274819   0.072245  -3.804 0.000165 ***
log(gdp)          -0.001852   0.005349  -0.346 0.729429    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02971 on 395 degrees of freedom
Multiple R-squared:  0.7439,    Adjusted R-squared:  0.7413 
F-statistic: 286.8 on 4 and 395 DF,  p-value: < 2.2e-16 
 
 
 
Avg. Age 
 d <-  retrieve_datalist (tableseries= "12411*" , genesis= c (db= "regio" )) 
subset (d, grepl ("Kreise" , description))  
 
 data <-  retrieve_data (tablename= "12411KJ019" , genesis= c (db= "regio" )) 
head (data) 
 
 age_data <-  data %>%   
   filter (STAG ==  "31.12.2021" ) %>%  
   select (kreis =  KREISE,  
          age =  BEV519_val #avg age  
          ) %>%  
   # remove rows with more than 5 digits in kreis  
   filter (nchar (kreis) ==  5 ) 
 
Merge
 wahl_lohn <-  wahl_lohn %>%  
   left_join (age_data, by =  "kreis" ) 
 
 model_age <-  lm ( 
     afd_prozent ~  lohn_prozent +  ost+  arbeitslosenquote +  log (gdp) +  age,  
     data =  wahl_lohn 
   ) 
summary (model_age) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost + arbeitslosenquote + 
    log(gdp) + age, data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.072729 -0.019526 -0.002674  0.017335  0.097038 
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.331786   0.091478  -3.627 0.000324 ***
lohn_prozent       0.226691   0.044195   5.129 4.57e-07 ***
ost                0.081481   0.006302  12.929  < 2e-16 ***
arbeitslosenquote -0.243973   0.069766  -3.497 0.000524 ***
log(gdp)           0.010820   0.005614   1.927 0.054652 .  
age                0.006261   0.001104   5.672 2.74e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0286 on 394 degrees of freedom
Multiple R-squared:  0.7632,    Adjusted R-squared:  0.7602 
F-statistic:   254 on 5 and 394 DF,  p-value: < 2.2e-16 
 
 
 
Population Density 
Regionalstatistik 99910 
 d <-  retrieve_datalist (tableseries= "99910*" , genesis= c (db= "regio" )) 
subset (d, grepl ("Bevölkerung" , description))  
 
 data <-  retrieve_data (tablename= "99910KJA02" , genesis= c (db= "regio" )) 
head (data) 
 
 pop_data <-  data %>%   
   filter (JAHR ==  2021 ) %>%  
   select (kreis =  KREISE,  
          pop =  AI0201_val, #population density je km2  
          foreigners =  AI0208_val #foreigners rate in %  
          ) %>%  
   mutate (foreigners =  foreigners /  100 ) %>%  
   # remove rows with more than 5 digits in kreis  
   filter (nchar (kreis) ==  5 ) 
 
Merge
 wahl_lohn <-  wahl_lohn %>%  
   left_join (pop_data, by =  "kreis" ) 
 
 model_pop <-  lm ( 
     afd_prozent ~  lohn_prozent +  ost+  arbeitslosenquote +  log (gdp) + age +  log (pop),  
     data =  wahl_lohn 
   ) 
summary (model_pop) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost + arbeitslosenquote + 
    log(gdp) + age + log(pop), data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.070578 -0.018897 -0.002703  0.017855  0.098898 
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -0.315423   0.092243  -3.419 0.000693 ***
lohn_prozent       0.219617   0.044483   4.937 1.17e-06 ***
ost                0.081794   0.006301  12.981  < 2e-16 ***
arbeitslosenquote -0.162693   0.093287  -1.744 0.081941 .  
log(gdp)           0.013461   0.005959   2.259 0.024447 *  
age                0.005591   0.001216   4.600 5.71e-06 ***
log(pop)          -0.002990   0.002281  -1.311 0.190642    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02858 on 393 degrees of freedom
Multiple R-squared:  0.7642,    Adjusted R-squared:  0.7606 
F-statistic: 212.3 on 6 and 393 DF,  p-value: < 2.2e-16 
 
 
 
Choosing the best Model 
via Stepwise Regression, explained here 
not well regarded in Econometrics, because it is prone to overfitting! just for self reference, that lohn_prozent is chosen and therefore valid
step (model_basic, direction =  "both" , scope =  ~  lohn_prozent +  ost +  foreigners+  arbeitslosenquote +  log (gdp) +  age, trace= 0 ) 
Call:
lm(formula = afd_prozent ~ lohn_prozent + ost + age + arbeitslosenquote + 
    foreigners, data = wahl_lohn)
Coefficients:
      (Intercept)       lohn_prozent                ost                age  
        -0.326616           0.218156           0.088402           0.008381  
arbeitslosenquote         foreigners  
        -0.385274           0.180668   
 
 
Seems, best model is: “afd_prozent ~ lohn_prozent + ost + age + arbeitslosenquote + foreigners”
 
Interlude: Linke 
 model_linke <-  lm ( 
     linke_prozent ~  lohn_prozent, 
     data =  wahl_lohn 
   ) 
 
 model_linke_2 <-  lm ( 
     linke_prozent ~  lohn_prozent +  ost, 
     data =  wahl_lohn 
   ) 
 
 model_linke_3 <-  lm ( 
     linke_prozent ~  lohn_prozent +  ost +  log (gdp), 
     data =  wahl_lohn 
   ) 
 
summary (model_linke_3) 
Call:
lm(formula = linke_prozent ~ lohn_prozent + ost + log(gdp), data = wahl_lohn)
Residuals:
      Min        1Q    Median        3Q       Max 
-0.035639 -0.008126 -0.001951  0.005206  0.052785 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -0.117659   0.027711  -4.246 2.72e-05 ***
lohn_prozent -0.002847   0.020338  -0.140    0.889    
ost           0.068507   0.003049  22.471  < 2e-16 ***
log(gdp)      0.014297   0.002457   5.819 1.23e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01385 on 395 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.7754,    Adjusted R-squared:  0.7737 
F-statistic: 454.6 on 3 and 395 DF,  p-value: < 2.2e-16 
 
 
=> not significant in all variants, low R2 in basic variant
 
Save Data 
#save all Models  
save ( 
   model_basic, 
   model_ost, 
   model_arbeit,  
   model_gdp,  
   model_age, 
   model_pop,  
   file =  "data/models.RData" ) 
 
saveRDS (wahl_lohn, "data/wahl_lohn_mod.RDS" )