代码
library(tidyverse)
library(mice)
library(gt)
library(tidymodels)
tidymodels_prefer()
library(tidyverse)
library(mice)
library(gt)
library(tidymodels)
tidymodels_prefer()
<- data.frame(
data x1 = rnorm(100),
x2 = rnorm(100),
y = rnorm(100),
group = as_factor(sample(1:2, 100, replace = TRUE))
)
$x1[sample(1:100, 20)] <- NA
data$x2[sample(1:100, 15)] <- NA
data
summary(data)
x1 x2 y group
Min. :-2.39687 Min. :-2.9417 Min. :-2.76256 1:49
1st Qu.:-0.55093 1st Qu.:-0.8672 1st Qu.:-0.59813 2:51
Median : 0.02640 Median :-0.1780 Median :-0.03425
Mean : 0.08588 Mean :-0.2281 Mean :-0.05928
3rd Qu.: 0.74400 3rd Qu.: 0.3174 3rd Qu.: 0.40397
Max. : 2.27116 Max. : 3.1440 Max. : 2.31032
NA's :20 NA's :15
|>
data mice(m = 5, method = 'pmm', seed = 500) |>
complete("long") |>
nest(.by = c(group, .imp)) |>
summarise(data = list(data), .by = group) -> data_nest
.imp
列和.id
列,分别表示插补数据集的编号和原始数据集的行号
group
和.imp
分组生成嵌套数据集
pool
函数(只是数据框的话不行)
iter imp variable
1 1 x1 x2
1 2 x1 x2
1 3 x1 x2
1 4 x1 x2
1 5 x1 x2
2 1 x1 x2
2 2 x1 x2
2 3 x1 x2
2 4 x1 x2
2 5 x1 x2
3 1 x1 x2
3 2 x1 x2
3 3 x1 x2
3 4 x1 x2
3 5 x1 x2
4 1 x1 x2
4 2 x1 x2
4 3 x1 x2
4 4 x1 x2
4 5 x1 x2
5 1 x1 x2
5 2 x1 x2
5 3 x1 x2
5 4 x1 x2
5 5 x1 x2
<- linear_reg() |>
lm_spec set_engine("lm") |>
set_mode("regression")
<- \(data) {
lm_fun recipe(y ~ x1 + x2, data = data) |>
step_normalize(x1) -> rec # 添加预处理步骤
workflow() |>
add_recipe(rec) |>
add_model(lm_spec) |>
fit(data)
}
|>
data_nest mutate(fit = map(data, \(x) map(x, lm_fun))) |>
mutate(fit = map(fit, \(x) map(x, extract_fit_engine))) |>
mutate(pool = map(fit, pool)) |>
mutate(tidy = map(pool, \(x) tidy(x, conf.int = TRUE))) |>
select(group, tidy) |>
unnest(cols = tidy) -> result
gt(result)
map
对每行应用函数,第二个map
对该列表中的每个数据集应用函数
extract_fit_engine
函数用于提取模型对象,即pool
能识别的对象。
pool
函数
tidy
函数可以直接应用于pool
结果
group | term | estimate | std.error | statistic | p.value | conf.low | conf.high | b | df | dfcom | fmi | lambda | m | riv | ubar |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | (Intercept) | -0.104473137 | 0.1354675 | -0.77120469 | 0.4445470 | -0.3771885 | 0.1682422 | 1.006426e-04 | 45.79143 | 48 | 0.04730206 | 0.006581021 | 5 | 0.006624618 | 0.01823066 |
2 | x1 | 0.007551608 | 0.1401432 | 0.05388494 | 0.9572706 | -0.2748827 | 0.2899859 | 5.462280e-04 | 44.03192 | 48 | 0.07447931 | 0.033374216 | 5 | 0.034526511 | 0.01898465 |
2 | x2 | 0.106042647 | 0.1368504 | 0.77487981 | 0.4431640 | -0.1709165 | 0.3830018 | 1.454605e-03 | 38.33747 | 48 | 0.13707669 | 0.093203849 | 5 | 0.102783684 | 0.01698252 |
1 | (Intercept) | 0.021903574 | 0.1180037 | 0.18561764 | 0.8535982 | -0.2159216 | 0.2597287 | 3.810853e-05 | 43.97233 | 46 | 0.04572249 | 0.003284066 | 5 | 0.003294887 | 0.01387915 |
1 | x1 | -0.065747067 | 0.1312048 | -0.50110266 | 0.6198290 | -0.3333113 | 0.2018171 | 2.135722e-03 | 31.08521 | 46 | 0.19881755 | 0.148876640 | 5 | 0.174917817 | 0.01465183 |
1 | x2 | 0.050960472 | 0.1192550 | 0.42732365 | 0.6717512 | -0.1911069 | 0.2930278 | 1.299648e-03 | 35.13442 | 46 | 0.15635616 | 0.109661415 | 5 | 0.123168216 | 0.01266217 |
在这个workflow中跳过extract_fit_workflow
函数也可以运行,但最终tidy
的可信区间会出错,只能得到点估计的准确结果。