Journal of Tropical Diseases and Parasitology, 2023, Vol. 21, Issue (4): 223-227. doi: 10.3969/j.issn.1672-2302.2023.04.010

• Prevention and Control Research •

Analysis of factors affecting infection among close contacts of COVID-19 cases based on random forest and multi-factor interactive logistic regression models: A case study in Tongling City

ZHANG Fan1, QI Ping2

  1. Tongling Center for Disease Control and Prevention, Tongling 244000, Anhui Province, China
  2. Department of Mathematics and Computer Science, Tongling University
  • Received: 2023-04-03 Online: 2023-08-20 Published: 2023-08-23
  • Corresponding author: QI Ping E-mail: 527973792@qq.com; qiping929@tlu.edu.cn
  • About the author: ZHANG Fan, female, bachelor's degree, associate chief physician; research interest: infectious disease prediction and early warning. E-mail: 527973792@qq.com
  • Funding:
    Emergency Research Project on COVID-19 of the Anhui Provincial Department of Science and Technology (2022e07020071); Anhui Provincial Key Research and Development Program (202004a05020010)

Abstract:

Objective To analyze the factors influencing infection among close contacts of coronavirus disease 2019 (COVID-19) cases in Tongling City, and the interactions among these factors, so as to provide a scientific basis for precise prevention and control strategies. Methods Data on local COVID-19 cases and their close contacts reported in Tongling City from March 14 to 30, 2022 were collected. A random forest algorithm was first used to screen strongly correlated influencing factors, and a multi-factor interactive logistic regression model was then built to analyze the factors influencing infection among close contacts and the interaction effects among them. Results The overall infection rate among close contacts of COVID-19 cases in Tongling City was 1.95% (101/5,168). The random forest algorithm identified 8 influencing factors with high importance scores: contact mode, contact frequency, relationship to the associated case, contact location, clinical condition of the associated case, age, sex, and occupation. The multi-factor interactive logistic regression model showed that infection among close contacts was positively associated with “living together” (r=0.382, P<0.05) and “frequent contact” (r=0.139, P<0.05). For interaction effects, infection was positively associated with “living together” + “family” (r=0.761, P<0.05), “age≤10” + “relative” (r=0.252, P<0.05), and “colleagues or friends” + “frequent contact” (r=0.132, P<0.05), and negatively associated with “same space without direct contact” + “occasional contact” (r=-0.122, P<0.05) and “age>60” + “occasional contact” (r=-0.221, P<0.05). Compared with the traditional logistic regression model, the multi-factor interactive logistic regression model improved accuracy by 8.04%, precision by 13.24%, recall by 4.44%, and F1 score by 7.45%. Conclusion Combining the random forest algorithm with a logistic regression model containing full quadratic interaction terms can effectively uncover second-order interaction effects among influencing factors from multi-factor data with limited samples, providing strong support for disease prevention and control.

Key words: COVID-19, Close contacts, Influencing factors, Random forest model, Tongling City
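The abstract describes a logistic regression model with pairwise interaction terms and compares models by accuracy, precision, recall, and F1 score. The following is a minimal Python sketch of those two ideas only, not the authors' implementation: the indicator names (`living_together`, `family`, `frequent_contact`), the helper functions, and the toy labels are hypothetical, and the random forest screening step is not reproduced. For 0/1 indicators the squared terms of a full quadratic expansion equal the indicators themselves, so only cross products are generated here.

```python
from itertools import combinations


def add_pairwise_interactions(row):
    """Return the main effects plus every pairwise product of 0/1 indicators.

    `row` maps a hypothetical indicator name to 0 or 1. Each product is 1
    only when both indicators are 1, which is what an interaction term in
    a logistic regression design matrix encodes.
    """
    features = dict(row)
    for a, b in combinations(sorted(row), 2):
        features[f"{a}*{b}"] = row[a] * row[b]
    return features


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = infected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Overall infection rate reported in the abstract: 101 of 5,168 contacts.
infection_rate = 101 / 5168

# Hypothetical close contact: lives with the case, exposed at home,
# but not recorded as "frequent contact".
contact = {"living_together": 1, "family": 1, "frequent_contact": 0}
expanded = add_pairwise_interactions(contact)

# Toy labels, used only to exercise the metric definitions.
metrics = classification_metrics([1, 0, 1, 0], [1, 0, 0, 0])
```

In a real analysis the expanded indicators would form the design matrix of the logistic regression, and levels of the same categorical variable would not be crossed with each other, since their product is always zero.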

CLC number: