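Journal of Shaanxi Normal University (Natural Science Edition)
Special Topic: Data Mining
PICO Element Recognition in Medical Text Based on a Bidirectional GRU Neural Network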
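GONG Lejun1,2*, YAO Lingfeng1, GAO Zhihong3, LI Huakang4,5
(1 School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; 2 Jiangsu Key Lab of Big Data Security & Intelligent Processing, Nanjing 210023, Jiangsu, China; 3 Zhejiang Engineering Research Center of Intelligent Medicine, Wenzhou 325035, Zhejiang, China; 4 Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen 518034, Guangdong, China; 5 Suzhou Privacy Information Technology Company, Suzhou 215011, Jiangsu, China)
GONG Lejun, female, associate professor, master's supervisor, Ph.D. Research interests: data and text mining, biomedical information processing. E-mail: glj98226@163.com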
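Abstract:
Traditional machine learning models extract features insufficiently when recognizing PICO (population/problem, intervention, comparison and outcome) elements. To address this, this paper proposes GRUCM, a model that automatically recognizes PICO elements in medical text. The model combines the strengths of a bidirectional gated recurrent unit (BiGRU) neural network and a conditional random field (CRF). It mitigates the insufficient feature extraction of traditional machine learning models and extracts multiple elements at once, which avoids the resource waste of building a separate model for each element. On the test data, the model achieves F1 scores of 88.24% for the P element, 80.49% for the I element, and 86.62% for the O element. Compared with the recognition results of models using long short-term memory (LSTM) networks and CRF, the proposed GRUCM model recognizes PICO elements more effectively.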
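The abstract states that a single model labels several elements at once by pairing a BiGRU with a CRF. As a rough illustration of the CRF side only, the sketch below runs Viterbi decoding over per-position scores, so the output is the jointly best label sequence rather than the best label at each position taken in isolation. The tag set (`P`, `I`, `O`, plus an assumed `N` for "none"), the function name `viterbi_decode`, and every score are hypothetical. In a BiGRU-CRF tagger the emission scores would come from the recurrent layer, which is not reproduced here, and this is not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical label set: P, I, O elements plus an assumed "N" (none) label.
TAGS = ["P", "I", "O", "N"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str] = TAGS,
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain model.

    emissions[t][j]   : score of tag j at position t (stand-in for BiGRU output).
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters).
    """
    if not emissions:
        return []
    n_tags = len(tags)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []           # for each later position, best previous tag per tag
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the full path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


# Toy scores (invented for illustration): position 1 slightly prefers "I",
# but a learned bonus for staying in "P" changes the joint decision.
emissions = [[2.0, 0.5, 0.1, 0.3], [0.9, 1.0, 0.2, 0.1], [0.1, 0.2, 1.5, 0.3]]
transitions = [[0.0] * 4 for _ in range(4)]
transitions[0][0] = 0.5  # P -> P bonus
print(viterbi_decode(emissions, transitions))  # ['P', 'P', 'O']
```

With all transition scores set to zero the same emissions decode to `['P', 'I', 'O']`, which shows how the transition scores let neighboring labels influence one another.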
Keywords:
evidence-based medicine; GRUCM model; PICO elements; bidirectional gated recurrent unit; neural network
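Received:
2020-09-30
CLC number:
TP391
Document code:
A
Article ID:
1672-4291(2021)01-0014-08
Funding:
Zhejiang Engineering Research Center of Intelligent Medicine funded project (2016E10011); Suzhou Gusu Science and Technology Entrepreneurship Angel Program funded project (CYTS2018233); Nanjing University of Posts and Telecommunications Scientific Research Start-up Fund for Introduced Talents (NY217136)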
Recognizing PICO elements in medical text based on a bidirectional GRU neural network
GONG Lejun1,2*, YAO Lingfeng1, GAO Zhihong3, LI Huakang4,5
(1 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; 2 Jiangsu Key Lab of Big Data Security & Intelligent Processing, Nanjing 210023, Jiangsu, China; 3 Zhejiang Engineering Research Center of Intelligent Medicine, Wenzhou 325035, Zhejiang, China; 4 Key Laboratory of Urban Land Resources Monitoring and Simulation, Shenzhen 518034, Guangdong, China; 5 Suzhou Privacy Information Technology Company, Suzhou 215011, Jiangsu, China)
Abstract:
Traditional machine learning models suffer from insufficient feature extraction when recognizing PICO elements. This paper proposes GRUCM, a model that automatically recognizes PICO elements in medical texts. The model combines the advantages of a bidirectional gated recurrent unit (BiGRU) neural network and a conditional random field (CRF). It mitigates the insufficient feature extraction of traditional machine learning models and extracts multiple elements at the same time, avoiding the waste of resources caused by creating multiple models. On the test dataset, the F1 score is 88.24% for the P element, 80.49% for the I element, and 86.62% for the O element. A comparison with long short-term memory (LSTM) neural network and CRF models shows that the proposed GRUCM model is more effective for PICO element recognition.
Keywords:
evidence-based medicine; GRUCM model; PICO elements; bidirectional gated recurrent unit; neural networks
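The abstract reports one F1 value per element (P, I, O). For readers unfamiliar with the metric, the sketch below computes per-element F1 from gold and predicted labels, assuming the standard definition (the harmonic mean of precision and recall). The source does not describe the authors' evaluation procedure or matching criteria, so the helper names `f1_score` and `per_element_f1`, the `N` label, and the toy label lists are all hypothetical and bear no relation to the reported numbers.

```python
from typing import Dict, Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_element_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    elements: Sequence[str] = ("P", "I", "O"),
) -> Dict[str, float]:
    """Compute one F1 value per element label from aligned label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for element in elements:
        tp = sum(1 for g, p in zip(gold, pred) if g == element and p == element)
        fp = sum(1 for g, p in zip(gold, pred) if g != element and p == element)
        fn = sum(1 for g, p in zip(gold, pred) if g == element and p != element)
        scores[element] = f1_score(tp, fp, fn)
    return scores


# Toy labels, invented for illustration only ("N" = assumed "none" label).
gold = ["P", "P", "I", "O", "N", "O"]
pred = ["P", "I", "I", "O", "O", "N"]
for element, value in per_element_f1(gold, pred).items():
    print(f"{element}: F1 = {value:.4f}")
```

Reporting each element separately, as the abstract does, shows which element a tagger handles worst instead of hiding the gap inside a single averaged score.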