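Latxa-7b ereduan oinarritutako hizkuntzaren prozesamendu-sailkatzaileen gaitasunaren azterketa: medikuntzako aplikazioak eta kirurgia ortopediko eta traumatologiako testu klinikoen adibidea [Analysis of the capability of language-processing classifiers based on the Latxa-7b model: medical applications and an example with clinical texts from orthopedic surgery and traumatology]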

Calvo-Lorenzo, Isidoro ¹

1 Servicio Vasco de Salud – Osakidetza. Organización Sanitaria Integrada Barrualde. Hospital Universitario Galdakao-Usansolo. Galdakao-Bizkaia.

Journal:

Gaceta médica de Bilbao: Revista oficial de la Academia de Ciencias Médicas de Bilbao. Información para profesionales sanitarios

ISSN: 0304-4858, 2173-2302

Year of publication: 2024

Volume: 121

Issue: 2

Pages: 62-68

Type: Article

Open access

Abstract

Objective: This work analyzes the feasibility of building a classifier for synthetic orthopedic surgery texts written in Basque, adapted from the Latxa-7b Large Language Model created by the HiTZ group (University of the Basque Country).

Methods: A synthetic database of 20,000 patient clinical notes mentioning musculoskeletal pathologies is created. A classifier based on Latxa-7b is developed and then trained on the clinical notes. Finally, its performance in detecting malignant bone tumors is analyzed.

Results: The resulting classifier achieves, on the training and test data sets, 97.7% precision, 98.6% accuracy, 94.2% sensitivity, an area under the curve of 0.99 and an F1 score of 0.96.

Conclusions: The excellent performance of the classifier described in this work should serve as a spur to start applying Natural Language Processing to the digitized medical records used in our healthcare systems.
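As a reading aid for the metrics reported in the Results, the following minimal Python sketch shows how precision, sensitivity, accuracy and F1 are conventionally derived from a binary confusion matrix. It is not the author's code: the function names and the confusion counts are hypothetical, chosen only for illustration. The last lines check that the reported F1 is consistent with the reported precision and sensitivity, since F1 is their harmonic mean.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts.

    Assumes every denominator is non-zero (hypothetical helper,
    not taken from the article).
    """
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    sensitivity = tp / (tp + fn)          # share of true positives that are detected (recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
    }


def f1_from(precision, sensitivity):
    """F1 as the harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, for illustration only (not data from the study).
example = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# Consistency check using the figures reported in the abstract.
reported_f1 = f1_from(0.977, 0.942)
print(round(reported_f1, 2))  # 0.96
```

The area under the curve cannot be recovered from a single confusion matrix, because it depends on the classifier's scores across all decision thresholds, so it is left out of this sketch.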

References

Model Card for Latxa 7b. HiTZ/latxa-7b-v1 · Hugging Face
Kong HJ. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 2019; 25:1-2. doi: https://doi.org/10.4258/hir.2019.25.1.1
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. arXiv. 2017:1706.03762. https://arxiv.org/pdf/1706.03762.pdf
Yang R, Fang Tan T, Lu W, Thirunavukarasu AJ, Wei Ting DS, Liu N. Large language models in health care: Development, applications, and challenges. Health Care Sci. 2023; 2: 255-263. doi: https://doi.org/10.1002/hcs2.61
Yao L, Mao C, Luo Y. Clinical text classification with rule-based features and knowledge-guided convolutional neural networks. BMC Med Inform Decis Mak. 2019; 19 (Suppl 3). doi: https://doi.org/10.1186/s12911-019-0781-4
Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Fang Tan T, Wei Ting DS. Large language models in medicine. Nat Med. 2023; 29: 1930–1940. doi: https://doi.org/10.1038/s41591-023-02448-8
Agerri R, San Vicente I, Campos JA, Barrena A, Saralegi X, Soroa A, et al. Give your Text Representation Models some Love: the Case for Basque. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020: 4781–4788.
Artetxe M, Aldabe I, Agerri R, Perez-de-Viñaspre O, Soroa A. Does Corpus Quality Really Matter for Low-Resource Languages? Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022: 7383–7390.
Solarte-Pabón O, Montenegro O, García-Barragán A, Torrente M, Provencio M, Menasalvas E, et al. Transformers for extracting breast cancer information from Spanish clinical narratives. Artif Intell Med. 2023; 143: 102625. doi: https://doi.org/10.1016/j.artmed.2023.102625
Calvo-Lorenzo I, Uriarte-Llano I. Generación masiva de historias clínicas sintéticas con Chat-GPT: un ejemplo en fractura de cadera. Med Clin. 2024. Article in press. doi: https://doi.org/10.1016/j.medcli.2023.11.027
Singhal K, Azizi S, Tu T, Mahdavi S, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023; 620: 172–180. doi: https://doi.org/10.1038/s41586-023-06291-2

Data source: Dialnet