Identifying Emerging Trends in Scientific Texts Using TF-IDF Algorithm: A Case Study of Medical Librarianship and Information Articles

Identifying Emerging Trends in Scientific Texts Using Text Mining Techniques


Context: The growing volume of articles published across scientific fields makes it essential to identify publishing trends and emerging keywords in their texts. This study therefore identifies and analyzes the keywords used in published articles on medical librarianship and information.

Materials and Methods: This exploratory, descriptive study applied text mining techniques to librarianship and information articles published in the field's specialized journals from 1964 to 2019. The TF-IDF weighting algorithm was used to identify the most important keywords in the articles, and the text mining algorithms were implemented in Python.
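As a rough illustration of the TF-IDF weighting step, the sketch below scores each term in every document and sums the scores over the corpus to rank terms. It is not the authors' implementation: the tokenizer, the TF-IDF variant (raw term count multiplied by ln(N / df)), the aggregation by summation, the function names, and the toy corpus are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_corpus_weights(documents):
    """Return {term: TF-IDF weight summed over all documents}.

    Assumed variant: tf is the raw count of the term in a document and
    idf is ln(N / df), where N is the number of documents and df is the
    number of documents that contain the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Sum the per-document TF-IDF scores of every term.
    weights = Counter()
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            weights[term] += count * math.log(n_docs / doc_freq[term])
    return dict(weights)


# Toy corpus (hypothetical texts, not data from the study).
docs = [
    "library catalog and library journal",
    "patient information in the library",
    "patient intervention and patient education",
]
weights = tfidf_corpus_weights(docs)
top_terms = sorted(weights, key=weights.get, reverse=True)[:3]
```

Under this variant a term that occurs in every document receives a weight of zero, so very common words drop out of the ranking without a separate stop-word list. Other TF-IDF variants (smoothed idf, normalized tf) would give different absolute weights.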

Results: The TF-IDF results indicate that the words “library”, “patient”, and “inform”, with weights of 95.087, 65.796, and 63.386, respectively, were the most important keywords in the published articles on medical librarianship and information. The words “Catalog”, “Book”, and “Journal” were the most important keywords in articles published between 1960 and 1970, whereas “Patient”, “Bookstore”, and “Intervent” were the most important keywords in articles on medical librarianship and information published from 2015 to 2020. The words “Blockchain”, “telerehabilit”, “Instagram”, “WeChat”, and “comic” are new keywords that appear in articles on medical librarianship and information between 2015 and 2020.
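One simple way to flag keywords as new in a period is to keep the terms that occur in that period but in no earlier year. The sketch below shows that idea only; the abstract does not state how new keywords were detected, so the rule, the function name, and the sample records (with invented years and term lists) are assumptions.

```python
def emerging_terms(records, period_start, period_end):
    """Return terms seen in [period_start, period_end] but in no earlier year.

    `records` is an iterable of (publication_year, list_of_terms) pairs.
    """
    earlier, recent = set(), set()
    for year, terms in records:
        if year < period_start:
            earlier.update(terms)
        elif year <= period_end:
            recent.update(terms)
    # Terms used in the target period that never appeared before it.
    return sorted(recent - earlier)


# Hypothetical records for illustration only (not data from the study).
records = [
    (1965, ["catalog", "book", "journal"]),
    (1998, ["librari", "patient", "inform"]),
    (2016, ["patient", "blockchain", "telerehabilit"]),
    (2019, ["instagram", "wechat", "librari"]),
]
new_terms = emerging_terms(records, 2015, 2020)
```

With real data a minimum frequency threshold would probably be needed as well, since a strict set difference also flags misspellings and one-off terms.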

Conclusion: The keywords used in articles on medical librarianship and information have not been consistent over time but have shifted from period to period. The field has changed along with the needs of society and with the advent and growth of information technologies.

Issue: Vol 4, No 2 (2020)
Keywords: Medical librarianship and information; Keyword analysis; Text mining

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
How to Cite
Dastani M, Mousavi chelak A, Ziaei S, Delghandi F. Identifying Emerging Trends in Scientific Texts Using TF-IDF Algorithm: A Case Study of Medical Librarianship and Information Articles. Health Tech Ass Act. 4(2).