Manually reading patient information takes a long time. As a result, researchers in the United States have developed a new AI-based algorithm that can learn to read patient data from electronic health records (EHR). In a side-by-side comparison, the researchers demonstrated that their method identified patients with certain diseases as accurately as the traditional, “gold-standard” method, which requires far more manual labor to develop and perform.
The volume and types of data saved electronically in a patient’s medical record continue to grow at an exponential rate. Extracting and analyzing this intricate web of data can be inefficient, which impedes clinical research. In this study, the researchers used machine learning to develop a new approach for extracting data from electronic health records that is both faster and less labor-intensive than the industry standard. They expect it to be a useful tool that allows for more objective clinical informatics research.
Scientists currently use a set of established computer programs or algorithms to mine medical records for new information. The development and storage of these algorithms are managed by a system called the Phenotype Knowledgebase (PheKB). While the system is quite good at correctly detecting a patient’s diagnosis, constructing an algorithm is a time-consuming and rigid procedure.
For example, when scientists wish to learn more about an illness, they must first search all medical records for pertinent information, such as lab tests or medicines that are specifically linked to the ailment. Next, they write the algorithm that instructs the computer to look for individuals who have those disease-specific data points, which are referred to as “phenotypes.” The researchers must then manually double-check the list of patients identified by the algorithm. They must start the process over every time they want to examine a new disease. In this study, the researchers used a new strategy in which the computer learns on its own how to recognize illness phenotypes, saving them time and effort.
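To make the manual workflow above concrete, here is a minimal Python sketch of a hand-written phenotype rule. It is not an actual PheKB algorithm: the `PatientRecord` type, the `matches_diabetes_rule` function, the codes, the drug name, and the lab threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR."""
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    lab_results: dict = field(default_factory=dict)  # lab name -> latest value


def matches_diabetes_rule(record):
    """Illustrative hand-written rule: a diagnosis code plus supporting evidence."""
    has_code = "E11" in record.diagnosis_codes
    on_drug = "metformin" in record.medications
    high_lab = record.lab_results.get("hba1c", 0.0) >= 6.5
    return has_code and (on_drug or high_lab)


def select_cohort(records):
    """Return the IDs of patients who satisfy the rule."""
    return [r.patient_id for r in records if matches_diabetes_rule(r)]


# Made-up records; in practice the resulting list is then checked by hand.
records = [
    PatientRecord("p1", {"E11"}, {"metformin"}, {"hba1c": 7.2}),
    PatientRecord("p2", {"I10"}, {"lisinopril"}, {"hba1c": 5.4}),
    PatientRecord("p3", {"E11"}, set(), {"hba1c": 5.9}),
]
cohort = select_cohort(records)
```

Every new disease needs its own rule of this kind, written and validated by experts, which is the rigidity the article describes.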
According to the study’s senior author, the researchers had previously demonstrated that unsupervised machine learning can be a very efficient and effective technique for mining EHR. Their approach has the potential benefit of learning disease representations from the data itself. As a result, the computer does much of the work that specialists would otherwise do to define the ideal combination of data elements from health records to describe an illness.
In essence, a computer was programmed to sift through millions of EHRs and learn how to connect data to diseases. This programming made use of “embedding” algorithms developed by other researchers, such as linguists, to examine word networks in various languages. One of these algorithms, word2vec, was especially effective. The computer was then programmed to apply what it had learned to identify the diagnoses of over 2 million patients whose records were kept in the health system.
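The idea of learning which record entries belong to a disease can be sketched in a few lines of Python. This is a toy illustration, not the study's implementation: it swaps word2vec for plain co-occurrence counts so that it runs with the standard library alone, and every concept name, patient, threshold, and function name is an assumption.

```python
import math
from itertools import combinations

# Toy data: concepts recorded for each patient (all names are made up).
records = {
    "p1": ["diabetes_dx", "metformin", "hba1c_test"],
    "p2": ["diabetes_dx", "metformin", "insulin"],
    "p3": ["hypertension_dx", "lisinopril", "bp_check"],
    "p4": ["hypertension_dx", "lisinopril"],
    "p5": ["metformin", "hba1c_test"],
}


def cooccurrence_vectors(records):
    """Represent each concept by how often it appears with every other concept."""
    vectors = {}
    for concepts in records.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            vectors.setdefault(a, {}).setdefault(b, 0.0)
            vectors.setdefault(b, {}).setdefault(a, 0.0)
            vectors[a][b] += 1.0
            vectors[b][a] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_phenotype(seed, vectors, threshold=0.3):
    """Collect the seed concept plus every concept whose vector is similar to it."""
    phenotype = {seed}
    for concept, vec in vectors.items():
        if concept != seed and cosine(vectors[seed], vec) >= threshold:
            phenotype.add(concept)
    return phenotype


def flag_patients(records, phenotype, min_fraction=0.5):
    """Flag patients whose concepts mostly fall inside the learned phenotype."""
    flagged = []
    for patient_id, concepts in records.items():
        unique = set(concepts)
        if not unique:
            continue
        if len(unique & phenotype) / len(unique) >= min_fraction:
            flagged.append(patient_id)
    return sorted(flagged)


vectors = cooccurrence_vectors(records)
phenotype = build_phenotype("diabetes_dx", vectors)
flagged = flag_patients(records, phenotype)
```

In this toy data, patient `p5` is flagged even though the record has no diabetes diagnosis code, because the related concepts were learned from the data rather than written into a rule. A real system would learn dense embeddings from millions of records and would still need validation against expert review.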
Finally, the researchers compared how well the new and old systems performed. They found that the new technique, called Phe2vec, was as effective as, or slightly better than, the gold-standard phenotyping process at correctly identifying diagnoses from EHR for nine out of ten diseases tested.
Overall, the findings are positive, indicating that the method could be a promising technique for large-scale illness phenotyping in EHR data. With more testing and refinement, the researchers plan to use it to automate many of the earliest steps of clinical informatics research, allowing scientists to focus their attention on downstream analytics such as predictive modeling.