These days, thanks to computers and an endless stream of new research, a good deal of information is available to address a large number of medical conditions. But as we have said before, “navigating through the open waters of published medical articles is a difficult proposition. Millions of articles, complicated keywords (often spelled differently, or with arcane abbreviations and acronyms), thousands of publications, all that makes a web of knowledge virtually impossible to mine and use effectively at the point of care.” Additionally, electronic medical records hold a treasure trove of clinical data that can be mined for all sorts of insights. Now IBM and Mayo Clinic have teamed up to create an online consortium that applies natural language processing (NLP) software to help find what is academically and clinically important in all that text.
Here’s more from an IBM statement on the news:
As part of the launch, Mayo Clinic and IBM released their clinical NLP technologies into the public domain. The site http://www.ohnlp.org will allow the approximately 2,000 researchers and developers working on clinical language systems worldwide to contribute code and further develop the systems.
"We are inviting our international colleagues to help continue development of these valuable tools," says Christopher Chute, M.D., Dr.P.H., Mayo Clinic bioinformatics expert and senior consultant on the project. "By making it an open-source initiative, we hope to enable wide use of these NLP tools so medical advancements can happen faster and more efficiently."
NLP is a relatively new and specialized area within computer science dealing with computational methods for understanding human language. In medicine, clinical NLP systems process the vast repositories of text generated by patient-clinician interactions. Such systems categorize and structure it according to standard nomenclature — in this case focusing on terms used in a range of medical specialties — that will ultimately speed data searches for both diagnoses and medical research. These NLP platforms or "pipelines" aid indexing and searching electronic medical records within institutions to quickly find similar cases or conditions, so physicians are not reliant solely on their own clinical experience in analyzing a problem. Researchers may also use these tools to aid retrospective epidemiological studies or do groundwork for new clinical trials.
As an increasing percentage of health care and academic medical centers adopt electronic medical records, searching and extracting information from them in an automated fashion becomes essential. Mayo Clinic and IBM jointly developed a system for extracting information from more than 25 million free-text clinical notes based on IBM’s open-source Unstructured Information Management Architecture (UIMA). As part of the system, developers build strings of “annotators” that become a pipeline, allowing physicians to mine the text for references of specific conditions, drugs, diseases, signs and symptoms; anatomical areas or organs; or treatment procedures. IBM and Mayo Clinic also developed a system to extract cancer disease characteristics from unstructured pathology reports to facilitate “consistent retrieval and transmission of cancer cases.” The system extracts tumor characteristics, lymph node status and metastatic disease information enabling the automatic computation of cancer stage, which is critical to determine optimal treatment.
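To make the "strings of annotators" idea concrete, here is a minimal sketch in Python. It is not UIMA's API (UIMA itself is a Java framework) and not the Mayo/IBM code: the `Document`, `Annotation`, `make_dictionary_annotator` and `run_pipeline` names and the tiny term lists are all hypothetical, chosen only to show how independent annotators can be chained so that each one adds typed spans to a shared document.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # e.g. "drug", "symptom", "anatomy"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the matched surface text


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)


# An annotator is any function that reads a document and adds annotations to it.
Annotator = Callable[[Document], None]


def make_dictionary_annotator(kind: str, terms: List[str]) -> Annotator:
    """Build an annotator that tags whole-word matches of the given terms."""
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def annotate(doc: Document) -> None:
        for m in pattern.finditer(doc.text):
            doc.annotations.append(Annotation(kind, m.start(), m.end(), m.group(0)))

    return annotate


def run_pipeline(doc: Document, annotators: List[Annotator]) -> Document:
    """Run each annotator in order over the same document."""
    for annotate in annotators:
        annotate(doc)
    return doc


# Hypothetical term lists; a real system would draw on standard vocabularies.
pipeline = [
    make_dictionary_annotator("drug", ["aspirin", "metformin"]),
    make_dictionary_annotator("symptom", ["chest pain", "nausea"]),
    make_dictionary_annotator("anatomy", ["left arm", "liver"]),
]

note = "Patient reports chest pain radiating to the left arm; started aspirin."
doc = run_pipeline(Document(note), pipeline)
for a in doc.annotations:
    print(a.kind, a.start, a.end, a.text)
```

Because every annotator only reads the text and appends to the same list, stages can be added, removed or reordered without touching the others, which is roughly the appeal of a pipeline design.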
The two clinical text solutions released open-source by Mayo Clinic and IBM aim at processing two specific types of notes. Clinical notes describe patient-physician encounters, while pathology reports center around tissue findings. Both options are already adding value for Mayo and its patients:
* Physicians can research past records to examine earlier cases of rare conditions, thereby “conferring” with their colleagues across time to aid diagnosis and treatment decisions.
* Retrospective studies of tissue samples can propel new research findings, as happened with a major breast cancer finding at Mayo in 2008.
* Enhanced ability to mine data and determine potential study factors or participants has already enabled individualized medicine treatments in psychiatric care.
Mayo’s open-source solution, the clinical Text Analysis and Knowledge Extraction System (cTAKES), focuses on processing patient-centric clinical notes. Its low-level components, for example the software that discovers sentence and word boundaries, assigns part-of-speech tags to words and groups the words into phrases, are “trained” to understand clinical language. The higher-level information extraction components, for example the ones that determine which textual spans are highly relevant to the clinical meaning of a note, are specifically designed for this domain.
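As a rough illustration of why boundary detection needs to know clinical language, the sketch below splits a note into sentences and words while skipping a few abbreviations that end in a period. It is a hand-written rule, not a trained model and not cTAKES code; the `ABBREVIATIONS` set and both function names are assumptions made up for this example.

```python
import re

# Hypothetical abbreviation list; periods after these do not end a sentence.
ABBREVIATIONS = {"dr.", "pt.", "q.d.", "b.i.d."}


def split_sentences(text: str) -> list:
    """Split on whitespace and close a sentence at ., ? or ! unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".?!" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence: str) -> list:
    """Return alphanumeric words (allowing inner '.' or '-') and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9]+(?:[.-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]", sentence)


note = "Pt. seen by Dr. Smith. Takes aspirin 81 mg b.i.d. daily. No chest pain."
for sentence in split_sentences(note):
    print(tokenize(sentence))
```

A splitter that treated every period as a sentence end would cut this note after "Pt." and "Dr.", which is the kind of error that training on clinical text is meant to avoid.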
cTAKES can recognize whether a clinical concept is negated and whether it pertains to the patient or to the patient’s family, attributes that are critical to understanding patient-centered medical language.
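One simple way to approximate these two attributes is to look for trigger phrases in the text just before a concept mention. The sketch below does that; it is a deliberately crude illustration, not the method cTAKES uses, and the trigger lists, the 40-character window and the `concept_attributes` function are all assumptions invented for the example.

```python
import re

# Hypothetical trigger phrases; real systems use much larger curated lists.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]
FAMILY_TRIGGERS = ["mother", "father", "sister", "brother", "family history"]


def _has_trigger(window: str, triggers: list) -> bool:
    """True if any trigger appears as a whole word or phrase in the window."""
    return any(
        re.search(r"\b" + re.escape(t) + r"\b", window, re.IGNORECASE)
        for t in triggers
    )


def concept_attributes(sentence: str, concept: str, window_chars: int = 40) -> dict:
    """Inspect the text preceding the first mention of `concept` in `sentence`."""
    m = re.search(re.escape(concept), sentence, re.IGNORECASE)
    if m is None:
        raise ValueError(f"concept {concept!r} not found in sentence")
    window = sentence[max(0, m.start() - window_chars):m.start()]
    return {
        "negated": _has_trigger(window, NEGATION_TRIGGERS),
        "subject": "family" if _has_trigger(window, FAMILY_TRIGGERS) else "patient",
    }


print(concept_attributes("Patient denies chest pain.", "chest pain"))
print(concept_attributes("Mother had breast cancer.", "breast cancer"))
```

The distinction matters for search: a query for patients with breast cancer should not return a note that only mentions a relative's diagnosis or a symptom the patient explicitly denied.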
IBM’s medKAT system (medical Knowledge Analysis Tool) is a UIMA-based, modular and flexible system that uses advanced NLP techniques to extract structured information from unstructured data sources, such as pathology reports, clinical notes, discharge summaries and medical literature. medKAT/P is a version customized for the pathology domain, based on a representation of cancer, its characteristics and disease progression. The system recognizes concepts such as the primary tumor and its associated attributes (e.g. histology and anatomical site) or lymph node status and its associated attributes (e.g. number of positive and excised nodes) by identifying mentions (e.g. histology or anatomical sites) and their relations (including negation). medKAT can be viewed as a development platform that is adaptable to user and domain requirements. It has been designed to operate within institutional systems or databases of any size.
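To give a feel for what turning a pathology sentence into structured fields means, here is a small sketch that pulls the number of positive and excised lymph nodes out of free text. It relies on a single regular expression over one phrasing and is in no way medKAT's implementation; the pattern, the `LymphNodeStatus` type and the function name are assumptions for illustration only.

```python
import re
from typing import NamedTuple, Optional


class LymphNodeStatus(NamedTuple):
    positive: int  # nodes found to contain tumor
    excised: int   # nodes removed and examined


# Matches phrasings such as "3 of 12 lymph nodes positive" (hypothetical pattern).
NODE_PATTERN = re.compile(
    r"(\d+)\s+(?:of|out of)\s+(\d+)\s+lymph\s+nodes?\s+(?:are\s+|were\s+)?positive",
    re.IGNORECASE,
)


def extract_lymph_node_status(report: str) -> Optional[LymphNodeStatus]:
    """Return node counts if the report uses a recognized phrasing, else None."""
    m = NODE_PATTERN.search(report)
    if m is None:
        return None
    return LymphNodeStatus(positive=int(m.group(1)), excised=int(m.group(2)))


report = "Invasive ductal carcinoma. 3 of 12 lymph nodes positive for metastatic carcinoma."
print(extract_lymph_node_status(report))
```

Real reports phrase the same finding in many ways, which is why a system of this kind identifies mentions and the relations between them rather than depending on one fixed pattern.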
Demonstration video explaining system features…
Press release: Mayo Clinic and IBM Host Medical Language Initiative …
Link: Open Health Natural Language Processing (OHNLP) Consortium
Flashbacks: CureHunter Goes Mobile; CureHunter.com Aims to Distill Evidence Based Medicine into 1 Mouse Click