Health data exists in many forms: vital signs, lab results, patient-generated lifestyle data, physician notes and various types of imagery (magnetic resonance imaging, pathology slides and ultrasonography, to name just a few). While there are no standards or categorizations that encompass all types of health data, it is helpful to consider this important information in terms of structured and unstructured data.
Structured data, as the name suggests, is information that can be stored and displayed in a consistent, organized manner. This type of data can be validated against expected or biologically plausible ranges and easily analyzed over time. Examples of health data that would fall into this category include numerical values like height, weight and blood pressure, as well as categorical values like blood type or ordinal values like the stages of a disease diagnosis.
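As a rough illustration of what validating against plausible ranges can look like, here is a minimal Python sketch. The field names and the numeric limits are assumptions made up for this example; they are not clinical reference ranges and do not come from any standard.

```python
# Hypothetical plausibility limits, chosen only for illustration.
# They are NOT clinical reference ranges.
PLAUSIBLE_RANGES = {
    "height_in": (10.0, 110.0),
    "weight_lb": (1.0, 1500.0),
    "systolic_mmhg": (40.0, 300.0),
}

# Categorical values can be checked against a fixed set of allowed labels.
BLOOD_TYPES = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}


def is_plausible(name: str, value: float) -> bool:
    """Return True if value lies inside the configured range for name."""
    low, high = PLAUSIBLE_RANGES[name]
    return low <= value <= high


def is_valid_blood_type(label: str) -> bool:
    """Return True if label is one of the allowed blood type labels."""
    return label in BLOOD_TYPES


print(is_plausible("height_in", 71))   # True
print(is_plausible("height_in", 710))  # False, likely a data-entry error
print(is_valid_blood_type("AB-"))      # True
```

A check like this only flags values that are impossible or very unlikely; it says nothing about whether a plausible value is actually correct.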
Unstructured data, on the other hand, lacks the organization and precision of structured data. Examples in this category include physician notes, x-ray images and even faxed copies of structured data. In most cases, unstructured data must be manually analyzed and interpreted.
A closer look at this dichotomy, especially within the context of emerging technology, reveals a more nuanced distinction. Structured data is not a homogeneous or monolithic category: just because data is structured doesn’t mean that it’s structured in a way that makes sense or is easy to interpret. Conversely, just because data lacks formal structure does not mean that it cannot be easily interpreted, or that it can only be analyzed in a resource-intensive manner.
Quality and Consistency in Structured Data
At the most granular level, a piece of structured data consists of two parts: a variable name and a value. Take height, for example. Within an electronic medical record (EMR), a patient’s height might be stored as “height: 71,” meaning that the patient’s height (“height:”) is 71 inches (“71”). Without an explicit unit, though, the same height could just as well be recorded as 1.8 (meters), 5.92 (feet) or even 1.972 (yards). Additionally, the variable name might be abbreviated differently depending on where the data is stored, or who is storing it.
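One way to remove this ambiguity is to store the unit alongside the value and convert everything to a single unit before comparing. The sketch below assumes a hypothetical set of unit labels ("in", "ft", "yd", "m") and is not taken from any particular EMR.

```python
# Conversion factors from each unit to inches.
# The unit labels are hypothetical; a real system would use a unit standard.
TO_INCHES = {
    "in": 1.0,
    "ft": 12.0,
    "yd": 36.0,
    "m": 100 / 2.54,  # 1 inch is defined as exactly 2.54 cm
}


def height_in_inches(value: float, unit: str) -> float:
    """Convert a height recorded in the given unit to inches."""
    return value * TO_INCHES[unit]


# The same patient height, recorded in two different units.
print(round(height_in_inches(71, "in"), 1))      # 71.0
print(round(height_in_inches(1.8034, "m"), 1))   # 71.0
```

Once every record carries its unit, a bare “71” and a bare “1.8” are no longer confused with each other.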
To maintain consistency in the way structured data is recorded and stored, several data standards have been developed. The Logical Observation Identifiers Names and Codes (LOINC) standard, for example, standardizes the way laboratory tests and clinical observations are identified and reported. Health Level 7 (HL7) is a broader family of standards that covers administrative health data in addition to clinical health data; it includes Fast Healthcare Interoperability Resources (FHIR), a standard for exchanging structured health data between databases and systems.
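To make these standards more concrete, the sketch below builds a simplified FHIR-style Observation for a height of 71 inches as a Python dictionary, identified by the LOINC code for body height (8302-2). It is an abbreviated illustration, not a complete or validated resource: the patient reference is a made-up placeholder, and fields that a production system would normally include (such as the observation category and date) are omitted.

```python
import json

# Simplified, illustrative FHIR-style Observation (not a validated resource).
height_observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [
            {
                "system": "http://loinc.org",
                "code": "8302-2",          # LOINC code for body height
                "display": "Body height",
            }
        ]
    },
    "subject": {"reference": "Patient/example"},  # hypothetical patient id
    "valueQuantity": {
        "value": 71,
        "unit": "in",
        "system": "http://unitsofmeasure.org",    # UCUM unit system
        "code": "[in_i]",                         # UCUM code for inch
    },
}

print(json.dumps(height_observation, indent=2))
```

Because both the meaning of the measurement (the LOINC code) and its unit are explicit, a receiving system does not have to guess what “height: 71” refers to.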
While standards like LOINC and HL7 go a long way towards improving the quality and usefulness of structured health data, patient-generated data is often left uncovered by the most widely adopted data standards. Data related to activity, sleep and other “wellness” measurements, while structured, are often stored in unique or proprietary formats, making the data difficult to compare or even display within EMRs.
Additionally, because structured patient-generated data is often collected via consumer devices and not FDA-approved medical devices, this data can be difficult to compare even if it is uniformly structured. For example, although most activity monitors measure the number of steps taken in a day using a small accelerometer, there is no standardized algorithm to convert that raw accelerometer data into a step count. This inconsistency still allows a clinician to view one patient’s relative improvement over time (assuming that the patient continues to use the same brand of activity monitor), but it makes population-level monitoring difficult and direct comparison between patients unreliable.
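The effect of non-standardized step algorithms can be illustrated with a toy example. The sketch below applies the same deliberately simple detector (counting upward threshold crossings) with two different thresholds to one synthetic signal. Both the detector and the signal values are invented for illustration; real devices use more sophisticated, usually proprietary, methods.

```python
def count_steps(signal, threshold):
    """Count upward crossings of threshold (a deliberately simple detector)."""
    steps = 0
    above = False
    for sample in signal:
        if sample > threshold and not above:
            steps += 1       # signal just rose above the threshold
            above = True
        elif sample <= threshold:
            above = False    # signal fell back; ready for the next crossing
    return steps


# Synthetic acceleration magnitudes in g; invented values, not device data.
signal = [1.0, 1.3, 1.0, 1.15, 1.0, 1.4, 1.0, 1.12, 1.0, 1.35, 1.0]

print(count_steps(signal, threshold=1.1))  # 5
print(count_steps(signal, threshold=1.2))  # 3
```

The same movement yields five steps under one setting and three under the other, which is why counts from different brands of device cannot be compared directly.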
Finding Meaning in Unstructured Data
On its face, unstructured data represents a greater challenge to analyze and interpret than structured data. Images and free text cannot be easily categorized in the same way that a structured, numerical data point can. For example, interpreting a blood pressure reading as normal, elevated or hypertensive can be accomplished in just a few lines of straightforward code. A physician’s note indicating “chest pain, trouble breath, gen fatigue” also suggests hypertension, but abbreviations and spelling errors included in the free text would require a human to decode and interpret (especially if the text were handwritten or scanned into the EMR from a fax).
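The blood pressure case really does fit in a few lines. The sketch below assumes cut-offs in line with the 2017 ACC/AHA guideline categories (normal below 120/80 mmHg, elevated at 120 to 129 systolic with diastolic below 80, hypertensive at 130 or above systolic or 80 or above diastolic). The function name and the three labels are choices made for this example, and the thresholds should be treated as an assumption, not as clinical guidance.

```python
def classify_blood_pressure(systolic: int, diastolic: int) -> str:
    """Classify a reading in mmHg using assumed ACC/AHA-style cut-offs."""
    if systolic >= 130 or diastolic >= 80:
        return "hypertensive"
    if systolic >= 120:
        return "elevated"
    return "normal"


print(classify_blood_pressure(118, 76))  # normal
print(classify_blood_pressure(124, 78))  # elevated
print(classify_blood_pressure(135, 85))  # hypertensive
```

No comparable three-line function exists for the free-text note, which is the point of the contrast.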
Imagery presents similar challenges—x-rays and pathology slides are generally indecipherable to all except highly trained professionals and, even then, experienced clinicians often require a second opinion to validate a diagnosis or interpretation. While medical imaging is increasingly relying on digital imagery, the unstructured data itself is largely analyzed manually.
Advances in artificial intelligence and machine learning, however, have the potential to transform the way clinicians and providers use unstructured data. In the free text example above, a natural language processing tool might decode the physician’s note and interpret it as “chest pain, trouble breathing, general fatigue,” while a machine learning decision support tool might suggest that these are symptoms related to hypertension (this diagnosis would also benefit from structured contextual data like the patient’s height, weight and heart rate).
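To give a sense of the decoding step, the sketch below expands the abbreviations in the example note using a small lookup table. This is a stand-in only: the table is hypothetical and hand-written for this one note, whereas a real natural language processing tool would rely on statistical or learned models to handle misspellings, context and unfamiliar shorthand.

```python
# Hypothetical abbreviation table, written by hand for this one example.
ABBREVIATIONS = {
    "gen": "general",
    "breath": "breathing",
}


def expand_note(note: str) -> str:
    """Expand known abbreviations in a comma-separated free-text note."""
    phrases = []
    for phrase in note.split(","):
        # Replace each word that appears in the table; keep the rest as-is.
        words = [ABBREVIATIONS.get(word, word) for word in phrase.split()]
        phrases.append(" ".join(words))
    return ", ".join(phrases)


print(expand_note("chest pain, trouble breath, gen fatigue"))
# chest pain, trouble breathing, general fatigue
```

A fixed table breaks as soon as a clinician writes “SOB” or “brething” instead, which is why learned models are attractive for this kind of text.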
Similarly, using large archives and repositories of medical imagery, computer scientists are working with clinicians to train machine learning models to recognize patterns in medical imagery, to provide an automated “second opinion” confirming (or casting doubt on) a manually generated interpretation or diagnosis.
Looking to the Future of Health Data
It’s clear that there’s significant room for improvement in the way both structured and unstructured health data is stored, analyzed and interpreted. While powerful analytic tools are already helping providers use structured data in increasingly impactful ways, the lack of standardization continues to frustrate and impede this progress.
Machine learning, artificial intelligence and natural language processing have the potential to streamline the way unstructured data is utilized, but it’s unlikely we’ll ever get to the point where computers are making critical decisions instead of supporting the humans who have traditionally made those decisions. Regardless, patients should expect and look forward to improved efficiency and health outcomes as innovation improves the way we look at all types of health data.