3.1 OVERVIEW OF ALL TYPES OF SEMANTIC RELATIONSHIPS
To address the challenges described above, we defined aCDEs and cCDEs using three new semantic types (hybrid, dictionary, and repeated) and three new types of constraints (ordered, operated, and required), in addition to the two existing semantic relationships (dependent and variable) defined in our previous study [9]. The newly defined composite semantic type replaced the old composite relationship that we defined previously [9].
Figure 1 displays aCDEs and cCDEs with their specific constraints. An aCDE can be constrained using a variable or a hybrid relationship, which classifies it as a variable aCDE or a hybrid aCDE, respectively. The definition of a cCDE as a set of interrelated aCDEs in our previous study [9] was extended to include a clear definition, a separate identifier for reuse, and constraints among the aCDEs inside a cCDE. cCDEs can be classified into dictionary and repeated cCDEs. One of the existing semantic relationships, the dependent relationship in our previous study, was extended to four constraints: ordered, operated, required, and dependent. As shown in the lower-left box in Figure 1, the ordered constraint does not apply to an aCDE.
3.2 DATA ENTRIES WITH MULTIPLE DATA TYPES: Hybrid aCDE
A hybrid aCDE is a particular type of aCDE that allows a value domain with multiple (or hybrid) data types. Technically, it includes several aCDEs that have the same DE concept but different value domains. Figure 2A shows part of a hemodialysis report form from the DialysisNet and Avatar Beans Project.
A time-tagged hybrid aCDE was applied to the Time attribute in a tabular data-entry format. The hybrid aCDE for Time (‘DE:47616 Hemodialysis_Time_Hybrid_DE’) was derived from two aCDEs: ‘DE:43239 Hemodialysis_Time_DE,’ which allows a time data type, and ‘DE:47614 Hemodialysis_Time_String_DE,’ which allows an enumerated string data type supporting Finish and Start (Figure 2B). A hybrid aCDE such as ‘DE:47616’ can therefore capture either a time or an enumerated string value.
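The behavior of a hybrid value domain can be illustrated with a minimal Python sketch. The DE identifiers and the Start/Finish values come from the text above; the function names and the assumption that times are entered as HH:MM are hypothetical and are not part of the published CDE definitions.

```python
from datetime import datetime, time
from typing import Union

# Enumerated value domain of DE:47614 Hemodialysis_Time_String_DE (values from the text).
ENUMERATED_VALUES = {"Start", "Finish"}


def parse_time(raw: str) -> time:
    """Value domain of DE:43239: a clock time, assumed here to be written as HH:MM."""
    return datetime.strptime(raw, "%H:%M").time()


def parse_hybrid_time(raw: str) -> Union[time, str]:
    """Value domain of the hybrid aCDE DE:47616: accept either member domain."""
    text = raw.strip()
    if text in ENUMERATED_VALUES:
        return text
    try:
        return parse_time(text)
    except ValueError:
        allowed = sorted(ENUMERATED_VALUES)
        raise ValueError(f"{raw!r} is neither a time nor one of {allowed}") from None
```

In this sketch the enumerated domain is checked first, so that a string such as Start is never handed to the time parser; any input that fits neither domain is rejected.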
3.3 TABULAR DATA ENTRIES: Repeated cCDE
A repeated cCDE is a cCDE that captures data input multiple times in a tabular format. The definition of the repeated cCDE prevents the unnecessary creation of redundant CDEs. A repeated cCDE efficiently captures and displays changes in input values over a certain time span, as shown in Figure 2A. We first grouped eight aCDEs (i.e., DE:47616, DE:43340, DE:43197, DE:43195, DE:43155, DE:43092, DE:43372, and DE:43166) to create a cCDE, and then assigned a repeated relationship to this cCDE to create a repeated cCDE (‘DE:47575 Hemodialysis_Repeated_Componsite_DE’) (Figure 3). As shown in Figure 2, DE:47616 is a hybrid aCDE contained in the repeated cCDE (DE:47575).
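One way to picture a repeated cCDE is as a single definition (the list of member aCDEs) that is instantiated once per table row. The following Python sketch uses the DE identifiers listed above; the class name, its methods, and the example values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# The eight member aCDEs of the cCDE, as listed in the text.
MEMBER_ACDES: Tuple[str, ...] = (
    "DE:47616", "DE:43340", "DE:43197", "DE:43195",
    "DE:43155", "DE:43092", "DE:43372", "DE:43166",
)


@dataclass
class RepeatedCCDE:
    """Hypothetical container: one cCDE definition, many rows of values."""
    identifier: str
    members: Tuple[str, ...]
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def add_row(self, values: Dict[str, Any]) -> None:
        unknown = set(values) - set(self.members)
        if unknown:
            raise KeyError(f"not members of {self.identifier}: {sorted(unknown)}")
        # Members without a value in this row are stored as None (null).
        self.rows.append({de: values.get(de) for de in self.members})


# Made-up example values: two rows recorded at different times.
report = RepeatedCCDE("DE:47575", MEMBER_ACDES)
report.add_row({"DE:47616": "Start", "DE:43340": 120})
report.add_row({"DE:47616": "10:30", "DE:43340": 118})
```

Because every row reuses the same eight aCDEs, no additional CDEs have to be created for the second or any later row, which is the redundancy that the repeated relationship is meant to avoid.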
3.4 DICTIONARY DATA ENTRIES: Dictionary cCDE
Our previous study [9] defined a variable CDE as a CDE containing a variable that refers to a controlled biomedical vocabulary. Similarly, we defined a dictionary cCDE as a cCDE containing a variable aCDE whose variable refers to the primary-key attribute of a dictionary table. This approach provides a way to encode an entire dictionary table, as well as a controlled vocabulary, into a single dictionary cCDE, and thereby to capture comprehensive biomedical knowledge from a database. A dictionary cCDE also provides a useful means of applying the relevant attributes of a dictionary database to constrain and validate the values input to the dictionary cCDE.
Figure 4A displays a typical data-entry document for laboratory test results in a tabular format. The ‘Electrolyte Laboratory Tests’ form from ‘Recommended Labs for Stroke’ of the NINDS CDE project [19] consists of six attributes, including the laboratory test name, the laboratory test result, the unit of the laboratory test result, an indicator of whether the laboratory test result is abnormal, and another indicator of whether an abnormal laboratory test result is clinically significant. Figure 4B shows part of the structured NINDS ‘Electrolyte Laboratory Tests Dictionary’ reference table. The Unit of Result attribute supports multiple units, which are delimited by ‘^’. The Normal Range attribute is also given separately for each Unit of Result and is represented in JSON (JavaScript Object Notation) encoding.
A dictionary cCDE, ‘DE:47571 Laboratory_Test_NINDS_Composite_DE,’ can systematically capture the entire ‘Electrolyte Laboratory Tests’ data-entry document. It is composed of six aCDEs (Figure 4C), including a variable aCDE for Test, ‘DE:43938 Laboratory_Finding_Test_Name_DE,’ which functions as the foreign key that refers to the primary key, ‘Lab Test Name,’ of the ‘Electrolyte Laboratory Tests Dictionary’ table (Figure 4B).
Because the dictionary cCDE (DE:47571) is related to the NINDS ‘Electrolyte Laboratory Tests’ dictionary table via the variable aCDE (DE:43938), it provides a means of evaluating the validity of the values input to Result and Units for Result for a given Test. For example, a value of 138 mEq/L for the Test ‘Sodium (Na+)’ can be checked against the Normal Range (i.e., 135~145 mEq/L) provided by the dictionary table. The value of Was test result abnormal? can also be filled in automatically using the biomedical knowledge provided by the dictionary table. Moreover, when the value of Was test result abnormal? (DE:47566) is ‘Abnormal,’ the value of If abnormal, Clinically Significant? (DE:44135) can automatically be constrained to contain a value other than null. This constraint can be encoded by a Dependent Rule, as shown in Figure 4C.
Figure 4C shows how a dictionary cCDE accompanied by its constraint rules is defined. For the two evaluation cases listed in Figure 4B, both a Dictionary Rule and a Dependent Rule are defined by symbolic logic (or pseudocode) with the accompanying Descriptions. A Dictionary Rule defines how to use the biomedical knowledge contained in a dictionary table, and a Dependent Rule defines the interrelatedness of the aCDEs in a cCDE by using dependent constraint relationships.
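The two kinds of rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the rule syntax of Figure 4C: the column names follow the text, but the JSON layout of Normal Range, the second unit (mmol/L), the ‘Normal’ label, and the function names are hypothetical. Only the 135~145 mEq/L range for sodium is taken from the text.

```python
import json
from typing import Optional

# One hypothetical row of the dictionary table, keyed by its primary key (Lab Test Name).
DICTIONARY = {
    "Sodium (Na+)": {
        "Unit of Result": "mEq/L^mmol/L",  # multiple units delimited by '^'
        "Normal Range": '{"mEq/L": [135, 145], "mmol/L": [135, 145]}',  # assumed JSON layout
    },
}


def dictionary_rule(test: str, result: float, unit: str) -> str:
    """Return 'Normal' or 'Abnormal' for a result, using the dictionary row for `test`."""
    row = DICTIONARY[test]  # foreign key (Test) -> primary key (Lab Test Name)
    units = row["Unit of Result"].split("^")
    if unit not in units:
        raise ValueError(f"{unit!r} is not a unit of {test!r}; expected one of {units}")
    low, high = json.loads(row["Normal Range"])[unit]
    return "Normal" if low <= result <= high else "Abnormal"


def dependent_rule(abnormal: str, clinically_significant: Optional[str]) -> bool:
    """True when the entry satisfies: abnormal result => clinical significance is not null."""
    return abnormal != "Abnormal" or clinically_significant is not None
```

With these definitions, the sodium value of 138 mEq/L from the text is classified as normal, and an abnormal result with no answer to the clinical-significance question fails the dependent check.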
3.5 DERIVED DATA: Constraints
We defined four constraints that support the creation of a robust clinical document by specifying the interrelationships among many aCDEs. We also defined four classes of operators: assignment, arithmetic, logical, and relational. The ordered constraint can only be applied to aCDEs contained in a cCDE. However, the other three constraints (operated, required, and dependent) can be applied both to independent aCDEs on a document and to those contained in a cCDE (Figure 1). We created a symbolic logic with prefix notation [20] (Table 1 in the Supplementary Files) to describe the order of operations and to formulate constraints. More practical examples are shown in Figure 5 to demonstrate how constraints are applied to a repeated cCDE as well. The four constraints are described as follows:
Operated. Table 1A presents the standard BMI formula [BMI (in kg/m²) = weight / (height × height)] in prefix notation as (/ CDE30 CDE31 CDE31 100 100), where CDE30 and CDE31 represent Body Weight Value in kg and Body Height Value in cm, respectively. Both the ‘cm’ and ‘m’ units of measurement can be supported by applying an IF conditional statement to manage the different units: (IF (= CDE31.unit_of_measure 'm') (/ CDE30 CDE31 CDE31) (/ CDE30 CDE31 CDE31 100 100)).
Required. A Required constraint applied to an aCDE means that the aCDE must have a value other than null. Table 1B lists the demographic information of a clinical document, constraining ‘*Patient Age (CDE40)’ and ‘*Gender (CDE41)’ as required by the statement (Required CDE40 CDE41).
Dependent. It might be necessary to dynamically enable or disable a certain aCDE according to the value(s) of other aCDE(s). For example, a gender-specific CDE might only be applied to subjects of the applicable gender. Table 1C presents an example that checks whether a patient is a current (CDE20) or past (CDE21) smoker in order to obtain the age at which tobacco use started (CDE22). A nonsmoker can conveniently skip CDE22 if (= CDE20 CDE21 'No') by setting the value of CDE22 to null. In other words, a rule such as (IF (or (!= CDE20 'Yes') (!= CDE21 'Yes')) CDE22 NULL) can be imposed. Another constraint can be imposed to check for illogical input values such as (= CDE20 CDE21 'Yes') if necessary.
Ordered. The ordering of aCDEs (especially in a cCDE) is important for certain conditions and contexts. The CDEs in Table 1C can be ordered by a constraint statement such as (Ordered CDE20 CDE21 CDE22).
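The intent of the four constraints can be sketched in Python as follows. This is an illustration under stated assumptions rather than an evaluator for the symbolic logic of Table 1: the function names and the record layout (a mapping from CDE identifier to value, with None standing for null) are hypothetical. The BMI function converts a height in centimetres to metres explicitly instead of evaluating the prefix expressions, and the dependent rule implements the behavior described in prose, namely that CDE22 is skipped when both CDE20 and CDE21 are 'No'.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # maps a CDE identifier to its entered value (None = null)


def bmi(weight_kg: float, height: float, unit_of_measure: str = "cm") -> float:
    """Operated: BMI in kg/m^2; a height given in cm is converted to m first."""
    height_m = height if unit_of_measure == "m" else height / 100
    return weight_kg / (height_m * height_m)


def missing_required(record: Record, required: List[str]) -> List[str]:
    """Required: return the required CDEs that are still null."""
    return [cde for cde in required if record.get(cde) is None]


def apply_smoking_dependency(record: Record) -> Record:
    """Dependent: CDE22 is skipped (set to null) when CDE20 and CDE21 are both 'No'."""
    updated = dict(record)
    if record.get("CDE20") == "No" and record.get("CDE21") == "No":
        updated["CDE22"] = None
    return updated


def in_order(record: Record, order: List[str]) -> List[str]:
    """Ordered: list the CDEs present in the record in the declared order."""
    return [cde for cde in order if cde in record]
```

For example, missing_required({"CDE40": 35, "CDE41": None}, ["CDE40", "CDE41"]) reports CDE41 as the field that still needs a value, mirroring (Required CDE40 CDE41).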
3.6 EVALUATION STUDY
To evaluate the usefulness of our newly extended composite semantic relationships,
we applied them to CDEs that were systematically extracted from five major clinical
documents used at
five teaching hospitals in Korea. The evaluation process consisted of the following
steps: CDE extraction, CDE integration by using the new aCDEs and cCDEs, and semantic
enrichment. We compared the number of CDEs extracted and integrated in each evaluation
step as a measure of the structural and semantic efficiency of DEs on clinical documents.
We first extracted 84, 48, 70, 83, and 37 CDEs from the following 5 clinical documents
used at Hospital A: admission note, initial medical examination note, discharge note,
emergency note, and operation note, respectively. We found that 95 (29.5%) of the
322 CDEs were reused in at least 2 of the 5 clinical documents, resulting in 227 unique
aCDEs. We then created clinically relevant cCDEs and applied semantic relationships
to them. Of the 84 aCDEs extracted from admission notes at Hospital A, 55 were successfully
captured by 10 created cCDEs. Finally, 16 cCDEs successfully captured 110 (48.5%)
of the 227 unique CDEs, such that 133 (=16 + 117) CDEs (41.3%) were sufficient to
represent the initial 322 CDEs extracted from the 5 clinical documents
used at Hospital A (Table 2 in the Supplementary Files).
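The counts reported for Hospital A can be re-derived with a few lines of Python. All numbers below are taken from the paragraph above; only the variable names and the helper function are introduced here for illustration.

```python
# CDEs extracted from each of the five clinical documents at Hospital A.
extracted_per_document = {
    "admission note": 84,
    "initial medical examination note": 48,
    "discharge note": 70,
    "emergency note": 83,
    "operation note": 37,
}

total_extracted = sum(extracted_per_document.values())  # 322
reused = 95                                              # reused in at least 2 documents
unique_acdes = total_extracted - reused                  # 227
ccdes = 16                                               # cCDEs created
captured_by_ccdes = 110                                  # unique aCDEs captured by cCDEs
remaining_acdes = unique_acdes - captured_by_ccdes       # 117
final_cdes = ccdes + remaining_acdes                     # 133


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(percent(reused, total_extracted))          # 29.5
print(percent(captured_by_ccdes, unique_acdes))  # 48.5
print(percent(final_cdes, total_extracted))      # 41.3
```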
In the CDE extraction step, we found that applying CDEs was an effective way to reduce
redundant CDEs (22.2~37.9%) at each hospital. This means that there were many CDEs
shared across the five different documents used at each hospital. We found that an
even higher CDE reduction rate of 48.7% could be achieved by integrating the information
for all five hospitals, which indicates that various CDEs were commonly used across
the different hospitals. The CDE integration step involved integrating aCDEs into
clinically relevant cCDEs to further structure the clinical documents, and then integrating
the cCDEs across different clinical documents. For example, when a vital-sign-related
cCDE contained three aCDEs (‘body weight,’ ‘body temperature,’ and ‘blood pressure’)
and another vital-sign-related cCDE contained an additional aCDE (‘description the
reason of unstable vital sign’), we integrated them into a single vital-sign cCDE comprising
four aCDEs. The application of these three steps consistently decreased the number of
CDEs. Supplementary Tables S1–S3 list the cCDEs and how they were distributed in each document at each hospital. These
tables also provide a detailed view of how the 20 unique cCDEs comprised 327 sub-aCDEs.
The integrated CDEs not only reduced the number of CDEs, with a reuse ratio of up
to 46.9% [=(1142 - 20 - 586)/1142] (Table 2), but also greatly improved the semantic accuracy and interoperability.
a Number of CDEs extracted from each clinical document from each hospital
b Number of cCDEs created for each clinical document
c Number of aCDEs contained in the cCDEs (b)
d Number of remaining aCDEs that are not contained in any of the cCDEs in each clinical document
e Total number of CDEs, consisting of the cCDEs (b) and the aCDEs (d) that are not contained in any of the cCDEs in each clinical document
f Number of unique CDEs across the five clinical documents
g Reuse ratio of CDEs across the five documents
We found that the compositions of the clinical documents differed markedly across
the included hospitals. The clinical documents at Hospitals P and S contained the largest
(n=266) and smallest (n=31) numbers of independent DEs, respectively. We also found that
even the same type of clinical document showed large variations in DE numbers; for example,
the number of DEs in admission notes varied from 12 at Hospital S to 204 at Hospital P.
Hospital P also had the largest number of aCDEs for initial medical examination notes
(n=123), while Hospital A had the largest number of aCDEs for emergency notes
(n=83) and operation notes (n=37).
We also applied constraint rules to the five clinical documents used at the five
hospitals (Table 3 in the Supplementary Files). We could not determine whether a DE was a
hybrid aCDE, partly due to the lack of actual input values and partly due to poor descriptions
of the response values for the clinical documents. We designated the cCDEs as
general cCDEs to distinguish them from repeated and dictionary cCDEs. On average, a cCDE was reused twice among the five documents at the hospitals.
We also found that the clinical documents at Hospital A were the best structured and
contained the greatest detail, with more cCDEs and constraint rules compared to the
documents at the other hospitals.