To train the MST ARE4™ system, we initially collected approximately 300,000 de-identified medical records spanning a twelve-year range (2003-2015). We used a dataset of about 29,000 records that had previously been annotated manually. The data came from several research hospital systems to maximize variability.
Our approach is to have the system learn extraction patterns from reports paired with their associated values, so accuracy depends on the quality of the annotations available to us. The good news is that high-quality annotations are available from previous studies.
Using state-of-the-art AI techniques, our hybrid rules-and-autodidactic method achieves accuracy of over 93% (per report) and 98% (per category).
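The difference between per-report and per-category accuracy can be illustrated with a minimal sketch. It assumes, and this is an assumption rather than a documented ARE4™ definition, that a report counts as correct only when every category extracted from it matches the annotation, while per-category accuracy counts each extracted value on its own. The function names and the category names in the sample data are hypothetical.

```python
from typing import Dict, List

# One report is represented as a mapping from category name to extracted value.
Report = Dict[str, str]


def per_category_accuracy(predicted: List[Report], gold: List[Report]) -> float:
    """Fraction of individual category values that match the annotation."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of reports")
    total = 0
    correct = 0
    for pred, ref in zip(predicted, gold):
        for category, value in ref.items():
            total += 1
            if pred.get(category) == value:
                correct += 1
    return correct / total if total else 0.0


def per_report_accuracy(predicted: List[Report], gold: List[Report]) -> float:
    """Fraction of reports in which every category matches (assumed definition)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must contain the same number of reports")
    if not gold:
        return 0.0
    exact = sum(
        1
        for pred, ref in zip(predicted, gold)
        if all(pred.get(category) == value for category, value in ref.items())
    )
    return exact / len(gold)


# Hypothetical sample data: two reports, three categories each.
gold = [
    {"histologic_type": "ductal", "grade": "2", "margins": "negative"},
    {"histologic_type": "lobular", "grade": "1", "margins": "positive"},
]
predicted = [
    {"histologic_type": "ductal", "grade": "2", "margins": "negative"},
    {"histologic_type": "lobular", "grade": "2", "margins": "positive"},  # one wrong value
]

category_score = per_category_accuracy(predicted, gold)
report_score = per_report_accuracy(predicted, gold)
```

In this sample, five of the six values match but only one of the two reports is fully correct. Under the assumed definitions, a single wrong value fails the whole report, so the per-report figure cannot exceed the per-category figure.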
The ARE4™ system addresses issues such as error propagation, spelling and grammar mistakes, and the wide variety of ways in which negation is expressed in English.
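To give a sense of why negation is hard, here is a minimal rule-based sketch that checks whether a concept is preceded by a negation cue within a small token window. It is an illustration only and is not the ARE4™ method; the cue list, the window size, and the function name `is_negated` are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately short list of cues that appear BEFORE the concept.
NEGATION_CUES = ["no evidence of", "negative for", "free of", "without", "not", "no"]


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Return True if a negation cue occurs within `window` tokens before the concept."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z0-9]+", concept.lower())
    n = len(concept_tokens)
    if n == 0:
        return False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != concept_tokens:
            continue
        # Text of the few tokens that immediately precede this mention.
        preceding = " ".join(tokens[max(0, i - window):i])
        for cue in NEGATION_CUES:
            # Word boundaries keep "no" from matching inside "notable".
            if re.search(r"\b" + re.escape(cue) + r"\b", preceding):
                return True
    return False
```

A fixed list like this misses cues that follow the concept (for example "carcinoma is not identified") as well as misspelled cues, which is one reason a purely rule-based approach struggles with the variation described above.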
To develop the categories of information, we use the College of American Pathologists’ reporting system as our standard.