Broadly speaking, there are two types of text processing methodologies. One is the rule-based computational linguistics model. Rule-based systems need to be explicitly coded, but they can be very effective and efficient on certain tasks, such as tokenization, sentence breaking, or stemming. A treebank provides the framework for our part-of-speech tagging, parsing, and semantic analysis.
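To make the idea of explicitly coded rules concrete, here is a minimal sketch of rule-based sentence breaking and tokenization. The two regular expressions and the function names are hypothetical, chosen for illustration; they are not the rules used in ARE4.

```python
import re

# Hypothetical rules for illustration only; not the rules used by ARE4.
# A sentence ends at ., ! or ? when followed by whitespace and a capital letter.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# A token is a word (internal hyphens and slashes kept) or a single punctuation mark.
TOKEN = re.compile(r"\w+(?:[-/]\w+)*|[^\w\s]")


def split_sentences(text):
    """Break text into sentences using the boundary rule above."""
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def tokenize(sentence):
    """Return the word and punctuation tokens of one sentence."""
    return TOKEN.findall(sentence)


text = "HER2 was amplified. The patient received anti-HER2 therapy."
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
```

Rules like these are fast and predictable, but each exception (an abbreviation ending in a period, for instance) has to be handled by adding another rule.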
Another approach is to use statistical models, which make “soft” decisions by assigning a weight to each input. Statistical models of this kind, including deep learning models, can express the relative certainty of different answers, which makes their output more reliable. A drawback of deep learning is that it requires a vast amount of training data.
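As a rough illustration of a “soft” decision, the sketch below weights each input feature, sums the weighted values per label, and converts the scores into probabilities with a softmax. The feature names, labels, and weights are all invented for this example; a real statistical model would learn its weights from data.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical hand-set weights; a trained model would learn these.
WEIGHTS = {
    "gene": {"is_uppercase": 1.5, "has_digit": 1.0, "near_word_mutation": 2.0},
    "other": {"is_uppercase": 0.2, "has_digit": 0.1, "near_word_mutation": 0.0},
}


def classify(features):
    """Score each label as a weighted sum of features, then normalize."""
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in WEIGHTS.items()
    }
    return softmax(scores)


probs = classify({"is_uppercase": 1.0, "has_digit": 1.0, "near_word_mutation": 0.0})
```

Instead of a hard yes or no, the result is a probability for every label, so downstream code can see how confident the model is in each answer.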
Natural language processing experts debate which methodology is better. In truth, each approach has its strengths and weaknesses.
To maximize the power of our ARE4 system, we’ve taken a hybrid approach that combines a rule-based computational linguistics engine with a deep learning model.
The hybrid approach allows the system to learn from less data than deep learning traditionally requires. Deep learning methods typically need many more instances of a particular token than appear in the often limited data available for a biomarker. By preprocessing the text with computational linguistics, our hybrid system can gain the benefits of deep learning even when that data is scarce.
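One way rule-based preprocessing can reduce the amount of data a statistical model needs is by collapsing spelling variants into a single token, so that the model sees more instances of each concept. The sketch below is an assumption about how such a step might look, not a description of ARE4 internals; the regular expression and the tiny corpus are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical rule: collapse spelling variants of one biomarker into one token.
HER2_VARIANTS = re.compile(r"\bher[\s-]?2(?:/neu)?\b", re.IGNORECASE)


def normalize(text):
    """Rewrite every matched variant as the single token HER2."""
    return HER2_VARIANTS.sub("HER2", text)


# Invented example sentences, each spelling the biomarker differently.
corpus = [
    "HER2 positive tumor",
    "Her-2 amplification detected",
    "her2/neu overexpression",
    "HER 2 status unknown",
]

# Token counts before and after the rule-based normalization step.
raw_counts = Counter(tok for doc in corpus for tok in doc.split())
norm_counts = Counter(tok for doc in corpus for tok in normalize(doc).split())
```

Before normalization the token HER2 appears once in this corpus; afterward all four mentions share the same token, which gives a downstream model four examples to learn from instead of one.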
We begin by analyzing the text for individual concepts. In this processing layer we use pattern matching, artificial intelligence, linguistics, statistical analysis, and a variety of ontologies, both public and proprietary. We then feed the output of this layer into our deep learning algorithms.
In addition, the proprietary autodidactic (self-teaching) artificial intelligence system in ARE4 continues to learn, so it becomes “smarter” over time.
Subsystems handle other components of artificial intelligence. For example, the “topic relationship” engine converts various abbreviations and versions of the same concept into one standardized version.
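The kind of standardization described above can be sketched as a lookup from known variants to one canonical form. The synonym table and the `standardize` function here are hypothetical and deliberately tiny; a production engine would presumably draw on curated ontologies and handle far more variation.

```python
# Hypothetical synonym table, keyed by lowercase variant with hyphens removed.
CANONICAL = {
    "egfr": "EGFR",
    "epidermal growth factor receptor": "EGFR",
    "erbb1": "EGFR",
    "nsclc": "non-small cell lung cancer",
    "non small cell lung cancer": "non-small cell lung cancer",
}


def standardize(mention):
    """Map a mention to its canonical form, or return it unchanged if unknown."""
    # Lowercase, treat hyphens as spaces, and collapse repeated whitespace.
    key = " ".join(mention.lower().replace("-", " ").split())
    return CANONICAL.get(key, mention)


mentions = ["Epidermal Growth Factor Receptor", "ERBB1", "NSCLC", "Non-Small Cell Lung Cancer"]
standardized = [standardize(m) for m in mentions]
```

Unknown mentions pass through unchanged, so the lookup never discards text it does not recognize.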
Our modular system design makes it easy to customize and target our software for a particular need or focus.