Architecture

The architecture of AiMe evolved as we addressed the challenges that arose during development. We now believe that both the prototype and the production version should follow this architecture in order to create a robust, safe and secure system.

We will describe each of the components and their role below.

Figure: AiMe architecture, showing the major components and the data flows.

Data Flows

  1. An audio speech file is passed to the Input Dialog Processor.
  2. The converted text is sent to the AI Model.
  3. The AI Model may call specific functions exposed by the Firewall.
  4. The Firewall makes function calls that are specific to the clinical system in use.
  5. Data specific to the clinical system is passed back to the Firewall.
  6. The Firewall passes a textual response to the AI Model.
  7. The AI Model sends a text response to the Output Dialog Processor.
  8. The Output Dialog Processor creates an audio stream.
  9. A channel through which the Output Dialog Processor can replace pseudo data with real data. The Firewall maintains the mapping between pseudo and real data.

The overall operation of AiMe consists of a two-way conversation with the patient. The conversation starts with AiMe speaking an introduction. From then on, the patient speaks, AiMe processes that speech, and AiMe replies to the patient. This back and forth continues until the end of the call.
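The turn-taking described above can be sketched as a simple loop. This is only an illustration, not AiMe's actual code: the names `run_call`, `respond` and `speak`, and the wording of the introduction, are assumptions made for the example.

```python
from typing import Callable, Iterable, List


def run_call(
    patient_turns: Iterable[str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    introduction: str = "Hello, you are through to the practice.",  # placeholder wording
) -> None:
    """Drive one call: introduction first, then alternate patient turn and reply."""
    speak(introduction)                  # AiMe opens the conversation
    for patient_text in patient_turns:   # each item is one patient utterance, already text
        speak(respond(patient_text))     # process the utterance, then reply


# Usage with stand-ins for the real components.
spoken: List[str] = []
run_call(
    patient_turns=["Are you open today?", "Thanks, goodbye."],
    respond=lambda text: f"(reply to: {text})",
    speak=spoken.append,
)
```

Here `spoken` ends up holding the introduction followed by one reply per patient turn, which mirrors the back and forth of a call.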

Input Dialog Processor. This component controls all input to the AI Model. The AI Model works only in text mode: all input to the model is text, and all responses from the model are text.

When the patient speaks, the Input Dialog Processor converts the speech to text using an external Speech To Text service. It then sends the resulting text, together with a set of ‘prompts’, to the AI Model for a response. The prompts are instructions to the AI Model that describe the context of the patient text and the rules for replying.
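As a rough sketch of this step, the processor can be thought of as transcribing the audio and bundling the text with the prompts. Everything named here is an assumption for illustration: the class layout, the `build_request` method, the request keys and the example prompts are not taken from AiMe, and the speech-to-text call is a stand-in for whichever external service is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class InputDialogProcessor:
    speech_to_text: Callable[[bytes], str]  # stand-in for the external service
    prompts: List[str]                       # context and rules for replying

    def build_request(self, audio: bytes) -> Dict[str, str]:
        """Transcribe the audio and bundle the text with the prompts."""
        patient_text = self.speech_to_text(audio).strip()
        return {
            "instructions": "\n".join(self.prompts),
            "patient_text": patient_text,
        }


def fake_speech_to_text(audio: bytes) -> str:
    """Pretend transcription: the 'audio' is just UTF-8 text in this sketch."""
    return audio.decode("utf-8")


processor = InputDialogProcessor(
    speech_to_text=fake_speech_to_text,
    prompts=[
        "You are a receptionist for a GP practice.",   # hypothetical prompt
        "Only answer questions a receptionist could.",  # hypothetical prompt
    ],
)
request = processor.build_request(b" Are you open today? ")
```

Keeping the prompts separate from the patient text makes it clear to the model which part is instruction and which part is what the patient said.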

AI Conversation Model. The AI engine we have used in the prototype is Anthropic’s Claude 3.5 Haiku. This model specialises in conversations and is fast, so it can reply quickly enough to maintain a near-normal conversation. It is also not trained on user data, so it does not update itself based on the conversations it has, which is a crucial safety feature. See more in Safety and Privacy of Patient Information.

When the AI Model processes a patient's dialog, it can either reply immediately or first perform a task in the clinical system. For example, if a patient asks whether the practice is currently open, the AI Model can reply immediately because it knows this information from the prompts. On the other hand, if the patient asks for an appointment, the AI Model needs to call the clinical system to ask for the available appointment times. Only when the clinical system responds will the AI Model reply to the patient.
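The two paths (reply immediately, or call the clinical system first) can be illustrated with a small loop. This is a sketch under stated assumptions: `ModelStep`, `handle_turn`, the function name `list_available_appointments` and the fake model and firewall are all hypothetical, and a real model API would return its function requests in its own format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ModelStep:
    """One model output: either a final reply or a request to call a function."""
    reply: Optional[str] = None
    function: Optional[str] = None
    arguments: Dict[str, str] = field(default_factory=dict)


def handle_turn(
    model: Callable[[str, Optional[str]], ModelStep],
    call_firewall: Callable[[str, Dict[str, str]], str],
    patient_text: str,
    max_steps: int = 5,
) -> str:
    """Ask the model; if it requests a function, run it and ask again."""
    step = model(patient_text, None)
    for _ in range(max_steps):
        if step.reply is not None:
            return step.reply                      # model answered directly
        result = call_firewall(step.function, step.arguments)
        step = model(patient_text, result)         # reply only after the result arrives
    raise RuntimeError("model did not produce a reply")


def fake_model(patient_text: str, function_result: Optional[str]) -> ModelStep:
    if function_result is not None:
        return ModelStep(reply=f"We have these times available: {function_result}")
    if "appointment" in patient_text.lower():
        return ModelStep(function="list_available_appointments")
    return ModelStep(reply="Yes, the practice is open at the moment.")


def fake_firewall(name: str, arguments: Dict[str, str]) -> str:
    return "Tuesday 09:30, Wednesday 14:00"


open_reply = handle_turn(fake_model, fake_firewall, "Are you open?")
booking_reply = handle_turn(fake_model, fake_firewall, "Can I have an appointment?")
```

The `max_steps` limit is a design choice for the sketch: it stops a model that keeps requesting functions from holding the patient on the line indefinitely.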

Clinical System Firewall. This module is the interface between the AI Model and the clinical system. It is designed to restrict access to the clinical system in order to protect patient data, while still allowing the system to perform normal receptionist tasks such as making and cancelling appointments and leaving messages for clinicians. There is more on the functions this module performs in the section on Privacy of Patient Information.
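One way to picture this restriction is as an allowlist: the AI Model can only invoke functions that have been explicitly exposed, and anything else is refused. The sketch below assumes this design for illustration; the class, the `expose` and `call` methods and the function names are hypothetical, not AiMe's actual interface.

```python
from typing import Callable, Dict


class FunctionNotAllowed(Exception):
    """Raised when the model asks for a function that is not exposed."""


class ClinicalSystemFirewall:
    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., str]] = {}

    def expose(self, name: str, handler: Callable[..., str]) -> None:
        """Register a receptionist-level task the model is allowed to call."""
        self._functions[name] = handler

    def call(self, name: str, **arguments: str) -> str:
        handler = self._functions.get(name)
        if handler is None:
            raise FunctionNotAllowed(name)   # no route to the clinical system
        return handler(**arguments)


firewall = ClinicalSystemFirewall()
firewall.expose(
    "cancel_appointment",
    lambda appointment_id: f"Appointment {appointment_id} cancelled.",
)
firewall.expose(
    "leave_message",
    lambda clinician, text: f"Message left for {clinician}.",
)

ok = firewall.call("cancel_appointment", appointment_id="A1")
try:
    firewall.call("read_medical_record", patient_id="P1")
    blocked = False
except FunctionNotAllowed:
    blocked = True
```

Because the handlers are the only code that talks to the clinical system, supporting a different clinical system would mean swapping the handlers while the names seen by the model stay the same.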

The Firewall also processes responses from the Clinical System and converts them to text before handing them to the AI Model. It performs a crucial role in filtering out some sensitive data and replacing it with pseudo data. More details on this process can be found in the section on Privacy of Patient Information.
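The replacement of sensitive data with pseudo data can be sketched as a two-way lookup table kept by the Firewall. The token format (`[NAME_1]`), the choice of which fields count as sensitive, and all names below are assumptions for illustration only.

```python
from typing import Dict, Set


class PseudonymMap:
    """Keeps the mapping between real values and pseudo tokens."""

    def __init__(self) -> None:
        self._real_to_pseudo: Dict[str, str] = {}
        self._pseudo_to_real: Dict[str, str] = {}

    def pseudonymise(self, kind: str, real_value: str) -> str:
        if real_value in self._real_to_pseudo:        # reuse an existing token
            return self._real_to_pseudo[real_value]
        token = f"[{kind}_{len(self._real_to_pseudo) + 1}]"
        self._real_to_pseudo[real_value] = token
        self._pseudo_to_real[token] = real_value
        return token

    def real_value(self, token: str) -> str:
        return self._pseudo_to_real[token]


def render_for_model(
    record: Dict[str, str], sensitive_fields: Set[str], mapping: PseudonymMap
) -> str:
    """Convert a clinical-system record to text, hiding sensitive fields."""
    parts = []
    for key, value in record.items():
        if key in sensitive_fields:
            value = mapping.pseudonymise(key.upper(), value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


mapping = PseudonymMap()
record = {"name": "Jane Example", "slot": "Tuesday 09:30"}  # made-up data
text = render_for_model(record, {"name"}, mapping)
```

In this sketch the AI Model only ever sees the token, while the mapping that can reverse it stays on the Firewall side.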

Clinical System. This is the practice's internal system, such as EMIS, SystmOne, Vision and others. As far as AiMe is concerned, the clinical system maintains the list of patients and their medical data, as well as the appointment calendars.

Output Dialog Processor. This module takes the text replies from the AI Model and converts them to speech using an external Text To Speech service before playing the speech to the patient. Before the conversion, it processes the text to replace pseudo data with real data. This is part of the process to protect patient confidentiality; more can be found in Protecting Patient Confidentiality.
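The substitution of real data back into the model's reply might look like the following. This is a minimal sketch: the bracketed token format, the function name `restore_real_data` and the plain dictionary standing in for the mapping held by the Firewall are all assumptions, not AiMe's actual implementation.

```python
import re
from typing import Mapping

# Assumed token shape, e.g. "[NAME_1]"; the real format is not specified.
TOKEN_PATTERN = re.compile(r"\[[A-Z]+_\d+\]")


def restore_real_data(model_text: str, pseudo_to_real: Mapping[str, str]) -> str:
    """Replace each known pseudo token with its real value before speech synthesis."""

    def swap(match: "re.Match[str]") -> str:
        token = match.group(0)
        # Unknown tokens are left unchanged in this sketch.
        return pseudo_to_real.get(token, token)

    return TOKEN_PATTERN.sub(swap, model_text)


speech_text = restore_real_data(
    "Thank you [NAME_1], you are booked for Tuesday at 09:30.",
    {"[NAME_1]": "Jane Example"},  # made-up mapping entry
)
```

A production version would need to decide what to do with a token that has no mapping, since reading it aloud would confuse the patient; leaving it unchanged here is only a simplification.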