Methodology for T&E Framework Development
Methodology for Literature Review
1. Objective and Scope
The objective of this literature review is to systematically identify, evaluate, and synthesize peer-reviewed and grey literature relevant to the use of artificial intelligence systems in clinical trial design, matching, recruitment, monitoring, and analysis. The review specifically focuses on evidence that informs responsible AI evaluation dimensions, including safety and reliability, fairness and bias management, usefulness and efficacy, and usability within clinical trial workflows.
The scope includes AI systems applied across the clinical trial lifecycle, with particular emphasis on:
Patient–trial matching and eligibility screening
Trial recruitment and enrollment optimization
Protocol feasibility and trial design support
Monitoring, data quality assurance, and outcome analysis
Preclinical-only AI systems and non-healthcare trial applications were excluded.
2. Review Design
A structured narrative literature review approach was employed. This approach was selected to balance methodological rigor with flexibility, allowing incorporation of heterogeneous evidence types commonly found in clinical trial AI research, including empirical studies, validation studies, implementation reports, and regulatory or policy-oriented analyses. The review was conducted iteratively, with continuous refinement of inclusion criteria as domain understanding deepened.
3. Information Sources
Literature was identified through searches across the following sources:
Biomedical Databases
PubMed / MEDLINE
Embase
Scopus
Interdisciplinary and Technical Databases
Google Scholar
IEEE Xplore
Grey Literature and Policy Sources
Regulatory guidance (e.g., FDA discussion papers, NIH reports)
Health AI evaluation frameworks and white papers
Conference proceedings and preprints where peer-reviewed evidence was limited
Reference lists of highly relevant papers were also manually reviewed to identify additional sources.
4. Search Strategy
Search queries were constructed using combinations of controlled vocabulary terms and free-text keywords. Core concept clusters included:
Clinical Trials Concepts
“clinical trials,” “trial recruitment,” “eligibility screening,” “patient matching,” “trial monitoring,” “protocol design”
Artificial Intelligence Concepts
“artificial intelligence,” “machine learning,” “large language models,” “clinical decision support,” “algorithmic matching”
Evaluation and Ethics Concepts
“bias,” “fairness,” “safety,” “reliability,” “validation,” “explainability,” “human oversight”
Search strings were adapted to each database’s syntax. Searches were limited to English-language publications.
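The exact search strings are not reproduced in this methodology. As a minimal sketch, assuming the common convention of joining terms within a concept cluster with OR and joining the clusters with AND, a generic query could be assembled as follows. The function name `build_query` and the dictionary keys are illustrative, and the output would still need to be adapted to each database's syntax and controlled vocabulary.

```python
# Concept clusters copied from the lists above. The OR-within / AND-across
# combination logic is an assumption, not the documented search string.
CONCEPT_CLUSTERS = {
    "clinical_trials": [
        "clinical trials", "trial recruitment", "eligibility screening",
        "patient matching", "trial monitoring", "protocol design",
    ],
    "artificial_intelligence": [
        "artificial intelligence", "machine learning", "large language models",
        "clinical decision support", "algorithmic matching",
    ],
    "evaluation_ethics": [
        "bias", "fairness", "safety", "reliability", "validation",
        "explainability", "human oversight",
    ],
}


def build_query(clusters):
    """OR the quoted terms within each cluster, then AND the clusters together."""
    groups = []
    for terms in clusters.values():
        quoted_terms = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted_terms})")
    return " AND ".join(groups)


# Generic Boolean string; database-specific field tags are not included.
query = build_query(CONCEPT_CLUSTERS)
print(query)
```

Keeping the clusters in a single data structure makes it straightforward to log which terms were used in each search iteration as the inclusion criteria are refined.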
5. Inclusion and Exclusion Criteria
Inclusion Criteria
Direct relevance to AI use in clinical trial workflows
Empirical evaluation, validation, or real-world implementation evidence
Discussion of risks, limitations, or performance considerations relevant to responsible AI
Publication in peer-reviewed journals or authoritative grey literature
Preprints and postprints hosted on free, open-access repositories (via arXiv.org or similar), which share research papers immediately, often before formal peer-reviewed journal publication
Exclusion Criteria
Preclinical or purely molecular trial simulations
Opinion pieces without supporting evidence
Studies unrelated to healthcare clinical trials
Non-AI digital tools lacking algorithmic decision-making
6. Screening and Selection Process
Titles and abstracts were screened for relevance. Full-text review was conducted for sources that met initial inclusion criteria. Ambiguous cases were retained if they provided insight into evaluation challenges, deployment risks, or human–AI interaction concerns relevant to clinical trials.
Priority was given to studies that:
Evaluated AI performance against human or standard-of-care baselines
Reported bias, error patterns, or subgroup performance differences
Addressed real-world deployment constraints or regulatory considerations
7. Data Extraction and Synthesis
For each included source, the following information was extracted:
AI system type and intended function within the trial lifecycle
Study design and evaluation methodology
Reported performance metrics and limitations
Identified risks related to safety, bias, or reliability
Implications for clinical trial integrity, equity, and oversight
Findings were synthesized thematically and mapped to responsible AI evaluation dimensions relevant to the clinical trials use case.
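The extraction fields and the thematic mapping can be pictured as a simple record structure. The following is a minimal sketch only: the review does not describe any tooling, so the class `ExtractionRecord`, the helper `group_by_dimension`, the field names, and the two example records are all hypothetical placeholders, not data from the review. The dimension labels are taken from the Objective and Scope section.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

# Responsible AI evaluation dimensions named in the Objective and Scope section.
DIMENSIONS = (
    "safety and reliability",
    "fairness and bias management",
    "usefulness and efficacy",
    "usability",
)


@dataclass
class ExtractionRecord:
    """One included source, with the fields listed above (names are hypothetical)."""
    source_id: str
    ai_system_type: str
    intended_function: str
    study_design: str
    performance_metrics: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    identified_risks: List[str] = field(default_factory=list)
    implications: List[str] = field(default_factory=list)
    dimensions: List[str] = field(default_factory=list)  # assigned by the reviewer


def group_by_dimension(records: List[ExtractionRecord]) -> Dict[str, List[str]]:
    """Map each evaluation dimension to the source IDs assigned to it."""
    grouped = defaultdict(list)
    for record in records:
        for dimension in record.dimensions:
            if dimension not in DIMENSIONS:
                raise ValueError(f"Unknown dimension: {dimension!r}")
            grouped[dimension].append(record.source_id)
    return dict(grouped)


# Placeholder records for illustration only; these are not review findings.
records = [
    ExtractionRecord(
        source_id="example-001",
        ai_system_type="large language model",
        intended_function="eligibility screening",
        study_design="retrospective validation",
        identified_risks=["subgroup performance differences"],
        dimensions=["fairness and bias management", "safety and reliability"],
    ),
    ExtractionRecord(
        source_id="example-002",
        ai_system_type="machine learning classifier",
        intended_function="trial recruitment",
        study_design="implementation report",
        dimensions=["usability"],
    ),
]
themes = group_by_dimension(records)
print(themes)
```

A source may be assigned to more than one dimension, which is why the mapping is built per dimension rather than per source.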
8. Quality and Relevance Assessment
Formal risk-of-bias tools were not the sole basis for appraisal. Studies were also assessed based on:
Transparency of methods and data sources
Appropriateness of evaluation metrics
Clinical relevance of the deployment context
Acknowledgment of limitations and failure modes
9. Limitations
The review may underrepresent proprietary industry evaluations and unpublished internal trial data. Additionally, rapid advances in foundation models and trial automation tools mean that newer systems may not yet be fully reflected in peer-reviewed literature.
10. Output and Use
The findings of this literature review are intended to directly inform:
Metric selection for responsible AI evaluation
Risk identification across the clinical trial AI lifecycle
Comparative analysis across use cases
Methodology for CHAI Member Submissions
In addition to the literature review conducted to gather published methods and metrics, the CHAI Program Management team asked members of the Clinical Trials Work Group to contribute additional methods and metrics. The process is described below. Work Group members:
Review the use case charter and a standardized PowerPoint (PPT) template (developed by CHAI Program Management) to understand the scope of the clinical trials use case: AI-supported protocol data extraction and criteria matching.
Identify methods and metrics that can be used by Developers and/or Implementers to objectively evaluate AI solutions within this use case.
Populate the PPT template with:
Methods and metrics currently used within the member organization, and/or
Relevant methods and metrics identified through published literature, industry guidance, or other credible sources.
Follow the instructions provided within the PPT template for documenting each method and metric, including any supporting details, definitions, benchmarks, or references.
Submit the completed PPT template for consolidation, generalization, and anonymization into the Work Group’s T&E Framework.