Methodology for T&E Framework Development

Methodology for Literature Review

1. Objective and Scope

The objective of this literature review is to systematically identify, evaluate, and synthesize peer-reviewed and grey literature relevant to the use of artificial intelligence systems in clinical trial design, matching, recruitment, monitoring, and analysis. The review specifically focuses on evidence that informs responsible AI evaluation dimensions, including safety and reliability, fairness and bias management, usefulness and efficacy, and usability within clinical trial workflows. The scope includes AI systems applied across the clinical trial lifecycle, with particular emphasis on:

  • Patient–trial matching and eligibility screening

  • Trial recruitment and enrollment optimization

  • Protocol feasibility and trial design support

  • Monitoring, data quality assurance, and outcome analysis

Preclinical-only AI systems and non-healthcare trial applications were excluded.

2. Review Design

A structured narrative literature review approach was employed. This approach was selected to balance methodological rigor with flexibility, allowing incorporation of heterogeneous evidence types commonly found in clinical trial AI research, including empirical studies, validation studies, implementation reports, and regulatory or policy-oriented analyses. The review was conducted iteratively, with continuous refinement of inclusion criteria as domain understanding deepened.

3. Information Sources

Literature was identified through searches across the following sources:

  • Biomedical Databases

    • PubMed / MEDLINE

    • Embase

    • Scopus

  • Interdisciplinary and Technical Databases

    • Google Scholar

    • IEEE Xplore

  • Grey Literature and Policy Sources

    • Regulatory guidance (e.g., FDA discussion papers, NIH reports)

    • Health AI evaluation frameworks and white papers

    • Conference proceedings and preprints where peer-reviewed evidence was limited

Reference lists of highly relevant papers were also manually reviewed to identify additional sources.

4. Search Strategy

Search queries were constructed using combinations of controlled vocabulary terms and free-text keywords. Core concept clusters included:

  • Clinical Trials Concepts

    • “clinical trials,” “trial recruitment,” “eligibility screening,” “patient matching,” “trial monitoring,” “protocol design”

  • Artificial Intelligence Concepts

    • “artificial intelligence,” “machine learning,” “large language models,” “clinical decision support,” “algorithmic matching”

  • Evaluation and Ethics Concepts

    • “bias,” “fairness,” “safety,” “reliability,” “validation,” “explainability,” “human oversight”

Search strings were adapted to each database’s syntax. Searches were limited to English-language publications.
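As an illustration of how the concept clusters above might be combined into a single query, the following minimal sketch joins the terms within each cluster with OR and joins the clusters with AND. This Boolean structure, the `CONCEPT_CLUSTERS` dictionary, and the `build_query` helper are assumptions made for illustration; they are not the exact search strings used in the review. Database-specific syntax, such as field tags and controlled vocabulary headings, would still need to be added for each source.

```python
# Illustrative sketch only: the exact search strings used in the review are
# not reproduced here. The OR-within / AND-across structure is an assumption.

CONCEPT_CLUSTERS = {
    "clinical_trials": [
        "clinical trials", "trial recruitment", "eligibility screening",
        "patient matching", "trial monitoring", "protocol design",
    ],
    "artificial_intelligence": [
        "artificial intelligence", "machine learning", "large language models",
        "clinical decision support", "algorithmic matching",
    ],
    "evaluation_ethics": [
        "bias", "fairness", "safety", "reliability", "validation",
        "explainability", "human oversight",
    ],
}


def quote(term: str) -> str:
    """Wrap a free-text keyword in double quotes for phrase matching."""
    return f'"{term}"'


def build_query(clusters: dict[str, list[str]]) -> str:
    """OR the terms inside each cluster, then AND the clusters together."""
    groups = []
    for terms in clusters.values():
        groups.append("(" + " OR ".join(quote(t) for t in terms) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    # Prints one generic Boolean string to adapt to each database's syntax.
    print(build_query(CONCEPT_CLUSTERS))
```

Keeping the clusters in one data structure means the same term lists can be reused when the string is adapted for each database, so the searches stay comparable across sources.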

5. Inclusion and Exclusion Criteria

Inclusion Criteria

  • Direct relevance to AI use in clinical trial workflows

  • Empirical evaluation, validation, or real-world implementation evidence

  • Discussion of risks, limitations, or performance considerations relevant to responsible AI

  • Publication in peer-reviewed journals or authoritative grey literature

  • Preprints or postprints from free, open-access repositories (via arxiv.org or similar), which share research papers immediately, often before formal peer-reviewed journal publication

Exclusion Criteria

  • Preclinical or purely molecular trial simulations

  • Opinion pieces without supporting evidence

  • Studies unrelated to healthcare clinical trials

  • Non-AI digital tools lacking algorithmic decision-making

6. Screening and Selection Process

Titles and abstracts were screened for relevance. Full-text review was conducted for sources that met initial inclusion criteria. Ambiguous cases were retained if they provided insight into evaluation challenges, deployment risks, or human–AI interaction concerns relevant to clinical trials. Priority was given to studies that:

  • Evaluated AI performance against human or standard-of-care baselines

  • Reported bias, error patterns, or subgroup performance differences

  • Addressed real-world deployment constraints or regulatory considerations

7. Data Extraction and Synthesis

For each included source, the following information was extracted:

  • AI system type and intended function within the trial lifecycle

  • Study design and evaluation methodology

  • Reported performance metrics and limitations

  • Identified risks related to safety, bias, or reliability

  • Implications for clinical trial integrity, equity, and oversight

Findings were synthesized thematically and mapped to responsible AI evaluation dimensions relevant to the clinical trials use case.
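The extraction fields and thematic mapping described above could be captured in a simple record structure. The sketch below is one possible representation, not the extraction form actually used in the review. The `ExtractionRecord` class, its field names, and the `group_by_dimension` helper are hypothetical. The dimension labels are taken from the Objective and Scope section, and the example record contains placeholder values only.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Responsible AI evaluation dimensions named in the Objective and Scope section.
DIMENSIONS = (
    "safety and reliability",
    "fairness and bias management",
    "usefulness and efficacy",
    "usability",
)


@dataclass
class ExtractionRecord:
    """One included source; the field names are hypothetical."""
    source_id: str
    ai_system_type: str
    lifecycle_function: str
    study_design: str
    evaluation_methodology: str
    performance_metrics: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    implications: list[str] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject labels that are not one of the named evaluation dimensions.
        unknown = set(self.dimensions) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"Unknown dimension(s): {sorted(unknown)}")


def group_by_dimension(records: list[ExtractionRecord]) -> dict[str, list[str]]:
    """Map each evaluation dimension to the IDs of the sources that inform it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for record in records:
        for dimension in record.dimensions:
            grouped[dimension].append(record.source_id)
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder values only; this is not a real source from the review.
    example = ExtractionRecord(
        source_id="SOURCE-001",
        ai_system_type="placeholder system type",
        lifecycle_function="eligibility screening",
        study_design="placeholder design",
        evaluation_methodology="placeholder methodology",
        dimensions=["safety and reliability", "usability"],
    )
    print(group_by_dimension([example]))
```

Validating the dimension labels at record creation keeps the thematic mapping consistent, so every source is tied to a named dimension rather than to a free-text variant of it.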

8. Quality and Relevance Assessment

Rather than relying solely on formal risk-of-bias tools, studies were assessed based on:

  • Transparency of methods and data sources

  • Appropriateness of evaluation metrics

  • Clinical relevance of the deployment context

  • Acknowledgment of limitations and failure modes

9. Limitations

The review may underrepresent proprietary industry evaluations and unpublished internal trial data. Additionally, rapid advances in foundation models and trial automation tools mean that newer systems may not yet be fully reflected in peer-reviewed literature.

10. Output and Use

The findings of this literature review are intended to directly inform:

  • Metric selection for responsible AI evaluation

  • Risk identification across the clinical trial AI lifecycle

  • Comparative analysis across use cases

Methodology for CHAI Member Submissions

In addition to the literature review conducted to gather published methods and metrics, the CHAI Program Management team asked members of the Clinical Trials Work Group to submit additional methods and metrics. The process is described below. Work Group members were asked to:

  1. Review the use case charter and a standardized PowerPoint (PPT) template (developed by CHAI Program Management) to understand the scope of the clinical trials use case: AI-supported protocol data extraction and criteria matching.

  2. Identify methods and metrics that can be used by Developers and/or Implementers to objectively evaluate AI solutions within this use case.

  3. Populate the PPT template with:

    • Methods and metrics currently used within the member organization, and/or

    • Relevant methods and metrics identified through published literature, industry guidance, or other credible sources.

  4. Follow the instructions provided within the PPT template for documenting each method and metric, including any supporting details, definitions, benchmarks, or references.

  5. Submit the completed PPT template for consolidation, generalization, and anonymization into the Work Group’s T&E Framework.