Open Questions for Future Research: A Medical AI Research Agenda for Ukrainian Healthcare
Abstract
A twelve-week research program examining machine learning applications in medical imaging diagnosis has left significant knowledge gaps that demand systematic investigation. This concluding article synthesizes open research questions emerging from our comprehensive review, organized across seven priority domains: generalization and distribution shift, algorithmic fairness and bias mitigation, human-AI collaboration optimization, regulatory science advancement, healthcare system integration, emerging model architectures, and low-resource context adaptation. We present a structured research agenda positioning Ukrainian healthcare institutions as contributors to global medical AI advancement while addressing locally relevant challenges. Each research domain includes specific questions amenable to empirical investigation, methodological considerations, and potential collaboration opportunities. The agenda particularly emphasizes questions relevant to middle-income healthcare systems navigating AI integration with constrained resources—a context underrepresented in current literature. This article serves as both a conclusion to our research series and a roadmap for continued scientific inquiry into the transformative potential of AI-augmented medical diagnosis.
- 35 articles published in this research series
- 7 priority research domains identified
- 42 specific research questions proposed
- 222+ commercial AI products in medical imaging (Oct 2024)
1. Introduction
This article concludes a systematic twelve-week research program examining machine learning applications in medical imaging diagnosis for Ukrainian healthcare. Across 34 preceding articles, we analyzed global best practices, documented failed implementations, evaluated technical architectures, and developed practical frameworks for AI integration into clinical workflows. Yet every answer generated new questions—the hallmark of productive scientific inquiry.
Medical AI has evolved from laboratory curiosity to clinical reality, with 222+ commercial products available and 1,200+ FDA authorizations as of late 2024. However, this proliferation masks fundamental uncertainties: real-world validation remains sparse, deployment outside high-resource settings is poorly characterized, and the optimal modes of human-AI collaboration are still debated. A 2025 Lancet rapid scoping review concluded that “the potential benefits of AI are evident, but there is a paucity of evidence in real-world settings.”
For Ukrainian healthcare institutions—and similarly positioned systems in middle-income countries—these uncertainties compound with resource constraints, infrastructure limitations, and the need to adapt technologies developed primarily for Western contexts. This creates both challenges and opportunities: challenges in implementing AI responsibly with limited evidence, opportunities in contributing research that addresses globally underrepresented contexts.
1.1 Purpose and Scope
This research agenda aims to:
- Synthesize open questions emerging from our comprehensive literature review
- Prioritize research directions with highest potential for advancing medical AI implementation
- Identify questions particularly relevant to Ukrainian and middle-income healthcare contexts
- Propose methodological approaches and collaboration opportunities for each domain
- Position ScanLab and affiliated institutions as contributors to global medical AI research
2. Research Domain Overview
```mermaid
flowchart TB
    subgraph Core["Core Technical Domains"]
        A["Generalization & Distribution Shift"]
        B["Fairness & Bias Mitigation"]
        C["Emerging Architectures"]
    end
    subgraph Applied["Applied Research Domains"]
        D["Human-AI Collaboration"]
        E["Healthcare Integration"]
        F["Regulatory Science"]
    end
    subgraph Context["Context-Specific"]
        G["Low-Resource Adaptation"]
    end
    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> G
    G --> A
    style G fill:#ff9800,color:#000
```
| Domain | Key Challenge | Questions | Priority |
|---|---|---|---|
| Generalization & Distribution Shift | Models fail when deployment differs from training | 8 | Critical |
| Fairness & Bias | Disparate performance across populations | 6 | Critical |
| Human-AI Collaboration | Optimizing combined human-machine performance | 7 | High |
| Regulatory Science | Appropriate oversight for adaptive systems | 5 | High |
| Healthcare Integration | Real-world implementation challenges | 6 | High |
| Emerging Architectures | Foundation models, multimodal systems | 5 | Medium |
| Low-Resource Adaptation | Deployment in resource-constrained settings | 5 | Critical (for Ukraine) |
3. Domain 1: Generalization and Distribution Shift
The fundamental challenge of medical AI deployment is generalization: models trained on specific datasets frequently fail when applied to populations, equipment, or protocols that differ from training conditions. This “distribution shift” problem undermines the validity of reported performance metrics and creates patient safety risks.
🔬 Research Priority Statement
Developing robust methods for detecting, measuring, and adapting to distribution shift is the single most important technical challenge for medical AI deployment.
3.1 Open Research Questions
| # | Question | Current State | Approach |
|---|---|---|---|
| G1 | What metrics best predict when a trained model will fail on new data? | Uncertainty quantification emerging but not validated | Prospective deployment studies with stratified analysis |
| G2 | How much local data is required for effective domain adaptation? | Varies widely; no systematic characterization | Multi-site transfer learning experiments |
| G3 | Can we detect distribution shift in real-time during clinical deployment? | Prototype monitoring systems exist; clinical validation lacking | Prospective monitoring with ground truth collection |
| G4 | What imaging acquisition parameters most strongly influence model generalization? | Manufacturer, resolution known factors; comprehensive mapping absent | Controlled variation studies across equipment types |
| G5 | How do federated learning models generalize compared to centrally trained models? | Theoretical advantages; real-world comparisons sparse | Matched federated vs. centralized training experiments |
| G6 | What is the optimal frequency for model retraining or recalibration? | Ad hoc approaches; no evidence-based guidelines | Longitudinal performance monitoring with retraining triggers |
| G7 | Can synthetic data augmentation improve generalization to unseen domains? | Mixed results; context-dependent effectiveness | Systematic augmentation strategy comparison |
| G8 | How do models trained on Western populations perform on Ukrainian patient demographics? | Unknown; no published Ukrainian validation studies | External validation on Ukrainian datasets |
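As one concrete illustration of the real-time monitoring question (G3), a simple drift statistic such as the Population Stability Index can be computed over a model's confidence scores at deployment and compared against the validation-time distribution. This is a minimal sketch on synthetic data, not a validated clinical monitoring system; the "PSI > 0.2" flag is a common industry rule of thumb, not a clinical standard:

```python
import numpy as np

def psi(expected, observed, bins=10):
    """Population Stability Index between a reference distribution (e.g. model
    confidence scores at validation time) and a live deployment distribution.
    PSI > 0.2 is a commonly used 'significant shift' flag."""
    # Bin edges from the reference distribution's quantiles (equal-mass bins)
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_frac = np.histogram(np.clip(expected, edges[0], edges[-1]), edges)[0] / len(expected)
    o_frac = np.histogram(np.clip(observed, edges[0], edges[-1]), edges)[0] / len(observed)
    # Floor the fractions to avoid log(0) for empty bins
    e_frac = np.clip(e_frac, 1e-6, None)
    o_frac = np.clip(o_frac, 1e-6, None)
    return float(np.sum((o_frac - e_frac) * np.log(o_frac / e_frac)))

rng = np.random.default_rng(0)
reference = rng.beta(8, 2, 5000)  # confidence scores seen at validation
same_site = rng.beta(8, 2, 5000)  # deployment data from a similar population
shifted = rng.beta(4, 4, 5000)    # e.g. a different scanner or population

print(f"PSI, no shift:   {psi(reference, same_site):.3f}")
print(f"PSI, with shift: {psi(reference, shifted):.3f}")
```

In a deployed system the observed window would be a rolling buffer of recent cases, and an alarm would trigger review rather than automatic retraining.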
3.2 Ukrainian Research Opportunities
Question G8 represents a high-priority opportunity for Ukrainian researchers. The absence of published validation studies on Eastern European populations creates a gap that local institutions can uniquely address. Proposed approach:
- Partner with ScanLab implementation sites to collect prospective data
- Apply leading FDA-cleared algorithms to Ukrainian imaging datasets
- Report stratified performance by demographic and clinical characteristics
- Characterize distribution shift from training populations (typically US/EU)
- Develop domain adaptation protocols if performance degradation observed
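The stratified-reporting step in the proposal above can be sketched in a few lines of plain Python; the subgroups and counts below are hypothetical placeholders, not data from any Ukrainian cohort:

```python
from collections import defaultdict

def stratified_performance(records):
    """Sensitivity and specificity per subgroup for an external validation
    study. Each record: {'group': str, 'label': 0/1, 'pred': 0/1}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in records:
        c = counts[r["group"]]
        if r["label"] == 1:
            c["tp" if r["pred"] == 1 else "fn"] += 1
        else:
            c["tn" if r["pred"] == 0 else "fp"] += 1
    out = {}
    for g, c in counts.items():
        out[g] = {
            "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
            "specificity": c["tn"] / (c["tn"] + c["fp"]),
            "n": sum(c.values()),
        }
    return out

# Toy counts standing in for a validation cohort (invented for illustration)
records = (
    [{"group": "urban", "label": 1, "pred": 1}] * 90
    + [{"group": "urban", "label": 1, "pred": 0}] * 10
    + [{"group": "urban", "label": 0, "pred": 0}] * 95
    + [{"group": "urban", "label": 0, "pred": 1}] * 5
    + [{"group": "rural", "label": 1, "pred": 1}] * 75
    + [{"group": "rural", "label": 1, "pred": 0}] * 25
    + [{"group": "rural", "label": 0, "pred": 0}] * 90
    + [{"group": "rural", "label": 0, "pred": 1}] * 10
)
for g, m in stratified_performance(records).items():
    print(f"{g}: sens={m['sensitivity']:.2f} spec={m['specificity']:.2f} n={m['n']}")
```

A gap like the urban/rural sensitivity difference in this toy output is exactly the kind of stratified finding the proposed study would report before any domain adaptation work.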
4. Domain 2: Algorithmic Fairness and Bias Mitigation
AI systems can perpetuate and amplify existing healthcare disparities. Representation bias in training data, surrogate endpoint selection, and differential access to AI-enabled care create risks of inequitable outcomes. The 2025 NCBI Watch List noted that “existing health care data can lead to AI systems perpetuating biases and exacerbating disparities.”
```mermaid
flowchart LR
    subgraph Sources["Bias Sources"]
        A["Historical Data<br/>Reflects past disparities"]
        B["Representation<br/>Training population limits"]
        C["Measurement<br/>Proxy variable bias"]
        D["Aggregation<br/>Hidden subgroup effects"]
    end
    subgraph Manifestation["Manifestation"]
        E["Differential Performance"]
        F["Systematically Worse Outcomes"]
    end
    subgraph Impact["Patient Impact"]
        G["Delayed Diagnosis"]
        H["Inappropriate Treatment"]
        I["Widened Disparities"]
    end
    A --> E
    B --> E
    C --> F
    D --> F
    E --> G
    F --> H
    G --> I
    H --> I
```
4.1 Open Research Questions
| # | Question | Significance |
|---|---|---|
| F1 | What fairness metrics are most clinically meaningful for medical AI? | Statistical fairness may conflict with clinical utility |
| F2 | How can we audit deployed AI systems for bias without demographic labels in clinical data? | Many EHR systems lack complete demographic information |
| F3 | What minimum dataset diversity is required for equitable performance across populations? | Practical guidance for dataset curation lacking |
| F4 | Can post-hoc fairness interventions (threshold adjustment, recalibration) adequately address training data bias? | Trade-offs between approaches poorly characterized |
| F5 | How do fairness considerations interact with regulatory approval processes? | FDA guidance emerging but implementation unclear |
| F6 | What are the unique bias risks for medical AI deployed in Eastern European contexts? | Most bias research focuses on US racial categories |
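One family of post-hoc interventions named in F4 is per-group threshold adjustment. The sketch below, on synthetic scores where one group's positive cases score systematically lower, equalizes sensitivity across groups; note that it says nothing about the specificity cost in each group, which is precisely the poorly characterized trade-off the question asks about:

```python
import numpy as np

def group_thresholds(scores, labels, groups, target_sens=0.90):
    """Per-group decision thresholds chosen so each subgroup reaches the
    same target sensitivity (a post-hoc intervention from question F4)."""
    thresholds = {}
    for g in set(groups):
        pos = [s for s, y, gr in zip(scores, labels, groups) if gr == g and y == 1]
        # ~target_sens of this group's positive cases score at or above the
        # (1 - target_sens) quantile of its positive-case scores
        thresholds[g] = float(np.quantile(pos, 1 - target_sens))
    return thresholds

# Synthetic positive cases: group B's positives score lower on average,
# so a single global threshold would give B a lower sensitivity
rng = np.random.default_rng(1)
n = 1000
scores = np.concatenate([rng.normal(0.8, 0.1, n), rng.normal(0.6, 0.1, n)])
labels = [1] * (2 * n)
groups = ["A"] * n + ["B"] * n

thr = group_thresholds(scores, labels, groups)
for g in ("A", "B"):
    s_g = scores[np.array(groups) == g]
    print(f"group {g}: threshold={thr[g]:.3f}, sensitivity={(s_g >= thr[g]).mean():.3f}")
```

Whether such recalibration is an adequate substitute for more representative training data is exactly what F4 leaves open.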
📊 Ukrainian Context
Question F6 highlights a significant research gap. Western bias frameworks focus on US-centric categories (Black/White/Hispanic), which may not capture relevant disparities in Ukrainian healthcare. Alternative dimensions requiring investigation include:
- Urban-rural disparities: Differential access to modern imaging equipment
- Age-related demographics: Older population with different disease presentations
- Socioeconomic factors: Access to private vs. public healthcare systems
- Historical exposures: Chornobyl-related health patterns, tuberculosis prevalence
5. Domain 3: Human-AI Collaboration Optimization
Medical AI systems are not autonomous—they augment human decision-making. Yet optimal collaboration modes remain poorly understood. Over-reliance (automation bias) and under-reliance both degrade combined performance. The fundamental question: how do we achieve human-AI teams that outperform either alone?
```mermaid
flowchart TD
    subgraph Input["Clinical Input"]
        A["Medical Image"]
        B["Clinical Context"]
    end
    subgraph Processing["Parallel Processing"]
        C["AI Analysis"]
        D["Human Analysis"]
    end
    subgraph Integration["Integration Challenge"]
        E{"How to Combine?"}
    end
    subgraph Outcomes["Performance Outcomes"]
        F["Combined > Either Alone?"]
        G["Trust Calibration?"]
        H["Cognitive Load?"]
    end
    A --> C
    A --> D
    B --> D
    C --> E
    D --> E
    E --> F
    E --> G
    E --> H
```
5.1 Open Research Questions
| # | Question | Research Approach |
|---|---|---|
| H1 | Under what conditions does AI assistance improve versus degrade physician diagnostic accuracy? | Randomized trials with stratification by case difficulty and physician experience |
| H2 | What interface designs promote appropriate reliance (neither over- nor under-trust)? | Usability studies with trust calibration metrics |
| H3 | How should AI confidence/uncertainty be communicated to support clinical decision-making? | A/B testing of uncertainty visualization approaches |
| H4 | Does AI-first versus AI-second workflow sequencing affect diagnostic outcomes? | Crossover studies comparing workflow variants |
| H5 | What training interventions most effectively calibrate physician trust in AI? | Educational intervention trials with longitudinal follow-up |
| H6 | How do time pressures affect physician interaction with AI recommendations? | Observational studies with workload tracking |
| H7 | What explainability methods most effectively support physician override decisions? | Comparative studies of explanation types (attention maps, concept-based, etc.) |
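Appropriate reliance (H2) is commonly operationalized by crossing AI correctness with whether the physician followed the advice, which separates automation bias from under-trust. A toy decomposition, with invented reader-study tallies:

```python
def reliance_profile(cases):
    """Cross AI correctness with whether the physician followed the advice
    (question H2). Each case is a pair (ai_correct, physician_followed_ai)."""
    profile = {"appropriate_accept": 0, "appropriate_reject": 0,
               "over_reliance": 0, "under_reliance": 0}
    for ai_ok, followed in cases:
        if ai_ok and followed:
            profile["appropriate_accept"] += 1  # accepted correct advice
        elif not ai_ok and not followed:
            profile["appropriate_reject"] += 1  # overrode incorrect advice
        elif not ai_ok and followed:
            profile["over_reliance"] += 1       # automation bias
        else:
            profile["under_reliance"] += 1      # dismissed correct advice
    return profile

# Invented tallies for illustration (not data from any actual reader study)
cases = ([(True, True)] * 70 + [(False, False)] * 10
         + [(False, True)] * 12 + [(True, False)] * 8)
p = reliance_profile(cases)
rate = (p["appropriate_accept"] + p["appropriate_reject"]) / len(cases)
print(p)
print(f"appropriate reliance rate: {rate:.2f}")
```

Interface or training interventions (H2, H5) would then be compared on this rate rather than on raw diagnostic accuracy alone.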
6. Domain 4: Regulatory Science Advancement
Current regulatory frameworks were designed for static medical devices, not adaptive AI systems that may learn continuously from deployment data. The FDA's 2021 AI/ML-Based Software as a Medical Device (SaMD) Action Plan acknowledged this gap, but implementation details remain uncertain.
⚖️ Regulatory Challenge
How do we balance innovation-enabling flexibility with safety-ensuring oversight for AI systems whose behavior may change post-approval?
6.1 Open Research Questions
| # | Question | Stakeholders |
|---|---|---|
| R1 | What post-market surveillance methodologies adequately detect performance degradation? | Regulators, healthcare systems, AI vendors |
| R2 | How should predetermined change control plans specify “locked” boundaries for continuous learning? | FDA/notified bodies, developers |
| R3 | What evidence standards should apply to AI systems validated on synthetic or augmented data? | Regulators, researchers |
| R4 | How can regulatory harmonization be achieved across FDA, EU MDR, and emerging frameworks (e.g., Ukrainian MHSU)? | International regulators, global vendors |
| R5 | What liability frameworks appropriately allocate responsibility for AI-assisted diagnostic errors? | Legal systems, insurers, clinicians |
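A simple instance of the surveillance methodology question (R1): a one-sided CUSUM chart over case-level error indicators accumulates small excesses above an accepted baseline and flags sustained degradation much earlier than periodic audits. The baseline rate, slack, and alarm threshold below are illustrative assumptions, not recommended regulatory values:

```python
def cusum_monitor(errors, baseline_rate=0.05, slack=0.02, threshold=4.0):
    """One-sided CUSUM over a stream of per-case error indicators (0/1).
    Signals when the error rate drifts above baseline_rate + slack."""
    s, alarms = 0.0, []
    for i, err in enumerate(errors):
        s = max(0.0, s + (err - baseline_rate - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # reset after each alarm so repeated drift keeps signaling
    return alarms

# Deterministic toy stream: 5% error rate for 300 cases, then 25% for 200
stream = [1 if i % 20 == 19 else 0 for i in range(300)]   # in control
stream += [1 if i % 4 == 3 else 0 for i in range(200)]    # degraded
alarms = cusum_monitor(stream)
print("first alarm at case index:", alarms[0] if alarms else "none")
```

The open research problem is not the chart itself but obtaining timely ground truth for the error indicators in routine clinical deployment.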
7. Domain 5: Healthcare System Integration
Even technically excellent AI systems may fail in deployment due to workflow integration challenges. That 81% of hospitals report no AI utilization despite available tools suggests implementation science questions are as important as technical ones.
7.1 Open Research Questions
| # | Question | Measurement Approach |
|---|---|---|
| I1 | What organizational factors predict successful AI adoption in imaging departments? | Multi-site implementation studies with organizational surveys |
| I2 | How does AI integration affect radiologist workload in real-world settings (efficiency gains vs. alert fatigue)? | Time-motion studies, cognitive load assessment |
| I3 | What PACS integration architectures optimize AI utilization while minimizing workflow disruption? | Comparative implementation studies across architecture patterns |
| I4 | How do patients respond to disclosure of AI involvement in their diagnosis? | Patient surveys, communication effectiveness studies |
| I5 | What change management approaches most effectively reduce physician resistance to AI adoption? | Intervention trials with adoption and utilization metrics |
| I6 | How should AI integration be sequenced across diagnostic modalities for maximum impact? | Phased implementation studies with outcome tracking |
8. Domain 6: Emerging Model Architectures
Foundation models and multimodal architectures represent the next frontier in medical AI. These systems—trained on massive datasets and capable of cross-task generalization—may fundamentally change how AI is deployed in healthcare. A 2025 review noted the “shift toward generalizable foundational models” as a defining trend.
```mermaid
flowchart LR
    subgraph Past["2012-2020"]
        A["Task-Specific CNNs"]
    end
    subgraph Present["2020-2025"]
        B["Vision Transformers"]
        C["Transfer Learning"]
    end
    subgraph Emerging["2025+"]
        D["Foundation Models"]
        E["Multimodal Integration"]
        F["Generative AI"]
    end
    subgraph Questions["Open Questions"]
        G["Clinical Utility?"]
        H["Safety Guarantees?"]
        I["Regulation?"]
    end
    A --> B
    B --> D
    C --> E
    D --> G
    E --> H
    F --> I
```
8.1 Open Research Questions
| # | Question | Technical Challenge |
|---|---|---|
| E1 | Do medical foundation models outperform task-specific models for clinical diagnostic tasks? | Fair comparison methodology with matched computational resources |
| E2 | How should multimodal AI systems integrating imaging with clinical data be validated? | Dataset requirements, performance attribution across modalities |
| E3 | What are the hallucination risks of generative AI in medical imaging contexts? | Detecting plausible but incorrect generated content |
| E4 | Can LLM-based reasoning enhance clinical interpretation of AI imaging findings? | Integration architecture, reliability assessment |
| E5 | What computational infrastructure is minimally required for foundation model deployment? | Resource profiling for edge deployment in resource-limited settings |
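A back-of-envelope answer to E5 starts from parameter count and numeric precision; the overhead multiplier below is a rough assumption for activations and runtime buffers, and real serving footprints also depend on batch size, context length, and architecture:

```python
def deployment_memory_gb(n_params, bytes_per_param=2, overhead=1.2):
    """Rough RAM/VRAM needed to serve a model (question E5):
    parameters x precision, times a multiplier for runtime overhead.
    The 1.2 overhead factor is an illustrative assumption."""
    return n_params * bytes_per_param * overhead / 1e9

# Illustrative model sizes (parameter counts are ballpark, not product specs)
for name, n in [("task-specific CNN", 25e6),
                ("vision transformer", 0.3e9),
                ("foundation model", 7e9)]:
    for prec, b in [("fp16", 2), ("int8", 1)]:
        print(f"{name:>19} @ {prec}: ~{deployment_memory_gb(n, b):.2f} GB")
```

Even this crude arithmetic shows why quantization and edge-oriented architectures matter for facilities without data-center GPUs.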
9. Domain 7: Low-Resource Context Adaptation
Most medical AI research and development occurs in high-income settings. This creates systematic gaps for middle- and low-income healthcare systems that must adapt rather than adopt AI technologies. Ukraine’s position as a middle-income country with strong technical capacity creates an opportunity to lead research addressing this underserved area.
🌍 Global Health Relevance
Research addressing AI deployment in resource-constrained settings has outsized impact potential—the majority of the world’s population receives healthcare in contexts more similar to Ukraine than to the US/EU systems where most AI development occurs.
9.1 Open Research Questions
| # | Question | Ukrainian Relevance |
|---|---|---|
| L1 | What is the minimum dataset size for effective local fine-tuning of pre-trained medical AI models? | Critical for ScanLab adaptation with limited local data |
| L2 | How do infrastructure limitations (connectivity, computational resources) affect AI deployment strategies? | Varies significantly across Ukrainian regions |
| L3 | Can AI systems designed for modern equipment effectively process images from older/legacy scanners? | Many Ukrainian facilities use older imaging equipment |
| L4 | What cost-effectiveness thresholds justify AI investment in middle-income healthcare systems? | Essential for Ukrainian health economics analysis |
| L5 | How should AI curricula be adapted for healthcare workforces with limited prior AI exposure? | Training curriculum design for Ukrainian physicians |
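Question L1 is often approached empirically by fitting a power-law learning curve to pilot fine-tuning runs and extrapolating to the target performance. The pilot numbers below are invented for illustration, and the power-law form is itself an assumption that a real study would need to check:

```python
import numpy as np

def samples_for_target(ns, errors, target_error):
    """Fit a power-law learning curve error ~ a * n^(-b) to pilot runs
    (log-log linear fit), then extrapolate the n needed to reach a
    target error (question L1)."""
    b, log_a = np.polyfit(np.log(ns), np.log(errors), 1)  # slope b is negative
    # Solve log(target) = log_a + b * log(n) for n
    return float(np.exp((np.log(target_error) - log_a) / b))

# Hypothetical pilot results: validation error after fine-tuning on n local cases
ns = [100, 200, 400, 800]
errors = [0.20, 0.16, 0.128, 0.1024]
needed = samples_for_target(ns, errors, target_error=0.08)
print(f"estimated local cases needed for 8% error: ~{needed:.0f}")
```

The value of such an extrapolation is planning, not prediction: it tells a site whether local fine-tuning is plausibly feasible before committing to full data collection.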
10. Research Methodology Recommendations
10.1 Study Design Priorities
Based on our review, several methodological approaches are underutilized in current medical AI research:
- Prospective real-world validation: Most published studies use retrospective data; prospective deployment studies urgently needed
- Randomized implementation trials: Compare AI-assisted versus standard care with appropriate blinding
- Multi-site external validation: Test generalization across diverse healthcare settings
- Longitudinal outcome tracking: Assess long-term patient outcomes, not just immediate diagnostic accuracy
- Health economic evaluation: Integrate cost-effectiveness analysis into AI evaluation studies
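For the prospective and multi-site validation designs above, a standard normal-approximation sample-size calculation shows why such studies are expensive; the target sensitivity, CI half-width, and prevalence below are illustrative assumptions, not recommendations:

```python
import math

def n_for_sensitivity_ci(expected_sens=0.90, half_width=0.05,
                         z=1.96, prevalence=0.15):
    """Diseased cases needed so the 95% CI for sensitivity has the requested
    half-width (normal approximation), then total cases given prevalence."""
    n_pos = math.ceil(z**2 * expected_sens * (1 - expected_sens) / half_width**2)
    n_total = math.ceil(n_pos / prevalence)
    return n_pos, n_total

pos, total = n_for_sensitivity_ci()
print(f"need ~{pos} positive cases (~{total} total at 15% prevalence)")
```

Stratified reporting multiplies this further, since each subgroup needs its own adequately powered estimate.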
10.2 Reporting Standards
Adoption of standardized reporting frameworks will strengthen the evidence base:
| Framework | Application | Status |
|---|---|---|
| CLAIM 2024 | AI imaging research reporting | Updated, widely adopted |
| CONSORT-AI | AI clinical trials | Published, adoption growing |
| TRIPOD+AI | Prediction model development | Published 2024 |
| SPIRIT-AI | AI intervention trial protocols | Published, recommended |
| DECIDE-AI | Early clinical evaluation | Emerging framework |
11. Ukrainian Research Ecosystem Positioning
11.1 Institutional Collaboration Framework
```mermaid
flowchart TB
    subgraph Academic["Academic Partners"]
        A["ONPU Economics Cybernetics"]
        B["Medical Universities"]
        C["Computer Science Departments"]
    end
    subgraph Clinical["Clinical Partners"]
        D["Regional Hospitals"]
        E["Diagnostic Centers"]
        F["ScanLab Pilot Sites"]
    end
    subgraph Industry["Industry Partners"]
        G["AI Vendors"]
        H["PACS Providers"]
        I["Stabilarity Hub"]
    end
    subgraph International["International Network"]
        J["EU Research Consortia"]
        K["WHO Digital Health"]
        L["RSNA/ESR Research"]
    end
    A <--> D
    B <--> E
    C <--> I
    F <--> I
    I <--> J
    A <--> K
    B <--> L
```
11.2 Priority Research Projects
Based on the open questions and Ukrainian context, we propose five priority research projects:
- Ukrainian External Validation Study: Systematic evaluation of FDA-cleared AI systems on Ukrainian patient populations (Questions G8, F6)
- ScanLab Implementation Science: Prospective study of AI integration workflow effects at pilot sites (Questions I1, I2, H1)
- Legacy Equipment Adaptation: Characterizing AI performance on older imaging equipment common in Ukrainian facilities (Question L3)
- Ukrainian AI Curriculum Effectiveness: Randomized evaluation of training program impact on appropriate AI utilization (Questions H5, L5)
- Cost-Effectiveness in Middle-Income Context: Health economic analysis of AI deployment in Ukrainian healthcare system (Question L4)
12. Conclusion: A Research Manifesto
After twelve weeks examining machine learning in medical imaging, we are left with more questions than answers—a sign of productive scientific inquiry. The 42 open questions presented across seven domains represent not a failure of current research but an honest assessment of the maturation stage of medical AI.
Key conclusions from our research agenda:
- Generalization remains the fundamental challenge. Until we can reliably predict when AI systems will fail, deployment carries irreducible uncertainty.
- Human-AI collaboration is under-studied. We have invested heavily in algorithm development but insufficiently in understanding how humans actually interact with AI recommendations.
- Regulatory frameworks must evolve. Current oversight mechanisms are inadequate for adaptive, continuously learning systems.
- Middle-income contexts are underrepresented. Most research reflects high-resource settings; Ukraine and similar countries can contribute unique perspectives.
- Implementation science matters. Technical performance means nothing if AI systems are not adopted and used appropriately in clinical practice.
📢 Call to Action
We invite Ukrainian researchers, clinicians, and healthcare administrators to join this research agenda. The questions outlined here require collaborative effort across institutions and disciplines. Through rigorous investigation of these open problems, Ukraine can position itself as a contributor to—not merely a consumer of—global medical AI advancement.
The 35 articles in this series have laid groundwork; the research agenda presented here charts the path forward. Medical AI’s transformative potential for Ukrainian healthcare will only be realized through sustained, systematic inquiry into the questions that remain unanswered.
References
- Lancet EClinicalMedicine. Artificial intelligence for diagnostics in radiology practice: a rapid systematic scoping review. eClinicalMedicine. 2025. DOI: 10.1016/S2589-5370(25)00160-9
- NCBI. 2025 Watch List: Artificial Intelligence in Health Care. NCBI Bookshelf. 2025. Available at: ncbi.nlm.nih.gov/books/NBK613808
- Kowalczuk M, et al. Artificial Intelligence-Empowered Radiology—Current Status and Critical Review. Diagnostics. 2025;15:282. DOI: 10.3390/diagnostics15030282
- Raji ID, et al. Fairness in AI healthcare: A survey. PLOS Digital Health. 2025. DOI: 10.1371/journal.pdig.0000864
- Chen IY, et al. Algorithm fairness in artificial intelligence for medicine and healthcare. Nature Medicine. 2023;29:2933-2942. DOI: 10.1038/s41591-023-02628-3
- Rajpurkar P, et al. The future of multimodal artificial intelligence models for integrating imaging and clinical metadata. Diagnostic Interventional Radiology. 2025. DOI: 10.4274/dir.2024.242631
- Haug CJ, Drazen JM. Artificial Intelligence and Machine Learning in Clinical Medicine, 2023. NEJM. 2023;388:1201-1208. DOI: 10.1056/NEJMra2302038
- Mongan J, et al. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 Update. Radiology: AI. 2024;6:e240300. DOI: 10.1148/ryai.240300
- FDA. Artificial Intelligence and Machine Learning (AI/ML)-Based Software as a Medical Device Action Plan. 2021. Available at: fda.gov/medical-devices
- WHO. Ethics and Governance of Artificial Intelligence for Health. WHO Guidance. 2021. ISBN: 978-92-4-003575-9
- Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019;25:44-56. DOI: 10.1038/s41591-018-0300-7
- Kelly CJ, et al. Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine. 2019;17:195. DOI: 10.1186/s12916-019-1426-2
- Liu X, et al. Reporting guidelines for clinical trials evaluating artificial intelligence interventions: CONSORT-AI. Nature Medicine. 2020;26:1364-1374. DOI: 10.1038/s41591-020-1034-x
- Collins GS, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models. BMJ. 2024;384:e078378. DOI: 10.1136/bmj-2023-078378
- Park SH, Han K. Methodologic Guide for Evaluating Clinical Performance and Effect of AI. Radiology. 2018;286:800-809. DOI: 10.1148/radiol.2017171920
- Shen J, et al. Artificial Intelligence versus Clinicians in Disease Diagnosis: Systematic Review. JMIR Medical Informatics. 2019;7:e10010. DOI: 10.2196/10010
- Vasey B, et al. Reporting guideline for the early-stage clinical evaluation of decision support systems driven by AI: DECIDE-AI. Nature Medicine. 2022;28:924-933. DOI: 10.1038/s41591-022-01772-9
- Celi LA, et al. Sources of Bias in AI for Health Care. NEJM AI. 2024;1:AIra2300028. DOI: 10.1056/AIra2300028
- Obermeyer Z, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447-453. DOI: 10.1126/science.aax2342
- Finlayson SG, et al. The Clinician and Dataset Shift in AI. NEJM. 2021;385:283-286. DOI: 10.1056/NEJMc2104626
- Esteva A, et al. A guide to deep learning in healthcare. Nature Medicine. 2019;25:24-29. DOI: 10.1038/s41591-018-0316-z
- Cabitza F, et al. Unintended Consequences of Machine Learning in Medicine. JAMA. 2017;318:517-518. DOI: 10.1001/jama.2017.7797
- Wong TY, Bressler NM. Artificial Intelligence With Deep Learning Technology Looks Into Diabetic Retinopathy Screening. JAMA. 2016;316:2366-2367. DOI: 10.1001/jama.2016.17563
- Rajpurkar P, et al. AI in health and medicine. Nature Medicine. 2022;28:31-38. DOI: 10.1038/s41591-021-01614-0
- He J, et al. The practical implementation of artificial intelligence technologies in medicine. Nature Medicine. 2019;25:30-36. DOI: 10.1038/s41591-018-0307-0