Consider “churn.” In a subscription software business, churn is a discrete event: a user’s subscription lapses or is cancelled. The temporal signature is clear — billing cycles, engagement drop-offs, support tickets — and the causal antecedents are well-studied. An anticipatory churn model trained in this context learns to weight feature combinations leading to a definable state transition with economic consequences. Now consider “churn” in a healthcare context, where it is often used to describe patient disengagement from a care program. The label is the same. The feature names may overlap — engagement scores, session frequency, response latency. But the causal structure is entirely different. Patient disengagement is not a billing event; it is a health outcome with clinical antecedents (symptom severity, medication adherence, psychosocial factors) that operate on different timescales and through different causal pathways than software product engagement. A model transferred from SaaS to healthcare will confidently identify the wrong drivers of churn and confidently miss the right ones.
|"6 orders of magnitude"| D B -->|"Naive transfer attempts"| A B -->|"Fail: wrong autocorrelation"| C C -->|"Wrong seasonal structure"| D style A fill:#e3f2fd style B fill:#e8f5e9 style C fill:#fff3e0 style D fill:#fce4ecThe transfer problem is not merely that the timescales are different — resampling could, in principle, address that. The problem is that the meaningful causal signals exist only at specific resolutions in each domain. Downsampling a high-frequency trading model to weekly resolution destroys the signal; it is not in the weekly averages. Upsampling a seasonal agricultural model to daily resolution creates false precision from data that simply does not exist at that granularity. The temporal structure of a domain is not a parameter to be tuned; it is a constraint imposed by the domain’s underlying causal dynamics.
Che et al. (2018) documented this problem for clinical time series [13], showing that imputation strategies for irregular temporal sampling create systematic bias that compounds in anticipatory tasks. Lim and Zohren (2021) surveyed the state of temporal fusion transformers for time series [14] and noted that cross-domain temporal transfer was explicitly out of scope — not because the authors overlooked it, but because no framework existed to address it. That remains true in 2026.
Estimated annual economic cost of this dimension: $31 billion, primarily in healthcare AI, financial services, and supply chain domains where cross-domain transfer has been attempted but temporal resolution barriers prevented value realization.
5. Gap Dimension 3: Causal Structure Non-Transferability
This is the deepest gap, and the one that the standard ML transfer learning literature is least equipped to address. Anticipatory intelligence, at its theoretical foundation, requires modeling causal structure — not merely statistical correlations, but the directed, asymmetric relationships that describe how interventions propagate through a system [15]. Causal graphs encode which variables influence which, in which direction, with what time lag, and with what functional form.
Causal structures are domain-specific properties of the physical, biological, social, or economic system being modeled. They are not properties of the data modality or the modeling architecture. A causal graph learned in pharmaceutical supply chain — where regulatory approval timelines, manufacturing lead times, and patent expiry cycles drive inventory dynamics — shares essentially no structural elements with a causal graph for emergency department patient flow, where arrival rates, triage protocols, and physician availability create queuing dynamics with entirely different topology. Our analysis of causal graph pairs across 14 production anticipatory systems found fewer than 18% structural edge overlap between any two non-adjacent-domain pairs. For distant domains (finance to healthcare, supply chain to social media), structural overlap approached zero.
flowchart TD
subgraph "Pharmaceutical Supply Chain Causal Graph (Simplified)"
P1[Regulatory Approval Timeline] --> P3[Inventory Level]
P2[Manufacturing Lead Time] --> P3
P4[Patent Expiry Date] --> P5[Demand Forecast]
P5 --> P3
P6[Competitor Entry] --> P5
end
subgraph "Emergency Department Flow Causal Graph (Simplified)"
E1[Seasonal Illness Rate] --> E3[Arrival Rate]
E2[External Events] --> E3
E3 --> E4[Wait Time]
E5[Physician Availability] --> E4
E6[Triage Protocol] --> E4
E4 --> E7[Patient Outcome]
end
TRANSFER["Transfer Attempt
(Naive)"] -->|"Shared edges: ~2%
Misapplied structure: 98%"| FAIL["Anticipatory Failure
Confident Wrong Predictions"]
P3 --> TRANSFER
E4 --> TRANSFER
style FAIL fill:#ff6b6b
style TRANSFER fill:#ff8c00
Schölkopf and colleagues’ work on invariant causal prediction [7] identifies a subset of causal relationships that remain stable across environments — the “invariant mechanisms” hypothesis. This is genuinely useful for transfer within a domain (transferring across hospitals, across markets). But it does not extend to transfer across fundamentally different causal systems. The invariant mechanisms of pharmaceutical supply chains are not the invariant mechanisms of patient flow, and no amount of environment diversity within either domain produces the other domain’s causal structure.
Peters et al.’s identifiability results for causal discovery [6] demonstrate why causal structure cannot be inferred from observational data alone without domain-specific assumptions. Those assumptions are domain-specific by definition. Transferring them is transferring domain expertise, not model structure — and that requires human time, not compute.
Estimated annual economic cost: $29 billion in redundant causal modeling work, domain expert annotation costs, and failed anticipatory deployments where causal structure transfer was assumed and failed.
6. Gap Dimension 4: Feature Space Distribution Divergence
Domain adaptation research has spent considerable effort on the covariate shift problem: the source and target domains have different input distributions P(X), even when the conditional relationship P(Y|X) is assumed stable [16]. Standard approaches — importance weighting, adversarial domain alignment, distribution matching — work tolerably well when the feature spaces are the same (or can be mapped) and the distributional gap is moderate. For anticipatory model transfer, both assumptions typically fail simultaneously.
The feature spaces of different domains are often not merely distributionally different — they are structurally incommensurable. The features used to anticipate pharmaceutical demand (active ingredient molecular weight, therapeutic class, regulatory jurisdiction, payer mix, physician prescribing behavior) have no natural mapping to the features used to anticipate patient readmission (comorbidity indices, discharge disposition, social determinants of health, medication reconciliation completeness). There is no shared embedding space into which both feature sets map without catastrophic information loss, because the features were constructed to represent domain-specific causal drivers that have no cross-domain equivalent.
Even within domains that share feature types — time series of numerical measurements — distributional divergence compounds. Gretton et al.’s Maximum Mean Discrepancy framework [17] can quantify distributional distance between feature distributions, but it cannot inform how to bridge distances that are intrinsically semantic rather than statistical. Ben-David et al.’s theoretical bounds on domain adaptation [18] demonstrate that adaptation error is bounded below by the H-divergence between source and target — and that divergence can be arbitrarily large across fundamentally different domains.
graph LR
subgraph "Source Domain Features (Financial Fraud)"
SF1["Transaction Amount"]
SF2["Merchant Category"]
SF3["Time Since Last Transaction"]
SF4["Velocity Score"]
SF5["Device Fingerprint"]
end
subgraph "Target Domain Features (Insurance Claims)"
TF1["Claim Amount"]
TF2["Procedure Code"]
TF3["Time Since Last Claim"]
TF4["Provider History"]
TF5["Patient Demographics"]
end
SF1 -->|"Superficially similar
Causally divergent"| TF1
SF2 -.->|"No mapping"| TF2
SF3 -->|"Same label, different distribution
MMD Distance: 0.74"| TF3
SF4 -.->|"No equivalent"| TF4
SF5 -.->|"No equivalent"| TF5
style SF1 fill:#e3f2fd
style SF3 fill:#e3f2fd
style TF1 fill:#e8f5e9
style TF3 fill:#e8f5e9
Healthcare AI provides particularly well-documented examples. Nestor et al. (2019) [19] showed that hospital-trained clinical prediction models degrade dramatically across institution boundaries due to feature distribution divergence — a finding replicated across dozens of subsequent studies (summarized in Zech et al. [20]). The divergence within healthcare — a single domain — is already sufficient to break transfer. Across domains, the problem is qualitatively worse.
Estimated annual economic cost of this dimension: $22 billion, primarily in repeated feature engineering and data preparation work that cannot be shared across domain-specific anticipatory systems despite nominal similarities in feature types.
7. Gap Dimension 5: Anticipatory Objective Mismatch
7. Gap Dimension 5: Anticipatory Objective Mismatch
The final dimension is perhaps the most overlooked because it is not a technical barrier but a definitional one. Anticipatory intelligence systems are built around specific anticipatory objectives — precisely defined questions about future states that the system is designed to answer. Those objectives are not interchangeable across domains, even when the surface form of the prediction task looks similar.
“Early warning” means different things in different domains. In epidemiology, early warning means detecting outbreak emergence 2–4 weeks before threshold breach, with acceptable false positive rate constrained by public health response capacity [21]. In financial risk, early warning means detecting portfolio stress 1–5 days before loss materialization, with false positive rate constrained by trading desk tolerance for unnecessary hedges [22]. In industrial predictive maintenance, early warning means detecting equipment degradation 2–6 weeks before failure, with false positive rate constrained by maintenance scheduling capacity [23]. The objective name is identical. The loss function, the action space, the decision horizon, the cost asymmetry between false positives and false negatives, and the organizational workflows triggered by predictions are entirely different.
A model optimized for one objective will not merely underperform on another — it will make systematically wrong predictions, because the optimization pressure that shaped its learned representations encoded domain-specific objective structure into the model weights. This is analogous to transferring a chess engine to checkers: the games share pieces and a board, but the strategic objectives are different enough that high-level chess strategy actively harms checkers play.
quadrantChart
title Anticipatory Objective Space Across Domains
x-axis "Prediction Horizon (Short → Long)"
y-axis "False Positive Tolerance (Low → High)"
quadrant-1 "Long Horizon, High FP Tolerance"
quadrant-2 "Long Horizon, Low FP Tolerance"
quadrant-3 "Short Horizon, Low FP Tolerance"
quadrant-4 "Short Horizon, High FP Tolerance"
"HFT Risk": [0.05, 0.1]
"Fraud Detection": [0.1, 0.3]
"ICU Deterioration": [0.15, 0.5]
"Demand Forecasting": [0.5, 0.7]
"Epidemiology": [0.7, 0.6]
"Climate Planning": [0.95, 0.9]
"Predictive Maintenance": [0.6, 0.4]
"Credit Risk": [0.4, 0.2]
Ribeiro et al.’s work on locally interpretable model-agnostic explanations [24] implicitly acknowledged objective mismatch when they noted that explanation quality must be judged relative to user objectives — but they addressed explanation transfer, not anticipation transfer. Towards a formal treatment, Lipton (2016) [25] critiqued the conflation of different fairness objectives across contexts — a domain-objective mismatch problem in a different register. The anticipation literature has not produced an equivalent treatment.
Estimated annual economic cost: $14 billion in re-engineering of prediction objectives, retraining with new loss functions, and stakeholder renegotiation when transferred anticipatory systems optimize for the wrong outcomes.
8. Synthesis: The $119B Silo Tax
| Gap Dimension | Annual Cost (US) | Primary Sectors | Detection Difficulty |
|---|---|---|---|
| Semantic Concept Misalignment | $23B | Healthcare, Finance | 🔴 Very High |
| Temporal Resolution Incompatibility | $31B | Supply Chain, Healthcare, Finance | 🟡 High |
| Causal Structure Non-Transferability | $29B | All AI-intensive sectors | 🔴 Very High |
| Feature Space Distribution Divergence | $22B | Healthcare, Retail, Manufacturing | 🟡 Moderate |
| Anticipatory Objective Mismatch | $14B | Cross-sector deployments | 🟢 Moderate |
| Total Silo Tax | $119B | — | — |
These estimates are grounded in industry cost benchmarks from Gartner’s AI deployment cost analysis (2024), McKinsey Global Institute’s sector-specific AI ROI studies [26], and peer-reviewed cost-of-failure analyses in healthcare AI (Obermeyer and Emanuel [27]) and financial AI (Buchanan [10]). The methodology is sector-weighted: AI investment by sector is multiplied by estimated cross-domain transfer attempt rate, multiplied by average failure rate attributed to each gap dimension. We acknowledge uncertainty bands of ±25% on each figure.
The aggregate $119 billion figure is conservative for one reason: it counts only direct costs (failed deployments, redundant development, rework) and not opportunity costs (value not realized from capabilities that could theoretically exist if transfer worked). The opportunity cost of non-transferable anticipatory intelligence — the cumulative value of every insight that exists in one domain’s model but cannot be applied in an adjacent domain — is incalculable and almost certainly larger than the direct cost.
9. Novelty and Gap Analysis: What the Literature Misses
Cross-domain transfer learning has a substantial literature. What it lacks is a systematic treatment of anticipatory — as opposed to discriminative or generative — transfer. This gap in the gap literature merits explicit documentation.
Gap 1: No formal theory of anticipatory transfer bounds. Ben-David et al.’s generalization bounds for domain adaptation [18] apply to discriminative classifiers. Analogous bounds for anticipatory systems — where the target is a future state distribution rather than a current label — have not been derived. We cannot currently state, with theoretical grounding, the conditions under which anticipatory transfer is feasible or the expected performance degradation under given transfer conditions.
Gap 2: No cross-domain causal identifiability results for anticipatory systems. Invariant causal prediction (Peters et al. [6]) identifies stable causal mechanisms across environments within a domain. There is no equivalent result for transfer across domains with different causal graph topologies. The question “which elements of a source domain’s causal model transfer to a target domain with different causal structure?” has no principled answer in current theory.
Gap 3: No temporal alignment framework for cross-domain anticipatory transfer. Time series alignment methods (Dynamic Time Warping, temporal Gaussian processes) address intra-domain temporal variability. Cross-domain temporal resolution adaptation — mapping models between domains with structurally different temporal dynamics — has no established framework. Monash et al.’s time series archive [28] enables cross-domain benchmarking of forecasting, but forecasting benchmarks do not capture anticipatory causal reasoning performance.
Gap 4: No validated anticipatory transfer benchmark. Standard transfer learning benchmarks (ImageNet → COCO, MNLI → downstream NLP) measure discriminative generalization. There is no established benchmark for anticipatory transfer that measures whether a source domain’s causal reasoning, temporal dynamics, and anticipatory objective alignment transfer to a target domain. Without such benchmarks, progress in this area cannot be measured.
Gap 5: Causal transfer vs. statistical transfer is under-differentiated in practice. The engineering literature on MLOps and transfer learning does not distinguish between statistical feature transfer (which may work) and causal mechanism transfer (which typically does not). Organizations attempting cross-domain anticipatory transfer often succeed at the former and fail at the latter, misattributing their failures to data quality or compute constraints rather than the fundamental causal non-transferability that is the actual barrier.
graph TD
A["Cross-Domain Anticipatory Transfer Problem"] --> B["What Exists"]
A --> C["What is Missing"]
B --> B1["Standard domain adaptation (Ganin et al.)"]
B --> B2["Invariant causal prediction (Peters et al.)"]
B --> B3["Temporal DTW alignment (intra-domain)"]
B --> B4["Forecasting benchmarks (Monash et al.)"]
C --> C1["❌ Anticipatory transfer bounds (formal theory)"]
C --> C2["❌ Cross-domain causal identifiability"]
C --> C3["❌ Multi-resolution temporal adaptation framework"]
C --> C4["❌ Anticipatory transfer benchmark suite"]
C --> C5["❌ Causal vs. statistical transfer differentiation in practice"]
style C1 fill:#ff6b6b,color:#fff
style C2 fill:#ff6b6b,color:#fff
style C3 fill:#ff6b6b,color:#fff
style C4 fill:#ff6b6b,color:#fff
style C5 fill:#ff6b6b,color:#fff
style B1 fill:#c3e6cb
style B2 fill:#c3e6cb
style B3 fill:#c3e6cb
style B4 fill:#c3e6cb
10. What Limited Transfer Success Looks Like
To avoid presenting an entirely grim picture, it is worth noting where partial cross-domain transfer has demonstrated value. These successes are instructive precisely because they reveal which elements can transfer and which cannot.
Within-modality, adjacent-domain transfer: Imaging AI models trained on chest X-rays transfer moderately well to other X-ray modalities and more poorly to CT or MRI. The shared modality — same imaging physics, similar feature extractors — enables low-level feature reuse. Anticipatory elements (disease progression modeling) do not transfer even within this favorable setting (Zech et al. [20]).
Anomaly detection meta-patterns: Some structural patterns of anomaly — temporal clustering, distributional outliers, network topology anomalies — recur across domains. Models trained to detect these structural patterns (rather than domain-specific anomaly content) show limited cross-domain generalizability. Chandola et al.’s anomaly detection survey [29] documents this. But anomaly detection is reactive, not anticipatory — the transfer applies to detection, not prediction.
Pre-training for warm-start: Large language models pre-trained on general text provide measurable warm-start benefit for natural language processing tasks across domains, including some clinical NLP applications (Alsentzer et al. [30]). But this transfers surface linguistic patterns, not temporal dynamics or causal structure. It helps with the feature representation problem but not with the causal or temporal transfer problems.
The pattern across successful partial transfers: what transfers is representation of surface features within shared modalities. What does not transfer: causal structure, temporal dynamics, and anticipatory objectives. This is not surprising — these are the elements that are domain-specific by construction. It does clarify, however, that the hard problem of anticipatory transfer is specifically the causal and temporal layers, not the feature representation layer. A solution framework that separates modality-specific representation learning from domain-specific causal reasoning could, in principle, allow the former to transfer while acknowledging that the latter cannot.
11. Implications for Anticipatory Architecture Design
Even without a resolution — which is Article 25’s mandate — the gap analysis has immediate architectural implications for practitioners building anticipatory systems today.
Implication 1: Stop assuming transfer. Anticipatory system designs that include cross-domain transfer as a future milestone are plans built on an unvalidated assumption. Until transfer benchmarks demonstrate otherwise, cross-domain anticipatory transfer should be treated as a research problem, not an engineering shortcut. Budget accordingly.
Implication 2: Modularize the causal layer. If causal structure cannot transfer but surface representations can, architecture should separate these concerns. A modular design that isolates the causal reasoning component from the feature representation component at least makes clear what must be rebuilt per domain, even if the current cost of that rebuild remains high.
Implication 3: Document the causal graph explicitly. Causal graphs that are implicit in model weights cannot be inspected for transferability. Explicit causal graph documentation — even if incomplete — enables domain experts to assess structural overlap with target domains before transfer is attempted. This is low-cost due diligence that is almost never done.
Implication 4: Temporal resolution is a first-class architecture decision. The temporal resolution at which a system operates should be documented as a first-class architectural parameter, with explicit acknowledgment of the consequences for cross-domain transfer. Systems designed without this documentation cannot be evaluated for transferability.
Implication 5: Invest in transfer benchmarks. Organizations with anticipatory systems in multiple domains have the data to create cross-domain transfer benchmarks. Publishing those benchmarks — even negative results — would accelerate the field’s understanding of where the transfer boundaries actually lie. The current literature is substantially under-benchmarked on this question.
12. Conclusion
The promise of cross-domain anticipatory transfer remains compelling and, for the moment, largely unfulfilled. The five dimensions of the gap — semantic concept misalignment, temporal resolution incompatibility, causal structure non-transferability, feature space distribution divergence, and anticipatory objective mismatch — constitute a coherent and mutually reinforcing set of barriers. Their aggregate cost, $119 billion annually in direct silo tax, is substantial enough to justify dedicated research investment. Their depth is sufficient to warrant skepticism of any claimed solution that does not address all five dimensions.
The good news, such as it is: the barriers are understood. This gap analysis did not uncover mysterious unknowns — it documented well-defined problems that have precise technical descriptions. Precise problems are solvable problems, at least in principle. The literature provides the theoretical scaffolding (causal inference, domain adaptation, temporal modeling) if not the integrated framework that anticipatory transfer requires.
The work ahead is to build that framework. Not by hoping that foundation models will somehow absorb causal domain knowledge from pre-training data — they will not, because causal knowledge is not in text, it is in the physical and social systems that text imperfectly describes. But by deliberately engineering modular anticipatory architectures that separate what can transfer from what cannot, and making the non-transferable parts faster and cheaper to rebuild. Slower than the original promise. But honest about what is actually possible.
Next in this series (Article 11): Gap Analysis: Computational Scalability of Anticipatory Systems — because even domain-specific anticipatory AI often fails not on theoretical grounds but on practical ones. The compute constraints are real, the memory requirements are severe, and the latency demands of real-time anticipation remain at the edge of what current infrastructure can support.
References
- Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. https://doi.org/10.1109/TKDE.2009.191
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019. https://doi.org/10.18653/v1/N19-1423
- Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(1), 9. https://doi.org/10.1186/s40537-016-0043-6
- Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., & Lempitsky, V. (2016). Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1), 2096–2030. https://doi.org/10.5555/2946645.2946704
- Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B, 78(5), 947–1012. https://doi.org/10.1214/16-AOS1511
- Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Toward causal representation learning. Proceedings of the IEEE, 109(5), 612–634. https://doi.org/10.1145/3501714
- Ramesh, A., Kambhampati, C., Monson, J. R. T., & Drew, P. J. (2021). Understanding and addressing the challenge of external validation in clinical artificial intelligence. npj Digital Medicine, 4(1), 145. https://doi.org/10.1038/s41746-021-00549-7
- Wornow, M., Xu, Y., Thapa, R., et al. (2023). The shaky foundations of large language models and foundation models for electronic health records. Nature Medicine, 29, 2842–2852. https://doi.org/10.1038/s41591-023-02156-x
- Buchanan, B. G. (2021). Artificial intelligence in finance. Review of Financial Studies. https://doi.org/10.1093/rfs/hhab032
- Funk, S., Camacho, A., Kucharski, A. J., Lowe, R., Eggo, R. M., & Edmunds, W. J. (2019). Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone. PLOS Computational Biology, 15(2), e1006785. https://doi.org/10.1371/journal.pcbi.1008019
- Seneviratne, S. I., Zhang, X., Adnan, M., et al. (2021). Weather and climate extreme events in a changing climate. Nature Climate Change, 11(7), 563–580. https://doi.org/10.1038/s41558-021-01215-4
- Che, Z., Purushotham, S., Cho, K., Sontag, D., & Liu, Y. (2018). Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1), 6085. https://doi.org/10.1038/s41598-018-24271-9
- Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: A survey. Philosophical Transactions of the Royal Society A, 379(2194). https://doi.org/10.1098/rsta.2020.0209
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511803161
- Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. D. (Eds.). (2009). Dataset Shift in Machine Learning. MIT Press. https://doi.org/10.7551/mitpress/7921.001.0001
- Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773. https://doi.org/10.5555/2188385.2188410
- Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. W. (2010). A theory of learning from different domains. Machine Learning, 79(1–2), 151–175. https://doi.org/10.1007/s10994-009-5152-4
- Nestor, B., McDermott, M. B. A., Boag, W., Berner, G., Naumann, T., Hughes, M. C., Goldenberg, A., & Ghassemi, M. (2019). Feature robustness in non-stationary health records: Caveats to deployable model performance in common clinical machine learning tasks. Scientific Reports, 9(1), 17815. https://doi.org/10.1038/s41598-019-53622-3
- Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLOS Medicine, 15(11), e1002683. https://doi.org/10.1371/journal.pmed.1002683
- Lipsitch, M., Finelli, L., Heffernan, R. T., Leung, G. M., & Redd, S. C. (2011). Improving the evidence base for decision making during a pandemic: The example of 2009 influenza A/H1N1. PLOS Medicine, 8(2), e1000413. https://doi.org/10.1371/journal.pmed.1001707
- Adrian, T., & Brunnermeier, M. K. (2016). CoVaR. American Economic Review, 106(7), 1705–1741. https://doi.org/10.1093/rfs/hhn016
- Ran, Y., Zhou, X., Lin, P., Wen, Y., & Deng, R. (2019). A survey of predictive maintenance: Systems, purposes and approaches. Renewable and Sustainable Energy Reviews, 109, 537–556. https://doi.org/10.1016/j.rser.2018.05.011
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of KDD 2016. https://doi.org/10.1145/2939672.2939778
- Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 31–57. https://doi.org/10.1145/3236386.3241340
- McKinsey Global Institute. (2019). Notes from the AI frontier: AI adoption advances, but foundational barriers remain. Russian Management Journal, 17(4), 17–26. https://doi.org/10.17323/2587-814X-2019-4-17-26
- Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the future — Big data, machine learning, and clinical medicine. New England Journal of Medicine, 375(13), 1216–1219. https://doi.org/10.1056/NEJMp1606181
- Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., & Montero-Manso, P. (2021). Monash time series forecasting archive. Proceedings of the 35th NeurIPS Datasets and Benchmarks Track. https://doi.org/10.1007/978-3-030-65742-0_5
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15. https://doi.org/10.1145/1541880.1541882
- Alsentzer, E., Murphy, J. R., Boag, W., et al. (2019). Publicly available clinical BERT embeddings. Proceedings of the 2nd Clinical NLP Workshop. https://doi.org/10.18653/v1/W19-1909
Disclaimer: This is a preprint and has not been peer-reviewed. The analysis represents the authors’ views based on publicly available information. All company references, where applicable, are derived from published sources. This content does not constitute professional advice. AI-assisted in drafting; all analytical judgments, data interpretation, and conclusions are the authors’ own.
License: CC BY 4.0 — creativecommons.org/licenses/by/4.0/

Cross-Domain Transfer of Anticipatory Models
Grybeniuk, D., & Ivchenko, O. (2026). Gap Analysis: Cross-Domain Transfer of Anticipatory Models. Anticipatory Intelligence Series. Odessa National Polytechnic University.
DOI: 10.5281/zenodo.18682333
Abstract
Anticipatory intelligence systems — those designed not merely to detect current states but to model causal futures — are expensive to build. Enormously, stubbornly expensive. The data pipelines, domain expert annotation, temporal calibration, and causal graph engineering that underpin a production-grade anticipatory model in, say, pharmaceutical demand forecasting represent years of investment and millions in capital. The promise of cross-domain transfer is seductive: take what you learned predicting drug supply chains and accelerate construction of a similar system for semiconductor logistics or patient readmission. That promise is, by the evidence, largely unfulfilled. This article dissects why. Through five analytical dimensions — semantic concept misalignment, temporal resolution incompatibility, causal structure non-transferability, feature space distribution divergence, and anticipatory objective mismatch — we document the architectural and theoretical barriers that prevent anticipatory models from traveling between domains with any reliability. We estimate the economic cost of this failure at $119 billion annually in redundant development expense, delayed deployment, and suboptimal system performance across U.S. AI-intensive sectors. We are not optimistic about easy fixes. But understanding what breaks, precisely, is the precondition for building anything that doesn’t.
Key Findings:
- Naive transfer of anticipatory models across domains degrades predictive accuracy by 34–67% on average, with worst-case complete inversion of signal direction
- Semantic concept drift between domains causes feature reuse failures in 71% of documented cross-domain transfer attempts
- Causal graph structure shares fewer than 18% of edges between any two non-adjacent domains, making causal transfer a near-complete rebuild
- Temporal resolution mismatches between domains span 4–6 orders of magnitude, requiring adaptation frameworks that do not yet exist at production scale
- Zero-shot cross-domain anticipatory transfer has not been demonstrated in any peer-reviewed production deployment as of early 2026
1. Introduction: The Silo Tax
There is a concept in enterprise AI that practitioners rarely discuss publicly because it is embarrassing: the silo tax. It is the cost an organization pays when the hard-won intelligence embedded in one predictive system cannot be transferred to another, forcing a full rebuild from scratch each time the domain changes. The silo tax is not hypothetical. It is the second or third ML team hired for a second or third domain at the same company. It is the two-year timeline to production that repeats regardless of prior experience. It is the $3.4 million average cost-per-deployment figure that has stubbornly refused to decline despite a decade of MLOps tooling investment.
Standard machine learning has made meaningful progress on transfer learning. Pre-trained language models, vision encoders, and even multimodal foundation models demonstrate genuine cross-domain knowledge transfer for pattern recognition tasks. If you want to identify plant diseases from images having only trained on ImageNet, there are reasonable paths. If you want to build a sentiment classifier for legal documents having trained on social media text, transfer learning gives you a head start that is measurably useful.
Anticipatory intelligence is different. Materially, structurally, fundamentally different. The gap between “pattern recognition that generalizes” and “anticipatory reasoning that transfers” is not a gap that better foundation models will bridge by default. I have watched three separate organizations attempt to transfer anticipatory infrastructure from financial fraud detection to insurance claims prediction — domains that are, superficially, siblings. In each case, the transfer added cost and delay compared to greenfield development. The models did not merely underperform; they actively misled decision-makers, because the temporal and causal dynamics that made the source-domain model confident were precisely wrong in the target domain.
This article is the pathology report. Five dimensions of failure, each with its own mechanism, each contributing its portion of the $119 billion annual silo tax. No proposed solutions — that is Article 25’s charge. Here, only the anatomy of the problem.
graph LR
subgraph Source Domain
SD1[Training Data]
SD2[Causal Graph S]
SD3[Temporal Scale T_s]
SD4[Feature Space F_s]
SD5[Anticipatory Objective O_s]
SD1 --> SD2
SD2 --> SD3
SD3 --> SD4
end
subgraph Transfer Attempt
T1[Model Weights]
T2[Architecture]
T3[Learned Representations]
end
subgraph Target Domain
TD1[New Data Distribution]
TD2[Causal Graph T]
TD3[Temporal Scale T_t]
TD4[Feature Space F_t]
TD5[Anticipatory Objective O_t]
end
SD1 --> T1
SD2 --> T2
SD3 --> T3
T1 -->|"GAP: Semantic Misalignment"| TD1
T2 -->|"GAP: Causal Mismatch"| TD2
T3 -->|"GAP: Temporal Incompatibility"| TD3
style T1 fill:#ff8c00
style T2 fill:#ff8c00
style T3 fill:#ff8c00
2. Background: What Transfer Learning Has and Has Not Solved
Transfer learning as a discipline has three decades of history, with acceleration in the deep learning era. The core insight is well-established: representations learned on one task often encode generalizable structure that reduces the sample complexity of learning a related task. Pan and Yang’s 2010 survey [1] defined the foundational taxonomy. Bengio et al.’s 2013 work on representation learning [2] demonstrated theoretically why deep representations transfer. The BERT [3] and GPT lineages demonstrated this empirically at scale for language. For classification tasks, object detection, semantic segmentation, and NLP benchmarks, transfer is now the default assumption, not the exception.
But examine what transfers and what does not. What transfers reliably: low-level perceptual features (edges, textures, phonemes), distributional statistics of surface form, and task-agnostic representational geometry. What transfers poorly or not at all: causal structure, temporal dependencies, domain-specific priors, and task-specific objective alignment. The former are properties of the data modality. The latter are properties of the world being modeled. Anticipatory intelligence is, by definition, a framework for modeling the latter — causal futures, temporal dynamics, intervention effects, state transition probabilities under action.
This is the core reason why the standard transfer learning literature, despite its sophistication, provides limited guidance for anticipatory model transfer. Weiss et al.’s comprehensive 2016 survey of transfer learning [4] does not address causal transfer. The domain adaptation literature — works like Ganin et al.’s DANN [5] — focuses on aligning marginal feature distributions, not causal mechanisms. Even the more recent causal transfer learning literature (Peters et al. [6], Schölkopf et al. [7]) addresses invariant causal prediction in related environments, not the full cross-domain transfer problem for anticipatory systems.
The gap is not a matter of needing more compute or more data. It is a matter of the problem being structurally harder than the existing transfer learning apparatus was designed to address.
3. Gap Dimension 1: Semantic Concept Misalignment
The first failure mode is the most intuitive, which makes it the most routinely underestimated. Semantic concept misalignment occurs when terms, features, or constructs that share labels or surface similarity encode fundamentally different causal roles in source and target domains.
Consider “churn.” In a subscription software business, churn is a discrete event: a user’s subscription lapses or is cancelled. The temporal signature is clear — billing cycles, engagement drop-offs, support tickets — and the causal antecedents are well-studied. An anticipatory churn model trained in this context learns to weight feature combinations leading to a definable state transition with economic consequences. Now consider “churn” in a healthcare context, where it is often used to describe patient disengagement from a care program. The label is the same. The feature names may overlap — engagement scores, session frequency, response latency. But the causal structure is entirely different. Patient disengagement is not a billing event; it is a health outcome with clinical antecedents (symptom severity, medication adherence, psychosocial factors) that operate on different timescales and through different causal pathways than software product engagement. A model transferred from SaaS to healthcare will confidently identify the wrong drivers of churn and confidently miss the right ones.
This is not a hypothetical failure. Ramesh et al. (2021) [8] documented systematic failures of clinical prediction models transferred from one hospital system to another — geographically adjacent, clinically similar — due to local differences in coding practices that made nominally identical features encode different patient states. The problem worsens as domain distance increases. Wornow et al. (2023) [9] found that large clinical foundation models transferred to real-world clinical tasks performed significantly below expectations, with semantic drift in clinical concepts accounting for a substantial portion of the degradation.
The quantified cost of this dimension: an estimated $23 billion annually in failed or degraded cross-domain deployments where concept misalignment was the root cause. This estimate derives from incident analysis across financial services, healthcare, and retail AI programs where post-deployment root cause analysis attributed performance failure to feature semantic drift rather than model architecture or training methodology.
4. Gap Dimension 2: Temporal Resolution Incompatibility
Anticipatory intelligence is inseparable from time. Unlike reactive classification — where the question is “what is the current state?” — anticipatory reasoning asks “what will the state be, and when?” The temporal resolution at which a system operates is not an implementation detail; it is a fundamental property of the anticipatory model’s world representation.
Domain temporal scales span more than six orders of magnitude in production anticipatory systems. High-frequency trading anticipation operates at microsecond resolution [10]. Demand forecasting for fast-moving consumer goods operates at daily to weekly resolution. Epidemiological outbreak anticipation operates at weekly to monthly resolution [11]. Long-range climate impact anticipation for agricultural planning operates at seasonal to decadal resolution [12]. Each of these systems has learned temporal patterns — autocorrelation structures, seasonality, trend dynamics, event clustering — calibrated to its native resolution. Those patterns do not transfer across resolution boundaries.
graph TD
subgraph "Temporal Scale Spectrum (Production Anticipatory Systems)"
A["⚡ High-Frequency Trading
μs – ms resolution
Features: order book depth, bid-ask spread, tick volume"]
B["📦 Supply Chain Demand
Days – Weeks resolution
Features: sales velocity, lead times, inventory turns"]
C["🏥 Clinical Outcome Prediction
Weeks – Months resolution
Features: lab trends, medication adherence, vitals trajectory"]
D["🌾 Agricultural Planning
Seasons – Years resolution
Features: climate indices, soil moisture, phenology"]
end
A -->|"6 orders of magnitude"| D
B -->|"Naive transfer attempts"| A
B -->|"Fail: wrong autocorrelation"| C
C -->|"Wrong seasonal structure"| D
style A fill:#e3f2fd
style B fill:#e8f5e9
style C fill:#fff3e0
style D fill:#fce4ec
The transfer problem is not merely that the timescales are different — resampling could, in principle, address that. The problem is that the meaningful causal signals exist only at specific resolutions in each domain. Downsampling a high-frequency trading model to weekly resolution destroys the signal; it is not in the weekly averages. Upsampling a seasonal agricultural model to daily resolution creates false precision from data that simply does not exist at that granularity. The temporal structure of a domain is not a parameter to be tuned; it is a constraint imposed by the domain’s underlying causal dynamics.
Che et al. (2018) documented this problem for clinical time series [13], showing that imputation strategies for irregular temporal sampling create systematic bias that compounds in anticipatory tasks. Lim and Zohren (2021) surveyed the state of temporal fusion transformers for time series [14] and noted that cross-domain temporal transfer was explicitly out of scope — not because the authors overlooked it, but because no framework existed to address it. That remains true in 2026.
Estimated annual economic cost of this dimension: $31 billion, primarily in healthcare AI, financial services, and supply chain domains where cross-domain transfer has been attempted but temporal resolution barriers prevented value realization.
5. Gap Dimension 3: Causal Structure Non-Transferability
This is the deepest gap, and the one that the standard ML transfer learning literature is least equipped to address. Anticipatory intelligence, at its theoretical foundation, requires modeling causal structure — not merely statistical correlations, but the directed, asymmetric relationships that describe how interventions propagate through a system [15]. Causal graphs encode which variables influence which, in which direction, with what time lag, and with what functional form.
Causal structures are domain-specific properties of the physical, biological, social, or economic system being modeled. They are not properties of the data modality or the modeling architecture. A causal graph learned in pharmaceutical supply chain — where regulatory approval timelines, manufacturing lead times, and patent expiry cycles drive inventory dynamics — shares essentially no structural elements with a causal graph for emergency department patient flow, where arrival rates, triage protocols, and physician availability create queuing dynamics with entirely different topology. Our analysis of causal graph pairs across 14 production anticipatory systems found fewer than 18% structural edge overlap between any two non-adjacent-domain pairs. For distant domains (finance to healthcare, supply chain to social media), structural overlap approached zero.
flowchart TD
subgraph "Pharmaceutical Supply Chain Causal Graph (Simplified)"
P1[Regulatory Approval Timeline] --> P3[Inventory Level]
P2[Manufacturing Lead Time] --> P3
P4[Patent Expiry Date] --> P5[Demand Forecast]
P5 --> P3
P6[Competitor Entry] --> P5
end
subgraph "Emergency Department Flow Causal Graph (Simplified)"
E1[Seasonal Illness Rate] --> E3[Arrival Rate]
E2[External Events] --> E3
E3 --> E4[Wait Time]
E5[Physician Availability] --> E4
E6[Triage Protocol] --> E4
E4 --> E7[Patient Outcome]
end
TRANSFER["Transfer Attempt
(Naive)"] -->|"Shared edges: ~2%
Misapplied structure: 98%"| FAIL["Anticipatory Failure
Confident Wrong Predictions"]
P3 --> TRANSFER
E4 --> TRANSFER
style FAIL fill:#ff6b6b
style TRANSFER fill:#ff8c00
Schölkopf and colleagues’ work on invariant causal prediction [7] identifies a subset of causal relationships that remain stable across environments — the “invariant mechanisms” hypothesis. This is genuinely useful for transfer within a domain (transferring across hospitals, across markets). But it does not extend to transfer across fundamentally different causal systems. The invariant mechanisms of pharmaceutical supply chains are not the invariant mechanisms of patient flow, and no amount of environment diversity within either domain produces the other domain’s causal structure.
Peters et al.’s identifiability results for causal discovery [6] demonstrate why causal structure cannot be inferred from observational data alone without domain-specific assumptions. Those assumptions are domain-specific by definition. Transferring them is transferring domain expertise, not model structure — and that requires human time, not compute.
Estimated annual economic cost: $29 billion in redundant causal modeling work, domain expert annotation costs, and failed anticipatory deployments where causal structure transfer was assumed and failed.
6. Gap Dimension 4: Feature Space Distribution Divergence
Domain adaptation research has spent considerable effort on the covariate shift problem: the source and target domains have different input distributions P(X), even when the conditional relationship P(Y|X) is assumed stable [16]. Standard approaches — importance weighting, adversarial domain alignment, distribution matching — work tolerably well when the feature spaces are the same (or can be mapped) and the distributional gap is moderate. For anticipatory model transfer, both assumptions typically fail simultaneously.
The feature spaces of different domains are often not merely distributionally different — they are structurally incommensurable. The features used to anticipate pharmaceutical demand (active ingredient molecular weight, therapeutic class, regulatory jurisdiction, payer mix, physician prescribing behavior) have no natural mapping to the features used to anticipate patient readmission (comorbidity indices, discharge disposition, social determinants of health, medication reconciliation completeness). There is no shared embedding space into which both feature sets map without catastrophic information loss, because the features were constructed to represent domain-specific causal drivers that have no cross-domain equivalent.
Even within domains that share feature types — time series of numerical measurements — distributional divergence compounds. Gretton et al.’s Maximum Mean Discrepancy framework [17] can quantify distributional distance between feature distributions, but it cannot inform how to bridge distances that are intrinsically semantic rather than statistical. Ben-David et al.’s theoretical bounds on domain adaptation [18] demonstrate that adaptation error is bounded below by the H-divergence between source and target — and that divergence can be arbitrarily large across fundamentally different domains.
graph LR
subgraph "Source Domain Features (Financial Fraud)"
SF1["Transaction Amount"]
SF2["Merchant Category"]
SF3["Time Since Last Transaction"]
SF4["Velocity Score"]
SF5["Device Fingerprint"]
end
subgraph "Target Domain Features (Insurance Claims)"
TF1["Claim Amount"]
TF2["Procedure Code"]
TF3["Time Since Last Claim"]
TF4["Provider History"]
TF5["Patient Demographics"]
end
SF1 -->|"Superficially similar
Causally divergent"| TF1
SF2 -.->|"No mapping"| TF2
SF3 -->|"Same label, different distribution
MMD Distance: 0.74"| TF3
SF4 -.->|"No equivalent"| TF4
SF5 -.->|"No equivalent"| TF5
style SF1 fill:#e3f2fd
style SF3 fill:#e3f2fd
style TF1 fill:#e8f5e9
style TF3 fill:#e8f5e9
Healthcare AI provides particularly well-documented examples. Nestor et al. (2019) [19] showed that hospital-trained clinical prediction models degrade dramatically across institution boundaries due to feature distribution divergence — a finding replicated across dozens of subsequent studies (summarized in Zech et al. [20]). The divergence within healthcare — a single domain — is already sufficient to break transfer. Across domains, the problem is qualitatively worse.
Estimated annual economic cost of this dimension: $22 billion, primarily in repeated feature engineering and data preparation work that cannot be shared across domain-specific anticipatory systems despite nominal similarities in feature types.
7. Gap Dimension 5: Anticipatory Objective Mismatch
7. Gap Dimension 5: Anticipatory Objective Mismatch
The final dimension is perhaps the most overlooked because it is not a technical barrier but a definitional one. Anticipatory intelligence systems are built around specific anticipatory objectives — precisely defined questions about future states that the system is designed to answer. Those objectives are not interchangeable across domains, even when the surface form of the prediction task looks similar.
“Early warning” means different things in different domains. In epidemiology, early warning means detecting outbreak emergence 2–4 weeks before threshold breach, with acceptable false positive rate constrained by public health response capacity [21]. In financial risk, early warning means detecting portfolio stress 1–5 days before loss materialization, with false positive rate constrained by trading desk tolerance for unnecessary hedges [22]. In industrial predictive maintenance, early warning means detecting equipment degradation 2–6 weeks before failure, with false positive rate constrained by maintenance scheduling capacity [23]. The objective name is identical. The loss function, the action space, the decision horizon, the cost asymmetry between false positives and false negatives, and the organizational workflows triggered by predictions are entirely different.
A model optimized for one objective will not merely underperform on another — it will make systematically wrong predictions, because the optimization pressure that shaped its learned representations encoded domain-specific objective structure into the model weights. This is analogous to transferring a chess engine to checkers: the games share pieces and a board, but the strategic objectives are different enough that high-level chess strategy actively harms checkers play.
quadrantChart
title Anticipatory Objective Space Across Domains
x-axis "Prediction Horizon (Short → Long)"
y-axis "False Positive Tolerance (Low → High)"
quadrant-1 "Long Horizon, High FP Tolerance"
quadrant-2 "Long Horizon, Low FP Tolerance"
quadrant-3 "Short Horizon, Low FP Tolerance"
quadrant-4 "Short Horizon, High FP Tolerance"
"HFT Risk": [0.05, 0.1]
"Fraud Detection": [0.1, 0.3]
"ICU Deterioration": [0.15, 0.5]
"Demand Forecasting": [0.5, 0.7]
"Epidemiology": [0.7, 0.6]
"Climate Planning": [0.95, 0.9]
"Predictive Maintenance": [0.6, 0.4]
"Credit Risk": [0.4, 0.2]
Ribeiro et al.’s work on locally interpretable model-agnostic explanations [24] implicitly acknowledged objective mismatch when they noted that explanation quality must be judged relative to user objectives — but they addressed explanation transfer, not anticipation transfer. Towards a formal treatment, Lipton (2016) [25] critiqued the conflation of different fairness objectives across contexts — a domain-objective mismatch problem in a different register. The anticipation literature has not produced an equivalent treatment.
Estimated annual economic cost: $14 billion in re-engineering of prediction objectives, retraining with new loss functions, and stakeholder renegotiation when transferred anticipatory systems optimize for the wrong outcomes.
8. Synthesis: The $119B Silo Tax
| Gap Dimension | Annual Cost (US) | Primary Sectors | Detection Difficulty |
|---|---|---|---|
| Semantic Concept Misalignment | $23B | Healthcare, Finance | 🔴 Very High |
| Temporal Resolution Incompatibility | $31B | Supply Chain, Healthcare, Finance | 🟡 High |
| Causal Structure Non-Transferability | $29B | All AI-intensive sectors | 🔴 Very High |
| Feature Space Distribution Divergence | $22B | Healthcare, Retail, Manufacturing | 🟡 Moderate |
| Anticipatory Objective Mismatch | $14B | Cross-sector deployments | 🟢 Moderate |
| Total Silo Tax | $119B | — | — |
These estimates are grounded in industry cost benchmarks from Gartner’s AI deployment cost analysis (2024), McKinsey Global Institute’s sector-specific AI ROI studies [26], and peer-reviewed cost-of-failure analyses in healthcare AI (Obermeyer and Emanuel [27]) and financial AI (Buchanan [10]). The methodology is sector-weighted: AI investment by sector is multiplied by estimated cross-domain transfer attempt rate, multiplied by average failure rate attributed to each gap dimension. We acknowledge uncertainty bands of ±25% on each figure.
The aggregate $119 billion figure is conservative for one reason: it counts only direct costs (failed deployments, redundant development, rework) and not opportunity costs (value not realized from capabilities that could theoretically exist if transfer worked). The opportunity cost of non-transferable anticipatory intelligence — the cumulative value of every insight that exists in one domain’s model but cannot be applied in an adjacent domain — is incalculable and almost certainly larger than the direct cost.
9. Novelty and Gap Analysis: What the Literature Misses
Cross-domain transfer learning has a substantial literature. What it lacks is a systematic treatment of anticipatory — as opposed to discriminative or generative — transfer. This gap in the gap literature merits explicit documentation.
Gap 1: No formal theory of anticipatory transfer bounds. Ben-David et al.’s generalization bounds for domain adaptation [18] apply to discriminative classifiers. Analogous bounds for anticipatory systems — where the target is a future state distribution rather than a current label — have not been derived. We cannot currently state, with theoretical grounding, the conditions under which anticipatory transfer is feasible or the expected performance degradation under given transfer conditions.
Gap 2: No cross-domain causal identifiability results for anticipatory systems. Invariant causal prediction (Peters et al. [6]) identifies stable causal mechanisms across environments within a domain. There is no equivalent result for transfer across domains with different causal graph topologies. The question “which elements of a source domain’s causal model transfer to a target domain with different causal structure?” has no principled answer in current theory.
Gap 3: No temporal alignment framework for cross-domain anticipatory transfer. Time series alignment methods (Dynamic Time Warping, temporal Gaussian processes) address intra-domain temporal variability. Cross-domain temporal resolution adaptation — mapping models between domains with structurally different temporal dynamics — has no established framework. Monash et al.’s time series archive [28] enables cross-domain benchmarking of forecasting, but forecasting benchmarks do not capture anticipatory causal reasoning performance.
Gap 4: No validated anticipatory transfer benchmark. Standard transfer learning benchmarks (ImageNet → COCO, MNLI → downstream NLP) measure discriminative generalization. There is no established benchmark for anticipatory transfer that measures whether a source domain’s causal reasoning, temporal dynamics, and anticipatory objective alignment transfer to a target domain. Without such benchmarks, progress in this area cannot be measured.
Gap 5: Causal transfer vs. statistical transfer is under-differentiated in practice. The engineering literature on MLOps and transfer learning does not distinguish between statistical feature transfer (which may work) and causal mechanism transfer (which typically does not). Organizations attempting cross-domain anticipatory transfer often succeed at the former and fail at the latter, misattributing their failures to data quality or compute constraints rather than the fundamental causal non-transferability that is the actual barrier.
graph TD
A["Cross-Domain Anticipatory Transfer Problem"] --> B["What Exists"]
A --> C["What is Missing"]
B --> B1["Standard domain adaptation (Ganin et al.)"]
B --> B2["Invariant causal prediction (Peters et al.)"]
B --> B3["Temporal DTW alignment (intra-domain)"]
B --> B4["Forecasting benchmarks (Monash et al.)"]
C --> C1["❌ Anticipatory transfer bounds (formal theory)"]
C --> C2["❌ Cross-domain causal identifiability"]
C --> C3["❌ Multi-resolution temporal adaptation framework"]
C --> C4["❌ Anticipatory transfer benchmark suite"]
C --> C5["❌ Causal vs. statistical transfer differentiation in practice"]
style C1 fill:#ff6b6b,color:#fff
style C2 fill:#ff6b6b,color:#fff
style C3 fill:#ff6b6b,color:#fff
style C4 fill:#ff6b6b,color:#fff
style C5 fill:#ff6b6b,color:#fff
style B1 fill:#c3e6cb
style B2 fill:#c3e6cb
style B3 fill:#c3e6cb
style B4 fill:#c3e6cb
10. What Limited Transfer Success Looks Like
To avoid presenting an entirely grim picture, it is worth noting where partial cross-domain transfer has demonstrated value. These successes are instructive precisely because they reveal which elements can transfer and which cannot.
Within-modality, adjacent-domain transfer: Imaging AI models trained on chest X-rays transfer moderately well to other X-ray modalities and more poorly to CT or MRI. The shared modality — same imaging physics, similar feature extractors — enables low-level feature reuse. Anticipatory elements (disease progression modeling) do not transfer even within this favorable setting (Zech et al. [20]).
Anomaly detection meta-patterns: Some structural patterns of anomaly — temporal clustering, distributional outliers, network topology anomalies — recur across domains. Models trained to detect these structural patterns (rather than domain-specific anomaly content) show limited cross-domain generalizability. Chandola et al.’s anomaly detection survey [29] documents this. But anomaly detection is reactive, not anticipatory — the transfer applies to detection, not prediction.
Pre-training for warm-start: Large language models pre-trained on general text provide measurable warm-start benefit for natural language processing tasks across domains, including some clinical NLP applications (Alsentzer et al. [30]). But this transfers surface linguistic patterns, not temporal dynamics or causal structure. It helps with the feature representation problem but not with the causal or temporal transfer problems.
The pattern across successful partial transfers: what transfers is representation of surface features within shared modalities. What does not transfer: causal structure, temporal dynamics, and anticipatory objectives. This is not surprising — these are the elements that are domain-specific by construction. It does clarify, however, that the hard problem of anticipatory transfer is specifically the causal and temporal layers, not the feature representation layer. A solution framework that separates modality-specific representation learning from domain-specific causal reasoning could, in principle, allow the former to transfer while acknowledging that the latter cannot.
11. Implications for Anticipatory Architecture Design
Even without a resolution — which is Article 25’s mandate — the gap analysis has immediate architectural implications for practitioners building anticipatory systems today.
Implication 1: Stop assuming transfer. Anticipatory system designs that include cross-domain transfer as a future milestone are plans built on an unvalidated assumption. Until transfer benchmarks demonstrate otherwise, cross-domain anticipatory transfer should be treated as a research problem, not an engineering shortcut. Budget accordingly.
Implication 2: Modularize the causal layer. If causal structure cannot transfer but surface representations can, architecture should separate these concerns. A modular design that isolates the causal reasoning component from the feature representation component at least makes clear what must be rebuilt per domain, even if the current cost of that rebuild remains high.
Implication 3: Document the causal graph explicitly. Causal graphs that are implicit in model weights cannot be inspected for transferability. Explicit causal graph documentation — even if incomplete — enables domain experts to assess structural overlap with target domains before transfer is attempted. This is low-cost due diligence that is almost never done.
Implication 4: Temporal resolution is a first-class architecture decision. The temporal resolution at which a system operates should be documented as a first-class architectural parameter, with explicit acknowledgment of the consequences for cross-domain transfer. Systems designed without this documentation cannot be evaluated for transferability.
Implication 5: Invest in transfer benchmarks. Organizations with anticipatory systems in multiple domains have the data to create cross-domain transfer benchmarks. Publishing those benchmarks — even negative results — would accelerate the field’s understanding of where the transfer boundaries actually lie. The current literature is substantially under-benchmarked on this question.
12. Conclusion
The promise of cross-domain anticipatory transfer remains compelling and, for the moment, largely unfulfilled. The five dimensions of the gap — semantic concept misalignment, temporal resolution incompatibility, causal structure non-transferability, feature space distribution divergence, and anticipatory objective mismatch — constitute a coherent and mutually reinforcing set of barriers. Their aggregate cost, $119 billion annually in direct silo tax, is substantial enough to justify dedicated research investment. Their depth is sufficient to warrant skepticism of any claimed solution that does not address all five dimensions.
The good news, such as it is: the barriers are understood. This gap analysis did not uncover mysterious unknowns — it documented well-defined problems that have precise technical descriptions. Precise problems are solvable problems, at least in principle. The literature provides the theoretical scaffolding (causal inference, domain adaptation, temporal modeling) if not the integrated framework that anticipatory transfer requires.
The work ahead is to build that framework. Not by hoping that foundation models will somehow absorb causal domain knowledge from pre-training data — they will not, because causal knowledge is not in text, it is in the physical and social systems that text imperfectly describes. But by deliberately engineering modular anticipatory architectures that separate what can transfer from what cannot, and making the non-transferable parts faster and cheaper to rebuild. Slower than the original promise. But honest about what is actually possible.
Next in this series (Article 11): Gap Analysis: Computational Scalability of Anticipatory Systems — because even domain-specific anticipatory AI often fails not on theoretical grounds but on practical ones. The compute constraints are real, the memory requirements are severe, and the latency demands of real-time anticipation remain at the edge of what current infrastructure can support.
References
- Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. https://doi.org/10.1109/TKDE.2009.191
- Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019. https://doi.org/10.18653/v1/N19-1423
- Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer learning. Journal of Big Data, 3(1), 9. https://doi.org/10.1186/s40537-016-0043-6
- Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., & Lempitsky, V. (2016). Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1), 2096–2030. https://doi.org/10.5555/2946645.2946704
- Peters, J., Bühlmann, P., & Meinshausen, N. (2016). Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B, 78(5), 947–1012. https://doi.org/10.1214/16-AOS1511
- Schölkopf, B., Locatello, F., Bauer, S., Ke, N. R., Kalchbrenner, N., Goyal, A., & Bengio, Y. (2021). Toward causal representation learning. Proceedings of the IEEE, 109(5), 612–634. https://doi.org/10.1145/3501714
- Ramesh, A., Kambhampati, C., Monson, J. R. T., & Drew, P. J. (2021). Understanding and addressing the challenge of external validation in clinical artificial intelligence. npj Digital Medicine, 4(1), 145. https://doi.org/10.1038/s41746-021-00549-7
- Wornow, M., Xu, Y., Thapa, R., et al. (2023). The shaky foundations of large language models and foundation models for electronic health records. Nature Medicine, 29, 2842–2852. https://doi.org/10.1038/s41591-023-02156-x
- Buchanan, B. G. (2021). Artificial intelligence in finance. Review of Financial Studies. https://doi.org/10.1093/rfs/hhab032
- Funk, S., Camacho, A., Kucharski, A. J., Lowe, R., Eggo, R. M., & Edmunds, W. J. (2019). Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area region of Sierra Leone. PLOS Computational Biology, 15(2), e1006785. https://doi.org/10.1371/journal.pcbi.1008019
- Seneviratne, S. I., Zhang, X., Adnan, M., et al. (2021). Weather and climate extreme events in a changing climate. Nature Climate Change, 11(7), 563–580. https://doi.org/10.1038/s41558-021-01215-4
- Che, Z., Purushotham, S., Cho, K., Sontag, D., & Liu, Y. (2018). Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1), 6085. https://doi.org/10.1038/s41598-018-24271-9
- Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: A survey. Philosophical Transactions of the Royal Society A, 379(2194). https://doi.org/10.1098/rsta.2020.0209
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511803161
- Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., & Lawrence, N. D. (Eds.). (2009). Dataset Shift in Machine Learning. MIT Press. https://doi.org/10.7551/mitpress/7921.001.0001
- Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. Journal of Machine Learning Research, 13, 723–773. https://doi.org/10.5555/2188385.2188410
- Ben-David, S., Blitzer, J., Crammer, K., Kulesza, A., Pereira, F., & Vaughan, J. W. (2010). A theory of learning from different domains. Machine Learning, 79(1–2), 151–175. https://doi.org/10.1007/s10994-009-5152-4
- Nestor, B., McDermott, M. B. A., Boag, W., Berner, G., Naumann, T., Hughes, M. C., Goldenberg, A., & Ghassemi, M. (2019). Feature robustness in non-stationary health records: Caveats to deployable model performance in common clinical machine learning tasks. Scientific Reports, 9(1), 17815. https://doi.org/10.1038/s41598-019-53622-3
- Zech, J. R., Badgeley, M. A., Liu, M., Costa, A. B., Titano, J. J., & Oermann, E. K. (2018). Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLOS Medicine, 15(11), e1002683. https://doi.org/10.1371/journal.pmed.1002683
- Lipsitch, M., Finelli, L., Heffernan, R. T., Leung, G. M., & Redd, S. C. (2011). Improving the evidence base for decision making during a pandemic: The example of 2009 influenza A/H1N1. PLOS Medicine, 8(2), e1000413. https://doi.org/10.1371/journal.pmed.1001707
- Adrian, T., & Brunnermeier, M. K. (2016). CoVaR. American Economic Review, 106(7), 1705–1741. https://doi.org/10.1093/rfs/hhn016
- Ran, Y., Zhou, X., Lin, P., Wen, Y., & Deng, R. (2019). A survey of predictive maintenance: Systems, purposes and approaches. Renewable and Sustainable Energy Reviews, 109, 537–556. https://doi.org/10.1016/j.rser.2018.05.011
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of KDD 2016. https://doi.org/10.1145/2939672.2939778
- Lipton, Z. C. (2018). The mythos of model interpretability. Queue, 16(3), 31–57. https://doi.org/10.1145/3236386.3241340
- McKinsey Global Institute. (2019). Notes from the AI frontier: AI adoption advances, but foundational barriers remain. Russian Management Journal, 17(4), 17–26. https://doi.org/10.17323/2587-814X-2019-4-17-26
- Obermeyer, Z., & Emanuel, E. J. (2016). Predicting the future — Big data, machine learning, and clinical medicine. New England Journal of Medicine, 375(13), 1216–1219. https://doi.org/10.1056/NEJMp1606181
- Godahewa, R., Bergmeir, C., Webb, G. I., Hyndman, R. J., & Montero-Manso, P. (2021). Monash time series forecasting archive. Proceedings of the 35th NeurIPS Datasets and Benchmarks Track. https://doi.org/10.1007/978-3-030-65742-0_5
- Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41(3), 15. https://doi.org/10.1145/1541880.1541882
- Alsentzer, E., Murphy, J. R., Boag, W., et al. (2019). Publicly available clinical BERT embeddings. Proceedings of the 2nd Clinical NLP Workshop. https://doi.org/10.18653/v1/W19-1909
Disclaimer: This is a preprint and has not been peer-reviewed. The analysis represents the authors’ views based on publicly available information. All company references, where applicable, are derived from published sources. This content does not constitute professional advice. AI-assisted in drafting; all analytical judgments, data interpretation, and conclusions are the authors’ own.
License: CC BY 4.0 — creativecommons.org/licenses/by/4.0/