Funding the U.S. Scientific Training Ecosystem: New Data, Methods, and Evidence

Authors and affiliations

Dror Shvadron (University of Toronto, Rotman School of Management)

Hansen Zhang (Duke University, Fuqua School of Business)

Lee Fleming (UC Berkeley, Haas School of Business)

Daniel P. Gross (Duke University, Fuqua School of Business; NBER)

Draft Date

June 12, 2025

NBER Working Paper Data

Every year, the United States produces roughly 30,000 new STEM PhDs—the scientists and engineers who will pioneer the next generation of discoveries in artificial intelligence, biotechnology, quantum computing, and countless other fields. Yet despite the critical role these researchers will play in advancing human knowledge, we lack a comprehensive understanding of the funding landscape that makes their training possible.

Using data from the near-population of U.S. STEM PhD dissertations since 1950, this research provides the first comprehensive mapping of doctoral funding sources across seven decades. Our analysis creates a new, publicly available dataset that enables systematic research into how scientific training is financed, with implications that extend far beyond individual career outcomes to the very structure of knowledge production in America.

Who funds PhD training in the United States?

We find that the U.S. federal government is by far the largest sponsor of STEM PhD training. Over 40% of graduates acknowledge direct governmental support, compared to roughly 10% from industry and 15% from non-profits. The National Science Foundation and National Institutes of Health alone are each acknowledged by more PhD graduates than the entire commercial sector. This pattern of government dominance holds across most universities and fields, though we observe notable variation—with subjects like astronomy heavily government-supported while pharmaceutical sciences receive more industry funding.

The degree of federal dominance varies considerably across scientific fields. While government support is substantial in most areas, the balance between federal and industry funding reflects both scientific priorities and commercial interests. For instance, automotive engineering and pharmaceutical science receive more industry support than government support, while astronomy and astrophysics are heavily government-supported with essentially no private backing. Some fields, such as geology and materials science, draw a balanced mix from both sectors.

Figure 2: Share of graduates with government vs. industry support, by subject, 2000-2022

These patterns become clearer when examining specific funding organizations. The following table shows that traditional government agencies like NSF, NIH, and DoD each dwarf private sector and philanthropic contributions to doctoral education, despite the prominence of major technology companies and foundations in public discourse about scientific research.

Table 1: Top 15 acknowledged organizations, by sector, 2000-2022

| Government agencies | Count | Firms | Count | Non-profit organizations | Count |
| --- | --- | --- | --- | --- | --- |
| National Science Foundation | 91,895 | Intel | 2,276 | Howard Hughes Medical Institute | 3,733 |
| Department of Health and Human Services | 78,033 | IBM | 1,943 | American Heart Association | 3,677 |
| Department of Defense | 34,103 | Merck | 1,417 | Sigma Xi | 2,790 |
| Department of Energy | 30,544 | Google | 1,300 | American Cancer Society | 1,568 |
| National Aeronautics and Space Administration | 15,044 | Microsoft | 1,221 | American Chemical Society | 1,453 |
| Department of Agriculture | 13,455 | Pfizer | 1,180 | Geological Society of America | 1,388 |
| Department of Commerce | 8,928 | General Electric | 980 | Robert Wood Johnson Foundation | 1,181 |
| Department of the Interior | 6,890 | DuPont | 875 | W. M. Keck Foundation | 1,104 |
| Environmental Protection Agency | 5,638 | Dow Chemical | 847 | Fulbright Program | 953 |
| Department of Transportation | 5,292 | Eli Lilly | 822 | David and Lucile Packard Foundation | 943 |
| Department of Education | 4,615 | Chevron | 821 | Welch Foundation | 910 |
| Department of State | 4,248 | ExxonMobil | 775 | Burroughs Wellcome Fund | 897 |
| Department of Veterans Affairs | 2,220 | GlaxoSmithKline | 765 | Gordon and Betty Moore Foundation | 856 |
| Agency for International Development | 1,617 | Novartis | 753 | Ford Foundation | 818 |
| Department of Homeland Security | 1,369 | Boeing | 718 | National Geographic Society | 760 |

PhD production in critical technology areas

Using our classification of graduates into 18 critical technology areas identified by the White House, we map the institutional landscape that trains scientists in AI, quantum computing, biotechnology, and other strategically important fields. MIT, Stanford, and UC Berkeley emerge as the top producers across multiple technology areas, while federal agencies, led by NSF and DoD, are the primary funders in nearly every critical technology domain. The data reveal both the concentration of critical technology training at elite institutions and the government’s outsized role in developing national technological capabilities.

Figure 3: Share of graduates reporting government support, by critical technology area, 2000-2022

Effect of government funding on PhD production

Leveraging variation in federal agencies’ funding priorities and budget fluctuations over time, we estimate that PhD production scales nearly one-for-one with government support. Our results suggest that a 10% increase in government-funded graduates leads to a 7.5% increase in total PhD production—indicating either that federal investment crowds in additional private support, or that government funding is more prevalent than acknowledgments suggest. Either interpretation points to the same conclusion: public investment is the primary lever determining the size and composition of America’s scientific workforce, with government funding decisions today directly shaping the research capacity for decades to come.

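Relationship of federal support to PhD production at the field-year level, 1970-2022

| Variable | (1) | (2) | (3) | (4) | (5) |
| --- | --- | --- | --- | --- | --- |
| Dependent variable | Ln(PhD graduates), All | Ln(PhD graduates), All | Ln(PhD graduates), Non-USG | Ln(Publications) | Ln(Publications) |
| Ln(USG-supported PhDs) | 0.437*** | 0.770*** | 0.623*** | | |
| | (0.015) | (0.021) | (0.037) | | |
| Ln(Non-USG PhDs) | 0.534*** | | | | |
| | (0.021) | | | | |
| Ln(Past 20 years' PhDs) | | | | 0.253*** | |
| | | | | (0.098) | |
| Ln(Past 20 years' USG PhDs) | | | | | 0.246** |
| | | | | | (0.098) |
| N | 901 | 901 | 901 | 900 | 900 |
| F-stat | 133.81 | 147.14 | 147.14 | 622.98 | 342.38 |
| Field FEs | Y | Y | Y | Y | Y |
| Year FEs | Y | Y | Y | Y | Y |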
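
Because both sides of these regressions are in logs, each coefficient can be read as an elasticity. The short sketch below shows the conversion from a log-log coefficient to an implied percentage change, using the 0.770 coefficient on Ln(USG-supported PhDs) reported above as input. The function name is illustrative and is not taken from the paper.

```python
def pct_change_in_outcome(beta: float, pct_change_in_input: float) -> float:
    """Implied % change in y for a given % change in x when ln(y) = beta * ln(x) + controls."""
    ratio = 1.0 + pct_change_in_input / 100.0
    return (ratio ** beta - 1.0) * 100.0


# Coefficient copied from the table above; everything else here is illustrative.
beta_usg = 0.770
effect = pct_change_in_outcome(beta_usg, 10.0)
print(f"A 10% rise in the regressor implies a {effect:.1f}% rise in the outcome")
```

For small changes the result is close to the coefficient multiplied by the percentage change, which is why log-log coefficients are often quoted directly as elasticities.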

Summary

These findings reveal the federal government as the dominant architect of America’s scientific workforce, with funding decisions today directly shaping the research enterprise for decades to come. As policymakers consider investments in artificial intelligence, quantum computing, and other emerging fields, our data provide the empirical foundation to understand how funding choices translate into scientific talent. The methods we’ve developed can now track this ecosystem in real time, offering policymakers and institutions unprecedented visibility into how public investment influences the scale and direction of research across fields, regions, and time.

Methodology

Data Collection and Sample Construction

We compiled a near-population dataset of U.S. STEM PhD graduates from ProQuest Dissertations & Theses Global (PQDT), supplemented with dissertations from individual university repositories. Our sample contains 1.17 million dissertations from 1950-2022, filtered to natural sciences and engineering fields at R1/R2 Carnegie-classified institutions. We obtained full dissertation text for about 870,000 graduates (75% of the sample, rising to 96% post-2000). To validate sample completeness, we compared annual graduate counts to the Survey of Earned Doctorates, finding close alignment until 2010 and 90% coverage through 2022.

Critical Technology Classification

We developed an unsupervised large language model pipeline to classify dissertations by their relationship to 18 critical technology areas identified by the White House Office of Science and Technology Policy. Using GPT-4o-mini, we first generated standardized one-sentence summaries of each dissertation based on titles and abstracts. We then applied zero-shot classification to assess relevance to each technology area, followed by a second-stage filter that mapped dissertations to specific technology subfields. This two-stage approach classified 42.7% of graduates to at least one critical technology area, with validation showing strong correlations between our classifications and universities’ publication patterns in corresponding fields.
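
The control flow of such a pipeline can be sketched in a few lines. The example below is one reading of the description above, not the authors' code: the area and subfield names are placeholders, the prompts are invented for illustration, and `llm` stands for any function that sends a prompt to a chat model and returns its reply. A keyword-matching stand-in (`fake_llm`) replaces the real model so the sketch runs offline.

```python
from typing import Callable, Dict, List

# Placeholder areas and subfields for illustration only; the real scheme has 18 areas.
AREAS: Dict[str, List[str]] = {
    "Artificial intelligence": ["Machine learning", "Computer vision"],
    "Quantum information": ["Quantum computing", "Quantum sensing"],
}

# Any function mapping a prompt to a text reply, e.g. a thin wrapper around a chat model.
LLM = Callable[[str], str]


def is_yes(reply: str) -> bool:
    return reply.strip().lower().startswith("yes")


def summarize(llm: LLM, title: str, abstract: str) -> str:
    """Step 0: standardize each dissertation into a one-sentence summary."""
    prompt = f"Summarize this dissertation in one sentence.\nTitle: {title}\nAbstract: {abstract}"
    return llm(prompt).strip()


def classify(llm: LLM, title: str, abstract: str) -> Dict[str, List[str]]:
    """Return {area: [subfields]} for every area that survives both stages."""
    summary = summarize(llm, title, abstract)
    result: Dict[str, List[str]] = {}
    for area, subfields in AREAS.items():
        # Stage 1: zero-shot relevance check against the broad area.
        stage1 = f"Area: {area}\nSummary: {summary}\nIs the dissertation relevant to this area? Answer yes or no."
        if not is_yes(llm(stage1)):
            continue
        # Stage 2: keep the area only if at least one specific subfield matches.
        matched = []
        for subfield in subfields:
            stage2 = f"Subfield: {subfield}\nSummary: {summary}\nDoes the dissertation fall in this subfield? Answer yes or no."
            if is_yes(llm(stage2)):
                matched.append(subfield)
        if matched:
            result[area] = matched
    return result


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model: answers by simple word matching."""
    lines = prompt.splitlines()
    if lines[0].startswith("Summarize"):
        return "Error-correcting codes for quantum computing hardware, a topic in quantum information."
    target_words = lines[0].split(": ", 1)[1].lower().split()
    summary = lines[1].lower()
    return "yes" if all(word in summary for word in target_words) else "no"


labels = classify(fake_llm, "Codes for qubits", "We study error correction.")
print(labels)
```

Passing the model in as a plain callable keeps the two-stage logic separate from any particular provider, so the same function can be exercised with a stub during testing and with a hosted model in production.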

Research Sponsor Identification

To extract funding information, we processed dissertation full text using a six-step pipeline combining rule-based text processing with large language models. We first isolated potential acknowledgment sentences using keyword matching, then employed Solar 10.7B and Smaug 34B models to identify supporting organizations, classify them by sector (government, industry, non-profit), and extract grant identifiers. We consolidated entity names using additional LLM processing and linked organizations to external registries including the Research Organization Registry (ROR) and Wikidata. This process extracted 9.3 million organizational mentions from 11 million acknowledgment sentences.
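
The first, rule-based step of a pipeline like this can be illustrated with a small keyword filter. This is a minimal sketch under stated assumptions: the keyword list and the naive sentence splitter are hypothetical choices made for the example, and the paper's actual rules are not reproduced here.

```python
import re
from typing import List

# Illustrative keyword list (an assumption, not the paper's list).
FUNDING_KEYWORDS = [
    "funded by",
    "supported by",
    "support from",
    "financial support",
    "grant",
    "fellowship",
]
_KEYWORD_RE = re.compile("|".join(re.escape(k) for k in FUNDING_KEYWORDS), re.IGNORECASE)

# Naive splitter: break after ., ! or ? followed by whitespace.
# It will wrongly split abbreviations such as "No." and is only meant for illustration.
_SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def candidate_acknowledgment_sentences(text: str) -> List[str]:
    """Return sentences containing at least one funding keyword."""
    sentences = _SENTENCE_SPLIT_RE.split(text.strip())
    return [s for s in sentences if _KEYWORD_RE.search(s)]


sample = (
    "I thank my advisor for her patience. "
    "This work was supported by the National Science Foundation under a graduate fellowship. "
    "Chapter 2 reviews prior work."
)
candidates = candidate_acknowledgment_sentences(sample)
print(candidates)
```

A cheap filter of this kind narrows the full text down to a small set of candidate sentences, so the more expensive language-model steps only need to run on text that is likely to name a sponsor.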

Validation

We validated our sponsor identification against multiple benchmarks, including manual review of 500 dissertations, NSF Graduate Research Fellowship awardee lists (finding 98.6% accuracy among those mentioning NSF), and university-level comparisons with the Survey of Graduate Students and Postdoctorates in Science and Engineering (showing strong correlations across agencies).

Causal Analysis

To estimate causal effects of federal funding on PhD production, we employed a shift-share instrumental variables design that exploits variation in agencies’ field-specific funding priorities and annual budget fluctuations. The instrument leverages each university-field’s historical share of graduates from specific agencies, interacted with those agencies’ total annual graduate support, to identify exogenous variation in federal funding exposure.
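
A shift-share instrument of this general form can be sketched as follows. The code assumes one particular reading of the description above: the "share" is the fraction of a university-field's base-period supported graduates attributable to each agency, and the "shift" is each agency's total number of supported graduates in a given year. The agency codes, counts, and function names are hypothetical, and refinements such as leave-one-out totals are omitted.

```python
from collections import defaultdict
from typing import Dict, Tuple

Unit = Tuple[str, str]  # (university, field)


def historical_shares(base_counts: Dict[Tuple[Unit, str], int]) -> Dict[Tuple[Unit, str], float]:
    """Share of each unit's base-period supported graduates attributable to each agency."""
    totals: Dict[Unit, int] = defaultdict(int)
    for (unit, _agency), n in base_counts.items():
        totals[unit] += n
    return {
        (unit, agency): n / totals[unit]
        for (unit, agency), n in base_counts.items()
        if totals[unit] > 0
    }


def shift_share_instrument(
    shares: Dict[Tuple[Unit, str], float],
    agency_totals: Dict[Tuple[str, int], float],
) -> Dict[Tuple[Unit, int], float]:
    """Z[unit, year] = sum over agencies of share[unit, agency] * total[agency, year]."""
    z: Dict[Tuple[Unit, int], float] = defaultdict(float)
    for (unit, agency), share in shares.items():
        for (a, year), total in agency_totals.items():
            if a == agency:
                z[(unit, year)] += share * total
    return dict(z)


# Hypothetical base-period counts of supported graduates by unit and agency.
univ_a: Unit = ("Univ A", "Physics")
univ_b: Unit = ("Univ B", "Physics")
base_counts = {
    (univ_a, "NSF"): 30, (univ_a, "DOE"): 10,
    (univ_b, "NSF"): 5, (univ_b, "DOE"): 15,
}
# Hypothetical agency-wide totals of supported graduates by year.
agency_totals = {
    ("NSF", 2001): 1000.0, ("DOE", 2001): 400.0,
    ("NSF", 2002): 1200.0, ("DOE", 2002): 400.0,
}

shares = historical_shares(base_counts)
instrument = shift_share_instrument(shares, agency_totals)
print(instrument)
```

In this toy example only the NSF total changes between years, so the instrument moves more for the unit with the larger historical NSF share. Differential exposure to agency-wide changes is the source of variation such a design relies on.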

Acknowledgments

We thank Bhaven Sampat and Bruce Weinberg for helpful conversations, as well as audiences at the ICSSI annual conference, Summer School on Data and Algorithms for Science, Technology & Innovation Studies conference, and NBER Investments in Early Career Scientists meeting for comments. We also thank James Dunham and colleagues at the Center for Security and Emerging Technology for insights related to technology classification; Michelle Qiu and Max Murakami-Moses for research assistance; and the Duke University Fuqua School of Business, the University of Toronto Rotman School of Management, the UC Berkeley Technology Competitiveness and Industrial Policy Center, the Alfred P. Sloan Foundation, and the National Science Foundation (Grant No. 2420824) for financial support. All errors are our own.