Funding the U.S. Scientific Training Ecosystem: New Data, Methods, and Evidence

Authors	Affiliations
Dror Shvadron	University of Toronto, Rotman School of Management
Hansen Zhang	Duke University, Fuqua School of Business
Lee Fleming	UC Berkeley, Haas School of Business
Daniel P. Gross	Duke University, Fuqua School of Business; NBER

Draft Date: June 12, 2025

NBER Working Paper Data

Every year, the United States produces roughly 30,000 new STEM PhDs—the scientists and engineers who will pioneer the next generation of discoveries in artificial intelligence, biotechnology, quantum computing, and countless other fields. Yet despite the critical role these researchers will play in advancing human knowledge, we lack a comprehensive understanding of the funding landscape that makes their training possible.

Using data from the near-population of U.S. STEM PhD dissertations since 1950, this research provides the first comprehensive mapping of doctoral funding sources across seven decades. Our analysis creates a new, publicly available dataset that enables systematic research into how scientific training is financed, with implications that extend beyond individual career outcomes to the structure of knowledge production in America.

Who funds PhD training in the United States?

We find that the U.S. federal government is by far the largest sponsor of STEM PhD training. Over 40% of graduates acknowledge direct governmental support, compared with roughly 10% who acknowledge industry and 15% who acknowledge non-profits. The National Science Foundation and National Institutes of Health alone are each acknowledged by more PhD graduates than the entire commercial sector. This pattern of government dominance holds across most universities and fields, though with notable variation: subjects like astronomy are heavily government-supported, while pharmaceutical sciences receive more industry funding.

Figure 1: Share of PhD graduates supported over time, 1950-2022, by organization type

The balance between federal and industry funding varies significantly across scientific fields, reflecting both scientific priorities and commercial interests. Automotive engineering and pharmaceutical science receive more industry support than government funding, while astronomy and astrophysics are heavily government-supported with essentially no private backing. Fields such as geology and materials science draw a more even mix from both sectors.

Figure 2: Share of graduates with government vs. industry support, by subject, 2000-2022

These patterns become clearer when examining specific funding organizations. The following table shows that government agencies such as NSF, NIH (part of the Department of Health and Human Services), and DoD each far exceed private-sector and philanthropic sponsors in acknowledgment counts, despite the prominence of major technology companies and foundations in public discourse about scientific research.

Table 1: Top 15 acknowledged organizations, by sector, 2000-2022
Government agencies	Count	Firms	Count	Non-profit organizations	Count
National Science Foundation	91,895	Intel	2,276	Howard Hughes Medical Institute	3,733
Department of Health and Human Services	78,033	IBM	1,943	American Heart Association	3,677
Department of Defense	34,103	Merck	1,417	Sigma Xi	2,790
Department of Energy	30,544	Google	1,300	American Cancer Society	1,568
National Aeronautics and Space Administration	15,044	Microsoft	1,221	American Chemical Society	1,453
Department of Agriculture	13,455	Pfizer	1,180	Geological Society of America	1,388
Department of Commerce	8,928	General Electric	980	Robert Wood Johnson Foundation	1,181
Department of the Interior	6,890	DuPont	875	W. M. Keck Foundation	1,104
Environmental Protection Agency	5,638	Dow Chemical	847	Fulbright Program	953
Department of Transportation	5,292	Eli Lilly	822	David and Lucile Packard Foundation	943
Department of Education	4,615	Chevron	821	Welch Foundation	910
Department of State	4,248	ExxonMobil	775	Burroughs Wellcome Fund	897
Department of Veterans Affairs	2,220	GlaxoSmithKline	765	Gordon and Betty Moore Foundation	856
Agency for International Development	1,617	Novartis	753	Ford Foundation	818
Department of Homeland Security	1,369	Boeing	718	National Geographic Society	760

PhD production in critical technology areas

Using our classification of graduates into 18 critical technology areas identified by the White House, we map the institutions training scientists in AI, quantum computing, biotechnology, and other strategically important fields. MIT, Stanford, and UC Berkeley emerge as the top producers across multiple technology areas, while federal agencies, led by NSF and DoD, are the primary funders in nearly every critical technology domain. The data reveal both the concentration of critical technology training at elite institutions and the government’s outsized role in developing national technological capabilities.

Figure 3: Share of graduates reporting government support, by critical technology area, 2000-2022

Effect of government funding on PhD production

Leveraging variation in federal agencies’ funding priorities and budget fluctuations over time, we estimate that PhD production scales nearly one-for-one with government support. Our results suggest that a 10% increase in government-funded graduates leads to a 7.5% increase in total PhD production. Since just over 40% of graduates acknowledge government support, a purely mechanical pass-through would imply an increase closer to 4%, so the larger estimate indicates either that federal investment crowds in additional private support or that government funding is more prevalent than acknowledgments suggest. Either interpretation points to the same conclusion: public investment is the primary lever determining the size and composition of America’s scientific workforce, with government funding decisions today directly shaping research capacity for decades to come.

Table 2: Relationship of federal support to PhD production at the field-year level, 1970-2022
Dependent variable	Ln(PhD graduates)			Ln(Publications)
Variable	(1) All	(2) All	(3) Non-USG	(4)	(5)
Ln(USG-supported PhDs)	0.437***	0.770***	0.623***
	(0.015)	(0.021)	(0.037)
Ln(Non-USG PhDs)	0.534***
	(0.021)
Ln(Past 20 years' PhDs)				0.253***
				(0.098)
Ln(Past 20 years' USG PhDs)					0.246**
					(0.098)
N	901	901	901	900	900
F-stat	133.81	147.14	147.14	622.98	342.38
Field FEs	Y	Y	Y	Y	Y
Year FEs	Y	Y	Y	Y	Y
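
As a reading aid, the sketch below shows how a log-log coefficient can be converted into a percentage response. It assumes the coefficient is read as an elasticity and uses the 0.770 from column (2) purely as an illustration; which column underlies the 7.5% figure quoted in the text is not stated here, and the function name is hypothetical.

```python
import math

def implied_pct_change(elasticity, pct_increase):
    """Exact proportional change in Y implied by a log-log coefficient.

    If ln(Y) = elasticity * ln(X) + ..., then scaling X by (1 + pct_increase)
    scales Y by (1 + pct_increase) ** elasticity.
    """
    return math.exp(elasticity * math.log(1.0 + pct_increase)) - 1.0

# Illustrative input: coefficient from column (2), treated as an elasticity (assumption).
beta = 0.770
exact = implied_pct_change(beta, 0.10)   # exact response to a 10% increase
approx = beta * 0.10                     # first-order (linear) approximation

print(f"exact: {exact:.1%}, linear approximation: {approx:.1%}")
```

Under these assumptions both the exact and the first-order readings land between 7.5% and 8%, broadly in line with the figure quoted in the text.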

Summary

These findings reveal the federal government as the dominant architect of America’s scientific workforce, with funding decisions today directly shaping the research enterprise for decades to come. As policymakers consider investments in artificial intelligence, quantum computing, and other emerging fields, our data provide the empirical foundation to understand how funding choices translate into scientific talent. The methods we’ve developed can now track this ecosystem in real time, offering policymakers and institutions unprecedented visibility into how public investment influences the scale and direction of research across fields, regions, and time.

Methodology

Data Collection and Sample Construction

We compiled a near-population dataset of U.S. STEM PhD graduates from ProQuest Dissertations & Theses Global (PQDT), supplemented with dissertations from individual university repositories. Our sample contains 1.17 million dissertations from 1950 to 2022, filtered to natural sciences and engineering fields at R1/R2 Carnegie-classified institutions. We obtained full dissertation text for about 870,000 graduates (75% of the sample, rising to 96% post-2000). To validate sample completeness, we compared annual graduate counts to the Survey of Earned Doctorates, finding close alignment until 2010 and about 90% coverage thereafter, through 2022.

Critical Technology Classification

We developed an unsupervised large language model pipeline to classify dissertations by their relationship to 18 critical technology areas identified by the White House Office of Science and Technology Policy. Using GPT-4o-mini, we first generated standardized one-sentence summaries of each dissertation based on titles and abstracts. We then applied zero-shot classification to assess relevance to each technology area, followed by a second-stage filter that mapped dissertations to specific technology subfields. This two-stage approach classified 42.7% of graduates to at least one critical technology area, with validation showing strong correlations between our classifications and universities’ publication patterns in corresponding fields.
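
The two-stage logic described above can be sketched roughly as follows. This is a minimal illustration, not the study's pipeline: the taxonomy is a hypothetical two-area stand-in for the 18 areas, the function names are invented, and the two callables are simple keyword stubs standing in for the zero-shot LLM calls so that the example runs without any model.

```python
from typing import Callable, Dict, List

# Hypothetical area -> subfield map; the actual 18-area taxonomy is not reproduced here.
TAXONOMY: Dict[str, List[str]] = {
    "artificial intelligence": ["machine learning", "computer vision"],
    "quantum computing": ["quantum algorithms", "qubit hardware"],
}

def classify_two_stage(
    summary: str,
    is_relevant: Callable[[str, str], bool],
    pick_subfields: Callable[[str, str, List[str]], List[str]],
    taxonomy: Dict[str, List[str]] = TAXONOMY,
) -> Dict[str, List[str]]:
    """Stage 1 screens each area for relevance; stage 2 keeps an area only
    if the summary can be mapped to at least one of its subfields."""
    assignments: Dict[str, List[str]] = {}
    for area, subfields in taxonomy.items():
        if not is_relevant(summary, area):                   # stage 1: relevance screen
            continue
        matched = pick_subfields(summary, area, subfields)   # stage 2: subfield filter
        if matched:
            assignments[area] = matched
    return assignments

# Keyword stubs standing in for the LLM calls (assumption, for runnability only).
CUES = {
    "artificial intelligence": ["neural", "learning"],
    "quantum computing": ["qubit", "quantum"],
}

def toy_is_relevant(summary: str, area: str) -> bool:
    text = summary.lower()
    return any(cue in text for cue in CUES[area])

def toy_pick_subfields(summary: str, area: str, subfields: List[str]) -> List[str]:
    text = summary.lower()
    return [s for s in subfields if s in text]

summary = "This dissertation develops machine learning methods for medical image segmentation."
labels = classify_two_stage(summary, toy_is_relevant, toy_pick_subfields)
```

Passing the two stages in as callables keeps the control flow separate from the model: in a real setting each stub would be replaced by a prompt to a language model, and a dissertation can end up assigned to several areas or to none.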

Research Sponsor Identification

To extract funding information, we processed dissertation full text using a six-step pipeline combining rule-based text processing with large language models. We first isolated potential acknowledgment sentences using keyword matching, then employed Solar 10.7B and Smaug 34B models to identify supporting organizations, classify them by sector (government, industry, non-profit), and extract grant identifiers. We consolidated entity names using additional LLM processing and linked organizations to external registries including the Research Organization Registry (ROR) and Wikidata. This process extracted 9.3 million organizational mentions from 11 million acknowledgment sentences.
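
The first step, isolating candidate acknowledgment sentences by keyword matching, might look something like the sketch below. The keyword list, the naive sentence splitter, and the function names are all assumptions made for illustration; the study's actual keywords and preprocessing are not given here.

```python
import re

# Hypothetical funding keywords; not the list used in the study.
FUNDING_PATTERN = re.compile(
    r"\b(supported by|funded by|funding from|financial support|grants?|fellowships?)\b",
    re.IGNORECASE,
)

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    Abbreviations such as 'No.' would be split incorrectly by this rule."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def candidate_acknowledgment_sentences(text):
    """Return the sentences that contain at least one funding keyword."""
    return [s for s in split_sentences(text) if FUNDING_PATTERN.search(s)]

sample = (
    "I thank my advisor for her patience. "
    "This work was supported by the National Science Foundation under a graduate fellowship. "
    "Chapter 2 describes the experimental setup."
)
candidates = candidate_acknowledgment_sentences(sample)
```

A cheap filter of this kind narrows the full text to a small set of sentences, so that the more expensive model-based steps (organization extraction, sector classification, grant identifiers) only need to run on likely acknowledgments.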

Validation

We validated our sponsor identification against multiple benchmarks, including manual review of 500 dissertations, NSF Graduate Research Fellowship awardee lists (finding 98.6% accuracy among those mentioning NSF), and university-level comparisons with the Survey of Graduate Students and Postdoctorates in Science and Engineering (showing strong correlations across agencies).

Causal Analysis

To estimate causal effects of federal funding on PhD production, we employed a shift-share instrumental variables design that exploits variation in agencies’ field-specific funding priorities and annual budget fluctuations. The instrument takes each university-field’s historical share of graduates supported by specific agencies and interacts it with those agencies’ total annual graduate support, identifying exogenous variation in federal funding exposure.
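
A shift-share instrument of this general form can be sketched as a share-weighted sum of agency-level totals. The code below is a minimal illustration under assumptions: the data are toy values, the names are hypothetical, and details of the actual specification (base period for the shares, any leave-one-out adjustment, log transforms) are not described here and are not implemented.

```python
def shift_share_instrument(shares, shifts):
    """Build z[(unit, year)] = sum over agencies of share[unit][agency] * shift[(agency, year)].

    shares: {(university, field): {agency: historical share of graduates supported}}
    shifts: {(agency, year): agency's total graduates supported in that year}
    """
    years = sorted({year for (_, year) in shifts})
    z = {}
    for unit, agency_shares in shares.items():
        for year in years:
            z[(unit, year)] = sum(
                share * shifts.get((agency, year), 0.0)
                for agency, share in agency_shares.items()
            )
    return z

# Toy inputs (illustrative only, not data from the study).
shares = {
    ("Univ A", "physics"): {"NSF": 0.6, "DOE": 0.4},
    ("Univ B", "physics"): {"NSF": 0.2, "DOE": 0.8},
}
shifts = {
    ("NSF", 2000): 1000.0, ("DOE", 2000): 500.0,
    ("NSF", 2001): 1100.0, ("DOE", 2001): 500.0,
}
z = shift_share_instrument(shares, shifts)
```

In the toy data only the NSF total rises between the two years, so the predicted exposure of the university-field with the larger NSF share increases by more. This differential exposure to common agency-level changes is the variation a shift-share design relies on.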

Acknowledgments

We thank Bhaven Sampat and Bruce Weinberg for helpful conversations, as well as audiences at the ICSSI annual conference, the Summer School on Data and Algorithms for Science, Technology & Innovation Studies conference, and the NBER Investments in Early Career Scientists meeting for comments. We also thank James Dunham and colleagues at the Center for Security and Emerging Technology for insights related to technology classification; Michelle Qiu and Max Murakami-Moses for research assistance; and the Duke University Fuqua School of Business, the University of Toronto Rotman School of Management, the UC Berkeley Technology Competitiveness and Industrial Policy Center, the Alfred P. Sloan Foundation, and the National Science Foundation (Grant No. 2420824) for financial support. All errors are our own.