A recent executive order from the White House establishes a “Genesis Mission” that aims to “mobilize the Department of Energy’s 17 National Laboratories, industry, and academia to build an integrated discovery platform,” according to a press release from the U.S. Department of Energy (DOE). The announcement builds on President Trump’s executive order Removing Barriers to American Leadership in Artificial Intelligence and on America’s AI Action Plan, which the administration released earlier this year.
According to the DOE press release, the platform will draw on the expertise of roughly 40,000 DOE scientists, engineers, and technical staff, as well as private-sector innovators. It will also gather the “world-class data sets” that America’s AI Action Plan called for.
The call for comprehensive data sets was also a focal point of a report published by the National Security Commission on Emerging Biotechnology (NSCEB), which was created by the annual defense authorization bill for fiscal year 2022 (FY22 NDAA). The report appears to have informed, in part, the administration’s subsequent executive orders and action plan. Several of its recommendations illustrate the reasoning behind many of the calls for an integrated AI discovery platform:
- “Congress must authorize the Department of Energy (DOE) to create a Web of Biological Data (WOBD), a single point of entry for researchers to access high-quality data.” The report advocates the creation of a resource that “combines biological datasets in a usable way (that) would allow researchers to spend less time curating biological data and more time testing hypotheses, training models, and designing novel biological functions.” Such a resource would provide “standardized, usable, and interoperable” data, reducing the time researchers spend searching for data, manually cleaning each dataset, and manually combining datasets.
- “Congress should authorize the National Institute of Standards and Technology (NIST) to create standards that researchers must meet to ensure that U.S. biological data is ready for use in AI models.” The report notes that “the lack of universal standards, centralized access systems, or even a common language for biological data has exacerbated the current disconnected approach.” It suggests that the government require recipients of federal funding to collect AI-ready data. The report also acknowledges that defining “AI-ready biological data” is complex, given the sheer number and breadth of biological data types.
- “Congress should authorize and fund the Department of Interior (DOI) to create a Sequencing Public Lands Initiative to collect new data from U.S. public lands that researchers can use to drive innovation.” The report stresses that while the commission identified many gaps in U.S. biological data collection, “there is a particular need for non-human biological data, including data from animals, plants, microbes, and fungi, in order to better understand the breadth of America’s biological landscape.” It notes that there is currently no coordinated federal effort to catalog the genomic landscape of U.S. federal lands, which range from hydrothermal sites in Yellowstone National Park to the glacial wilderness of the Gates of the Arctic National Park and Preserve. This project is specifically mentioned as a recommended policy action in America’s AI Action Plan.
- “Congress should authorize the National Science Foundation (NSF) to establish a network of ‘cloud labs,’ giving researchers state-of-the-art tools to make data generation easier.” The report defines cloud laboratories as “physical laboratories that are equipped with lab automation that can be programmed and controlled remotely by scientists to conduct biological experiments.” America’s AI Action Plan recommends that the government invest in these labs not only for biology but also for engineering, materials science, chemistry, and neuroscience. The plan also recommends that the cloud labs be built by the private sector, federal agencies, and research institutions in coordination with DOE National Laboratories.
As the federal government moves to create a centralized platform for AI-based research, several states are also supporting AI computing initiatives. As reported in Science, University at Buffalo structural biologist Thomas Grant is building an AI computing system called SWAXSFold with funding from Empire AI, a $500 million, 10-year New York initiative. California has passed a law to create CalCompute, which will be housed in the University of California system.