Inside the U.S.-Led COVID-19 High Performance Computing Consortium
Nextgov heard from members and researchers about how the effort is panning out.
The world’s fastest supercomputer teamed up with the White House’s expanding supercomputing effort to fight the novel coronavirus.
Japan’s Fugaku, which surpassed leading U.S. machines on the Top500 list of global supercomputers in late June, joined the COVID-19 High Performance Computing Consortium.
Jointly launched by the Energy Department, Office of Science and Technology Policy and IBM in late March, the consortium currently facilitates more than 65 active research projects and encompasses a vast supercomputer-powered search for new findings on the novel coronavirus’ spread, how to effectively treat and mitigate it, and more. Dozens of national and international members are volunteering free compute time through the effort, providing at least 485 petaflops of capacity, a figure that is steadily growing, to more rapidly generate new solutions against COVID-19.
“What started as a simple concept has grown to span three continents with over 40 supercomputer providers,” Dario Gil, director of IBM Research and consortium co-chair, told Nextgov last week. “In the face of a global pandemic like COVID-19, hopefully a once-in-a-lifetime event, the speed at which researchers can drive discovery is a critical factor in the search for a cure and it is essential that we combine forces.”
Gil and other members and researchers briefed Nextgov on how the work is unfolding, how they’re measuring success and the research the consortium is increasingly underpinning.
The Consortium’s Evolution
Energy’s Office of Science Director Chris Fall told Nextgov last week that since the consortium’s founding, its resources have been used to sort through billions of molecules to identify promising compounds that can be manufactured quickly and tested for potency to target the novel coronavirus, produce large data sets to study variations in patient responses, perform airflow simulations on a new device that will allow doctors to use one ventilator to support multiple patients—and more. The complex systems are powering calculations, simulations and results in a matter of days that several scientists have noted would take a matter of months on traditional computers.
“From a small conversation three months ago, we overcame a myriad of institutional and organizational boundaries to establish the consortium,” Fall said, adding that the effort is “building an international team of COVID-19 researchers that are sharing their best ideas, methods and results to understand the virus and its effects on humans which will [allow] the world to ultimately conquer or confine the virus.”
In a recent interview, Energy’s Undersecretary for Science Paul Dabbar explained that any researcher interested in tapping into advanced computing capabilities can submit relevant research proposals through an online portal, and those proposals are then reviewed for selection. An executive committee supports the group’s organization and helps steer policies, while a science committee evaluates the submitted proposals for potential impact. A third committee allocates time and cycles on the supercomputing machines once proposals are chosen.
“What's really interesting about this from an organizational point of view is that it's basically a volunteer organization,” Dabbar noted.
As of July 1, the consortium had received more than 148 COVID-19 research proposals, with 78 approved and 68 up and running on the involved supercomputing resources, Energy confirmed. Though researchers are tapping into the assets free of charge, the work doesn’t come without cost. Dabbar said the consortium taps into some of the department’s user facilities and resources that were built and funded with taxpayer dollars. The effort incurs operating costs such as runtime, electricity and cooling for the machines, which Dabbar said are “relatively minor compared to actually building the capacity to begin with.”
“It does absolutely cost money,” Dabbar said. “But at the end of the day, a lot of this is taking advantage of what the American people invested in, and using the flexibility, and shifting it towards this problem.”
The combined supercomputing resources are speeding up the chase for answers and solutions against COVID-19, but that faster pace isn’t the only metric for success. IBM’s Gil said in the early days, “the establishment of the consortium and the efficiency we have achieved in expedited expert review of proposals and rapid matching of approved proposals to supercomputing systems, along with rapid on-boarding onto those systems would have to be considered our first major success.”
Those involved also measure success by the number of up-and-running research projects, and they highlighted that 27 projects already have experimental, clinical or policy transition plans in place. Insiders also consider it “a major achievement” that they were able to quickly bring together industry players, “many of whom are competitors,” as Gil noted, along with labs, federal agencies, universities and several international partners to share their systems.
NASA is one consortium member that’s been involved in the initiative from the very beginning, when it was invited by OSTP, Piyush Mehrotra, chief of NASA’s Advanced Supercomputing, or NAS, Division, told Nextgov Thursday.
The division, at Ames Research Center in Silicon Valley, hosts the space agency’s supercomputers, which Mehrotra noted are typically used for large-scale simulations supporting NASA’s aerospace, earth science, space science and space exploration missions. But a selection of the agency’s supercomputing assets is also reserved for national priorities that go beyond the agency’s scope.
“In order to understand COVID-19, and to develop treatments and vaccines, extensive research is required in complex areas such as epidemiology and molecular modeling—research that can be significantly accelerated by supercomputing resources,” Mehrotra explained. “We are therefore making the full reserve portion of NASA supercomputing resources available to researchers working on the COVID-19 response, along with providing our expertise and support to port and run their applications on NASA systems.”
Amazon Web Services also joined in the consortium’s first wave of members and participated in the initial roundtable discussion at the White House where the concept emerged in March. The company’s Vice President of Public Policy Shannon Kellogg told Nextgov in late May that, in joining, AWS “saw a clear opportunity to bring the benefits of cloud … to bear in the race for treatments and a vaccine.” The company has since provided cloud computing resources to more than a dozen of the consortium’s active projects, and according to Kellogg, provides “in-kind credits to the research teams, which provide them with cloud computing resources.” The tech giant’s team then communicates regularly with the researchers to help address technical questions.
“This effort has shown how collaboration and commitment from leaders across government, business, and academia can empower researchers and accelerate the pace of their work,” Kellogg said.
Outside of IBM, NASA and AWS, other early members of the consortium include Google Cloud, Microsoft, the Massachusetts Institute of Technology, Rensselaer Polytechnic Institute, the National Science Foundation, as well as Argonne, Lawrence Livermore, Los Alamos, Oak Ridge and Sandia national laboratories. The consortium has continued to expand as it progresses. In April, the National Center for Atmospheric Research’s Wyoming Supercomputing Center, chipmaker AMD and graphics processing unit maker NVIDIA joined, among others.
Dell Technologies also began the process to participate in April, according to Janet Morss, senior consultant, high performance computing. It took about a month for the involvement to come to fruition, and the company is now donating cycles from the Zenith supercomputer and other resources.
“With the consortium, we recognize fighting COVID-19 will require extensive research in areas like bioinformatics, epidemiology, and molecular modeling to understand the threat and to develop strategies to address it,” Morss said. “In short, the team wanted to help.”
Beyond U.S. Borders
The consortium is underpinning research for some non-U.S.-based projects, and in late May, international entities began gaining membership to offer resources to support the overall effort. Now, the United Kingdom’s Digital Research Infrastructure, Switzerland’s Swiss National Supercomputing Centre and Sweden’s Swedish National Infrastructure for Computing are among the entities that provide computing assets.
“COVID-19 doesn’t know geographical or competitive borders, and the response shouldn’t either,” Gil noted.
In late June, Japan’s Fugaku eclipsed two powerful U.S. systems, Summit and Sierra, to earn the number-one spot on the Top500 list of global supercomputers—with the power to conduct 415 quadrillion computations per second—“besting the now second-place Summit system by a factor of 2.8x,” according to the announcement.
Shortly after the news surfaced, U.S. Chief Technology Officer Michael Kratsios tweeted that Fugaku was joining the consortium.
“Practically, this helps the consortium make more computing power available to research to help combat the global pandemic,” Gil said. “On another level, this underscores the value of global collaboration.”
Just as the process works with other members, the consortium’s insiders will review research proposals that are submitted and, if approved, pair those that are relevant with RIKEN, the research institute behind Fugaku. Energy’s Fall said the State Department initiated discussions about Japan joining the consortium in mid-April, and both RIKEN and the Japanese Ministry of Education, Culture, Sports, Science and Technology applied in response to an official invitation from the Energy Department.
Fugaku is still in the process of being deployed, but it’s been designed to support a diverse range of applications, and several Japanese COVID-19-related projects are already using the system.
“We are still waiting to hear back from RIKEN regarding how much of the system will be available to consortium projects. Regardless of the amount of the system that will be available to the consortium, it will provide additional resources to fight the COVID-19 pandemic,” Fall said. “The consortium will be able to support additional projects studying the molecular makeup of the virus, looking for new vaccines or medical therapeutics or identifying new methods for patient care.”
Officials involved with the consortium repeatedly emphasized that the pandemic—which is global and not limited to the U.S.—requires a coordinated, global response. But this focus on the consortium’s inclusivity of foreign partners in necessary research also comes as the administration recently suspended issuing foreign scholars’ visas and moved to exit the World Health Organization amid the pandemic, which some argue could stifle fruitful collaboration.
Still, Fugaku and other international members’ participation in the effort aligns with the recent G7 Science and Technology Ministers’ Declaration on COVID-19, which called on members to collectively strengthen the use of high-performance computing for COVID-19 response.
“It is already the case that approximately 25% of the proposals are from outside the U.S.,” Gil noted. “We further believe that the values of the consortium can be applied to a variety of global challenges, and hope that it can be used as a model to inform future global supercomputing and other collaborations.”
A Glimpse Into the Research
By now, selected researchers in American cities and abroad are leveraging the consortium’s capabilities around the clock to test potential new treatments for COVID-19, track its spread, and examine all aspects of the virus to identify weaknesses worth targeting.
University of Utah Professor Thomas Cheatham, III is presently working with a small but mighty team of both theoretical and experimental researchers, who applied to join the consortium on the last Sunday in March and were approved the following Tuesday. Building on a computational workflow produced during the Ebola outbreak around 2015, the team is now harnessing the National Center for Supercomputing Applications’ Blue Waters and the Texas Advanced Computing Center’s Longhorn and Frontera to support the development and design of novel peptide inhibitors of an enzyme that could help curb COVID-19. Cheatham told Nextgov that the university’s researchers have for a while now used large-scale allocations of computer time on NSF-funded resources, “however, where the consortium really helped was not just the total size of award, but the ready and immediate access to the resources.”
“I am very optimistic that good things will come out of the tremendous efforts of the consortium and the many and varied science teams in terms of critically and rapidly advancing and understanding many aspects of COVID function, spread, and control,” Cheatham said.
Officials within federal agencies are also among the researchers selected to harness the consortium’s reserves. For instance, NASA is a member providing HPC resources, and agency officials are also conducting useful new studies through the effort. In one now-approved and active project, Ames Research Center’s Medical Officer David Loftus and Senior Scientist Viktor Stolc intend to investigate genetic factors that may affect the severity of the COVID-19 illness. Via the NASA Ames Supercomputer, the team is able to conduct comprehensive genome sequencing and bioinformatics analysis, they noted, to make correlations between COVID-19 disease severity and genetic features of individual people.
“Without [the consortium’s] resources, it’s likely we would not be able to carry out the comprehensive, detailed analysis that is needed for our project to be successful,” Loftus and Stolc said, adding that the “work will be successful if we are able to identify genetic features that allow us to predict COVID-19 disease severity with good reliability.”
The hope is that physicians can use such information to better anticipate the needs of specific patients, particularly those who may require intensive care. The insights also have the potential to help hospitals forecast the resources needed to treat certain COVID-19 patients.
“Our approach to predicting COVID-19 disease severity may be especially important in the Fall, when rates of COVID-19 infection are predicted to rise,” Loftus and Stolc said.