The Key to Using Terabytes of Data to Predict Soldier Suicide
Researchers analyzed 1.1 billion data points from 39 Army and Defense Department databases.
The Army STARRS program is one of the most important mental health efforts ever undertaken by the U.S. military, bringing a big data approach to understanding why suicide rates increased among active-duty soldiers in recent years.
But the biggest challenge of the multiyear project (STARRS stands for Study to Assess Risk and Resilience in Servicemembers) wasn’t running analytics or purchasing new technologies. It was simply bringing together hundreds of terabytes of soldier data from 39 disparate sources across the Defense Department.
“No more than 10 of those sources had ever been linked together before,” said Dr. Michael Schoenbaum, a senior adviser for mental health services at the National Institute of Mental Health.
Schoenbaum, who spoke Thursday at the IBM Federal Forum, is also a scientific principal for the STARRS study, which began in 2009 after a $50 million grant from the Army.
“It was baffling to me that not only did they not have data dictionaries, but the points of contact weren’t even familiar with the concept of a data dictionary,” Schoenbaum said. “Big data depends at the end of the day in part on what feeds it and data generating processes and analysts being able to make sense of it. The people in big data analysis are totally dependent on folks at the ground level.”
Schoenbaum’s argument, that data governance must be taken seriously for the data’s value to be truly realized, helped lead to a series of significant findings that “debunked myths and saved people from making false assumptions,” he said.
One example, he said, was the earlier belief that multiple tours of duty increased the risk of soldier suicide, presumably because those soldiers are likely to experience more trauma from battle.
The data, however, told a different story, one in which soldiers who serve multiple tours of duty are “hardier,” and actually commit suicide at a lower rate than soldiers who haven’t seen any battle.
These and other conclusions were reached through the analysis of 1.1 billion data points from 39 Army and Defense Department databases, including sociodemographic variables like gender, age, race, religion, education and family status.
Those data sets were commingled with other Army data, including health records, Army entry characteristics and service records, which include rank, length of service, demotion history, and any involuntary extensions of a soldier’s active-duty term as well as other data points.
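To make the data dictionary idea concrete, here is a minimal sketch in Python of what one might look like for a couple of the service-record fields mentioned above, and how an analyst could use it to check incoming records. The field names, descriptions and rank codes are hypothetical illustrations, not the Army's actual definitions, and the STARRS team's own tooling is not described in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: what a field means and what values it may hold."""
    description: str
    dtype: type
    allowed: Optional[frozenset] = None  # None means any value of dtype is accepted


# Hypothetical entries; names and codes are illustrative only.
DATA_DICTIONARY = {
    "rank": FieldSpec(
        "Pay grade at the time of the record",
        str,
        frozenset({"E1", "E2", "E3", "E4", "E5"}),
    ),
    "length_of_service_months": FieldSpec("Months of active-duty service", int),
}


def validate(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, value in record.items():
        spec = DATA_DICTIONARY.get(name)
        if spec is None:
            problems.append(f"{name}: not in data dictionary")
        elif not isinstance(value, spec.dtype):
            problems.append(
                f"{name}: expected {spec.dtype.__name__}, got {type(value).__name__}"
            )
        elif spec.allowed is not None and value not in spec.allowed:
            problems.append(f"{name}: unexpected code {value!r}")
    # Report documented fields that the record does not supply.
    for name in sorted(DATA_DICTIONARY.keys() - record.keys()):
        problems.append(f"{name}: missing")
    return problems
```

Calling `validate` on each record from a source flags undocumented fields, wrong types and unknown codes before the data is combined with anything else, which is the kind of ground-level check that is impossible when no dictionary exists.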
The original five-year program culminated in a continued partnership between the Army, NIMH, several academic institutions and, as of last year, the Veterans Affairs Department, which aims to get its arms around veteran suicide.
It also presents an important lesson on why governance is vital to the success of any big data project. All the data in the world means nothing if you don’t know what it is.
“The more data owners can do, as boring as it is, the better it is for all of us,” Schoenbaum said.