The benefits of big data in healthcare are hard to deny. From enhancing diagnostic processes to improving responses to epidemics, advanced analytics are reshaping how we approach public health. And with 2.5 quintillion bytes of data generated daily, the use of such analytics is unlikely to go away. But collecting, managing, and analyzing this data is not without issue: concerns ranging from patient privacy to private companies' access to this data remain highly contested.
Logistical Barriers to Public Data
Data is being collected at unprecedented speed. Organizing and structuring this data, however, poses significant logistical barriers, which can be summed up in 6 V’s: Volume, Variety, Velocity, Veracity, Variability, and Value.

Volume- The sheer number of data sources creates astronomical amounts of data.
Variety- Comparing different kinds of data (video, photo, numerical, written) can be challenging.
Velocity- Data is gathered at high speed, and there is demand for near real-time processing of it.
Veracity- How true is the information being gathered?
Variability- The meaning of the data we collect shifts as our perceptions of the data change.
Value- How do we extract actionable and valuable data?
With better integration of technology in healthcare and the rise of wearable sensor devices, the volume of data collected continues to grow. Just because the data is collected, though, doesn’t mean it’s instantly accessible. Currently, almost 90% of data collected is considered unstructured, leaving it outside the reach of advanced analytics tools.
Ethical Questions
Beyond the difficulties of simply collecting the information, leveraging this data raises a slew of other issues that are more political and concern patients’ rights over how their data is used. While most countries have data protection laws in place, like HIPAA in the US and GDPR in Europe, there is still significant uncertainty about how to balance public health interests with patients’ right to privacy. Partnerships between public healthcare providers and private corporations, such as the one between Google’s DeepMind and the Royal Free Hospital in London, reveal that patients are often not informed of how their data is being utilized.
While there are many organizations with noble intent for data utilization, there still exists a serious and valid concern that pharmaceutical and insurance companies could use this information for predatory purposes. There is a need for greater legislation to address how to protect patients from these risks while still allowing for the data to be used.
Possibilities and Potential
With major governmental bodies like the US Department of Veterans Affairs and the Centers for Medicare and Medicaid Services offering enormous public data sets, there is great potential to address public health concerns. There are generally 4 major ways in which the data can be leveraged: Descriptive, Diagnostic, Predictive, and Prescriptive Analytics.
Descriptive- Seeks to understand what exactly happened and the impact that it had.
Diagnostic- Uses historical data to identify the root cause for why something happened.
Predictive- Combining past and present data, predictive analytics seek to anticipate the most likely outcome.
Prescriptive- Beyond simply anticipating the potential outcome, prescriptive analytics offer a course of action.
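To make the four types concrete, here is a minimal Python sketch that runs each one over the same tiny data set. Everything in it is an assumption made for illustration: the weekly infection counts are made-up numbers, the "checklist" flag and the function names are hypothetical, the forecast is a plain straight-line extrapolation, and the threshold of 10 is arbitrary. Real analytics pipelines are far more involved.

```python
from statistics import mean

# Hypothetical weekly counts of surgical site infections (made-up numbers).
weekly_cases = [4, 5, 7, 8, 10]
# Hypothetical flag: was a sterilization checklist in use that week?
checklist_used = [True, True, False, False, False]


def descriptive(cases):
    """What happened? Summarize the observed counts."""
    return {"total": sum(cases), "mean": mean(cases)}


def diagnostic(cases, flags):
    """Why did it happen? Compare average cases without vs. with the checklist.

    A positive gap is only an association, not proof of a root cause.
    """
    with_flag = [c for c, f in zip(cases, flags) if f]
    without_flag = [c for c, f in zip(cases, flags) if not f]
    return mean(without_flag) - mean(with_flag)


def predictive(cases):
    """What is likely next? Fit a least-squares line and extend it one week."""
    n = len(cases)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(cases)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cases)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (n - x_bar)


def prescriptive(forecast, threshold=10):
    """What should we do? Turn the forecast into a suggested action."""
    if forecast > threshold:
        return "review sterilization protocol"
    return "continue monitoring"


if __name__ == "__main__":
    forecast = predictive(weekly_cases)
    print(descriptive(weekly_cases))
    print(diagnostic(weekly_cases, checklist_used))
    print(forecast)
    print(prescriptive(forecast))
```

Each function answers a different question about the same numbers, which is the point of the four-way split: the data does not change, only what we ask of it.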
In healthcare, these analytics have obvious benefits, from earlier detection and treatment of cancer and blindness to personalized healthcare. Data analytics in the operating room is helping to reduce the risk and onset of surgical site infections. Finding ways to bridge the gap between big data security concerns and the tremendous possibilities this technology offers will be crucial for healthcare and technology companies in the years to come.