Top Big Data Challenges To Overcome

Nada Naguib

November 23, 2022

There has been increasing interest in big data in the past few years as companies work on collecting and using data. For example, marketing company TechTarget’s Bridget Botelho wrote at length in a January blog about big data’s ability to help small businesses by “improving operations, providing better customer service, and creating personalized marketing campaigns.” Big data also has been embraced by educators and healthcare providers, among others.

However, a distinction should be made between “data” and “big data.” Oracle, the world’s third-largest software company by revenue and market capitalization, defines big data by the three Vs: data that “contains greater variety, arriving in increasing volumes and with more velocity.” Another way to make the comparison is that while data can be generated at a company level, big data is generated beyond the enterprise level and in massive volumes. It can reach exabytes of information (one exabyte is a one followed by nine zeros in gigabytes, or a billion gigabytes).

Big data first appeared in 2005 when Facebook, YouTube, and others started generating large amounts of data from their users, according to Oracle. In response, Apache Hadoop, an open-source framework for storing and analyzing such data, was developed. However, it wasn’t well-received at the time. The Economist reported that one Microsoft executive called such programs a “cancer,” given they were free to use and developed by non-established software companies. Even so, other tools like Hadoop, including NoSQL databases and Spark, appeared. The Oracle blog noted they were necessary “because they make big data easier to work with and cheaper to store.”

In the 17 years since the concept originated, Microsoft has embraced big data, according to The Economist. Since 2018, the company has been working on AI technologies that aid big data analysis in the effort to cure cancer, according to a Microsoft blog.

However, the case for big data remains a gray area. For example, while the European Union has stringent data laws, some argue that those laws are an impediment and that big data could be the solution to several of today’s problems, according to The Economist.

The new oil?

The case for big data and the mass sharing of information has been described as so financially lucrative that it is the “new oil.” Media outlets, including Forbes magazine, Wired, and The Economist, use that phrase to illustrate the potential profits to be gained. The comparison originated with mathematician Clive Humby’s observation in 2006 that “like oil, data is valuable, but if unrefined, it cannot really be used.”

In 2016, the Organization for Economic Cooperation and Development (OECD), which includes the United States, Japan, and the United Kingdom, found that broad data sharing could translate to a GDP increase of 2.5%. An April 2020 Economist article said some “Eurocrats argued that data-sharing could speed up efforts to fight COVID-19.”

Expectations were high. Like oil, though, big data comes with some drawbacks. “Early on … there was a perception that there would be almost like a magic moment where the data would materialize, and it would answer our questions,” said Andrew Schroeder, a researcher who helped lead the Covid-19 Mobility Data Network.

However, that moment never came. “Unfortunately, what is collected is decided by entities whose priorities are different than that of the general public’s health,” said Nishant Kishore, who worked in one of the network’s labs. It’s possible the data just never made it to the researchers because the “public health goals ran headlong into the business interests of the companies that provided their data for analysis,” wrote Katie Palmer of Stat News.

Another drawback is that if life-saving data is controlled by the tech companies that gather and analyze it, the world might become a technocracy, in which an elite of technical experts and companies controls government decisions, society, and industry. Delft University of Technology researcher Marijn Janssen said more should be done to “harness the merits and utilities of a computational form of technocratic governance.”

That means governments need laws to govern the use of big data. “Like commonplace regulations on oil, big data also needs some regulations in place,” said Forbes magazine’s Nisha Talagala. “It needs a data practice – a commonly understood and consistently executed set of principles for managing data.”

Managing big data

In the United States, there is “no single, comprehensive federal law regulating how most companies collect, store, or share customer data,” wrote Thorin Klosowski of the New York Times.

Instead, the country has “a bunch of disparate federal [and state] laws,” said Amie Stepanovich, executive director at the Silicon Flatirons Center for Law, Technology, and Entrepreneurship at the University of Colorado Law School. For example, U.S. laws tackle specific types of data, such as health or credit card information. They can also look at particular population groups, such as children or the elderly, then “regulate within those realms,” said Stepanovich. Applied to big data and its three Vs, this patchwork makes the rules vaguer, giving “more possibilities for data to be leaked or breached in a way that causes real harm,” Klosowski wrote.

Some lobbyists argue that the law should cover data collection and sharing rights, the right to see what was collected and by which companies, and opt-in consent to data collection. In addition, they say, companies should limit the amount of data they collect and never charge extra or offer financial compensation to encourage more data sharing.

In the EU, however, it’s quite a different story. In 2016, the EU passed the General Data Protection Regulation (GDPR), on which Egypt’s data protection law is based. It covers most of the areas U.S. lobbyists aim to protect, including data minimization and opt-in consent.

However, the GDPR and similar laws are so comprehensive that some see them as an obstacle. One example was data concerning COVID-19. Some Europeans argue that massive data, even with its potential problems, can still be useful and serve a purpose. The EU’s role so far in tech may be considered more that of a mediator. They “force companies to comply with stringent data protection regulations, merrily fining big tech for antitrust violations and periodically scolding various honchos for not doing enough about privacy, disinformation, and terrorism,” wrote Gian M. Volpicelli of Wired magazine. Guntram Wolff, director of Brussels-based economic think tank Bruegel, said in an op-ed for Politico that the EU has positioned itself as the “world’s tech referee,” but the problem is that “referees don’t win.”

Ethical access

To address the ethical issues that come with big data, setting rules on access to that data might become a priority. The problem then becomes how to regulate the mass gathering of big data while ensuring that the most important information arrives at its intended target. “Consumer data privacy laws can give individuals rights to control their data, but if poorly implemented, such laws could also maintain the status quo,” wrote Klosowski.

The European Commission (EC) is trying something new. Given the United States and China’s strong presence in tech, Volpicelli speculated the EU may have opted to compete in a different race entirely rather than focusing on personal data. It wants to help companies and startups access data gathered from connected objects, then make sharing compulsory. He explained that connected objects “such as cars, home appliances or manufacturing robots, and in computing facilities close to the user (‘edge computing’)” represent just 20% of data processed today. This figure is likely to move closer to 80% by 2025.

In February, the EC launched a draft data act that offers “harmonized rules on fair access to and use of data” and aims “to make Europe a leader in the [big] data economy by harnessing the potential of the ever-increasing amount of industrial data in order to benefit the European economy and society,” according to the commission’s website.

“Data spaces,” as Volpicelli calls them, offer “a mechanism to pool, access and share industrial data across companies in strategic sectors.” The incentive to share would then come in the form of a tit-for-tat situation, where accessing other companies’ data is based on every company sharing its own data. It is implied in the commission document that new legislation might be introduced that makes data sharing compulsory so everyone can benefit.

There are thornier parts to the issue. If the EU’s compulsory data-sharing initiative moves forward and is extended to U.S.-based tech companies, it might draw a backlash, according to the Financial Times. “Unleashing data is valuable, but we are walking a fine line here,” Benedikt Blomeyer, director for EU policy of Allied for Startups, a small-business advocacy group, told the Financial Times in February 2020. “Entrepreneurs who have spent money to gather their data might very well be reluctant to share it.”

While the EU’s regulations likely won’t be an instant fix for big data issues, they might be an essential step toward ethical and fair data sharing. However, which types of companies will be affected is still unclear. Jack Hardinges, a policy adviser at the Open Data Institute in London, thinks it might be better to address traditional industries rather than social media. “An argument for doing so is that there’s latent value in data held by firms that might not think of themselves as data organizations or technology firms,” he said.