In recent years, a large number of enterprises planned and implemented projects based on big data analytics with the aim of gaining insights that would further their business objectives. Unfortunately, not all those initiatives succeeded in delivering tangible business value. It’s not surprising that a 2019 survey by NewVantage found that 77% of businesses reported that the “business adoption” of big data and AI initiatives was a big challenge.
Projects that proved to be of value usually had a well-defined problem, while others just had data or answers in search of questions. The next key ingredient, of course, was a good plan: one that defined the processes, key team members, timeframes, metrics, milestones, and technology tools. We have also seen great projects that generated potentially transformational insights but failed to impact business results because it was unclear how the findings were to be operationalized.
So, while we study data science, technology and tools, it’s important to understand the big data challenges that face most organizations and put the success of analytics projects in jeopardy. We discuss the top four big data challenges here.
Data quality
The credibility and reliability of your recommendations based on analytics are a function of the quality of your data. It’s necessary to put the right controls in place for data sources, security and retention to achieve high standards of data quality.
When data is gathered from disparate systems, discrepancies are common, which makes data validation essential to ensuring quality. For example, the sales data from an e-commerce portal may show different numbers than the ERP system, or customer contact details in the company’s CRM may not match those in the dealer’s system. A combination of policies and technology is usually needed to resolve these issues and make sure that the records are accurate and usable.
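As an illustration of the validation step, the sketch below compares order totals exported from two systems and lists where they disagree. The order IDs, amounts, tolerance, and the `reconcile` function are hypothetical; a real reconciliation would depend on how each system exports its data.

```python
from decimal import Decimal

def reconcile(portal_totals, erp_totals, tolerance=Decimal("0.01")):
    """Compare per-order totals from two systems and list the discrepancies."""
    issues = []
    # Walk every order ID that appears in either system.
    for order_id in sorted(set(portal_totals) | set(erp_totals)):
        portal = portal_totals.get(order_id)
        erp = erp_totals.get(order_id)
        if portal is None:
            issues.append((order_id, "missing in portal"))
        elif erp is None:
            issues.append((order_id, "missing in ERP"))
        elif abs(portal - erp) > tolerance:
            issues.append((order_id, f"amount mismatch: portal={portal}, erp={erp}"))
    return issues

# Hypothetical exports: order ID -> order total.
portal = {"A1": Decimal("100.00"), "A2": Decimal("59.90"), "A3": Decimal("20.00")}
erp = {"A1": Decimal("100.00"), "A2": Decimal("59.00"), "A4": Decimal("75.50")}

issues = reconcile(portal, erp)
for order_id, problem in issues:
    print(order_id, problem)
```

Using `Decimal` rather than floating point keeps currency comparisons exact. The list of issues is only a starting point: deciding which system is authoritative is a policy question, not a technical one.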
Multiple copies of the same records cause results to be skewed or incorrect and also drive up the costs of computation and storage. Incomplete data or data stored in inconsistent formats need to be completed and corrected in order to achieve meaningful results. Master data management (MDM) and deduplication are extremely important in this context.
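A minimal sketch of deduplication, assuming that two customer records refer to the same person when their email and phone number match after normalization. The field names and the matching rule are assumptions made for illustration; MDM tools typically apply far richer matching logic.

```python
def normalize(record):
    """Build a comparison key: trimmed lowercase email, digits-only phone."""
    email = record.get("email", "").strip().lower()
    phone = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return (email, phone)

def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = {}
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen[key] = record
    return list(seen.values())

# Hypothetical customer records with inconsistent formatting.
customers = [
    {"name": "Ana Diaz", "email": "ana@example.com", "phone": "555-0100"},
    {"name": "A. Diaz", "email": " ANA@example.com ", "phone": "(555) 0100"},
    {"name": "Ben Ode", "email": "ben@example.com", "phone": "555-0111"},
]

unique = deduplicate(customers)
print(len(customers), "records reduced to", len(unique))
```

Normalizing formats before comparing is what lets the second record be recognized as a copy of the first, even though the raw strings differ.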
Each data source has its own trust and certainty level, and if you combine data from sources with varying levels of credibility, the value of the entire collection of data is reduced. That’s why it’s so important to check each data source for correctness, timeliness, relevance, completeness, and ease of understanding for users.
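Two of these checks, completeness and timeliness, are easy to measure per source. The sketch below is one possible way to do it; the required fields, the 30-day freshness window, and the sample records are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: fields every record is expected to carry.
REQUIRED_FIELDS = ("customer_id", "email", "updated_at")

def completeness(records, required=REQUIRED_FIELDS):
    """Fraction of records where every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(r.get(field) not in (None, "") for field in required) for r in records
    )
    return complete / len(records)

def timeliness(records, now, max_age=timedelta(days=30)):
    """Fraction of records updated within max_age of now."""
    if not records:
        return 0.0
    fresh = sum(
        1 for r in records
        if r.get("updated_at") is not None and now - r["updated_at"] <= max_age
    )
    return fresh / len(records)

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

now = utc(2024, 1, 31)
crm_records = [
    {"customer_id": 1, "email": "a@example.com", "updated_at": utc(2024, 1, 20)},
    {"customer_id": 2, "email": "", "updated_at": utc(2024, 1, 25)},
    {"customer_id": 3, "email": "c@example.com", "updated_at": utc(2024, 1, 10)},
    {"customer_id": 4, "email": "d@example.com", "updated_at": None},
]

print("completeness:", completeness(crm_records))
print("timeliness:", timeliness(crm_records, now))
```

Scores like these can be tracked per source over time, so that a source whose quality is slipping is noticed before it is blended with more reliable data.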
The integration of data from disparate sources is one of the biggest big data challenges, which is why ETL and data integration tools are used. Typically, the different data sources include enterprise applications, social media channels, emails, IoT sensors, video, audio and text files, etc.
Delivering insights at the speed that businesses need
Organizations initiate big data projects in order to enhance operational efficiencies, identify new strategic opportunities or accelerate the speed of business. Data analytics is a powerful method to build a competitive advantage. But, in order for that to occur, it’s important that insights are available at the time when action needs to be taken.
Insights delivered in real time are far more valuable than those made available after the event. Real-time analytics is one of the big data challenges that a variety of technology tools aim to overcome. Your big data system should be designed for real-time ingestion, transaction processing, and analysis of the data, practically as it is created. This may require an investment to upgrade your IT infrastructure and current processes, and it makes the choice of a technology platform that supports real-time analytics important. However, the pressure to process data in real time should not lead to any compromises on data integrity.
There may also be change management challenges as employees may resist real-time reporting because the culture of daily, weekly or monthly reporting is so deeply ingrained.
It is also extremely important to consider the business application of real-time analytics, or to put it simply: what will you do with the insights and how soon do you need them? This will help prioritize what is actually needed in real time, and also define what real time means to you, as it varies across organizations, from a few milliseconds for some to 24 hours for others.
Big data talent
While the use of automation for data analytics is growing, there are still many tasks that require data science experts or software professionals. There is a shortage of the data analysts that organizations need to achieve the best results from their big data investments.
There is a huge demand in the industry for professionals with deep analytical skills and even more for data management and interpretation skills. A data scientist needs a deep understanding of mathematics, statistics, computer science, modeling, and analytics. It’s quite natural that companies first seek data scientists who also have domain knowledge. For example, banks want data science expertise, as well as an understanding of banking, while manufacturing companies want their data scientists to know how manufacturing functions, and so on. A major cause of the shortage of big data talent is that there are simply not enough experts who combine data science skills with knowledge of a particular domain.
Analytics solutions that integrate automation solve this problem to an extent by allowing users who are not necessarily experts to operate the systems and achieve the required results.
Some companies have found that investing in the learning and development of internal talent to equip them to leverage big data analytics is a sound strategy, as these employees already understand organizational objectives, processes, and people.
In a number of enterprises, data analytics experts function as a shared service, helping different teams apply the power of data science. The industry is also working on partnering with universities to identify talent and nurture it so that graduates are equipped with big data skills and can help bridge the talent gap.
Security of big data
Protecting your big data and analytics processes from malicious attacks or theft is critically important to prevent possible financial and reputational damage. The range of threats includes data theft, ransomware, distributed denial of service (DDoS) attacks and more. While every kind of data needs to be protected, it becomes even more critical when customer databases, financial details, or credit card information are involved. Data theft can lead to direct financial loss, not to mention that consumer data protection laws make you liable for huge fines and compensations.
Organizations face the challenge of security vulnerabilities throughout the data lifecycle. The first stage is incoming data, which could be intercepted or corrupted in transit. Then there is data in storage, whether on-premises or in the cloud, which could be the target of attacks. The last stage is the output, when data is being presented in the form of dashboards, visualizations, etc., which could also be targeted by hackers.
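One common way to detect data that has been corrupted or altered in transit is to attach a keyed hash (HMAC) to each message and verify it on receipt. This is an illustrative technique, not one the article prescribes, and it addresses integrity only: it does not hide the data from anyone who intercepts it. The key and message below are placeholders.

```python
import hashlib
import hmac

def sign(payload: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str, key: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    expected = sign(payload, key)
    return hmac.compare_digest(expected, signature)

# Placeholder key for the example only; real keys belong in a secrets manager.
shared_key = b"demo-key-do-not-use-in-production"

message = b'{"sensor": "s-17", "reading": 21.4}'
tag = sign(message, shared_key)

# The receiver recomputes the tag and compares it with the one that was sent.
ok = verify(message, tag, shared_key)
tampered_ok = verify(b'{"sensor": "s-17", "reading": 99.9}', tag, shared_key)
print("original accepted:", ok)
print("tampered accepted:", tampered_ok)
```

Because the tag depends on both the payload and the secret key, an attacker who changes the reading cannot produce a matching tag without the key.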
Organizations can put a number of measures in place to overcome the big data challenges related to data security. Data encryption is an extremely important method of protecting data from hackers, as encrypted data cannot be read without the encryption key. Encryption can protect data during both input and output.
One drawback of encryption is that constant encryption and decryption of huge data chunks slows things down. For this reason, some organizations store data without encryption but this can be a huge security risk.
Firewalls are also extremely important since they filter the traffic that enters or leaves servers. This can help the organization to prevent attacks, as the firewalls prevent unknown entities or data sources from accessing the data on servers.
Businesses also need to plan identity and access management (IAM) for authorized people by enforcing good password policies, authentication, and other controls. You may need to enforce different security policies for different categories of data.
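The idea of different policies for different categories of data can be sketched as a simple lookup that denies access by default. The categories, role names, and the `can_read` helper are hypothetical; production systems would normally rely on the access controls of their IAM platform rather than hand-written checks.

```python
from enum import Enum

class Category(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    RESTRICTED = "restricted"  # e.g. customer or payment data

# Hypothetical policy: which roles may read each category of data.
POLICY = {
    Category.PUBLIC: {"analyst", "engineer", "auditor"},
    Category.INTERNAL: {"analyst", "engineer"},
    Category.RESTRICTED: {"auditor"},
}

def can_read(role: str, category: Category) -> bool:
    """Allow access only if the policy explicitly grants it (deny by default)."""
    return role in POLICY.get(category, set())

print(can_read("analyst", Category.INTERNAL))
print(can_read("analyst", Category.RESTRICTED))
```

Denying by default means that a new role or a new category grants nothing until someone deliberately adds it to the policy.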
Implementing the above data security practices can reduce the risk of data theft or hacking. Big data security audits should be conducted regularly to identify vulnerabilities and put the right preventive measures in place.
We discussed the four most critical big data challenges that could hamper progress and must be overcome for a project to succeed. By identifying these challenges, understanding how they impact your project and taking the necessary steps to mitigate them, you can increase the probability of the success of your big data initiative substantially.