We seek ever more data for a good reason: it’s the commodity that fuels digital innovation. However, turning those huge data collections into actionable insight remains a difficult proposition. Organizations that find solutions to formidable data challenges will be better positioned to economically benefit from the fruits of digital innovation.
With that basic premise in mind, here are 10 trends in big data that forward-looking organizations should look out for in 2019:
- 1. Data Management Is Still Hard
- 2. Data Silos Continue Proliferating
- 3. Streaming Analytics Has Breakout Year
- 4. Data Governance Builds Steam
- 5. Skills Shift as Tech Evolves
- 6. Deep Learning Gets Deeper
- 7. ‘Special K’ Expands Footprint
- 8. Clouds Hard to Ignore
- 9. New Tech Will Emerge
- 10. Smart Things Everywhere
1. Data Management Is Still Hard
The big idea behind big data analytics is fairly clear-cut: Find interesting patterns hidden in large amounts of data, train machine learning models to spot those patterns, and deploy those models to production to automatically act upon them. Rinse and repeat as necessary.
However, the reality of putting that basic recipe into production is a lot harder than it looks. For starters, amassing data from different silos (see prediction #2) is difficult and requires ETL and database skills. Cleaning and labeling the data for machine learning training also takes a lot of time and money, particularly when deep learning techniques are used. And finally, putting such a system into production at scale in a secure and reliable fashion requires another set of skills entirely.
For these reasons, data management remains a big challenge, and data engineers will continue to be among the most sought-after personas on the big data team.
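To make the silo-gathering and cleaning steps above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two "silos" (a CSV export and a `customers` table in SQLite), the `customer_id` and `email` fields, and the function names are all hypothetical, and a real pipeline would need far more validation than this.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List

Record = Dict[str, str]


def extract_from_csv(csv_text: str) -> List[Record]:
    """Read customer rows from a CSV export (hypothetical silo A)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract_from_sqlite(conn: sqlite3.Connection) -> List[Record]:
    """Read customer rows from a relational table (hypothetical silo B)."""
    rows = conn.execute(
        "SELECT customer_id, email FROM customers ORDER BY customer_id"
    ).fetchall()
    return [{"customer_id": str(cid), "email": email} for cid, email in rows]


def clean(record: Record) -> Record:
    """Normalize one record: trim whitespace and lowercase the email."""
    return {
        "customer_id": record["customer_id"].strip(),
        "email": record["email"].strip().lower(),
    }


def consolidate(*sources: Iterable[Record]) -> Dict[str, str]:
    """Merge cleaned records keyed by customer_id; later sources win."""
    merged: Dict[str, str] = {}
    for source in sources:
        for record in source:
            cleaned = clean(record)
            merged[cleaned["customer_id"]] = cleaned["email"]
    return merged
```

Even in this toy version, the policy questions show up quickly: which silo wins when two records disagree, and what counts as a "clean" value. Those decisions, not the code itself, are where much of the time and money goes.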
2. Data Silos Continue Proliferating
This is not a difficult prediction to make. During the Hadoop boom five years ago, we were entranced with the idea that we could consolidate all of our data – for both analytical and transactional workloads – onto a single platform.
That idea never really panned out, for a variety of reasons. The biggest challenge is that different data types have different storage requirements. Relational databases, graph databases, time-series databases, HDFS, and object stores all have their respective strengths and weaknesses. Developers can’t maximize those strengths if they’ve crammed all their data into a one-size-fits-all data lake.
In some cases, amassing lots of data into a single place does make sense. Cloud data stores like S3, for instance, are providing companies with flexible and cost-effective storage, and Hadoop continues to be a cost-effective store for unstructured data storage and analytics. But for most companies, these are simply additional silos that must be managed. They’re big and important silos, of course, but they’re not the only ones.
In the absence of a strong centralizing force, data silos will continue to proliferate. Get used to it.
3. Streaming Analytics Has Breakout Year
The quicker you can act on a new piece of data, the better off your organization will be. That’s the driving force behind real-time, or streaming, analytics. The challenge has always been that streaming analytics is both difficult and expensive to pull off, but that’s changing as organizations’ analytics teams mature and the technology gets better.
NewSQL databases, in-memory data grids, and dedicated streaming analytic platforms are converging around a common capability, which is ultra-fast processing of incoming data, often using machine learning models to automate decision-making.
Combine that with the SQL capabilities in open source streaming frameworks like Kafka, Spark, and Flink, and you have the recipe for real progress in 2019.
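The core idea of acting on each event as it arrives can be illustrated without any streaming framework at all. The sketch below is plain Python, not Kafka, Spark, or Flink code: it keeps a trailing time window over an event stream and emits an alert whenever the windowed mean crosses a threshold. The function name, the `(timestamp, value)` event shape, and the simple threshold rule (standing in for a trained model) are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value)


def sliding_window_alerts(
    events: Iterable[Event],
    window_seconds: float,
    threshold: float,
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, window_mean) each time the trailing mean exceeds threshold.

    Events are assumed to arrive in timestamp order.
    """
    window: Deque[Event] = deque()
    running_sum = 0.0
    for ts, value in events:
        window.append((ts, value))
        running_sum += value
        # Evict events that have fallen out of the trailing window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            running_sum -= old_value
        mean = running_sum / len(window)
        if mean > threshold:
            yield ts, mean
```

Because the function is a generator, a decision is available as soon as each event has been processed rather than after a batch completes. The dedicated platforms add the hard parts this sketch ignores, such as out-of-order events, fault tolerance, and scale.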
4. Data Governance Builds Steam
Some people call data the “new oil.” It’s also been called the “new currency.” Whichever analogy you want to use, we all agree that data has value, and that treating it carelessly carries a risk.
The European Union spelled out the financial consequences for poor data governance with last year’s enactment of the GDPR. While there’s no similar law in the United States yet, American companies still must abide by 80-some different data mandates created by various states, countries, and unions.
Data breaches are bringing the issue to a head. According to an online survey by The Harris Poll, nearly 60 million Americans were affected by identity theft in 2018. That’s an increase of 300% from 2017, when just 15 million said they were affected.
Most organizations have realized that the Wild West days of big data are coming to an end. While the US Government won’t (yet) fine you for being reckless with data or abusing the privacy of American citizens, the writing is on the wall that this behavior is no longer tolerated.
5. Skills Shift as Tech Evolves
Human resources are typically the biggest costs in a big data project, because people ultimately are the ones that build it and run it and make it all work. Finding the right person with the right skills is absolutely critical to turning data into insight, no matter what technologies or techniques you’re using.
But as technology advances, the skills mix does too. In 2019, you can expect to see continued huge demand for anybody who can put a neural network into production. Among mere data scientists (as opposed to legit AI experts), Python continues to dominate among languages, although there’s plenty of work for folks who know R, SAS, MATLAB, Scala, Java, and C.
As data governance programs kick into gear, demand for data stewards will go up. Data engineers who can work with the core tools (databases, Spark, Airflow, etc.) will continue to see their opportunities grow. You can also expect to see demand for machine learning engineers accelerate.
However, thanks to the advance of automated data science platforms, organizations will be able to accomplish quite a bit with mere data analysts, or “citizen data scientists,” as they’re commonly known. Knowledge of the data and the business – as opposed to expertise in statistics and coding – may get you further down the big data road than you imagined.
6. Deep Learning Gets Deeper
The “Cambrian explosion” of deep learning, which has powered the AI summer that we currently find ourselves in, shows no signs of letting up in 2019. Organizations will continue to experiment with deep learning frameworks like TensorFlow, Caffe, Keras, PyTorch, and MXNet as they seek to monetize vast data sets.
Organizations will expand deep learning beyond its initial use cases, like computer vision and natural language processing (NLP), and find new and creative ways of implementing the powerful technology. Large financial institutions have already found that neural network algorithms are better at spotting fraud than “traditional” machine learning approaches, and the exploration into new use cases will continue in 2019.
This will also prop up demand for GPUs, which are the favored processors for training deep learning models. It’s unclear if new processor types, including ASICs, TPUs, and FPGAs, will become available. But there’s clearly demand for faster training and inference too.
However, the deep learning ecosystem will remain relatively young, and a lack of generalized platforms will keep this the realm of true experts.
7. ‘Special K’ Expands Footprint
Developed by Google to manage and orchestrate virtualized Linux containers in the cloud, Kubernetes has become one of the hottest technologies in the big data ecosystem, if not the IT industry as a whole. As multi-cloud and hybrid deployments become more common, Kubernetes is the glue that holds it all together.
Big data software vendors that used to write their software to run on Hadoop are now writing it to run on Kubernetes, which at least gets them in the front door (if not an invite to dinner). Supporting Kubernetes has become the number one requirement for software vendors, including the Hadoop vendors.
8. Clouds Hard to Ignore
The cloud is big, and getting bigger. In 2018, the three biggest public cloud vendors grew at a rate approaching 50%. With an array of big data tools and technology – not to mention cheap storage for housing all that data – it will be hard to resist the allure of the cloud.
In 2019, small businesses and startups will gravitate to the major public cloud providers, which are investing major sums in building ready-to-run big data platforms, replete with automated machine learning, analytical databases, and real-time streaming analytics.
Bigger companies will also find the cloud hard to resist in 2019, even if the economics aren’t nearly so attractive. However, the looming threat of lock-in will keep bigger companies wary of putting all their eggs in a single cloud basket.
9. New Tech Will Emerge
Many of the major big data frameworks and databases that are driving innovation today were created by the Web giants in Silicon Valley, and released as open source. The good news is there’s no sign the well is drying up. If anything, innovation may be accelerating.
In 2019, big data practitioners would do well to retain as much flexibility as possible in their creations. While it may be tempting to cement your application to a certain technology for performance reasons, that could come back to haunt you when something better and faster comes along.
As much as you can, seek to keep your applications “loosely coupled but tightly integrated,” because you’ll eventually have to tear them apart and rebuild them.
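One common way to stay loosely coupled is to have application code depend on a small interface rather than on a specific storage product. The following is only a sketch of that idea in Python: the `KeyValueStore` interface, both backends, and the `increment_counter` function are hypothetical names invented for this example, not part of any particular framework.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """The only storage contract the application code relies on."""

    def put(self, key: str, value: str) -> None: ...

    def get(self, key: str) -> Optional[str]: ...


class DictStore:
    """Today's backend: a plain in-memory dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class AppendLogStore:
    """A later replacement: an append-only log where the latest write wins."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def get(self, key: str) -> Optional[str]:
        for logged_key, logged_value in reversed(self._log):
            if logged_key == key:
                return logged_value
        return None


def increment_counter(store: KeyValueStore, name: str) -> int:
    """Application logic that works with any backend satisfying KeyValueStore."""
    current = int(store.get(name) or "0")
    store.put(name, str(current + 1))
    return current + 1
```

Swapping `DictStore` for `AppendLogStore` requires no change to `increment_counter`, which is the property you want when something better and faster comes along. The trade-off is that a narrow interface can hide backend-specific performance features, so the tension described above does not disappear entirely.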
10. Smart Things Everywhere
It’s tempting to dismiss the smart toaster as a cute gizmo that has no practical purpose in our lives. But perhaps it’s something less sinister: a prelude to an always-on world where smart devices are constantly collecting data and adapting to our conditions.
Driven by consumer demand, smart devices are proliferating at an astounding rate. Smart device ecosystems are springing up around the two leading platforms, Amazon Alexa and Google Assistant, providing consumers with the opportunity to infuse remote access and AI smarts into everything from lighting and HVAC systems to locks and home appliances.
Buoyed by the rollout of super-fast 5G wireless networks, what’s happening in the home will soon happen in the world at large. Consumers will be able to interact with a multitude of devices, providing new levels of personalization everywhere we go.
In 2019, progress will be made across a multitude of fronts. Yes, there are substantial technical, legal, and ethical hurdles presented by big data and AI, but the potential benefits are too great to ignore.