Both the immersive web and Web3 are coming. Publishers need to make sure they are prepared for the changes that will inevitably affect their data.
But before we start, it’s important to make the distinction between the immersive web and Web3. They are not the same thing.
Web3 is a new iteration of the web built on blockchain technology that will, according to many, revolutionize the internet by handing ownership and control of personal data back to users. It aims to “decentralize” management of the web and, as such, promises to reduce the control of big corporations such as Google or Meta and make the web more democratic. It is defined by open-source software, it is “trustless” (it doesn’t require the support of a trusted intermediary), and it is permissionless (it has no governing body).
Meanwhile, the immersive web, or metaverse (which many conflate with Facebook’s rebranding to “Meta”), is a version of the online world that incorporates advanced technologies to enhance user engagement and blur the line between the user’s physical reality and the digital environment.
But what are the implications for data-driven companies?
With Web3, the most obvious data implication is that publishers will now have to deal with distributed data and new applications, which will require new connectors. It will also impact yield management. The simplest example is downstream royalties (i.e., publishers rewarding customers for the resale of their data and passing that cost along to advertisers).
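To make the royalty idea concrete, here is a minimal sketch, assuming a hypothetical model and purely illustrative numbers: the publisher pays users a fixed share of data-resale revenue and recovers that cost through an adjusted price to advertisers.

```python
# Sketch of downstream royalty accounting (hypothetical model, illustrative numbers).
# Assumes the publisher pays users a fixed share of data-resale revenue and
# recovers that cost by adjusting the price charged to advertisers.

def downstream_royalty(resale_revenue: float, user_share: float = 0.10) -> float:
    """Royalty pool owed to users for a given amount of data-resale revenue."""
    return resale_revenue * user_share

def advertiser_cpm(base_cpm: float, royalty_cost_per_thousand: float) -> float:
    """CPM charged to advertisers once the royalty cost is passed along."""
    return base_cpm + royalty_cost_per_thousand

royalty_pool = downstream_royalty(resale_revenue=5_000.00)                    # $500 owed back to users
adjusted_cpm = advertiser_cpm(base_cpm=4.00, royalty_cost_per_thousand=0.25)  # cost recovered per 1,000 impressions
print(f"Royalty pool: ${royalty_pool:,.2f}, adjusted CPM: ${adjusted_cpm:.2f}")
```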
Meanwhile, the immersive web’s key impact will be an explosion in data volumes, which by some estimates will increase global data usage twentyfold by 2032.
The jump in data volumes from Web 1.0 to Web 2.0 was massive enough, but the leap to the immersive web is likely to bring an even greater increase.
So, when moving from terabytes and petabytes to exabytes and beyond, what components do you need to unify your data?
Velocity and scale
Everywhere you look, there are data automation solutions promising access to “real-time” or “near-real-time” data. But the question shouldn’t just be: how can I get real-time access to data? Things are a bit more complicated than that.
Rather, you should be asking:
1. How can I scale ongoing operations with a data integrity engine I can trust to keep pace as my disparate data sources increase in number and my data sets explode in volume?
Building one data pipeline manually is manageable, but adding more pipelines and connectors becomes unsustainable without automation. For example, it can take up to a month to build each new connector by hand, which means the data from each new integration (Facebook, Snapchat, etc.) is out of date by the time your teams can access it. And if you need multiple new APIs for multiple different purposes, all at the same time, chaos reigns before you know it, with no clear end in sight. A configuration-driven approach, sketched after these two questions, is one way to keep connector work from growing with every new source.
Any publisher attempting to keep up with the influx of new and ever-changing APIs on the horizon in Web3 needs to build a strong and workable data unification platform, now.
2. How long will it take to build a strategic data asset in the first place, before my teams get access to the data?
There’s no use in having access to real-time data in six months’ time; to make informed business decisions, your teams need that data now. In reality, however, most publishers embarking on building (or buying) their own data unification platform accept that they’ll need to wait months before they get close to any actionable data. For example, it might take six data engineers three to six months to code a bespoke platform that is useful to their individual business teams.
In an age of automation, where real-time data is key to keeping up with the competition, these time frames are no longer acceptable.
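To illustrate the automation point from question 1, here is a minimal sketch of a configuration-driven connector registry. All names, endpoints, and fields below are hypothetical assumptions rather than any particular vendor’s API; the point is that adding a source becomes a short declaration handled by one generic code path instead of a month of bespoke engineering.

```python
# Sketch of a configuration-driven connector registry (all names and endpoints hypothetical).
# Instead of hand-coding each integration, new sources are declared as configuration
# and served by a single generic fetch-and-load loop.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ConnectorConfig:
    name: str                                        # e.g. "facebook_ads", "snapchat_ads"
    endpoint: str                                    # reporting API endpoint for this source
    schedule: str                                    # refresh cadence in cron syntax
    fields: List[str] = field(default_factory=list)  # only the columns the business teams use

REGISTRY: Dict[str, ConnectorConfig] = {}

def register(config: ConnectorConfig) -> None:
    """Adding a new source is a short declaration, not a new bespoke pipeline."""
    REGISTRY[config.name] = config

register(ConnectorConfig("facebook_ads", "https://example.com/fb/insights", "*/15 * * * *",
                         ["date", "campaign_id", "impressions", "spend"]))
register(ConnectorConfig("snapchat_ads", "https://example.com/snap/stats", "0 * * * *",
                         ["date", "campaign_id", "impressions", "spend"]))

def run_all(fetch: Callable[[ConnectorConfig], list],
            load: Callable[[str, list], None]) -> None:
    """One generic loop covers every registered connector."""
    for config in REGISTRY.values():
        rows = fetch(config)      # pull only the declared fields from the source endpoint
        load(config.name, rows)   # land the rows in the warehouse under the source's name
```

In practice, the fetch and load callables would wrap each platform’s reporting API and your warehouse loader, and a scheduler would honour each connector’s declared cadence.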
Smart data pipelines
Typical data pipelines are “dumb”: they’ll pull all the data you need, but also a whole lot you don’t. For example, you might only need 100GB of your 1TB dataset to produce actionable insights, but without a smart API built for the job, the pipeline will pull the full terabyte, which you will then need to store in your data warehouse.
The costs of exponentially larger data volumes can soon spiral out of control if left unchecked. Instead, you need to build APIs for specific cuts of the data that your teams need. This is what we call a smart data pipeline.
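As a minimal sketch of the difference, assume a hypothetical reporting endpoint that accepts field and date-range parameters (the URL and parameter names below are illustrative, not a real API). The “smart” request pushes the filtering to the server, so only the cut your teams actually need travels over the wire and lands in the warehouse.

```python
# Sketch of a "dumb" pull versus a "smart" pull (endpoint and parameters are hypothetical).
import requests

DUMB_PULL = {"format": "csv"}  # no filters: the full export comes back

SMART_PULL = {
    "fields": "date,campaign_id,impressions,revenue",  # only the columns needed downstream
    "start_date": "2024-01-01",                        # a bounded date range, not all history
    "end_date": "2024-01-31",
    "format": "csv",
}

def pull_report(params: dict) -> bytes:
    """Fetch a report; with filters, the server trims the data before anything is stored."""
    response = requests.get("https://example.com/api/reports", params=params, timeout=60)
    response.raise_for_status()
    return response.content

monthly_slice = pull_report(SMART_PULL)  # roughly the 100GB cut instead of the full 1TB export
```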
While the pace of adoption of Web3 is still unclear, the immersive web is just around the corner. It’s imperative that data-driven companies are prepared for what’s coming. That’s not just a few more rows of data to process and store, but a tsunami of new and larger data sets that will become overwhelming overnight without the right infrastructure in place.
Any publishers still attempting to carry out their data operations manually need to automate, wherever possible, before it’s too late.
About the author
Navid Nassiri joined Switchboard as Head of Marketing in 2021. Switchboard’s data engineering automation platform aggregates disparate data at scale, reliably and in real time, so companies can make better business decisions. In his role at Switchboard, Navid is focused on driving growth and brand awareness through innovative marketing strategies. He is a seasoned entrepreneur and executive whose career includes leadership roles at PwC and NBCUniversal.