Ocean's Blockchain Protocol To Power Decentralized Data Marketplaces For AI
Ocean is a transformative protocol to create seamless decentralized data markets and incentivize a global data commons. It ties together three trends that are visible in the connected world today.
First, the data we produce throughout our digital lives is kept in silos within large companies, which then use artificial intelligence (AI) to monetize it.
Second, many new data-hungry AI start-ups are emerging.
Third, an exponential trend toward decentralisation is now clearly visible, driven by the invention of Bitcoin and other blockchain applications.
Ocean aims to democratize data, making it accessible to a much broader set of people while still respecting privacy. Broad access and privacy don't have to be mutually exclusive; they can be reconciled, says Ocean founder Trent McConaghy, who also founded BigchainDB.
McConaghy, who worked in AI for nearly two decades before founding blockchain startups, sees broad social implications of the project.
He said: "You are being datamined by companies like Google and Facebook, perhaps with your official consent, but not really your ethical consent, because you don't really know to what extent they are mining, how they are leveraging it, and so on.
"Some of the folks who are mining your data have become more powerful than governments. The data is in silos and they don't want to tell you how much they are using it because that might make you choose to leave. For them, data is a moat, and they want to maintain these moats."
Much has been written about the unethical use of our data and the vast amount of money it generates for the likes of Google. What has been lacking is a technical solution. Now the right combination of tools exists, in the form of decentralisation technologies.
McConaghy's vision of data democratisation involves people having full control and visibility of their own data: the much-vaunted idea of sovereign personal data. The other half is making that data available to AI practitioners throughout the world, not just big players like Google, Facebook and Baidu, but ideally to any AI setup on the planet.
He used the example of Sabre, the airline flights and reservations database, to show how a data marketplace, even a centralized one, can deliver a net benefit for many enterprises and users alike.
"Imagine if we had a Sabre for data, where you could have 1,000 or 10,000 marketplaces for data sitting on top, each serving their different customers, serving different geographies, serving different types of data, whether it's self-driving car data, etc.
"And imagine if that Sabre was decentralized so no single entity owns or controls it. You have incentives for people to put more data in, you have incentives for people to keep running the network over time, and it's basically like a public utility for the planet. Just like Bitcoin is a public utility for store of value, or Ethereum for business logic.
"That's really what Ocean is: a decentralized Sabre for data. It holds information about what data is available, free commons data or priced data, and if priced, what the offers are. It's a data marketplace protocol and network: the protocol as in how these different machines talk to each other to keep running, as well as talk to client nodes to serve data and to ingest data; the network as in an instantiation of that protocol."
He pointed out that big enterprises have tons of data yet have trouble unlocking it. This latent value just sits there because firms are unsure how to price it and how to handle privacy. He mentioned the European General Data Protection Regulation (GDPR), which takes effect in May 2018 and affects any website that interacts with European users.
Meanwhile, on the demand side, more and more AI companies are emerging, and many are desperate for data. More data means more accurate models, which can make or break the target application.
A growing wave of projects that combine AI and blockchain technology is appearing, such as the startup Numerai and the OpenMined project. These projects use techniques like homomorphic encryption (computing on encrypted data), on-premise computation, and decentralized networks powered by tokens to align the incentives of participants.
"It's pretty interesting that blockchain is turning out to be a linchpin for AI," said McConaghy. "So you can have blockchain technology for the sort of data side, like we are going for, as well as blockchain for compute side, such as on-premise compute."
McConaghy sees enormous potential in areas like medicine, where things like drug design are carried out using relatively limited amounts of data. "There's a win-win across the board for all sorts of medical applications. One application we are iterating against is on Parkinson's disease, which we will be talking about more in due course."
Ocean has announced that it is working with Toyota Research Institute. From a carmaker's perspective, adding more data to train its self-driving algorithms should make them safer. If car companies could somehow pool their data, this would be a cost-effective way to reach the scale they need in self-driving miles.
McConaghy's team is based in Germany, where several of the world's major car manufacturers have their headquarters. He understands that a medium is needed to enable this sort of data to be shared.
"The algorithms would be safe enough if they had more data, actually a lot more data. But then how do you pool it? Are they all just going to hug and sing Kumbaya and share each other's data – probably not.
"So if you actually have a medium of exchange, i.e. a data exchange or a data marketplace, then they can start to do that in a reasonable fashion.
"That's actually a win for society because you have fewer accidents; and you have a win for Toyota and all the automakers too and the Ubers of the world etc."
McConaghy also repeatedly stressed the need for a global data commons: data that ought to be out in the open, free for anyone to use. He described how Ocean incentivizes people to upload their data by rewarding them for sharing and serving free data. The free commons data goes hand-in-hand with the paid data; each makes the other stronger.
"Democratising data for the people. Now that sounds like a worthy target!"