Compute to data: using blockchain to decentralize data science and AI with the Ocean Protocol
- by 7wData
AI and its machine learning algorithms need data to work. By now, that's a known fact. It's not that algorithms don't matter; it's just that, typically, getting more and better data improves results more than tweaking algorithms does. This is the unreasonable effectiveness of data.
More data, and more compute capacity to train algorithms that use the data, is what has been fueling the rise of AI. Anyone who wants to train an algorithm for an AI application to address any problem in any domain must be able to get lots of relevant data in order to be successful.
That data can be public data, private data generated and owned by the organization developing the application, or private data acquired from third parties. Public data is not an issue. Private data an organization owns itself must be collected and processed in accordance with data protection laws such as GDPR and CCPA.
But what about private data owned by 3rd parties? Normally, application developers don't have access to those, and for good reasons. Why would you trust anyone with your private data? Even if the party you hand it over to promises to take good care of the data, once the data is out of your hands, anyone can do as they please with it.
This is the problem the non-profit Ocean Protocol Foundation (OPF) wants to solve. ZDNet connected with founder Trent McConaghy to discuss OPF's mission and its latest milestone: Compute-to-Data.
McConaghy has been working on the Ocean Protocol since 2017. With a background in AI and blockchain, having worked on projects such as ascribe and BigchainDB, he described how he realized that blockchain could help prevent data from escaping and preserve the privacy of data used to train AI algorithms.
The OPF has been working on setting up the infrastructure to enable better accessibility to data via data marketplaces. As McConaghy pointed out, there have been many attempts at data marketplaces in the past, but they've always been custodial, which means the data marketplace is a middleman users have to trust. A recent case in point: Surgisphere.
But what if you could have marketplaces act as the connector without them actually holding the data, without having to trust the marketplace? This is what OPF is out to achieve - decentralized data marketplaces.
This is a tall order, and McConaghy is quick to admit that it will take years to get there. Last week, however, brought the OPF one step closer, with the unveiling of what it calls Compute-to-Data. Compute-to-Data provides a means to exchange data while preserving privacy: the data stays on-premise with the data provider, and data consumers run compute jobs on it to train AI models.
Rather than having the data sent to where the algorithm runs, the algorithm runs where the data is. The idea is very similar to federated learning. The difference, McConaghy says, is that federated learning only decentralizes the last mile of the process, while Compute-to-Data goes all the way.
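The pattern described above can be sketched in a few lines of Python. This is an illustrative toy, not Ocean Protocol's actual API: the class and function names are invented here, and the "training job" is just an aggregate statistic standing in for real model training. What it shows is the core inversion: the algorithm travels to the data, and only results travel back.

```python
# Minimal sketch of the Compute-to-Data pattern: the raw data never
# leaves the provider; the consumer's algorithm runs next to the data,
# and only its (aggregate) output is returned.
from statistics import mean


class DataProvider:
    """Holds a private dataset on-premise and runs submitted jobs on it."""

    def __init__(self, private_data):
        self._private_data = private_data  # never exposed directly

    def run_compute_job(self, algorithm):
        # The consumer's code executes where the data lives; only the
        # job's result crosses the boundary back to the consumer.
        return algorithm(self._private_data)


def train_model(records):
    """Consumer-supplied job: an aggregate statistic as a stand-in
    for model training."""
    return {"count": len(records), "mean": mean(records)}


provider = DataProvider(private_data=[4.2, 3.8, 5.1, 4.7])
result = provider.run_compute_job(train_model)
print(result)
```

In a real deployment the job would of course be sandboxed and vetted before execution, since arbitrary consumer code runs inside the provider's perimeter; the sketch omits that for brevity.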
TensorFlow Federated (TFF) and OpenMined are the most prominent federated learning projects. TFF does orchestration in a centralized fashion, OpenMined is decentralized. In TFF-style federated learning a centralized entity (e.g. Google) must perform the orchestration of compute jobs across silos. Personally identifiable information can leak to this entity.
OpenMined addresses this via decentralized orchestration. But its software infrastructure could use improvement to manage computation at each silo in a more secure fashion; this is where Compute-to-Data can help, says McConaghy. That's all fine and well, but what about performance?
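To make the contrast concrete, here is a hedged sketch of TFF-style federated averaging with a central orchestrator, the design the paragraph above attributes to Google-style federated learning. All names are illustrative; the "model" is a single scalar fit by gradient steps, chosen so the whole loop fits in a few lines.

```python
# Sketch of federated averaging with a centralized orchestrator.
# Each silo keeps its raw data and shares only model updates; the
# central entity aggregates them. As the article notes, even these
# updates pass through one trusted party and can leak information.

def local_update(silo_data, global_weight, lr=0.1):
    """One local gradient step of a 1-D mean-squared-error model.
    Returns an updated weight, never the raw records."""
    grad = sum(global_weight - x for x in silo_data) / len(silo_data)
    return global_weight - lr * grad


def orchestrate(silos, rounds=50):
    """Central orchestrator: broadcast the global weight, collect
    per-silo updates, average them (federated averaging), repeat."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(data, w) for data in silos]
        w = sum(updates) / len(updates)
    return w


silos = [[1.0, 2.0], [3.0], [4.0, 5.0]]
final_weight = orchestrate(silos)  # converges toward 3.0, the average of silo means
```

Decentralized orchestration, as in OpenMined or Compute-to-Data, replaces the single `orchestrate` party with a protocol so that no one entity sees all the updates.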
If algorithms run where the data is, then how fast they run depends on the compute resources available at the host.