Notes & Questions: Andrew Trask on AI Security

2018.01 podcast (Epicenter)

Dec 4, 2018

My notes on the Epicenter podcast episode “Andrew Trask: OpenMined — A Decentralised Artificial Intelligence Platform”

Facilitated private machine learning: train ML models on users’ data without exposing, uploading, or aggregating that data.
How? A user downloads a model and trains it on their data. The training process yields an update to the model; that update is uploaded as a proposed improvement to the model. The user data is never exposed, aggregated with other data, or transferred beyond the user’s control.

How it works

OpenMined offers tools for the user to ‘hide’ their data in a container on their machine
User downloads the machine learning model that is to be trained on the user’s data (the data is not saved elsewhere or taken off the user’s machine)
User allows the model to train on the data that’s behind a security firewall of sorts
Model updates based on what it learns from this new dataset
Model update is uploaded to OpenMined, where the model is hosted and managed; the update is selectively incorporated (e.g., if it introduces bias or some other undesirable property to the model, it is rejected; ed: verify this for accuracy, I’m not certain of the process)
User’s data never copied or stored elsewhere, and its only trace on the model is any change that’s ultimately incorporated into the model hosted by OpenMined (ed: can this process be reverse-engineered to reveal anything about the data? otherwise, it’s much like hashing except …?…)
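To make the steps above concrete for myself, here is a minimal sketch in plain Python. It is not OpenMined's code or API: the tiny linear model, the function names (local_update, apply_update), and the learning rate are all assumptions chosen for illustration. The point it shows is that only the weight change leaves the user's machine, never the data.

```python
from typing import List, Tuple

Weights = List[float]
Example = Tuple[List[float], float]  # (features, target)

def predict(weights: Weights, x: List[float]) -> float:
    """Linear model: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))

def local_update(global_weights: Weights, private_data: List[Example],
                 lr: float = 0.1, epochs: int = 50) -> Weights:
    """Runs on the user's machine (hypothetical helper).

    Trains a copy of the downloaded model on the private data and
    returns only the difference between new and original weights.
    """
    w = list(global_weights)  # work on a copy of the downloaded model
    for _ in range(epochs):
        for x, y in private_data:
            err = predict(w, x) - y
            # one stochastic gradient step on squared error
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [wi - gi for wi, gi in zip(w, global_weights)]

def apply_update(global_weights: Weights, delta: Weights) -> Weights:
    """Runs wherever the model is hosted: sees the delta, not the data."""
    return [g + d for g, d in zip(global_weights, delta)]

# Private data (here y = 2 * x) stays on the user's machine.
private_data = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
global_weights = [0.0]

delta = local_update(global_weights, private_data)  # only this is uploaded
new_weights = apply_update(global_weights, delta)
```

Note that this sketch says nothing about whether the uploaded delta could leak information about the data; that is the open question in the list above.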

OpenMined: focused on AI security/privacy

containerizing AI components and providing boxes and tools/knobs to make training data useful without exposing it
the obvious use case is for narrow AI, and that’s the focus
regulators present a false tradeoff between innovation and privacy: they don’t seem to know that models can be trained without access to aggregated data (aggregated in the sense that it’s collected and stored by an entity which then controls that data, and whose security might be breached, potentially exposing the full collection of people’s data to an attacker)

OpenMined

bringing awareness of private machine learning to the broader community
building and offering tools for private machine learning
goal: make the software/UI as accessible as possible for broad use
these tools eradicate the privacy/innovation tradeoff
uses blockchain to facilitate AI privacy
key differentiator: tools for data that’s not centralized
machine learning engineers don’t see the data they’re using: they provide models and receive proposed changes to the model, produced by training the model locally (on the data owner’s side) on private data which never leaves the data owner’s data storage (wherever and whatever that may be)

What could we be building? (aka things OpenMined hopes to help with)

things that are personal are some of our greatest pain points, greatest vulnerabilities — what if we could build tools to help solve some of those?
eg: machine learning models to predict mental illness, breakdown, extreme depression, self harm, …
we don’t seem to know how to do that without aggregating (i.e., collecting and storing) data
machine learning specialists and data owners interact directly (eliminating the company, which is a middle man), so that whatever margin the middle man/company might have collected goes instead to higher data contributor compensation or ecosystem growth

What components make up OpenMined?

OpenMined is an ecosystem and community — the volunteers are the most important part of the system
software itself is an ecosystem of libraries
the main library is a piece of software packaged inside the Unity game engine: it’s a “mine”, designed to hold an individual’s data and protect it while allowing them to train machine learning models
eg: you buy an Xbox and a videogame, you load in your data, and that data earns a passive income stream on your behalf
the mine downloads models from the blockchain, trains/updates them locally, then uploads the changes to the model back to the blockchain

OpenMined technology components

Deep learning library: Syft (keep intelligence, leave behind data), including encryption pieces
Smart contract system: Sonar (blockchain smart contract, gain intelligence about something far away)
Open grid: the distributed system the models will learn on (ed: individual data providers’ computers?)
Building technology to support a marketplace, but users will determine how that happens (OpenMined aims to not introduce artificial constraints in terms of currency format or marketplace processes)

OpenMined Process

individuals contribute their data to improving the models, but the data itself is never exposed, only used by the model to generate a proposed incremental change to the model
users submit models for training via OpenMined’s platform/processes
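A small sketch of how proposed changes from several contributors might be combined into one model update. The podcast did not say how OpenMined merges contributions; simple averaging (as in federated averaging) is my assumption here, and average_updates and the example numbers are hypothetical.

```python
from typing import List

def average_updates(deltas: List[List[float]]) -> List[float]:
    """Element-wise mean of the weight deltas proposed by contributors."""
    n = len(deltas)
    return [sum(column) / n for column in zip(*deltas)]

# Each contributor trained locally and uploaded only a weight delta.
contributor_deltas = [
    [0.5, -1.0],
    [1.5, 1.0],
    [1.0, 3.0],
]

global_model = [0.0, 0.0]
combined = average_updates(contributor_deltas)
# The hosted model moves by the averaged delta; no raw data is involved.
global_model = [g + d for g, d in zip(global_model, combined)]
```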

Bias in data

by drawing on widely distributed data, OpenMined includes a natural buffer against bias
also, not all gradients/changes are accepted: only those which improve the model (ed: look into this for more details)
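One way the "only accept changes which improve the model" rule could work, sketched in plain Python. This is my guess at the mechanism, not something confirmed in the podcast: the held-out validation set, the mean squared error metric, and the maybe_accept function are all assumptions.

```python
from typing import List, Tuple

Weights = List[float]
Example = Tuple[List[float], float]  # (features, target)

def mse(weights: Weights, data: List[Example]) -> float:
    """Mean squared error of a linear model on the given examples."""
    total = 0.0
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(data)

def maybe_accept(global_weights: Weights, delta: Weights,
                 validation_data: List[Example]) -> Tuple[Weights, bool]:
    """Apply a proposed delta only if it lowers validation error."""
    candidate = [g + d for g, d in zip(global_weights, delta)]
    if mse(candidate, validation_data) < mse(global_weights, validation_data):
        return candidate, True
    return list(global_weights), False  # reject: keep the current model

# Validation set held by whoever hosts the model (assumed; here y = 2 * x).
validation_data = [([1.0], 2.0), ([2.0], 4.0)]
current = [1.0]

after_good, accepted_good = maybe_accept(current, [0.5], validation_data)
after_bad, accepted_bad = maybe_accept(current, [-3.0], validation_data)
```

In this sketch the first proposed change moves the weight toward 2 and is accepted; the second moves it away and is rejected.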

AI Safety

primarily concerned with AGI: something with extraordinarily high IQ, but guided (controlled) by human values; this is a different topic from AI privacy/security

AI Privacy

conversation primarily about narrow AI
business use cases are optimized for use with private data (hence the need for OpenMined’s platform)
blockchain to handle governance of AI? it doesn’t resolve questions such as which values to code into the AGI…
…what blockchain brings to the table is liquidity and transparency; eg: AGI is misaligned in direction x, let’s tilt it in direction y

Written by Monica Spisar