Notes & Questions: Andrew Trask on AI Security

2018.01 podcast (Epicenter)

Monica Spisar
Dec 4, 2018

My notes on Epicenter podcast: Andrew Trask: OpenMined — A Decentralised Artificial Intelligence Platform

Facilitated private machine learning: train ML models on users’ data without exposing, uploading, or aggregating that data.

How? A user downloads a model and trains it on their data. The training process yields an update to the model; that update is uploaded as a proposed improvement to the model. The user data is never exposed, aggregated with other data, or transferred beyond the user’s control.
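
A minimal sketch of that loop, assuming a tiny linear model and plain gradient descent. The function names and numbers are made up for illustration and are not OpenMined’s API; the point is only that the function returns a weight difference, not the records it trained on.

```python
# Hypothetical sketch: none of these names come from OpenMined's libraries.

def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def local_update(weights, private_data, learning_rate=0.1, epochs=50):
    """Train a copy of the downloaded model on data that stays local.

    Returns only the difference between the trained weights and the
    downloaded weights; the private records are never returned.
    """
    local = list(weights)  # work on a copy of the downloaded model
    for _ in range(epochs):
        for features, target in private_data:
            error = predict(local, features) - target
            # One gradient step for squared error on a single record.
            local = [w - learning_rate * error * x
                     for w, x in zip(local, features)]
    return [new - old for new, old in zip(local, weights)]


# The "downloaded" model and the user's private records (made-up numbers).
downloaded_weights = [0.0, 0.0]
private_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

# Only `update` would be uploaded as the proposed improvement.
update = local_update(downloaded_weights, private_data)
proposed = [w + d for w, d in zip(downloaded_weights, update)]
```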

How it works

  • OpenMined offers tools for the user to ‘hide’ their data in a container on their machine
  • User downloads the machine learning model that wants to train on the user’s data (without saving or taking the data off the user’s machine)
  • User allows the model to train on the data that’s behind a security firewall of sorts
  • Model updates based on what it learns from this new dataset
  • Model update is uploaded to OpenMined, where the model is hosted and managed; the update is selectively incorporated (e.g., an update that introduces bias or some other undesirable property to the model is rejected; ed: verify this for accuracy, I’m not certain of the process)
  • User’s data never copied or stored elsewhere, and its only trace on the model is any change that’s ultimately incorporated into the model hosted by OpenMined (ed: can this process be reverse-engineered to reveal anything about the data? otherwise, it’s much like hashing except …?…)
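
On the reverse-engineering question: in at least one simple case the answer seems to be yes. The sketch below assumes a linear model, squared-error loss, and an update computed from a single record; none of this is taken from the podcast or from OpenMined’s code. Under those assumptions the update is just a scaled copy of the record’s features, so anyone who sees the update learns the features up to a scale factor.

```python
# Illustration only: a made-up single-record update for a linear model.

def single_record_update(weights, features, target, learning_rate=0.1):
    """Gradient-descent update computed from exactly one record."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # Each entry of the update is (-learning_rate * error) times a feature.
    return [-learning_rate * error * x for x in features]


weights = [0.0, 0.0, 0.0]
secret_features = [3.0, 0.0, 1.5]  # the record the user wants to keep private

update = single_record_update(weights, secret_features, target=1.0)

# The ratios between entries of the update match the ratios between the
# secret features, and zero features show up as zero entries.
leaked_ratio = update[0] / update[2]
true_ratio = secret_features[0] / secret_features[2]
```

Updates computed over many records, or protected before upload, would not leak this directly, which is presumably where the encryption pieces mentioned further down come in.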

OpenMined: focused on AI security/privacy

  • containerizing AI components and providing boxes and tools/knobs to make training data useful without exposing it
  • the obvious use case is for narrow AI, and that’s the focus
  • regulators present a false tradeoff between innovation and privacy: they don’t seem to know that models can be trained without access to aggregated data (i.e., data collected and stored by an entity which then controls it, and whose security might be breached, potentially exposing the full collection of people’s data to an attacker)

OpenMined

  • bringing awareness of private machine learning to the broader community
  • building and offering tools for private machine learning
  • goal: make the software/UI as accessible as possible for broad use
  • these tools eradicate the privacy/innovation tradeoff
  • uses blockchain to facilitate AI privacy
  • key differentiator: tools for data that’s not centralized
  • machine learning engineers don’t see the data they’re using — they provide models and receive proposed changes to the model based on training the model locally (to a data owner) on private data which never leaves the data owner’s data storage (wherever and whatever that may be)

What could we be building? (aka things OpenMined hopes to help with)

  • things that are personal are some of our greatest pain points, greatest vulnerabilities — what if we could build tools to help solve some of those?
  • eg: machine learning models to predict mental illness, breakdown, extreme depression, self harm, …
  • we don’t seem to know how to do that without aggregating (i.e., collecting and storing) data
  • machine learning specialists and data owners interact directly (eliminating the company, which is a middleman), so whatever margin the middleman would have collected goes either to higher data contributor compensation or to ecosystem growth

What components make up OpenMined?

  • OpenMined is an ecosystem and community — the volunteers are the most important part of the system
  • software itself is an ecosystem of libraries
  • the main library is a piece of software packaged inside the Unity game engine: a “mine” designed to hold an individual’s data and protect it while allowing them to train machine learning models
  • eg you buy an Xbox system and a videogame, you load in your data, and on your behalf that data will earn a passive income stream
  • models are downloaded from the blockchain, trained/updated locally, and the changes to the model are then uploaded back to the blockchain

OpenMined technology components

  • Deep learning library: Syft (pronounced “sift”: keep intelligence, leave behind data), including encryption pieces
  • Smart contract system: Sonar (blockchain smart contract, gain intelligence about something far away)
  • Open Grid: the distributed system the models will learn on (ed: individual data providers’ computers?)
  • Building technology to support a marketplace, but users will determine how that happens (OpenMined aims to not introduce artificial constraints in terms of currency format or marketplace processes)

OpenMined Process

  • individuals contribute their data to improving the models, but the data itself is never exposed, only used by the model to generate a proposed incremental change to the model
  • users submit models for training via OpenMined’s platform/processes
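
One common way to combine proposed changes from several contributors is to average them before applying them to the hosted model. Whether OpenMined does exactly this wasn’t covered in the podcast, so treat the sketch below as an assumption; the names and numbers are hypothetical.

```python
# Hypothetical sketch of combining updates; not OpenMined's actual process.

def average_updates(updates):
    """Element-wise mean of several proposed weight updates."""
    count = len(updates)
    return [sum(values) / count for values in zip(*updates)]


def apply_update(weights, update):
    """Add a (combined) update to the hosted model's weights."""
    return [w + d for w, d in zip(weights, update)]


hosted_weights = [1.0, 1.0]

# One proposed update per data owner; the underlying data is never seen here.
proposed_updates = [
    [0.2, -0.1],
    [0.4, 0.1],
    [0.0, 0.3],
]

combined = average_updates(proposed_updates)
new_weights = apply_update(hosted_weights, combined)
```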

Bias in data

  • by having wide distribution of data, OpenMined includes a natural buffer against bias
  • also, not all gradients/changes accepted — only those which improve the model (ed: look into this for more details)
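
A rough sketch of what “only accept changes which improve the model” could look like, assuming the model host keeps a small validation set and compares error before and after applying a proposed update. The validation set, the error metric, and all names here are my assumptions, not details from the podcast.

```python
# Hypothetical acceptance gate; the real criteria were not described.

def mean_squared_error(weights, dataset):
    """Average squared error of a linear model over (features, target) pairs."""
    total = 0.0
    for features, target in dataset:
        prediction = sum(w * x for w, x in zip(weights, features))
        total += (prediction - target) ** 2
    return total / len(dataset)


def accept_if_better(weights, update, validation_set):
    """Return (weights, accepted): apply the update only if error drops."""
    candidate = [w + d for w, d in zip(weights, update)]
    before = mean_squared_error(weights, validation_set)
    after = mean_squared_error(candidate, validation_set)
    if after < before:
        return candidate, True
    return weights, False


validation_set = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
hosted = [1.0, 0.0]

# A helpful update is incorporated; a harmful one leaves the model unchanged.
hosted, accepted_good = accept_if_better(hosted, [0.5, -0.5], validation_set)
hosted, accepted_bad = accept_if_better(hosted, [-3.0, 4.0], validation_set)
```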

AI Safety

  • primarily concerned with AGI: something with extraordinarily high IQ, but guided (controlled) by human values; this is different from AI privacy/security

AI Privacy

  • conversation primarily about narrow AI
  • business use cases are optimized for use with private data (hence the need for OpenMined’s platform)
  • blockchain to handle governance of AI? doesn’t resolve questions such as which values to code into the AGI…
  • …what blockchain brings to the table is liquidity and transparency; eg: AGI is misaligned in direction x, let’s tilt it in direction y
