Deep Learning with Differential Privacy (DP-SGD explained)

18,150 views

Mukul Rathi

1 day ago

We're performing a technical deep dive into differential privacy: preventing models from memorising private data. Theory + Colab notebook using Tensorflow Privacy!
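
For a sense of what the Colab setup looks like, here is a minimal sketch of training with TensorFlow Privacy's DP-Keras optimizer. The model, hyperparameters, and exact import path are illustrative (they vary across tensorflow-privacy versions), not the notebook's actual values:

```python
# Minimal sketch, not the video's actual notebook: the model and hyperparameters
# are placeholders, and the import path can differ between library versions.
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

l2_norm_clip = 1.0      # C: per-example gradient clipping norm
noise_multiplier = 1.1  # sigma: noise std-dev as a multiple of C
num_microbatches = 250  # must evenly divide the batch size
batch_size = 250

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=l2_norm_clip,
    noise_multiplier=noise_multiplier,
    num_microbatches=num_microbatches,
    learning_rate=0.15,
)

# The loss must be computed per example (reduction=NONE) so that gradients
# can be clipped per microbatch before noise is added.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=..., batch_size=batch_size)
```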
Social:
Twitter: / mukulrathi_
My website (+ blog): mukulrathi.com/
My email newsletter: newsletter.mukulrathi.com
-----------------
Links:
DP-SGD paper: arxiv.org/pdf/1607.00133.pdf
Tensorflow Privacy Tutorials: github.com/tensorflow/privacy...
Tensorflow Privacy: github.com/tensorflow/privacy
Pytorch Opacus: github.com/pytorch/opacus
Moments accountant implementation: github.com/marcotcr/tf-models...
GPT-2 memorises private data: ai.googleblog.com/2020/12/pri...
Netflix dataset deanonymised: www.wired.com/2007/12/why-ano...
Netflix deanonymisation paper: www.cs.utexas.edu/~shmat/shma...
Strava heatmap leaks: www.zdnet.com/article/strava-...
------------------------------------------------------------------------------------
Timestamps:
00:00 Introduction
00:32 Overview
01:26 Why Anonymisation Isn't Enough
02:38 Intuition for Differential Privacy
03:12 Example: Predict whether Bob has Cancer
04:11 Privacy Intuition
04:51 Privacy Loss Definition
05:26 Definition of Differential Privacy
06:40 Role of Noise in DP
07:08 Privacy Amplification Theorem
07:26 Fundamental Law of Information Recovery
07:51 Composition in DP
08:19 DP-SGD
09:20 Moments Accountant
12:55 Google Colab Notebook
14:39 Limitations of DP-SGD
---------------------------------------------------------------------------
Music: Coffee Break by Pyrosion is licensed under a Creative Commons License.
creativecommons.org/licenses/...
/ pyrosion
Support by RFM - NCM: bit.ly/2xGHypM

COMMENTS: 50
@mingxiao9651 3 years ago
Very clear, thanks! A question: why can higher-order moments achieve a tighter bound?
@MukulRathi 3 years ago
Great question! I'll give a short answer and then a longer answer (using the notation in the paper) that'll help if you want to read the paper in more depth.

The short answer: we compute a bound for each of the moments, and each moment gives you a different bound. We pick the tightest one, which isn't necessarily the one from the highest moment, so we try bounds from moments up to the 32nd to give us a range to choose from.

Okay, a deeper dive. First, the wider context in case you're unfamiliar with moments: there is a whole field of concentration inequalities that I'd encourage you to check out (start with Markov's inequality and Chebyshev's inequality): en.wikipedia.org/wiki/Concentration_inequality. From this field, the maths in the DP-SGD paper uses a specific technique, moment-generating functions, to get our bounds.

In the paper, the moment-generating function is written as α_M(λ). It takes in a number λ and returns the λth moment of the mechanism M's privacy loss distribution; e.g. if I pass in the number 1, it returns the 1st moment. Essentially the "body" of this function α_M(λ) is an expression for the λth moment of our privacy loss distribution. The authors say the value of α_M(λ) is computed through numerical integration, and they also give an upper bound for a mixture of two Gaussians: α_M(λ) ≤ q^2 λ(λ + 1) / ((1 − q)σ^2) + O(q^3/σ^3). The details don't matter (they're messy and relegated to the appendix); the point is that it's a general expression involving λ.

We take this expression for the λth moment of the privacy loss distribution, from which we can derive a bound on the tail of the distribution (the part where the privacy loss is > ε). In the paper we get this: δ
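To make the reply above concrete, here is a rough numerical sketch (not the paper's exact accountant, which computes the moments by numerical integration): it uses only the leading term of the bound on α(λ) quoted above, adds the per-step log-moments over T steps, and converts to ε at a fixed δ via the tail bound. The function name and example numbers are illustrative.

```python
import math

def approx_epsilon(q, sigma, T, delta, max_lambda=32):
    """Rough epsilon estimate after T steps of the subsampled Gaussian mechanism."""
    best_eps = float("inf")
    for lam in range(1, max_lambda + 1):
        # Leading-order bound on the lambda-th log-moment for ONE step:
        # alpha(lambda) <= q^2 * lambda * (lambda + 1) / ((1 - q) * sigma^2)
        alpha_step = (q ** 2) * lam * (lam + 1) / ((1 - q) * sigma ** 2)
        # Composition: the log-moments add up across the T steps.
        alpha_total = T * alpha_step
        # Tail bound rearranged for a fixed delta: eps = (alpha + log(1/delta)) / lambda.
        eps = (alpha_total + math.log(1 / delta)) / lam
        best_eps = min(best_eps, eps)  # keep the tightest bound over lambda
    return best_eps

# Illustrative numbers: sampling rate q = 256/60000, sigma = 1.1, one epoch of steps.
print(approx_epsilon(q=256 / 60000, sigma=1.1, T=60000 // 256, delta=1e-5))
```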
@mingxiao9651 3 years ago
@@MukulRathi I appreciate your detailed explanation. It's very clear. Thanks a lot! Looking forward to your video about Renyi DP, CDP, zCDP and their relations!
@zark4346 2 years ago
@@MukulRathi Very nice explanation. Just wondering if you missed a λ in the third equation, which I think should be ε.
@sarahsimionescu7926 1 year ago
This video is exactly what I was looking for and I feel like you went into just the right amount of detail to provide a little bit of a deeper understanding. Thank you so much!
@musicbuff81 1 year ago
Thank you for a really clear explanation.
@yunathanielkang 10 months ago
Very clear explanation; thank you for saving my life!
@-long- 3 years ago
Mukul Rathi, you have a great ability to explain the problem clearly. Keep it up! Thank you.
@MukulRathi 3 years ago
Thank you very much!
@user-fw9lc4qj9m 1 month ago
absolutely clear :D thanks a lot!!!
@xenowits 4 months ago
It was a great informative video, thanks man!
@hengxuantang2806 2 years ago
Excellent, you cleared up most of my doubts. Looking forward to your further videos!
@MukulRathi 2 years ago
Awesome, thank you!
@ucsdvc 2 years ago
Very useful intro for someone new to this topic, thanks!
@MukulRathi 2 years ago
Glad you found it useful!
@user-br9gc7sj8e 2 months ago
Thank you so much.
@loukas371 2 years ago
Awesome video, very helpful to clear some things up, thank you :)
@MukulRathi 2 years ago
Glad it was helpful!
@shreejaltrivedi9731 2 years ago
Hey Mukul, great video. There are very few resources where federated learning is explained in layman's terms. I've actually started researching this topic a bit lately. Maybe you could make a series on federated learning where you explain the papers, starting with an overview and moving on to different papers such as FedAvg, qFedA, etc. Looking forward to that initiative. Again, great explanation. Keep it up!
@user-bt7si4ov7s 2 years ago
Such an informative video!! And I really appreciate it 👏
@MukulRathi 2 years ago
My pleasure!
@VatsalVenkatkrishna 8 months ago
Thanks for the video! Very well explained. I had a question though: doesn't one lose out on potentially critical information if the confidence of a given model is clipped? For example, if Bob would achieve a probability of 0.8 when his data is added to the training set, isn't it an unfair estimate to report a probability of 0.57? The seriousness with which one takes this prediction would be impacted.
@egesarca9550 5 months ago
Hey Mukul, great video and blog, thanks a lot! My senior design project is Differential Privacy in Federated Learning and I have found your content extremely useful. In your blog you mention that there will be a Part 2 about the moments accountant, yet I see that you have stopped creating content. Can you give me any advice or useful links on this topic? I study electronics engineering, so I'm a newbie in the field of AI; I'd appreciate any help!
@macknightxu2199 2 years ago
At 10:28, why is the privacy loss that the moments accountant calculates actually a random variable of its own?
@sonjaadomeit9600 1 year ago
Hi Mukul, I'm a bit late to the game. Thanks for your video! In your example of Bob having cancer, I think it's confusing how you used log(0.8/0.5) in the formula for privacy loss, because in Dwork and Roth the formula looks at the distributions for a given outcome (e.g., 0.55) and does not compare two different outcomes. Maybe you could comment on this?
@michaeljang299 2 years ago
Very informative video. I study ML fairness for my Ph.D. and am new to the privacy literature. It helped me a lot. Could you make a video about the Facebook team that you are involved in?
@MukulRathi 2 years ago
Thanks! Yes I will do; I'm currently editing a video on what life has been like as a new software engineer at Facebook :)
@user-tg6ot1mb2i 3 years ago
Good ~
@MukulRathi 3 years ago
Thanks!
@eduardojreis 1 year ago
3:16 - If I have the ability to provide Bob's data to train the model, wouldn't that mean I already have the information I want to leak, prior to leaking it?
@1UniverseGames 2 years ago
Nice video. If possible, could you share a coding implementation of this kind of privacy for federated learning, for beginners? Thank you.
@MukulRathi 2 years ago
linked in the description!
@aninditasaha0307 1 year ago
Can someone please explain to me the delta added to the equation?
@810602jay 2 years ago
What does the big O( ) at 10:14 mean, in the epsilon term of the (ε, δ)-DP formula?
@MukulRathi 2 years ago
Big O() is asymptotic complexity: we're measuring how the overall privacy loss grows as a function of q, epsilon and T, and big-O notation lets us disregard constants. en.wikipedia.org/wiki/Asymptotic_computational_complexity
@810602jay 2 years ago
@Mukul Rathi Thanks very much, Mukul! 🥰 But I'm still confused about the details of computing each epsilon_i at each gradient-update step. How do we use the "Tail Bound" formula of the original paper (Theorem 2) to compute each epsilon_i during training?
@MukulRathi 2 years ago
​@@810602jay the authors use numerical integration to compute α(λ). Then since δ is fixed, we can rearrange the Tail Bound formula to get the value of ε in terms of α(λ) and δ.
@810602jay 2 years ago
@@MukulRathi Thanks very much again! 🥰 I tried to rearrange the Tail Bound formula: i.imgur.com/Xnfb67h.png. But I find that the training step T is not involved in the Tail Bound or the asymptotic bound α(λ) formula. Does that mean I can obtain the total privacy loss ε using only the q, λ, σ and δ parameters, without T? I think I'm missing some concept, because the experimental results show the privacy loss ε increasing with the epochs (related to the training step T). Can you give me a brief example of how to use these two formulas (the Tail Bound and the asymptotic bound α(λ)) to obtain the privacy loss ε when the training step is T? Thank you in advance! 🥰 🥰 🥰 You've really helped me learn a lot of concepts.
@MukulRathi 2 years ago
@@810602jay Regarding rearranging the tail bound formula: when we minimise the tail bound with respect to lambda, we try values up to 32 and pick the lambda that gives the smallest value of the expression. This is not necessarily at lambda = 32, so you can't just substitute 32 into the expression. I re-read the paper, and the proofs contained in its appendix are pretty thorough. I'll try to fill in the gaps as to where the bounds in the proof presented in the appendix come from. The proof for Theorem 1 should fall out by convincing yourself first that α(λ)
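For what it's worth, the two facts being combined in this thread can be written out in the paper's notation roughly as follows (the restriction to λ between 1 and 32 is the implementation choice mentioned above, not part of the theorem). The training step T enters only through the composition sum, which is why it does not appear inside the single-step formulas:

```latex
% Composability: the log-moments of the per-step mechanisms add up over the T steps,
% which is where T enters the calculation.
\alpha_M(\lambda) \;\le\; \sum_{t=1}^{T} \alpha_{M_t}(\lambda)

% Tail bound (Theorem 2), rearranged for a fixed \delta: the mechanism is
% (\varepsilon, \delta)-DP with
\varepsilon \;=\; \min_{1 \le \lambda \le 32} \frac{\alpha_M(\lambda) + \log(1/\delta)}{\lambda}
```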
@detpekandefingret 2 months ago
06:13 "adding noise to the data", while noisy smoke detector simultaneously chirping in the background. Seriously, don't die in a fire, please replace those batteries, for your and everyone else's safety.
@viralgupta5630 2 years ago
Can you clearly define privacy loss?
@MukulRathi 2 years ago
It's defined at 4:26, no?
@viralgupta5630 2 years ago
@@MukulRathi Thanks for pointing that out. I was confused about privacy loss when looking at 8:45; how it is computed isn't obvious. At 4:26 it talks about probabilities with D and D', but at 8:45 we are learning just a single theta.
@MukulRathi 2 years ago
@@viralgupta5630 Hey, so the process of learning theta leaks information about the data we are training on. However, if we add Gaussian noise and clip the gradients (controlled by C and sigma), then we can bound the privacy loss for each gradient descent step. We compute this using the moments accountant. As discussed from 10:24 onwards, the intuition behind the moments accountant technique is that we treat the privacy loss (defined at 4:26) as a random variable and then compute bounds on its distribution - the plot of the distribution at 10:50 gives you an intuition for what epsilon and delta refer to. I'd suggest looking at the paper (plus its appendix) to see the specific bounds and proofs. I deliberately gave a high-level overview in the video as this is an introductory video; however, if you look at the replies to the other comments, I've gone into the details in response to specific questions about the proofs. I think HengJay's question was similar to yours, so perhaps that response also answers your question.
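As a schematic illustration of the per-step recipe described in this reply (clip each example's gradient, add Gaussian noise, then take an ordinary SGD step), here is a minimal NumPy sketch. It assumes the per-example gradients have already been computed, which is the part the real libraries (TF Privacy, Opacus) handle for you; the function name and numbers are illustrative.

```python
import numpy as np

def dp_sgd_step(theta, per_example_grads, C, sigma, lr, rng):
    """One DP-SGD update. per_example_grads has shape (batch_size, num_params)."""
    # 1. Clip each example's gradient to an L2 norm of at most C.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / C)
    # 2. Sum the clipped gradients and add Gaussian noise with std-dev sigma * C.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=theta.shape)
    # 3. Average over the batch and take an ordinary gradient step.
    return theta - lr * noisy_sum / per_example_grads.shape[0]

# Illustrative usage with random placeholder gradients.
rng = np.random.default_rng(0)
theta = np.zeros(5)
grads = rng.normal(size=(8, 5))
theta = dp_sgd_step(theta, grads, C=1.0, sigma=1.1, lr=0.1, rng=rng)
```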
@macknightxu2199 2 years ago
Hi, at 5:05, what's the meaning of a higher std. dev?