This is part 2 of a 3-part blog post (Part 1; Part 3 coming soon), based in part on my papers [CK21] and [Kar23], which appeared at TCC '21 and SaTML '23, respectively.
In part 1 of this blog series, we first introduced the model extraction scenario, and then gave an overview of a specific type of model extraction defense called an Observational Model Extraction Defense (OMED), focusing on the theoretical underpinnings of these defenses. We then outlined the technical contributions of [Kar23], which, briefly speaking, cast serious doubt on whether OMEDs can provide any provable security guarantees against model extraction. In part 2 of the blog series (this blog post), we will dig into what exactly Covert Learning [CK21] is: the mysterious learning algorithm that lies at the center of the results of [Kar23] on the impossibility of efficiently defending against model extraction attacks using OMEDs.
Let's dive into the connection between model extraction in the presence of OMEDs, and the Covert Learning model [CK21]. This blog post will assume a basic familiarity with computational learning theory, and specifically, the Probably Approximately Correct (PAC) learning model.
The Covert Learning model is a variant of Valiant's PAC-learning model, set in the agnostic-learning-with-queries framework. In agnostic learning with queries, a learning algorithm has access to an oracle that labels points in the domain of the target concept, and these points can be chosen arbitrarily by the learning algorithm. In the "agnostic" setting, the goal is for the learner to output a model (from some predefined "touchstone" class) that approximates an arbitrary target concept implemented by the oracle nearly as well as the best possible model contained in the touchstone class. The agnostic setting mirrors the way machine learning is traditionally conducted in practice (as opposed to the "realizable" setting, where the target concept is assumed to be contained in a predefined concept class).
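To make this concrete, here is one standard way to write the agnostic guarantee. The notation below is mine and is only meant as a sketch, not necessarily the exact formulation used in [CK21].

```latex
% Agnostic learning with queries (informal sketch, my notation):
% given query access to an arbitrary target f and a touchstone class T,
% the learner outputs a hypothesis h such that, with probability at least 1 - \delta,
\Pr_{x \sim D}\big[\, h(x) \neq f(x) \,\big]
  \;\le\; \min_{t \in T} \Pr_{x \sim D}\big[\, t(x) \neq f(x) \,\big] \;+\; \varepsilon .
```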
The point of the Covert Learning model is to formalize a new type of privacy in learning theory. Specifically, Covert Learning algorithms guarantee that the queries leak very little information about the concept to an adversary that can observe the queries. In this way, the Covert Learning algorithm obtains privacy for what it is learning.
In other words, Covert Learning allows one to learn a concept (using some internal randomness known only to itself),[1] while the adversary remains nearly completely "in the dark" about the concept, even given a view of the entire transcript of membership queries and oracle responses. Crucially, the adversary is not privy to the secret randomness of the learning algorithm.
An example we can use to build intuition about what Covert Learning is all about is the following. Imagine being in a course, with one teacher and many classmates, where you are graded "on a curve" (meaning the goal is to finish ranked first in the class). To do this, you need to learn the day's concept by asking questions of the teacher (for simplicity, restrict to "yes or no" questions). The catch is that every one of your classmates gets to observe your questions and the teacher's answers. Can you design a set of questions that allows you to learn the day's concept, but prevents all your classmates from learning the concept anywhere near as well? Covert Learning algorithms provide a solution to this problem, under some natural assumptions (to be explained later).
So, how do we model such a question in a mathematical way? Covert Learning does this by borrowing from the Simulation-based security paradigm from the field of Cryptography.
Roughly speaking, a membership query learning algorithm is a Covert Learning algorithm if it has an accompanying simulator (i.e., a polynomial time algorithm) which, given access only to random examples, emulates the distribution over the membership queries made by the Covert Learning algorithm in a computationally indistinguishable way. In other words, the simulator is able to produce a "believable" transcript of the interaction between the learning algorithm and the query oracle, without query access to the underlying concept, using only (for instance) uniformly random labeled examples.
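Schematically, and again in my own notation (so treat this as a sketch rather than the precise definition from [CK21]), the requirement is that the learner's view of the query interaction is computationally indistinguishable from the simulator's output:

```latex
% L is the learner with membership query access MQ(f) to the concept f;
% Sim is an efficient simulator with access only to random examples EX(f, D).
\mathrm{View}\big( L^{\,\mathrm{MQ}(f)} \big)
  \;\approx_c\;
\mathrm{Sim}^{\,\mathrm{EX}(f,\,D)}\big(1^n\big).
```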
For a concept class where access only to uniformly random examples makes learning computationally hard, this proves that the learning transcript reveals very little information to a computationally bounded adversary. Loosely speaking, this comes from the following logic:

1. By assumption, no polynomial time algorithm can learn the concept class from uniformly random examples alone.
2. The simulator, which is given only uniformly random examples, produces a transcript that is computationally indistinguishable from the real transcript of queries and oracle answers.

Now suppose some efficient adversary could extract meaningful knowledge about the concept from the real transcript. Feeding it the simulated transcript instead (which it cannot tell apart from the real one) would then yield an efficient algorithm that learns from uniformly random examples alone, contradicting the first point (a minimal code sketch of this reduction is given below). We conclude that no efficient adversary can obtain meaningful knowledge about the concept from the queries. Hence, Covert Learning algorithms are reserved for concept classes that are not (known to be) efficiently learnable in the original PAC model with respect to a certain "hard example distribution." There are many such concept classes, such as decision trees and low-depth neural networks, which have so far evaded the computational learning theory community's attempts at designing polynomial time PAC learning algorithms.
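Here is that reduction written as a few lines of illustrative Python. All names and interfaces are hypothetical placeholders of mine, not anything defined in [CK21] or [Kar23].

```python
# Illustrative sketch of the reduction; all names and interfaces here are
# hypothetical and not taken from [CK21] or [Kar23].

def learner_from_random_examples(random_examples, simulator, transcript_adversary):
    """Turn a hypothetical transcript-exploiting adversary into a learner
    that uses only random examples, via the Covert Learning simulator."""
    # The simulator fakes a believable query/answer transcript using ONLY
    # random labeled examples of the concept (no membership queries).
    fake_transcript = simulator(random_examples)
    # If the adversary could extract knowledge from real transcripts, it must
    # do the same on this computationally indistinguishable fake transcript;
    # otherwise the adversary would itself be a successful distinguisher.
    hypothesis = transcript_adversary(fake_transcript)
    # The result: an efficient learner from random examples alone, which
    # contradicts the assumed hardness of the concept class.
    return hypothesis
```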
Now I want to highlight a special case of Covert Learning, called natural Covert Learning, which is often easier to think about than the simulation-based version. In natural Covert Learning, rather than requiring a simulator for the query interaction, we require that the distribution over the membership queries is computationally indistinguishable from a single pre-defined "hard example distribution." For concreteness, let us take the uniform distribution as this pre-defined hard distribution. Also, in natural Covert Learning we will consider the realizable PAC learning setting (rather than the agnostic one).
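In symbols (again my notation, meant only as a sketch): if the learner makes q membership queries x_1, ..., x_q on inputs of length n, natural Covert Learning with respect to the uniform distribution asks that

```latex
% The tuple of membership queries must be computationally indistinguishable
% from q independent uniform samples over the domain \{0,1\}^n.
\big( x_1, x_2, \dots, x_q \big)
  \;\approx_c\;
\big( u_1, u_2, \dots, u_q \big),
\qquad u_i \sim U_{\{0,1\}^n} \ \text{i.i.d.}
```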
Now we're finally ready to draw the connection between (Natural) Covert Learning, and the model extraction problem.
The adversary in natural Covert Learning is a distinguisher that attempts to tell the membership queries made by the learning algorithm apart from random examples; it is essentially the "adversary" in a model extraction attack, namely the OMED.
At first glance this sounds really confusing, because normally the adversary in model extraction is supposed to be the malicious client that is trying to reverse engineer the model. However, consider that the job of the OMED is to somehow distinguish benign clients from adversarial ones. Thus, from the perspective of the actual adversary (i.e., the model extractor), the OMED is adversarial in nature and has essentially the same job as the natural Covert Learning adversary: to detect some property of the client's distribution of queries. This becomes even more apparent when considering the similarities between Figure 1 (see Blog post 1) and Figure 2.
By exploiting this connection, we can demonstrate that natural Covert Learning algorithms form a recipe for an attack that fools the OMED in the same way that it fools the Covert Learning adversary. Moreover, the learning guarantee of natural Covert Learning allows us to argue that the client still extracts the underlying model. Hence, a single Covert Learning attack can achieve extraction while "fooling" any OMED. That is, any OMED will output "accept" with high probability even though an extraction attack is being performed.
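At a very high level, the resulting attack looks something like the sketch below. The names (the covert learner object, query_api) are hypothetical placeholders of mine, not an API from [Kar23].

```python
# Illustrative sketch of a Covert Learning-based extraction attack against
# an OMED-protected prediction API. All names here are hypothetical.

def covert_extraction_attack(covert_learner, query_api):
    """Extract an approximation of the hidden model while looking benign."""
    # The covert learner synthesizes membership queries that are
    # computationally indistinguishable from benign (e.g., i.i.d. uniform)
    # queries, so an efficient OMED watching the traffic accepts w.h.p.
    queries = covert_learner.generate_queries()
    labels = [query_api(x) for x in queries]  # this is all the OMED observes
    # The learning guarantee still holds, so the client recovers a good
    # approximation of the underlying model from the labeled queries.
    return covert_learner.build_hypothesis(queries, labels)
```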
To wrap up part 2 of this blog series, I want to hammer home exactly why and how Covert Learning algorithms are at the crux of model extraction attacks in the presence of OMEDs. We touched on the fact that all models are at risk of extraction by the very fact that they are exposed through some interface. Again, the VC dimension argument outlined in part 1 of the blog series shows that, given enough (even uniformly random) examples of the input/output behavior of the model, and enough computational power, an adversary has a PAC-learning algorithm that works for extraction. Additionally, since this PAC-learning algorithm only uses random examples, the model extraction adversary can choose to make queries according to a distribution that is "accepted" by the OMED.
However, the VC dimension argument in general relies on unbounded computational power, and only very simple concept classes are known to be PAC-learnable in polynomial time from random examples (e.g., linear models). Hence, the natural question is whether there exist effective model extraction defenses against polynomial time adversaries, as is customary in Modern Cryptography. Again, this is because many important models (e.g., decision trees, low-depth neural networks) are not known to be PAC-learnable in polynomial time (on the uniform distribution over discrete inputs, say). On the other hand, a class like decision trees is learnable in polynomial time when using carefully synthesized queries. But classic membership query algorithms like the Kushilevitz-Mansour algorithm [KM93] (which can be used to learn decision trees in polynomial time) make queries that can be tested and identified as problematic by the OMED.
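To see why KM-style queries are easy to flag: in one common presentation of the algorithm, queries arrive in pairs that share a long common block of bits, something an OMED can test for with a trivial statistic. The toy script below only illustrates that structural signature; it is not the [KM93] algorithm itself, and the parameters and the prefix/suffix convention are arbitrary choices of mine.

```python
# Toy illustration: KM-style query pairs share a long common suffix, which a
# simple collision statistic separates from uniformly random queries.
import random

n, k = 32, 4  # input length and "prefix" length; values are illustrative

def km_style_pair():
    # Two queries that agree on the last n - k bits (the shared suffix).
    suffix = [random.randint(0, 1) for _ in range(n - k)]
    return ([random.randint(0, 1) for _ in range(k)] + suffix,
            [random.randint(0, 1) for _ in range(k)] + suffix)

def uniform_pair():
    return ([random.randint(0, 1) for _ in range(n)],
            [random.randint(0, 1) for _ in range(n)])

def suffix_collision_rate(pairs):
    # Under uniform queries, agreeing on all n - k suffix bits is
    # exponentially unlikely, so this statistic cleanly distinguishes.
    return sum(a[k:] == b[k:] for a, b in pairs) / len(pairs)

km_pairs = [km_style_pair() for _ in range(1000)]
uniform_pairs = [uniform_pair() for _ in range(1000)]
print(suffix_collision_rate(km_pairs), suffix_collision_rate(uniform_pairs))  # ~1.0 vs ~0.0
```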
Therefore, Covert Learning algorithms bridge the gap between these two settings; they constitute attacks that are benign-looking but actually adversarial. In other words, they are polynomial time membership query PAC-learning algorithms that synthesize queries in a special way that is provably hard to distinguish from the classic setting of PAC-learning with random examples. This means that Covert Learning attacks are both undetectable by an (efficient) OMED and run in polynomial time. By contrast, the VC dimension argument is undetectable by the OMED but is not efficient, while the membership query attack based on [KM93] is efficient but is detectable by the OMED.
[1] Note that the internal randomness is not shared even with the concept oracle. To use an analogy from Cryptography, Covert Learning is "public-key" in nature, as opposed to "symmetric-key," which might rely on shared randomness between the learner and the concept oracle.