In part 1 of this blog series, we introduced the model extraction scenario and then gave an overview of a specific type of model extraction defense, called an Observational Model Extraction Defense (OMED), focusing on the theoretical underpinnings of these defenses. We then outlined the technical contributions of [Kar23], which, briefly speaking, cast serious doubt on whether OMEDs can provide any provable security guarantees against model extraction. In part 2 of the blog series (this blog post), we will delve into Covert Learning [CK21], the mysterious learning model that lies at the center of the results of [Kar23] on the impossibility of efficiently defending against model extraction attacks using OMEDs.
Let's explore the connection between model extraction in the presence of OMEDs and the Covert Learning model [CK21]. This blog post will assume a basic familiarity with computational learning theory, specifically the Probably Approximately Correct (PAC) learning model.
The Covert Learning model is a variant of Valiant's PAC-learning model, situated in the setting of agnostic learning with queries. In agnostic learning with queries, a learning algorithm has access to an oracle that labels points in the domain of the target concept, and these points can be chosen arbitrarily by the learning algorithm. In the "agnostic" setting, the goal is for the learning algorithm to output a model (within some predefined "touchstone" class) that predicts the arbitrary target concept implemented by the oracle nearly as well as the best model contained in the touchstone class. The agnostic setting mirrors the way machine learning is traditionally conducted in practice (as opposed to the "realizable" setting, where the target concept is assumed to be contained in a predefined concept class).
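To make the agnostic goal concrete, here is one standard way to state it (the notation here is mine; the exact formulation in [CK21] may differ in details). Write $\mathrm{err}_D(h) = \Pr_{x \sim D}[h(x) \neq f(x)]$ for the error of a hypothesis $h$ against the target $f$ under the distribution $D$, and let $\mathcal{T}$ denote the touchstone class. Then, with probability at least $1 - \delta$ over its queries and internal randomness, the learner must output a hypothesis $h$ satisfying

$$\mathrm{err}_D(h) \;\le\; \min_{t \in \mathcal{T}} \mathrm{err}_D(t) + \varepsilon.$$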
The essence of the Covert Learning model is to formalize a new type of privacy in learning theory. Specifically, Covert Learning algorithms provide the guarantee that the queries leak very little information about the concept, with respect to an adversary that can observe the queries. In this way, the Covert Learning algorithm gains privacy for what it is learning.
In other words, Covert Learning allows one to learn a concept (using internal randomness known only to the learner),[1] while the adversary remains almost completely "in the dark" about the concept, even given a view of the entire transcript of membership queries and oracle responses. Crucially, the adversary is not privy to the secret randomness of the learning algorithm.
An example to build intuition about Covert Learning is as follows. Imagine being in a course with one teacher and many classmates, where you are graded "on a curve" (so the goal is to finish ranked first in the class). To achieve this, you need to learn the day's concept by asking the teacher questions (for simplicity, restrict to "yes or no" questions). The catch is that every one of your classmates observes your questions and the teacher's answers. Can you design your questions so that you learn the day's concept while preventing all your classmates from learning it just as well? Covert Learning algorithms provide a solution to this problem under some natural assumptions (to be explained later).
So, how do we model such a question mathematically? Covert Learning does this by borrowing from the simulation-based security paradigm from the field of Cryptography.
Roughly speaking, a membership query learning algorithm is a Covert Learning algorithm if it has an accompanying simulator (i.e., a polynomial-time algorithm) which, when given access only to random examples, emulates the distribution over the membership queries used by the Covert Learning algorithm in a computationally indistinguishable way. In other words, the simulator can produce a "believable" transcript of the interaction between the learning algorithm and the query oracle without query access to the underlying concept, using only (for instance) uniformly random examples.
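In symbols, the flavor of the requirement is the following (a sketch, not the exact definition from [CK21]). Let $\mathsf{view}(L^f)$ denote the transcript of queries and answers produced when the learner $L$ interacts with the membership oracle for $f$, and let $\mathsf{Sim}$ be a polynomial-time simulator whose only access to $f$ is an example oracle $\mathsf{EX}(f, U)$ returning uniformly random labeled examples. Then, for every polynomial-time distinguisher $\mathcal{D}$,

$$\Big|\Pr\big[\mathcal{D}(\mathsf{view}(L^f)) = 1\big] - \Pr\big[\mathcal{D}\big(\mathsf{Sim}^{\mathsf{EX}(f,U)}\big) = 1\big]\Big| \;\le\; \mathrm{negl}(n).$$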
For a concept class where access only to uniformly random examples makes learning computationally hard, this proves that the learning transcript reveals very little information to a computationally bounded adversary. Loosely speaking, this comes from the following logic:

1. Suppose, toward a contradiction, that some efficient adversary could extract non-trivial knowledge about the concept just from observing the transcript of the learner's queries and the oracle's answers.
2. The simulator produces a computationally indistinguishable transcript using only uniformly random examples, so the same adversary, run on the simulated transcript, would extract essentially the same knowledge. But then the concept could be (partially) learned efficiently from random examples alone, contradicting the assumed hardness of learning the class from random examples.

These two points together lead to a contradiction, so we conclude that no efficient adversary can obtain non-trivial knowledge about the concept from the queries. Hence, Covert Learning algorithms are only meaningful for concept classes that are not (known to be) efficiently learnable in the original PAC-model with respect to a certain "hard example distribution." There are many such concept classes, such as decision trees and low-depth neural networks, which have so far evaded the computational learning theory community's attempts at designing polynomial-time PAC learning algorithms.
Now, I want to highlight a special case of Covert Learning, called natural Covert Learning, which is often easier to think about than the simulation-based version. In natural Covert Learning, rather than requiring a simulator for the query interaction, we require that the distribution over the membership queries be computationally indistinguishable from a single pre-defined "hard example distribution." For concreteness, let us take the uniform distribution as the pre-defined "hard distribution." Also, in natural Covert Learning, we will consider the realizable PAC setting (rather than the agnostic setting).
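Roughly, the requirement becomes the following (again a sketch rather than the exact definition): if the learner issues membership queries $x_1, \dots, x_m$, then for every polynomial-time distinguisher $\mathcal{D}$,

$$\Big|\Pr\big[\mathcal{D}(x_1, \dots, x_m) = 1\big] - \Pr_{u_1, \dots, u_m \sim U}\big[\mathcal{D}(u_1, \dots, u_m) = 1\big]\Big| \;\le\; \mathrm{negl}(n),$$

i.e., the queries are computationally indistinguishable from $m$ i.i.d. uniformly random points.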
Now we're finally ready to draw the connection between (Natural) Covert Learning and the model extraction problem.
The adversary in natural Covert Learning, namely a distinguisher that attempts to tell the membership queries made by the learning algorithm apart from random examples, is essentially the "adversary" in a model extraction attack: the OMED.
At first glance, this might seem confusing because normally the adversary in model extraction is supposed to be the malicious client trying to reverse-engineer the model. However, consider that the job of the OMED is to distinguish benign clients from adverse clients. Thus, from the perspective of the actual adversary (i.e., the model extractor), the OMED is adversarial in nature and has essentially the same job as the natural Covert Learning adversary: to detect some property about the client's distribution of queries. This becomes even more apparent when considering the similarities between Figure 1 (from Part 1) and Figure 2.
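To make the "OMED as distinguisher" framing concrete, here is a toy OMED written in Python. It is purely hypothetical and not how any particular OMED from the literature works; it just illustrates that an OMED is, at bottom, a statistical test applied to the client's query distribution (here, a crude per-coordinate frequency test against the benchmark of i.i.d. uniform Boolean queries).

```python
import numpy as np

def toy_omed(client_queries: np.ndarray, threshold: float = 0.1) -> str:
    """Toy OMED: accept the client if its queries 'look like' i.i.d. uniform
    Boolean vectors, judged by the per-coordinate frequency of 1s.
    client_queries: array of shape (num_queries, n) with 0/1 entries.
    """
    freq = client_queries.mean(axis=0)       # empirical frequency of 1s per coordinate
    deviation = np.max(np.abs(freq - 0.5))   # distance from the uniform benchmark
    return "accept" if deviation < threshold else "reject"

rng = np.random.default_rng(0)

# A benign client querying uniformly at random passes the test:
print(toy_omed(rng.integers(0, 2, size=(1000, 20))))      # accept

# A client whose queries all share a fixed structure (here, all identical) fails it:
repeated = np.tile(rng.integers(0, 2, size=20), (1000, 1))
print(toy_omed(repeated))                                  # reject
```

Of course, a real OMED may run far more sophisticated tests; the point of Covert Learning is to fool every efficient test of this kind.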
By exploiting this connection, we can demonstrate that natural Covert Learning algorithms form a recipe for an attack that can fool the OMED in the same way they fool the Covert Learning adversary. Additionally, the learning guarantee of natural Covert Learning allows us to argue that the client can still extract the underlying model. Hence, we can show that a single Covert Learning attack can achieve extraction while "fooling" any OMED. In other words, any OMED will output "accept" with high probability even though an extraction attack is being performed.
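Stated loosely (a paraphrase of the flavor of the guarantee, not the exact theorem statement from [Kar23]): if the extraction attack issues queries $x_1, \dots, x_m$ generated by a natural Covert Learning algorithm, then for every efficient OMED $\mathcal{D}$,

$$\Pr\big[\mathcal{D} \text{ accepts } (x_1, \dots, x_m)\big] \;\ge\; \Pr_{u_1, \dots, u_m \sim U}\big[\mathcal{D} \text{ accepts } (u_1, \dots, u_m)\big] - \mathrm{negl}(n),$$

so any OMED that accepts benign, uniformly random queries with high probability must also accept the covert attack with high probability.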
To wrap up part 2 of this blog series, I want to emphasize why and how Covert Learning algorithms are at the crux of model extraction attacks in the presence of OMEDs. We touched on the fact that all models are at risk of extraction due to their exposure via some interface. Again, the VC dimension argument outlined in part 1 of the blog series shows that, given sufficiently many examples of the model's input/output behavior (even uniformly random ones) and sufficient computational power, an adversary can run a PAC-learning algorithm that extracts the model. Additionally, since this PAC-learning algorithm only uses random examples, the model extraction adversary can choose to make queries according to a distribution that is "accepted" by the OMED.
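For reference, the classic bound behind the VC dimension argument (realizable case, constants omitted) says that any learner that outputs a hypothesis in $\mathcal{C}$ consistent with

$$m \;=\; O\!\left(\frac{1}{\varepsilon}\left(\mathrm{VCdim}(\mathcal{C}) \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right)$$

random labeled examples achieves error at most $\varepsilon$ with probability at least $1 - \delta$. The sample complexity is modest; the catch is that finding a consistent hypothesis may take exponential time.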
However, the VC dimension argument generally relies on unbounded computational power, and only very simple concept classes are known to be PAC-learnable in polynomial time from random examples (e.g., linear models). Hence, the natural question is whether there exist effective model extraction defenses against polynomial-time adversaries, as is customary in Modern Cryptography. This is because many important models (e.g., decision trees, low-depth neural networks) are not known to be PAC-learnable in polynomial time (on the uniform distribution over discrete inputs, say). On the other hand, a class like decision trees is learnable in polynomial time when using carefully synthesized queries. But classic membership query algorithms like the Kushilevitz-Mansour algorithm [KM93] (used to learn decision trees in polynomial time) make queries that can be tested and identified as problematic by the OMED.
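To get a feel for why the [KM93] queries stand out, recall (roughly) how that algorithm proceeds: to decide whether the Fourier weight of $f$ on coefficients with a given prefix $\alpha \in \{0,1\}^k$ is large, it estimates a quantity of the form

$$\sum_{\beta \in \{0,1\}^{n-k}} \hat{f}(\alpha\beta)^2 \;=\; \mathbb{E}_{x}\Big[\big(\mathbb{E}_{y}\,[\,f(yx)\,\chi_\alpha(y)\,]\big)^2\Big],$$

which it does by querying $f$ on pairs of points $y_1 x$ and $y_2 x$ that share the same suffix $x$. Query pairs that agree on $n - k$ coordinates would essentially never occur among independent uniform samples, so an OMED can flag them easily.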
Therefore, Covert Learning algorithms bridge the gap between these two settings: they constitute attacks that are benign-looking but are actually adverse. In other words, they are polynomial-time membership query PAC-learning algorithms that synthesize queries in a special way that is provably hard to distinguish from the classic setting of PAC-learning with random examples. This means that Covert Learning attacks are both undetectable by an (efficient) OMED and run in polynomial time. By contrast, the VC dimension argument is undetectable by the OMED but is not efficient, while the membership query attack of [KM93] is efficient but detectable by the OMED.
[1] Note that the internal randomness is not even shared with the concept oracle. To use an analogy from Cryptography, Covert Learning is "public-key" in nature, as opposed to "symmetric-key," which might rely on shared randomness between the learner and the concept oracle.