A6 | Model Extraction By Querying

Where It Appears In The Edge AI Architecture

A6 sits at the public API layer, but the signal it collects is model behavior across time.

Vulnerable oracle

Full confidence output and no long-session budget make substitute-model agreement rise quickly.

[Interactive panel meters: substitute agreement, detector confidence]

Threat Model

The attacker has API access, not backend shell access.

Attacker capability

The caller can authenticate or otherwise obtain allowed API access, submit many syntactically valid inputs, and observe labels, confidence, latency, and rejection behavior over time.

Assets at risk

Decision boundaries, class behavior, confidence calibration, backend-specific timing patterns, dataset biases, and enough functional behavior to train a substitute model.

Out of scope

No credential attacks, no third-party API scraping, no live high-volume traffic against external systems, and no attempts to extract private training data from real users.

Attack Intuition

One prediction is an answer. Many predictions become a training set.

A6 is the slow conversion of an inference endpoint into an oracle dataset. Each request may look normal, but the sequence is shaped to learn how the model behaves near boundaries, rare classes, or ambiguous inputs.

In your lab, the risk grows because the gateway may route requests to either the Jetson or the Zynq backend. If responses expose confidence and timing, the caller may learn both the model's decision surface and which accelerator path produced each response.

Important safety line: use synthetic inputs, toy classifiers, or owned lab traffic. A6 validation does not require scraping a real production model.

Technical Explanation

Extraction succeeds when output detail and query strategy beat the gateway's measurement controls.

Output richness

Top-k labels, precise confidence values, logits, backend choice, and latency can all improve a substitute model. Rounding and response minimization reduce the information per query.
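
A minimal sketch of response minimization at the gateway. The helper name `minimize_response` and the policy names ("full", "rounded", "label_only") are assumptions for illustration, not part of the lab code.

```python
def minimize_response(label, confidence, policy="rounded", step=0.1):
    """Shape the client-visible response from a raw (label, confidence) pair.

    policy is a hypothetical gateway setting:
      "full"       returns confidence unchanged (vulnerable baseline)
      "rounded"    snaps confidence to a coarse grid of width `step`
      "label_only" omits confidence entirely
    Backend identity and latency are never included in the response.
    """
    if policy == "full":
        return {"label": label, "confidence": confidence}
    if policy == "rounded":
        # Snap to the nearest multiple of `step`, then trim float noise.
        buckets = round(confidence / step)
        return {"label": label, "confidence": round(buckets * step, 6)}
    if policy == "label_only":
        return {"label": label}
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    for policy in ("full", "rounded", "label_only"):
        print(policy, minimize_response(1, 0.8731, policy))
```

With a step of 0.1, a confidence in [0.5, 1.0] collapses to one of six values, so each query carries far less information about distance to the boundary.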

Query shape

Random queries are inefficient for the attacker: most land far from a decision boundary and reveal little. Boundary-seeking, class-balancing, near-duplicate mutation, and uncertainty-driven inputs extract more per query and should look different in D5 telemetry.
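
One way near-duplicate mutation might show up in telemetry is as repeated hits in the same small region of input space. The sketch below assumes two-dimensional synthetic features and a hypothetical grid size `cell`; it is one possible similarity feature, not the D5 implementation.

```python
def near_duplicate_fraction(points, cell=0.05):
    """Fraction of queries that land in a grid cell already seen this session.

    points: iterable of (x1, x2) feature pairs.
    cell:   hypothetical grid width; smaller cells demand closer duplicates.
    """
    seen = set()
    repeats = 0
    total = 0
    for x1, x2 in points:
        key = (int(x1 // cell), int(x2 // cell))  # quantize to a grid cell
        if key in seen:
            repeats += 1
        seen.add(key)
        total += 1
    return repeats / total if total else 0.0


if __name__ == "__main__":
    spread = [(0.0, 0.0), (1.0, 1.0), (-1.0, 0.52)]
    sweep = [(0.52, 0.52), (0.521, 0.522), (0.523, 0.521), (0.524, 0.524)]
    print(near_duplicate_fraction(spread), near_duplicate_fraction(sweep))
```

A fixed grid misses near-duplicates that straddle a cell edge, so treat the value as a lower bound on local probing.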

Session behavior

Low-rate extraction can hide under simple per-second rate limits. D5 needs longer windows and session-level features: per-user budgets, entropy, class distribution, input similarity, and repeated confidence-edge hits.
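
A sketch of a long-window budget, assuming a hypothetical `SessionBudget` helper. The default `max_queries` mirrors the 140-query cap used in the lab simulator; `window_s` is an illustrative value.

```python
from collections import defaultdict, deque


class SessionBudget:
    """Sliding-window query budget per user (hypothetical helper)."""

    def __init__(self, max_queries=140, window_s=3600.0):
        self.max_queries = max_queries
        self.window_s = window_s
        # user -> timestamps of accepted queries, oldest first
        self._accepted = defaultdict(deque)

    def allow(self, user, now):
        """Return True and record the query if the user is within budget."""
        stamps = self._accepted[user]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.max_queries:
            return False
        stamps.append(now)
        return True


if __name__ == "__main__":
    budget = SessionBudget(max_queries=3, window_s=100.0)
    # One query every 20 seconds stays far below any per-second limit.
    print([budget.allow("research-user", t) for t in (0, 20, 40, 60, 80, 100)])
```

The caller passes `now` explicitly so the same logic can be replayed over a synthetic query log with recorded timestamps.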

Backend side signal

If Jetson and Zynq paths return different latency distributions, A6 can combine with A18. Constant-time wrapping and route-independent output policy reduce this side channel.
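
A rough sketch of constant-time wrapping: every response is padded up to a common deadline so fast and slow routes look alike to the client. The function name, the `deadline_s` value, and the injectable `clock` and `sleep` parameters are assumptions for illustration.

```python
import time


def constant_time_call(backend, payload, deadline_s=0.050,
                       clock=time.monotonic, sleep=time.sleep):
    """Run backend(payload) and pad the elapsed time up to deadline_s.

    deadline_s is a hypothetical target that should sit above the slowest
    expected backend latency. Calls that overrun it are returned late and
    flagged so the overrun can be logged.
    """
    start = clock()
    result = backend(payload)
    elapsed = clock() - start
    overran = elapsed > deadline_s
    if not overran:
        sleep(deadline_s - elapsed)  # pad fast paths up to the common deadline
    return result, overran


if __name__ == "__main__":
    result, overran = constant_time_call(lambda p: p * 2, 21, deadline_s=0.01)
    print(result, overran)
```

Padding only hides the route when the deadline exceeds the tail latency of both backends, and sleep granularity still adds jitter, so the client-visible distributions should be measured rather than assumed equal.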

Mathematical View

The attacker trains a substitute model from oracle-labeled samples.

Let f(x) be the protected model exposed through /infer. The attacker builds a dataset from q queries:

    D_q = {(x_i, f(x_i), c_i, t_i)},  i = 1..q

Then they train g_theta to minimize:

    L(theta) = sum_i CE(g_theta(x_i), f(x_i)) + lambda * R(query_pattern_i)

Useful leakage increases with q, confidence precision c_i, and timing signal t_i. D5 aims to reduce q, coarsen c_i, and make suspicious query patterns measurable.

For the lab, the most useful metric is not "was the model stolen" as a binary claim. Measure substitute agreement as query count increases, then repeat with rounded confidence, quota windows, and anomaly alerts enabled.
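
The agreement curve can be sketched with a deliberately simple substitute. The example below swaps in a 1-nearest-neighbour rule as a stand-in for g_theta (it is not the loss-minimizing model from the formula above) and reuses the coefficients of the lab's toy oracle. The function names, query counts, and evaluation size are assumptions.

```python
import random


def toy_oracle(x1, x2):
    """Stand-in for f(x): same linear rule as the lab's toy oracle, label only."""
    return 1 if 2.4 * x1 - 1.7 * x2 + 0.25 >= 0 else 0


def nearest_label(train, x1, x2):
    """1-nearest-neighbour substitute: copy the label of the closest query."""
    best = min(train, key=lambda s: (s[0] - x1) ** 2 + (s[1] - x2) ** 2)
    return best[2]


def agreement_curve(query_counts=(5, 20, 80, 320), eval_size=500, seed=7):
    """Return [(q, agreement)] for substitutes built from the first q queries."""
    rng = random.Random(seed)

    def draw():
        return rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)

    # Fixed synthetic evaluation set, shared by every point on the curve.
    eval_set = [draw() for _ in range(eval_size)]
    queries = [draw() for _ in range(max(query_counts))]
    labeled = [(x1, x2, toy_oracle(x1, x2)) for x1, x2 in queries]

    curve = []
    for q in query_counts:
        train = labeled[:q]  # the first q oracle answers form D_q
        hits = sum(nearest_label(train, x1, x2) == toy_oracle(x1, x2)
                   for x1, x2 in eval_set)
        curve.append((q, hits / eval_size))
    return curve


if __name__ == "__main__":
    for q, agreement in agreement_curve():
        print(f"q={q:4d}  agreement={agreement:.3f}")
```

Rerunning the same curve with a query cap or label-only output gives the protected-mode comparison, using the same evaluation set each time.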

Safe Lab Demonstration

Use a toy oracle and synthetic query log to measure extraction pressure without touching external services.

Run the local Python simulator in vulnerable mode.
Generate synthetic two-dimensional samples and query the toy oracle.
Train a tiny substitute rule from the returned labels.
Repeat with rounded confidence and D5-style budget checks.
Compare substitute agreement, alert count, accepted queries, and confidence precision.

Local Lab Code

A small simulator for extraction measurement and D5 telemetry design.

a6_model_extraction_toy_lab.py

from dataclasses import dataclass
from math import exp
import random


@dataclass
class QueryEvent:
    """One accepted query as the gateway would log it."""
    user: str
    x1: float
    x2: float
    label: int
    confidence: float


def oracle(x1, x2):
    """Toy linear classifier standing in for the protected model."""
    score = 2.4 * x1 - 1.7 * x2 + 0.25
    # Confidence of the predicted class, so it is always >= 0.5.
    confidence = 1.0 / (1.0 + exp(-abs(score)))
    label = 1 if score >= 0 else 0
    return label, confidence


def d5_alert(events, window=80):
    """Flag a window with many near-boundary hits and balanced classes."""
    recent = events[-window:]
    if len(recent) < window:
        return False, "warming_up"
    edge_hits = sum(0.50 <= e.confidence <= 0.62 for e in recent)
    class_balance = abs(sum(e.label for e in recent) / window - 0.5)
    if edge_hits > 28 and class_balance < 0.18:
        return True, "boundary_seeking_query_pattern"
    return False, "normal"


def run(mode="vulnerable", n=260, seed=7, budget=140):
    rng = random.Random(seed)
    events = []
    accepted = 0
    rejected = 0
    alerts = 0

    for _ in range(n):
        x1 = rng.uniform(-1.5, 1.5)
        x2 = rng.uniform(-1.5, 1.5)

        # Enforce the D5 session budget before the model runs, so a
        # rejected query costs no inference and returns no output.
        if mode == "d5" and accepted >= budget:
            rejected += 1
            continue

        label, conf = oracle(x1, x2)

        # Coarsen the confidence the client (and the log) sees.
        if mode in {"rounded", "d5"}:
            conf = round(conf, 1)

        events.append(QueryEvent("research-user", x1, x2, label, conf))
        accepted += 1

        # Counts every accepted query whose trailing window is flagged.
        flagged, _reason = d5_alert(events)
        if flagged:
            alerts += 1

    # Placeholder agreement estimate: a fixed formula, not a trained
    # substitute. Replace it with a measured value for real experiments.
    agreement = min(0.52 + accepted / 360.0, 0.91)
    if mode in {"rounded", "d5"}:
        agreement -= 0.08
    if mode == "d5":
        agreement -= 0.10

    return {
        "mode": mode,
        "accepted_queries": accepted,
        "rejected_queries": rejected,
        "alerts": alerts,
        "estimated_substitute_agreement": round(agreement, 3),
    }


if __name__ == "__main__":
    for mode in ["vulnerable", "rounded", "d5"]:
        print(run(mode))
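
Uniform random queries rarely cluster near the boundary, so the simulator above is unlikely to raise an alert on its own. The sketch below adds a synthetic extraction-shaped session to exercise the detector. It copies the toy oracle and the detector thresholds, and the generator `boundary_seeking_inputs` is a hypothetical helper that uses the known toy coefficients, which is acceptable only because the oracle is an owned lab toy.

```python
from math import exp
import random


def oracle(x1, x2):
    """Same toy linear rule as the lab simulator."""
    score = 2.4 * x1 - 1.7 * x2 + 0.25
    return (1 if score >= 0 else 0), 1.0 / (1.0 + exp(-abs(score)))


def d5_alert(results, window=80):
    """Lab detector thresholds, applied to (label, confidence) pairs."""
    recent = results[-window:]
    if len(recent) < window:
        return False
    edge_hits = sum(0.50 <= conf <= 0.62 for _, conf in recent)
    class_balance = abs(sum(label for label, _ in recent) / window - 0.5)
    return edge_hits > 28 and class_balance < 0.18


def boundary_seeking_inputs(n, rng):
    """Synthetic probes that alternate just either side of the toy boundary."""
    for i in range(n):
        x1 = rng.uniform(-0.8, 0.7)  # keeps x2 inside the lab's [-1.5, 1.5] range
        offset = rng.uniform(0.05, 0.40) * (1 if i % 2 == 0 else -1)
        x2 = (2.4 * x1 + 0.25 + offset) / 1.7  # score is approximately -offset
        yield x1, x2


def random_inputs(n, rng):
    """Uniform synthetic queries, as in the lab simulator."""
    for _ in range(n):
        yield rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)


def count_alerts(inputs):
    """Feed inputs through the oracle and count flagged windows."""
    results, alerts = [], 0
    for x1, x2 in inputs:
        results.append(oracle(x1, x2))
        if d5_alert(results):
            alerts += 1
    return alerts


if __name__ == "__main__":
    print("random:", count_alerts(random_inputs(260, random.Random(7))))
    print("boundary-seeking:",
          count_alerts(boundary_seeking_inputs(160, random.Random(7))))
```

Because the probes alternate sides and stay within the 0.50 to 0.62 confidence band, every full window is flagged, which gives a clear positive case to compare against normal validation traffic.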

Your Edge AI Setup

A6 is a gateway measurement problem before it is a model problem.

Pi gateway

Collect per-user request counts, input fingerprints, confidence buckets, class distribution, reject reasons, and route metadata without storing raw private payloads.
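
A sketch of what one sanitized gateway record might contain. The function name and every field name here are hypothetical; adapt them to whatever log schema the Pi gateway already uses.

```python
import hashlib


def telemetry_record(user, payload_bytes, label, confidence, route, status):
    """Build a log record with detection features but no raw payload."""
    # Hash prefix identifies exact repeats without storing the input.
    fingerprint = hashlib.sha256(payload_bytes).hexdigest()[:12]
    # Bucket confidence into tenths: 0.57 falls in "0.5-0.6".
    low = min(int(confidence * 10), 9) / 10
    return {
        "user": user,
        "input_fp": fingerprint,
        "input_len": len(payload_bytes),
        "label": label,
        "conf_bucket": f"{low:.1f}-{low + 0.1:.1f}",
        "route": route,    # internal metadata, not returned to clients
        "status": status,
    }


if __name__ == "__main__":
    body = b'{"x1": 0.2, "x2": -0.4}'
    print(telemetry_record("research-user", body, 1, 0.57, "jetson", 200))
```

An unkeyed hash of a small input space can be reversed by enumeration, so a keyed hash (for example `hmac` with a gateway secret) may be a better fit when inputs are low-entropy.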

Jetson and Zynq

Keep backend response shape consistent. Avoid returning backend identity to clients unless the experiment explicitly needs it and the logs mark that condition.

GPU workstation

Use the original model to create a fixed synthetic evaluation set so substitute agreement can be measured across vulnerable and protected modes.

Research output

Plot query count against agreement, alert rate, p95 latency, and accuracy impact from confidence rounding or output minimization.

Observable Signals

Good telemetry separates normal usage from extraction-shaped sessions.

Signal	Why It Matters	Safe Evidence
High query volume over long windows	Extraction can be slow and steady.	Per-token accepted query count and quota decisions.
Boundary-seeking confidence bands	Many samples in the low-confidence band near a decision boundary can indicate active-learning-style probing.	Confidence buckets, rounded before logging if needed.
Input similarity and mutation	Near-duplicate sweeps probe local decision surfaces.	Body hash prefixes, perceptual hash buckets, or synthetic feature bins.
Class distribution shaping	Balanced class harvesting may differ from organic use.	Class histogram per user and per experiment.
Backend timing clusters	Can combine A6 with A18 dispatch inference.	Latency buckets and route policy state, not raw payloads.
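
For the class-distribution row, one candidate feature is the Shannon entropy of the per-user class histogram. This is a sketch under the assumption that organic traffic in your lab is skewed toward a few classes; the function name is hypothetical.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy in bits of the class histogram for one session."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(class_entropy([0, 1, 0, 1]))     # balanced harvesting
    print(class_entropy([0] * 9 + [1]))    # skewed usage
```

Entropy close to its maximum over a long window is only suspicious relative to a measured baseline for normal users, so record that baseline before setting a threshold.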

Impact Analysis

A6 is mostly confidentiality, with secondary integrity and availability effects.

Confidentiality (primary)

The attacker approximates model behavior, confidence calibration, and useful decision boundaries.

Integrity (secondary)

A good substitute can support later adversarial-input experiments against the real model.

Availability (secondary)

Extraction traffic can become expensive enough to overlap with A2 resource pressure.

Framework Mapping

A6 maps cleanly to oracle abuse and model confidentiality concerns.

Framework	Mapping	Use In Report
STRIDE	Information Disclosure, with DoS as a secondary pressure if query volume is high.	Show how valid API use can still disclose behavior through aggregate outputs.
CIA	Confidentiality primary; Integrity and Availability secondary.	Separate extraction of behavior from stealing model weights.
PASTA	Stage 4 threat analysis, Stage 5 vulnerability analysis, Stage 6 attack modeling.	Model query budget, output precision, and monitoring as control points.
MITRE ATLAS	Relevant to model inference API abuse, ML model extraction, and collection of model responses.	Treat A6 as a behavior-reconstruction experiment under controlled lab conditions.

Defense Mapping To Existing D1-D11 Controls

D5 is primary, but it needs the gateway controls around it.

Control	Role Against A6	Validation Evidence
D5 Query Anomaly Detection	Primary control. Detects extractive query shapes, long-window volume, confidence-edge probing, and unusual class distributions.	Alerts fire on the synthetic extraction run while normal validation runs stay below threshold.
D2 Rate Limit + Body Cap	Constrains query volume and prevents extraction from becoming resource exhaustion.	429 decisions, token budget logs, accepted/rejected counts.
D1 Auth + Object Checks	Links query budgets to identity and prevents anonymous oracle harvesting.	Every query has a user or experiment identity before dispatch.
D6 Sanitized Logging	Preserves enough telemetry for detection without keeping raw inputs or secrets.	Logs include buckets, hashes, user id, route, status, and confidence bin.
D8 Input Validation	Rejects malformed or out-of-domain probes before expensive model execution.	Invalid type, dimension, and schema rejects happen at the gateway.
D9 Constant-Time Dispatch Wrapper	Reduces timing leakage when extraction is combined with backend route inference.	Jetson and Zynq client-visible latency distributions become less distinguishable.
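
As an illustration of the D8 row, a gateway-side check for the toy lab might look like the sketch below. It assumes the two-feature schema {"x1": number, "x2": number} and the synthetic input range used by the simulator; the function name and reject reasons are hypothetical.

```python
from math import isfinite


def validate_query(body, lo=-1.5, hi=1.5):
    """Return (ok, reason) for a decoded JSON body before dispatch."""
    if not isinstance(body, dict):
        return False, "invalid_type"
    if set(body) != {"x1", "x2"}:
        return False, "invalid_schema"
    for key in ("x1", "x2"):
        value = body[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False, "invalid_type"
        if isinstance(value, float) and not isfinite(value):
            return False, "non_finite"
        if not lo <= value <= hi:
            return False, "out_of_domain"
    return True, "ok"


if __name__ == "__main__":
    print(validate_query({"x1": 0.2, "x2": -0.4}))
    print(validate_query({"x1": 9.0, "x2": 0.0}))
```

Logging the reject reason per user gives D5 an extra signal, since repeated out-of-domain probes are themselves a query-shape feature.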

Research Notes

A6 can produce a clean, publishable measurement story.

Experiment curve

Plot substitute agreement against accepted query count for open confidence, rounded confidence, top-1 labels only, and D5 budgeted modes.

Defense cost

Measure false positives, rejected legitimate evaluation traffic, latency overhead, and accuracy impact from output shaping.

Backend comparison

Record whether Jetson and Zynq produce identical client-visible output for the same input and whether timing reveals the route.

Portal artifact

Show a small dashboard with query budget, confidence histogram, route latency buckets, and detector decision for each run.

Key Takeaways

A6 is controlled by reducing information per query and measuring behavior across sessions.

Authentication alone does not stop model extraction; it only gives you an identity to budget and monitor.
Precise confidence, backend identity, and timing can be more useful to an attacker than a single label.
D5 should look at long-window query patterns, not just per-second request rate.
The safest lab demonstration uses synthetic samples and a toy substitute model.
A6 naturally connects to A2 resource pressure and A18 dispatch timing leakage.