A1 | Unauthenticated Inference Endpoint

Where It Appears In The Edge AI Architecture

A1 sits exactly at the public-to-gateway boundary, before validation and accelerator dispatch.

Anonymous request allowed: the Pi dispatches the request even when Authorization is missing.

Threat Model

The attacker is an untrusted network client that can reach only the public gateway route.

Attacker capability

The attacker can send normal HTTP requests to the public inference endpoint, vary request bodies, observe status codes, latency, and predictions, and repeat requests over time. They do not begin with SSH, device shell access, switch admin access, or physical access.

Security assumption being tested

The Pi gateway is supposed to be the choke point. If it accepts inference without authentication, the private Jetson and Zynq remain network-private but still become indirectly reachable compute resources.

Assets at risk

Model outputs, confidence scores, backend timing, accelerator capacity, internal routing policy, user result objects, and research-grade measurements such as Jetson versus Zynq latency fingerprints.

Out of scope

No third-party endpoint testing, credential stuffing, SSH brute force, packet capture against uninvolved systems, or traffic generation outside an owned isolated lab. This article uses toy code and synthetic inputs.

Attack Intuition

Inference endpoints look like ordinary API routes, but they expose expensive model behavior.

A1 is not subtle: it is the absence of a decision that should happen before any meaningful work. The route receives an image, parses it, selects an accelerator, and returns a prediction without first binding the request to a caller identity. That turns the gateway into a public model oracle.

The direct impact is anonymous inference. The research impact is larger: once anonymous callers can query the model, they can measure confidence distributions, build low-rate extraction datasets, compare backend timing, and consume accelerator queues in ways that later attacks use as a foundation.

Safe framing: the demonstration below is only a localhost simulator. It teaches the control failure and the expected logs without touching real network services unless you deliberately deploy it inside your own isolated lab.

Technical Explanation

A1 is an authorization precondition failure on the gateway route.

Vulnerable sequence

Client sends POST /api/v1/infer with a syntactically valid image or JSON body.
Gateway parses the request and checks only route, method, and body shape.
Gateway dispatches to Jetson or Zynq and returns a prediction.

Correct sequence

Gateway rejects missing or invalid Authorization before parsing expensive bodies.
Gateway binds user id, tenant, quota, and result ownership to the request.
Only then does it validate input and dispatch to a backend.
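
The two sequences above differ only in where the identity check sits. A minimal sketch of that ordering follows, assuming a hypothetical in-memory token table (`KNOWN_TOKENS`) and a `steps` list that records which work the gateway performed. Neither is part of the lab simulator.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical token table for illustration only; a real gateway would verify a signed token.
KNOWN_TOKENS = {"Bearer lab-token-a1": "lab-user-001"}


@dataclass
class Request:
    route: str
    authorization: Optional[str]
    body: bytes
    steps: list = field(default_factory=list)  # work the gateway performed, in order


def handle_vulnerable(req: Request) -> int:
    """Checks route and body shape only, then dispatches."""
    if req.route != "/api/v1/infer":
        return 404
    req.steps.append("parse_body")
    if not req.body:
        return 400
    req.steps.append("dispatch_backend")
    return 200


def handle_hardened(req: Request) -> int:
    """Rejects missing or unknown credentials before any body work."""
    if req.route != "/api/v1/infer":
        return 404
    user_id = KNOWN_TOKENS.get(req.authorization or "")
    if user_id is None:
        return 401
    req.steps.append("bind_identity:" + user_id)
    req.steps.append("parse_body")
    if not req.body:
        return 400
    req.steps.append("dispatch_backend")
    return 200


anonymous = Request("/api/v1/infer", None, b"synthetic-image-bytes")
print(handle_vulnerable(anonymous), anonymous.steps)  # 200 ['parse_body', 'dispatch_backend']

anonymous = Request("/api/v1/infer", None, b"synthetic-image-bytes")
print(handle_hardened(anonymous), anonymous.steps)    # 401 []
```

The empty `steps` list on the hardened path is the property to look for: an unauthenticated request should leave no trace of parsing or dispatch.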

Why private backends still matter

The Jetson and Zynq may have no public IP, but a permissive Pi route acts as a proxy. Backend isolation reduces direct exposure, but it does not protect the model oracle if the public gateway permits anonymous work.

Mathematical Formulation

The core defect can be expressed as an allow predicate missing the identity term.

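$$
\begin{aligned}
\mathrm{Allow}_{\mathrm{vulnerable}}(r) &= \mathrm{Reachable}(r) \cdot \mathrm{RouteValid}(r) \cdot \mathrm{BodyValid}(r) \\
\mathrm{Allow}_{\mathrm{hardened}}(r) &= \mathrm{Reachable}(r) \cdot \mathrm{RouteValid}(r) \cdot \mathrm{AuthValid}(r) \cdot \mathrm{Authorized}(\mathit{user}, \mathit{object}) \cdot \mathrm{BodyValid}(r) \\
\mathrm{OracleExposure} &= \sum_i \mathrm{Allow}(r_i) \cdot \mathrm{Info}(\mathit{prediction}_i, \mathit{confidence}_i, \mathit{latency}_i) \\
\mathrm{BackendCost} &= \sum_i \mathrm{Allow}(r_i) \cdot \left[ c_{\mathrm{parse}} + c_{\mathrm{dispatch}} + c_{\mathrm{backend}}(\mathit{accel}_i) \right]
\end{aligned}
$$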

In A1, AuthValid(r) and Authorized(user, object) are effectively replaced by 1, which collapses the hardened predicate into the vulnerable one. Every syntactically valid request receives the same route treatment whether it came from a known operator, a test script, or an anonymous client. The measurable security question is how much information and compute the endpoint exposes per unauthenticated request.
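
To make the BackendCost sum concrete, here is a small worked calculation. The backend costs reuse the simulator's synthetic latencies (42 ms for Jetson, 35 ms for Zynq). The parse and dispatch costs, the request mix, and all function names are assumptions chosen for illustration, and Reachable(r) is taken as 1 for every request.

```python
# Toy per-request costs in milliseconds. Backend values come from the simulator's
# synthetic latencies; C_PARSE_MS and C_DISPATCH_MS are assumed for illustration.
C_PARSE_MS = 1.0
C_DISPATCH_MS = 2.0
C_BACKEND_MS = {"jetson": 42.0, "zynq": 35.0}


def allow(route_valid, body_valid, auth_valid=True, authorized=True):
    """Product of 0/1 predicates; the vulnerable gateway leaves the last two at 1."""
    return int(route_valid and body_valid and auth_valid and authorized)


def backend_cost_ms(requests, enforce_auth):
    """Sum Allow(r_i) * [c_parse + c_dispatch + c_backend(accel_i)] over all requests."""
    total = 0.0
    for r in requests:
        a = allow(
            r["route_valid"],
            r["body_valid"],
            auth_valid=r["auth_valid"] if enforce_auth else True,
            authorized=r["authorized"] if enforce_auth else True,
        )
        total += a * (C_PARSE_MS + C_DISPATCH_MS + C_BACKEND_MS[r["accel"]])
    return total


# 100 anonymous requests, split evenly across the two synthetic backends.
anonymous = [
    {
        "route_valid": True,
        "body_valid": True,
        "auth_valid": False,
        "authorized": False,
        "accel": "jetson" if i % 2 == 0 else "zynq",
    }
    for i in range(100)
]

print(backend_cost_ms(anonymous, enforce_auth=False))  # 4150.0
print(backend_cost_ms(anonymous, enforce_auth=True))   # 0.0
```

Under these assumed costs, 100 anonymous requests consume 50 x 45 ms + 50 x 38 ms = 4150 ms of gateway and accelerator time on the vulnerable path and none on the hardened path.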

Step-By-Step Safe Lab Demonstration

Use the simulator below on localhost. It never contacts your Jetson, Zynq, Pi, or any third-party target.

Save the Python code from the next section as a1_local_lab.py on your workstation.
Start the vulnerable simulator with python3 a1_local_lab.py --mode vulnerable.
In another terminal, run the localhost client command shown below. The missing token should still produce a synthetic prediction.
Stop the server and restart it with python3 a1_local_lab.py --mode hardened.
Repeat the request without a token. The expected result is 401 before simulated inference work.
Repeat with the demo token. The expected result is 200, a synthetic prediction, and a log line that includes user identity without logging the request body.

Full Code For A Local Simulated Lab

The code binds to 127.0.0.1 only and uses synthetic inference. Keep it in an owned lab environment.

a1_local_lab.py

#!/usr/bin/env python3
"""
A1 local-only simulator: unauthenticated inference endpoint.

This server intentionally binds to 127.0.0.1 and returns synthetic labels.
Use it only on your own workstation or isolated lab host.
"""

import argparse
import hashlib
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

DEMO_TOKEN = "Bearer lab-token-a1"
MAX_BODY_BYTES = 64 * 1024


def synthetic_inference(payload):
    digest = hashlib.sha256(payload).hexdigest()
    score = int(digest[:4], 16) / 65535.0
    backend = "jetson" if int(digest[4:6], 16) % 2 == 0 else "zynq"
    label = "synthetic-cat" if score >= 0.5 else "synthetic-dog"
    latency_ms = 42 if backend == "jetson" else 35
    time.sleep(latency_ms / 1000.0)
    return {
        "label": label,
        "confidence": round(max(score, 1 - score), 4),
        "backend": backend,
        "latency_ms": latency_ms,
        "trace_id": digest[:12],
    }


class GatewayHandler(BaseHTTPRequestHandler):
    server_version = "A1LocalGateway/1.0"

    def log_message(self, fmt, *args):
        print("%s - %s" % (self.log_date_time_string(), fmt % args))

    def _send_json(self, status, document):
        encoded = json.dumps(document, indent=2).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)

    def do_POST(self):
        if self.path != "/api/v1/infer":
            self._send_json(404, {"error": "not found"})
            return

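        # A malformed or negative Content-Length must not crash the handler.
        try:
            declared_length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            declared_length = -1
        if declared_length < 0:
            self._send_json(400, {"error": "invalid Content-Length"})
            print("event=reject reason=bad_content_length")
            return
        if declared_length > MAX_BODY_BYTES:
            self._send_json(413, {"error": "body too large"})
            print("event=reject reason=body_too_large bytes=%s" % declared_length)
            return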

        if self.server.mode == "hardened":
            auth = self.headers.get("Authorization", "")
            if auth != DEMO_TOKEN:
                self._send_json(401, {"error": "missing or invalid bearer token"})
                print("event=reject reason=missing_auth route=/api/v1/infer")
                return
            user_id = "lab-user-001"
        else:
            user_id = "anonymous"

        body = self.rfile.read(declared_length)
        if not body:
            self._send_json(400, {"error": "empty body"})
            return

        result = synthetic_inference(body)
        self._send_json(200, {"user": user_id, "prediction": result})
        print(
            "event=infer mode=%s user=%s backend=%s latency_ms=%s body_sha256=%s"
            % (
                self.server.mode,
                user_id,
                result["backend"],
                result["latency_ms"],
                hashlib.sha256(body).hexdigest()[:16],
            )
        )


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["vulnerable", "hardened"], default="vulnerable")
    parser.add_argument("--port", type=int, default=8080)
    args = parser.parse_args()

    server = ThreadingHTTPServer(("127.0.0.1", args.port), GatewayHandler)
    server.mode = args.mode
    print("A1 local lab listening on http://127.0.0.1:%s mode=%s" % (args.port, args.mode))
    print("Demo token for hardened mode: %s" % DEMO_TOKEN)
    server.serve_forever()


if __name__ == "__main__":
    main()

localhost client checks

# Terminal 1: intentionally vulnerable toy service
python3 a1_local_lab.py --mode vulnerable

# Terminal 2: missing Authorization is incorrectly accepted
curl -s -X POST http://127.0.0.1:8080/api/v1/infer \
  -H "Content-Type: application/octet-stream" \
  --data-binary "synthetic-image-bytes"

# Terminal 1: hardened toy service
python3 a1_local_lab.py --mode hardened

# Missing Authorization should be blocked before synthetic inference
curl -i -s -X POST http://127.0.0.1:8080/api/v1/infer \
  -H "Content-Type: application/octet-stream" \
  --data-binary "synthetic-image-bytes"

# Demo token succeeds in the local simulator
curl -s -X POST http://127.0.0.1:8080/api/v1/infer \
  -H "Authorization: Bearer lab-token-a1" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "synthetic-image-bytes"
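
The same checks can be scripted in Python instead of curl. This is a sketch using only the standard library. The helper names `build_infer_request` and `send` are hypothetical, and the default URL assumes the simulator is listening on its default port 8080.

```python
import json
import urllib.error
import urllib.request


def build_infer_request(body, token=None, base_url="http://127.0.0.1:8080"):
    """Build a POST to the local simulator; omit Authorization when token is None."""
    headers = {"Content-Type": "application/octet-stream"}
    if token is not None:
        headers["Authorization"] = token
    return urllib.request.Request(
        base_url + "/api/v1/infer", data=body, headers=headers, method="POST"
    )


def send(request, timeout_s=5):
    """Return (status, parsed JSON document) for both success and error responses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the JSON error body is still readable.
        return err.code, json.loads(err.read())


# With the simulator running locally, the curl checks above become:
#   send(build_infer_request(b"synthetic-image-bytes"))
#   send(build_infer_request(b"synthetic-image-bytes", token="Bearer lab-token-a1"))
```

Returning the status code alongside the document makes it easy to assert on 200 versus 401 when the checks are repeated in both modes.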

Practical Example For Your Edge AI + FPGA + Jetson Setup

The concrete risk is anonymous use of private accelerators through the Pi gateway.

Expected topology

Client traffic reaches the Raspberry Pi on the public VLAN. The Pi has a second interface on the private subnet and dispatches to Jetson Orin Nano and Zynq-7020 services. The switch, VLANs, and private subnet prevent direct client access to backend ports.

Client VLAN -> Pi gateway -> Jetson + Zynq (private subnet)

A1 failure mode

If the Pi route accepts requests without a bearer token, the client does not need to reach 192.168.30.x directly. The Pi becomes the proxy that turns private accelerator services into public model compute.

POST /api/v1/infer, no Authorization header, D1 missing

For your research portal, treat A1 as a baseline experiment: first prove that missing identity is blocked, then measure how the same input travels through the Jetson and Zynq paths once identity, quota, and object ownership are enforced. This makes the gateway a clean control point for later A6 model extraction and A18 timing side-channel studies.

Observable Signals Or Logs

A1 should be visible before backend dispatch.

Signal	Vulnerable observation	Hardened observation
HTTP status	200 for missing Authorization	401 or 403 before parsing expensive body
Gateway log	event=infer user=anonymous backend=jetson or zynq	event=reject reason=missing_auth route=/api/v1/infer
Backend log	Jetson or Zynq receives anonymous workload through Pi	No backend call for unauthenticated request
Latency	Response includes backend compute time	Short reject path, no accelerator timing signal
Metrics	Unauthenticated requests consume queue slots	Reject counters rise, queue depth unchanged
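
The gateway-log row can be checked mechanically. The sketch below assumes the `key=value` line format printed by the local simulator. The function names and the sample hash value are made up for illustration.

```python
def parse_event(line):
    """Turn 'event=infer user=anonymous ...' into a dict; skip tokens without '='."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def anonymous_dispatches(lines):
    """Return parsed events where an anonymous caller reached a backend."""
    hits = []
    for line in lines:
        event = parse_event(line)
        if event.get("event") == "infer" and event.get("user") == "anonymous":
            hits.append(event)
    return hits


# Synthetic log lines in the simulator's format (the hash is a placeholder).
log_lines = [
    "event=infer mode=vulnerable user=anonymous backend=jetson latency_ms=42 body_sha256=0123456789abcdef",
    "event=reject reason=missing_auth route=/api/v1/infer",
    "event=infer mode=hardened user=lab-user-001 backend=zynq latency_ms=35 body_sha256=0123456789abcdef",
]

for hit in anonymous_dispatches(log_lines):
    print("anonymous dispatch to", hit["backend"])  # anonymous dispatch to jetson
```

A hardened run should produce an empty list from `anonymous_dispatches`, with `event=reject` lines accounting for every unauthenticated request.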

Impact Analysis

A1 is a simple bug with broad downstream leverage.

Confidentiality

Predictions, confidence values, class lists, and backend timing leak to anonymous callers. The model behaves as an oracle even if the model artifact is never downloaded.

Integrity

Unauthenticated callers can influence downstream decisions if inference output is consumed by an automated process. This is especially risky when model output gates access or triggers actuation.

Availability

Even low-complexity request loops can consume parse time, queue slots, GPU cycles, or FPGA accelerator windows. D2 reduces this, but D1 should prevent anonymous use first.
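
One common way to implement the rate-limit half of D2 is a token bucket. The source does not say which algorithm D2 uses, so treat this as an assumed illustration. The class name, parameters, and injected clock are all hypothetical.

```python
class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s, burst, now=0.0):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the gateway would answer 429 here, before backend dispatch


bucket = TokenBucket(rate_per_s=2, burst=3)
print([bucket.allow(now=0.0) for _ in range(5)])  # [True, True, True, False, False]
print(bucket.allow(now=0.5))                      # True
```

Passing the clock in as an argument keeps the sketch deterministic. A gateway would typically keep one bucket per authenticated user or token, which is another reason D1 has to run first.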

Mapping To CIA, STRIDE, PASTA, And MITRE ATLAS

Use these mappings as research labels for experiments and reports.

Framework	A1 mapping	Research interpretation
CIA	Confidentiality, Integrity, Availability	Anonymous model oracle, unauthenticated decision influence, free compute consumption.
STRIDE	Spoofing, Information Disclosure, Denial of Service, Elevation of Privilege by policy bypass	The caller does not prove identity but receives privileged inference service.
PASTA	Stage 2 technical scope, Stage 4 threat analysis, Stage 5 vulnerability analysis, Stage 6 attack modeling	Define the inference route as an asset, model unauthenticated access, then validate D1 controls experimentally.
MITRE ATLAS	Relevant to AI service discovery, ML model access, inference API abuse, and collection of model responses for later extraction	A1 is usually an enabling condition rather than the final objective. It can support later model extraction, reconnaissance, and timing studies.

Defense Mapping To Existing D1-D11 Controls

D1 is the primary fix, but several controls make A1 measurable and harder to exploit.

Control	Role against A1	Validation
D1 JWT Auth + Object Checks	Reject missing or invalid bearer tokens before inference; bind user identity to result ownership.	Missing token returns 401; cross-user result reads return 403 or 404.
D2 Rate Limit + Body Cap	Limits blast radius if an unauthenticated or invalid request reaches the gateway edge.	Oversized body returns 413; excess request rate returns 429 before backend dispatch.
D5 Query Anomaly Detection	Detects extraction-like volume, entropy, drift, and timing probes after authentication is enforced.	Metrics flag unusual query distributions per user and per token.
D6 Sanitized Logging	Captures enough to prove reject behavior without storing tokens or request bodies.	Logs contain route, status, user id or anonymous marker, body hash, and backend only when dispatched.
D4 Private Backend Subnet	Does not fix A1 directly, but prevents bypassing the gateway to reach Jetson or Zynq.	Client VLAN cannot connect directly to backend service ports.

Research Notes: What To Measure Experimentally

A1 is a useful baseline because the expected security transition is crisp.

Before and after D1

Measure unauthenticated status code distribution, reject latency, backend queue depth, Jetson and Zynq request counts, and gateway CPU time. The key proof is that missing-auth requests stop before backend dispatch.
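
A before/after comparison can be reduced to a few summary numbers. The sketch below assumes each measurement is recorded as a `(status_code, latency_ms, reached_backend)` tuple. The sample values are invented to show the shape of the output and are not measurements.

```python
from collections import Counter
from statistics import median


def summarize(samples):
    """Summarize (status_code, latency_ms, reached_backend) tuples for one test run."""
    samples = list(samples)
    reject_latencies = [lat for status, lat, _ in samples if status in (401, 403)]
    return {
        "status_counts": dict(Counter(status for status, _, _ in samples)),
        "median_reject_ms": median(reject_latencies) if reject_latencies else None,
        "backend_calls": sum(1 for _, _, reached in samples if reached),
    }


# Invented samples for unauthenticated requests, before and after D1.
before_d1 = [(200, 43.0, True), (200, 36.0, True), (200, 43.5, True)]
after_d1 = [(401, 0.9, False), (401, 1.2, False), (401, 1.1, False)]

print(summarize(before_d1))
# {'status_counts': {200: 3}, 'median_reject_ms': None, 'backend_calls': 3}
print(summarize(after_d1))
# {'status_counts': {401: 3}, 'median_reject_ms': 1.1, 'backend_calls': 0}
```

The key proof corresponds to `backend_calls` dropping to zero for unauthenticated traffic once D1 is enabled.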

Backend timing leakage

With authentication enabled, compare whether authorized users can still infer dispatch choice from response latency. This becomes an A18 experiment, but A1 establishes the initial model-oracle risk.
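
A simple way to quantify this leakage is to ask how often a midpoint threshold on response latency recovers the backend. The sketch uses the simulator's synthetic latencies (42 ms Jetson, 35 ms Zynq). The observed samples and function names are assumptions for illustration.

```python
JETSON_MS = 42.0
ZYNQ_MS = 35.0
THRESHOLD_MS = (JETSON_MS + ZYNQ_MS) / 2  # 38.5, midpoint between the two backends


def guess_backend(observed_ms):
    """Guess the backend from response latency alone."""
    return "jetson" if observed_ms >= THRESHOLD_MS else "zynq"


def guess_accuracy(samples):
    """Fraction of (observed_ms, true_backend) samples the threshold classifies correctly."""
    correct = sum(1 for ms, truth in samples if guess_backend(ms) == truth)
    return correct / len(samples)


# Invented observations with a little jitter around the synthetic latencies.
observed = [(43.0, "jetson"), (36.1, "zynq"), (41.2, "jetson"), (34.8, "zynq")]
print(guess_accuracy(observed))  # 1.0
```

An accuracy near 0.5 would mean latency reveals nothing about dispatch choice. On real hardware with network jitter the figure will likely fall between the two extremes, which is what the A18 experiment would measure.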

Object ownership

Verify that a token grants access only to that user's results. A1 and A3 should be tested together because login without object checks still leaks stored predictions.
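
The object check itself is small. This sketch assumes a hypothetical in-memory result store keyed by result id with an `owner` field. It returns 404 for both missing and foreign objects so that a caller cannot probe which ids exist.

```python
# Hypothetical result store for illustration; ids, owners, and labels are synthetic.
RESULTS = {
    "res-001": {"owner": "lab-user-001", "label": "synthetic-cat"},
    "res-002": {"owner": "lab-user-002", "label": "synthetic-dog"},
}


def read_result(user_id, result_id, store=RESULTS):
    """Return (status, document); only the owning user can read a stored result."""
    record = store.get(result_id)
    if record is None or record["owner"] != user_id:
        return 404, {"error": "not found"}
    return 200, {"label": record["label"]}


print(read_result("lab-user-001", "res-001"))  # (200, {'label': 'synthetic-cat'})
print(read_result("lab-user-001", "res-002"))  # (404, {'error': 'not found'})
```

The `user_id` argument should come from the verified token, never from a request parameter, otherwise the ownership check can be bypassed by the caller.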

Operational overhead

Quantify D1 overhead: JWT verification latency, cache hit ratio for key material, false reject rate, and throughput impact under normal authorized load.

Key Takeaways

The gateway must prove identity before it does expensive or sensitive work.

A1 turns the Pi gateway into a public model oracle even when Jetson and Zynq remain on a private subnet.
D1 should execute before body parsing, model dispatch, result creation, or object lookup.
The most useful experiment is a before/after trace showing that unauthenticated requests no longer reach backend logs or accelerator queues.
A1 is a foundation risk for A3 broken object authorization, A6 model extraction, A18 timing side-channel studies, and A2-style availability abuse.