Confidential Computing

Verifiable
private AI

Run AI models within hardware-enforced Trusted Execution Environments (TEEs). Replace legal promises with safeguards rooted in silicon.

Standard AI APIs require a leap of faith. We seal it shut.

Traditional AI infrastructure processes data in host memory. Even with encryption in transit and at rest, plaintext is exposed during the inference cycle itself.

The traditional infrastructure

Pinky-Promise Security

Trust is based on legal agreements (DPAs)

Client

Unencrypted RAM

Cloud

Provider can see data

Data encrypted in transit, but decrypted at runtime.

Provider admins can technically access memory dumps.

Protection relies solely on non-disclosure clauses.

Bad actors can access data in-use.

Prem sovereign cloud

Verifiable Hardware Privacy

Cryptographically verified isolation

Client

Provider is blind to data

Cloud

Encrypted RAM

Insider Threats Mitigated: Host administrators and infrastructure providers have zero access to enclave memory.

Hardware Verification: Cryptographic attestation verifies hardware integrity and enclave state before payload transmission.

Zero Visibility: Complete memory isolation from the host OS and hypervisor.

Features

Confidential compute in action

Client-side encryption

Payloads are encrypted locally using a public key bound to a verified enclave. Data only leaves your device after successful attestation.
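To make the ordering concrete, here is a minimal Python sketch of that client-side flow. The report format, function names, and key derivation are illustrative assumptions, and the "cipher" is a toy stand-in for the SDK's XChaCha20-Poly1305; this is not Prem's actual code.

```python
import hashlib, hmac, os

def verify_attestation(report: dict, expected_measurement: str) -> bool:
    # Real verification checks the silicon vendor's signature chain and
    # freshness; this toy only compares the reported code measurement.
    return hmac.compare_digest(report["measurement"], expected_measurement)

def derive_key(shared_secret: bytes) -> bytes:
    # HKDF-style extract-then-expand from the attested session secret.
    prk = hmac.new(b"pcc-demo-salt", shared_secret, hashlib.sha256).digest()
    return hmac.new(prk, b"payload-key\x01", hashlib.sha256).digest()

def toy_seal(key: bytes, plaintext: bytes) -> bytes:
    # Stand-in AEAD (keystream XOR + MAC). NOT real cryptography.
    nonce = os.urandom(16)
    stream = hashlib.sha256(key + nonce).digest()
    ct = bytes(p ^ stream[i % 32] for i, p in enumerate(plaintext))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def send_payload(report: dict, expected: str,
                 shared_secret: bytes, payload: bytes) -> bytes:
    # Attestation gates encryption: on failure, nothing leaves the device.
    if not verify_attestation(report, expected):
        raise RuntimeError("attestation failed; payload not sent")
    return toy_seal(derive_key(shared_secret), payload)
```

The point of the sketch is the control flow: encryption happens only after the attestation check passes, so an unverified enclave never receives ciphertext it could try to open.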

Transparency

Open-source codebase and public transparency logs ensure no hidden backdoors.

Secure execution

Computations occur within isolated environments. Payloads are dynamically decrypted inside the TEE, processed, and the output is re-encrypted before crossing the isolation boundary.

Attestation

Each Trusted Execution Environment generates a cryptographic proof of its identity and integrity. Your device verifies this proof before sending any data.
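A toy version of this challenge-response flow, with an HMAC standing in for the silicon vendor's signature chain; the report fields are illustrative assumptions, not a real quote format.

```python
import hashlib, hmac, json, os

# HMAC under this key simulates the vendor-rooted signature; in reality the
# chain goes back to AMD, Intel, or NVIDIA and cannot be forged in software.
VENDOR_KEY = b"simulated-silicon-root-key"

def hardware_quote(measurement: str, challenge: bytes) -> dict:
    # What the hardware returns: signed claims including the fresh challenge.
    body = json.dumps({"measurement": measurement,
                       "challenge": challenge.hex()}).encode()
    sig = hmac.new(VENDOR_KEY, body, hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_quote(quote: dict, expected_measurement: str, challenge: bytes) -> bool:
    sig = hmac.new(VENDOR_KEY, quote["body"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, quote["sig"]):
        return False                        # not signed by the hardware
    claims = json.loads(quote["body"])
    return (claims["measurement"] == expected_measurement  # code untampered
            and claims["challenge"] == challenge.hex())    # fresh, not replayed

challenge = os.urandom(32)
quote = hardware_quote("sha256:enclave-image", challenge)
```

A mismatched measurement or a stale challenge makes verification fail on the client side, before any payload is sent.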

How it Works

Prem Confidential Compute (PCC)

Every request is encrypted on your device, processed inside a hardware-isolated enclave, and returned without ever exposing plaintext to infrastructure operators.

The Architecture

The confidential stack

Prem API

End-to-end encrypted (E2EE) agentic chat, enterprise data connectors, and developer SDKs.

Custom hypervisor

Minimal-attack-surface isolation layer strictly optimized to route confidential compute primitives without host-level visibility.

PCC runtime

Enclave-native execution engines optimized for confidential model training, fine-tuning, and inference.

Bare metal infrastructure

Purpose-built hardware using leading TEE-ready silicon, operating in Confidential Computing mode. Physically hosted in compliant datacenters.

Principles

Three values of PCC security

Confidentiality

Execution state and memory pages never leave the PCC enclaves unencrypted. Hardware-level memory encryption covering the full stack guarantees zero visibility to the host OS, hypervisor, datacenter operators and employees.

CPU: Intel TDX and AMD SEV-SNP

GPU: NVIDIA Hopper and Blackwell

Integrity

Every deployed PCC enclave generates a cryptographic attestation report. This ensures genuine hardware verification and tamper-proof code execution before any data transfer is initiated.

Genuine hardware verification

Tamper-proof code execution

Transparency

Verifiable privacy over vendor policy. Open-source inference codebases combined with immutable public transparency logs provide cryptographic proof of system integrity.

Public transparency log

No hidden backdoors

Hardware

Cryptographic primitives

CPU enclaves

Full memory encryption and strictly enforced hypervisor isolation.

GPU enclaves

Hardware-isolated environments with accelerator support for high-performance confidential workloads.

Attestation

Cryptographic generation and verification of enclave quotes.

Key management

Secure, enclave-bound cryptographic key generation and lifecycle management.
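One way enclave-bound ("sealed") keys can work, sketched with stdlib HMAC: the derivation is bound to the enclave's code measurement, so a modified image cannot recover the same key. Labels and shapes here are illustrative, not Prem's key-management scheme.

```python
import hashlib, hmac

def sealed_key(master: bytes, measurement: bytes) -> bytes:
    # Mixing the measurement into the derivation means a tampered enclave
    # image derives a different, useless key.
    prk = hmac.new(measurement, master, hashlib.sha256).digest()
    return hmac.new(prk, b"sealing\x01", hashlib.sha256).digest()

k_v1 = sealed_key(b"master-secret", b"sha256:image-v1")
k_v2 = sealed_key(b"master-secret", b"sha256:image-v2")  # one byte of code changed
```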

Future-Proofing

Post-Quantum Safe Encryption

Built for strict institutional mandates, Prem uses XWing hybrid encryption to defeat "harvest now, decrypt later" attacks. By pairing a classical algorithm (X25519) with a NIST-standardized quantum-resistant one (ML-KEM-768), we guarantee your most sensitive data remains secure against future quantum threats.

Why Prem Confidential Compute?

| Feature | Traditional AI | Prem AI PCC |
| --- | --- | --- |
| Root of Trust | Legal contracts (DPAs) and vendor promises | Silicon-backed hardware root of trust |
| Data-in-Use State | Vulnerable to memory scraping & hypervisor compromise | Fully encrypted in volatile memory; isolated via TEEs |
| Execution Verification | Black-box proprietary runtimes | Deterministic builds with PCR (Platform Configuration Register) matching |
| Assurance Mechanism | Annual point-in-time manual audits (SOC 2) | Pre-execution remote cryptographic attestation |
| GPU & Interconnect Security | Plaintext PCIe transfers and standard GPU execution | Hardware-encrypted PCIe & NVLink via NVIDIA Confidential Computing |

Frequently asked questions

What is PCC?

PCC offers an OpenAI-compatible inference API. A key feature is the end-to-end encryption of your data, which is then processed securely within hardware-sealed Trusted Execution Environments (TEEs). The plaintext is inaccessible to everyone, including Prem, the cloud provider, and anyone with root access to the machine. This protection is guaranteed at the silicon level by the hardware and can be cryptographically verified using remote attestation. Traditional AI APIs process all input—prompts, files, and conversations—in plaintext on the provider's servers, making the data visible to them. While HTTPS secures data during transit, it offers no protection at the endpoint. PCC safeguards the execution. Same API format, same capabilities. The difference is structural.

Who is PCC designed for?

Policy-based privacy assurances are inadequate for organizations and use cases where data sensitivity is extremely high. These include:

- Healthcare organizations subject to HIPAA, running patient data through LLMs for tasks like clinical note summarization.
- Financial institutions handling sensitive data while using AI for risk analysis.
- Law firms protecting privileged communications when leveraging AI for contract review.
- Enterprises with trade secrets that cannot be exposed to third parties.
- AI product developers who need to offer their own customers a verifiable privacy guarantee rooted in architecture, not just terms of service.

There is also a forward-looking dimension. As AI agents take on more responsibility across work, health, finance, and travel, a single provider will increasingly consolidate sensitive aspects of daily life, compounding privacy risks. Establishing the correct architectural foundation is critical before this consolidation occurs.

How does end-to-end encryption work with AI inference?

Data is encrypted on your device using XChaCha20-Poly1305 before transmission. Our gateway processes only the ciphertext and is solely responsible for authentication and billing. The confidential process works as follows: the encrypted payload is first sent to a Confidential Virtual Machine (CVM). Inside the CVM, the data is decrypted, processed by the model, and then re-encrypted. This re-encrypted response then leaves the CVM and is delivered to your device, where it is decrypted locally. Plaintext data is securely contained in only two locations: your device and the sealed hardware environment. The SDK manages this process seamlessly, allowing your code to utilize standard API calls.

Can the Prem team access my data?

No. The enclave images contain no SSH access, no debug ports, and no administrative backdoors. All machines operate unattended. The code running inside each enclave is measured by hardware — altering a single byte causes attestation to fail before any data is transmitted. This is not a policy we follow. It is a constraint the hardware imposes. A rogue employee would have no mechanism to access TEE memory. One important caveat: the proxy does observe certain metadata — request timestamps, payload sizes, and which API key is making calls. The actual content, however, is always encrypted.

Is my data used to train or improve models?

Accessing your data is physically impossible. Your information only exists as plaintext within the secure enclave for the duration of your request and is immediately wiped from memory upon completion. We do not use your data for training, fine-tuning, or plaintext logging. We view your data not as an asset, but as a liability, which we have architecturally eliminated.

What is attestation and why does it matter?

Attestation removes the need to take our word for anything. It is hardware-signed proof of exactly what code is running inside the enclave. Your device sends a random challenge to the TEE. The hardware (not our software) generates a signed report containing a fingerprint of every loaded component, the platform’s security configuration, and your challenge value, which proves freshness. The signature chain is rooted in the chip manufacturer — AMD, Intel, or NVIDIA. It cannot be forged in software. Attestation is bound to each inference request through a session-based mechanism. Each session carries a unique X-Session-Id header with a five-minute TTL and single-use constraint, ensuring that every request is routed to a verified enclave and preventing replay attacks. Alter a single byte of enclave code, and the fingerprint changes. Verification fails on your side before any data is transmitted.
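The session mechanics described above can be sketched as follows; the storage layout and function names are illustrative, not the production implementation.

```python
import time, uuid

SESSION_TTL_SECONDS = 300                  # five-minute TTL from the text
_sessions: dict = {}                       # session_id -> [created_at, used]

def open_session() -> str:
    # Issued after successful attestation; carried as X-Session-Id.
    sid = str(uuid.uuid4())
    _sessions[sid] = [time.monotonic(), False]
    return sid

def accept_request(sid: str) -> bool:
    entry = _sessions.get(sid)
    if entry is None:
        return False                       # unknown session
    created, used = entry
    if used or time.monotonic() - created > SESSION_TTL_SECONDS:
        return False                       # replayed or expired
    entry[1] = True                        # single-use: burn the session
    return True
```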

Can I verify security claims independently?

Yes. The SDK performs attestation verification by default before every request. Crucially, you don't need to trust the SDK itself. The entire attestation verification stack is written in Rust, compiled statically to WebAssembly, and executes directly within your browser. This design allows you to validate the hardware manufacturer's signatures directly on your own machine, eliminating the need for a server. We are also the first to offer NVIDIA GPU attestation directly from a web browser — you can verify that the GPU processing your data is operating in confidential compute mode, from your own device, without relying on any intermediary. The verification stack is open-source, and we contribute back to the broader confidential computing ecosystem.

Is this truly "end-to-end encrypted" if the TEE sees plaintext?

This is a fair and important question. In traditional E2EE systems, only the two communicating parties see plaintext. In this architecture, the AI model is one of those parties — it must read your data to process it. The TEE is the sealed environment where that processing occurs. The "ends" are your device and the hardware-isolated enclave. Inside the Trusted Execution Environment (TEE) enclave, plaintext data is held only in encrypted memory, which is inaccessible to external components. This includes the operating system, the hypervisor, and Prem. Any data transfer between the two endpoints is entirely protected as ciphertext. While we explicitly acknowledge the theoretical risk of data exposure if a TEE were compromised via a hardware side-channel attack, TEEs remain the most practical solution for encrypted AI inference today. Alternative methods, such as fully homomorphic encryption, are currently impractical due to performance (approximately 100 times slower for inference), and running models locally often requires specialized hardware that most organizations lack. TEEs provide a viable and working architecture.

If TEEs are effective, why are they not more widely adopted?

Adoption is currently limited by several factors: a shortage of dedicated tools, lack of end-to-end solutions, limitations in storage, and the ongoing maturity of GPU confidential compute support. A significant effort was required to build the foundational stack, including attestation libraries, browser-based verification, encrypted inference pipelines, and Confidential Virtual Machine (CVM) orchestration, most of which were built from the ground up. Prem has absorbed the burden of this infrastructure development, ensuring our customers can bypass this complex setup.

Does encryption add latency?

No, the impact is negligible. The primary constraint on AI inference speed is the rate of token generation by the large language model (LLM), which is typically around 100 to 200 tokens per second. In contrast, the XChaCha20-Poly1305 cryptographic operations can process well over 100,000 tokens per second. Therefore, the encryption overhead is insignificant compared to the overall inference time. The CVM imposes a negligible compute overhead, adding approximately 5% on current hardware, which is expected to decrease with newer GPU architectures like NVIDIA Blackwell. A brief, one-time delay occurs during the initial session handshake due to the key exchange and attestation process, which requires an additional round trip. Following this initial step, the streaming performance is virtually indistinguishable from that of an unencrypted API.
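A back-of-the-envelope check of this claim, using the figures assumed above:

```python
# Figures from the paragraph above (assumptions, not benchmarks).
LLM_TOKENS_PER_SEC = 150          # typical generation rate (100-200 range)
CRYPTO_TOKENS_PER_SEC = 100_000   # conservative XChaCha20-Poly1305 rate

def encryption_overhead() -> float:
    # Ratio of time spent on crypto to time spent generating tokens;
    # the response length cancels out of the ratio.
    return LLM_TOKENS_PER_SEC / CRYPTO_TOKENS_PER_SEC

# 150 / 100_000 = 0.0015, i.e. roughly 0.15% added on top of inference time.
```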

Which compliance frameworks does PCC address?

PCC's architecture—featuring end-to-end encryption, hardware isolation, and zero operator access—streamlines compliance with major global privacy and security frameworks like the EU AI Act, GDPR, and ISO 27001. Its core value is eliminating the primary compliance challenge: exposing plaintext data to the AI provider. This built-in privacy addresses stringent data governance requirements in high-risk sectors, prevents exposure of sensitive data such as PHI (HIPAA) and EU personal data (GDPR), and aligns with security criteria for B2B SaaS (SOC 2). This inherent design supports specific regulatory needs across multiple jurisdictions and industries. For instance, it helps financial institutions meet DORA requirements for operational resilience and confidentiality, satisfies data processing rules for Swiss organizations (nFADP), and supports security control objectives for defense and finance (ISO 27001) through hardware-enforced access controls.

What are the known limitations of PCC?

There are three areas we want to be transparent about:

- Side-channel attacks on TEE hardware. These vulnerabilities exist across the industry. Researchers discover them, manufacturers issue patches, and attestation reports include firmware versions so you can verify patch status. Our mitigation approach includes prompt firmware updates, a post-quantum encryption layer that protects data in transit regardless of TEE state, and continuous monitoring of CVE databases and confidential computing security research.
- Compromised chip manufacturers. If AMD, Intel, or NVIDIA were to lose control of their root signing keys, attestation guarantees would weaken. This is a shared trust anchor for the entire confidential computing ecosystem and not something any single vendor can mitigate independently.
- Metadata exposure. The proxy observes request timing, payload sizes, API key identifiers, and rate-limit counters. Content is always encrypted, but traffic analysis on metadata remains theoretically possible.

Why does PCC use post-quantum cryptography?

To protect against "harvest now, decrypt later" attacks, PCC employs a quantum-resistant design. In this attack, an adversary records today's encrypted traffic (data that will still require protection years from now) and waits for a cryptanalytically relevant quantum computer capable of breaking current cryptographic protocols, decrypting the recorded data retroactively. PCC addresses this by using XWing, a hybrid key encapsulation mechanism. XWing pairs a quantum-resistant algorithm, ML-KEM-768 (NIST FIPS 203 compliant), with a classical one, X25519. An attacker would therefore need to break both algorithms to compromise the security. A future quantum computer may defeat X25519, but ML-KEM-768 prepares the system to withstand quantum attacks. Two independent locks on one door. We selected XWing as our hybrid Key Encapsulation Mechanism (KEM) due to its strong industry momentum. It was co-authored by researchers from Cloudflare, SandboxAQ, and Radboud University, is recommended by Google Cloud, and is implemented in Cloudflare's CIRCL cryptography library. The specification is an IETF Internet-Draft (draft-connolly-cfrg-xwing-kem), built entirely on NIST-standardized primitives. XWing is designed to be simple, opinionated, and resistant to misconfiguration. This approach emerged from an initial need to manage persistent data securely within TEE environments. We recognized that it could also be applied to the communication layer, providing security that does not rely solely on variable TLS configurations and implementations across different systems.
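The "two locks on one door" idea can be illustrated with a toy combiner. Per the draft, real XWing also binds the X25519 ciphertext, public key, and a fixed label into the hash; this sketch keeps only the core principle that the final key depends on both shared secrets.

```python
import hashlib

def combine(ss_mlkem: bytes, ss_x25519: bytes) -> bytes:
    # Toy hybrid-KEM combiner: the derived key depends on BOTH shared
    # secrets, so an attacker must break ML-KEM-768 AND X25519.
    return hashlib.sha3_256(ss_mlkem + ss_x25519).digest()

ss_pq = b"A" * 32       # stand-in ML-KEM-768 shared secret
ss_classic = b"B" * 32  # stand-in X25519 shared secret
key = combine(ss_pq, ss_classic)
```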

Intelligence in Service

Crocicchio Cortogna 6, 6900 Lugano, Switzerland

901 N Market Street, Suite 100, Wilmington, Delaware 19801

Via Giuseppe Verdi 6, 70017 Putignano, Bari, Italia

© 2026 Prem AI. All rights reserved.
