Generative AI (GenAI) is now an integral feature in the search engines, operating systems and business applications we use every day. While there’s no denying its value, its convenience and accessibility have made ‘shadow AI’ the default adoption pattern rather than the exception. When employees can easily upload sensitive, proprietary data into an ever-growing range of unmonitored, public third-party tools, GenAI becomes a primary data-loss vector.
Five ways sensitive data escapes through the GenAI prompt
As with any third-party software, it’s essential to understand how vendors handle security and privacy. These aspects should never be taken for granted, not least because many GenAI tools repurpose your inputs as training data by default, meaning that potentially sensitive or proprietary information could resurface in response to someone else’s prompt later on. Moreover, many public GenAI tools have usage policies that grant vendors access to conversations for human review. That’s why protecting sensitive data must start with the prompt, to avoid situations like the following:
- Direct pasting of sensitive text like user credentials, contract language, source code and other proprietary information that may be used as training data or visible to a vendor’s employees.
- File uploads like spreadsheets, screenshots, proprietary design documents and even audio can contain sensitive information that may be copied, archived or repurposed for model training.
- Connectors and APIs that give copilots access to email, documents, repos and other assets can lead to excessive data exposure and may breach non-disclosure agreements.
- Model outputs can repackage sensitive inputs that then get pasted into email, chat messages or external documents, dramatically expanding the blast radius.
- Prompt injection attacks occur when an attacker crafts input, either entered directly or hidden in content the model processes, to override an AI’s built-in safeguards and force it to perform unauthorized actions.
With the exception of prompt injection attacks, which are deliberate by nature, most instances of data loss through GenAI are entirely accidental, resulting from a failure to understand the inherent security and privacy vulnerabilities of these tools.
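To make the first vector concrete, here is a minimal sketch of how pasted secrets can be flagged before a prompt is submitted. The pattern names and regular expressions are illustrative assumptions; a real DLP engine would rely on far richer detectors (classifiers, document fingerprints, exact-data matching) rather than a handful of regexes.

```python
import re

# Hypothetical patterns for a few common high-risk strings.
# A production DLP engine would use much broader detection logic.
SENSITIVE_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_prompt(prompt: str) -> list[str]:
    """Return the names of sensitive-data patterns found in a prompt."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(prompt)]

# Example: a credential pasted into a debugging request is flagged.
findings = scan_prompt("Please debug this: key=AKIA1234567890ABCDEF")
```

Even a check this simple catches the most obvious accidental pastes; the point is that inspection has to happen at the prompt, before the data leaves the organization.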
Guardrails that work: warn, justify, block (and when to use each)
A prompt is basically data in motion, so it should be treated like any other transmission of data, such as email or instant messaging. However, given that most vendors repurpose user inputs as training data by default and most usage policies grant the vendor’s employees access to that data for human review, GenAI conversations are often significantly less private than other channels. This is especially the case with free subscription tiers and integrated GenAI tools, which generally don’t offer commercial data protection and fall outside the oversight of IT.
Ideally, organizations should only ever allow the use of approved, commercial-grade GenAI tools, since they provide some vital built-in protections. Even these shouldn’t be taken for granted, though, and traditional data loss prevention (DLP) policies and controls aren’t sufficient by themselves either. After all, conventional DLP solutions generally don’t work well for unstructured content, like prompts and attachments, at scale.
Banning widely used tools isn’t a practical alternative either, since it hurts productivity and encourages people to use unregulated alternatives like personal copilot accounts. Instead, businesses must clearly define which GenAI tools teams are allowed to use for work before applying a universal, prompt-aware policy that covers every interaction.
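Defining which tools are approved can be as simple as maintaining an allowlist of sanctioned endpoints that the prompt-aware policy then applies to. The hostnames below are placeholders, not real services; this is only a sketch of the allowlist check itself.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of GenAI tools sanctioned for work use.
# Any other destination would fall under stricter controls.
APPROVED_HOSTS = {
    "copilot.example-enterprise.com",
    "genai.internal.example.com",
}

def is_approved(url: str) -> bool:
    """Check whether a GenAI request targets an approved tool."""
    host = (urlparse(url).hostname or "").lower()
    return host in APPROVED_HOSTS
```

Keeping the allowlist as data rather than hard-coded logic makes it easy for IT to sanction new tools without changing the enforcement layer.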
Of course, there’s more to closing the GenAI security gap than simply imposing restrictions. It’s just as much about real-time training and awareness, hence the need for a three-layer policy model that considers user intent, the channel and the response. Here’s what such a policy should look like and how a GenAI-ready DLP solution can help:
- Warn: If a prompt contains low- to medium-risk data, like unpublished marketing collateral, the user receives a warning. This provides real-time coaching that continually builds awareness in everyday use cases.
- Justify: Prompts containing non-public data, like pre-launch product details or confidential (but non-regulated) internal information, should require the user to provide additional context justifying the action. This shapes safer behavior without being overly restrictive.
- Block: Some data, like corporate secrets, credentials and regulated information, should never be allowed in a prompt. Hard enforcement should always be applied to these high-impact scenarios, regardless of user intent and channel.
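The three-tier model above can be sketched as a small policy table that maps data classifications to enforcement actions, always applying the most restrictive tier when a prompt triggers several classifications. The classification labels are illustrative assumptions, not taken from any particular product.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    JUSTIFY = "justify"
    BLOCK = "block"

# Hypothetical mapping of data classifications to the warn/justify/block
# tiers described above. Real classifications would come from a DLP engine.
POLICY = {
    "unpublished_marketing": Action.WARN,
    "pre_launch_product": Action.JUSTIFY,
    "confidential_internal": Action.JUSTIFY,
    "credentials": Action.BLOCK,
    "regulated_pii": Action.BLOCK,
}

# Severity order, least to most restrictive.
SEVERITY = [Action.ALLOW, Action.WARN, Action.JUSTIFY, Action.BLOCK]

def enforce(classifications: list[str]) -> Action:
    """Apply the most restrictive tier among all detected classifications."""
    actions = [POLICY.get(c, Action.ALLOW) for c in classifications]
    return max(actions, key=SEVERITY.index, default=Action.ALLOW)
```

For example, a prompt containing both marketing collateral and a credential is blocked outright, because Block always wins over Warn, which matches the rule that hard enforcement applies to high-impact data regardless of whatever else the prompt contains.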
For maximum effectiveness, organizations should enforce these policies wherever work happens, using a single policy plane across all approved tools and channels. Ultimately, secure AI enablement is a governance program, where organizations use unified security solutions like Palo Alto Networks’ Prisma SASE to proactively monitor and control the channels they use, instead of reactively chasing incidents after the fact. Learn more in The CISO’s Guide to Securing Data in the AI-ready enterprise ebook.