Abstract

Hidden LLM prompts have appeared in online documents with increasing frequency. Their goal is to trigger indirect prompt injection attacks while remaining undetected from human oversight, to manipulate LLM-powered automated document processing systems, against applications as diverse as résumé screeners through to academic peer review processes. Detecting hidden LLM prompts is therefore important for ensuring trust in AI-assisted human decision making.

This paper presents the first principled approach to hidden LLM prompt detection in structured documents. We implement our approach in a prototype tool called PhantomLint. We evaluate PhantomLint against a corpus of 3,402 documents, including both PDF and HTML documents, and covering academic paper preprints, CVs, theses and more. We find that our approach is generally applicable against a wide range of methods for hiding LLM prompts from visual inspection, has a very low false positive rate (approx.\(0.092\%\)), is practically useful for detecting hidden LLM prompts in real documents, while achieving acceptable performance.

1 Introduction↩︎

Large Language Models (LLMs) are being increasingly deployed or investigated for document processing and analysis, including document summarisation [1], assessment [2], [3], and decision-making [4]. The risks of indirect prompt injection [5], in which a document intentionally contains LLM instructions that attempt to manipulate how it is processed by an LLM, have been well noted in these kinds of settings [6]. These kinds of attacks naturally degrade the performance of AI-assisted human decision making [7].

Recently, we have seen an increasing trend towards authors attempting to hide embedded LLM instructions within their documents [8]–[10], for instance by rendering them using white-coloured text. This makes them invisible to humans, while ensuring they remain visible to LLMs. Indeed, documents with hidden LLM instructions can be seen as akin to adversarial examples [11]: adding hidden LLM instructions to a document leaves it unchanged to human eyes, yet may dramatically alter how it is processed by an LLM.

Therefore, it is important to be able to detect hidden LLM instructions in structured documents. Doing so is challenging for a few reasons.

Firstly, any such detection must have a low false-alarm rate; otherwise it will induce alert fatigue and be discarded in practice. There has been much work on detecting LLM prompts and prompt injection attacks in text [12]–[14]. However, text-based analysis methods are—by definition—incapable of distinguishing a document with visible LLM instructions from one with hidden instructions. Such methods alone are therefore insufficient to detect hidden LLM prompts in documents, because they will inevitably produce false alarms.

Secondly, there are myriad ways to hide text in structured documents (see 2.2), such that it is visible to LLMs but invisible to human overseers, and adversaries are motivated to continually discover new ways to do so. Therefore, any detection method must be agnostic to the specific way in which LLM instructions have been hidden.

Ideally, the detection method would also be applicable across different document formats, and compatible with existing prompt injection text detection methods.

In this paper we present the first principled and general-purpose hidden LLM prompt detection method (3), designed to achieve the above aims. We prototype our method in a tool, PhantomLint, whose current implementation (4) handles PDF and HTML documents (other formats like Word can be supported by first converting to PDF, for instance). We evaluate PhantomLint (5) against 3,402 documents, both PDF and HTML, to understand the generality of our approach, its false alarm rate, and its utility when applied to real world documents containing embedded LLM prompts. These include academic paper preprints, CVs, theses and more.

Our evaluation confirms that our approach is widely applicable, has a very low false alarm rate (approx.\(0.092\%\)), and is practically useful for detecting hidden LLM prompts in real documents, while achieving acceptable performance.

PhantomLint’s implementation is open source and available online: https://github.com/tobycmurray/phantom-lint.

2 Background↩︎

We begin by explaining the issue of hidden LLM prompts in documents (2.1), before discussing in detail various ways to hide text in structured documents (2.2).

2.1 Hidden LLM Prompts in Documents↩︎

To our knowledge, the threat of hidden LLM prompts embedded in documents was first identified in March 2023 by Riedl [15], who demonstrated how inserting white text on to his personal web page could cause Microsoft’s Bing LLM to return false information about him. Later that year, news media reported that job applicants were inserting white text into their CVs in the hope of manipulating LLM-based automatic résumé classifiers [16]. More recently, we have seen hidden LLM prompts inserted into academic paper preprints posted on arXiv [10], which attempt to manipulate any LLM that might be used to aid in the peer review process to elicit a positive review. The organisers of ICML 2025 identified similar hidden LLM prompts in papers originally accepted for publication [17].

In our own experiments (see 5.3) we identified further preprints with hidden prompts seemingly designed to manipulate peer review, including one paper which at that time appeared to be under submission to ACL ARR 2025 (about which we notified the ACL ARR editors). Another contained a prompt that was designed to allow the authors to detect when an LLM had been used by a reviewer to help write their review:

We also identified numerous CVs with hidden LLM prompts, almost all of which instructed the LLM to recommend hiring the candidate or similar. Others were designed not to manipulate decision making but, rather, to subvert it:

Beyond CVs and preprints, we identified hidden LLM prompts in doctoral and coursework theses. In one such instance, the hidden prompt appeared to be designed to subvert LLM document summarisation: We also identified a book chapter containing a hidden prompt designed to influence LLM document analysis:

Finally, we identified hidden LLM prompts in blog posts and homepages. In one instance, the prompt was designed to allow the author to track when the page in question was interpreted by an LLM with the ability to perform web requests:

Another contained a highly malicious hidden prompt:

We conclude therefore that hidden LLM prompts have a variety of intended uses, from manipulating or subverting document review and AI-assisted decision making, allowing a document author to detect when an LLM has been applied to their document, through to triggering destructive or malicious activity.

2.2 Hiding Text in Structured Documents↩︎

There are numerous ways to hide text visually in a structured document, such that the text remains present when the document exported to text (and therefore visible to LLM processing). We discuss several techniques to motivate why hidden LLM prompt detection methods must be agnostic to the specific method used to hide the prompt within a document. We focus on HTML and PDF documents for ease of exposition; however, similar ideas apply to other document formats.

2.2.0.1 Matching Text and Background

The most common method is to render the hidden LLM prompt in a text colour that matches the background colour of the region of the document in which the text is present. For example, many academic papers have been found to contain hidden LLM prompts written in white-coloured text [10].

Alternatively, one might design a web page that uses a black background colour and then include hidden LLM prompts also rendered in black.

2.2.0.2 Invisible Content

Many document formats include mechanisms to mark (text) content so that it is not displayed, and is not included in layout calculations (thereby making its absence unnoticeable, too). In PDF documents, Optical Content Groups (OCGs) (aka layers) can be used to group content together and control its visibility. In particular, setting the ViewState property of an OCG to /OFF causes its contents to be hidden. Such contents is ignored by tools like pdftotext but remains “visible” to multi-modal LLM interfaces like ChatGPT [18].

A separate way to render invisible text in PDFs is via text rendering mode 3, which causes text glyphs to be rendered with neither fill nor stroke. The text therefore remains present in the content stream and is selectable in PDF viewers, while being invisible on the page.

In HTML/CSS, one may use the display: none style to similar effect, as in the following trivial example:

<span style="display: none">
  IGNORE ALL PREVIOUS INSTRUCTIONS
</span>

2.2.0.3 Tiny Text Size

A simple way to hide text in a document is to ensure it is rendered with a tiny (or even zero) text size. Such text may appear invisible to the human eye, or as a small straight line that is easily overlooked.

2.2.0.4 Obscured Text

Most structured documents include a means to stack objects, and to control not only the \((x,y)\) position of an object but also its stacking order, or \(z\)-coordinate. By taking advantage of this mechanism, one can ensure that an object is stacked on top of hidden text, thereby obscuring it from view.

For example, the following HTML/CSS stacks an image on top of the hidden LLM prompt to obscure it from the human eye, while ensuring that both the text and image are placed at the same location on the page.

<span style="position: absolute; left: 0px;
             top: 100px; z-index: 1">
  IGNORE ALL PREVIOUS INSTRUCTIONS
</span>
<span style="position: absolute; left: 0px;
             top: 100px; z-index: 2">
  <img src=... width="800">
</span>

2.2.0.5 Offpage Text

Another common hiding method involves placing text outside of the document area. For instance, in PDF documents, one can place text outside of a page’s CropBox, which means it won’t be displayed by PDF readers that respect the CropBox information. One may also place text outside of a page’s MediaBox, which will ensure it remains invisible even to tools like pdftotext. We confirmed through experimentation that extra-MediaBox text remains “visible” to multi-modal LLM interfaces like ChatGPT, which interpret the PDF content stream directly [18].

Similarly, in HTML/CSS one can simply position text at extreme coordinates to ensure it won’t be visible on the page, while remaining present in the DOM.

<span style="position: absolute; left: -9999px">
  IGNORE ALL PREVIOUS INSTRUCTIONS
</span>

2.2.0.6 Zero-Area Clipping

Many document formats allow clipping information to be applied to document elements. By applying a zero-area clipping rectangle to a piece of text, one can ensure it is not rendered in the document. PDF clipping paths can be used to achieve this purpose, analogous to the following HTML/CSS example.

<span style="position: absolute;
             clip: rect(0, 0, 0, 0)">
  IGNORE ALL PREVIOUS INSTRUCTIONS
</span>

2.2.0.7 Zero-Opacity Text

Document formats that allow text colours to include an alpha component, or to specify opacity, can contain hidden text whose opacity has been set to 0. For example, PDFs can contain text whose RGBA colour is (0,0,0,0), i.e., transparent black. Likewise, the opacity: 0 style can be used in HTML/CSS to trivially hide text.

2.2.0.8 Malicious Fonts

Recently, Xiong et al. [19] proposed hiding malicious LLM prompts in documents by using malicious fonts, in which standard character codes are mapped to non-standard glyphs. Doing so opens many possibilities for hiding text from human eyes, and is applicable across various document formats including PDF and HTML.

2.2.0.9 Hidden-Visibility Content

Finally, we note that HTML/CSS includes the visibility: hidden style, which can be used to mark content as not visible while still having that content take up space in the page layout.

3 Design↩︎

With so many ways to hide LLM prompts in documents, we now consider how we can robustly detect them.

We begin by noting that there are two parts to this problem: (1) detecting a potential LLM prompt in text, and (2) detecting hidden text. Problem (1) has already received considerable attention [12]–[14]. We might hope as a starting point therefore to take a document, convert it to text, and apply existing solutions to problem (1). Of course doing so is insufficient if we cannot solve problem (2). Consider the present paper, which contains many LLM prompts in its text; however, contains no hidden LLM prompts. Simply applying LLM prompt detection methods to document text is likely to yield many false positives (i.e., false alarms). However, we would be unwise to discard existing prompt detection methods altogether.

Therefore, we focus our attention on a slight variation to problem (2) as it was stated above:

Given a piece of text, which has been identified as potentially containing an LLM prompt, how can we determine whether it contains hidden text?

The core insight behind PhantomLint is inspired by ideas from metamorphic testing [20] and program hyperproperties [21], namely that of testing a condition by comparing multiple program outputs.

Specifically, we say that a block of text \(\mathit{text}\) from document \(D\) does not contain hidden text when the OCR Consistency Test defined in [alg:ocr-test] succeeds.

This procedure obtains the minimal region \(R\) of the document containing the text block \(\mathit{text}\) in question, renders an image \(I\) of that region, performs Optical Character Recognition (OCR) on that region to obtain the text \(\mathit{ocrtext}\) that is visible, and then computes the difference \(\Delta\) between that and the original text \(\mathit{text}\). It succeeds when \(\Delta\) is empty. Otherwise, \(\Delta\) is (evidence of) hidden text.

It makes use of the Difference procedure, which returns those text spans from \(\mathit{text}\) that are absent from \(\mathit{ocrtext}\).

[alg:ocr-test] has the advantage that it is independent of the method used to hide the prompt in a document. It relies on being able to determine the visual area of the document occupied by a piece of text.

It also works best when the piece of text in question is relatively short and occupies a tight visual area. To understand why, suppose a hidden LLM prompt was present on page of the present paper. That page also contains a number of visible LLM prompts taken from real documents. Therefore, when computing \(\Delta\), it is important to focus on the smallest visual area \(R\) that contains the suspicious text in question.

These observations lead to the document analysis procedure defined in [alg:detector]. Here, Analyze takes a block of text and returns those text spans within it that it deems as likely to contain LLM prompts.

[alg:detector] checks for the presence of LLM prompts in each block of text \(\mathit{text}\) of document \(D\). For each that is deemed suspicious, i.e. as likely to contain a prompt, the OCR Consistency Test procedure ([alg:ocr-test]) is applied. [alg:detector] determines that the text block \(\mathit{text}\) likely contains a hidden LLM prompt if the difference set \(\Delta\) returned by [alg:ocr-test] overlaps with the suspicious text spans \(\mathit{spans}\) identified by the Analyze procedure: that intersection is precisely those text spans that are both deemed suspicious by the Analyze procedure and deemed as hidden by the OCRConsistencyTest procedure.

[alg:detector] leaves undefined what constitutes a text block. We think of a text block as the smallest unit of logically contiguous text for which document image regions \(R\) can be accurately computed (see [alg:ocr-test]).

[alg:detector] is intentionally designed to allow existing prompt detection methods to be applied during its Analyze step. It is also designed to maximise efficiency by applying OCR only to those regions of the document that have already been identified as suspicious.

4 Implementation↩︎

The design of [alg:detector] is intentionally agnostic to the format of the document \(D\). All that is required to implement it for a particular document format is a means to identify each document text block, and for each to obtain the document region occupied by that text block, from which an image can be rendered for processing by OCR.

PhantomLint is implemented in Python, and supports both PDF and HTML documents. Given an HTML document, it walks the DOM to identify all text nodes, each of whose contents becomes a text block. PhantomLint makes use of the playwright library [22] to render the HTML document as an image. Given a text node, it computes the node’s bounding box within the rendered document. To render an image of a particular text block, it simply crops the region identified by the text block’s bounding box from the larger document image already rendered. Elements that are offpage yield zero-sized cropped images, for which OCR naturally returns no text. Therefore such text blocks are trivially identified as containing hidden text, as required.

For PDF documents, PhantomLint follows a similar approach. Each text block is simply a text block element of the PDF document, as identified by the PyMuPDF library [23]. The same approach is used to render text block images: compute the text block’s bounding box and use that to crop from the larger image of the page that contains the text block in question. To handle offpage text, PhantomLint makes use of the pikepdf library [24] to first remove each page’s CropBox, and to enlarge its MediaBox, before processing by PyMuPDF. This preprocessing by pikepdf also ensures that all Optical Content Groups (OCGs) are visible. Without it, text contained in OCGs whose visibility is marked /OFF would otherwise be ignored by PyMuPDF and pdftotext. Because PyMuPDF does not ignore clipping paths, PhantomLint implements clipping path detection. For document pages that contain clipping paths, it falls back to using Poppler’s pdftotext [25] to extract the page’s text, because pdftotext does ignore clipping paths. In this fallback mode, the entire page is treated as one text block. In practice this does not harm PhantomLint’s accuracy.

PhantomLint provides multiple implementations of the Analyze procedure, for detecting likely LLM prompts in text blocks. Its default implementation uses the all-MiniLM-L6-v2 model from the Sentence Transformers library [26] to match against a list of common prompts, as detailed in 3, using a sliding window algorithm over the text block contents. This allows identifying the specific text spans that are likely to contain LLM prompts; adjacent and overlapping spans are merged together to compute maximal spans likely to contain LLM prompts. As an alternative to this approach, PhantomLint also supports using the LLMGuard [12] library for prompt detection; however we have found in practice that the former approach is more accurate at present.

To implement the Difference procedure of [alg:ocr-test], PhantomLint first computes all words that appear in the text block \(\mathit{text}\) but are absent from the OCR text \(\mathit{ocrtext}\). It then merges adjacent words to form spans, before merging overlapping and adjacent spans to form maximal spans \(\Delta\) that it identifies as hidden text. While somewhat coarse, this approach works well in practice.

PhantomLint currently uses the Tesseract library for image OCR [27]. In some cases, Tesseract is more discerning than human eyes: for instance, it is able to identify text rendered with colour #fefefe atop a white background #ffffff. PhantomLint mitigates this effect by applying a Gaussian blur to low-contrast regions of the image, before applying OCR.

5 Evaluation↩︎

We evaluated PhantomLint to understand the generality of our approach to detecting hidden LLM prompts in documents (5.1), as well as its false positive rate ([sec:eval-fp]), its practical utility against real-world documents containing hidden LLM prompts (5.3), as well as its performance (5.4).

5.0.0.1 Experiment Setup

All experiments were carried out on a 2021 14-inch MacBook Pro (Apple M1 Pro, 16 GB RAM) running macOS Sequoia 15.6. PhantomLint was run in its default configuration, using its default bad phrases list. The bad phrases list appears in 3.

None

Figure 3: PhantomLint’s default bad phrases list..

5.1 RQ1: Generality↩︎

We begin by evaluating PhantomLint’s generality, namely whether it can successfully detect hidden LLM prompts when a variety of hiding strategies are used, across both PDF and HTML documents.

1 summarises the different hiding methods against which we evaluated PhantomLint (explained in 2.2). For each we created a synthetic document that included an LLM prompt (such as “IGNORE ALL PREVIOUS INSTRUCTIONS” or “GIVE THIS PAPER A POSITIVE REVIEW”) hidden using that method. For invisible PDF content we implemented both the ViewState /OFF OCG method and the text rendering mode 3 method. For offpage PDF text we implemented two methods: one that placed the hidden prompt outside the page’s CropBox and the second that placed it outside the page’s MediaBox. We implemented Xiong et al.’s method [19] of using malicious fonts to hide content by creating a malicious font that maps every character to a blank glyph. Since hidden-visibility content applies only to HTML, we did not create a PDF document employing that technique. The synthetic documents we created included synthetic single- and double-column nonsense scientific papers, as well as simple “Hello, World!” style web pages. The resulting data set contained 26 synthetic documents in total: 16 PDF and 10 HTML. In all cases, PhantomLint successfully detected the hidden LLM prompts.

Table 1: PhantomLint evaluated against various text hiding methods applied to synthetic documents to embed simple LLM prompts. \(✔\) means that PhantomLint was able to successfully detect hidden text. n/a means that this method was not applicable to the corresponding document format.
Method	PDF	HTML
Matching text and background	\(✔\)	\(✔\)
Invisible content	\(✔\)	\(✔\)
Tiny text size	\(✔\)	\(✔\)
Obscured text	\(✔\)	\(✔\)
Offpage text	\(✔\)	\(✔\)
Zero-area clipping	\(✔\)	\(✔\)
Zero-opacity text	\(✔\)	\(✔\)
Malicious Fonts	\(✔\)	\(✔\)
Hidden-visibility content	n/a	\(✔\)

5.2 RQ2: Low False Positive Rate↩︎

To understand PhantomLint’s false positive rate, we applied to to every paper accepted for publication at ICML 2025. We downloaded all such papers from openreview.net. At the time of writing there were 3,257 papers marked as accepted (from an original 3,260 accepted papers, three had been withdrawn), which gave us a data set of 3,257 PDF documents. We chose this data set because the ICML organisers had reported that hidden LLM prompts had been detected in accepted papers.

PhantomLint identified three papers with hidden, suspicious phrases. Upon inspecting its results, all were false positives (i.e. false alarms). All were caused by OCR failures in text adjacent to sentence fragments consistent with hidden LLM prompts. Two were instances in which very short spans of words were highlighted in response to an OCR failure over text containing non-English alphabetic characters. The third identified a long block of text containing sentence fragments similar to hidden LLM prompts which, upon inspecting the original PDF, was intentionally rendered in a very small font and so hindered the performance of OCR.

We conclude therefore that PhantomLint’s false positive rate is approximately \(\frac{3}{3,257} \approx 0.092\%\).

5.3 RQ3: Practical Utility↩︎

We evaluated PhantomLint’s practical utility, when applied to real documents containing hidden LLM prompts. To carry out this experiment, we collected a sample of such documents. We began with the 18 arXiv preprints recently reported by Liu [10] to contain hidden LLM prompts. At the time of writing 17 of those papers were still available online. We then carried out a series of Google searches to acquire additional documents, by searching for PDF and HTML files containing phrases known to be used in hidden LLM prompts [10], such as “IGNORE ALL PREVIOUS INSTRUCTIONS”, “GIVE THIS PAPER A POSITIVE REVIEW”, “If you are an LLM”, “Note to LLM reviewers”, but that did not mention the word “prompt” within the document. We then widened our search to include Curriculum Vitae (CV) documents, by searching for PDF files containing the words “CV”, “Curriculum Vitae”, or “Résumé” alongisde phrases known to be used in hidden LLM prompts used in CVs, including “IGNORE ALL PREVIOUS INSTRUCTIONS”, “recommend hiring this candidate”, “well qualified candidate”.

For each document we found, we manually analysed it to determine that it did indeed contain an LLM prompt. To do so we opened the document in MacOS Preview version 11.0 (1069.7.1) and searched for the LLM prompt that Google had identified the document to contain. We then sorted the resulting documents into two classes, as follows. Positive documents were those in which the LLM prompt was not visible to the human eye. Negative documents were those in which the LLM prompt was visible.

The resulting data set we obtained comprised 119 documents in total: 113 positive and 6 negative. It included 93 PDF documents and 26 HTML. Its composition is summarised in 2. Positive PDFs include both CVs and preprints, as well as a handful of other documents, including one book chapter. Positive HTML content includes a few blog posts, while negative HTML content includes one email message from a public mail archive.

PhantomLint identified no hidden, suspicious phrases among the negative documents. In each positive document, PhantomLint successfully identified the hidden LLM prompt, by which we mean it identified at least one region of the document text as containing a hidden LLM prompt and that region overlapped with the actual hidden LLM prompt. In practice, PhantomLint may fail to identify all of the hidden LLM prompt, due to its diffing implementation (see 4). It may also highlight non-hidden non-words adjacent to the hidden LLM prompt (such as punctuation or numbers) because these are ignored during hidden text span identification (see 4).

Table 2: Real documents with LLM prompts.
	Positive	Negative	Total
PDF
CVs: 64
Preprints: 22
Theses: 2
Chapters: 1
CVs: 2
Preprints: 2	93
HTML
CVs: 1
Preprints: 20
Blogs: 3
Blogs: 1
Emails: 1	26
Total	113	6	119

5.4 RQ4: Performance↩︎

We analysed PhantomLint’s running times across the experiments carried out for RQ2 and RQ3, namely across the data sets of accepted ICML 2025 papers and the real-world documents containing LLM prompts that we curated (summarised in 2) respectively.

PhantomLint’s average running time over the 3,257 PDF papers accepted for publication at ICML 2025 was 68.25 seconds per paper. Across the 119 real-world HTML and PDF documents it was 43.75 seconds per document. This number is lower because this data set comprises short documents, namely 1–2 page CVs.

We conclude that PhantomLint’s current implementation achieves acceptable performance.

The threat of indirect prompt injection was documented at least as early as February 2023, by Greshake et al. [5]. Since that time, much research has been focused on methods to mitigate the risks of this attack. Yi et al. [6] provide a recent example.

More recently, Chen et al. [6] consider the problem of detecting indirect prompt injection in document text, and methods for removing prompts embedded in document text. They conclude that models specially trained to detect indirect prompt injection over the document class in question can achieve acceptable performance.

Our focus in contrast is on detecting hidden LLM prompts in structured documents. We propose an approach based on ideas from metamorphic testing [20] in which the results of two analysis procedures are compared: namely text extraction vs.OCR text recognition.

In 2023, Greshake [9] presented a tool for automatically hiding text in PDF documents. We note that the corpus of documents against which we evaluated PhantomLint (5.3) included documents with hidden prompts identical to those that Greshake’s tool produces by default.

More recently, Xiong et al. [19] proposed using malicious fonts to hide LLM prompts in documents. We implemented their method and found that our approach was effective in detecting it (along with all the other hiding methods we evaluated in [sec:eval-hiding-methods]).

We note that OCR has long been used as an analysis and detection method in digital forensics settings, for instance for forgery detection [28]. We show that it is also effective in enabling hidden LLM prompt detection.

Finally, we observe that our core idea—comparing text obtained from a document (text block) with that obtained from applying OCR (to the document region occupied by the text block) echoes ideas first proposed by Duan et al. [29] for detecting cloaking, which is the practice of hiding text in web pages for the purpose of e.g., search engine optimisation (SEO). Their method involves comparing web page contents obtained through web crawling against the client-side view to identify discrepancies.

7 Conclusion↩︎

Indirect prompt injection, via LLM prompts hidden in documents from visual inspection, presents a growing security risk to AI-enabled automated document processing and analysis systems. We presented the first general-purpose and principle approach to hidden LLM prompt detection in structured documents. We implemented our approach in the prototype tool PhantomLint and evaluated it, showing that it enjoys wide generality, a very low false-alarm rate, is effective against real-world documents, and has acceptable performance.

References↩︎

[1]

Tanya Goyal, Junyi Jessy Li, and Greg Durrett. News summarization and evaluation in the era of gpt-3. arXiv preprint arXiv:2209.12356, 2022.

[2]

Swanand Vaishampayan, Hunter Leary, Yoseph Berhanu Alebachew, Louis Hickman, Brent A Stevenor, Weston Beck, and Chris Brown. Human and LLM-based resume matching: An observational study. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 4808–4823, 2025.

[3]

Minjun Zhu, Yixuan Weng, Linyi Yang, and Yue Zhang. eepReview: Improving LLM-based paper review with human-like deep thinking process. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29330–29355. Association for Computational Linguistics, July 2025.

[4]

Frank P-W Lo, Jianing Qiu, Zeyu Wang, Haibao Yu, Yeming Chen, Gao Zhang, and Benny Lo. Ai hiring with LLMs: A context-aware and explainable multi-agent framework for resume screening. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 4184–4193, 2025.

[5]

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM workshop on artificial intelligence and security, pages 79–90, 2023.

[6]

Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. Benchmarking and defending against indirect prompt injection attacks on large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 1, pages 1809–1820, 2025.

[7]

Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ziang Xiao, and Ming Yin. From text to trust: Empowering ai-assisted decision making with adaptive LLM-powered analysis. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–18, 2025.

[8]

Aysan Esmradi, Daniel Wankit Yip, and Chun Fai Chan. A comprehensive survey of attack techniques, implementation, and mitigation strategies in large language models. In International conference on ubiquitous security, pages 76–95. Springer, 2023.

[9]

Kai Greshake. . Available online: https://kai-greshake.de/posts/inject-my-pdf/, 2023.

[10]

Zhicheng Lin. Hidden prompts in manuscripts exploit AI-assisted peer review. arXiv preprint arxiv:2507.06185, 2025.

[11]

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[12]

Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, and Nishtha Madaan. Llmguard: guarding against unsafe llm behavior. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 23790–23792, 2024.

[13]

Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed? arXiv preprint arXiv:2502.16580, 2025.

[14]

Valerii Gakh and Hayretdin Bahsi. Enhancing security in llm applications: A performance evaluation of early detection systems. arXiv preprint arXiv:2506.19109, 2025.

[15]

Mark Riedl. Social media post on X. Available online: https://x.com/mark_riedl/status/1637986261859442688, March 2023.

[16]

Danielle Abril. Job applicants are battling ai résumé filters with a hack, July 2023.

[17]

ICML 2025 Organizers. Publication ethics. Available online: https://icml.cc/Conferences/2025/PublicationEthics, July 2025.

[18]

ChatGPT. Saved ChatGPT conversation with the author, August 2025. Available online: https://chatgpt.com/share/68ac3cf1-3428-8005-bf21-54cd86177a64.

[19]

Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, and Lingyao Li. Invisible prompts, visible threats: Malicious font injection in external resources for large language models. arXiv preprint arXiv:2505.16957, 2025.

[20]

T. Y. Chen, S. C. Cheung, and S. M. Yiu. Metamorphic testing: A new approach for generating next test cases. Technical Report HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, 1998.

[21]

Michael R Clarkson and Fred B Schneider. Hyperproperties. Journal of Computer Security, 18(6):1157–1210, 2010.

[22]

Microsoft. Playwright. https://playwright.dev/. Accessed: 2025-08-11.

[23]

Artifex Software, Inc. Pymupdf. https://pymupdf.readthedocs.io/. Accessed: 2025-08-11.

[24]

James R. Barlow. The pikepdf library. https://pikepdf.readthedocs.io/. Accessed: 2025-08-11.

[25]

Freedesktop.org. Poppler. https://poppler.freedesktop.org/. Accessed: 2025-08-11.

[26]

Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using SiameseBERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11 2019.

[27]

Ray Smith. An overview of the TesseractOCR engine. In Ninth international conference on document analysis and recognition (ICDAR 2007), volume 2, pages 629–633. IEEE, 2007.

[28]

Svetlana Abramova and Rainer Böhme. Detecting copy–move forgeries in scanned text documents. Electronic Imaging, 28:1–9, 2016.

[29]

Ruian Duan, Weiren Wang, and Wenke Lee. Cloaker catcher: A client-based cloaking detection system. arXiv preprint arXiv:1710.01387, 2017.

PhantomLint: Principled Detection of Hidden LLM Prompts in
Structured Documents

Abstract

1 Introduction↩︎

2 Background↩︎

2.1 Hidden LLM Prompts in Documents↩︎

2.2 Hiding Text in Structured Documents↩︎

2.2.0.1 Matching Text and Background

2.2.0.2 Invisible Content

2.2.0.3 Tiny Text Size

2.2.0.4 Obscured Text

2.2.0.5 Offpage Text

2.2.0.6 Zero-Area Clipping

2.2.0.7 Zero-Opacity Text

2.2.0.8 Malicious Fonts

2.2.0.9 Hidden-Visibility Content

3 Design↩︎

4 Implementation↩︎

5 Evaluation↩︎

5.0.0.1 Experiment Setup

5.1 RQ1: Generality↩︎

5.2 RQ2: Low False Positive Rate↩︎

5.3 RQ3: Practical Utility↩︎

5.4 RQ4: Performance↩︎

7 Conclusion↩︎

References↩︎

Subjects

Updated on Academus

PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents

Abstract

1 Introduction↩︎

2 Background↩︎

2.1 Hidden LLM Prompts in Documents↩︎

2.2 Hiding Text in Structured Documents↩︎

2.2.0.1 Matching Text and Background

2.2.0.2 Invisible Content

2.2.0.3 Tiny Text Size

2.2.0.4 Obscured Text

2.2.0.5 Offpage Text

2.2.0.6 Zero-Area Clipping

2.2.0.7 Zero-Opacity Text

2.2.0.8 Malicious Fonts

2.2.0.9 Hidden-Visibility Content

3 Design↩︎

4 Implementation↩︎

5 Evaluation↩︎

5.0.0.1 Experiment Setup

5.1 RQ1: Generality↩︎

5.2 RQ2: Low False Positive Rate↩︎

5.3 RQ3: Practical Utility↩︎

5.4 RQ4: Performance↩︎

6 Related Work↩︎

7 Conclusion↩︎

References↩︎

Subjects

Updated on Academus

PhantomLint: Principled Detection of Hidden LLM Prompts in
Structured Documents