What is an XML Injection Attack?
Extensible Markup Language (XML) is an encoding standard that assists in the creation, retrieval, and storage of documents. It consists of a tag structure that identifies specific information within a document¹. Unlike HTML, XML is not limited to a specific set of tags, because a single tag set would not adapt to all documents or applications that may use XML. XML uses the concepts of tags, elements, and attributes for encoding text.
XML injection attacks are a subcategory of injection attacks in which an application does not correctly validate or sanitize user input before using it in an XML document or query. XML injection can be exploited to deliver attacks targeting XML applications that do not escape reserved characters². In an XML document, “<” and “>” are examples of reserved characters that are used to specify the beginning or the end of an XML tag. To use a reserved character as ordinary text, its predefined meaning must be “escaped”. Escaping in an XML file means replacing each reserved character with an entity reference (for example, “&lt;” for “<” and “&gt;” for “>”) so that it cannot be wrongly interpreted as markup. In an XML injection attack, a cyberattacker often does the following:
- Injection. The cyberattacker injects malicious JavaScript code as escaped text into an XML document. Because the code is escaped, malware-filtering software may fail to detect it.
- Parsing. The XML document is parsed by an XML application. The cyberattacker targets XML applications that do not properly serialize reserved characters, so those characters are left unescaped.
- Payload. The content of the XML element that contains the malicious JavaScript code is used as input data for a website. When a visitor loads this website, the malicious code is executed.
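The effect of missing escaping can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the `<user>` document layout, the function names, and the payload are hypothetical and not taken from any particular application. It builds the same XML fragment with and without escaping, using `xml.sax.saxutils.escape` from the Python standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_user_unsafe(name: str) -> str:
    # Hypothetical builder: user input is concatenated into the markup as-is.
    return f"<user><name>{name}</name><role>guest</role></user>"


def build_user_escaped(name: str) -> str:
    # Same builder, but &, < and > in the input become entity references.
    return f"<user><name>{escape(name)}</name><role>guest</role></user>"


def roles(xml_text: str) -> list:
    # Collect the text of every <role> element directly under the root.
    return [role.text for role in ET.fromstring(xml_text).findall("role")]


# Illustrative payload: closes <name> early and adds a <role> of its own.
payload = "eve</name><role>admin</role><name>x"

print(roles(build_user_unsafe(payload)))   # ['admin', 'guest']
print(roles(build_user_escaped(payload)))  # ['guest']
```

With the unescaped builder, the payload changes the structure of the document; with escaping, the parser returns the payload as plain text inside the `name` element.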
Examples of surfaces vulnerable to XML injection attacks include applications that rely on XML-based protocols, applications that store XML documents in a database or as flat files, applications that support XML-based document formats and data import, software that relies on XML-enabled databases, and XML-based APIs. Two types of XML injection attacks exploit parsers with bugs or misconfigurations³:
- XML bombs. The XML parser may crash or execute incorrectly given certain input data, resulting in a denial-of-service attack.
- XXE disclosure. The XML parser may inadvertently leak sensitive information.
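Both attack types commonly rely on entity declarations inside a document type definition (DTD), so one defensive option is to refuse any document that carries a DOCTYPE declaration. The sketch below is one possible way to do that in Python with the standard-library `xml.parsers.expat` module; the `DTDForbidden` exception and the `parse_untrusted` helper are hypothetical names introduced for this example, and an application that genuinely needs DTDs would require a different approach.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration (hypothetical name)."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler as soon as it reaches a <!DOCTYPE ...> declaration,
    # before any entity declared in the DTD can be referenced in the body.
    raise DTDForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def parse_untrusted(xml_text: str) -> ET.Element:
    # First pass: scan the input with expat and abort on the first DOCTYPE.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(xml_text, True)
    # Second pass: the input has no DTD, so build the element tree as usual.
    return ET.fromstring(xml_text)


safe_doc = "<order><item>book</item></order>"
print(parse_untrusted(safe_doc).find("item").text)  # book
```

The document is parsed twice here, which keeps the sketch short at the cost of some extra work on each input.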
To mitigate XML injection risks:
- Sanitize user inputs to filter out unacceptable characters;
- Specify which inputs are allowed;
- Monitor your XML parser;
- Implement a content security policy (CSP).
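The first two mitigations can be combined in a short Python sketch: an allowlist check on the input, followed by building the document through an XML library rather than by string concatenation. The username rule, the `<user>` layout, and the function names are assumptions chosen for illustration; a real application would define its own rules for each field.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical allowlist: 1 to 32 letters, digits, underscores, dots or hyphens.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,32}")


def validate_username(value: str) -> str:
    # Reject anything that does not match the allowlist in full.
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("username contains characters outside the allowlist")
    return value


def build_user_document(name: str) -> bytes:
    # Setting .text stores plain text; ElementTree escapes reserved
    # characters itself when the tree is serialized.
    user = ET.Element("user")
    ET.SubElement(user, "name").text = validate_username(name)
    ET.SubElement(user, "role").text = "guest"
    return ET.tostring(user)


print(build_user_document("alice"))
```

Validation and library-based serialization overlap on purpose: if the allowlist is later relaxed, the serializer still escapes reserved characters.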
1 UNL Center for Digital Research in the Humanities, 2023, “What is XML?”
2 Goh, 2017, “An In-Depth Look at XML Document Attack Vectors”
3 Kohnfelder et al., 2022, “Introduction to Software Security: Chapter 3.8.4: XML Injection Attacks”