XML is a universal format for storing data in plain text that both people and computers can read. XML parsing means reading an XML document and turning it into events or data structures that your code can work with.
PHP has built-in functions and predefined constants for the Expat XML parser. This tutorial explains them in detail.
XML Parsing: Main Tips
- XML functions allow you to parse XML documents but not validate them.
- Expat is an event-based parser that lets you process XML documents in PHP.
Expat Parser
Expat is an event-based parser. A parser of this kind treats an XML file as a series of events (an opening tag, a piece of text, a closing tag) and calls the handler function you registered for each event as it occurs. Because it does not build the whole document tree in memory, it is lightweight and well-suited for fast web applications.
It is a non-validating parser and ignores any DTDs linked to the document. If a document is not well-formed, parsing stops with an XML parsing error.
Remember: the Expat parser is not designed for document validation. However, if it detects that a document is not well-formed, it reports an error.
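The event model described above can be sketched as follows. This is a minimal illustration, not a complete program: the sample document and the `$events` array are made up for the example, and the handlers only record what they receive.

```php
<?php
// Hypothetical sample document, used only for this illustration.
$xml = '<note><to>Ann</to><from>Bob</from></note>';

// Collects a readable trace of the events the parser reports.
$events = [];

$parser = xml_parser_create();

// Register handlers for opening and closing tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end: $name";
    }
);

// Register a handler for the text between tags.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "text: $data";
    }
);

// The third argument marks this chunk as the last (here: only) one.
$ok = xml_parse($parser, $xml, true);

// On PHP 8 the parser object is released automatically;
// on older versions you would call xml_parser_free($parser) here.

print_r($events);
```

Because case folding is enabled by default, the handlers receive tag names in uppercase (`NOTE`, `TO`, `FROM`).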
List of Functions
The table below lists the functions you can use for XML parsing in alphabetical order.
Note: all of these functions are part of PHP, so you do not need to install any third-party applications.
The column on the right shows the PHP versions in which each function is available:
Function | Description | PHP version |
---|---|---|
utf8_decode() | Decode UTF-8 strings into ISO-8859-1 (deprecated as of PHP 8.2) | 3 and newer |
utf8_encode() | Encode ISO-8859-1 strings into UTF-8 (deprecated as of PHP 8.2) | 3 and newer |
xml_error_string() | Get XML parsing error strings | 3 and newer |
xml_get_current_byte_index() | Get current byte index from PHP XML parser | 3 and newer |
xml_get_current_column_number() | Get current column number from PHP XML parser | 3 and newer |
xml_get_current_line_number() | Get current line number from PHP XML parser | 3 and newer |
xml_get_error_code() | Get XML parsing error code | 3 and newer |
xml_parse() | Parse XML documents | 3 and newer |
xml_parse_into_struct() | Parse XML data into array values | 3 and newer |
xml_parser_create_ns() | Create XML parser that has namespace support | 4 and newer |
xml_parser_create() | Create PHP XML parser | 3 and newer |
xml_parser_free() | Free the PHP XML parser | 3 and newer |
xml_parser_get_option() | Get an option from PHP XML parser | 3 and newer |
xml_parser_set_option() | Set an option in PHP XML parser | 3 and newer |
xml_set_character_data_handler() | Set handler function for character data | 3 and newer |
xml_set_default_handler() | Set default handler function | 3 and newer |
xml_set_element_handler() | Set handler functions for the start and end of elements | 3 and newer |
xml_set_end_namespace_decl_handler() | Set handler function for the end of namespace declarations | 4 and newer |
xml_set_external_entity_ref_handler() | Set handler function for external entities | 3 and newer |
xml_set_notation_decl_handler() | Set handler function for notation declarations | 3 and newer |
xml_set_object() | Use PHP XML parser within an object | 4 and newer |
xml_set_processing_instruction_handler() | Set handler function for processing instructions | 3 and newer |
xml_set_start_namespace_decl_handler() | Set handler function for the start of namespace declarations | 4 and newer |
xml_set_unparsed_entity_decl_handler() | Set handler function for unparsed entity declarations | 3 and newer |
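As a quick illustration of one function from the table, `xml_parse_into_struct()` parses a whole document into arrays instead of calling handlers. The sketch below assumes a small made-up document; the variable names are chosen for the example.

```php
<?php
// Hypothetical sample document, used only for this illustration.
$xml = '<note><to>Ann</to></note>';

$parser = xml_parser_create();

// $values receives one entry per element event;
// $index maps each tag name to its positions in $values.
$values = [];
$index  = [];
$result = xml_parse_into_struct($parser, $xml, $values, $index);

foreach ($values as $entry) {
    // Each entry holds the tag name, its type (open, complete, close, cdata)
    // and its nesting level; "complete" entries may also carry a value.
    printf(
        "%s (%s, level %d)%s\n",
        $entry['tag'],
        $entry['type'],
        $entry['level'],
        isset($entry['value']) ? ': ' . $entry['value'] : ''
    );
}
```

This form is convenient for small documents; for large ones, the handler-based approach avoids holding every element in memory at once.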
Error Codes and Constants
You might encounter errors during parsing. When xml_parse() fails, you can call xml_get_error_code() to find out why. It returns one of the following error codes:
Constant |
---|
XML_ERROR_NONE (int) |
XML_ERROR_NO_MEMORY (int) |
XML_ERROR_SYNTAX (int) |
XML_ERROR_NO_ELEMENTS (int) |
XML_ERROR_INVALID_TOKEN (int) |
XML_ERROR_UNCLOSED_TOKEN (int) |
XML_ERROR_PARTIAL_CHAR (int) |
XML_ERROR_TAG_MISMATCH (int) |
XML_ERROR_DUPLICATE_ATTRIBUTE (int) |
XML_ERROR_JUNK_AFTER_DOC_ELEMENT (int) |
XML_ERROR_PARAM_ENTITY_REF (int) |
XML_ERROR_UNDEFINED_ENTITY (int) |
XML_ERROR_RECURSIVE_ENTITY_REF (int) |
XML_ERROR_ASYNC_ENTITY (int) |
XML_ERROR_BAD_CHAR_REF (int) |
XML_ERROR_BINARY_ENTITY_REF (int) |
XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (int) |
XML_ERROR_MISPLACED_XML_PI (int) |
XML_ERROR_UNKNOWN_ENCODING (int) |
XML_ERROR_INCORRECT_ENCODING (int) |
XML_ERROR_UNCLOSED_CDATA_SECTION (int) |
XML_ERROR_EXTERNAL_ENTITY_HANDLING (int) |
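A minimal sketch of error reporting, assuming a deliberately malformed sample document. It combines `xml_get_error_code()`, `xml_error_string()`, and the position functions from the function list. The exact numeric code and message text may vary with the XML library your PHP build uses, so the example does not rely on a specific value.

```php
<?php
// Deliberately malformed sample: <to> is closed with </from>.
$xml = '<note><to>Ann</from></note>';

$parser  = xml_parser_create();
$code    = XML_ERROR_NONE;
$message = '';

// xml_parse() returns 1 on success and 0 on failure.
if (!xml_parse($parser, $xml, true)) {
    $code    = xml_get_error_code($parser);
    $message = sprintf(
        'XML error %d (%s) at line %d, column %d',
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser),
        xml_get_current_column_number($parser)
    );
}

echo $message, "\n";
```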
These constants are the options you can pass to xml_parser_set_option():
Constant | Description |
---|---|
XML_OPTION_CASE_FOLDING (int) | Controls whether case-folding is enabled for the XML parser. It is enabled by default. |
XML_OPTION_TARGET_ENCODING (int) | Sets which target encoding to use in this XML parser. |
XML_OPTION_SKIP_TAGSTART (int) | Indicates how many characters should be skipped from the beginning of the tag name. |
XML_OPTION_SKIP_WHITE (int) | Indicates whether to skip values that consist of whitespace characters. |
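The sketch below shows how two of these options might be set before parsing. The sample document is made up for the example; case folding is switched off so that tag names keep their original case, and whitespace-only values are skipped.

```php
<?php
// Hypothetical sample document with indentation between the tags.
$xml = "<note>\n  <to>Ann</to>\n</note>";

$parser = xml_parser_create();

// Keep tag names exactly as written instead of uppercasing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Skip values that consist only of whitespace (the indentation here).
xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);

$values = [];
xml_parse_into_struct($parser, $xml, $values);

// Collect the tag names in the order the parser reported them.
$tags = array_column($values, 'tag');
print_r($tags);
```

Options should be set before parsing starts, since they affect how the parser reports every element it encounters.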
XML Parsing: Summary
- PHP has a built-in extension for Expat, a lightweight event-based XML parser. Event-based parsers treat an XML file as a series of individual events.
- Expat lets you parse XML files but cannot validate them. If a document is not well-formed, the parser reports an error.
- You can use the PHP XML functions listed in this tutorial to create XML parsers and define XML event handlers in your code.