techvanguards.com
Last updated on 1/25/2016

Hello MSXML
by Binh Ly

Microsoft's XML parser, MSXML, is a COM-based library used to build high-performance applications that manipulate XML documents. The current release of MSXML is version 3.0 SP1. A technology preview of MSXML 4 is also available from MSDN, with major improvements in XML Schema (XSD) support. In this lesson, we'll look at:

  1. Installing and using MSXML

  2. Understanding the basics of MSXML programming

Installing MSXML

MSXML 3 can be obtained from the Microsoft site. Standard Windows installations (including those with the latest versions of Internet Explorer) typically have an older version of MSXML (such as 2.5). For development and production purposes, I recommend downloading and installing MSXML 3 SP1 or higher. If your production deployment needs MSXML on client/desktop machines, you can redistribute it as a downloadable CAB file.

The lessons in this series are based on MSXML 3 SP1 or higher. 

Programming MSXML

MSXML is heavily COM-based. Because Delphi (version 3 and later) has excellent support for building COM client applications, using MSXML from Delphi is as easy as using any other COM library. If you are not familiar with building COM applications in Delphi, check out my Delphi COM lessons.

From a high-level view, MSXML supports the following basic standards:

  1. Document Object Model (DOM)
  2. Simple API for XML (SAX)
  3. XML Namespaces
  4. XSLT and the XML Path Language (XPath)
  5. Document Type Definitions (DTD) and XML Schemas (XDR in MSXML 3, XSD in MSXML 4)

MSXML is a validating parser: it can check whether an XML document conforms to a specific schema (DTD, XDR, or XSD). This lets you validate documents automatically instead of hand-writing tedious validation code that works differently in every application. Validation is also optional and can be turned on and off programmatically as needed.
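As a minimal sketch of turning validation on and off, assuming the MSXML2_TLB import unit produced by the type library import step described in this lesson: the switch is the DOM document's validateOnParse property, which is True by default. Validation needs a DTD or schema to validate against (typically one the document references itself); without one, MSXML only checks that the document is well-formed. LoadDocument is a hypothetical helper name, not part of MSXML.

```pascal
uses
    SysUtils, MSXML2_TLB;

//hypothetical helper: loads FileName, validating it against its DTD or
//schema only when Validate is True
function LoadDocument (const FileName: string; Validate: Boolean): IXMLDOMDocument;
begin
    Result := CoDOMDocument.Create;
    Result.async := False;

    //when False, MSXML checks only that the document is well-formed
    Result.validateOnParse := Validate;

    //load returns False on both parse and validation errors
    if not Result.load (FileName) then
        raise Exception.CreateFmt ('Cannot load %s: %s',
            [FileName, string (Result.parseError.reason)]);
end;
```

For example, LoadDocument ('helloworld.xml', False) loads the lesson's sample file with validation switched off.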

As with any other COM library, the first step to using MSXML is to import its interface type information. This is done using Project | Import Type Library in the IDE or by running the tlibimp.exe command-line utility. When importing, select "Microsoft XML (version 3.0)", turn off "Generate Component Wrapper" (for D5 and above), and select the "Create Unit" option. This process will then produce a module named MSXML2_TLB.

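Note that the import module is named MSXML2_TLB instead of MSXML3_TLB. This idiosyncrasy comes from Microsoft's confusing programmatic versioning scheme for MSXML: MSXML 3 keeps the MSXML2 type library name, and its version-specific ProgIDs take the form Msxml2.DOMDocument.3.0.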
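Because the unit name does not reveal which parser version you get, you may prefer to ask for MSXML 3 explicitly. A minimal sketch, assuming the MSXML 3 type library was imported as described above: the import also generates version-specific creator classes such as CoDOMDocument30, bound to the ProgID Msxml2.DOMDocument.3.0, whereas the version the generic CoDOMDocument binds to can depend on how MSXML is installed on the machine. CreateDOM30 is a hypothetical helper name.

```pascal
uses
    MSXML2_TLB;

//hypothetical helper: creates a DOM document from MSXML 3 specifically
function CreateDOM30: IXMLDOMDocument2;
begin
    //CoDOMDocument30 creates Msxml2.DOMDocument.3.0; if MSXML 3 is not
    //registered, creation fails with an EOleSysError exception
    Result := CoDOMDocument30.Create as IXMLDOMDocument2;

    //same synchronous-load setting as the lesson's examples
    Result.async := False;
end;
```

IXMLDOMDocument2 derives from IXMLDOMDocument and adds MSXML 3 extras such as setProperty, so the result can be used wherever the lesson's examples use IXMLDOMDocument.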

It is important to install the latest service pack or update pack for your specific version of Delphi before performing this import. I've heard that the first release of Delphi 6 has some bugs in the type library importer. D6 users can reuse an import generated by D5, wait for the next service pack, or use my TypeExport utility.

Delphi 6 provides native XML capabilities based on a library of XML classes. We will not cover the native Delphi XML classes in these lessons because they differ in syntax, semantics, and features and would need a separate discussion.

With the generated module in hand, we can now put the full power of MSXML to work. Let's start with a simple example:

uses
    MSXML2_TLB;

procedure TForm1.LoadXMLDocumentClick(Sender: TObject);
var
    doc: IXMLDOMDocument;
    MessageText: string;
begin
    //create DOM document instance
    doc := CoDOMDocument.Create;

    //prepare to load an XML document in synchronous mode
    doc.async := False;

    //load an XML document from file
    if doc.load ('helloworld.xml') then
    begin
        //extract text value of "message" element
        MessageText := doc.documentElement.childNodes [0].text;
        ShowMessage (MessageText);
    end
    else
        //error loading, display error
        ShowMessage (Format ('Error loading XML document.'#13 +
            'Error number: %d'#13 +
            'Reason: %s'#13 +
            'Line: %d'#13 +
            'Column: %d', [doc.parseError.errorCode,
            doc.parseError.reason,
            doc.parseError.line,
            doc.parseError.linePos]));
end;

The helloworld.xml file is:

<root>
    <message>Hello World</message>
</root>

The above example can be dissected as follows:

  1. A reference to the MSXML2_TLB module is first added to the uses clause.
  2. An XML DOM document instance is created and assigned to the doc variable.
  3. The "helloworld.xml" file is loaded into the doc DOM instance in synchronous mode. MSXML's DOM class loads XML documents asynchronously by default, in which case load can return before the document has been parsed. Thus it is necessary to set the doc.async property to False before loading the document.
  4. On a successful load, the first child node of the document element (root) is inspected. Because MSXML discards whitespace-only text nodes by default (the preserveWhiteSpace property is False), this first child is the message element rather than the line break and indentation before it. Its text value ("Hello World") is extracted and displayed.
  5. On an unsuccessful load, an error message is displayed based on detailed error information contained in the DOM document instance's parseError property.

If you've never used MSXML before (or if you've used another XML parser on a different platform), it is important to understand the mechanics of the above example line by line. Although the example is trivial, it illustrates the essential semantics and nuances of using MSXML both as an XML parser and as a COM library.
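Indexing childNodes [0] ties the first example to the exact layout of helloworld.xml. A sketch that is less sensitive to layout selects the element by path with selectSingleNode. MSXML 3 evaluates these queries as XSL Patterns by default, so the sketch first switches the selection language to XPath through IXMLDOMDocument2.setProperty. GetMessageText is a hypothetical name.

```pascal
uses
    SysUtils, MSXML2_TLB;

//hypothetical helper: returns the text of /root/message in FileName
function GetMessageText (const FileName: string): WideString;
var
    doc: IXMLDOMDocument2;
    node: IXMLDOMNode;
begin
    doc := CoDOMDocument.Create as IXMLDOMDocument2;
    doc.async := False;
    if not doc.load (FileName) then
        raise Exception.Create (doc.parseError.reason);

    //MSXML 3 defaults to XSL Patterns; request XPath explicitly
    doc.setProperty ('SelectionLanguage', 'XPath');

    //selectSingleNode returns nil when nothing matches the path
    node := doc.selectSingleNode ('/root/message');
    if node = nil then
        raise Exception.Create ('No /root/message element found');
    Result := node.text;
end;
```

For the helloworld.xml file shown earlier, GetMessageText ('helloworld.xml') returns 'Hello World' regardless of how the file is indented.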

The DOM document class can also load XML documents from several other sources. For instance:

uses
    MSXML2_TLB;

procedure TForm1.LoadXMLDocumentClick(Sender: TObject);
var
    doc: IXMLDOMDocument;
    S: string;
begin
    //create DOM document instance
    doc := CoDOMDocument.Create;

    //prepare to load an XML document in synchronous mode
    doc.async := False;

    //load an XML document from URL
    if doc.load ('http://www.nowhere.com/test.xml') then ...

    //load an XML document from an XML string
    S := '<nothing-good/>';
    if doc.loadXML (S) then ...

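    //load an XML document from an ASP Request object
    //(ASPRequest is assumed to hold the ASP Request object;
    //this is basically an IStream load)
    if doc.load (ASPRequest) then ...

    //write XML directly from an ADO recordset into the DOM instance
    //(ADORecordset is assumed to be an open ADO recordset and
    //adPersistXML comes from Delphi's ADOInt unit; this is basically an IStream write)
    ADORecordset.Save (doc, adPersistXML);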
end;

In every case, note the doc.async setting and the if-then test on doc.load and doc.loadXML: both return False, rather than raising an exception, when the load fails, so checking the return value is how you detect a failed load.
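Because every load path reports failure the same way, the parseError handling from the first example can be factored into a helper. A minimal sketch; DescribeParseError and LoadXMLString are hypothetical names, and raising an exception on failure is a design choice of this sketch, not something MSXML does for you.

```pascal
uses
    SysUtils, MSXML2_TLB;

//hypothetical helper: turns the document's parseError into a readable message
function DescribeParseError (const doc: IXMLDOMDocument): string;
var
    err: IXMLDOMParseError;
begin
    err := doc.parseError;
    Result := Format ('Error %d at line %d, column %d: %s',
        [err.errorCode, err.line, err.linePos, string (err.reason)]);
end;

//hypothetical helper: loads XML from a string, raising with the details on failure
function LoadXMLString (const S: WideString): IXMLDOMDocument;
begin
    Result := CoDOMDocument.Create;
    Result.async := False;

    //loadXML returns False instead of raising when S is not well-formed
    if not Result.loadXML (S) then
        raise Exception.Create (DescribeParseError (Result));
end;
```

The same DescribeParseError call works after a failed doc.load from a file or URL, since both methods fill in parseError.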

Since there's a Load, there's also a Save. Save is used to persist the string representation of an XML document. For instance:

uses
    MSXML2_TLB;

procedure TForm1.SaveXMLDocumentClick(Sender: TObject);
var
    doc: IXMLDOMDocument;
    S: string;
begin
    //create DOM document instance
    doc := CoDOMDocument.Create;

    //prepare to load an XML document in synchronous mode
    doc.async := False;

    //load an XML document
    S := '<nothing-good/>';
    if doc.loadXML (S) then
    begin
        //save to file
        doc.save ('nothinggood.xml');
    end;
end;

The end result of the above example is the "<nothing-good/>" XML string persisted into the nothinggood.xml file. An XML document contained in a DOM document instance can also be saved in several other ways:

uses
    MSXML2_TLB;

procedure TForm1.SaveXMLDocumentClick(Sender: TObject);
var
    doc: IXMLDOMDocument;
    S: string;
begin
    //create DOM document instance
    doc := CoDOMDocument.Create;

    //prepare to load an XML document in synchronous mode
    doc.async := False;

    //load an XML document
    S := '<nothing-good/>';
    if doc.loadXML (S) then
    begin
        //extract the XML string from the DOM instance and save it using
        //other mechanics (SaveString stands for your own persistence routine)
        S := doc.xml;
        SaveString (S);

        //write an XML document to an ASP Response object
        //(ASPResponse is assumed to hold the ASP Response object;
        //this is basically an IStream write)
        doc.save (ASPResponse);
    end;
end;
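SaveString above is not an MSXML routine; it stands for whatever persistence code your application uses. One possible sketch follows, with assumptions stated up front: it takes an extra FileName parameter, it relies on UTF8Encode (Delphi 6 or later), and it writes the WideString returned by doc.xml as UTF-8, the default XML encoding, which is only correct if the document has no encoding declaration or declares UTF-8.

```pascal
uses
    Classes;

//hypothetical persistence routine: writes S to FileName as UTF-8
procedure SaveString (const S: WideString; const FileName: string);
var
    Utf8: UTF8String;
    Stream: TFileStream;
begin
    Utf8 := UTF8Encode (S);
    Stream := TFileStream.Create (FileName, fmCreate);
    try
        //skip the write for an empty string (Utf8 [1] would be out of range)
        if Length (Utf8) > 0 then
            Stream.WriteBuffer (Utf8 [1], Length (Utf8));
    finally
        Stream.Free;
    end;
end;
```

A call would look like SaveString (doc.xml, 'nothinggood.xml'). If the document declares a different encoding, doc.save (FileName) is the simpler route, since MSXML then takes care of the encoding itself.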

The above examples give us a general overview of the mechanics of programming MSXML. The remaining lessons in this series will drill down into how MSXML implements the various XML standards and features. For now, let's take a quick look at what's contained in the MSXML module (MSXML2_TLB):

Classes:

CoDOMDocument: DOM document class.
CoFreeThreadedDOMDocument: Free-threaded DOM document class. Used when multiple threads need to access a single DOM document instance.
CoXSLTemplate: High performance XSL stylesheet cache class. Used for repetitive XSL transformations.
CoXMLHTTP: Client-side HTTP access component. Used to send and receive documents (primarily XML documents) across HTTP.
CoServerXMLHTTP: High performance server-side HTTP access component. Used to send and receive documents (primarily XML documents) across HTTP from within a server application.
CoSAXXMLReader: SAX parser engine.

Interfaces:

IXMLDOMNode: DOM node interface
IXMLDOMNodeList: Collection of DOM nodes
IXMLDOMDocument: DOM document interface
IXMLDOMElement: DOM element interface
IXMLDOMAttribute: DOM attribute interface
IXMLDOMCharacterData: DOM character data manipulation interface
IXMLDOMText: DOM text interface
IXMLDOMComment: DOM comment interface
IXMLDOMCDATASection: DOM CDATA section interface
IXMLDOMProcessingInstruction: DOM processing instruction (PI) interface
IXMLDOMParseError: DOM document parse error interface
IXSLTemplate: XSL template (stylesheet cache) interface
IXSLProcessor: XSL processor engine interface
IVBSAXXMLReader: SAX parser engine interface
IVBSAXContentHandler: SAX content event handler interface
IVBSAXAttributes: SAX attribute list interface (the attributes passed to a content handler)
IVBSAXErrorHandler: SAX error event handler interface
IXMLHTTPRequest: Client-side HTTP access interface
IServerXMLHTTPRequest: Server-side HTTP access interface

Detailed reference documentation for the above and for the entire MSXML package is available in the MSXML SDK, which is a separate download. For MSXML 4.0, the SDK is already included in the parser package.
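To give a feel for one of the classes listed above, here is a minimal sketch that uses CoXMLHTTP and IXMLHTTPRequest to fetch an XML document synchronously. FetchXML is a hypothetical name, and the sketch assumes Delphi 6 or later, where EmptyParam is declared in the Variants unit.

```pascal
uses
    SysUtils, Variants, MSXML2_TLB;

//hypothetical helper: synchronously GETs URL and returns the parsed response
function FetchXML (const URL: WideString): IXMLDOMDocument;
var
    http: IXMLHTTPRequest;
begin
    http := CoXMLHTTP.Create;

    //third argument False = synchronous; user name and password omitted
    http.open ('GET', URL, False, EmptyParam, EmptyParam);
    http.send (EmptyParam);

    if http.status <> 200 then
        raise Exception.CreateFmt ('HTTP %d: %s',
            [http.status, string (http.statusText)]);

    //responseXML is exposed as IDispatch; query it for the DOM interface
    Result := http.responseXML as IXMLDOMDocument;
end;
```

If the server does not send an XML content type (such as text/xml), responseXML may come back as an empty document, so checking Result.documentElement for nil before using it is a prudent extra step.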

An important detail to understand up front is how MSXML's COM interfaces are organized: they form an inheritance hierarchy. For instance, a DOM element is a special type of DOM node, so in MSXML terms an IXMLDOMElement derives from an IXMLDOMNode:

type
    IXMLDOMElement = interface(IXMLDOMNode)
        ...
    end;

Using the above definition, an IXMLDOMNode pointer that refers to a DOM element is downcast to an IXMLDOMElement with Delphi's as operator, which performs a COM QueryInterface:

function NodeAsElement (const Node: IXMLDOMNode): IXMLDOMElement;
begin
    //use COM QueryInterface cast
    //raises an exception if Node is not an element
    Result := Node as IXMLDOMElement;
end;

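In the other direction, upcasting an IXMLDOMElement pointer to the more generic IXMLDOMNode needs no QueryInterface at all, because Delphi lets you assign an interface reference to a variable of any ancestor interface type:

function ElementAsNode (const Elem: IXMLDOMElement): IXMLDOMNode;
begin
    //plain assignment: IXMLDOMElement inherits from IXMLDOMNode
    //("Elem as IXMLDOMNode" also works, but performs an unnecessary QueryInterface)
    Result := Elem;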
end;
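When you cannot be sure a node is an element, an alternative to the raising as cast is SysUtils.Supports, which performs the same QueryInterface but reports failure as a False result. A minimal sketch; TryNodeAsElement and CountChildElements are hypothetical names. Comparing Node.nodeType with NODE_ELEMENT is another common check.

```pascal
uses
    SysUtils, MSXML2_TLB;

//hypothetical helper: returns True and sets Elem when Node is an element;
//returns False (leaving Elem nil) for other node types or a nil Node
function TryNodeAsElement (const Node: IXMLDOMNode;
    out Elem: IXMLDOMElement): Boolean;
begin
    Result := Supports (Node, IXMLDOMElement, Elem);
end;

//example use: count the element children of a node, skipping text,
//comment, and other non-element nodes
function CountChildElements (const Parent: IXMLDOMNode): Integer;
var
    i: Integer;
    Elem: IXMLDOMElement;
begin
    Result := 0;
    for i := 0 to Parent.childNodes.length - 1 do
        if TryNodeAsElement (Parent.childNodes [i], Elem) then
            Inc (Result);
end;
```

For a document loaded from '<root><a/>text<b/></root>', CountChildElements (doc.documentElement) counts a and b and skips the "text" node.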

Conclusion

In this lesson, I've given you an overview of MSXML's basic features and how to use them programmatically. In the next few lessons, we'll dig into how MSXML implements the DOM, SAX, XSLT, schemas, and more.

 

Copyright (c) 1999-2011 Binh Ly. All Rights Reserved.