Thursday 19 July 2012

Umbraco Examine - Indexing Custom XML Data

I recently ran into a problem with an Umbraco Examine indexer suddenly stopping after trying to index the first node in the content tree.

The only error information I could get was:

Exception information:
Exception type: NullReferenceException
Exception message: Object reference not set to an instance of an object.
at UmbracoExamine.UmbracoContentIndexer.OnGatheringNodeData(IndexingNodeDataEventArgs e)
at Examine.LuceneEngine.Providers.LuceneIndexer.GetDataToIndex(XElement node, String type)




I tried using NodeIndexing and GatheringNodeData handlers to find out more, but all I could see was that Examine was trying to index the root node (100 in this case) and then looking for node 1, which didn't exist.

Eventually I found out it was due to a custom data type (TheFarm - EmbeddedContent) saving its content as XML.

The data was being stored as:


    
      
        Node 1 content
        Node 2 content
      
    


The problem seems to be the use of "id" as an attribute in the custom XML, because umbraco.config uses the same attribute name on its content nodes:

<NodeType id="100" parentID="99" level="4" ...



It appears the indexer assumes it is parsing a new node when it reads this.
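
To make the suspected failure mode concrete, here is a toy model in Python. It is not Examine's code, and I have not checked how the indexer actually walks the XML. The element names inside NodeType are invented for illustration. It only shows how a scan that treats every element with an "id" attribute as a content node would pick up the embedded item as well:

```python
import xml.etree.ElementTree as ET

# Hypothetical cached document: one content node whose property holds raw
# custom XML. The <embedded>, <data> and <item> names are assumptions.
doc = ET.fromstring(
    '<NodeType id="100" parentID="99" level="4">'
    '<embedded><data><item id="1">Node 1 content</item></data></embedded>'
    '</NodeType>'
)

# A naive scan: every element carrying an "id" attribute counts as a node.
node_ids = [el.get("id") for el in doc.iter() if el.get("id") is not None]

print(node_ids)  # ['100', '1']
```

Under that assumption the scan reports node 100 and then a node 1 that does not exist in the content tree, which matches what I saw in the handlers.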

There are a few options to resolve this problem:
  1. Wrap the custom XML data in a <![CDATA[ ... ]]> section.
  2. Try using a NodeIndexing handler on the BaseIndexProvider to deal with the XML.
  3. Avoid using id as an attribute in custom XML data.
Option 3 seemed the easiest for me, so I updated the EmbeddedContent source code to use a different attribute name instead of id.
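
As a rough illustration of option 3, the Python sketch below renames every "id" attribute in a custom XML fragment. It is not the change I made to EmbeddedContent. The function name, the replacement attribute name "itemId" and the element names in the sample are all hypothetical:

```python
import xml.etree.ElementTree as ET


def rename_id_attributes(xml_text: str, new_name: str = "itemId") -> str:
    """Return xml_text with every 'id' attribute renamed to new_name."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if "id" in element.attrib:
            # Rebuild the attribute dict so the other attributes are untouched.
            renamed = {
                (new_name if key == "id" else key): value
                for key, value in element.attrib.items()
            }
            element.attrib.clear()
            element.attrib.update(renamed)
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment; the real element names are not shown in this post.
sample = (
    '<data>'
    '<item id="1"><title>Node 1 content</title></item>'
    '<item id="2"><title>Node 2 content</title></item>'
    '</data>'
)
cleaned = rename_id_attributes(sample)
print(cleaned)
```

Any existing content saved with the old attribute name would need the same treatment, or a republish, before the index is rebuilt.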