Thursday, 19 July 2012

Umbraco Examine - Indexing Custom XML Data

I recently ran into a problem with an Umbraco Examine indexer suddenly stopping after trying to index the first node in the content tree.

The only error information I could get was:

Exception information:
Exception type: NullReferenceException
Exception message: Object reference not set to an instance of an object.
at UmbracoExamine.UmbracoContentIndexer.OnGatheringNodeData(IndexingNodeDataEventArgs e)
at Examine.LuceneEngine.Providers.LuceneIndexer.GetDataToIndex(XElement node, String type)



I tried using NodeIndexing and GatheringNodeData handlers to find out more, but all I could see was that Examine was trying to index the root node (100 in this case) and then looking for node 1, which didn't exist.

Eventually I found out it was due to a custom data type (TheFarm - EmbeddedContent) saving its content as XML.

The data was being stored as:

<umbracoproperty>
    <data>
      <item id="1">
        <node1>Node 1 content</node1>
        <node2>Node 2 content</node2>
      </item>
    </data>
</umbracoproperty>

The problem seems to be the use of "id" as an attribute, since umbraco.config uses the same attribute to identify content nodes:

<NodeType id="100" parentID="99" level="4" ...



It appears the indexer assumes it is parsing a new content node when it reads an id attribute inside the custom data.
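
To make the suspected collision concrete, here is a small sketch in Python (used only as a neutral illustration; it is not Examine code). It assumes the embedded XML ends up nested inside the content node in the cache file, and that something in the indexing path treats every element with an id attribute as a content node. Both are my assumptions; I have not confirmed how Examine actually walks the XML.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a fragment of umbraco.config. The nesting of the
# embedded data inside the content node is an assumption for illustration.
CACHE_XML = """
<root>
  <NodeType id="100" parentID="99" level="4">
    <umbracoproperty>
      <data>
        <item id="1">
          <node1>Node 1 content</node1>
          <node2>Node 2 content</node2>
        </item>
      </data>
    </umbracoproperty>
  </NodeType>
</root>
"""

def ids_seen_by_naive_scan(xml_text):
    """Return the id of every element that carries an id attribute, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get("id") for el in root.iter() if el.get("id") is not None]

# The embedded <item id="1"> shows up next to the real content node 100.
print(ids_seen_by_naive_scan(CACHE_XML))  # ['100', '1']
```

A scan like this cannot tell the real node 100 from the embedded item, which would match the indexer going looking for a node 1 that does not exist.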

There are a few options to resolve this problem:
  1. Wrap custom XML data in a CDATA section (<![CDATA[ ... ]]>).
  2. Try using a NodeIndexing handler on the BaseIndexProvider to deal with the XML.
  3. Avoid using id as an attribute in custom XML data.
Option 3 seemed the easiest for me, so I updated the EmbeddedContent source code to use a different attribute name in place of id.
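
As a rough sketch of what option 3 amounts to, the snippet below renames every id attribute in the custom XML. It is written in Python purely for illustration; the real change belongs in the data type's own source code. The replacement name `itemId` and the helper `rename_id_attribute` are hypothetical, not what EmbeddedContent actually uses.

```python
import xml.etree.ElementTree as ET

STORED_XML = """<umbracoproperty>
    <data>
      <item id="1">
        <node1>Node 1 content</node1>
        <node2>Node 2 content</node2>
      </item>
    </data>
</umbracoproperty>"""

def rename_id_attribute(xml_text, new_name="itemId"):
    """Return the XML with every 'id' attribute renamed to new_name (hypothetical name)."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if "id" in el.attrib:
            # Move the value to the new attribute name and drop the old one.
            el.set(new_name, el.attrib.pop("id"))
    return ET.tostring(root, encoding="unicode")

print(rename_id_attribute(STORED_XML))
```

After the rename, the embedded items no longer carry an id attribute, so nothing scanning for id values should confuse them with content nodes.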