Monday 28 February 2011

Displaying HTML Source Code with Blogger .NET API

I've been using the Blogger/Blogspot .NET API to extract feed content and display it on my site.

Some of the blog entries contain HTML source code and don't display properly. Even when I use the character references &gt; and &lt; for angle brackets in the editor, the extracted content comes back already decoded and breaks the page I'm displaying it in.
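To make the problem concrete, here is the round trip sketched with Python's standard `html` module rather than the .NET library: the editor stores the character references, and what comes back from the feed is the decoded text. The `escape_pre_blocks` helper is a hypothetical workaround, not something the Blogger API provides. It assumes code samples are wrapped in `<pre>` tags and re-encodes only the contents of those blocks before the post is written into the page.

```python
import html
import re

# What the editor stores versus what the feed client hands back (decoded).
stored = "&lt;p&gt;Hello&lt;/p&gt;"
decoded = html.unescape(stored)

# Hypothetical workaround: assume code samples sit inside <pre>...</pre>
# and re-escape only their contents, leaving the rest of the post's markup alone.
PRE_BLOCK = re.compile(r"(<pre[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)

def escape_pre_blocks(post_html):
    """Re-encode &, < and > inside each <pre> block of an already-decoded post."""
    return PRE_BLOCK.sub(
        lambda m: m.group(1) + html.escape(m.group(2), quote=False) + m.group(3),
        post_html,
    )

post = '<p>Example:</p><pre><div class="x">hi</div></pre>'
print(escape_pre_blocks(post))
```

A regular expression is a blunt tool here: a code sample that itself contains `</pre>` would end the match early, so an HTML parser would be the safer choice for anything beyond simple posts.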

Someone has already raised the issue with Google:

http://code.google.com/p/google-gdata/issues/detail?id=471

This has been merged with an existing issue here:

http://code.google.com/p/google-gdata/issues/detail?id=399#makechanges

It's still unresolved, so fingers crossed it'll be fixed soon.

Wednesday 23 February 2011

301 Redirects for HTML files in Umbraco

When replacing an older website with a new content managed system, either the site structure or the server-side technology will likely change. This can cause problems from an SEO point of view, as users searching for your site can be directed to pages that no longer exist or that now have a different file extension.

The best way to deal with this is for your site to return a "301 Redirect" whenever this happens. This tells search engines that a page has moved permanently and helps preserve its ranking.
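A 301 is simply a status code plus a `Location` header pointing at the new address. As a minimal sketch using Python's WSGI interface (not how Umbraco or the 301 URL Tracker package implement it), the response for a moved page looks like this; the `REDIRECTS` table and its URLs are hypothetical.

```python
# Hypothetical old-to-new URL table, for illustration only.
REDIRECTS = {
    "/about.html": "/about-us.aspx",
    "/products.asp": "/products.aspx",
}

def redirect_app(environ, start_response):
    """Minimal WSGI app: answer known old URLs with a permanent redirect."""
    target = REDIRECTS.get(environ.get("PATH_INFO", "/"))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    # "301 Moved Permanently" plus a Location header is what tells browsers
    # and crawlers that the page now lives at the new address.
    start_response("301 Moved Permanently", [("Location", target)])
    return [b""]

def request(path):
    """Call the app directly and capture the status line and headers."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    redirect_app({"PATH_INFO": path}, start_response)
    return captured

print(request("/about.html"))
```

To try it in a browser, `wsgiref.simple_server.make_server("localhost", 8000, redirect_app).serve_forever()` will serve it locally. Older HTTP clients expected an absolute URL in `Location`, so a real site would usually send the full address.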
Recently I was migrating an existing classic ASP site to Umbraco and had to deal with this problem. Here is my solution:

1. Install Redirect package

Luckily, there is a great package called "301 URL Tracker" that does a lot of the work. You can find it here.
After installing the package, I began getting "Invalid object name 'infocaster301'" server errors and worked out that the required database table hadn’t been created.
After a quick search on the developer’s forum, it turned out someone else had hit the same problem:
http://our.umbraco.org/projects/developer-tools/301-url-tracker/bug-reports/8974-I%27m-getting-an-Invalid-object-name-%27infocaster301%27-server-error
Unfortunately, the author’s solution was missing a declaration for the ‘IsRegex’ column in the SQL statement. Here is an amended version that I got working:

CREATE TABLE infocaster301 (
    NodeID   int           NOT NULL,
    OldUrl   nvarchar(400) NOT NULL,
    IsCustom bit           NOT NULL,
    IsRegex  bit           NOT NULL,
    Message  nvarchar(400) NULL,
    Inserted datetime      NOT NULL,
    CONSTRAINT PK_infocaster301 PRIMARY KEY CLUSTERED
    (
        NodeID ASC,
        OldUrl ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
            ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE infocaster301 ADD CONSTRAINT DF_infocaster301_Custom DEFAULT ((0)) FOR IsCustom
ALTER TABLE infocaster301 ADD CONSTRAINT DF_infocaster301_Inserted DEFAULT (getdate()) FOR Inserted
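To see how the table is meant to be used, here is the same schema recreated in SQLite through Python's `sqlite3` module, purely as a stand-in for SQL Server (types are adapted and the clustered index options dropped). The composite primary key lets several old URLs point at one node, and because `IsRegex` has no default, every insert has to supply it. The node IDs, the URLs and the `node_for` lookup are hypothetical; this is not the query the package actually runs.

```python
import sqlite3

# SQLite stand-in for the SQL Server table: bit -> INTEGER, nvarchar -> TEXT,
# getdate() -> CURRENT_TIMESTAMP. Defaults mirror the two ALTER TABLE statements.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE infocaster301 (
        NodeID   INTEGER NOT NULL,
        OldUrl   TEXT    NOT NULL,
        IsCustom INTEGER NOT NULL DEFAULT 0,
        IsRegex  INTEGER NOT NULL,
        Message  TEXT,
        Inserted TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (NodeID, OldUrl)
    )
""")

# Hypothetical entries: two old URLs mapping to the same (made-up) node 1050.
conn.executemany(
    "INSERT INTO infocaster301 (NodeID, OldUrl, IsCustom, IsRegex) VALUES (?, ?, 1, 0)",
    [(1050, "homepage.aspx"), (1050, "index.aspx")],
)

def node_for(old_url):
    """Hypothetical lookup: which node does a non-regex old URL point to?"""
    row = conn.execute(
        "SELECT NodeID FROM infocaster301 WHERE OldUrl = ? AND IsRegex = 0",
        (old_url,),
    ).fetchone()
    return row[0] if row else None

# Leaving out IsRegex fails, because the column is NOT NULL with no default.
try:
    conn.execute("INSERT INTO infocaster301 (NodeID, OldUrl) VALUES (1051, 'about.aspx')")
except sqlite3.IntegrityError as err:
    print("insert rejected:", err)
```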

The package is great once installed: it allows you to create multiple custom URLs, all mapping to an existing content node. Just what I needed... well, almost.

2. Mapping HTML and ASP

The package only seems to work for URLs with a .aspx extension, but the URLs from the old site were a mixture of .html and .asp.

To fix this, I had to map the .html and .asp extensions in IIS so that they run through ASP.NET like .aspx pages.
There is a lot of information about this out there; a few sites that helped were:

http://our.umbraco.org/wiki/recommendations/recommended-reading-for-web-developers/urlrewriting-html-to-aspx
http://weblogs.asp.net/scottgu/archive/2007/03/04/tip-trick-integrating-asp-net-security-with-classic-asp-and-non-asp-net-urls.aspx
http://learn.iis.net/page.aspx/508/wildcard-script-mapping-and-iis-7-integrated-pipeline/

My new Umbraco site runs on IIS7 in integrated pipeline mode. Most of what I read suggested creating a wildcard handler mapping in IIS to "%windir%\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll". I tried this, but whenever I viewed an HTML or ASP page on the site, it returned a blank page.

I finally got it working by creating a new Managed Handler for *.html that uses the "umbraco.BasePages.BasePage" type.

This will add an entry like this to your web.config:

 

3. URL Rewriting

The next step is to create a rule within Umbraco so that .html extensions are rewritten to .aspx. To do this, open /config/UrlRewriting.config and add an entry like this:
 

Now all HTML pages will be recognised as .NET requests and can be handled by Umbraco and the redirect package.
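The UrlRewriting.config entry itself isn't reproduced here, but the idea is a regular-expression rewrite. This Python sketch mirrors that logic under the assumption that the rule matches any path ending in .html (case-insensitively, to be safe with upper-case extensions like .HTML) and swaps the extension for .aspx. `rewrite` and `HTML_TO_ASPX` are hypothetical names, not part of UrlRewritingNet or Umbraco.

```python
import re

# Hypothetical equivalent of the rewrite rule: capture everything before a
# trailing ".html" (any case) and re-attach ".aspx".
HTML_TO_ASPX = re.compile(r"^(?P<path>/.*)\.html$", re.IGNORECASE)

def rewrite(path):
    """Return the path Umbraco should handle; non-.html paths pass through unchanged."""
    return HTML_TO_ASPX.sub(r"\g<path>.aspx", path)

for p in ["/homepage.html", "/news/latest.HTML", "/contact.aspx"]:
    print(p, "->", rewrite(p))
```

A similar pattern could cover the old .asp URLs if they need the same treatment.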

4. Finally

The final job is to create a list of redirects within Umbraco. When adding a new URL redirect to "301 URL Tracker" (and using this method of handler mapping), you will have to replace ".html" with ".aspx".
For example, if you wanted "oldsite.com/homepage.html" to redirect to "newsite.com/home.aspx", you would have to add it as "oldsite.com/homepage.aspx".
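The reason for the .aspx requirement is that the tracker only ever sees the URL after the rewrite step. A small Python sketch of that chain, with hypothetical paths and a plain dictionary standing in for the tracker's table, shows that an entry keyed on .html is never matched. `rewrite` and `resolve` are illustrative helpers, not the package's code.

```python
import re

def rewrite(path):
    """Step 3: hand .html requests to Umbraco as .aspx (case-insensitive)."""
    return re.sub(r"\.html$", ".aspx", path, flags=re.IGNORECASE)

def resolve(request_path, tracker):
    """Return the redirect target for a request, or None if no entry matches.

    The lookup uses the rewritten path, because that is what the
    redirect package sees once the rewrite rule has run.
    """
    return tracker.get(rewrite(request_path))

# Hypothetical tracker contents: one entered as .aspx, one entered as .html.
entered_as_aspx = {"/homepage.aspx": "/home.aspx"}
entered_as_html = {"/homepage.html": "/home.aspx"}

print(resolve("/homepage.html", entered_as_aspx))  # /home.aspx
print(resolve("/homepage.html", entered_as_html))  # None
```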
Hope this helps! If you can think of any way to improve the solution, let me know.