From DasBlog To SubText

time to read 39 min | 7739 words

My site was down because of DasBlog today. I have no idea why it started to misbehave again, and I don't really care at this point, that was the final reason that I needed to move to SubText. Here is how I did it:

  • Put App_Offline.htm page in the blog directory, since otherwise DasBlog would kill the server
  • Back up the existing blog.
  • Prepare a SubText skin matching my current blog look & feel.
  • I then took the DasBlog to BlogML importer here, which is still not ready, and started to play with the code. This is definately something that would require a developer to do, by the way.
    I had to give up on the trackbacks because they issues an IsInRole() call and I am not connected to the office at the moment. I took the short way our and sprinkled Thread.CurrentPrincipal = new GenericPrincipal(..) all over the place (seem to reset itself for some reason), and it worked.
  • I then tried imprted the resulting file to SubText. I have a large blog, and this meant that I had to increase the timeout values. I then discovered that my posts are not legal in SubText, it was warning me about issues with <script> tags in the posts and then failed with some whitespace error that took some time to check in depth. Appernatly I had literal "<<" somewhere, which SgmlReader interpreted as whitespace, and that caused an error when it tried to save that.
  • The posts are appearing in reverse order (the oldest are on the top), that is a side affect of the way BlogML worked, by output it to the export file in reverse order, and SubText accepted it in order. The fix:

    UPDATE  subtext_content

    SET     DateSyndicated = DateAdded

  • The next issues, categories, I am pretty strict about categories, and it looked like the move messed them up, I got mutliply categories with the same name, probably from multiply attempts to import the same set of data, I cleaned it up with:

    DELETE  FROM subtext_linkcategories

    WHERE   categoryid NOT IN ( SELECT  categoryid

                        FROM    subtext_links )

    That still didn't solve the problem, I had an issue with categories that with duplicated names differencing in whitespace, so I found out about them with this query:

    SELECT  LTRIM(RTRIM(title)),

            MIN(categoryid),

            MAX(categoryid),

            COUNT(*)

    FROM    subtext_linkcategories

    GROUP BY LTRIM(RTRIM(title))

    HAVING  COUNT(*) > 1

    Note that I only had two duplicates categories overall, so it was easy to find the values. Then, I fixed the duplication with this query:

    UPDATE  subtext_links

    SET     categoryid = ( CASE categoryid

                             WHEN 452 THEN 460

                             WHEN 464 THEN 527

                             WHEN 465 THEN 504

                             WHEN 477 THEN 508

                             ELSE categoryid

                           END )

    All that was left was cleaning up the empty categories, using the second query above.
    If you haven't figured it out yet, I really like the ability to run queries on my blog :-)
  • The next target was the comments, I first run this:

    update  subtext_feedback

    set     isBlogAuthor = 1

    where   author = 'Ayende Rahien'

    So I could claim ownership of over 600 comments on my blog. Then it was a matter of handling the stats correctly:

    UPDATE  subtext_config

    SET     commentcount = ( SELECT COUNT(*)

                             FROM   subtext_feedback

                             WHERE  feedbacktype = 1

                           ),

            PingTrackCount = ( SELECT   COUNT(*)

                               FROM     subtext_feedback

                               WHERE    feedbacktype = 2

                             )

    One thing that bugs me with the feedback is that there are multiply places that are using "order by id" to sort by date, in my case, this is not the case, it is actually the reverse of what really should happen.
  • Now, the issues of preserving the old URLs. That one is very important to me, since I have a lot of links directed to my site that I don't want to break. I first had to run a check to see how many duplicate entries name I had, and realized that I had merely 12 duplicate entries, this is something that I can live with. I should explain that the importing process didn't give me a way to correlate between the DasBlog entry and the SubText entry. What I did have was the entry title and date, which should be enough, since I didn't post entries with duplicated names on the same day. I created the following table:

    CREATE TABLE subtext_MappingFromDasBlog

    (

      Id int identity(1, 1)

             primary key

             not null,

      subtextId int references dbo.subtext_content ( id )

                    not null,

      date datetime not null,

      dasBlogId uniqueidentifier not null,

      dasblogUrl nvarchar(255) not null

    )

    And I run the following little boo script:

    import System

    import System.Xml

    import System.IO

    import System.Data

    import System.Data.SqlClient

     

    con = SqlConnection("Data Source=localhost;Initial Catalog=SubtextData;Trusted_connection=yes;")

    con.Open()

     

    xdoc = XmlDocument()

    xdoc.Load("C:\\export.xml")

    nsMgr = XmlNamespaceManager(xdoc.NameTable)

    nsMgr.AddNamespace("blog", "http://www.blogml.com/2006/09/BlogML")

    for node as XmlNode in xdoc.SelectNodes("/blog:blog/blog:posts/blog:post", nsMgr):

          id = Guid(node.SelectSingleNode("@id").Value)

          url = node.SelectSingleNode("@post-url").Value;

          title = node.SelectSingleNode("blog:title/text()", nsMgr).Value;

          dateCreated = date.Parse(node.SelectSingleNode("@date-created").Value)

         

          print "Id ${id} url ${url} title ${title} date ${dateCreated}"

         

          using cmd = con.CreateCommand():

                cmd.CommandText = """SELECT Id FROM subtext_Content

                            WHERE Title = @title and year(DateAdded) = @year

                            and month(DateAdded) = @month and day(DateAdded) = @day""";

                cmd.Parameters.AddWithValue("title", title);

                cmd.Parameters.AddWithValue("year", dateCreated.Year)

                cmd.Parameters.AddWithValue("month", dateCreated.Month)

                cmd.Parameters.AddWithValue("day", dateCreated.Day)

                subtextId as int = cmd.ExecuteScalar()

                cmd.Parameters.Clear()

                cmd.CommandText = """INSERT INTO subtext_MappingFromDasBlog

                            VALUES(@subtextId, @date, @dasBlogId, @dasblogUrl);"""

                cmd.Parameters.AddWithValue("subTextId", subtextId)

                cmd.Parameters.AddWithValue("dasBlogId", id)

                # just the date, not the time

                cmd.Parameters.AddWithValue("date", dateCreated.Date)  
                #trim the http://www.ayende.com/Blog/yyyy-mm-dd/

                cmd.Parameters.AddWithValue("dasblogUrl", url.Substring(38) )

                cmd.ExecuteNonQuery()

         

    con.Dispose()

    Just let me iterate that the above is not production code, it is one off code.
    Now, to get some use out of it...
    SubText supports UrlRewriting, with HttpHandlers, so the first thing I needed to do is enable going to a post just by its id. I had to add this to the HandlerConfiguration/httpHandlers section:

    <HttpHandler pattern="(?:/archive/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

    Now urls such as /archive/343.aspx are valid.
    Now for the more complex stuff, I wanted to be able to support the following formats.
    • CommentView,guid,{guid}.aspx
    • PermaLink,guid,{guid}.aspx
    • NormalizedPostTitle.aspx
    • /yyyy/mm/dd/NormalizedPostTitle.aspx

It took a while, but I manage to integrate that into SubText cleanly (an additional handler and changing the web.config), so the old urls should still work. The issue was mainly
Important: If you notice a borkne link please let me know about it.

  • Syndication - RSS / Atom - I get a lot more readers on the site than on RSS, but it is obviously very important thing to keep. Nothing is as annoying as a broken link. In this case, dasBlog is using SyndicationService.asmx
  • Binaries - I merely kept them in the same place (I am talking about images, attachments, etc.

The reason that I wrote all of that is that I would actaully remember what I did next time that I would have to upgrade / modify the blog.

The last thing that you need to know is how to configure the url mapping:

<HandlerConfiguration defaultPageLocation="DTP.aspx" type="Subtext.Framework.UrlManager.HandlerConfiguration, Subtext.Framework">

    <HttpHandlers>

        <HttpHandler pattern="login.aspx" handlerType="Page" pageLocation="Login.aspx"/>

              <HttpHandler pattern="(?:/admin\/handlers\/BlogMLExport\.ashx)" type="Subtext.BlogML.BlogMLHttpHandler, Subtext.BlogML" handlerType="Direct"/>

              <HttpHandler pattern="/Admin/" handlerType="Directory" directoryLocation="Admin"/>

              <HttpHandler pattern="/Providers/" handlerType="Directory" directoryLocation="Providers"/>

              <HttpHandler pattern="/rss\.aspx$" type="Subtext.Framework.Syndication.RssHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/SyndicationService\.asmx/GetRss$" type="Subtext.Framework.Syndication.RssHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/atom\.aspx$" type="Subtext.Framework.Syndication.AtomHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/SyndicationService\.asmx/GetAtom$" type="Subtext.Framework.Syndication.AtomHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/comments/commentRss/\d+\.aspx$" type="Subtext.Framework.Syndication.RssCommentHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/aggbug/\d+\.aspx$" type="Subtext.Framework.Tracking.AggBugHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="/customcss\.aspx$" type="Subtext.Web.UI.Handlers.BlogSecondaryCssHandler, Subtext.Web" handlerType="Direct"/>

              <HttpHandler pattern="(?:/category\/(\d|\w|\s)+\.aspx/rss/?)$" type="Subtext.Framework.Syndication.RssCategoryHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="(?:/ArchivePostPage.aspx)$" controls="ArchivePostPage.ascx"/>

              <HttpHandler pattern="(?:/LinkPage.aspx)$" controls="LinkPage.ascx"/>

              <HttpHandler pattern="(?:/ArticleCategories.aspx)$" controls="ArticleCategories.ascx"/>

              <HttpHandler pattern="(?:/archives\.aspx)$" controls="SingleColumn.ascx"/>

              <HttpHandler pattern="(?:/archive/\d{4}/\d{2}/\d{2}/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

              <HttpHandler pattern="(?:/archive/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

              <HttpHandler pattern="(?:/archive/\d{4}/\d{2}/\d{2}/[-_,+\.\w]+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

              <HttpHandler pattern="(?:/archive/\d{4}/\d{1,2}/\d{1,2}\.aspx)$" controls="ArchiveDay.ascx"/>

              <HttpHandler pattern="(?:/archive/\d{4}/\d{1,2}\.aspx)$" controls="ArchiveMonth.ascx"/>

              <HttpHandler pattern="(?:/articles/\d+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

              <HttpHandler pattern="(?:/articles/[-_,+\.\w]+\.aspx)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>

              <HttpHandler pattern="(?:/\d{4}/\d{2}/\d{2}/[\d\w]+\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog"  handlerType="Factory" />

              <HttpHandler pattern="(?:commentview,guid,[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog"  handlerType="Factory" />

              <HttpHandler pattern="(?:commentview\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog"  handlerType="Factory" />

              <HttpHandler pattern="(?:/contact\.aspx)$" controls="Contact.ascx"/>

              <HttpHandler pattern="(?:/posts/|/story/|/archive/)" type="Subtext.Web.UI.Handlers.RedirectHandler, Subtext.Web" handlerType="Direct"/>

              <HttpHandler pattern="(?:/gallery\/\d+\.aspx)$" controls="GalleryThumbNailViewer.ascx"/>

              <HttpHandler pattern="(?:/gallery\/image\/\d+\.aspx)$" controls="ViewPicture.ascx"/>

              <HttpHandler pattern="(?:/(?:category|stories)/(\w|\s)+\.aspx)$" controls="CategoryEntryList.ascx"/>

              <HttpHandler pattern="(?:/comments\/\d+\.aspx)$" type="Subtext.Framework.Syndication.CommentHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="(?:/services\/trackbacks/\d+\.aspx)$" type="Subtext.Framework.Tracking.TrackBackHandler, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="(?:/services\/pingback\.aspx)$" type="Subtext.Framework.Tracking.PingBackService, Subtext.Framework" handlerType="Direct"/>

              <HttpHandler pattern="(?:/services\/metablogapi\.aspx)$" type="Subtext.Framework.XmlRpc.MetaWeblog, Subtext.Framework" handlerType="Direct"/>

 

              <HttpHandler pattern="(?:/default\.aspx)$" controls="homepage.ascx"/>

              <HttpHandler pattern="(?:/[\d\w]+\.aspx)$" type="UrlRewriting.For.DasBlog.FromDasBlogToSubText, UrlRewriting.For.DasBlog"  handlerType="Factory" />

 

              <!-- this needs to be last -->

        <HttpHandler pattern="(?:((\/\/default\.aspx)?|(\/\/?))?)$" controls="homepage.ascx"/>

       </HttpHandlers>

</HandlerConfiguration>

 


And you can find the code for the actual URL mapping here.
It is simple, it works, and that is all I care about.

If you haven't guess, this migration was NOT a walk in the park. I hope that I got all the kinks out, but let me know if there is anything that I missed.