From MediaWiki to XWiki part III

1/22/2008 - Patrick (updated on 11/13/2017)

As mentioned in the previous article, one of the loose ends to clear up is importing into separate spaces. One of the reasons we moved to XWiki was its support for multiple namespaces; it would have been a shame to import all of our existing data into one gigantic namespace.

Importing separate spaces

What we've got after running wikifetch.pl is one directory with all MediaWiki pages in it and all links pointing to the space MySpacePlaceholder.

First, we needed to decide which page goes where. There is no easy way to do this, so we just sorted them out manually, moving the page files to different directories. For example, all development-related stuff went into a directory named development, whereas all sales-related files went to the sales directory and so on.

We could now go ahead and import the directories one by one if it weren't for the backlinks. Say Main.WebHome links to a page named Development, which now belongs in the Development space. In the fetched files that link still points to MySpacePlaceholder.Development, so how would the import script know that Development is now located in the Development space?

We modified import.groovy to resolve the backlinks using a copy of the pages to be imported (the originals will get deleted, as mentioned in part II).

First we'll need a hash map of the spaces:

[...]
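// Maps a page name to its "Space.Page" name, using the directory name as the space
def fileSpaces = [:];

new File( "C:/temp/copy_wiki" ).eachFileRecurse() {
  f->
  // eachFileRecurse also visits the directories themselves; only page files get an entry
  if ( f.isFile() ) {
    def parentDir = f.parentFile.name;
    fileSpaces[ f.name ] = parentDir + "." + f.name;
  }
}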
[...]

What we've got now is a hash map where we can look up a "Space.Page" for a given "Page".
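To make the lookup concrete, here is a minimal, self-contained sketch of the same idea. It does not touch C:/temp/copy_wiki; instead it builds a small temporary directory tree, and the space and page names in it (development, sales, BuildServer, PriceList) are made up for illustration.

```groovy
import java.nio.file.Files

// Hypothetical stand-in for C:/temp/copy_wiki: one sub-directory per space
def root = Files.createTempDirectory( "copy_wiki" ).toFile()
def layout = [ development: [ "Development", "BuildServer" ], sales: [ "PriceList" ] ]
layout.each { space, pages ->
  def dir = new File( root, space )
  dir.mkdirs()
  pages.each { page -> new File( dir, page ).text = "placeholder content" }
}

// Same approach as in the excerpt: page name -> "Space.Page"
def fileSpaces = [:]
root.eachFileRecurse { f ->
  if ( f.isFile() ) {
    fileSpaces[ f.name ] = f.parentFile.name + "." + f.name
  }
}

println fileSpaces[ "PriceList" ]

// Clean up the temporary tree
root.deleteDir()
```

One thing to keep in mind with this layout: if two directories contain a file with the same name, the later one overwrites the earlier one in the map, so page names have to be unique across all spaces.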

We'll need this map in the main import loop:

[...]
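    fileAsText = f.getText();

    // replace() works on literal text; replaceAll() would treat the "." and
    // any special characters in a page name as a regular expression
    fileSpaces.each {
      pageName, newName ->
        fileAsText = fileAsText.replace( "MySpacePlaceholder." + pageName, newName );
    };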
[...]

This replaces every "MySpacePlaceholder.Page" with "Space.Page", where "Space" is the correct space name. It could be optimized: with n pages, each page's text is searched once per known page name, so the whole import does on the order of n^2 search-and-replace passes. For a one-time import this shouldn't matter much, and it wasn't a problem with our few hundred pages.
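If the repeated search-and-replace ever became a problem, one possible optimization is to build a single regular expression from all page names and rewrite each page in one pass. The sketch below is not part of import.groovy; rewriteLinks and its parameters are hypothetical names. Sorting the names longest-first also keeps a short page name (say, Dev) from matching the beginning of a longer one (Development).

```groovy
import java.util.regex.Pattern

// Hypothetical helper, not taken from the original import.groovy
def rewriteLinks( String text, Map fileSpaces, String placeholder = "MySpacePlaceholder" ) {
  if ( !fileSpaces ) {
    return text
  }
  // Longest names first, so the alternation prefers "Development" over "Dev"
  def names = fileSpaces.keySet().toList().sort { -it.length() }
  def alternatives = names.collect { Pattern.quote( it ) }.join( "|" )
  def pattern = Pattern.compile( Pattern.quote( placeholder + "." ) + "(" + alternatives + ")" )
  // One pass over the text; each matched page name is looked up in the map
  text.replaceAll( pattern ) { full, page -> fileSpaces[ page ] }
}

def sampleSpaces = [ Dev: "sales.Dev", Development: "development.Development" ]
def sampleText = "See MySpacePlaceholder.Development and MySpacePlaceholder.Dev."
def rewritten = rewriteLinks( sampleText, sampleSpaces )
println rewritten
```

This still does not protect a link to an unknown page whose name merely starts with a known page name; for that, the pattern would need to know which characters can end a link in the converted files.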

Now we're free to start the import, one directory at a time. The new version of import.groovy can be downloaded here.

Problems

There are still some unresolved problems regarding the export/import chain. One is that some pages ended up with garbled pre sections (the handling in XWiki is somewhat "unusual" in that there are no closing tags). Another is that we didn't set the page titles automatically; the title could have been taken from the first h1 or h2 on the page. But apart from these problems and some trouble with special characters, it "worked" for us and we've since happily moved on with our XWiki.
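For the missing titles, a rough sketch of the "first h1 or h2" idea could look like the following. We did not implement this; guessTitle is a hypothetical helper, and it assumes the converted page text uses XWiki 1.0 style headings, where a line starting with "1 " is a level-1 heading and a line starting with "1.1 " is a level-2 heading. If the converted files use a different heading syntax, the pattern would need to change.

```groovy
// Hypothetical helper; the heading pattern is an assumption about the converted files
def guessTitle( String pageText, String fallback ) {
  // First line that looks like "1 Title" (h1) or "1.1 Title" (h2)
  def heading = pageText.readLines().find { it ==~ /1(\.1)?\s+\S.*/ }
  // Strip the heading marker; fall back to the given name if no heading exists
  heading ? heading.replaceFirst( /^1(\.1)?\s+/, "" ).trim() : fallback
}

def samplePage = "Some intro text\n1.1 Build Server Setup\n1 A later heading"
println guessTitle( samplePage, "BuildServer" )
println guessTitle( "No headings here", "PriceList" )
```

The fallback parameter is there so that pages without any heading can keep their page name as the title.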
