TikiWiki and Oracle

Well, TikiWiki claims to support Oracle … great! So I’ll install it, and try it.

(insert comedy failure noise here)

The installation doesn’t work! Mostly because Oracle only accepts object names of thirty characters or fewer, and this product doesn’t respect that when installing on Oracle, so action is needed.

Here’s the file of corrected statements I ran to get all the tables created successfully and to reinstate the triggers and indexes that failed (I’m not promising it’s perfect). Where I needed to modify a corresponding PHP page, that’s documented as well. I hope this helps someone in the future – me, next time I need to do this, perhaps?

tiki_installation_corrections.txt
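
For anyone who wants to spot the offending names before running an install script, here is a minimal sketch in Python (not part of my corrections file, and not anything TikiWiki ships). It assumes the script uses plain CREATE TABLE / INDEX / TRIGGER / SEQUENCE statements; the function name and the sample statements are made up for illustration.

```python
import re

ORACLE_MAX_IDENTIFIER_LENGTH = 30  # the limit described in this post

# Matches the object name in CREATE TABLE / INDEX / TRIGGER / SEQUENCE statements.
# This is a rough pattern for illustration, not a full SQL parser.
CREATE_PATTERN = re.compile(
    r"\bCREATE\s+(?:OR\s+REPLACE\s+)?(?:UNIQUE\s+)?"
    r"(TABLE|INDEX|TRIGGER|SEQUENCE)\s+\"?(\w+)\"?",
    re.IGNORECASE,
)


def find_long_identifiers(sql_text, limit=ORACLE_MAX_IDENTIFIER_LENGTH):
    """Return (object_type, name, length) for each created object whose name exceeds the limit."""
    problems = []
    for match in CREATE_PATTERN.finditer(sql_text):
        object_type, name = match.group(1).upper(), match.group(2)
        if len(name) > limit:
            problems.append((object_type, name, len(name)))
    return problems


if __name__ == "__main__":
    # Hypothetical statements, not taken from the real TikiWiki installer.
    sample = """
    CREATE TABLE tiki_short_name (id NUMBER);
    CREATE INDEX tiki_hypothetical_table_with_a_long_name_idx ON tiki_short_name (id);
    """
    for object_type, name, length in find_long_identifiers(sample):
        print(f"{object_type} {name}: {length} characters")
```

It only looks at the names of the objects being created, so column and constraint names would still need checking by hand.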

crawlers and spiders

It’s Monday morning, and I’ve just reviewed the suggestions for storing information for the team that I put forward earlier. My boss is going to go with me on DokuWiki, but for some reason the lack of a database backend is making him nervous. The search functionality is currently absolutely fine, but that’s with 50 docs and we might need to handle 5000. We need a spider.

The thing with DokuWiki is that it stores its information in files, which is fine because it is a series of pages, or documents, and that’s what file systems were invented for! However if you want to look for a particular word or phrase then you will need to open and close each one of those documents … and that’s slow. So I’m looking for a thing of some kind which will index my information out of those files (not choking on the markup) while I’m not looking, and then deliver very fast search results.
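
To make the idea concrete, here is a minimal sketch of the kind of index I mean, written in Python purely for illustration. It is not how any of the products below work, and the function names are made up. It takes page text as a dictionary rather than reading DokuWiki’s files, and it deals with the markup in the crudest possible way, by keeping only runs of letters and digits.

```python
import re
from collections import defaultdict

WORD_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and keep only runs of letters and digits.

    Throwing away punctuation also throws away most wiki markup characters.
    """
    return WORD_PATTERN.findall(text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of page names containing it.

    pages is a dict of {page name: raw page text}.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

The slow part (opening every file) happens once in build_index, while nobody is waiting; a search is then just a few dictionary lookups.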

I haven’t got anything working yet so this is kind of theoretical and I’ll come back and update this when I get a solution in place, but here’s the current shortlist.

ZSearch from the Zend Framework
Except it needs PHP5 and we’re running PHP4. Not sure whether I should try to work with it or what.

Xapian

I’ve come across http://www.xapian.org/ which looks promising … except I’m working on Windows and I’ll have to compile stuff, and the IT proxy isn’t working and the main one won’t let me download executables. Back to some real work and save this project for another day!

EDIT you can read the follow-up articles here and here

Thoughts of Wikis

I’m implementing a new information-keeping system where I work, and trying to find something that will fit in with a number of requirements. Here’s a quick summary of the task and how I’m getting along:

Requirements

  • Allow text
  • Allow attachments (files and pictures)
  • integrate with existing extranet signon
  • re-brand to match extranet
  • allow conversion of existing files from
    1. knowledgebase
    2. dokuwiki
    3. html pages
  • fine-grained access control for groups of users – it, programmers, symphony users, customers, etc
  • consideration for scaling of solution
  • lowest possible effort needed to edit/add info

Preferences

  • Oracle-driven if a database is needed
  • powerful search functionality
  • ideally free!

Products

TikiWiki

The only PHP-driven, Oracle-backed product I could find. TikiWiki is meant to be relatively straightforward to install. It is very complex for our purpose, as it is fully-fledged groupware with a CMS and the wiki is just one module.

pros

  • oracle-driven
  • written in php
  • skinnable
  • support for output in PDF

cons

  • overkill
  • no fine-grained access control
  • no hooks for adding our authentication or interface with existing standards
  • rather buggy under Oracle (especially the installation!)

MediaWiki

Experimental Oracle support apparently – testing with MySQL

MediaWiki is the engine behind Wikipedia – it is widely used and understood. It is written in PHP and traditionally runs on MySQL; there is some support for Oracle, however this is not widely used and not really supported by the project developers.

pros

  • widely used product – plenty of community support
  • good search functionality
  • fine-grained access control (hides things you don’t have access to – very nice)
  • LDAP authentication supported

cons

  • horrible markup (not very strict, not block-level, hard to parse or convert from)
  • difficult to convert existing documents
  • standard of Oracle implementation unknown – I can’t get it to install! Likely to be poor and/or patchy

DokuWiki

Simple, text-backed storage. DokuWiki is the first information management system we implemented at Symphony to test the idea. It uses flat file storage, which we saw as a potential scaling problem, however I’m seeing examples of people saying it’s fine up to 40k pages or so. http://www.pmwiki.org/wiki/PmWiki/FlatFileAdvantages

pros

  • lightweight, easy to install and brand
  • text files can be readable with or without frontend
  • simple and clear markup
  • easy to convert documents to it
  • syntax highlight in code blocks

cons

  • poor search functionality, also slow due to the file access. Could use external spider e.g. http://www.phpdig.net
  • potentially poor scalability as the file structure grows

I’ve got a Google Analytics account

Yay! My invite arrived this morning for Google Analytics. I’ve got this site and one other all hooked up to it and will wait and see how it all turns out. Both sites are very low traffic but that’s OK.

So far the features are good. I needed a Google account to log in with, and I can give access to others as well – they also need a Google account. I’ve given my co-owner of one of the sites access to the reports, which was very easy.

I’ll write more about how I get on once I get some statistics to look at.

Textile Knowledge Nuggets

I’ve run into some problems formatting an article about markup, which I was writing in textpattern, which uses markup …. you can see where this is going. Well I learned some new tricks!

escaping from textile

To prevent a block from being processed, just use double equals signs around what you are interested in (don’t know how to show you an example without breaking stuff so I won’t try!)

This is much better than the results from ... or bc. where your code can still get processed.

to make a block style persist

When using a block quote to show lots of lines of code, or verses of a song, use a double dot, like this:

bq..

Then everything else you write

Even if it has line breaks

Will carry on being in that style until you start a new style

p.

such as a paragraph

Many thanks to AllPhilosophy for these! http://allphilosophy.com/home/guide/rich

Apache FOP: formatting objects is fun

I’ve been working on a tricky problem at work this week (and last week as well actually, it’s been really really tricky in fact). We need to be able to output a form in both PDF (Portable Document Format) and PCL (Printer Command Language), because our fax system can only handle PCL.

Ghostscript

I had a look at using Ghostscript. It’s been around a while and is widely used, freely available and, by all accounts, stable. I had some trouble getting it working initially but I think it would have done the job.

Apache FOP

The Apache Project has a project called FOP (Formatting Objects Processor), which is part of their XML Graphics project. It’s a module that takes a particular type of XML called Formatting Objects (now a W3C recommendation, formally named XSL-FO and written xml:fo in this post), which represents a document along with information about its presentation, and renders it as output such as PDF or PCL.

Since xml:fo is a recognised standard, it’s a great format to choose to implement the conversions to PCL and PDF. Other output formats are also available, with more on the way, so it’s an application that can be adapted to meet other needs as they arise.

XSL translations

Since xml:fo is a standard and it’s XML, it should be possible to get any number of XML formats (including Open Office or Word XML) translated into it using an XSL (eXtensible Stylesheet Language) stylesheet. I tried out a couple of these from http://www.antennahouse.com/. They worked well with the sample files, but I had trouble with the xml:fo produced from my own xhtml files.

AntennaHouse clearly have a lot of knowledge in this area though, and their site is well worth a visit for background reading on this topic. I suspect that part of the problem was that FOP only has a partial implementation of the xml:fo specification, so although I was feeding it valid xml:fo, it didn’t know what to do with all of it. There is a rewrite in progress so I expect that newer versions will be much more robust.

Final Solution

In the end (since I only wanted a simple one-page form), I settled on writing the xml:fo by hand, which produced really great results in both formats, with images as well. I’ve also been asked to look into programs to generate this output. They’re mostly commercial, but if I come across anything interesting I’ll add it here. Apache FOP is a great project and I hope it doesn’t lose its momentum!
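
To give a flavour of what a hand-written file looks like, here is a minimal sketch, not my actual form. It uses a little Python to fill in a bare-bones one-page document; the page size, the master name "form-page", the function name and the field names are all assumptions made up for the example.

```python
from xml.sax.saxutils import escape

# A minimal single-page document: one page master, one page sequence, one flow.
FO_TEMPLATE = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="form-page"
        page-height="29.7cm" page-width="21cm"
        margin-top="2cm" margin-bottom="2cm"
        margin-left="2cm" margin-right="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="form-page">
    <fo:flow flow-name="xsl-region-body">
{blocks}
    </fo:flow>
  </fo:page-sequence>
</fo:root>
"""


def build_form(title, fields):
    """Return a formatting-objects document with a title block and one block per (label, value) pair."""
    lines = [
        '      <fo:block font-size="16pt" font-weight="bold" space-after="12pt">'
        + escape(title)
        + "</fo:block>"
    ]
    for label, value in fields:
        # escape() protects against &, < and > in the form data.
        lines.append(
            '      <fo:block space-after="6pt">'
            + escape(label) + ": " + escape(value)
            + "</fo:block>"
        )
    return FO_TEMPLATE.format(blocks="\n".join(lines))


if __name__ == "__main__":
    # Hypothetical form contents.
    with open("form.fo", "w", encoding="utf-8") as handle:
        handle.write(build_form("Fax cover sheet", [("To", "Accounts"), ("Pages", "1")]))
```

The resulting form.fo can then be handed to FOP once for each output format. As far as I know the command line takes the input with -fo and the output with -pdf or -pcl, but check the documentation for the version you have.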

The Wool Shop

I have recently moved to Leeds and am slowly starting to find my way around and build up some local knowledge. A very important ingredient in this process is my discovery that there is a great wool shop near where I live. It’s so good, in fact, that I have to recommend it to anyone who likes wool (or old-fashioned shops!).

The shop is at:

S & D Woodhead
Wool Shop, Wingate Junction, Tong Road, Leeds, West Yorkshire LS12 4NQ
Tel: 0113 263 8383

From the outside it looks quite big; however, when you go in, the shop part is really small. You have to have a chat with the lady behind the counter about what you want and she will disappear off and come back with some suggestions. There are big sacks of discounted wool, often just a couple of balls, and everyone is welcome to rummage (but be warned that you will get filthy in the process!)

I wasn’t sure what I was looking for and got some good advice while I was there, as well as some very patient service. I discovered that they don’t take cards and I didn’t have a lot of cash on me (I’m 25, my generation lives by plastic!). I was sold enough material to start the project and the rest has been put aside with my name on it for me to collect when I need it – and by that time I’ll have a good idea of how much of each type of yarn I’ll need as well.