Languages other than en-US or Latin
I've found that Beaumont is usable with Japanese in Unicode, but there are some annoying problems. In order of technical complexity (my guess), they are:
1. URL links look strange. It will look like:
http://xxx.com/page.php?tag=----
This is not very good. I don't mind having to enter an English title to use for the URL, if necessary...
2. Files with Japanese names get broken filenames. In many cases the Japanese part is simply removed, and sometimes I end up with names like .jpg. (That's right, files with Japanese names still use the .jpg extension for JPEG files. Did you know that?)
3. Tags. Tags are comma- or space-separated, so this should be relatively easy, but Japanese phrases are all but ignored.
4. Search. I bet this is a bit tricky because word boundaries are not obvious, but it would be very nice to be able to use search in multiple languages.
If you turn on URL rewriting (and install the supplied .htaccess file), URLs will be generated as /p/tag
Yes, I used URL rewriting initially, as recommended in siteframe.ini, but I realized that the edit page gets confused, probably because my page titles are often all-Japanese phrases and the resulting page names are -, --, ---, ----, etc. I would greatly appreciate some minimal level of non-English support in future versions...
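To illustrate why all-Japanese titles end up as -, --, --- and so on, here is a minimal sketch in Python (not the actual Beaumont code, which is PHP). It assumes the page name is built by replacing every character that is not an ASCII letter or digit with "-", which is only a guess based on the symptoms described. The function names ascii_only_slug, unicode_slug and url_for are hypothetical. The second function shows one possible alternative: keep Unicode letters and percent-encode them in the URL.

```python
import re
import unicodedata
from urllib.parse import quote


def ascii_only_slug(title):
    """Assumed current behavior: each non-ASCII-alphanumeric character becomes '-'."""
    return re.sub(r"[^A-Za-z0-9]", "-", title)


def unicode_slug(title):
    """Possible alternative: keep Unicode letters/digits, collapse the rest to '-'."""
    normalized = unicodedata.normalize("NFKC", title).lower()
    # For str patterns, \w matches Unicode letters and digits (and underscore).
    return re.sub(r"[^\w]+", "-", normalized).strip("-")


def url_for(title):
    """Percent-encode the UTF-8 bytes so the URL itself stays plain ASCII."""
    return "/p/" + quote(unicode_slug(title))


# Two different two-character titles collide under the ASCII-only rule.
print(ascii_only_slug("東京") == ascii_only_slug("大阪"))
# The Unicode-aware version keeps them distinct.
print(unicode_slug("東京 日記"))
print(url_for("東京"))
```

With the ASCII-only rule, any two titles of the same length made only of Japanese characters get the same page name, which would explain why the edit page gets confused.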
2) Yes, it attempts to replace all non-ASCII characters with "-".
3) Hm, I'm just using the PHP split() function, so maybe that's not multi-language aware.
4) Yeah, at this time, the search table is defined as UTF-8, and it uses MySQL's text search function. I don't know what it would take to get it to work with Japanese.
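For points 3) and 4), a rough sketch in Python of what multi-language-aware handling could look like. This is not how Beaumont works today. The names split_tags, bigrams and matches are made up for illustration, and the list of separators is an assumption about what Japanese users are likely to type. For search, it uses overlapping two-character tokens (bigrams), a common workaround for languages without spaces between words, instead of real word segmentation.

```python
import re

# Separators: ASCII comma, fullwidth comma (U+FF0C), ideographic comma (U+3001),
# and any Unicode whitespace (\s also matches the ideographic space U+3000).
TAG_SEPARATORS = re.compile(r"[,\uFF0C\u3001\s]+")


def split_tags(raw):
    """Split a tag string on Western and Japanese separators, dropping empties."""
    return [tag for tag in TAG_SEPARATORS.split(raw) if tag]


def bigrams(text):
    """Overlapping 2-character tokens, with whitespace ignored."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]


def matches(query, document):
    """True if every bigram of the query also occurs in the document."""
    query_tokens = bigrams(query)
    doc_tokens = set(bigrams(document))
    return bool(query_tokens) and all(tok in doc_tokens for tok in query_tokens)


print(split_tags("写真、旅行\u3000東京"))
print(bigrams("東京タワー"))
print(matches("タワー", "東京タワーに行った"))
```

One limitation of this sketch: a single-character query never matches, because the document is indexed only as two-character tokens.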
The lead text followed by "..." is unpredictable when the text is in Japanese. Sometimes the last character is broken. More often there is no text at all, and the summary is just "...". Is this easy to fix?
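Both symptoms can be reproduced with a small Python sketch. The causes shown here are guesses, not confirmed from the Beaumont source: a broken last character is what you get when the summary is cut at a byte offset inside a multi-byte UTF-8 character, and an empty summary is what you get when the cut backs up to the last space and the text has no spaces. All three function names are hypothetical.

```python
def truncate_bytes(text, limit):
    """Guessed cause 1: cutting UTF-8 bytes can split a multi-byte character."""
    cut = text.encode("utf-8")[:limit]
    # errors="replace" turns the dangling partial character into U+FFFD.
    return cut.decode("utf-8", errors="replace") + "..."


def truncate_at_last_space(text, limit):
    """Guessed cause 2: backing up to the last space leaves nothing if none exists."""
    head = text[:limit]
    cut = head.rfind(" ")  # -1 when the text contains no space
    return head[:max(cut, 0)] + "..."


def truncate_chars(text, limit):
    """Possible fix: cut on characters (code points), never inside one."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


sample = "日本語のテキスト"
print(truncate_bytes(sample, 4))
print(truncate_at_last_space(sample, 5))
print(truncate_chars(sample, 3))
```

Cutting on code points avoids broken characters for plain Japanese text, though it can still split sequences made of several code points, such as a base letter plus a combining mark.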