Ableverse™
Platform 6.0

Package av.msg

This package defines the Ableverse Message Facility.


Class Summary
Message Represents a compiled message definition, used to produce text results according to particular argument values.
Messager The primary interface of the Ableverse Message Facility.
MsgManager Manages Messager instances.
 

Exception Summary
MsgException Thrown if any message-related error occurs.
 

Annotation Types Summary
Msg Denotes that a final String variable, together with its doc comment (which can be omitted when referencing another message so defined) and its constant value, is a Message Definition.
 

Package av.msg Description

This package defines the Ableverse Message Facility.

Internationalization (abbr. i18n; also known as Native Language Support, NLS) together with Localization (abbr. L10n) on the one hand, and Logging on the other, are two big topics in modern software development. The two are subtly related, yet almost all platforms and libraries today support them separately, so a developer has to combine the separate APIs himself to produce internationalized log records.

NLS enabled components are particularly appreciated (that's an open-source-correct term for 'required by corporate end-users' :-) for tooling and middleware components.

NLS internationalization SHOULD be strongly considered for fatal, error, warn, and info messages. It is generally considered optional for debug and trace messages.

Following the Programming by Nature aim of the Ableverse Platform, the Message Facility is an attempt to combine these two infrastructures in a natural way.


Internationalization

Unfortunately, computers were not initially invented for the whole world, nor did their inventors foresee how widely computer systems would spread across it. Because of this heritage, almost all general-purpose programming languages were designed to store and use textual resources statically; moreover, many of them cannot natively handle text in multiple encodings correctly. Locale sensitivity is entirely an add-on to a programming language, provided by libraries developed for it. Among the widely used platforms, Java™ may be the most i18n-ready: it represents strings natively in a 16-bit Unicode encoding, accepts source files in almost all standard encodings, supports non-ASCII identifiers, and stores string constants in UTF-8, the most embracing encoding ever standardized. Java also has a standardized internationalization mechanism and methodology, which has been widely used.

Localization is supported at the most basic level by the ResourceBundle class, which provides access to locale specific objects, including strings.

Unfortunately, this basic level of support from ResourceBundle is far from perfect by today's standards. Developers today seldom, if ever, use class-based ResourceBundles; properties-file-based PropertyResourceBundles are more commonly used because they make localization easier. There are even proprietary mechanisms for i18n and configuration that read properties files.

True, properties files can carry any Unicode text, since non-ASCII characters are stored in \uXXXX notation, which also makes the files portable across platforms. But editing such files directly has daunted every Java beginner assigned the job. Even many senior Java engineers are unaware of the native2ascii utility shipped with the standard JDK, the official tool for converting between natively editable text files and the properties file format.
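As a small illustration of the escaped notation, the sketch below feeds java.util.Properties a line in the form a translator would find in a properties file and reads back the decoded text. The class name, the key greeting and the sample value are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class EscapedPropertiesDemo {
    /** Parses properties text and returns the value stored under the key "greeting". */
    static String readGreeting(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // The file content holds two escaped code points instead of two Chinese characters.
        String fileContent = "greeting=\\u4F60\\u597D\n";
        String value = readGreeting(fileContent);
        // The loader decodes the escapes, so the value is two characters long.
        System.out.println("decoded length: " + value.length());
    }
}
```

The translator has to read and write the escaped form by hand unless a conversion tool such as native2ascii is used.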

And as software became involved in globalization, engineers found it incorrect to form a sentence simply by concatenating a sequence of localized words from ResourceBundles: many languages have their own word orders, which sometimes differ greatly from English. To solve this problem, MessageFormat was introduced to Java; it provides a means to produce concatenated messages in a language-neutral way. Did it work well?
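A minimal sketch of what MessageFormat offers: the same two arguments are placed in different orders by two patterns, so a translation is free to reorder them. The class name and both patterns are invented for illustration.

```java
import java.text.MessageFormat;

final class WordOrderDemo {
    // Pattern with the acting user first.
    static final String SUBJECT_FIRST = "{0} deleted {1}.";
    // Pattern for a phrasing (or language) that puts the object first.
    static final String OBJECT_FIRST = "{1} was deleted by {0}.";

    /** Formats the pattern with the user as argument 0 and the file as argument 1. */
    static String render(String pattern, String user, String file) {
        return MessageFormat.format(pattern, user, file);
    }

    public static void main(String[] args) {
        System.out.println(render(SUBJECT_FIRST, "Alice", "report.txt"));
        System.out.println(render(OBJECT_FIRST, "Alice", "report.txt"));
    }
}
```

Note that the pattern alone tells a translator nothing about what {0} and {1} stand for.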

The founder of Ableverse is Chinese and has seen plenty of poorly translated messages in American-made software localized for the Chinese market. Why is this still happening?

The key is an ignored fact: translators working in the L10n team need some basic communication with the programmers in the coding team, who write the code that consumes the i18n resources committed for translation. But in the real world, these two groups of authors of a piece of software are completely cut off from each other, in space by team organization and in time by the software development process. Worst of all, in a legacy i18n architecture no hint at all about a word to be translated can be transmitted from the programmer to the translator, who is left uncertain which of the word's many totally different meanings applies in that place. At most, the translator might consult an overall guidance document summarized for the entire project, but it tells very little. For example, without any particular clue, who can tell whether the word manual means a reference document or just by hand? This situation does not mean that perfect localization of a piece of software is impossible, but it is unjust to blame a translator for imperfect work, and therefore there is no way to guarantee the quality of translations. As a result, translators are implicitly permitted to commit blind translations of uncertain words, and a final collation phase has to be set up for corrections. This solution not only requires extra labour cost but also performs poorly.

Communication here does not necessarily mean in-person talks, interactive chats or the like; it means practical, efficient and official ways to transmit translation-related information. In fact, localization is normally the last phase just before the final production test. By that time the programmers may already have been assigned to other projects, and few programmers are pleased to be bothered with previously released code they wrote two months ago.

However, we have an excellent exemplar here: the javadoc mechanism, which is really a great invention. Just as blogs turn millions of net surfers into active authors, javadoc turns most Java coders, even the most impatient ones, into happy doc writers. It needs no context switch, offers rich but free forms of writing and in-place reference to the implementing code, and has compiler support for checking validity and completeness. What's more, the automatically generated HTML javadocs are excellently structured and cross-linked. All these easy-to-write, code-related doc segments are finally collected and built into great technical documents to be proud of. Attraction works better than enforcement.

As books are a great means of communication from authors to readers, javadoc is a great means of communication from library developers to API users. Inspired by the same idea, a new channel of communication from Java developers to message resource translators was conceived. Its carrier is the Ableverse Message Facility.

In the Ableverse Message Facility, the atomic unit of localization is a message declaration, in the form of a final String variable annotated with Msg and initialized to a constant string value. Most messages are message definitions themselves, each of which should have an associated javadoc doc comment giving its content. Some messages are just references to messages defined elsewhere; these have no doc comments, and their constant values are usually initialized by referencing the defining variables of the referenced messages. To get the message content, invoke Messager.format(Locale, String, Object[]) with the variable as msgID; it returns a locale-specific string formatted with the given arguments.

When Java source files are compiled, Ableverse extracts these messages to corresponding XML files. Localizers can then take the XML files as input and produce locale-specific XML files containing the same message definitions, translated. Finally, at runtime Messager reads these XML files and serves message contents in a locale-sensitive way. Because the extraction is executed automatically and implicitly, a programmer can just focus on his coding, writing message declarations and using them, all inside his Java source code. Before any translation, his program already runs well, displaying the messages declared in his source code. He doesn't need to concern himself with the generated XML files.

But implicitly and explicitly, he has passed much important information to the translators by defining and using a message in this way.

The implicit information is the definition and reference positions. While extracting message declarations from Java source code, Ableverse also records the positions in the source code where they are defined and referenced. A programmer is strongly encouraged to define and reference a message as close as possible to where it is used, to make the position information precise.

However, JSR 269: Pluggable Annotation Processing API, the standard that came with Java 6.0, does NOT model local variables yet. Even worse, it provides no means to obtain line number information for Java elements. Because of these limitations, the @Msg annotation processor depends on SUN's implementation of javac (the Compiler Tree API, in fact), and a message has to be defined as a final String field of a class. We look forward to JSR 269 and its successors taking local variables into account as well, or at least providing line number information in a pure Java way, so that we can produce more meaningful source position information for localizable messages.

However, Ableverse Platform provides an open source but unsupported patch to SUN javac, which can be downloaded from http://www.ableverse.org/opensource/javac6patch.zip . With this patch, you can define localizable messages as final String local variables, with the limitation that their location and msgID must be literal string constants (they can't be constant references). This patch is not officially supported; use it at your own risk.

While MessageFormat uses notations like {0} and {1} to identify arguments, the Ableverse Message Facility identifies arguments by name. This is far more expressive than MessageFormat; in addition, each message argument can have a default value and a detailed description.
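To make the contrast concrete, here is a minimal sketch of named arguments with default values next to MessageFormat's positional ones. The ${name} placeholder syntax, the class and the helper method are assumptions made only for this illustration; they are not the Ableverse declaration format, which is described in Declaration Format.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedArgumentsSketch {
    // Hypothetical placeholder syntax ${name}; NOT the Ableverse syntax.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} with the supplied argument, or with its default when none is supplied. */
    static String formatNamed(String template, Map<String, ?> args, Map<String, String> defaults) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            Object value = args.containsKey(name) ? args.get(name) : defaults.get(name);
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] ignored) {
        Map<String, Object> args = new HashMap<>();
        args.put("user", "Alice");
        Map<String, String> defaults = new HashMap<>();
        defaults.put("file", "a file");

        // Positional: the reader must know what {0} and {1} stand for.
        System.out.println(MessageFormat.format("{1} was deleted by {0}.", "Alice", "a file"));
        // Named: the template documents itself, and the missing "file" falls back to its default.
        System.out.println(formatNamed("${file} was deleted by ${user}.", args, defaults));
    }
}
```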

The main explicit information passed with a message definition is the argument list, with optional default values and detailed descriptions. Like the parameter information in the doc comment of a Java method, message definitions by convention carry argument descriptions, given through the doc tag @arg.

Anything more the programmer wants to tell the translators about translating the message content can be written as extra tagged descriptions. By convention, there can be a @usage tag with a detailed description of how the message will be used and of anticipated translation issues. Other tags are also allowed. In fact, in the corresponding XML file, every tag other than @arg is mapped to a child element node of the message's element.

See Declaration Format for more format information.
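Putting the pieces above together, a message definition and a reference to it might look roughly like the sketch below. Everything in it is an assumption for illustration: the Msg annotation is a local stand-in so that the code compiles on its own (the real av.msg.Msg may have different targets, retention and members), the class names, the msgID value and the tag texts are invented, the constant value is assumed to serve as the msgID, and the content line deliberately avoids guessing the argument placeholder syntax.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for av.msg.Msg; an assumption, not the real annotation.
@Retention(RetentionPolicy.SOURCE)
@Target(ElementType.FIELD)
@interface Msg {
}

final class TransferMessages {
    /**
     * The transfer has been completed.
     *
     * @arg amount  the transferred sum, already formatted as currency
     * @arg account the number of the receiving account
     * @usage Shown in the confirmation dialog right after a transfer;
     *        "transfer" means moving money, not moving a file.
     */
    @Msg
    static final String TRANSFER_DONE = "demo.transfer.done"; // hypothetical msgID
}

final class TransferDialog {
    // A reference to a message defined elsewhere: no doc comment of its own,
    // and the value is taken from the defining variable.
    @Msg
    static final String DONE = TransferMessages.TRANSFER_DONE;

    public static void main(String[] args) {
        // With the real facility, DONE would be passed as msgID to
        // Messager.format(Locale, String, Object[]); here we only print it.
        System.out.println(DONE);
    }
}
```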

XML files can be edited and stored in any valid encoding, provided that it is properly specified in the file header, so a careful translator can accomplish the translation work with just a plain text editor such as Windows™ Notepad or vi. Given that many mature XML editors are already on the market, developing an integrated translation system on top of standard XML schemas is also much easier. Whichever way the translation work is done, the rich information about messages is always available to the translator and is usually sufficient for him to provide proper translations. On the rare occasion that he gets stuck on a strange message, he can send a query with the position information to the programmer, who can painlessly locate the message usage by source file name and line number, easily recall the meaning of the message, and return it to the uncertain translator.

Finally, the hierarchical i18n resource lookup of Messager is far more flexible than that of ResourceBundle. While you can set only a single parent bundle for a ResourceBundle, Messager inherits the parent and near/far super bundles from MetaBundle, and text resource lookups follow the same search path as MetaBundle's.
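For comparison, the single-parent lookup of a plain ResourceBundle can be sketched as follows; the bundle classes and keys are invented for this example. A key missing from the child bundle is looked up in its one parent, and that chain is the only search path available.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

final class ParentChainDemo {
    /** Shared texts, acting as the parent bundle. */
    static final class BaseTexts extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "ok", "OK" }, { "cancel", "Cancel" } };
        }
    }

    /** Dialog-specific texts; overrides "ok" and inherits everything else. */
    static final class DialogTexts extends ListResourceBundle {
        DialogTexts(ResourceBundle parent) {
            setParent(parent); // a ResourceBundle has exactly one parent
        }

        @Override
        protected Object[][] getContents() {
            return new Object[][] { { "ok", "Apply" } };
        }
    }

    static ResourceBundle dialogBundle() {
        return new DialogTexts(new BaseTexts());
    }

    public static void main(String[] args) {
        ResourceBundle bundle = dialogBundle();
        System.out.println(bundle.getString("ok"));     // found in the child bundle
        System.out.println(bundle.getString("cancel")); // falls back to the parent
    }
}
```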

Logging

Since JRE 1.4, a new set of logging APIs has been available in package java.util.logging. While this one is brand new and commonly considered more 'standard', the most mature and most widely used logging framework is log4j.

But both of these logging frameworks are weak in file-based configuration: typically there can be only one configuration file for an entire JVM. This is no problem for standalone desktop applications, but in a J2EE environment it means that an application server that plans to export the logging service to its applications also has to define its own mechanism for the applications to set their logging configurations. An application programmer has learnt how to configure the loggers, but oops?! None of that applies to his web application, and he has to dig into the application server's documentation or even its source code to see how to make his debug log appear. When different deployers or administrators happen to be tuning different webapps on the same server, they might unwittingly overwrite each other's modifications to the single config file for global logging control. Even a popular framework like Jakarta Commons Logging clearly declares that "JCL provides only a bridge for writing log messages. It does not (and will not) support any sort of configuration API for the underlying logging system".

Yes, every logging framework offers really flexible configuration through APIs. But logging configuration is more a matter for deployers or administrators, who are not programmers themselves and reasonably hate writing even a single line of Java code to manage a running application server. Configuring logging options should be operations rather than programming.

But why must all loggers' configuration be limited to a single file? The answer may be that these frameworks lack a hierarchical configuration system. The Ableverse Meta Facility is just such a mechanism, and its hierarchy maps perfectly onto the named logger hierarchy. This was the initial motivation for adding logging support to the Message Facility. Logging options are therefore configured through the Meta Facility, and naturally the configurations are inheritable along the logger hierarchy. A particular logger can have its own logging configuration defined in its meta resource file or just inherit its parent's configuration. It also benefits from the hot-reload feature of the Meta Facility in a long-running application server. See Logging Configuration for more information.
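The named logger hierarchy referred to above already exists in plain java.util.logging, and a short sketch shows how a setting is inherited along it. This uses the JRE API directly rather than the Meta Facility, and the logger names are made up for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggerHierarchyDemo {
    // Keep strong references: the LogManager only holds loggers weakly.
    static final Logger APP = Logger.getLogger("demo.app");
    static final Logger DB = Logger.getLogger("demo.app.db");

    static {
        // Configure only the parent; the child has no level of its own.
        APP.setLevel(Level.FINE);
    }

    public static void main(String[] args) {
        System.out.println("child's own level: " + DB.getLevel());           // not set on the child
        System.out.println("child logs FINE:   " + DB.isLoggable(Level.FINE));  // inherited from demo.app
        System.out.println("child logs FINER:  " + DB.isLoggable(Level.FINER)); // below the inherited level
    }
}
```

What the JRE lacks is a way to spread such settings over several configuration files that follow the same hierarchy.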

Once the logging service is attached to the Messager, logs become i18n-ready automatically and naturally. What's more surprising, the code guards used to avoid logging performance penalties become obsolete, so your code gets neater and less bug-prone (since the if clause and braces are omitted), while the performance penalties are avoided just as effectively.

Code guards are typically used to guard code that only needs to execute in support of logging and that otherwise introduces undesirable runtime overhead in the general case (logging disabled). Examples are multiple parameters, or expressions (e.g. string + " more") for parameters. Use the guard methods of the form log.is<Priority>() to verify that logging should be performed before incurring the overhead of the logging method call. Yes, the logging methods will perform the same check, but only after resolving parameters.
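The effect of a code guard can be sketched with java.util.logging, whose guard method is isLoggable(Level). The class, the logger name and the expensiveDump helper are invented for the example; the counter shows that the unguarded call pays for building its argument even though nothing is logged.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class CodeGuardDemo {
    static final Logger LOG = Logger.getLogger("demo.guard");
    static int expensiveCalls = 0;

    /** Stands in for costly argument construction and counts how often it runs. */
    static String expensiveDump() {
        expensiveCalls++;
        return "state@" + System.nanoTime();
    }

    /** No guard: the argument is built before fine() can check the level. */
    static void unguarded() {
        LOG.fine("dump: " + expensiveDump());
    }

    /** Guarded: the argument is built only if FINE is actually loggable. */
    static void guarded() {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("dump: " + expensiveDump());
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is disabled
        unguarded();
        System.out.println("after unguarded: " + expensiveCalls);
        guarded();
        System.out.println("after guarded:   " + expensiveCalls);
    }
}
```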

Logging with a Messager defers content construction to message formatting, which is performed only after it is confirmed that the log will actually be written. But you need to follow the convention of not calculating complex message arguments for logging. Instead, the calculation can be moved into the message definition, which allows dynamic expressions in a scripting language in the Message Content Declaration Format.
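The same deferral can be observed in the underlying java.util.logging API, on which the facility stands. In the sketch below (plain JRE calls only, with invented class and logger names, and not the Messager API itself), the pattern and its arguments travel unformatted in the LogRecord; text is produced only when a Formatter is asked for it, and no record is created at all when the level is disabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

final class DeferredFormattingDemo {
    /** Collects records instead of writing them anywhere. */
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<LogRecord>();

        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logs one FINE message with two arguments and returns whatever reached the handler. */
    static List<LogRecord> logLogin(Level loggerLevel, String user, String host) {
        Logger log = Logger.getLogger("demo.deferred");
        log.setUseParentHandlers(false); // keep the console quiet
        log.setLevel(loggerLevel);
        CapturingHandler handler = new CapturingHandler();
        log.addHandler(handler);
        try {
            // No string is built here; pattern and arguments are stored as they are.
            log.log(Level.FINE, "User {0} logged in from {1}", new Object[] { user, host });
        } finally {
            log.removeHandler(handler);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        System.out.println("records at INFO: " + logLogin(Level.INFO, "alice", "host1").size());
        LogRecord record = logLogin(Level.FINE, "alice", "host1").get(0);
        System.out.println("stored pattern:  " + record.getMessage());
        // Formatting happens only now, on request.
        System.out.println("formatted text:  " + new SimpleFormatter().formatMessage(record));
    }
}
```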

What's more, if you need to eliminate the logging overhead entirely but don't want to compile your source code into separate debug and release versions, you are in luck on a 1.4 or later JVM. Since 1.4 the assertion mechanism has been available, and you can write pieces of your code as assertion statements, so that they are not executed unless assertions are enabled for the enclosing class. To help you benefit from this mechanism, all logging methods of Messager constantly return true, so the assertion silently passes when enabled. So you can write code like:

       assert msgr.finest("This message may consume much time to construct.");
 
where the logging call, including the construction of its arguments, is skipped entirely when assertions are disabled for your class (assertions are disabled by default in normal execution). This effectively gets rid of the logging overhead in production, while logging remains available without recompiling the program.
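The behaviour can be checked with a small self-contained sketch. StubMessager is a hypothetical stand-in for Messager that only counts calls and returns true; the real class is not used here. Run it once normally and once with java -ea to see the difference.

```java
final class AssertLoggingSketch {
    /** Hypothetical stand-in for Messager: counts calls and always returns true. */
    static final class StubMessager {
        int calls = 0;

        boolean finest(String text) {
            calls++;
            return true; // lets the surrounding assert pass silently
        }
    }

    /** Returns true only when assertions are enabled for this class. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment runs only when assertions are enabled
        return enabled;
    }

    /** Logs through an assert statement and reports how many calls really happened. */
    static int callsMadeThroughAssert() {
        StubMessager msgr = new StubMessager();
        assert msgr.finest("This message may consume much time to construct.");
        return msgr.calls;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("logging calls made: " + callsMadeThroughAssert());
    }
}
```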

And to be maximally standards-compliant, the logging support of the Message Facility stands on the JRE 1.4 logging infrastructure. Only the Logger is substituted by Messager and the LogManager by MsgManager; the remaining underlying components (Handler, Formatter, LogRecord) are just the same as in JRE 1.4 logging. This also means fewer features than log4j has; the positive side is fewer dependencies. Anyway, maturity will increase under the force of SUN. See Logging Configuration for how to gear Messagers with JRE 1.4 logging handlers and formatters.



Copyright© 2006 Ableverse Platform. All rights reserved.