Wednesday, January 13, 2010

ElixirFM 1.1 Update + Wiki + API

The ElixirFM Functional Arabic Morphology project has released an update of its libraries, executables, data, and documentation at SourceForge.

The current version 1.1.927 improves the performance of the system and comes with enhanced user and programming interfaces. In addition to the ElixirFM Online Interface, the project now features:

ElixirFM Wiki
documentation for the project is now available, with information for computational linguists and developers who would like to explore the ElixirFM system more deeply and use it in their applications
ElixirFM API
a programming interface for Perl lets you invoke the elixir executable from your code and then parse and process the results
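
For readers who work outside Perl, the general idea of calling the executable from code can be sketched in Python. This is only an illustration under stated assumptions, not the ElixirFM API: the function name `run_mode` and its parameters are hypothetical, and the sketch assumes that the executable takes the mode name (resolve, inflect, derive, lookup) as its first argument and reads its input from standard input. Please check the Wiki for the actual calling conventions.

```python
import subprocess


def run_mode(mode, text, command_prefix=("elixir",)):
    """Pipe `text` to `<command_prefix> <mode>` and return the standard output.

    Assumptions (not confirmed by the announcement above): the mode name is
    the first command-line argument and the input is read from stdin.
    """
    completed = subprocess.run(
        [*command_prefix, mode],
        input=text,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

The `command_prefix` parameter makes it possible to point the sketch at a full path to the executable, or at a stand-in program when testing. Parsing the returned text is left out, since the output format is not described here.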

The ElixirFM lexicon has been extended and refined, and a number of words have been encoded in a way that makes their deep word structure more explicit. The sources of the lexicon plus the editing software are available freely upon request.

ElixirFM now operates more smoothly in all its modes. In particular, the resolve mode now prunes its solutions, and its morphological analyses comply with most linguistic constraints. Likewise, the online inflect and derive modes have been integrated with lookup, which makes word form generation more intuitive and more enjoyable.

ElixirFM is published under the GNU General Public License, version 3 (GNU GPL 3). Everyone is welcome to participate in this project!

Tuesday, March 3, 2009

ElixirFM 1.1 Online Interface

In recent months, the ElixirFM project has improved considerably in various respects. We have worked mostly on developing the programming library and on refining the lexicon. On top of these essential components, we have built a user-friendly web application, the ElixirFM 1.1 Online Interface.

ElixirFM is a computational model of the morphology of Modern Written Arabic. It provides the user with four different modes of operation, in addition to the unique lexical resource and the other open-source functions of the implementation.

Resolve
provides tokenization and morphological analysis of the text you enter, even if you omit some symbols or misspell something. You can enter the text not only in the original script and orthography, but also in other notations, including a purely phonetic transcription.
Inflect
lets you inflect words into the forms required by context. You only need to define the grammatical parameters of the expected word forms. You can either enter natural language descriptions, or you can specify the parameters using the positional morphological tags.
Derive
lets you derive words of similar meaning but different grammatical category. You only need to specify the desired grammatical categories, using either natural language descriptions or the positional morphological tags.
Lookup
looks up lexical entries by their citation form and nests of entries by their root. You can even search the dictionary using English.
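
To give an idea of what entering text "in other notations" can mean, here is a small Python sketch that converts between Arabic script and the Buckwalter transliteration, one widely known ASCII notation. It is not taken from ElixirFM, and nothing above states which notations the system accepts or how it converts them; the names `from_buckwalter` and `to_buckwalter` are our own.

```python
# Buckwalter symbols listed in Unicode code point order for three ranges
# of the Arabic block: letters, tatweel plus letters and diacritics,
# and the two symbols at U+0670 and U+0671.
_RANGES = (
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),
    (0x0640, "_fqklmnhwYyFNKaui~o"),
    (0x0670, "`{"),
)

BUCKWALTER_TO_ARABIC = {
    symbol: chr(start + offset)
    for start, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}
ARABIC_TO_BUCKWALTER = {arabic: symbol for symbol, arabic in BUCKWALTER_TO_ARABIC.items()}


def from_buckwalter(text):
    """Replace every Buckwalter symbol by its Arabic character; keep the rest."""
    return "".join(BUCKWALTER_TO_ARABIC.get(ch, ch) for ch in text)


def to_buckwalter(text):
    """Replace every mapped Arabic character by its Buckwalter symbol."""
    return "".join(ARABIC_TO_BUCKWALTER.get(ch, ch) for ch in text)
```

For example, `from_buckwalter("kitAb")` yields the vocalized word for "book", and `to_buckwalter` maps it back to the same ASCII string. Characters outside the table, such as spaces and digits, pass through unchanged.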

The online interface includes example queries for each of the modes. It further incorporates several interactive tools to facilitate the browsing of the results returned by the system.

Information on the programming libraries and the research context of the project is partly available in our papers. We would nevertheless like to extend the documentation according to the needs of the users, and would be happy to discuss any unclear points with anyone interested.

ElixirFM is published under the GNU General Public License, version 3 (GNU GPL 3). Everyone is welcome to participate in this project!

Enjoy ... and let us know in case of questions or comments :)

Wednesday, July 9, 2008

SourceForge Projects

The SourceForge open-source software repository offers a number of projects related to computational processing of Arabic:

ElixirFM
High-level implementation of Functional Arabic Morphology
Encode Arabic
Implementations for encodings of Arabic, in Haskell and Perl
AraMorph
Buckwalter Arabic morphological analyzer
Arabic WordNet
Multi-lingual concept dictionary mapping word senses in Arabic to those in the English Princeton WordNet
Sarf
Arabic morphology system that can generate and inflect Arabic verbs, derivative nouns, and gerunds
Arabic Spellchecker Word Lists
Arabic word list for spell checkers

Users can register with SourceForge and subscribe to the monitoring service of any project in order to receive notifications of updates.

Friday, May 2, 2008

A Word on the Million Words

Work on the new PADT 2.0 is now in progress. The recent developments are described in our submission to the LREC 2008 Workshop on Arabic & Local Languages:

Prague Arabic Dependency Treebank: A Word on the Million Words
[paper]

According to the paper, PADT 2.0 is expected to include these annotations:

PADT 2.0     Corpus  Fun. Morphology  Dep. Syntax  Tectogrammatics  Notes
Total     1,095,610        1,281,858    1,001,908           30,894  merged annotations
Prague      328,240          383,482      282,252           30,894  original annotations
Penn        767,370          898,376      719,656                -  converted annotations

Prague       Corpus  Fun. Morphology  Dep. Syntax  Tectogrammatics  Notes
AEP          99,360          116,717      116,717            9,690  Arabic English Parallel News
EAT          48,371           55,097       55,097           13,934  English-Arabic Treebank
ASB          11,881           14,254       14,254                -  Arabic Gigaword
NHR          21,445           25,329       12,613                -  Arabic Gigaword
HYT          85,683          100,537       41,855            5,228  Arabic Gigaword
XIN          61,500           71,548       41,716            2,042  Arabic Gigaword

Penn         Corpus  Fun. Morphology  Dep. Syntax  Tectogrammatics  Notes
1v3         151,546          172,386      172,386                -  Penn Arabic Treebank 1v3
2v2         141,515          161,217      161,217                -  Penn Arabic Treebank 2v2
3v2         335,250          394,466      394,466                -  Penn Arabic Treebank 3v2
4v1         149,784          178,720            -                -  Penn Arabic Treebank 4v1
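
As a quick sanity check of how the figures above relate to each other, the following Python sketch verifies two relations: the Total row is the column-wise sum of the Prague and Penn rows, and the Prague row is the column-wise sum of its six subcorpora. The variable and function names are our own, and treating cells without a figure as zero is an assumption made for the arithmetic.

```python
# Each tuple: (corpus, functional morphology, dependency syntax, tectogrammatics).
# Cells without a figure in the table are entered as 0 (an assumption).
TOTAL = (1_095_610, 1_281_858, 1_001_908, 30_894)
PRAGUE = (328_240, 383_482, 282_252, 30_894)
PENN = (767_370, 898_376, 719_656, 0)

PRAGUE_PARTS = {
    "AEP": (99_360, 116_717, 116_717, 9_690),
    "EAT": (48_371, 55_097, 55_097, 13_934),
    "ASB": (11_881, 14_254, 14_254, 0),
    "NHR": (21_445, 25_329, 12_613, 0),
    "HYT": (85_683, 100_537, 41_855, 5_228),
    "XIN": (61_500, 71_548, 41_716, 2_042),
}


def column_sums(rows):
    """Add up the given rows column by column."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    print(column_sums([PRAGUE, PENN]) == TOTAL)
    print(column_sums(PRAGUE_PARTS.values()) == PRAGUE)
```

Both comparisons hold for the figures listed in the table.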

Your suggestions and comments are very welcome. Thank you.

Friday, October 12, 2007

Resolve Online Interface

The resolve function of ElixirFM has been made accessible via this online interface.

You can enter Arabic words in various notations, including the original orthography and the most popular transliterations. Symbols for vowels or diacritics can be omitted. The words are analyzed for their inflectional features as well as their morphological structure.
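
What "diacritics can be omitted" means for matching can be illustrated with a short Python sketch. It is a simplified stand-in, not the method ElixirFM uses: it only tests whether a typed word could be obtained from a fully vocalized form by leaving out some diacritics. The names `DIACRITICS` and `is_compatible` are our own, and the set of diacritics is limited to U+064B through U+0652 plus the superscript alef U+0670.

```python
# Arabic diacritics considered optional in this sketch:
# fathatan .. sukun (U+064B..U+0652) and superscript alef (U+0670).
DIACRITICS = frozenset(chr(code) for code in range(0x064B, 0x0653)) | {"\u0670"}


def is_compatible(typed, vocalized):
    """Return True if `typed` equals `vocalized` with some diacritics left out.

    Letters must match exactly and in order; any diacritic of `vocalized`
    may be missing from `typed`, but `typed` may not add or change one.
    """
    position = 0
    for ch in vocalized:
        if position < len(typed) and typed[position] == ch:
            position += 1          # character was typed as is
        elif ch not in DIACRITICS:
            return False           # a letter is missing or different
        # otherwise: an omitted diacritic, which is allowed
    return position == len(typed)
```

With the vocalized form of "kitAb" as the reference, both the bare consonantal spelling and a partly vocalized spelling are accepted, while a spelling with a different vowel on the first letter is rejected.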

Example requests are provided.

Enjoy ... and let us know in case of questions or comments :)