LMNL data model

From LMNLWiki

This page describes the abstract LMNL data model. It is still subject to revision, although it needs to be finalised. If you want a concrete model with mutators and all, see LOM instead.

A LMNL data model is a collection of model objects of the following six types: Document, Name, Limen, Range, Annotation and Atom.


A Document has two properties:

Two Documents are equal iff their properties are equal.


A Name has two properties:

Two Names are equal iff their properties are equal.


A Limen (pl. Limina) has three properties:

The ranges property is unordered.

Two Limina are equal iff they are the same Limen.
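The two notions of equality used on this page (equality of properties, and being the same object) can be sketched as follows. This is only an illustration: the field names first and second on Name are hypothetical placeholders, since the actual property lists are not reproduced here, and the Limen class carries only the content and ranges properties mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Name:
    # Hypothetical placeholder fields; the real two properties of a Name
    # are not listed on this page.
    first: str
    second: str
    # The generated __eq__ compares fields, so two Names are equal
    # iff their properties are equal.


@dataclass(eq=False)
class Limen:
    # eq=False keeps Python's default identity comparison, so two Limina
    # are equal iff they are the same Limen.
    content: List[object] = field(default_factory=list)
    ranges: List[object] = field(default_factory=list)


same_props = Name("a", "b") == Name("a", "b")   # equal by properties
distinct_limina = Limen() == Limen()            # different objects, not equal
```

A Limen compared with itself is still equal, because the comparison falls back to object identity.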

In a data model derived directly from a document in LMNL syntax, there is one Limen for the Document and one for each Annotation.

Additional Limina may be created; if Limen A is owned by Limen B, then the content property of Limen A is derived from the ranges property of Limen B by applying a selection function that picks out a subset of the ranges and an ordering function that gives them an order. These functions are specific to Limen A and need not be the same as those used for any other Limen.
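The derivation just described can be sketched as a filter followed by a sort. This is a minimal sketch under stated assumptions: ranges are modelled as hypothetical (name, start, end) tuples, the function name derive_ordered_ranges is invented for illustration, and the sketch stops at the ordered list of ranges because the page does not say how that list is turned into the content of Limen A.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a Range: (name, start, end).
RangeT = Tuple[str, int, int]


def derive_ordered_ranges(
    owner_ranges: Iterable[RangeT],
    select: Callable[[RangeT], bool],
    order_key: Callable[[RangeT], object],
) -> List[RangeT]:
    """Apply a selection function, then an ordering function."""
    # Selection: pick out a subset of the owner's (unordered) ranges.
    chosen = [r for r in owner_ranges if select(r)]
    # Ordering: give the selected ranges an order.
    return sorted(chosen, key=order_key)


# The ranges property of Limen B is unordered, so a set is used here.
owner_ranges = {("verse", 0, 10), ("line", 0, 4), ("line", 5, 10)}

# Functions specific to one derived Limen: keep "line" ranges, order by start.
lines = derive_ordered_ranges(
    owner_ranges,
    select=lambda r: r[0] == "line",
    order_key=lambda r: r[1],
)
```

Another Limen owned by the same Limen B could pass different select and order_key functions and obtain a different sequence from the same ranges.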


A Range has five properties:

The start and end of a Range are constrained to lie between zero (indicating the point before the first Atom in its owner's content) and the length of the owner's content (indicating the point after the last Atom in that content), inclusive. The end of a Range must be greater than or equal to its start.
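These constraints amount to a single chained inequality, 0 <= start <= end <= length of the owner's content. A minimal sketch of such a check follows; the function name check_range_bounds and the choice of ValueError are assumptions made for illustration, not part of the model.

```python
def check_range_bounds(start: int, end: int, content_length: int) -> None:
    """Raise ValueError unless 0 <= start <= end <= content_length."""
    if not (0 <= start <= end <= content_length):
        raise ValueError(
            f"invalid range [{start}, {end}] for content of length {content_length}"
        )


# A zero-length range at the very end of the content is allowed.
check_range_bounds(5, 5, 5)
```

Note that start and end are positions between Atoms rather than indexes of Atoms, which is why the value equal to the content length is valid.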

The annotations of a Range appear in the same order in which they appear in the document, with no distinction made between annotations in the start-tag and annotations in the end-tag.

Two Ranges are equal iff all their properties are equal.

The length of a Range is its end minus its start. The value of a Range is the sequence of Atoms in the content of the owner of the Range falling between the start and end points. These are derived properties.
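Both derived properties can be sketched with ordinary sequence slicing, assuming the owner's content is held in a Python sequence of Atoms. The function names range_length and range_value are hypothetical, and single-character strings stand in for Atoms here.

```python
from typing import Sequence, TypeVar

A = TypeVar("A")  # stands in for the Atom type


def range_length(start: int, end: int) -> int:
    """Length of a Range: its end minus its start."""
    return end - start


def range_value(content: Sequence[A], start: int, end: int) -> Sequence[A]:
    """Atoms of the owner's content falling between the start and end points."""
    # Start and end are points between Atoms, so a half-open slice
    # picks out exactly the Atoms lying between them.
    return content[start:end]


content = list("hello world")        # characters standing in for Atoms
value = range_value(content, 6, 11)  # the Atoms spelling "world"
length = range_length(6, 11)
```

A Range whose start equals its end has length zero and an empty value.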

See also Range relationships.


An Annotation has four properties:

Two Annotations are equal iff they are the same Annotation.


An Atom has three properties:

If an Atom has the name lmnl:char and contains a single Annotation named codepoint, then the value of that annotation is interpreted as an integer expressed in hexadecimal digits, and the Atom represents a Unicode character with that codepoint. Note that these digits are themselves Atoms, and so this model contains an infinite regress. The practical LOM API avoids this problem.
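The interpretation rule for lmnl:char can be sketched as follows. To sidestep the regress, this sketch assumes the value of the codepoint annotation is already available as a plain string of hexadecimal digits; that representation, the mapping from annotation names to values, and the function name atom_to_char are all assumptions for illustration and do not describe the LOM API.

```python
from typing import Mapping, Optional


def atom_to_char(name: str, annotations: Mapping[str, str]) -> Optional[str]:
    """Return the Unicode character for an lmnl:char Atom, else None.

    `annotations` maps annotation names to values given as hex digit strings.
    """
    if name != "lmnl:char":
        return None  # other Atoms have application-defined meaning
    if set(annotations) != {"codepoint"}:
        return None  # must contain a single Annotation named codepoint
    # Interpret the value as an integer expressed in hexadecimal digits.
    return chr(int(annotations["codepoint"], 16))


letter = atom_to_char("lmnl:char", {"codepoint": "41"})
```

Here letter is the character with codepoint hexadecimal 41, the capital letter A.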

All other Atoms have application-defined meaning.

Two Atoms are equal iff their properties are equal.