This is the first of two introductory posts about my project supported by my Shuttleworth Flash Grant. Rationale and motivation here, vague/concrete plan next.
Mathematics or not, I am convinced of the value of capturing scholarly documents in a structured form. If you wish to have a variety of outputs for your writing, such as print, PDF, HTML web pages, and e-books, then it is a must.
Consider two very different scenarios. First, when you apply a bold typeface to a word, what do you really mean? Is it emphasis, or perhaps a defined term? Especially in an electronic medium (see the output formats in the previous paragraph), you might wish to handle the two situations differently. The emphasis could be bold in print, or colored red in an electronic version. The defined term might be a hyperlink, a pop-up window, or a knowl in the electronic version. You might begin to see that authoring in a “word processor” is the exact opposite of what I am advocating.
Second, suppose you write in LaTeX. How do you know a subsubsection has ended? Only when you encounter the start of another subsubsection, or the start of a new subsection, or the start of a new section, or the start of the chapter exercises, or the start of a new chapter, or the start of a new part (a major subdivision of a book), or the end of the book. It can be a nightmare to process all these conditions in an automated way (see the output formats listed above), because LaTeX does not require a marker for the end of subdivisions like this. Many people like to say LaTeX separates structure from presentation, but you can very quickly see that this is a false promise, and one that authors routinely violate.
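To make the bookkeeping concrete, here is a minimal Python sketch of what a converter has to do just to discover where divisions end. It is only an illustration, not a tool I use: the function name implied_events is made up, the pattern handles only simple titles without nested braces, and it ignores special cases such as chapter exercises or the appendix.

```python
import re

# Nesting depth of each LaTeX division command (smaller number = outer level).
LEVELS = {"part": 0, "chapter": 1, "section": 2, "subsection": 3, "subsubsection": 4}

# Assumption: titles contain no nested braces; real LaTeX titles often do.
HEADING = re.compile(r"\\(part|chapter|section|subsection|subsubsection)\*?\{([^}]*)\}")


def implied_events(latex_source):
    """Return ('open' | 'close', kind, title) tuples, inferring every close."""
    events = []
    open_stack = []  # divisions currently open, outermost first
    for match in HEADING.finditer(latex_source):
        kind, title = match.group(1), match.group(2)
        # A new heading silently ends every open division at the same or a deeper level.
        while open_stack and LEVELS[open_stack[-1][0]] >= LEVELS[kind]:
            events.append(("close",) + open_stack.pop())
        open_stack.append((kind, title))
        events.append(("open", kind, title))
    # The end of the document ends whatever is still open.
    while open_stack:
        events.append(("close",) + open_stack.pop())
    return events


sample = r"\chapter{A} \section{B} \subsection{C} \subsubsection{D} \section{E}"
for event in implied_events(sample):
    print(event)
```

In this sample the single command for section E closes three divisions at once, and nothing in the source says so. Every tool that reads the LaTeX has to rediscover that fact for itself.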
XML (eXtensible Markup Language) is an extremely simple language for describing the structure of text. It is hierarchical (a tree-like structure), and a begin is always accompanied by an end, as in any structured programming language. The vocabulary (“elements” and “attributes”) can be anything you like. The output can be any format. For example, you can create LaTeX as output (and then use standard programs to convert to PDF). The power comes through XSL (eXtensible Stylesheet Language), whose transformation language is called XSLT (T=Transformations). This is not an easy language to understand or learn. But an author need only write with an accepted vocabulary and let an existing transform do the conversion. The hard technical details of the output language can be captured in the transform. A set of elements, their relationships, and the resulting transforms are together called an XML application. (X)HTML is an example of an XML vocabulary.
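As a rough illustration of one source feeding two outputs, the following Python sketch takes a sentence marked up with a made-up vocabulary (p, em and term are hypothetical element names, not a proposed standard) and renders it as both LaTeX and HTML. A real project would use an XSL transform; here Python's standard xml.etree.ElementTree module stands in for it, and the rule tables, the glossary link, and the choice of bold for a defined term are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Structured source: the markup says what each phrase IS, not how it looks.
SOURCE = "<p>A <term>vector space</term> is <em>not</em> just a set.</p>"

# Hypothetical per-format rules: the text emitted when each element opens and closes.
LATEX = {"p": ("", "\n"), "em": (r"\emph{", "}"), "term": (r"\textbf{", "}")}
HTML = {
    "p": ("<p>", "</p>"),
    "em": ("<em>", "</em>"),
    "term": ('<a class="term" href="#glossary">', "</a>"),
}


def render(element, rules, escape_text=lambda s: s):
    """Walk the element tree and emit the output described by `rules`."""
    start, end = rules[element.tag]
    parts = [start, escape_text(element.text or "")]
    for child in element:
        parts.append(render(child, rules, escape_text))
        # Text that follows a child element belongs to the parent.
        parts.append(escape_text(child.tail or ""))
    parts.append(end)
    return "".join(parts)


root = ET.fromstring(SOURCE)
latex = render(root, LATEX)          # no LaTeX escaping in this sketch
html = render(root, HTML, escape)    # escape &, <, > in text for HTML
print(latex)
print(html)
```

The author writes the sentence once and only the rule table changes per output format. Because every begin needs an end, a source with a missing closing tag is rejected by the parser (ET.fromstring raises ET.ParseError) instead of being silently misread.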
Case in point. In the summer and fall of 2012, I converted the semi-structured LaTeX source of my linear algebra textbook to an ad-hoc set of tags. The resulting XSL transformation produced the online version. None of this is very reusable, and I have not released the transforms publicly. They should not interest most authors. The source should be of interest (it is not perfect, but it is reasonable), and the outputs, such as the online and print versions, should demonstrate the power of XML source converting to two very different formats via an XSL transform.
So I see real utility in creating an XML application for authors of writings about mathematics, with a special interest in supporting writing about computational tools, specifically Sage. Any author could adopt this approach, but a mathematician who can write in LaTeX already has the mindset to use a markup language rather than a word processor. More in the next post about design requirements and my plans. If you cannot wait, head over to the project page.