NOTE:
This project is no longer being maintained: it was developed for my master's thesis, which was completed in early 1997. I still, however, welcome any questions or comments that people may have.


iHTML Architecture

Introduction

This section provides an overview of the iHTML system, its goals, and its basic design. It should be read by anyone who wishes to write a language module, incorporate the iHTML system into a browser, or gain a basic understanding of the underlying iHTML architecture. It also introduces many of the concepts used in the rest of the manuals.

The iHTML architecture is described by a set of C language header files. These header files define the low-level interface between various components in the system, whose implementations are otherwise hidden (they operate as black boxes).

Architecture Overview

The figure below is a high-level view of the iHTML architecture:

As shown, there are three main components to the system:

iHTML Library: This is the central regulator of the system, which controls the interaction between a web browser and the available back-end language modules. It is the only component that is actually implemented in the system itself. For other components, only the interface is defined; their implementations are left completely to the discretion of the component implementors.
Language Modules: A language module implements one or more MIME content types, telling the rest of the system how to display or otherwise handle data of those types. Many different language modules can be available, each handling different types of data, and the system loads them dynamically as needed. A language module is very similar to a traditional Netscape-style plug-in, except that its interface to the rest of the browser has been extended to support various types of client-side scripting.
Browser: This component is the traditional Web browser. The iHTML architecture defines a standard interface between it and the rest of the system, but otherwise treats it as a black box; thus the "Browser-iHTML Glue" division represents the code used to map between this black-box implementation and the interface defined by iHTML. This includes both the functions that must be supplied by the browser component and the calls it must make into the iHTML Library component in order for the system to operate correctly.

The arrows in the figure represent the various possible ways that control flow can move through the system. The browser component regulates all interaction with the user, while the iHTML library component provides the public interface to the back-end language modules.

Content Types and Language Modules

The iHTML system must have some way to map between the MIME content type of files it receives and the language module(s) that can understand that data. There are two steps in this mapping: converting a content type into the name of the language that can handle it, and determining where that named language module can be found in the computer's file system.

The system does not directly define where the modules are located. Instead, it specifies that each module must be mapped to a unique name, leaving it up to the browser component to map these abstract names to the underlying file system. This is accomplished through a set of browser component routines (BR_GetBasePath() and BR_ParsePath()) that are called with the name of a language, and return a set of directories that make up that language's search path. When constructing this path, the browser can also take into account other information, such as the machine architecture it is running under (to support multiple architectures in the same file space). The system then looks through these directories for a shared library that is the language module it needs.
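The directory search itself can be sketched in C. This is a minimal sketch, assuming the browser's BR_GetBasePath() and BR_ParsePath() calls have already produced the language's search path as an array of directory strings (their real signatures are defined in the iHTML headers and are not reproduced here), and assuming the module's shared library has a fixed file name such as impl.so. The function find_module() is hypothetical; it simply checks each directory in order with the POSIX access() call.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not part of the iHTML headers: search a
 * language's directories in order and store the first readable
 * "<dir>/<libname>" in out.  Returns 0 if found, -1 otherwise;
 * on failure the contents of out are unspecified. */
int find_module(const char *const dirs[], size_t ndirs,
                const char *libname, char *out, size_t outsize)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outsize, "%s/%s", dirs[i], libname);
        if (n < 0 || (size_t)n >= outsize)
            continue;               /* path would not fit; try next dir */
        if (access(out, R_OK) == 0) /* a readable library exists here */
            return 0;
    }
    return -1;
}
```

Because the directories are searched in order, an earlier entry wins; a browser could use that to put an architecture-specific directory (such as python/aix) ahead of any shared one.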

The library can also ask for a search path for the 'NULL' language. This is the top-level path, in which all the languages are defined. When first initializing, the library requests this path, and looks for files that contain mappings between language names and content types. There is usually one of these files for each language in the system. Some examples of the files are:

A language module that displays MPEG movies would have a content type mapping file similar to:
```
Name: mpeg
Content-type: video/x-mpeg
Content-type: video/mpeg
```
This tells the browser to look in the path of a language named "mpeg" for a language module that can display any file encountered with one of the given content types.
Similarly, a language module that displays textual data can be defined as:
```
Name: text
Content-type: text/*
```
Here, the system interprets the "*" as matching any subtype, the same syntax that is used by metamail.
Last is an example of a language module for a full scripting language:
```
Name: python
Content-type: application/x-ihtml; language=python
```
This definition tells the browser to map all files with a content type of "application/x-ihtml" that also include a parameter named "language" whose value is "python", to the python module.
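The matching rules illustrated by these three files can be sketched as follows. This is a minimal sketch, assuming a pattern is a type/subtype pair whose subtype may be "*", optionally followed by ";name=value" parameters that must all be present, with the same values, in the incoming content type. Comparing type and parameter names case-insensitively follows MIME conventions; comparing values exactly and ignoring quoted strings and comments are simplifications of this sketch. The function type_matches() is hypothetical, not part of the iHTML interface.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive comparison of two counted strings. */
static bool ci_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return false;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return true;
}

/* Shrink the counted string [*s, *s + *len) to drop surrounding blanks. */
static void trim(const char **s, size_t *len)
{
    while (*len > 0 && isspace((unsigned char)**s)) {
        (*s)++;
        (*len)--;
    }
    while (*len > 0 && isspace((unsigned char)(*s)[*len - 1]))
        (*len)--;
}

/* Find the next ";name=value" parameter at or after *cur.  Returns true
 * and fills in the trimmed name and value, or false when none remain. */
static bool next_param(const char **cur, const char **name, size_t *nlen,
                       const char **value, size_t *vlen)
{
    while (*cur != NULL) {
        const char *seg = strchr(*cur, ';');
        if (seg == NULL) {
            *cur = NULL;
            break;
        }
        seg++;
        size_t seglen = strcspn(seg, ";");
        *cur = seg + seglen;               /* next ';' or the final '\0' */
        const char *eq = memchr(seg, '=', seglen);
        if (eq == NULL)
            continue;                      /* skip a segment with no '=' */
        *name = seg;
        *nlen = (size_t)(eq - seg);
        *value = eq + 1;
        *vlen = seglen - *nlen - 1;
        trim(name, nlen);
        trim(value, vlen);
        return true;
    }
    return false;
}

/* True if ctype carries a parameter with this name and exact value. */
static bool has_param(const char *ctype, const char *name, size_t nlen,
                      const char *value, size_t vlen)
{
    const char *cur = ctype, *n, *v;
    size_t nl, vl;
    while (next_param(&cur, &n, &nl, &v, &vl))
        if (ci_equal(n, nl, name, nlen) && vl == vlen && memcmp(v, value, vl) == 0)
            return true;
    return false;
}

/* Hypothetical: does a received content type match a .lang pattern? */
bool type_matches(const char *pattern, const char *ctype)
{
    const char *pt = pattern, *ct = ctype;
    size_t ptl = strcspn(pattern, ";"), ctl = strcspn(ctype, ";");
    trim(&pt, &ptl);
    trim(&ct, &ctl);

    const char *ps = memchr(pt, '/', ptl);
    const char *cs = memchr(ct, '/', ctl);
    if (ps == NULL || cs == NULL)
        return false;
    size_t pmaj = (size_t)(ps - pt), cmaj = (size_t)(cs - ct);
    size_t psub = ptl - pmaj - 1, csub = ctl - cmaj - 1;
    if (!ci_equal(pt, pmaj, ct, cmaj))
        return false;                      /* top-level types differ */
    if (!(psub == 1 && ps[1] == '*') && !ci_equal(ps + 1, psub, cs + 1, csub))
        return false;                      /* "*" accepts any subtype */

    /* Every parameter named in the pattern must appear in ctype. */
    const char *cur = pattern, *n, *v;
    size_t nl, vl;
    while (next_param(&cur, &n, &nl, &v, &vl))
        if (!has_param(ctype, n, nl, v, vl))
            return false;
    return true;
}
```

For example, type_matches("text/*", "text/html") is true, and so is type_matches("application/x-ihtml; language=python", "application/x-ihtml; charset=us-ascii; language=python"), while an application/x-ihtml document with no language parameter does not match the python entry.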

As a concrete example, the file system organization of iHTML language modules under Unix will usually look something like this:

-rw-------   1 hackbod  grads         64 Sep 30 23:34 python.lang
drwx------   4 hackbod  grads       1024 Sep 30 23:34 python/
-rw-------   1 hackbod  grads       1094 Sep 30 23:34 python/COPYRIGHT.python
drwx------   2 hackbod  grads       1024 Sep 30 23:34 python/hp-ux
-rwx------   1 hackbod  grads    1207916 Sep 30 23:34 python/hp-ux/impl.sl*
drwx------   2 hackbod  grads       1024 Sep 30 23:34 python/aix
-rwx------   1 hackbod  grads    1485342 Sep 30 23:34 python/aix/impl.so*
-rw-------   1 hackbod  grads         32 Sep 30 23:34 text.lang
drwx------   4 hackbod  grads       1024 Sep 30 23:34 text/
-rw-------   1 hackbod  grads       1094 Sep 30 23:34 text/COPYRIGHT.text
drwx------   2 hackbod  grads       1024 Sep 30 23:34 text/hp-ux
-rwx------   1 hackbod  grads     102680 Sep 30 23:34 text/hp-ux/impl.sl*
drwx------   2 hackbod  grads       1024 Sep 30 23:34 text/aix
-rwx------   1 hackbod  grads     134320 Sep 30 23:34 text/aix/impl.so*

This file system has two language modules, python and text, both with implementations (impl.sl and impl.so) available for HP/UX and AIX machines. The files python.lang and text.lang describe the content types these languages can handle.
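Reading one of these .lang files can be sketched as below. This is a minimal sketch, assuming the file has already been read into a string and that only the "Name:" and "Content-type:" fields shown earlier matter; the struct lang_entry, its fixed size limits, and parse_lang() are hypothetical. Field names are matched exactly as written in the examples, and any other line is ignored.

```c
#include <ctype.h>
#include <string.h>

#define MAX_TYPES 16   /* arbitrary limits chosen for this sketch */
#define FIELD_LEN 128

/* Hypothetical in-memory form of one .lang file. */
struct lang_entry {
    char name[FIELD_LEN];              /* value of "Name:" */
    char types[MAX_TYPES][FIELD_LEN];  /* each "Content-type:" value */
    int ntypes;
};

/* Copy [s, s + len) into dst without surrounding blanks (incl. '\r'). */
static void copy_trimmed(char *dst, const char *s, size_t len)
{
    while (len > 0 && isspace((unsigned char)*s)) {
        s++;
        len--;
    }
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    if (len >= FIELD_LEN)
        len = FIELD_LEN - 1;           /* truncate over-long values */
    memcpy(dst, s, len);
    dst[len] = '\0';
}

/* Parse the text of a .lang file.  Returns 0 if a name was found. */
int parse_lang(const char *text, struct lang_entry *out)
{
    out->name[0] = '\0';
    out->ntypes = 0;
    while (*text != '\0') {
        size_t len = strcspn(text, "\n");          /* length of this line */
        const char *colon = memchr(text, ':', len);
        if (colon != NULL) {
            size_t klen = (size_t)(colon - text);
            const char *val = colon + 1;
            size_t vlen = len - klen - 1;
            if (klen == 4 && strncmp(text, "Name", 4) == 0)
                copy_trimmed(out->name, val, vlen);
            else if (klen == 12 && strncmp(text, "Content-type", 12) == 0
                     && out->ntypes < MAX_TYPES)
                copy_trimmed(out->types[out->ntypes++], val, vlen);
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return out->name[0] != '\0' ? 0 : -1;
}
```

Each parsed entry then pairs a language name (the directory to search) with the patterns that select it.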

HTML Syntax

Scripts and plug-ins are associated with a document through the World Wide Web Consortium's recently defined <OBJECT> tag. Its DTD, showing only the attributes implemented by the iHTML system, is as follows:

<!ENTITY % Context "(document|module|applet)">

<!ELEMENT object - - (param | %bodytext)*>
<!ATTLIST object
        %attrs      -- id, class, style, lang, dir --
        classid  %URL     #IMPLIED   -- identifies an implementation --
        data     %URL     #IMPLIED   -- reference to object's data --
        type     CDATA    #IMPLIED   -- Internet media type for data --
        codetype CDATA    #IMPLIED   -- Internet media type for code --
        context  %Context #IMPLIED   -- context object executes in --
        height   %Length  #IMPLIED   -- suggested height --
        width    %Length  #IMPLIED   -- suggested width --
        name     %URL     #IMPLIED   -- submit as part of form --
        >

The data and classid attributes are used together to determine what object to display. The system first looks at the data attribute. If it is given, the system retrieves the document it references and uses that document's content type to map it to a language module. If no language module handles that type, or the data attribute is not supplied, the system instead retrieves the document referenced by the classid URL and executes the language module that handles its type.

This allows an object to be embedded in a document as:

<OBJECT data="myanim.mpg" type="video/mpeg"
        classid="plaympeg.py"
        codetype="application/x-ihtml; language=python">
</OBJECT>

When encountering this, the system first retrieves "myanim.mpg" and tries to find a language module that can play the MPEG animation. If no such language module exists, it falls back on retrieving the "plaympeg.py" script and hands the MPEG data to that script for display.
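The fallback order can be sketched as a small decision function. This is a minimal sketch, assuming the content types of the data and classid documents are already known (NULL when the attribute is absent) and that module lookup is an exact match against a list of handled types; in the real system the lookup goes through the .lang mappings, and the classid document is only retrieved when the data attribute cannot be handled. The names object_source, have_module(), and resolve_object() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Which attribute of an <OBJECT> ends up selecting the module. */
enum object_source { SRC_NONE, SRC_DATA, SRC_CLASSID };

/* Hypothetical stand-in for the real content-type-to-language lookup:
 * an exact match against a list of handled types. */
static int have_module(const char *ctype, const char *const handled[], size_t n)
{
    if (ctype == NULL)
        return 0;                      /* attribute was not supplied */
    for (size_t i = 0; i < n; i++)
        if (strcmp(ctype, handled[i]) == 0)
            return 1;
    return 0;
}

/* data_type and classid_type are the content types of the documents the
 * two attributes reference, or NULL if an attribute is absent. */
enum object_source resolve_object(const char *data_type, const char *classid_type,
                                  const char *const handled[], size_t n)
{
    if (have_module(data_type, handled, n))
        return SRC_DATA;               /* data attribute is tried first */
    if (have_module(classid_type, handled, n))
        return SRC_CLASSID;            /* fall back on the classid URL */
    return SRC_NONE;
}
```

With the example above, if no module handles video/mpeg but one handles the python type, resolve_object() returns SRC_CLASSID, and the system would pass the MPEG data to the plaympeg.py script.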

Note that because iHTML unifies traditional "data" and "program" file types, there is very little difference between the data and classid attributes, apart from the fallback case described above. The data attribute can refer to a file that is actually a script, and the classid attribute can point to a file type that is traditionally considered pure data (e.g., an MPEG animation). In either case, the system performs the same actions: it looks up and executes the back-end language module that handles the given file type.

Finally, iHTML introduces one extension to the <OBJECT> tag, the context attribute. Its use is discussed in the next section.

Applets vs. Document Scripts

The iHTML system distinguishes between two classes of client-side scripts:

Applets are essentially familiar Java-style applets, which appear as graphical objects embedded in a document. An example of writing an <OBJECT> to create such an applet was shown in the previous section. Because iHTML does not distinguish between programs and data files, an applet in iHTML can take the form of either an executed program or a data type handled like a traditional plug-in.
Document Scripts are similar in concept to Netscape JavaScript-style programs, in that they execute in the context of an entire document rather than being embedded inside it. They differ from JavaScript in a number of details, however.

Document scripts appear as <OBJECT> tags within the head of a document. In addition, the new context attribute identifies the tag as referring to an object that should be executed as a document script:
```
<OBJECT context=document classid="watch.py"
        codetype="application/x-ihtml; language=python">
</OBJECT>
```
This markup works identically to the applet-style object in how the system maps it to a language module. The only difference is that, when it is handed to the language module to be executed, the system marks it as belonging to the entire document, so the module executes it in that context.

There is little difference between a script executed as a document script or as an applet; in both cases, they execute in the same basic language module, have the same general interface to the browser, and can perform many of the same operations. The only real difference is the high-level handle they use to interface with the browser. A document script directly interfaces with the browser document, while an applet is given its own graphical context inside the document through which it interacts.
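The role of the context attribute can be sketched as follows. This is a minimal sketch, assuming the attribute value is compared case-insensitively, as HTML does for enumerated attribute values, and assuming that an <OBJECT> without a context attribute is treated as an applet, which this section does not state. The enum ih_context, struct ih_handle, parse_context(), and make_handle() are hypothetical; the "module" context is not described in this section, so the sketch gives it no target.

```c
#include <ctype.h>
#include <stddef.h>

/* Values of the context attribute, from the %Context entity in the DTD. */
enum ih_context { CTX_INVALID, CTX_DOCUMENT, CTX_MODULE, CTX_APPLET };

/* Case-insensitive string equality. */
static int ci_streq(const char *a, const char *b)
{
    while (*a != '\0' && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical: map the attribute value (NULL if absent) to a context. */
enum ih_context parse_context(const char *value)
{
    if (value == NULL)
        return CTX_APPLET;             /* assumed default for this sketch */
    if (ci_streq(value, "document"))
        return CTX_DOCUMENT;
    if (ci_streq(value, "module"))
        return CTX_MODULE;
    if (ci_streq(value, "applet"))
        return CTX_APPLET;
    return CTX_INVALID;
}

/* Hypothetical high-level handle passed to the language module. */
struct ih_handle {
    enum ih_context context;
    void *target;                      /* opaque document or applet area */
};

struct ih_handle make_handle(enum ih_context ctx, void *document, void *widget)
{
    struct ih_handle h = { ctx, NULL };
    if (ctx == CTX_DOCUMENT)
        h.target = document;           /* script works on the whole document */
    else if (ctx == CTX_APPLET)
        h.target = widget;             /* applet gets its own graphical area */
    return h;                          /* module/invalid: no target here */
}
```

The language module itself would be the same in both cases; only the handle it receives differs.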

HTML Parse Trees

In order to support many of the operations expected of document scripts -- particularly the ability to dynamically create document text -- the iHTML system provides an abstract interface to the underlying browser's HTML parse tree representation.

Every HTML document corresponds to a well-defined tree structure. As an example, consider the following document:

<HTML>
  <HEAD>
    <TITLE>Example Document</TITLE>
  </HEAD>
  <BODY>
    <H1>Example Title</H1>
    <P>Example paragraph.</P>
    <HR>
    <ADDRESS>Example address</ADDRESS>
  </BODY>
</HTML>

When parsed into its internal representation, this document becomes a tree of nodes, where <HTML> is the root of the tree. A standard representation of such a document is:

Or, in the more familiar top-down tree form, it would appear as:

This tree structure is visible to the iHTML system through a black-box data type called the HTMLNode and a set of functions for manipulating it. This provides a well-defined representation of the document, allowing a document script and other programs executing in the system not only to dynamically construct a document by creating new nodes, but also to go back and examine and modify existing HTML documents.
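To make the idea concrete, here is one possible tree representation in C. It is a minimal sketch only: in iHTML, HTMLNode is a black box reached through library functions, so the stand-in type SketchNode, its layout, and the helpers node_new(), node_append(), node_count(), node_free(), and build_example() are all hypothetical. The sketch builds the example document shown earlier.

```c
#include <stdlib.h>

/* Hypothetical stand-in for HTMLNode, using first-child/next-sibling links. */
typedef struct SketchNode {
    const char *tag;                   /* element name, or NULL for text */
    const char *text;                  /* character data of a text node */
    struct SketchNode *parent;
    struct SketchNode *first_child;
    struct SketchNode *next_sibling;
} SketchNode;

/* Allocate a node; abort on out-of-memory to keep the sketch short. */
SketchNode *node_new(const char *tag, const char *text)
{
    SketchNode *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->tag = tag;
    n->text = text;
    return n;
}

/* Make child the last child of parent and return child. */
SketchNode *node_append(SketchNode *parent, SketchNode *child)
{
    SketchNode **link = &parent->first_child;
    while (*link != NULL)
        link = &(*link)->next_sibling;
    *link = child;
    child->parent = parent;
    return child;
}

/* Count the nodes in the subtree rooted at n. */
size_t node_count(const SketchNode *n)
{
    size_t total = 1;
    for (const SketchNode *c = n->first_child; c != NULL; c = c->next_sibling)
        total += node_count(c);
    return total;
}

/* Free n and everything below it. */
void node_free(SketchNode *n)
{
    SketchNode *c = n->first_child;
    while (c != NULL) {
        SketchNode *next = c->next_sibling;
        node_free(c);
        c = next;
    }
    free(n);
}

/* Build the tree for the example document. */
SketchNode *build_example(void)
{
    SketchNode *html = node_new("HTML", NULL);
    SketchNode *head = node_append(html, node_new("HEAD", NULL));
    SketchNode *title = node_append(head, node_new("TITLE", NULL));
    node_append(title, node_new(NULL, "Example Document"));

    SketchNode *body = node_append(html, node_new("BODY", NULL));
    node_append(node_append(body, node_new("H1", NULL)), node_new(NULL, "Example Title"));
    node_append(node_append(body, node_new("P", NULL)), node_new(NULL, "Example paragraph."));
    node_append(body, node_new("HR", NULL));
    node_append(node_append(body, node_new("ADDRESS", NULL)), node_new(NULL, "Example address"));
    return html;
}
```

The resulting tree has 12 nodes (eight elements and four text nodes). A document script could then append, say, a new <P> under BODY with node_append(); the browser's actual parse tree may be laid out quite differently.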

Symbol Names

There are three main classes of symbols that iHTML defines in the global C language name space:

Browser Component: The interface to the browser is defined through a set of functions and types that are prefixed with "BR_" and "IH", respectively. Examples are "BR_Reformat()" and "IHWidgetRep". (Note, however, that there are a few exceptions in the type names, so watch out.)
Library Component: The interface to the iHTML library is defined through a set of functions and types that are prefixed with "IH_" and "IH", respectively. In addition, actual structure names use the prefix "ih_". Examples are "IH_AllocBuffer()", "IHModuleInfo", and "ih_module_info_rec".
Language Component: The types used by a language are prefixed with "IH". The functional interface to the language modules is through a structure, which has its own name space; these function names do not use any prefix.
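The "no prefix" rule for language module functions follows from calling them through a table of function pointers. The sketch below is hypothetical throughout: the iHTML headers define the real structure and its fields, which are not reproduced here, so the type IHLangFuncs, its fields, the toy module, and run_script() are invented for illustration. Because the module's implementations are static and reached only through the table, their names never enter the global C name space.

```c
#include <stddef.h>

/* Hypothetical function table; its field names need no prefix because
 * they live in the structure's own name space. */
typedef struct IHLangFuncs {
    const char *name;
    int  (*init)(void);
    int  (*run)(const char *data, size_t len);
    void (*shutdown)(void);
} IHLangFuncs;

/* A toy module: the implementations are static, so none of these
 * names are visible outside this file. */
static int run_count;

static int toy_init(void) { run_count = 0; return 0; }

static int toy_run(const char *data, size_t len)
{
    (void)data;                 /* a real module would interpret data */
    run_count++;
    return (int)len;            /* stand-in result: bytes "executed" */
}

static void toy_shutdown(void) { }

/* The one object the module makes visible to the library. */
const IHLangFuncs toy_language = { "toy", toy_init, toy_run, toy_shutdown };

/* Library side: calls go through the table, never by symbol name. */
int run_script(const IHLangFuncs *lang, const char *data, size_t len)
{
    if (lang->init() != 0)
        return -1;
    int result = lang->run(data, len);
    lang->shutdown();
    return result;
}
```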



Dianne Kyra Hackborn <hackbod@angryredplanet.com>
Last modified: Sun Oct 27 19:39:19 PST 1996

This web page and all material contained herein is Copyright (c) 1997 Dianne Hackborn, unless otherwise noted. All rights reserved.