This section provides an overview of the iHTML system, its goals, and basic design. It should be read by anyone wishing to write a language module, incorporate the iHTML system into a browser, or have a basic understanding of the underlying iHTML architecture. It also provides an introduction to many of the concepts in the rest of the manuals.
The iHTML architecture is described by a set of C language header files. These header files define the low-level interface between various components in the system, whose implementations are otherwise hidden (they operate as black boxes).
The figure below is a high-level view of the iHTML architecture:
As shown, there are three main components to the system:
The arrows in the figure represent the various possible ways that control flow can move through the system. The browser component regulates all interaction with the user, while the iHTML library component provides the public interface to the back-end language modules.
The iHTML system must have some way to map between the MIME content type of files it receives and the language module(s) that can understand that data. There are two steps in this mapping: converting a content type into the name of the language that can handle it, and determining where that named language module can be found in the computer's file system.
The system does not directly define where the modules are located. Instead, it specifies that each module must be mapped to a unique name, leaving it up to the browser component to map these abstract names to the underlying file system. This is accomplished through a set of browser component routines (BR_GetBasePath() and BR_ParsePath()) that are called with the name of a language, and return a set of directories that make up that language's search path. When constructing this path, the browser can also take into account other information, such as the machine architecture it is running under (to support multiple architectures in the same file space). The system then looks through these directories for a shared library that is the language module it needs.
The library can also ask for a search path for the 'NULL' language. This is the top-level path, in which all the languages are defined. When first initializing, the library requests this path, and looks for files that contain mappings between language names and content types. There is usually one of these files for each language in the system. Some examples of the files are:
A language module that displays MPEG movies would have a content type mapping file similar to:
Name: mpeg
Content-type: video/x-mpeg
Content-type: video/mpeg
This tells the browser to look in the path of a language named "mpeg" for a language-module that can display any files encountered with one of the given content types.
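As an illustration of how such a mapping file might be read, the following is a minimal sketch rather than the iHTML library's actual parser. It assumes the file holds one "Name:" or "Content-type:" field per line; the LangMapping type and the parse_mapping() function are hypothetical names introduced only for this example.

```c
#include <string.h>

#define MAX_TYPES 8
#define MAX_FIELD 128

/* Hypothetical in-memory form of one mapping file; not an iHTML type. */
typedef struct {
    char name[MAX_FIELD];
    char types[MAX_TYPES][MAX_FIELD];
    int  num_types;
} LangMapping;

/* Copy the value that follows "Key:" on a line, skipping leading blanks
   and stopping at the end of the line. Long values are truncated. */
static void copy_value(char *dst, const char *src)
{
    size_t len;
    while (*src == ' ' || *src == '\t') src++;
    len = strcspn(src, "\r\n");
    if (len >= MAX_FIELD) len = MAX_FIELD - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse newline-separated "Name:" and "Content-type:" lines from text.
   Returns 0 on success, -1 if no Name line was found. */
int parse_mapping(const char *text, LangMapping *out)
{
    const char *line = text;
    out->name[0] = '\0';
    out->num_types = 0;
    while (*line != '\0') {
        if (strncmp(line, "Name:", 5) == 0) {
            copy_value(out->name, line + 5);
        } else if (strncmp(line, "Content-type:", 13) == 0 &&
                   out->num_types < MAX_TYPES) {
            copy_value(out->types[out->num_types++], line + 13);
        }
        line += strcspn(line, "\n");   /* advance to the end of this line */
        if (*line == '\n') line++;     /* and step over the newline       */
    }
    return out->name[0] != '\0' ? 0 : -1;
}
```

Given the MPEG mapping above, this sketch would record the name "mpeg" and two content types.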
Similarly, a language module that displays textual data can be defined as:
Name: text
Content-type: text/*
Here, the system interprets the "*" as matching anything, the same syntax that is used by metamail.
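The wildcard rule can be sketched as a small comparison function. This is an illustrative assumption about how the match could be written, not the library's code; type_matches() is a hypothetical name, and the comparison is case-sensitive for simplicity.

```c
#include <string.h>

/* Return 1 if content type `type` matches `pattern`, where a "*" subtype
   in the pattern (as in "text/*") matches any subtype; otherwise 0. */
int type_matches(const char *pattern, const char *type)
{
    size_t plen = strlen(pattern);
    if (plen >= 2 && strcmp(pattern + plen - 2, "/*") == 0) {
        /* Compare only the major type and the slash, e.g. "text/". */
        return strncmp(pattern, type, plen - 1) == 0;
    }
    return strcmp(pattern, type) == 0;
}
```

With this sketch, the pattern "text/*" accepts "text/html" and "text/plain" but rejects "video/mpeg".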
Last is an example of a language module for a full scripting language:
Name: python
Content-type: application/x-ihtml; language=python
This definition tells the browser to map all files with a content type of "application/x-ihtml" that also include a parameter named "language" whose value is "python", to the python module.
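Matching on a parameter requires pulling the parameter's value out of the content type string. The function below is a minimal sketch under simplifying assumptions: parameters are separated by ";", values are unquoted, and names are compared case-sensitively. get_param() is a hypothetical helper, not part of the iHTML interface.

```c
#include <string.h>

/* Find parameter `name` in a content type such as
   "application/x-ihtml; language=python" and copy its value into buf.
   Returns 1 if found and copied, 0 otherwise. Quoted values are not handled. */
int get_param(const char *ctype, const char *name, char *buf, size_t size)
{
    size_t nlen = strlen(name);
    const char *p = strchr(ctype, ';');
    while (p != NULL) {
        p++;                                   /* skip the ';'  */
        while (*p == ' ' || *p == '\t') p++;   /* skip blanks   */
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=') {
            size_t vlen;
            p += nlen + 1;                     /* start of value */
            vlen = strcspn(p, "; \t");
            if (vlen >= size) return 0;        /* value would not fit */
            memcpy(buf, p, vlen);
            buf[vlen] = '\0';
            return 1;
        }
        p = strchr(p, ';');                    /* try the next parameter */
    }
    return 0;
}
```

A caller could then compare the extracted "language" value against the name in each mapping file to select the module.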
As a concrete example, the file system organization of iHTML language modules under Unix will usually look something like this:
-rw-------   1 hackbod  grads       64 Sep 30 23:34 python.lang
drwx------   4 hackbod  grads     1024 Sep 30 23:34 python/
-rw-------   1 hackbod  grads     1094 Sep 30 23:34 python/COPYRIGHT.python
drwx------   2 hackbod  grads     1024 Sep 30 23:34 python/hp-ux
-rwx------   1 hackbod  grads  1207916 Sep 30 23:34 python/hp-ux/impl.sl*
drwx------   2 hackbod  grads     1024 Sep 30 23:34 python/aix
-rwx------   1 hackbod  grads  1485342 Sep 30 23:34 python/aix/impl.so*
-rw-------   1 hackbod  grads       32 Sep 30 23:34 text.lang
drwx------   4 hackbod  grads     1024 Sep 30 23:34 text/
-rw-------   1 hackbod  grads     1094 Sep 30 23:34 text/COPYRIGHT.text
drwx------   2 hackbod  grads     1024 Sep 30 23:34 text/hp-ux
-rwx------   1 hackbod  grads   102680 Sep 30 23:34 text/hp-ux/impl.sl*
drwx------   2 hackbod  grads     1024 Sep 30 23:34 text/aix
-rwx------   1 hackbod  grads   134320 Sep 30 23:34 text/aix/impl.so*
This file system has two language modules, python and text, both with implementations (impl.sl and impl.so) available for HP/UX and AIX machines. The files python.lang and text.lang describe the content types these languages can handle.
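Under a layout like this one, the location of a module's shared library follows from the top-level path, the language name, and the machine architecture. The sketch below shows one way a browser might assemble such a file name. It does not reproduce the real BR_GetBasePath() or BR_ParsePath() interfaces; module_path() and the base directory used in the usage note are assumptions made for illustration.

```c
#include <stdio.h>

/* Build "<base>/<lang>/<arch>/impl.<ext>" into buf.
   Returns 0 on success, -1 if the result does not fit in size bytes. */
int module_path(char *buf, size_t size, const char *base,
                const char *lang, const char *arch, const char *ext)
{
    int n = snprintf(buf, size, "%s/%s/%s/impl.%s", base, lang, arch, ext);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, with a hypothetical base directory of "/usr/lib/ihtml", the language "python", the architecture "hp-ux", and the extension "sl", the result would name python/hp-ux/impl.sl beneath that base.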
Scripts and plug-ins are associated with a document through the World Wide Web Consortium's recently defined <OBJECT> tag. Its DTD, showing only the attributes implemented by the iHTML system, is as follows:
<!ENTITY % Context "(document|module|applet)">

<!ELEMENT object - - (param | %bodytext)*>
<!ATTLIST object
    %attrs                        -- id, class, style, lang, dir --
    classid   %URL     #IMPLIED   -- identifies an implementation --
    data      %URL     #IMPLIED   -- reference to object's data --
    type      CDATA    #IMPLIED   -- Internet media type for data --
    codetype  CDATA    #IMPLIED   -- Internet media type for code --
    context   %Context #IMPLIED   -- context object executes in --
    height    %Length  #IMPLIED   -- suggested height --
    width     %Length  #IMPLIED   -- suggested width --
    name      %URL     #IMPLIED   -- submit as part of form --
    >
The data and classid attributes are used together to determine what object to display. The system first looks at the data attribute. If given, the system retrieves the document it references, and uses its type to map it to a language module. If there is no language module to handle that type, or the data attribute is not supplied, the system then moves on to retrieve the classid URL, and execute a language module for it.
This allows an object to be embedded in a document as:
<OBJECT data="myanim.mpg" type="video/mpeg" classid="plaympeg.py" classtype="application/x-ihtml; language=python">
When encountering this, the system will first retrieve "myanim.mpg" and try to find a language module to play the MPEG animation. If no such language module exists, it will then fall back on retrieving the "plaympeg.py" script and hand the MPEG data off to it to be displayed.
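The fallback order can be sketched as a small decision function. This is an illustrative simplification: the real system determines the type by retrieving the document, whereas here the types are passed in directly. The names choose_source(), ObjectSource, HasModuleFn, and only_python() are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup callback: returns nonzero if some language module
   is registered for the given content type. */
typedef int (*HasModuleFn)(const char *content_type);

typedef enum { USE_NONE, USE_DATA, USE_CLASSID } ObjectSource;

/* Decide which attribute of an <OBJECT> supplies the object to execute.
   Any string argument may be NULL when the attribute is absent. */
ObjectSource choose_source(const char *data, const char *data_type,
                           const char *classid, const char *class_type,
                           HasModuleFn has_module)
{
    /* First choice: a language module that handles the data directly. */
    if (data != NULL && data_type != NULL && has_module(data_type))
        return USE_DATA;
    /* Fallback: a language module that can execute the classid URL. */
    if (classid != NULL && class_type != NULL && has_module(class_type))
        return USE_CLASSID;
    return USE_NONE;
}

/* Example lookup for illustration: only a python module is installed. */
int only_python(const char *content_type)
{
    return strcmp(content_type, "application/x-ihtml; language=python") == 0;
}
```

On a system where only the python module is installed, the object above would resolve to USE_CLASSID, because no module claims "video/mpeg".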
Note that because of how iHTML unifies traditional "data" and "program" file types, there is very little difference between the use of the data and classid attributes, except for the case described above. The data attribute can refer to a file that is actually a script, and the classid attribute can point to a file type that is traditionally considered to be pure data (e.g., an MPEG animation). In either case, the system will still perform the same actions of looking up and executing a back-end language module that handles the given file type.
Finally, iHTML introduces one extension to the <OBJECT> tag, the context attribute. Its use will be discussed in the next section.
The iHTML system distinguishes between two classes of client-side scripts:
Applets are essentially the familiar Java-style applets, which appear as graphical objects embedded in a document. An example of writing an <OBJECT> to create such an applet was shown in the previous section. An applet in iHTML can actually take the form of an executed program, or a data type like the traditional plug-in, as iHTML does not distinguish between these two types of files.
Document Scripts are similar in concept to Netscape JavaScript-style programs, in that they execute in the context of an entire document, rather than being embedded inside it. They differ from JavaScript in a number of details, however.
These scripts appear as an <OBJECT> tag within the header of a document. In addition, the new context attribute is used to identify the tag as referring to an object that should be executed as a document script:
<OBJECT context=document classid="watch.py" classtype="application/x-ihtml; language=python">
This markup works identically to the applet-style object in how the system maps it to a language module. The only difference is that, when it is handed to the language module to be executed, the system marks it as belonging to the entire document, so the module executes it in that context.
There is little difference between a script executed as a document script or as an applet; in both cases, they execute in the same basic language module, have the same general interface to the browser, and can perform many of the same operations. The only real difference is the high-level handle they use to interface with the browser. A document script directly interfaces with the browser document, while an applet is given its own graphical context inside the document through which it interacts.
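A browser needs to turn the context attribute's value into something the language module can act on. The following sketch maps the three values allowed by the DTD onto an enumeration. The ObjectContext type and parse_context() are hypothetical names, the comparison is case-sensitive for simplicity, and an absent attribute is reported separately so the caller can apply its own default.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    CONTEXT_UNSPECIFIED,  /* attribute absent from the tag        */
    CONTEXT_DOCUMENT,
    CONTEXT_MODULE,
    CONTEXT_APPLET,
    CONTEXT_INVALID       /* value not among those in the DTD     */
} ObjectContext;

/* Map the context attribute's value (or NULL if absent) to an enum. */
ObjectContext parse_context(const char *value)
{
    if (value == NULL)                  return CONTEXT_UNSPECIFIED;
    if (strcmp(value, "document") == 0) return CONTEXT_DOCUMENT;
    if (strcmp(value, "module") == 0)   return CONTEXT_MODULE;
    if (strcmp(value, "applet") == 0)   return CONTEXT_APPLET;
    return CONTEXT_INVALID;
}
```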
In order to support many of the operations expected of document scripts -- particularly the ability to dynamically create document text -- the iHTML system provides an abstract interface to the underlying browser's HTML parse tree representation.
Every HTML document corresponds to a well-defined tree structure. As an example, consider the following document:
<HTML>
<HEAD>
<TITLE>Example Document</TITLE>
</HEAD>
<BODY>
<H1>Example Title</H1>
<P>Example paragraph.</P>
<HR>
<ADDRESS>Example address</ADDRESS>
</BODY>
</HTML>
When parsed into its internal representation, this document becomes a tree of nodes, where <HTML> is the root of the tree. A standard representation of such a document is:
Or, in the more familiar top-down tree form, it would appear as:
This tree structure is visible to the iHTML system through a black-box data type called the HTMLNode and a set of functions for manipulating it. This provides a well-defined representation of the document, allowing a document script and other programs executing in the system not only to dynamically construct a document by creating new nodes, but also to go back and examine and modify existing HTML documents.
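To make the tree structure concrete, the sketch below builds the example document as a tree of nodes and counts them. It is not the HTMLNode interface, which is a black box; DemoNode and its functions are hypothetical stand-ins, and text content is modeled as separate leaf nodes, which is an assumption about the representation.

```c
#include <stdlib.h>

typedef struct DemoNode {
    const char *tag;                 /* element name; NULL for a text node   */
    const char *text;                /* character data; NULL for an element  */
    struct DemoNode *first_child;
    struct DemoNode *last_child;
    struct DemoNode *next_sibling;
} DemoNode;

/* Allocate a node; returns NULL if memory is exhausted. */
DemoNode *node_new(const char *tag, const char *text)
{
    DemoNode *n = calloc(1, sizeof *n);
    if (n != NULL) { n->tag = tag; n->text = text; }
    return n;
}

/* Append child as the last child of parent and return the child. */
DemoNode *node_append(DemoNode *parent, DemoNode *child)
{
    if (parent->last_child != NULL) parent->last_child->next_sibling = child;
    else                            parent->first_child = child;
    parent->last_child = child;
    return child;
}

/* Count the nodes in the subtree rooted at n, including n itself. */
int node_count(const DemoNode *n)
{
    int total = 1;
    const DemoNode *c;
    for (c = n->first_child; c != NULL; c = c->next_sibling)
        total += node_count(c);
    return total;
}

/* Release a subtree. */
void node_free(DemoNode *n)
{
    DemoNode *c = n->first_child;
    while (c != NULL) {
        DemoNode *next = c->next_sibling;
        node_free(c);
        c = next;
    }
    free(n);
}

/* Build the tree for the example document (allocation checks omitted
   for brevity; a real implementation should test every node_new()). */
DemoNode *build_example(void)
{
    DemoNode *html = node_new("HTML", NULL);
    DemoNode *head = node_append(html, node_new("HEAD", NULL));
    DemoNode *body = node_append(html, node_new("BODY", NULL));
    node_append(node_append(head, node_new("TITLE", NULL)),
                node_new(NULL, "Example Document"));
    node_append(node_append(body, node_new("H1", NULL)),
                node_new(NULL, "Example Title"));
    node_append(node_append(body, node_new("P", NULL)),
                node_new(NULL, "Example paragraph."));
    node_append(body, node_new("HR", NULL));
    node_append(node_append(body, node_new("ADDRESS", NULL)),
                node_new(NULL, "Example address"));
    return html;
}
```

A document script that adds a paragraph would, in this model, call node_append() on the BODY node with a new P element.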
There are three main classes of symbols that iHTML defines in the global C language name space:
Browser component: function names and type names use the prefixes "BR_" and "IH", respectively. Examples are "BR_Reformat()" and "IHWidgetRep". (Note, however, that a few type names are exceptions to this rule.)
iHTML library: function names and type names use the prefixes "IH_" and "IH", respectively. In addition, actual structure names use the prefix "ih_". Examples are "IH_AllocBuffer()", "IHModuleInfo", and "ih_module_info_rec".
Language modules: type names use the prefix "IH". The functional interface to the language modules is through a structure, which has its own name space; these function names do not use any prefix.
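The idea of a functional interface carried in a structure can be sketched as a table of function pointers. Because the members live in the structure's own name space, they need no global prefix. Everything below is hypothetical and only illustrates the convention: DemoModuleFuncs, its members, and the sample text module are not the real iHTML declarations.

```c
#include <string.h>

/* Hypothetical function table a language module might export. The member
   names are unprefixed because they are scoped by the structure. */
typedef struct demo_module_funcs_rec {
    const char *(*name)(void);
    int (*can_handle)(const char *content_type);
} DemoModuleFuncs;

/* Private implementations: static, so they add nothing to the global
   C name space. */
static const char *text_name(void)
{
    return "text";
}

static int text_can_handle(const char *content_type)
{
    return strncmp(content_type, "text/", 5) == 0;
}

/* The only global symbol this sample module defines. */
const DemoModuleFuncs text_module = { text_name, text_can_handle };
```

A caller would then reach the module only through the table, for example text_module.can_handle("text/html").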
Dianne Kyra Hackborn <hackbod@angryredplanet.com> | Last modified: Sun Oct 27 19:39:19 PST 1996 |