NOTE:
This project is no longer being maintained: it was developed for my master's thesis, which was completed in early 1997. I still, however, welcome any questions or comments that people may have.


iHTML Browser Services

HTML Parse Trees

iHTML defines a type and a set of functions for accessing a browser's internal parse tree representation. This allows language modules not only to construct HTML documents dynamically, but also to examine and modify existing documents.

A browser must implement these functions. At the very least, they are used by the iHTML Library component to parse <OBJECT> markup: the browser hands the library the node for each of these tags that it encounters, and the library then uses the browser's parse tree services to examine that data and determine how to handle it.


Types

HTMLNode
Synopsis
A single node in the browser's internal representation of an HTML document's parse tree.
Definition
void* (This is an opaque type that is defined by the browser's internal implementation.)
See Also
IHMarkup

The HTMLNode is the object through which other components in the iHTML system interact with a browser's internal HTML parse tree. It represents a single node in that tree, e.g., a tag, a block of text, or a comment. All interaction with the parse tree is done by passing these opaque pointers to the functions below.


Functions

BR_AllocMarkup
Synopsis
HTMLNode BR_AllocMarkup(IHLanguage* lang, IHMarkup owner, const void* type, const char* attrs)
Arguments
(IHLanguage*) lang
The language that owns this markup node.
(IHMarkup) owner
The language object that owns this markup node.
(const void*) type
A tag ID representing the type of node to create.
(const char*) attrs
A string of attributes that will be parsed and associated with the node.
Return
(HTMLNode) A newly created browser-side markup node.
See Also
HTMLNode

Allocates a new node with the given type and attributes. If the type is ">text" or ">comment" (see BR_GetTagID), the attrs argument is interpreted as the new node's text.

BR_ChildMarkup
Synopsis
HTMLNode BR_ChildMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node that is the first child of current in its tree.
See Also
HTMLNode

Returns the node that is the first child of the given node in the tree, or NULL if current is a leaf in the tree. This essentially traverses deeper into the parse tree.

BR_CutMarkup
Synopsis
HTMLNode BR_CutMarkup(HTMLNode pos, HTMLNode end)
Arguments
(HTMLNode) pos
The first node to cut.
(HTMLNode) end
The last node to cut.
Return
(HTMLNode) The node that has replaced the first cut node.
See Also
HTMLNode

Deletes the markup nodes from pos to end, along with all of their child nodes. If end is NULL, deletes all nodes from pos to the last node in this branch. Returns the markup node that has replaced these in the tree, i.e. the first node after the cut range.

BR_FirstMarkup
Synopsis
HTMLNode BR_FirstMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The very top/first node in current's tree.
See Also
HTMLNode

Returns the absolute first HTMLNode in the given tree, i.e. the root of the tree, which should always be <HTML>.

BR_FreeMarkup
Synopsis
void BR_FreeMarkup(HTMLNode root)
Arguments
(HTMLNode) root
Root of a tree of HTMLNodes to deallocate.
Return
nothing.
See Also
HTMLNode

Deallocates a tree of HTMLNode objects, being sure to remove all references within it and generally keep everything synchronized with any language-side objects associated with the nodes. The only nodes actually deallocated are the ones owned by the same IHMarkup object as the given root node. However, all references to other owners from these are also removed, so they may be deallocated as a side-effect.

BR_FreeMarkupAttr
Synopsis
void BR_FreeMarkupAttr(char* value)
Arguments
(char*) value
Attribute value returned by BR_MarkupAttr().
Return
nothing.
See Also
HTMLNode, BR_MarkupAttr()

Deallocates the memory associated with an attribute value previously returned by BR_MarkupAttr(). Any value returned by that function can be used here, including NULL.

BR_GetTagID
Synopsis
void* BR_GetTagID(const char* tag)
Arguments
(const char*) tag
The name of the tag whose ID is to be looked up.
Return
(void*) An abstract browser ID for this tag type.
See Also
HTMLNode

Returns the ID the browser uses for this particular tag. The exact ID returned is browser-dependent; it may be an enumeration value, the address of a string, etc. In addition to the normal tag names, such as "HEAD", "EM", "TABLE", etc., the following names are defined:

">text"
Returns the code for a text-containing node.
">comment"
Returns the code for a comment-containing node, if this browser saves comments.
">unknown"
Returns the code for a node of unknown type.

If the browser does not recognize the given tag type, it always returns the ID for the ">unknown" type.

The value returned is almost completely opaque -- the only thing a program can do with a tag ID is to compare it for equality to another tag ID.
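
The equality-only contract can be illustrated with a small sketch. The table and the BR_GetTagID() body below are a hypothetical stand-in (a real browser supplies its own implementation and may use an entirely different ID representation); the helper same_tag() is likewise an illustrative name, not part of iHTML.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a browser's tag table. A real browser may
 * use an enumeration or some other representation instead. */
static const char* const toy_tags[] = {
    ">unknown", ">text", ">comment", "HTML", "HEAD", "BODY", "EM", "TABLE"
};

/* Toy BR_GetTagID(): the ID is the address of the matching table entry.
 * Names are matched case-sensitively here, purely for brevity.
 * Unrecognized names map to the ID for ">unknown" (entry 0). */
void* BR_GetTagID(const char* tag)
{
    size_t i;
    assert(tag != NULL);
    for (i = 0; i < sizeof toy_tags / sizeof toy_tags[0]; i++) {
        if (strcmp(tag, toy_tags[i]) == 0)
            return (void*)toy_tags[i];
    }
    return (void*)toy_tags[0];
}

/* The only operation a caller should rely on: equality comparison. */
int same_tag(const void* a, const void* b)
{
    return a == b;
}
```

A caller would typically look up the IDs it cares about once and then compare them against values returned by BR_MarkupType(), never inspecting what the pointer refers to.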

BR_InitMarkup
Synopsis
HTMLNode BR_InitMarkup(HTMLNode root, IHLanguage* lang, IHMarkup owner)
Arguments
(HTMLNode) root
Root of a tree of HTMLNodes to initialize.
(IHLanguage*) lang
The language that owns this tree.
(IHMarkup) owner
The language object that owns this tree.
Return
(HTMLNode) The last node in the resulting tree.
See Also
HTMLNode

Initializes a raw HTMLNode tree returned by the browser's parser. For example, the NCSA Mosaic implementation fills in each node's "prev" and "parent" pointers, and sets the owning object and language of every node to owner and lang. This function must be called on any raw HTMLNode tree that the scripting routines get from the browser. It returns the last node in the resulting tree.

BR_InsertMarkup
Synopsis
HTMLNode BR_InsertMarkup(HTMLNode pos, HTMLNode last, HTMLNode markup)
Arguments
(HTMLNode) pos
The position in the tree to insert the new node.
(HTMLNode) last
The node that is now after the new one.
(HTMLNode) markup
Markup to place around the range from pos up to last, e.g. <B>.
Return
(HTMLNode) The node that has replaced the first cut node.
See Also
HTMLNode

Inserts the given markup node into the tree, making its children the range from pos up to, but not including, last. If pos is the same as last (or the node's content type is EMPTY, or it is a text node), the new node will have no children. If last is NULL, all of the nodes from pos to the end of this branch become the new node's children.

BR_InsHeadMarkup
Synopsis
HTMLNode BR_InsHeadMarkup(HTMLNode parent, HTMLNode tree)
Arguments
(HTMLNode) parent
The position in the tree to place the new subtree.
(HTMLNode) tree
The new node or tree to place as the first child of parent.
Return
(HTMLNode) The newly placed node, identical to tree.
See Also
HTMLNode

Places the given markup node (or a complete HTML tree) as the first child of the given parent node. If parent already has child nodes, the new node is inserted before them.

BR_InsTailMarkup
Synopsis
HTMLNode BR_InsTailMarkup(HTMLNode parent, HTMLNode tree)
Arguments
(HTMLNode) parent
The position in the tree to place the new subtree.
(HTMLNode) tree
The new node or tree to place as the last child of parent.
Return
(HTMLNode) The newly placed node, identical to tree.
See Also
HTMLNode

Appends the given markup node (or a complete HTML tree) as the last child of the given parent node.

BR_LastMarkup
Synopsis
HTMLNode BR_LastMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The very bottom/last node in current's tree.
See Also
HTMLNode

Returns the absolute last HTMLNode in the given tree, i.e. the very last leaf node of the tree.

BR_MarkupAttr
Synopsis
char* BR_MarkupAttr(HTMLNode node, const char* attr)
Arguments
(HTMLNode) node
A browser-side markup node.
(const char*) attr
Name of the tag attribute to retrieve.
Return
(char*) A C string representing the desired attribute's value.
See Also
HTMLNode, BR_FreeMarkupAttr()

Returns a pointer to a standard C string that is the value (if any) that the requested attribute has for this node. NULL indicates that the attribute was not supplied, the empty string ("") is returned for attributes supplied with no value, and any other string is the attribute's value.

Note that the value returned by this function must be deallocated, once you are done with it, by calling BR_FreeMarkupAttr().
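
The three return cases and the matching deallocation call can be sketched as follows. The ToyNode layout and the bodies of BR_MarkupAttr() and BR_FreeMarkupAttr() are hypothetical stand-ins for what a browser would provide, and attr_state() is an illustrative helper name, not part of iHTML.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void* HTMLNode;

/* Hypothetical browser-side node holding at most one attribute. */
struct ToyNode {
    const char* attr_name;   /* NULL if the node has no attribute */
    const char* attr_value;  /* "" if supplied without a value */
};

/* Toy BR_MarkupAttr(): returns a heap copy that the caller must release,
 * or NULL when the attribute was not supplied. (This toy also returns
 * NULL if malloc fails, which a real implementation might report
 * differently.) */
char* BR_MarkupAttr(HTMLNode node, const char* attr)
{
    const struct ToyNode* n = node;
    char* copy;
    assert(n != NULL && attr != NULL);
    if (n->attr_name == NULL || strcmp(n->attr_name, attr) != 0)
        return NULL;
    copy = malloc(strlen(n->attr_value) + 1);
    if (copy != NULL)
        strcpy(copy, n->attr_value);
    return copy;
}

/* Toy BR_FreeMarkupAttr(): accepts NULL, as documented. */
void BR_FreeMarkupAttr(char* value)
{
    free(value);
}

enum AttrState { ATTR_ABSENT, ATTR_NO_VALUE, ATTR_HAS_VALUE };

/* Classifies an attribute using the three documented return cases. */
enum AttrState attr_state(HTMLNode node, const char* attr)
{
    char* value = BR_MarkupAttr(node, attr);
    enum AttrState state;
    if (value == NULL)
        state = ATTR_ABSENT;
    else if (value[0] == '\0')
        state = ATTR_NO_VALUE;
    else
        state = ATTR_HAS_VALUE;
    BR_FreeMarkupAttr(value);   /* safe even when value is NULL */
    return state;
}
```

Pairing every BR_MarkupAttr() call with exactly one BR_FreeMarkupAttr() call, regardless of which case was returned, keeps the calling code free of special cases.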

BR_MarkupLanguage
Synopsis
IHLanguage* BR_MarkupLanguage(HTMLNode node)
Arguments
(HTMLNode) node
A browser-side markup node.
Return
(IHLanguage*) The language of this node's owner.
See Also
HTMLNode, IHMarkup, BR_MarkupOwner()

Returns a handle on the interface to the language that owns this node. Since IHMarkup is an opaque type, the system cannot use it to determine which language module it belongs to. Thus every node must be able to store that language, so that its language-side owner can be manipulated along with it.

BR_MarkupOwner
Synopsis
IHMarkup BR_MarkupOwner(HTMLNode node)
Arguments
(HTMLNode) node
A browser-side markup node.
Return
(IHMarkup) The language-side object that owns this node.
See Also
HTMLNode, IHMarkup, BR_MarkupLanguage()

Every browser node can have a language-side object associated with it that "owns" that node in the tree. These objects must work together to keep each other in sync, so they call into each other's components as operations on the tree are performed. This function returns the object that owns this node.

BR_MarkupText
Synopsis
char* BR_MarkupText(HTMLNode node)
Arguments
(HTMLNode) node
A browser-side markup node.
Return
(char*) The raw ASCII text associated with this node.
See Also
HTMLNode, BR_GetTagID(), BR_MarkupType()

Returns a pointer to a standard C string containing the ASCII text associated with the node. This function is valid only for nodes of type ">text" and ">comment". Nodes that are actually tags do not have any text associated with them, and return NULL. This text is a part of the node object -- it is valid as long as the node itself exists.

BR_MarkupType
Synopsis
void* BR_MarkupType(HTMLNode node)
Arguments
(HTMLNode) node
A browser-side markup node.
Return
(void*) What type of node this is.
See Also
HTMLNode, BR_GetTagID()

Returns an opaque value that indicates the type of node this is. The only thing that can be done with this value is to compare it with other type values that have been returned by the browser -- if the values are equal, the nodes are of the same type.

BR_NextMarkup
Synopsis
HTMLNode BR_NextMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node logically after current in the tree.
See Also
HTMLNode

Returns a pointer to the markup node that occurs logically after the given node in its parse tree. If the given node contains any other nodes (e.g., it is a <B> with text inside), the returned node is the one after its associated closing tag. If this is the last node in the branch, NULL is returned.

Essentially, this function moves forward through the siblings of the given node, i.e. the children of its parent node.
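
Combined with BR_ChildMarkup(), this gives the usual loop over a node's direct children. The sketch below uses a hypothetical ToyNode layout and toy bodies for the two browser functions; count_children() is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>

typedef void* HTMLNode;

/* Hypothetical node layout; a real browser defines its own. */
struct ToyNode {
    struct ToyNode* parent;
    struct ToyNode* child;  /* first child, or NULL for a leaf */
    struct ToyNode* next;   /* next sibling, or NULL at end of branch */
};

/* Toy BR_ChildMarkup(): first child, or NULL for a leaf. */
HTMLNode BR_ChildMarkup(HTMLNode current)
{
    assert(current != NULL);
    return ((struct ToyNode*)current)->child;
}

/* Toy BR_NextMarkup(): next sibling, skipping over any children. */
HTMLNode BR_NextMarkup(HTMLNode current)
{
    assert(current != NULL);
    return ((struct ToyNode*)current)->next;
}

/* Counts the direct children of a node, ignoring deeper descendants. */
int count_children(HTMLNode parent)
{
    int count = 0;
    HTMLNode n;
    for (n = BR_ChildMarkup(parent); n != NULL; n = BR_NextMarkup(n))
        count++;
    return count;
}
```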

BR_NextMarkupDepth
Synopsis
HTMLNode BR_NextMarkupDepth(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node that occurs after current in its parse tree, in depth-first order.
See Also
HTMLNode

Returns the node that would occur next in the tree, if a depth-first traversal were being performed. This is essentially the order that the nodes occurred in the original HTML document.
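
One plausible way to compute this successor, assuming each node stores parent, first-child and next-sibling pointers, is sketched below. The ToyNode layout is hypothetical, the BR_NextMarkupDepth() body is a toy stand-in rather than any browser's actual code, and document_order() is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>

typedef void* HTMLNode;

/* Hypothetical node layout; a real browser defines its own. */
struct ToyNode {
    struct ToyNode* parent;
    struct ToyNode* child;  /* first child, or NULL for a leaf */
    struct ToyNode* next;   /* next sibling, or NULL at end of branch */
};

/* Toy BR_NextMarkupDepth(): pre-order successor of current. */
HTMLNode BR_NextMarkupDepth(HTMLNode current)
{
    struct ToyNode* n = current;
    assert(n != NULL);
    if (n->child != NULL)
        return n->child;            /* descend first */
    while (n != NULL && n->next == NULL)
        n = n->parent;              /* climb until a sibling exists */
    return n != NULL ? n->next : NULL;
}

/* Stores up to max nodes in document order, beginning with start.
 * Returns the number of nodes stored. */
int document_order(HTMLNode start, HTMLNode* out, int max)
{
    int count = 0;
    HTMLNode n;
    for (n = start; n != NULL && count < max; n = BR_NextMarkupDepth(n))
        out[count++] = n;
    return count;
}
```

Note that, in this sketch, starting from a node in the middle of the tree continues past the end of that node's subtree and on through the rest of the document.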

BR_NextMarkupType
Synopsis
HTMLNode BR_NextMarkupType(HTMLNode current, const void* type)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
(const void*) type
The desired type of node, e.g. <A>, a text node, etc.
Return
(HTMLNode) The node that occurs after current in its parse tree, in depth-first order, and is of the given type.
See Also
HTMLNode

Performs a depth-first search starting at the given place in the HTML tree, looking for the next node with the given type. Returns NULL if there are none.
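
Calling the function repeatedly visits every later node of one type. The sketch below assumes the search begins with the node after current (so current itself is never returned), which matches the Return description but is otherwise an assumption. The ToyNode layout, toy_next_depth(), and count_after() are hypothetical names, and the BR_NextMarkupType() body is only a toy stand-in.

```c
#include <assert.h>
#include <stddef.h>

typedef void* HTMLNode;

/* Hypothetical node layout; a real browser defines its own. */
struct ToyNode {
    const void* type;       /* opaque tag ID, compared by equality only */
    struct ToyNode* parent;
    struct ToyNode* child;
    struct ToyNode* next;
};

/* Pre-order successor, as BR_NextMarkupDepth() might compute it. */
static struct ToyNode* toy_next_depth(struct ToyNode* n)
{
    if (n->child != NULL)
        return n->child;
    while (n != NULL && n->next == NULL)
        n = n->parent;
    return n != NULL ? n->next : NULL;
}

/* Toy BR_NextMarkupType(): first node after current with the given type. */
HTMLNode BR_NextMarkupType(HTMLNode current, const void* type)
{
    struct ToyNode* n;
    assert(current != NULL);
    for (n = toy_next_depth(current); n != NULL; n = toy_next_depth(n)) {
        if (n->type == type)
            return n;
    }
    return NULL;
}

/* Counts the nodes of the given type that occur after start. */
int count_after(HTMLNode start, const void* type)
{
    int count = 0;
    HTMLNode n;
    for (n = BR_NextMarkupType(start, type); n != NULL;
         n = BR_NextMarkupType(n, type))
        count++;
    return count;
}
```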

BR_ParentMarkup
Synopsis
HTMLNode BR_ParentMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node that is the parent to current in its tree.
See Also
HTMLNode

Returns the node that is the parent to the given node in the tree, or NULL if current is the root of the tree. This essentially traverses up out of the parse tree.
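
For instance, a node's nesting depth can be found by following parent links until NULL is reached. The ToyNode layout and the BR_ParentMarkup() body below are hypothetical stand-ins, and node_depth() is an illustrative helper name.

```c
#include <assert.h>
#include <stddef.h>

typedef void* HTMLNode;

/* Hypothetical node layout; a real browser defines its own. */
struct ToyNode {
    struct ToyNode* parent; /* NULL for the root of the tree */
};

/* Toy BR_ParentMarkup(): parent of current, or NULL at the root. */
HTMLNode BR_ParentMarkup(HTMLNode current)
{
    assert(current != NULL);
    return ((struct ToyNode*)current)->parent;
}

/* Returns how many ancestors a node has; the root has depth 0. */
int node_depth(HTMLNode node)
{
    int depth = 0;
    HTMLNode n;
    for (n = BR_ParentMarkup(node); n != NULL; n = BR_ParentMarkup(n))
        depth++;
    return depth;
}
```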

BR_ParseHTML
Synopsis
HTMLNode BR_ParseHTML(const char* text, IHLanguage* lang, IHMarkup owner)
Arguments
(const char*) text
A fragment of HTML markup to parse and create a tree of HTMLNodes for.
(IHLanguage*) lang
The language that owns this new tree.
(IHMarkup) owner
The language object that owns this new tree.
Return
(HTMLNode) The first node in the newly created HTML parse tree.
See Also
HTMLNode

Allocates a complete HTML sub-tree based on the given fragment of ASCII HTML markup. The text is parsed identically to text in an HTML body or header, except that HTML fragments are allowed (e.g., a string that begins with <TABLE>), so that sub-trees can be created that are later inserted into a complete HTML parse tree.

BR_PasteMarkup
Synopsis
HTMLNode BR_PasteMarkup(HTMLNode pos, HTMLNode tree, int after)
Arguments
(HTMLNode) pos
The position in the tree to place the new subtree.
(HTMLNode) tree
The new node or tree to insert before or after pos.
(int) after
If TRUE, the new subtree is placed after pos
Return
(HTMLNode) The node that is now at the same position as pos. This is pos if after is FALSE, otherwise it is tree.
See Also
HTMLNode

Places the given markup node (or a complete HTML tree) into the tree given by pos. Normally the new markup is placed at the position that pos occupies, but if after is TRUE, it is placed immediately after pos.

BR_PrevMarkup
Synopsis
HTMLNode BR_PrevMarkup(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node logically before current in the tree.
See Also
HTMLNode

Returns a pointer to the markup node that occurs logically before the given node in its parse tree. Nodes contained within other nodes are not visited: if the preceding node contains any other nodes (e.g., it is a <B> with text inside), the returned node is that containing node itself. If this is the first node in the branch, NULL is returned.

Essentially, this function moves backwards through the siblings of the given node, i.e. the children of its parent node.

BR_PrevMarkupDepth
Synopsis
HTMLNode BR_PrevMarkupDepth(HTMLNode current)
Arguments
(HTMLNode) current
A node in an HTML parse tree.
Return
(HTMLNode) The node that occurs before current in its parse tree, in depth-first order.
See Also
HTMLNode

Returns the node that would occur previously in the tree, if a reverse depth-first traversal were being performed. This is essentially the reverse order that the nodes occurred in the original HTML document.

BR_RemoveMarkup
Synopsis
HTMLNode BR_RemoveMarkup(HTMLNode node)
Arguments
(HTMLNode) node
A node in an HTML parse tree.
Return
(HTMLNode) The node that replaced the given node.
See Also
HTMLNode

Deletes the given markup node, but not any of the nodes that it contains. It returns the markup node that it has been replaced with, which is its first child node.
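
The splice this describes can be sketched for a tree that stores parent, first-child and next-sibling pointers. Everything here is a hypothetical illustration: the ToyNode layout is invented, the BR_RemoveMarkup() body is a toy that neither frees the node nor notifies any language-side owner, and returning NULL for a childless node is an assumption the text above does not settle.

```c
#include <assert.h>
#include <stddef.h>

typedef void* HTMLNode;

/* Hypothetical node layout; a real browser defines its own. */
struct ToyNode {
    struct ToyNode* parent;
    struct ToyNode* child;  /* first child, or NULL for a leaf */
    struct ToyNode* next;   /* next sibling, or NULL at end of branch */
};

/* Toy BR_RemoveMarkup(): unlinks node and moves its children up into
 * its place. Assumes a well-formed tree and a non-root node. */
HTMLNode BR_RemoveMarkup(HTMLNode node)
{
    struct ToyNode* n = node;
    struct ToyNode* first;
    struct ToyNode* last = NULL;
    struct ToyNode* c;
    struct ToyNode** link;

    assert(n != NULL && n->parent != NULL);
    first = n->child;

    /* Re-parent the children and remember the last one. */
    for (c = first; c != NULL; c = c->next) {
        c->parent = n->parent;
        last = c;
    }

    /* Find the pointer (parent's child or a sibling's next) that
     * currently refers to n. */
    link = &n->parent->child;
    while (*link != n)
        link = &(*link)->next;

    if (first != NULL) {
        *link = first;          /* children take n's place */
        last->next = n->next;
    } else {
        *link = n->next;        /* nothing to splice in */
    }

    n->parent = n->child = n->next = NULL;  /* detached, not freed */
    return first;
}
```

Removing a <B> node that wraps two text nodes, for example, leaves those two text nodes as direct children of the former parent, in the same position and order.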


_________.oo_Q_Q_oo.____________________________________________
Dianne Kyra Hackborn <hackbod@angryredplanet.com>
Last modified: Sat Oct 26 20:04:58 PDT 1996

This web page and all material contained herein is Copyright (c) 1997 Dianne Hackborn, unless otherwise noted. All rights reserved.