HyperMedia, Media, and Hype: Did the Cutting Edge Grow Blunt?

Abstract

There is a widespread belief that the worlds of declarative markup and of immersive hypermedia are separated by a chasm of understanding. In fact, the future of XML, the future of HyperMedia, and of the World Wide Web, are all intertwined. This paper discusses places where the XML-based Web is weak, and suggests areas for further work.

1. Introduction

We went wrong, as an industry, years and years ago, when we started to make executable content, programs that, when run, generate content, generate text and images and sound. We mixed up information and presentation.

Today we're fixing that with the Extensible Markup Language (XML) and specifications built on it. We're making a world in which interoperability and accessibility are paramount. It's a distributed world, in which millions of people with thousands of software clients can make sense of the same web page, even if the page looks different to them.

In choosing to emphasise interoperability over beauty, perhaps we've lost some coolness. Of course, nothing can be cooler than universal access, so we mustn't kid ourselves that it's worth sacrificing what really matters for a gimmick or two.

This paper examines some areas of the immersive hypermedia experience, and also some areas of traditional non-animated passive-reflective technology more commonly known as print. We'll see that books can be hyper too, and we'll explore possible future directions for immersive experiences. We'll conclude by expounding on just why XML is a good basis for the future of hypermedia.

There's no reason why the game designers should have all the fun. And there's no reason why the documents, data and multimedia environments we make today should not be available to our grandchildren.

2. The Immersive Experience

Fairy-stories were plainly not primarily concerned with possibility, but with desirability. If they awakened desire, satisfying it while often whetting it unbearably, they succeeded. [place-holder for full secondary creation quote from JRRT On Fairy Stories Essay]

What do I mean by immersive? I mean primarily that the user can forget all else: for a while, they are totally absorbed, immersed in what they are doing. As John Tolkien observed half a century ago, this can also be the goal of writers of fiction for books. So there's nothing inherent in hypermedia, in having multiple formats, in having things that move and wriggle and speak and sing and respond, to make something immersive.

I claim that it's harder to evoke an immersive experience on the computer screen. I also believe that it's getting easier, and that it's important that our industry continue to push on this boundary.

What prevents total absorption, total concentration? Any distraction, whether it's external and unavoidable ("Want a cup of tea, love?") or internal ("You've been writing for twelve minutes, would you like to save the document?"). A distraction is something that takes away your focus of attention. You can be distracted away from the computer entirely, or you can be distracted from your focus of attention while still using the computer. And this latter sort of distraction is entirely within our control.

3. Examples of Distractions

There are things we take for granted, but that are actually very distracting, and there are other things we immediately recognise as distractions.

3.1. Category or Level Errors

In a rôle-gaming situation, if you're a fantasy wizard trying to defend a bridge against a mighty balrog, a message saying, "You have only seven endurance points left. Do you want to draw from your Life Points?" is a distraction, because instead of thinking about the situation and the characters, you're thinking about the game mechanics.

In an online purchasing web site, when you've chosen six pairs of beautiful argyle socks, a dialogue box saying that your computer is low on memory is a distraction. For that matter, a pop-up window advertising chocolate would also be a distraction, but the store knows full well that it's a sort of distraction that makes money. It's like the racks of candies at the check-out in the supermarket.

3.2. Lost and Confused in Cyberspace

Once the web page has downloaded, it had better work right, and it had better be easy to navigate. Getting lost, saying "where am I in this website?" or "how do I pay?", is an unpleasant distraction.

Makers of games (and other educational software) know that people will use an unusual interface if it adds atmosphere and enjoyment, as long as it's not too difficult. By the time you've gone to the trouble of installing a game, you're probably excited about playing it: having to take time to learn that pressing the up-arrow and space at the same time fires little yellow goblins at the Space Mushrooms is a small price to pay for the heady joy of capturing your first Spore.

Of course, it would be terribly distracting to discover that accidentally pressing up-arrow and Shift in that same game ordered six pairs of socks, delivered the next week by the postal service. One wouldn't normally expect actions in an online game to have such results, although spending real money to purchase the right to use a magical sword is commonplace for players of such games as Asheron's Call, Ultima Online and Everquest.

3.3. The World Wide Wait

The distraction that the online vendors fear most is the one that leads potential customers to go away: the World Wide Wait. Few people will wait five minutes beside an unresponsive computer screen for a chance to buy a pair of socks, even if they're awfully nice and terribly expensive.

Several techniques have arisen to try to reduce this delay:

Changes in web page design, making smaller, faster-loading pages;
Increases in the proportion of users with broadband Internet access: by the end of 2001, 78% of Canadian households had dial-up access, up 29% against the previous year, while high-speed access was up 155% [Stats Canada, Dec 2001];
Proxy servers that cache pages for groups of users, often inserted transparently (or semi-transparently) by an Internet Service Provider such as AOL;
Web browsers (e.g. Mozilla) that can preemptively fetch the next page while the user is still reading the first one [ref Mozzy];
Use of the W3C DOM [ref W3C] and scripting to simulate multiple web pages (although this can cause problems with bookmarks and the REST hypothesis [ref RoyF.2002]).
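
The caching and prefetching items in the list above share one idea: keep a copy of a page close to the reader so that asking for it again (or for the first time, if we guessed well) costs almost nothing. The following is only a minimal sketch of that idea in Python. The PageCache class, its fetch callback and its capacity are all hypothetical names invented for illustration; a real proxy or browser cache also has to honour expiry and validation rules, which this ignores entirely.

```python
from collections import OrderedDict


class PageCache:
    """A small least-recently-used page cache keyed by URL (illustrative only)."""

    def __init__(self, fetch, capacity=100):
        self.fetch = fetch            # function that really retrieves a page
        self.capacity = capacity      # maximum number of pages kept
        self.pages = OrderedDict()    # oldest entry first, newest last

    def _store(self, url, page):
        self.pages[url] = page
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used

    def get(self, url):
        """Return the page, going to the network only on a cache miss."""
        if url in self.pages:
            self.pages.move_to_end(url)      # mark as most recently used
            return self.pages[url]
        page = self.fetch(url)
        self._store(url, page)
        return page

    def prefetch(self, url):
        """Fetch a page the reader will probably want next, if not cached."""
        if url not in self.pages:
            self._store(url, self.fetch(url))
```

A browser-side version would call prefetch for the target of a "next" link while the reader is still on the current page, so that the later get returns at once.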

The difference between a twenty millisecond delay and a two second delay is the difference between a perception of instant and a perception of slow.

Older systems such as Hypercard [apple ref?] worked around this using graphical page transitions and sound. When the user chooses to view a different page, the browser plays some sound or other, and perhaps does a timed effect such as a fade to grey, swipe, venetian blind, mosaic. The combined audio and graphic transition serves two functions: it adds to an experience by inserting Delightful Surprise, and, more importantly from the view of the technologist, it distracts users so they don't notice how long it takes to load the next page.

I have no real idea why page transitions didn't catch on for web browsers.

4. Paucity of Features

Web browsers are pretty basic compared with hypertext and hypermedia tools of even a decade ago. JavaScript can help only to a limited extent here.

4.1. Searching

Searching the current page often doesn't highlight all matches, only one at a time; there's no facility to display matches in a scrollbar-style overview, to save sets of matches, to add or subtract terms, to use stemming or a thesaurus, nor even to see how many matches there are.
Often the search functionality brings up a distracting dialogue box asking one to enter a search term. I often find that the result is hidden under the dialogue box!
I should be able to train my web browser so that when I right-click on a highlighted word or phrase, it can offer to look the term up using Google [ref google] or a dictionary server [ref dictionary.com] or find the next match in the current document.
Content-specific searches are easily implemented with Flash, Director, or in a custom hypertool. Although XHTML metadata and RDF admit of such searching in web browsers, it's taken eight years for Netscape to notice REL=author and REL=next links, let alone anything more sophisticated.
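
None of the missing search features is hard to build. As a rough illustration, here is a sketch in Python that finds every match at once, counts them, and draws a crude scrollbar-style overview. The names Match, find_all_matches and overview are invented for this example, the sample text is made up, and stemming and thesaurus lookup are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    start: int        # character offset of the match in the page text
    end: int          # offset just past the match
    position: float   # relative position in the page, 0.0 (top) to 1.0 (bottom)


def find_all_matches(text, term):
    """Return every case-insensitive, non-overlapping match of term in text."""
    if not term:
        return []
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    length = max(len(text), 1)
    return [Match(m.start(), m.end(), m.start() / length)
            for m in pattern.finditer(text)]


def overview(matches, rows=10):
    """Draw a scrollbar-style strip: '#' marks each row that holds a match."""
    strip = ["."] * rows
    for match in matches:
        strip[min(int(match.position * rows), rows - 1)] = "#"
    return "".join(strip)


page = "Socks are warm. Argyle socks are warmer. I like SOCKS."
matches = find_all_matches(page, "socks")
print(len(matches), "matches:", overview(matches))
```

Because the whole list of matches is available, saving a set of matches or combining the results of two terms becomes ordinary list handling rather than a browser feature request.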

4.2. Typography and Motion

Web typography is in the early stages of infancy. And by infancy, I mean pre-natal. There's no reliable way to control kerning, tracking, letter-spacing, alternate fonts for small caps, hung punctuation, column balancing and feathering, alignment flush zones, or a host of other low-end typographic features. Downloaded fonts are hardly ever used (and aren't supported at all in the latest Mozilla, at least at the time of writing in November 2002; vote for the bug!). Where they're supported, different browsers don't interoperate, and the most nearly interoperable system (from Bitstream) has IPR problems outside the US.

Perhaps SVG Fonts are the way forward here, but it's too early to tell.

[Figure 1: insert illustration of drop cap with x heights, cap heights etc. clearly marked] (to be supplied)

The Web has given us an important advance: universality and accessibility. Does it really need to be at the cost of fine typography? I claim that it does not. Internationalised typography is a difficult challenge, particularly for implementors of scripts and languages for which computer support is relatively new or difficult. Specifications such as XSL/FO are a start, and Free software support for Unicode using tools such as Pango [www.pango.org] is also a major step forward.

4.3. Feedback-based Layout

This is an area in which XML is greatly lacking. It's not possible in XHTML, HTML, CSS or even SVG to say, "make a box that's the width of the widest heading in this chapter, and use that box as a background for every chapter heading." You can get close with a combination of JavaScript, the DOM and CSS, but it's easy to find problems that aren't amenable to that approach. Consider the red dots in Figure 2, showing the footnote that's under the mouse pointer. This deconstructionist stylistic device is considered ugly by some [Heller] and considered particularly suitable for online display by others [emigre30], but for now I had to make the screenshot with an image editor.

Screen shot of hypertext version of an essay

Figure 2. Screenshot of Galeon displaying an annotated version of an 18th Century essay on Stonehenge.

If you're planning to implement the page design, note that the right hand column may have to scroll automatically to bring the right footnote into view. It should do so smoothly but quickly, so as not to be distracting. A free picture of my right foot as a prize to the first person to do this well enough for me to use, working in multiple web browsers on multiple platforms, but using only published specifications such as XHTML and CSS.

In general, feedback-based layout lets style sheets use the actual measured dimensions of parts of the rendered document as input. You can of course create loops, just as any useful programming language can be used to write a useless infinite loop or recursion. But in a declarative style sheet system, such circular dependencies are easily detected and handled. The increase in expressive power from being able to say "align this with that" is very high. A large part of successful graphic design is about precise alignment [Williams NDDB], and that often requires feedback. It's why HTML tables are still so popular for layout: they are the nearest thing HTML provided to reliable alignment.
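
To make the claim about circular dependencies concrete, here is a small sketch in Python of how a declarative system might evaluate feedback rules. It is not taken from any style sheet specification: the rule format, the names measured, rules and resolve, and the pixel values are all assumptions made up for this example. Each rule lists the values it depends on, the rules are evaluated in dependency order, and a loop is reported rather than followed.

```python
from graphlib import TopologicalSorter, CycleError

# Measured widths of rendered headings, in pixels (invented sample data).
measured = {"heading-1": 180, "heading-2": 260, "heading-3": 215}

# Each rule names the values it depends on and how to combine them.
rules = {
    # "a box that's the width of the widest heading in this chapter"
    "box-width": (["heading-1", "heading-2", "heading-3"], lambda *w: max(w)),
    # align something else with that box, assuming a 40 pixel left margin
    "box-right-edge": (["box-width"], lambda w: 40 + w),
}


def resolve(measured, rules):
    """Evaluate rules in dependency order; raise ValueError on a cycle."""
    graph = {name: set(deps) for name, (deps, _) in rules.items()}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"circular layout dependency: {err.args[1]}") from err
    values = dict(measured)
    for name in order:
        if name in rules:
            deps, combine = rules[name]
            values[name] = combine(*(values[d] for d in deps))
    return values


values = resolve(measured, rules)
print(values["box-width"], values["box-right-edge"])
```

A real formatter would have to re-measure after applying the rules, since changing a box can change the layout it was measured from; this sketch only shows the ordering and the cycle check.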

4.4. Shapes and Run-arounds

There's no real reason in principle why I can't use an SVG path to describe a diamond-shaped browser window, or a circular paragraph in the middle of the screen. Such things are common in print for a good reason: we have over four thousand years of experience at layout, and we've learned that run-arounds are useful. They are useful on the web too. Figure Three shows a page from an eighteenth century hypermedia work, The Itinerary of John Leland, edited by Thomas Hearne. The margin notes here are rectangular run-arounds, and Figure Four shows a screen-shot of an HTML rendition of part of the same page.

Scanned page of 18th century critical edition

Figure 3. Thomas Hearne describes Wythiel. Note the footnotes and marginalia; this is hypertext from 1740.

Scanned page of 18th century critical edition

Figure 4. Thomas Hearne's Web Site

Early on, NCSA Mosaic recognised that rectangles were boring and implemented imagemaps. Later on we had client-side image maps, which can more easily be made accessible. The common use of partially-transparent images to simulate shaped buttons is a clear indication that web designers live in a post-rectangular era. For that matter, the "play" button on my personal retina-projection DVD player is triangular.

A not uncommon use of plugins such as Flash is to provide custom controls. Sometimes it's to hide controls when they are not needed, to reduce distracting clutter. Sometimes it's to make animated controls that can be easier to use or more exciting, or simply fit in better. There really is a company selling black socks that uses the metaphor of sock drawers you pull to access different areas of the site, and it works. But it's not accessible if you're using a screen reader or a text browser. Sometimes Flash is used to make controls that seem to have been forgotten by the HTML designers, such as sliders, gauges, dials and volume controls.

Of course, even less accessible than using Flash is using ActiveX, which isn't likely to work under Linux in the near future (but see Wine [WINE] for running Internet Explorer under Linux), particularly on a non-Intel platform. I don't really mind, because most of my web browsing starts with a search, and the search engines can't read Flash or ActiveX, so I simply don't see those sites. But I do mind when there's no alternative provided, when the site can't be archived properly and can't be seen by a significant minority of people.

5. The Sound of Muzak

We've probably all had the misfortune to go to a web site that's playing patriotic Belgian songs rendered in MIDI, while it tries to tell you about My First Grandson. Or maybe they were American patriotic songs, or a symphony by C.P.E. Bach, played on the one-oscillator electronic beep.

One of the more interesting conference proceedings I've seen used pictures of the presenters' heads on a black background. In the background you could hear voices muttering, as if you were in the hallway or lounge at a conference. When you moved the mouse pointer over a head, that speaker's voice became louder. Clicking the mouse when the pointer was over a head took you to that person's presentation. Part of the idea was to give you the feeling of being there. [Mediamatic 1991?]

Sound mixing, integration of sound and text and animation, and audio transcriptions, all are on their way, coming to a web browser near you. The Helix DNA player (Real Networks RealPlayer) already has support for SMIL, and I'm hoping we'll see more support in other browsers soon.

Of course, I still want to be able to turn it off, especially if I accidentally find myself at the Osmonds' home page when I'm supposed to be taking minutes in a meeting.

6. Automatic Links

Consider a web site made up of individual glossary entries. Every time someone submits a definition for a new term, every mention of that term in the whole site should immediately become a link.

There's already something a little like this with the Wiki, but Wiki has gone back to node-based hypertext, a step back in time towards Gopher and the dark ages of Hypercard. The author has implemented web-based collaborative glossaries, and found that they can be very useful. One has to be careful to avoid defining "the" or "and", of course, and since the links are automatic, they sometimes go to the wrong place.
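
As an illustration of the idea, and not a description of the glossaries mentioned above, here is a minimal automatic linker in Python. The function name auto_link, the stop-word list and the glossary URLs are assumptions invented for the example. It skips terms such as "the" and "and", and prefers the longest matching term, which avoids some but by no means all of the wrong links.

```python
import re
from html import escape

STOP_WORDS = {"the", "and", "or", "a", "of"}  # illustrative, not exhaustive


def auto_link(text, glossary):
    """Wrap each whole-word mention of a glossary term in an HTML link."""
    terms = [t for t in glossary if t.lower() not in STOP_WORDS]
    if not terms:
        return escape(text)
    # Longest terms first, so that "image map" wins over "image".
    terms.sort(key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    targets = {t.lower(): url for t, url in glossary.items()}

    pieces, last = [], 0
    for m in pattern.finditer(text):
        pieces.append(escape(text[last:m.start()]))   # text before the match
        url = targets[m.group(1).lower()]
        pieces.append(f'<a href="{escape(url)}">{escape(m.group(1))}</a>')
        last = m.end()
    pieces.append(escape(text[last:]))                # text after the last match
    return "".join(pieces)


glossary = {"image map": "/g/image-map", "image": "/g/image", "the": "/g/the"}
print(auto_link("The image map & the image", glossary))
```

Running the linker again whenever a definition is added is what makes every existing mention of the new term become a link.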

A certain amount of serendipitous but meaningless linkage is perfectly acceptable for humans, who can be amused or surprised or exasperated. This was why Tim Berners-Lee's well-known innovation can work: a broken web link doesn't make the entire web stop working, it's just an annoyance. But incorrect links can cause problems for machine-based reasoning, and so an alternate approach is to generate markup with queries, and then to check the markup either automatically where possible, or by hand where not.

So far there's nothing new here. But I want to have multiple links from a single point in the document (something Hypercard, SoftQuad Panorama, and many other systems could do many years ago), and I'd like to be able to blur the client/server boundary and move some of that searching into the browser.

XML Query will let me do some of this, I think. I'll be able to make links that run queries that produce document fragments; maybe I'll be able to make parameterised queries, and say, every span element of class "term" is a candidate for $q in this query, label it Definition in the menu.
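
The following sketch shows the shape of such a parameterised link, written in Python with the standard ElementTree module rather than in XML Query. The sample page, the function definition_links and the query template /glossary?q=$q are all hypothetical; only the rule itself (every span of class "term" is a candidate for $q, labelled Definition) comes from the paragraph above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

PAGE = """<div>
  <p>A <span class="term">run-around</span> lets text flow past a
  <span class="term">margin note</span>, which is <span>not</span> new.</p>
</div>"""

# Hypothetical query template; $q is replaced by each candidate term.
QUERY_TEMPLATE = "/glossary?q=$q"


def definition_links(xml_text, template=QUERY_TEMPLATE):
    """Return (label, term, url) for every span element of class 'term'."""
    root = ET.fromstring(xml_text)
    links = []
    for span in root.iter("span"):
        # The class attribute may hold several space-separated names.
        if "term" in (span.get("class") or "").split():
            term = "".join(span.itertext()).strip()
            url = template.replace("$q", quote(term))
            links.append(("Definition", term, url))
    return links


for label, term, url in definition_links(PAGE):
    print(label, term, url)
```

A browser could build its menu of links from this list at the moment the reader asks for it, which is one way of moving the searching across the client/server boundary.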

7. A Flash in the Pan

Does this mean I think that Macromedia's Flash is doomed to failure? In its present form, I hope so. The most recent version of Flash includes some XML support, and perhaps we'll see a future version of Flash that's no harder to use, and no less powerful, but that generates web pages that are fully accessible, and that can be viewed without special plugins, archived for the future, work on all platforms, and whose text can be indexed by search engines.

Perhaps we'll always want to have text generated procedurally, particularly in online games. I don't want to stamp out innovation, I want to encourage it. But where we've discovered a well-established need, let's make a standard, interoperable, accessible solution.

8. Conclusion: XML To The Rescue?

XML, like all standards work, is not first and foremost about technical excellence. It's good, but computer scientists don't generally go weak at the knees and starry-eyed when they speak of XML. It's not as good as TeX for mathematics, they say. It's not significantly better than LISP S-expressions, they wail. The linguists and semioticists bemoan the strictness of XML's hierarchies, and want either approximate markup or overlapping markup [Wendel et al., Alan Renear]. The bibliographers find missing support for non-Unicode glyphs [Quin1996]. And the artists say the pointy brackets are the wrong colour.

What is good about XML is that everyone is using it. Every additional document, every additional tool, increases the overall value of XML, just as every new web page increases the value of the World Wide Web.

XML documents can be archived, and because they conform to a published and freely available specification we can have some confidence that those same documents can still be read fifty or 100 years from now. I can't read my CP/M Magic Wand word processing files from less than 20 years ago, and probably not many other people can either.

I recently went to a web page of a web design firm. When I tried to find them again, I found they weren't known to Alta Vista, nor to Google. Their home page was entirely done in Flash (and possibly using HTML frames too) with no alternate text provided. Not a good advert for them. Distressingly, the students at the university where I was lecturing all loved the site, because the animations were cool. They hadn't looked past the look of the thing.

The problem isn't Flash. It's proprietary closed formats.

It's up to all of us to make a Web that works, a Web that lasts, a Web that's for everyone. To do that, we've got to make it fun, fun enough that we want to carry on using it. We've got to take control of the distractions. And we've got to make it immersive, in a way that's interoperable across platforms, ubiquitous, accessible, internationalised, and loveable. This is the future of the Web. Let's give it some XML love.

9. Acknowledgements

Thanks to Sandro Hawke for proof-reading an early draft.

10. References

To follow.

Liam Quin

World Wide Web Consortium
200 Technology Square, Cambridge, MA 02139, USA

liam@w3.org