WebProWorld IT Forum

Is Apple's Intel Switch the Beginning of a Cinderella Story?
For years the Mac was a highly specialized PC, ignored by most, but loved by certain industry aficionados. But no matter how dearly adored it was for its stability, security, and usefulness in the layout and design professions, it didn't have the appeal that came with heavy hitter Microsoft

Microsoft Pushes Blackberry Into Juicer
Microsoft announced plans to jam, can and push Blackberry's email system onto the back shelf when Microsoft CEO Steve Ballmer announced the Messaging and Security Feature Pack for Windows Mobile 5.0.

SmarterMail Whitelisting
I've got a client who is using SmarterMail Professional Edition v.2.6.1921.27523. I've got domain admin access to the mail server, so I can control the spam filtering and content filtering for all emails within the domain.

Recent Articles

CA Gets Clarity In Niku Deal
The all cash purchase valued at $21 per share brings the IT management and governance solution company into the Computer Associates fold.

To Motivate, Don't Demotivate
Recently, a couple of intended compliments threw me for a loop. Two people called me in the same week and wanted me to present keynote speeches at their conferences. Of course, that was the flattering part, but what got to me was that they both referred to me as a "motivational speaker."

Microsoft Push Will Jam Blackberry
Microsoft announced plans to squeeze the juice out of Blackberry when Steve Ballmer presented Windows Mobile 5.0. The new OS provides a number of new features for the handheld market and promises to hit companies like Research In Motion (RIM) and Good Technologies below the keypad.

Sun Rises With StorageTek
Sun Microsystems is buying Storage Technology Corporation, or StorageTek, for $37 per share in cash, adding up to about $4.1 billion. The acquisition has been approved by the boards of both companies.

HP and Microsoft Planning Identity Systems Dominance
Dennis Miller once said that "Bill Gates is a monocle and a Persian cat away from being a bad guy in a James Bond movie." Last week, Hewlett-Packard announced that it, along with Gates' Microsoft...

Make Sure Your Intranet Is Well Perceived By Staff
Many intranets are only now beginning to show their true potential. However, many staff, having had unsatisfactory previous experiences of the intranet, may need quite some convincing that the intranet is now genuinely useful.

Info Technology Departments May Lose Staff
A long-term choice for IT departments means either focusing on services or providing more business value.

Cisco Walks the FineGround
Cisco Systems announced that it will be taking over data center networking systems maker FineGround Networks, a privately-held company based in Campbell, California.

The Seven Things You Have To Say About Security
A provider of information risk management solutions offers its take on George Carlin's famed bit, but on the serious side. IPLocks offers a press release today, listing seven topics a business needs to consider in terms of security.

Intel's Craig Barrett Stepping Down
Paul Otellini will become the chipmaker's first CEO who is not an engineer, while Mr. Barrett becomes company chairman. Can a number cruncher thrive in the chief executive's spot?


The Machine Readable Web

By Ron Tower

The vast majority of the Web is intended for human readers. The goal has been to create an online experience for human beings. It is an open and ever growing body of information.

This is all great, but it does present some problems. There is just too much there. We aren't sure what information to trust. We can get lost in the Web and waste a lot of time. So we need some software tools to help us, but the information itself is not structured in a way that software can easily deal with. Enter the machine readable Web.

The most basic way for software to deal with information on the Web is to simply read the HTML of the pages and "analyze" it. This is what search engines do. They have software agents called spiders that walk the Web and index the pages. They then use various techniques to give us the "best" pages for the search queries we enter.
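As a rough illustration of what "indexing" means here, the following sketch builds a tiny inverted index over two made-up pages and answers a query from it. The page contents, the example.com URLs, and the function names are all assumptions for illustration; real search engines use far more elaborate crawling and ranking than this.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, enough for a sketch
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for what a spider would have fetched.
pages = {
    "http://example.com/a": "<h1>RSS feeds</h1><p>Feeds summarize sites.</p>",
    "http://example.com/b": "<p>Web services use XML.</p>",
}
index = build_index(pages)
```

With this index, search(index, "rss feeds") returns only the first page, and a query whose words never appear together returns an empty set.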

This is helpful and essential, but you still have to go to the pages (many pages) and try to find what you want. And you need to know when to go back to get updated information. You may even know that a page has the information you want and that it will be updated regularly, but you don't want to go back again and again to get that bit of information off that page.

There are tools called "screen scrapers" or Web page extractors that can read the pages and extract just the information you want, but the pages are unstructured and changing. The rules you describe for extracting the information may be complex and may not work as the page changes.
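The fragility is easy to see in a small sketch. The extractor below uses Python's standard html.parser and a hypothetical rule, "take the text inside span elements with class price". Both sample pages are invented; the second shows the same information after a redesign, at which point the rule silently finds nothing.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collects the text inside <span class="price"> elements."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

def scrape_prices(html):
    scraper = PriceScraper()
    scraper.feed(html)
    return scraper.prices

# Hypothetical page before and after a redesign.
old_page = '<p>Widget: <span class="price">$9.99</span></p>'
new_page = '<p>Widget: <b class="cost">$9.99</b></p>'

print(scrape_prices(old_page))  # the rule matches the old markup
print(scrape_prices(new_page))  # the rule matches nothing after the change
```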

And content providers often don't want you to use their page that way. They want you to look at the whole page, so that you will get the other messages they have on the page (like marketing messages), not just the bit you want. They try to put up a "no droids allowed" sign, in this case, "no robots, we want human eyeballs only".
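One widely used form of such a sign is a robots.txt file, which a well-behaved software agent checks before fetching a page. The sketch below assumes that this is the kind of sign meant here. It uses Python's standard urllib.robotparser with made-up rules, a made-up agent name, and example.com URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: all agents are asked to stay out of /articles/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/
"""

def may_fetch(robots_text, agent, url):
    """Return True if the given rules allow this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)

print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/articles/today.html"))
print(may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
```

Under these rules the first URL is off limits to the agent and the second is allowed. Note that robots.txt is only a request; honoring it is up to the agent.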

Some content providers realize that you can't always come to their site, and that if they give you a useful summary of what is on it, you might come more often to see the details (and the other stuff you really don't want to see, but live with to get the content you want). A very useful way of doing this is with RSS feeds. RSS (Really Simple Syndication) provides the summary in an XML file that a software agent can easily process. RSS news readers, or information aggregators, fetch the summary for you, and you can then decide whether to click through to see the details. (See http://www.w3schools.com/rss/default.asp for more on RSS.)
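To show how little work a software agent has to do, here is a minimal sketch that reads an RSS 2.0 feed with Python's standard xml.etree.ElementTree. The feed text, its example.com links, and the read_items function are invented for illustration; a real aggregator would download the XML from the provider's feed URL and handle malformed feeds.

```python
import xml.etree.ElementTree as ET

# Made-up feed: each item carries a title, a link, and a short description.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""

def read_items(feed_xml):
    """Return a list of dicts with the title, link, and summary of each item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return items

for entry in read_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Because the element names are fixed by the format, the same few lines work against any provider's feed, which is exactly what the screen-scraping approach cannot offer.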

RSS is the first really successful example of the machine readable Web. The RSS XML file has a well known structure and is easy to produce and to process. It has also been successful because it is a win-win situation for content providers and consumers. Consumers get the summary information they want, making their Web browsing more effective and enjoyable. And providers get what they want, more traffic to their site.

Content providers had started down this path by providing HTML fragments that Web site authors could add to their sites. A few tools also let individuals use these fragments. Commented HTML can be used to allow Web page extractors to more easily extract dynamic HTML fragments. HTML fragments, like RSS feeds, are useful for consumers of information and helpful for content providers in attracting traffic.
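A sketch of the commented-HTML idea: if a provider wraps a dynamic fragment in a pair of marker comments, an extractor can pull it out without caring about the rest of the page. The BEGIN/END marker convention, the fragment name "headlines", and the sample page below are assumptions made up for this example, not an established standard.

```python
import re

def extract_fragment(html, name):
    """Return the HTML between <!-- BEGIN name --> and <!-- END name -->, or None."""
    pattern = re.compile(
        r"<!--\s*BEGIN " + re.escape(name) + r"\s*-->"
        r"(.*?)"
        r"<!--\s*END " + re.escape(name) + r"\s*-->",
        re.DOTALL,  # let the fragment span multiple lines
    )
    match = pattern.search(html)
    return match.group(1).strip() if match else None

# Hypothetical page with one marked fragment surrounded by other content.
PAGE = """<html><body>
<div class="ad">Buy now!</div>
<!-- BEGIN headlines -->
<ul><li>Sun buys StorageTek</li></ul>
<!-- END headlines -->
<div class="footer">Contact us</div>
</body></html>"""

print(extract_fragment(PAGE, "headlines"))
```

The provider can redesign everything around the markers and the extraction rule keeps working, as long as the markers themselves stay in place.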

This brings us to a fundamental point. The content providers need to have a relatively easy way to provide the machine readable content and it has to fit in with their mission. And the consumers won't use it unless they get something useful from it. So we need the win-win for the machine readable Web to get off the ground. At this point, according to a Pew Research report (http://www.pewinternet.org/PPF/r/144/report_display.asp), 5% of internet users are using RSS. Most of these people are classic early adopters. But it seems like RSS is moving quickly to being more widely adopted.

But even this relatively simple standard was not easy to get to. There was a lot of conflict between the "keep it simple" crowd and the "more features" crowd (see http://diveintomark.org/archives/2002/09/06). And RSS is just scratching the surface. After all, it just provides a title, a link, and a short summary for each item. Richer information will require a richer structure.

There is a community of researchers looking to provide the approach for this richer structure under the tag phrase "semantic Web". This is largely a vision and research project at this point. See http://www.w3.org/2001/sw/. One criticism of this work is that it is "too complex", and if you try to read some of it, you might be pardoned for developing a similar opinion. Of course, tools will be provided to hide the complexity from users, but the issue is whether it is too complex for the typical content provider and for tool developers. See http://www.snipsnap.org/space/RDF+too+complex if you are interested in exploring this issue.

A nearer-term approach is "Web services". This uses the Web infrastructure for application-to-application communication. It is not as easy as RSS, but it builds on a similar foundation, with XML as the data format. At this point it is mostly used for business-to-business communication, and there are hardly any Web services that provide public information. And as a content provider, you need to define an interface for each kind of information you might want to serve. See http://www.w3.org/2002/ws/ if you are interested in exploring this.

Another intriguing use of the Web is machine-to-machine. The idea here is that many machines exist that have embedded computers. If they could hook into the Web, they could provide a lot of useful information. Some might be sensors of various kinds. Others might be cars, toasters, or washing machines. Wireless companies are interested in providing devices similar to cell phones to allow these machines to be accessible over the Web. See http://itpapers.zdnet.com/whitepaper.aspx.

So a machine readable Web is starting to become a reality with RSS and Web services and may progress even further with something like machine-to-machine or the semantic Web. Early adopter consumers are starting to adopt the idea via RSS. The key will be for content providers to adopt a richer set of machine readable formats, as they have started to do with RSS, and to keep those formats as simple as possible so that a wide variety of software developers can provide tools for end users. This may be the key to making the Web even more useful.

About the Author:
Ron Tower is the President of Sugarloaf Software and is the developer of Personal Watchkeeper, an information aggregator supporting a variety of ways to summarize the Web. http://www.sugarloafsw.com

About ITManagementNews
ITManagementNews answers questions for IT managers. Our experts offer real-world advice and cutting edge technology for the enterprise. ITManagementNews is focused on Delivering IT Solutions.

-- ITManagementNews is an iEntry, Inc. publication --
iEntry, Inc. 2549 Richmond Rd. Lexington KY, 40509
2005 iEntry, Inc. All Rights Reserved