Interview: Thunderstone VP Doran Howitt

By Jason Dowdell


A few weeks ago I put up a post about the Google Small Business Search Appliance. I listed its pros and cons and thought Google had come up with a pretty decent product.

After that post I was contacted by Doran Howitt, the VP of Marketing at Thunderstone. Doran told me about the Thunderstone Search Appliance SBE his company had recently launched. He pointed me to an eWeek head-to-head comparison of the Thunderstone and Google search appliances. The review is pretty good, but I thought you deserved a bit more of that "personal touch," so I conducted an interview with Doran. Enjoy!

Jason: What does the "SBE" stand for in the Thunderstone Search Appliance?
Doran: "Small Business Edition". That's to differentiate it from our "enterprise" editions.

Jason: Do you allow your customers to expose the search results of your search appliance to the internet or is it strictly for use on internal corporate intranets?
Doran: Yes, either. In addition, we allow its use for *indexing* other web sites out on the internet. You can serve those search results either to the public or just for your internal use.

Jason: What Thunderstone software is embedded in the Search Appliance?
Doran: The appliance runs on top of our Texis software. Texis is our flagship product. It combines the features of a search engine and a relational database. Texis is actually an entire application development suite for text-intensive or search-intensive applications.

Jason: What adjustments, if any, can users make to the algorithm(s) that determine the importance of a particular document for a particular query?
Doran: Users can set the rank knobs selectively for each search. They can also turn the thesaurus, pattern matching, proximity, and stemming on or off. That's if the administrator has enabled those features; they are configurable settings. And of course there are the + and - logic operators, phrases, and wildcards. For the geeks, you can search with a regular expression.
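To make the query features Doran mentions a little more concrete, here is a minimal Python sketch of how + (required) and - (excluded) operators, quoted phrases, and a * wildcard might be parsed and matched against a document. The syntax details and the names `parse_query`, `term_matches`, and `matches` are our own assumptions for illustration; this is not Texis's actual query language.

```python
import re
from dataclasses import dataclass, field

# Hypothetical query syntax: +term (required), -term (excluded),
# "quoted phrase", and a * wildcard inside a term.
TOKEN = re.compile(r'([+-]?)("[^"]*"|\S+)')


@dataclass
class ParsedQuery:
    required: list = field(default_factory=list)
    excluded: list = field(default_factory=list)
    optional: list = field(default_factory=list)


def parse_query(query: str) -> ParsedQuery:
    """Split a query string into required, excluded, and optional terms."""
    pq = ParsedQuery()
    for sign, term in TOKEN.findall(query):
        term = term.strip('"').lower()
        if not term:
            continue
        if sign == "+":
            pq.required.append(term)
        elif sign == "-":
            pq.excluded.append(term)
        else:
            pq.optional.append(term)
    return pq


def term_matches(term: str, text: str) -> bool:
    # Escape the term, then turn each escaped * back into "any word characters".
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.search(r"\b" + pattern + r"\b", text) is not None


def matches(pq: ParsedQuery, text: str) -> bool:
    """True if text satisfies the parsed query (case-insensitive)."""
    text = text.lower()
    if any(term_matches(t, text) for t in pq.excluded):
        return False
    if pq.required:
        return all(term_matches(t, text) for t in pq.required)
    # With no required terms, at least one optional term must appear.
    return any(term_matches(t, text) for t in pq.optional)
```

For example, `parse_query('+search -google "small business" appl*')` requires "search", rejects any document containing "google", and treats the phrase and the wildcard term as optional.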

Jason: Is linking part of the built in ranking algorithm?
Doran: That information is captured for tracking and reporting. As of today we're not using it in ranking. The reason is that link weighting is not useful in most intranet situations or within a single web site. It only helps in the context of a very broad web index, where links created by a huge number of people provide a kind of popularity measure. We would add linkage as a ranking feature if customers requested it, but so far they haven't.
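The popularity argument can be illustrated with a small sketch that counts, for each page, how many distinct outside hosts link to it. This is a hypothetical illustration (the `distinct_host_inlinks` function and its rule of ignoring same-host links are our assumptions), not how Thunderstone records links. It shows why the signal dries up inside a single site: when every link is internal, there is nothing left to count.

```python
from collections import defaultdict
from urllib.parse import urlparse


def distinct_host_inlinks(links):
    """Count distinct external hosts linking to each target URL.

    links: iterable of (source_url, target_url) pairs (hypothetical input format).
    """
    sources = defaultdict(set)
    for src, dst in links:
        src_host = urlparse(src).netloc
        if src_host == urlparse(dst).netloc:
            continue  # same-site links say little about outside popularity
        sources[dst].add(src_host)
    return {dst: len(hosts) for dst, hosts in sources.items()}
```

Three links from two different outside hosts give a page a count of 2, while an intranet-style link graph, where every link stays on one host, yields an empty result.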

Jason: Your FAQ page says the appliance can index data stored in relational databases like SQL Server, MySQL and Oracle but do you point it at a specific table(s) or can you tell it to only index the results of a particular query?
Doran: In the underlying Texis software, you actually point it at a table. That's not yet enabled in the appliance, mainly because in most situations we've seen, the appliance can get at all the dynamic content by HTTP. It can submit queries as needed. But we'll probably add direct database indexing in the next major release, because certainly there are situations where it would be useful.

Jason: How long does it take to set up the appliance from opening the box to having it online and indexing documents?
Doran: Setup and configuration should take 20 minutes or so. It's mainly a matter of pointing the crawler at the desired data, although you might want to spend a little more time prettying up the results page HTML! If you have a somewhat complicated web structure, where you only want some things indexed and not others, you would spend some more time defining the exclusion and inclusion rules.
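Inclusion and exclusion rules like the ones Doran mentions are often written as URL patterns. The sketch below assumes shell-style glob patterns and an "exclusions win" policy; the rule format, the example patterns, and the `should_crawl` name are hypothetical and do not reflect the appliance's actual admin settings.

```python
from fnmatch import fnmatchcase


def should_crawl(url, include, exclude):
    """Return True if url matches an include pattern and no exclude pattern.

    Patterns are shell-style globs (an assumption); exclusions take priority.
    """
    if any(fnmatchcase(url, pat) for pat in exclude):
        return False
    return any(fnmatchcase(url, pat) for pat in include)


# Hypothetical rules: crawl one site, but skip a private area and PDF files.
INCLUDE = ["http://www.example.com/*"]
EXCLUDE = ["*/private/*", "*.pdf"]
```

Letting exclusions override inclusions keeps the rule set short: one broad include pattern per site, then a handful of narrow exclusions for the areas you don't want indexed.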

Jason: What kind of support is involved / necessary from Thunderstone in order to get the box up and running?
Doran: We usually ask for the IP settings before shipping it, so that you can just plug in the Ethernet cable and start going through the admin menus from a browser. In case of any problem, we can diagnose it remotely. Of course, some customers like to be talked through the initial configuration, and we're happy to do that. If you'll be indexing a public web site, we'll usually crawl it before we ship the box, so that when you plug it in, it's already ready to serve search results, and updating will proceed in the background.

Jason: Architecture: What's the operating system, amount of ram, processor speed / type, and hard disk size / arrangement on each of the appliances?
Doran: It's Linux and Intel on the inside, but the OS isn't exposed to the Appliance customer. The whole idea of an appliance is that you don't have to worry about software. Anyone who wants to get at the product at that level can license the software without our box! As for Appliance memory and so on, that will vary depending on the capacity you buy. The low-end box will have 1 GB of RAM and a 40 GB disk. That's all you need to index 50,000 documents or web pages.

Jason: What language is the search engine written in?
Doran: The core Texis software is written in C. The crawler application is written in our Texis Web Script, which is a compiled high-level language similar to PHP.

Jason: What are the 5 ranking knobs the user can adjust?
Doran: Closeness of query words to the beginning of a document; order of occurrence of the query words; proximity (closeness) of query words to each other within a record; rarity of query words in each document; and rarity of query words in the whole index.
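One simple way to picture the five knobs is as weights in a linear score. The Python sketch below computes a value between 0 and 1 for each factor and sums them with user-adjustable weights. The formulas, the `Knobs` and `score` names, and our reading of "rarity of query words in each document" as in-document term frequency are all assumptions; the interview does not describe Thunderstone's actual ranking math.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Knobs:
    # Hypothetical weights for the five factors; not Thunderstone's defaults.
    lead: float = 1.0          # closeness of query words to the document start
    order: float = 1.0         # query words appear in query order
    proximity: float = 1.0     # query words close to each other
    doc_freq: float = 1.0      # occurrences within this document (assumed reading)
    index_rarity: float = 1.0  # rarity across the whole index (IDF-style)


def score(query: str, doc: str, index_df: Dict[str, int], n_docs: int,
          knobs: Optional[Knobs] = None) -> float:
    """Weighted sum of five factors, each in [0, 1]. Requires n_docs >= 1."""
    k = knobs or Knobs()
    words = doc.lower().split()
    # Distinct query words that actually occur in the document, in query order.
    terms = [t for t in dict.fromkeys(query.lower().split()) if t in words]
    if not terms:
        return 0.0
    first = [words.index(t) for t in terms]  # first position of each term

    lead = 1.0 - min(first) / len(words)
    pairs = list(zip(first, first[1:]))
    order = sum(a < b for a, b in pairs) / len(pairs) if pairs else 1.0
    span = max(first) - min(first)
    proximity = 1.0 / (1 + span - (len(terms) - 1))  # 1.0 when terms are adjacent
    doc_freq = sum(words.count(t) for t in terms) / len(words)
    idf = sum(math.log((1 + n_docs) / (1 + index_df.get(t, 0))) for t in terms)
    index_rarity = idf / len(terms) / math.log(1 + n_docs)

    return (k.lead * lead + k.order * order + k.proximity * proximity
            + k.doc_freq * doc_freq + k.index_rarity * index_rarity)
```

Raising one weight, for instance `Knobs(lead=3.0)`, makes that factor dominate the sum, which is roughly what turning a single knob up would mean in this simplified model.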

Jason: What file type has been the most difficult to index from a programmatic standpoint? Flash files, video files, applets?
Doran: Not any of those. We index text and links found in any Flash file. Also text such as captions within video or graphics. Applets we see as just another frame with some JavaScript, so no problem there -- the Appliance executes the JavaScript and indexes everything found in the file. The thing that's occasionally troublesome is Lotus Notes with Domino. That tends to have a lot of different views of the same data, and web pages that are near duplicates, but different enough to confuse our duplicate detector. In the end, it typically takes some trial and error to get the exclusion settings right for Notes data.
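Near-duplicate detection is commonly done by comparing overlapping word sequences ("shingles") with Jaccard similarity. We don't know which method Thunderstone's duplicate detector uses; the sketch below shows that generic shingling technique with a hypothetical threshold. It illustrates why the cutoff matters: two Notes views of the same data that differ by only a few words can be treated as duplicates or as distinct pages depending on where the threshold sits.

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping word sequences) from text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.8, k=3):
    """Return name pairs whose shingle sets are at least `threshold` similar.

    docs maps a document name to its text; the 0.8 default is an arbitrary assumption.
    """
    sigs = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sigs)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sigs[a], sigs[b]) >= threshold:
                pairs.append((a, b))
    return pairs
```

Changing one word at the end of a 16-word page changes only one of its 14 shingles, giving a similarity of 13/15 (about 0.87). That pair is flagged at a threshold of 0.8 but not at 0.9, which is the kind of trial-and-error tuning Doran describes.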

Jason: What vertical market has required the most tuning / tweaking to your ranking algorithm on the software side?
Doran: An interesting example we've seen is news publishing. Newspaper articles tend to have the most important material close to the beginning, so in a newspaper search application you would give the "lead bias" factor more weight. Magazine articles tend to start with an anecdote, which can actually be misleading as to what the rest of the article is about. So in a magazine archive, you'd crank that factor down.
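The newspaper-versus-magazine tradeoff can be shown with a toy ranking function in which a single `lead_weight` parameter controls how much an early mention of the query term counts against plain term frequency. The scoring formula and the `lead_score` and `rank` names are assumptions made for illustration only.

```python
def lead_score(words, term):
    """1.0 if term is the first word, falling toward 0 the later it first appears."""
    if term not in words:
        return 0.0
    return 1.0 - words.index(term) / len(words)


def rank(docs, term, lead_weight):
    """Order docs by term frequency plus a weighted lead-bias bonus (toy formula)."""
    term = term.lower()

    def relevance(doc):
        words = doc.lower().split()
        return words.count(term) + lead_weight * lead_score(words, term)

    return sorted(docs, key=relevance, reverse=True)


news = "merger announced today as bank agrees deal"
feature = ("on a rainy morning in the harbor town the fishermen talked "
           "about the merger and the merger terms")
```

With a high `lead_weight` (newspaper-style), the article that names the term up front ranks first. Turning it down (magazine-style) lets the article that opens with an anecdote but discusses the term more often rise to the top.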

Jason: Why is this application better for small businesses and organizations than the Google small business appliance?
Doran: Our product differs from Google's in some key respects. One is that we allow our customers to index third-party information, that is, material you don't own, which may be on any web site out on the internet. Google prohibits that, I guess because they don't want customers competing with their core business. Another key difference is that Thunderstone optionally licenses the underlying software. We even give out the crawler application source code. You can hack it up, create extensions, or tie it in with a larger set of applications. We'll even take back an appliance as a trade-in on a software license! The software is available on all major Unix platforms, and on Windows.

About the Author:
Jason Dowdell owns the blog MarketingShift.com. Jason is part of the KeywordRanking team of SEO professionals.

About ITManagementNews
ITManagementNews answers questions for IT managers. Our experts offer real-world advice and cutting-edge technology for the enterprise. ITManagementNews is focused on Delivering IT Solutions.


-- ITManagementNews is an iEntry, Inc. publication --
iEntry, Inc. 880 Corporate Drive, Lexington, KY 40503
© 2005 iEntry, Inc. All Rights Reserved.
