Today’s post comes from my friend, colleague, and information architect extraordinaire, Lisa Loyo.
Recently we’ve had some new web sites that don’t work well with our Google Search Appliance (GSA). These nearly complete sites weren’t being fully indexed, and in one case secure search could not be implemented.
Stakeholders were panicking. Developers were scrambling to find quick fixes to big problems. Fingers were being pointed.
All of it was preventable.
First, in a nutshell, this is how the GSA works:
The GSA crawls pages, following links to find unique URLs that match configured patterns, such as a domain. When it finds one, the page gets indexed. Pages that are not indexed cannot be found.
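To make that concrete, here is a minimal sketch in Python of a crawler that follows links and indexes each unique URL once. It is not how the GSA is actually implemented; the `crawl` function, the in-memory link graph, and the example.com URLs are all made up for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, links, allowed_domain):
    """Breadth-first crawl over an in-memory link graph (hypothetical).

    `links` maps a URL to the URLs its page links to.
    Returns the set of unique URLs that would be indexed.
    """
    indexed = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in indexed:
            continue  # a URL seen before adds nothing new to the index
        if urlparse(url).hostname != allowed_domain:
            continue  # outside the pattern we were told to follow
        indexed.add(url)
        queue.extend(links.get(url, []))
    return indexed


# Illustrative site: the products page links to "itself" for its next page.
site = {
    "http://www.example.com/": [
        "http://www.example.com/products",
        "http://other.example.org/",
    ],
    "http://www.example.com/products": ["http://www.example.com/products"],
}
indexed = crawl("http://www.example.com/", site, "www.example.com")
```

However many times the crawler meets the same URL, it ends up with a single entry for it, and the page on the other domain is skipped entirely.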
So, why can’t I find that page?
Imagine a page that contains a list of five products. Users can click a link to see the next five products, and so on. The first page showed up in the search results. The subsequent pages did not.
The problem? The URL didn’t change on the subsequent pages!
The GSA could not find a unique URL to index for each page of products. Thus not all of the products were indexed. Quite the conundrum! We’ve seen this problem more than once. It’s easy enough to avoid if you think about search from the beginning.
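One common way to avoid this is to give every page of results its own URL. The sketch below assumes a query parameter named `page` and a hypothetical helper called `page_urls`; neither comes from the sites described here, and your platform may handle pagination differently.

```python
from urllib.parse import urlencode


def page_urls(base_url, total_items, page_size=5):
    """Return one distinct URL per page of results (hypothetical helper)."""
    pages = (total_items + page_size - 1) // page_size  # round up
    return [f"{base_url}?{urlencode({'page': n})}" for n in range(1, pages + 1)]


# Twelve products at five per page need three pages, so three URLs.
urls = page_urls("http://www.example.com/products", 12)
```

With a distinct URL per page, a crawler that follows the "next" link has something new to index each time.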
Another site had a related problem: the secure content and the public content were on the same page. How much was shown depended on whether or not a user was logged in. Unfortunately, the URLs of the secure and public content were the same. The secure content was not even in a separate directory, so it was impossible for the GSA to differentiate between the two types of content.
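Here is a rough sketch of why a separate directory helps. It assumes secure pages live under a hypothetical `/secure/` path prefix; the function name and URLs are invented for illustration and are not GSA configuration syntax.

```python
from urllib.parse import urlparse

SECURE_PREFIX = "/secure/"  # assumed directory for protected content


def is_secure(url):
    """Classify a URL as secure purely from its path (hypothetical rule)."""
    return urlparse(url).path.startswith(SECURE_PREFIX)


public_page = "http://www.example.com/reports/summary.html"
secure_page = "http://www.example.com/secure/reports/summary.html"
flags = (is_secure(public_page), is_secure(secure_page))
```

When the two kinds of content share one URL, no rule like this can tell them apart, which is exactly the situation that site was in.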
Five tips to ensure your new site plays nice with the Google Search Appliance:
- Think through the search features you want early on.
- Determine what the search engine needs in order to implement the features, whether it be metadata, site structure or something else.
- During the early prototyping phase, discuss how to include everything the search engine needs in the web site.
- Test the search function as soon as possible, not after the site is in beta and almost ready to launch.
- Think creatively! It’s possible to create content just for the site search engine that is neither exposed to public search engines nor shown to the public. This can be a site map for indexing or a long list of specific items to index, such as directory entries.
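As an illustration of the last tip, the following sketch builds a plain HTML page of links that a crawler could follow to reach every directory entry. The `build_index_page` function and the sample entries are hypothetical, and how you keep such a page away from the public and from other search engines is left out here.

```python
from html import escape


def build_index_page(entries):
    """Build a bare HTML list of links from (title, url) pairs."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in entries
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"


# Made-up directory entries; each one has its own unique URL.
entries = [
    ("Smith & Jones", "http://www.example.com/directory?id=1"),
    ("Acme Widgets", "http://www.example.com/directory?id=2"),
]
page = build_index_page(entries)
```

Because every entry gets its own link, the crawler has a unique URL to index for each one, even if visitors normally reach those entries through a search form or a script.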