What? My site doesn’t play nice with search?!

Today’s post comes from my friend, colleague, and information architect extraordinaire, Lisa Loyo.

Recently we’ve had some new web sites that don’t work well with our Google Search Appliance (GSA).  These nearly complete sites weren’t being fully indexed and, in one case, secure search could not be implemented.

Stakeholders were panicking.  Developers were scrambling to find quick fixes to big problems.   Fingers were being pointed.

All of it was preventable.

First, in a nutshell, this is how the GSA works:

The GSA crawls pages, following links to look for unique URLs that match specified patterns, such as a domain.  When it finds one, the page gets indexed.  Pages that are not indexed cannot be found.
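To make that concrete, here is a toy sketch of the crawl loop in Python. It is not the GSA's actual implementation; the `SITE` dictionary, the `crawl` function, and the example.com URLs are all made up for illustration, and a simple URL prefix stands in for the appliance's pattern matching.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "http://example.com/": ["/products", "/about", "http://other.example.org/"],
    "http://example.com/products": ["/products", "/"],
    "http://example.com/about": [],
}


def crawl(start_url, allowed_prefix, fetch_links):
    """Follow links breadth-first and return the set of unique URLs indexed."""
    indexed = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        # Skip URLs already indexed and URLs outside the allowed pattern.
        if url in indexed or not url.startswith(allowed_prefix):
            continue
        indexed.add(url)
        for href in fetch_links(url):
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            queue.append(absolute)
    return indexed


indexed = crawl(
    "http://example.com/",
    "http://example.com/",
    lambda url: SITE.get(url, []),
)
print(sorted(indexed))
```

The point to notice is that the crawler only ever keeps track of URLs. A page that is already in `indexed` is never visited again, and a URL outside the pattern is never visited at all.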

So, why can’t I find that page?

Imagine a page that contains a list of five products.  Users can click a link to see the next five products and so on.  The first page showed up in the search results.  The subsequent pages did not.

The problem?  The URL didn’t change on the subsequent pages!

The GSA could not find a unique URL to index.  Thus not all of the products were indexed.  Quite the conundrum!  We’ve seen this problem more than once.  It’s easy enough to avoid if you think about search from the beginning.
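One common way to avoid this is to give every page of results its own URL, for example with a page number in the query string. The sketch below assumes a `page` parameter and a hypothetical `page_urls` helper; your site may use a different scheme, and any scheme works as long as each page ends up with a distinct address.

```python
from urllib.parse import urlencode


def page_urls(base_url, total_items, page_size=5):
    """Return one distinct URL per page of results."""
    # Round up so a partial last page still gets its own URL.
    pages = (total_items + page_size - 1) // page_size
    return [f"{base_url}?{urlencode({'page': n})}" for n in range(1, pages + 1)]


urls = page_urls("http://example.com/products", total_items=12)
for url in urls:
    print(url)
```

With twelve products at five per page, this yields three different URLs, so a crawler following the "next" links has three unique pages to index instead of one.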

In a similar case on another site, the secure content and the public content were on the same page.  The level of exposure depended on whether or not a user was logged in.  Unfortunately, the URLs of the secure and public content were the same.  The secure content was not even in a separate directory, so it was impossible for the GSA to differentiate between the two types of content.
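Had the secure content lived in its own directory, telling the two apart would have come down to a simple check on the URL. Here is a minimal sketch of that idea; the `/secure/` directory name and the `is_secure` helper are assumptions for illustration, not something the GSA or the site in question provides.

```python
from urllib.parse import urlsplit

# Hypothetical directory that holds all login-only content.
SECURE_PREFIX = "/secure/"


def is_secure(url):
    """Classify a URL as secure purely from its path."""
    return urlsplit(url).path.startswith(SECURE_PREFIX)


print(is_secure("http://example.com/secure/pricing"))  # True
print(is_secure("http://example.com/products"))        # False
```

When public and secure content share one URL, no rule like this can be written, because there is nothing in the address to test.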

Five tips to ensure your new site plays nice with the Google Search Appliance:

  1. Think through the search features you want early on.
  2. Determine what the search engine needs in order to implement the features, whether it be metadata, site structure or something else.
  3. During the early prototyping phase, discuss how to include everything the search engine needs in the web site.
  4. Test the search function as soon as possible, not after the site is in beta and almost ready to launch.
  5. Think creatively! It’s possible to create content just for the site search engine that is not meant for public search engines or for viewing by the public.  This can be a site map for indexing or a long list of specific items, such as directory entries to index.
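As an illustration of the last tip, a crawl-only site map can be as plain as a page of links. The sketch below builds one from a list of (URL, title) pairs; the `build_site_map` function and the sample directory entries are hypothetical, and how you keep the page away from public search engines and visitors is left out because it depends on your setup.

```python
from html import escape


def build_site_map(entries):
    """Build a bare HTML page that links to every (url, title) pair."""
    items = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for url, title in entries
    )
    return (
        "<html>\n"
        "  <body>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


site_map = build_site_map([
    ("http://example.com/directory?id=1&view=full", "Alice Example"),
    ("http://example.com/directory?id=2&view=full", "Bob Example"),
])
print(site_map)
```

Point the search engine at this page as a starting URL and it has a direct link to every entry, even if the site itself only exposes them through a search form.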


One thought on “What? My site doesn’t play nice with search?!”

  1. Great advice. On public site search, you can say that the Google Search Appliance’s bot functions very similarly to Google.com’s. A couple of points:

    1) Your jump pages/sitemap still need to be paginated so that each one renders under 2.5 MB of HTML, because that’s all of the HTML that the indexer will ‘see’.
    2) A great question to ask is not what needs to be indexed, but rather what needs to be found.

    Michael Cizmar
