Last week Google announced the release of GSA 6.0, software which runs on Google Search Appliances used by enterprises to provide a Google-like search experience for intranet-based content. Major features of this release include:
- New options for combining and coordinating the operation of multiple appliances.
- Support for a new appliance offering, the GB-9009, and enhanced support for the existing GB-7007. The GB-1001, GB-5005, and GB-8008 are being retired.
- A new Policy ACL construct which implements “early binding” security trimming of search results (returning only content to which the user has access).
The GSA home page and Google blog posts announcing the new release aren’t entirely clear on which of the listed features are fully supported, which are considered “beta,” and which are considered a “Google Lab Feature.” However, the Guide to Software Release 6.0 summarizes this information well.
Overall Impressions
As much as Google likes to tout user features, such as query suggestions or user-added results, these features are still considered “beta,” and most enterprises won’t turn them on until they are fully supported. For all intents and purposes, the new supported capabilities of this release are mostly infrastructure-focused.
New software features permit multiple appliances to operate as a single federated collection of appliances. This comes at the same time that Google is completing a turnover of its entire GSA product lineup. The GB-1001, GB-5005, and GB-8008 appliances are being replaced with the GB-7007 and GB-9009. Because the appliances can now work together, customers can scale their search capacity by adding more appliances rather than by upgrading to a higher capacity appliance.
The new Policy ACL feature looks to be an innovative approach to security trimming search results. However, only adventurous enterprises will benefit from its use at this time. Policy ACLs empower the appliance to make security access decisions itself (without the assistance of content source repositories) before returning search results. In other words, the appliance returns only links to content to which the user requesting the search has access.
Prior to this release, GSA provided this capability through “late binding” techniques, which deferred to the content source repository for access decisions at query time, a much less efficient method. However, the Policy ACL feature only provides the building blocks (e.g., administrative screens, APIs, new databases on the appliance) for enabling early binding security trimmed search results. Content repository connectors (e.g., for searching SharePoint, Documentum, and others) available for use with GSA still use the late binding methods.
The rest of this blog post provides additional details on these new features.
Details on combining and coordinating the operation of multiple appliances
Two new features in GSA 6.0 enable flexibility in how appliances are purchased and deployed. The feature called “GSA-to-GSA Unification” in the announcement is referred to as “Federation” and “Dynamic Scalability” in the product documentation. It enables multiple search appliances deployed in different locations to work together, providing a combined user experience that searches across all of the appliances. An appliance designated as the “primary node” works with other secondary node appliances to coordinate the entire setup. Each appliance can operate independently, serving search results for the collections it manages. In addition, a search submitted to the primary node appliance can be executed (federated) across all participating appliances. The primary node appliance then combines these results into a single set of search results for the user.
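The primary node's role of fanning a query out and combining the answers can be pictured with a small sketch. This is not how the appliance is implemented: the `Result` type, `make_node`, `federated_search`, the term-count scoring, and the idea that scores are directly comparable across appliances are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float  # assumed comparable across nodes (a simplification)
    node: str     # which appliance produced the hit


# Each appliance is modeled as a function from a query to its local hits.
SearchFn = Callable[[str], List[Result]]


def make_node(name: str, docs: Dict[str, str]) -> SearchFn:
    """Build a toy node that scores documents by counting query terms."""
    def search(query: str) -> List[Result]:
        terms = query.lower().split()
        hits = []
        for url, text in docs.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                hits.append(Result(url, float(score), name))
        return hits
    return search


def federated_search(query: str, primary: SearchFn,
                     secondaries: List[SearchFn], limit: int = 10) -> List[Result]:
    """Run the query on every node and merge into one ranked list."""
    merged: Dict[str, Result] = {}
    for search in [primary, *secondaries]:
        for hit in search(query):
            best = merged.get(hit.url)
            # Keep the highest-scoring copy if two nodes return the same URL.
            if best is None or hit.score > best.score:
                merged[hit.url] = hit
    ranked = sorted(merged.values(), key=lambda r: r.score, reverse=True)
    return ranked[:limit]
```

Each node built with `make_node` can still be called on its own, which mirrors the point that every appliance keeps serving its own collections independently.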
This is an intriguing capability that can enable large enterprises to distribute appliances around the world to search local content stores while still offering a unified/global search experience for those who need it. However, properly configuring the appliances to work cooperatively looks to require significant planning, especially when dealing with secured content (which most enterprise networks deal with).
This feature can also be used to permit multiple appliances to work together so enterprises can scale their GSA farms horizontally (by adding more appliances) rather than upgrading to a larger capacity appliance. However, the “Multibox” feature also mentioned in this announcement will make this much easier.
Multibox provides a method of making multiple appliances operate as a single large appliance by sharing crawling and index serving duties. However, although it is included in GSA 6.0, this feature is still considered beta.
Details Regarding Support for Early Binding Security Trimming
GSA 6.0 supports the use of early binding for checking access to content. Early binding is an approach for filtering search results so that only content to which the user submitting the query has access is returned. This is also described as “security trimming” of search results.
Earlier versions of GSA only supported late binding to implement security trimming. Late binding methods perform the access control check during a search operation (that is, late in the process), whereas early binding captures access information early in the process, during indexing. In short, GSA late binding impersonates the user submitting the search and tries to access a piece of content before displaying a link to it. If the access check fails, the matching search result is not shown. [For more discussion of early versus late binding, read this presentation from Mark Bennett and Miles Kehoe given at the 2008 Enterprise 2.0 Conference.]
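A minimal sketch of the late binding idea follows. The `FakeRepository` class and the `late_binding_filter` function are hypothetical stand-ins, not GSA or connector APIs; the point is only that every candidate hit costs one access check against the source system at query time.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical callback: "may this user read this URL?" answered by the source system.
AccessCheck = Callable[[str, str], bool]


def late_binding_filter(user: str, hits: Iterable[str],
                        can_access: AccessCheck) -> List[str]:
    """Keep only the hits that the source repository says the user may read."""
    visible = []
    for url in hits:
        # One round trip to the repository per candidate result.
        if can_access(user, url):
            visible.append(url)
    return visible


class FakeRepository:
    """Stand-in for a content repository; counts how often it is consulted."""

    def __init__(self, readers: Dict[str, Set[str]]) -> None:
        self.readers = readers
        self.checks = 0

    def can_access(self, user: str, url: str) -> bool:
        self.checks += 1
        return user in self.readers.get(url, set())
```

The `checks` counter makes the cost visible: it grows with the number of candidate results, which is why this approach is the less efficient of the two.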
The challenge in implementing early binding is in the complexities that can arise when indexing content from multiple sources. With early binding, access control information is added to the search index as additional metadata stored along with the content. This additional metadata allows query processing to take into account the security semantics of the user submitting the search (my ID, my groups, what roles I assume). So, a search request like “vacation policies” is translated into something like “vacation policies which user smith has access to.” There is an index-time component to early binding (i.e., store the access control list along with the content) and a search-time component (i.e., determine who is submitting a query and what groups they belong to). Unlike late binding, early binding requires no additional post-search access checks against external systems.
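The index-time and search-time halves of early binding can be sketched as follows. The `SecureIndex` class, the `user:` and `group:` principal prefixes, and the all-terms matching rule are assumptions for illustration, not the appliance's actual index format.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class IndexedDoc:
    url: str
    text: str
    allowed: FrozenSet[str]  # principals such as "user:smith" or "group:hr"


class SecureIndex:
    """Toy index that stores an access list next to each document."""

    def __init__(self) -> None:
        self._docs: List[IndexedDoc] = []

    def add(self, url: str, text: str, allowed: Iterable[str]) -> None:
        # Index-time component: the ACL is stored with the content.
        self._docs.append(IndexedDoc(url, text, frozenset(allowed)))

    def search(self, query: str, user: str, groups: Iterable[str]) -> List[str]:
        # Search-time component: work out who is asking and what they belong to.
        principals = {f"user:{user}"} | {f"group:{g}" for g in groups}
        terms = query.lower().split()
        hits = []
        for doc in self._docs:
            words = doc.text.lower().split()
            matches = all(t in words for t in terms)
            # The access test uses only indexed metadata; no external call is made.
            if matches and principals & doc.allowed:
                hits.append(doc.url)
        return hits
```

In this sketch a query for “vacation policies” from user smith is effectively answered as “vacation policies that smith, or one of smith's groups, is listed against.”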
GSA implements early binding through the use of a construct called “Policy ACLs” (Policy Access Control Lists). Each one is made up of a URL pattern (which defines the URLs the policy secures) and a list of allowed users and groups. Policy ACLs can be added through administrative screens (one by one via a simple web form or uploaded via a specially formatted text file), programmatically via the new Policy ACL API, or as exact-match URL patterns embedded within feeds along with content and metadata (however, at this time, I cannot find the documentation for defining a Policy ACL within a feed). Identities of users are determined by the same methods the appliance uses for authentication (HTTP Basic, NTLM, Kerberos, etc.). Groups can be mastered by an LDAP directory or within a group database stored on the appliance that can be programmatically updated via the Policy ACL API.
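To make the shape of a Policy ACL concrete, here is a rough model of “URL pattern plus allowed users and groups” with a simple group database. It does not use the real Policy ACL API or file format. The `PolicyAcl` class and `is_allowed` function are hypothetical, the URL pattern is treated as a plain prefix (the appliance's pattern syntax is richer), and denying URLs that no policy covers is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class PolicyAcl:
    url_pattern: str          # assumption: treated as a plain URL prefix
    users: FrozenSet[str]     # allowed user names
    groups: FrozenSet[str]    # allowed group names

    def covers(self, url: str) -> bool:
        return url.startswith(self.url_pattern)

    def permits(self, user: str, user_groups: Set[str]) -> bool:
        return user in self.users or bool(self.groups & user_groups)


def is_allowed(url: str, user: str, group_db: Dict[str, Set[str]],
               acls: List[PolicyAcl]) -> bool:
    """Decide access from the policies alone, with no call to the source system."""
    # Resolve the user's groups from a local group database (group -> members).
    user_groups = {g for g, members in group_db.items() if user in members}
    matching = [acl for acl in acls if acl.covers(url)]
    # Assumption: a URL that no policy covers is not granted by this check.
    return any(acl.permits(user, user_groups) for acl in matching)
```

A custom connector would then be responsible for producing `PolicyAcl`-like entries from whatever permission model its source repository uses.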
Google has prebuilt content connectors for SharePoint, Documentum, LiveLink, and FileNet. These enable the GSA to index and provide search for these repositories. However, none of these connectors have been updated to support GSA early binding. To properly support early binding these connectors will have to map the security semantics of each of the source repositories to GSA’s Policy ACLs via the ACL API or within a feed.
So it appears the only enterprises that will benefit from this level of support for early binding are those capable of writing custom feed connectors that map the security semantics of their source application to what is provided by GSA. It will be interesting to see if Google developers add early binding support to their standard GSA connectors anytime soon (or at all). Their challenge will be in normalizing the security semantics of these content repositories with the semantics of the GSA’s Policy ACL construct.