20 March 2015

Azure Search is not ElasticSearch “in the cloud”

Azure Search may well use ElasticSearch as its underlying engine, but it does not offer ElasticSearch “in the cloud”. It provides a search abstraction aimed at a specific set of use cases, and the fact that ElasticSearch is being used under the hood is almost incidental.

There’s a specific trade-off at work here. You get a search service that offers elastic scaling without you having to worry about infrastructure. In return you accept a much simplified approach to search, with very rigid schemas.

Azure Search exposes its own REST API, which is a significantly different beast from ElasticSearch’s. For a start, this means that established ElasticSearch tooling such as the NEST .NET client cannot be used. A few people are writing their own client libraries for Azure Search, but these are at a very embryonic stage, so for now you are left to work directly with the REST API.
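Working without a client library means building requests by hand. The sketch below only constructs a request and never sends it. It assumes the service is reached at `https://<service>.search.windows.net`, authenticates with an `api-key` header and takes an `api-version` query parameter; the service name, key and the `2015-02-28` version string are placeholders or assumptions, not values from a real deployment.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: substitute your own service name and key.
# "2015-02-28" is assumed to be the API version current at the time of writing.
SERVICE = "my-service"
API_VERSION = "2015-02-28"
API_KEY = "<admin-key>"


def build_request(method, path, body=None, query=None):
    """Build, but do not send, a raw request against the Azure Search REST API."""
    params = {"api-version": API_VERSION}  # every call carries an api-version
    if query:
        params.update(query)
    url = (f"https://{SERVICE}.search.windows.net/{path}?"
           + urllib.parse.urlencode(params))
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("api-key", API_KEY)  # the service authenticates via this header
    request.add_header("Content-Type", "application/json")
    return request
```

Passing the result to `urllib.request.urlopen` would actually send it, e.g. `build_request("GET", "indexes/vehicles/docs", query={"search": "estate"})` for a plain text search against a hypothetical `vehicles` index.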

Schema-based indexes

One of the nicer features of ElasticSearch is that it works out the mapping of your object schemas to the search engine for you. You just add documents, and tune the way they are indexed around the edges by adding explicit mappings.

Azure Search takes a more rigid, contract-based approach. You must explicitly create an index for your documents, complete with detailed field definitions. Indexes cannot be inferred from the data and have to be created before any documents can be uploaded.
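To give a feel for what an explicit index definition involves, here is a sketch of one as a Python dictionary. The `vehicles` index and its field names are hypothetical; the type names (`Edm.String`, `Edm.Double`, `Collection(Edm.String)`, `Edm.GeographyPoint`) and attribute names (`key`, `searchable`, `filterable`, `sortable`, `facetable`) are what I understand the REST API to expect. The `check_index` helper is a local sanity check, not something the service provides.

```python
import json


def field(name, edm_type, **attributes):
    """One field definition: a name, a type and its search attributes."""
    return {"name": name, "type": edm_type, **attributes}


# Hypothetical index loosely modelled on a vehicle search scenario.
VEHICLES_INDEX = {
    "name": "vehicles",
    "fields": [
        field("id", "Edm.String", key=True, searchable=False),
        field("make", "Edm.String", searchable=True, filterable=True, facetable=True),
        field("model", "Edm.String", searchable=True),
        field("price", "Edm.Double", filterable=True, sortable=True, facetable=True),
        field("tags", "Collection(Edm.String)", searchable=True, facetable=True),
        field("location", "Edm.GeographyPoint", filterable=True),
    ],
}


def check_index(definition):
    """Local check before upload: exactly one key field, and it is a string."""
    keys = [f for f in definition["fields"] if f.get("key")]
    if len(keys) != 1 or keys[0]["type"] != "Edm.String":
        raise ValueError("index needs exactly one Edm.String key field")
    return json.dumps(definition)  # request body for creating the index
```

Every searchable, sortable or facetable decision is baked in at this point, which is where most of the up-front design effort goes.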

This means that documents have to match the schema defined in the index. Non-key fields can be omitted, but any field that is not in the schema will cause the document to be rejected with a Bad Request (HTTP 400) error.
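Since a single stray field gets a document bounced, it can be worth mirroring the rule locally before uploading. This is a sketch under stated assumptions: the schema fields are hypothetical, and the batch shape (a `value` array whose items carry `"@search.action": "upload"`) is my understanding of the upload format rather than anything verified here. The check itself is my own pre-flight code, not the service's validation.

```python
# Hypothetical schema: the field names of a "vehicles" index keyed on "id".
SCHEMA_FIELDS = {"id", "make", "model", "price", "tags", "location"}
KEY_FIELD = "id"


def validate_document(doc):
    """Mirror the service's rule locally: the key is required, other schema
    fields may be omitted, and any field not in the schema is an error."""
    if KEY_FIELD not in doc:
        raise ValueError(f"missing key field '{KEY_FIELD}'")
    unexpected = set(doc) - SCHEMA_FIELDS
    if unexpected:
        raise ValueError(f"fields not in schema: {sorted(unexpected)}")
    return doc


def upload_batch(docs):
    """Wrap validated documents in an upload batch body."""
    return {"value": [{"@search.action": "upload", **validate_document(d)}
                      for d in docs]}
```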

You have to be pretty sure about the structure of your types before you index them, as there’s no support for re-indexing if you want to change existing fields. You can add new fields, but you have to add them to the schema before uploading any documents that use them.
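In practice this means classifying every schema change before you make it. The sketch below compares two field lists and separates new fields, which can be added in place, from changes to existing fields, which would mean building a fresh index and re-uploading everything. Treating a removed field as a change is my own conservative assumption, not something stated in the documentation.

```python
def plan_schema_update(current, proposed):
    """Split the difference between two field lists into additions (allowed in
    place) and changes to existing fields (which would need a rebuild)."""
    old = {f["name"]: f for f in current}
    new = {f["name"]: f for f in proposed}
    added = [name for name in new if name not in old]
    # Assumption: a removed field (new.get(name) is None) counts as a change.
    changed = [name for name in old if new.get(name) != old[name]]
    return {"added": added, "requires_rebuild": changed}
```

For example, widening a hypothetical `price` field from `Edm.Int32` to `Edm.Double` lands in `requires_rebuild`, while a new `colour` field lands in `added`.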

Most of your search behaviour is defined when the index is created. This is the point at which you have to decide whether a field should be searchable, sortable or support facets. Other features, such as tweaking the search scoring or choosing which fields to include in suggestions, can be adjusted as you go along, though not to the same eye-watering depth supported by ElasticSearch.

This rigid adherence to a schema can be quite a limitation, particularly if you have a large collection on your hands. However, there is a hint that this may improve in the future, as Microsoft’s documentation makes heavy use of the word “currently” when describing schema updates:

“Currently, there is limited support for index schema updates. Any schema updates that would require re-indexing such as changing field types are not currently supported.”

Document structure

Data type support is limited to the main primitive types, and there’s no support for the arrays, objects or nested types found in ElasticSearch. This rather limits what can be done with a schema, as indexes can be little more than highly indexed database tables.

Beyond the basic types, Microsoft has thrown in string collections and geo-locations. Their inclusion is indicative of the kind of scenarios the service is aimed at, i.e. social content and simple eCommerce. For example, the published case studies include Autotrader.ca, who have used the service to provide a mobile vehicle search, and Photosynth, who index images associated with tags and a geolocation. Azure Search is an abstraction that seems to mask a great deal of functional complexity in return for a gentle learning curve.
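The flat type system is easiest to see by trying to map ordinary values onto it. This helper is a hypothetical aid for writing index definitions by hand (the service itself will not infer types); the type names are the ones I understand to be supported, and the GeoJSON-style point shape for geo-locations is an assumption. Anything nested has to be flattened into separate fields before it can be indexed.

```python
import datetime


def edm_type(value):
    """Pick a field type for a Python value, rejecting shapes the flat schema
    cannot hold (nested objects, arrays of anything other than strings)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "Edm.Boolean"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return "Edm.Int32"
        if -2**63 <= value < 2**63:
            return "Edm.Int64"
    if isinstance(value, float):
        return "Edm.Double"
    if isinstance(value, str):
        return "Edm.String"
    if isinstance(value, datetime.datetime):
        return "Edm.DateTimeOffset"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return "Collection(Edm.String)"  # the only collection type on offer
    if isinstance(value, dict) and value.get("type") == "Point":
        return "Edm.GeographyPoint"  # assumed shape: {"type": "Point", "coordinates": [lon, lat]}
    raise TypeError(f"no flat field type for {type(value).__name__}: {value!r}")
```

A nested value such as `{"engine": {"size": 2.0}}` is rejected, so it would have to become a top-level field like a hypothetical `engine_size`.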

Searching

Azure Search does not expose ElasticSearch’s search interface, where complex queries can be passed in a request body. Instead, a simple query string syntax is exposed for basic text search, and more structured queries are supported via an OData implementation. Well, this is Microsoft.
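The split between the two styles shows up in the query string: free text goes into `search`, while structured conditions are expressed as an OData `$filter`. This is a sketch assuming the `search`, `$filter`, `$orderby` and `$top` parameter names and standard OData operators (`eq`, `le`, `and`); the `make` and `price` fields belong to the same hypothetical vehicle index. Doubling single quotes is the usual OData escaping rule for string literals.

```python
import urllib.parse


def odata_string(value):
    """Quote a string literal for an OData $filter; embedded quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def search_query(text, make=None, max_price=None, order_by=None, top=10):
    """Build the query string for a document search: free text goes in
    'search', structured conditions go in the OData '$filter'."""
    clauses = []
    if make is not None:
        clauses.append(f"make eq {odata_string(make)}")
    if max_price is not None:
        clauses.append(f"price le {max_price}")
    params = {"api-version": "2015-02-28", "search": text, "$top": top}
    if clauses:
        params["$filter"] = " and ".join(clauses)
    if order_by:
        params["$orderby"] = order_by
    return urllib.parse.urlencode(params)
```

So `search_query("estate", make="Volvo", max_price=20000)` asks for “estate” in the searchable fields, restricted by the filter `make eq 'Volvo' and price le 20000`.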

Most high-level search features can be accessed through this interface, such as partial or fuzzy matching, language awareness, hit highlighting, facets and suggestions. However, many of ElasticSearch’s more advanced features, such as word proximity and fine-grained control over relevance, are not exposed. The focus is on providing search for simple, one-dimensional entities rather than complex document storage and retrieval.
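Facets and hit highlighting are requested with extra query parameters rather than a query body. The sketch assumes a repeatable `facet` parameter, a `highlight` parameter naming the field to highlight, and facet results returned under an `@search.facets` key as lists of `value`/`count` buckets. The sample response in the usage note is illustrative of that assumed shape, not captured output.

```python
import urllib.parse


def facet_params(text, facets, highlight):
    """Query string asking for facet counts and hit highlighting;
    the 'facet' parameter is repeated once per faceted field."""
    params = [("api-version", "2015-02-28"), ("search", text),
              ("highlight", highlight)]
    params += [("facet", name) for name in facets]
    return urllib.parse.urlencode(params)


def facet_counts(response):
    """Turn the '@search.facets' section of a parsed search response
    into a plain {field: {value: count}} dictionary."""
    return {
        field: {bucket["value"]: bucket["count"] for bucket in buckets}
        for field, buckets in response.get("@search.facets", {}).items()
    }
```

Given a parsed response such as `{"@search.facets": {"make": [{"value": "Volvo", "count": 3}, {"value": "Saab", "count": 1}]}, "value": []}`, `facet_counts` yields `{"make": {"Volvo": 3, "Saab": 1}}`, which is about as far as faceting goes: counts per value, with none of ElasticSearch’s nested aggregations.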

What’s the appeal?

Why bother with Azure Search if it’s just a dumbed-down version of ElasticSearch with rigid schemas?

From a functional point of view there’s not a great deal to recommend Azure Search unless you have very simple search requirements. It also helps if you’re committed to the Azure infrastructure, as the indexer functionality that streamlines data source integration currently only works with Azure-based data sources, i.e. SQL Azure and DocumentDB.
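The Azure-only restriction is visible in the shape of an indexer setup: a data source definition pointing at an Azure store, plus an indexer that pulls from it into an index on a schedule. This is a hedged sketch; the property names (`type`, `credentials.connectionString`, `container.name`, `dataSourceName`, `targetIndexName`, `schedule.interval`) and the `azuresql` and `documentdb` type identifiers reflect my understanding of the preview indexer API and should be checked against the current documentation. All names and the connection string are placeholders.

```python
# Assumption: these two identifiers are the only source types the indexer accepts.
SUPPORTED_SOURCES = {"azuresql", "documentdb"}


def data_source(name, source_type, connection_string, container):
    """Data source definition for an indexer; only Azure stores are accepted."""
    if source_type not in SUPPORTED_SOURCES:
        raise ValueError(f"indexers only pull from Azure data sources, not {source_type!r}")
    return {
        "name": name,
        "type": source_type,
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},  # a table/view for SQL, a collection for DocumentDB
    }


def indexer(name, source_name, index_name, interval="PT2H"):
    """Indexer definition that re-reads the source on an ISO 8601 interval."""
    return {
        "name": name,
        "dataSourceName": source_name,
        "targetIndexName": index_name,
        "schedule": {"interval": interval},
    }
```

Anything hosted outside Azure still has to be pushed in through the document upload API.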

The main win here seems to be the PaaS argument, i.e. it saves you from having to set up, manage and fine-tune a search infrastructure that can provide elastic scale. You don’t need to worry about sharding or esoteric tuning features. The trade-off is that you accept rigid schemas and a more limited search API.

This is quite a bonus, as a fully elastic search infrastructure is not trivial to set up and run with ElasticSearch. That said, Azure Search feels like a service very much in its infancy: I even managed to elicit a default ASP error page with one badly formatted request. It will need some time to mature, and more schema flexibility, before it can support more than a narrow set of simple use cases.

Filed under Architecture, Azure.