Google vs Bing for search results

Last weekend, I wanted to find out what time the Wimbledon men's final began. TV coverage started at around 1pm, but I was pretty sure that wasn't the start time. So off I went to Internet search, and whilst I was at it, decided to compare Google and Bing.

Google search results for Wimbledon

Bing search results for Wimbledon

What’s interesting is that each is taking a very different approach to displaying the results.

Google seems to be trying to save you visiting a web site if a quick answer is what you are looking for. You get to see recent match results and the date/time for the next matches taking place. Yey – found what I was looking for. You also see a variety of different sources – news articles, a location map, as well as the official web site.

Bing displays no information about the current tournament in its result summaries, and appears to assume you might not find what you're looking for at the first attempt, offering a list of related searches in a prominent position on the right of the results. Bing manages to display more results than Google in a smaller space, but it seems to be at the expense of helping you decide which result is most likely to be useful.

From a personal perspective, I find the 'Related searches' list distracting on the Bing results page. It pulls my eyes over to it instead of letting me read the main results area. Google puts a list of related search links at the end of the first page. This feels more logical – if you haven't clicked anything on the first page, maybe the results need refining.

It’s a similar story when searching for other facts, such as weather:

Search weather comparison

Yes, the UK weather this summer really is that bad…

I find I still favour Google for searches.  Quick facts can usually be found without needing to click further. Whether web sites like that outcome is another matter. But when it comes to applying these lessons for enterprise search designs, saving clicks can be a big productivity boost.

I haven't found an example yet where Bing delivers demonstrably better search results, despite what Steve Ballmer says. Has anybody else? And of course the missing element in both is the conversation taking place in real time. Google is starting to push its Google+ social network, if you're signed in. But no Twitter, no Facebook, no chattering updates. They're all taking place in the digital walled gardens.

Does Search Matter?

There's been a host of news this week as the deal between Microsoft and Yahoo was finally inked. The net result: Yahoo drops its own search engine and adopts Microsoft's Bing, increasing the market share for Bing to 28% against Google's 65% and leaving 7% for everyone else (US stats).

Other recent chatter about search has been about how real-time snippets of information, like Twitter updates, change the dynamics of search results. If you search for information on Google, it's unlikely you will see any Twitter results there. How can 140 characters rival an entire web page for relevance? But could that change?

Travel back in time to around 2003, and Bill Gates made a comment that, at the time, I thought was wrong. From memory it went along the lines of "Search is done, there isn't much more you can do to improve on what's out there". Of course, this was before Google IPO'd and people realised just how much money was being made on the back of those little text ads. All of a sudden, improving search and winning market share became a much more interesting prospect.

But whilst I disagreed at the time, recently I’ve changed my mind. I think the comment was spot on for search as we know it today.

Relevance on Google is far from perfect. If you do a search involving words that have any kind of commercial value, chances are the result you really want is buried somewhere on page 5 or beyond. Hilariously, sometimes the ads help. If you are looking for a particular hotel chain to book a room and they've paid enough for the ad space, they've got a better chance of appearing on your first page of results than with their web site alone. SEO companies continuously learn how to exploit Google's algorithms to promote their customers' web sites in your results, regardless of whether their customer is the result you want to find. But I still doubt there's a better way than PageRank and friends to determine the relevance of general web content.

Social media offers the opportunity for a new form of algorithm. Instead of PageRank, which evaluates a page based on the incoming links to it (a link from a high-profile site carries more value than a link from an unknown site), how about introducing SocialRank: evaluate the links shared by people that are then re-shared by other people. Sites like Facebook, Twitter, FriendFeed, Technorati, MySpace, etc. contain vast social networks. I share a link, people like it and/or share it with their connections, and so the link spreads.
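To make the idea concrete, here's a minimal sketch of what such a score might look like: an original share counts fully, and each re-share counts a little less the further it travels from the original poster. The share data, the decay factor and the scoring formula are all invented for illustration – no real service exposes anything this tidy.

```python
from collections import defaultdict

# Hypothetical share events: (person, link, source), where source is whoever
# the person saw the link from (None if they posted it originally).
shares = [
    ("alice", "example.com/post", None),
    ("bob",   "example.com/post", "alice"),
    ("carol", "example.com/post", "bob"),
    ("dave",  "example.com/post", "alice"),
    ("erin",  "other.com/page",   None),
]

def social_rank(events, decay=0.5):
    """Score each link by how widely it spreads: an original share scores 1.0,
    and each re-share scores less the further it sits from the original poster."""
    depth = {}                      # (person, link) -> hops from the original share
    scores = defaultdict(float)
    for person, link, source in events:        # assumes events are in time order
        depth[(person, link)] = 0 if source is None else depth.get((source, link), 0) + 1
        scores[link] += decay ** depth[(person, link)]
    return dict(scores)

print(social_rank(shares))
# {'example.com/post': 2.25, 'other.com/page': 1.0}
```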

Can we apply a rank to a page based on how the links spread? We probably can. But it won’t work.

Take FriendFeed for example. A high-profile account like Robert Scoble's means that everything he shares has a high probability of being noticed and re-shared. Does that mean anything? The same item shared by someone else might be ignored. A computer wouldn't show such bias. If you want an idea of how bad search results would be if based on what people share, take a look at Twitter's trending topics at any point in time – apparently that's what the majority of people are twittering about. Would any of it help find what you're looking for?

Search is useful and still a key part of the Internet. But increasingly we use other means to find information. On social sites such as Twitter and Facebook, we follow people we either trust or are interested in, and discover information without ever looking for it. If I want a book, I often use Amazon and check the reviews. Random stuff to buy? Try eBay first. Need information about a topic? Off I go to Wikipedia. Looking for something I've read before and liked? I use the custom Google search on my web site – which includes everything on the web site and my FriendFeed account. FriendFeed aggregates everything I share through Google Reader, Delicious, Twitter, blog, web site, Slideshare… and any other service I upload stuff to.

Google is relegated to answering one-off questions (I couldn’t remember the Mac commands for a screen capture just now – easy result on Google) and desperate searches (old news, travel information) that struggle to be found anywhere.

How can Internet search be improved?

The easiest method to improve relevance across a broad range of content is to separate the results into buckets. Within enterprise search solutions, we call this federated results. Present Twitter and other real-time snippets in a separate list to standard web page results, but on the same page. Use SocialRank for time-specific results such as news and travel, seeing how quickly a link spreads through social networks. Nuances could be included to make it difficult for one individual or organisation to game.
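Here's a toy sketch of the bucket idea in Python. The sources and fetch functions are placeholders rather than real APIs; the point is simply that each source keeps its own list and its own ordering, instead of everything being merged into one ranked pile.

```python
# A minimal sketch of federated presentation: one bucket per source, no attempt
# to merge rankings across sources (each source calculates relevance differently).
# The fetch functions and results below are placeholders, not real APIs.

def fetch_web(query):
    return ["Official web site", "News article"]            # pretend web index results

def fetch_realtime(query):
    return ["Tweet: 'Wimbledon final starts at 2pm'"]        # pretend real-time snippets

SOURCES = {"Web": fetch_web, "Real-time": fetch_realtime}

def federated_search(query):
    return {name: fetch(query) for name, fetch in SOURCES.items()}

for source, results in federated_search("wimbledon").items():
    print(f"== {source} ==")
    for result in results:
        print(" -", result)
```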

To give a simple example, the image below shows federated search results from a site I host on the Internet for clients. (No prizes for guessing the software being used.) I use it to show them what Google doesn’t tell them. In this example, I just entered the name of a company – Lloyds TSB. Results have come back from Twitter, Bing, Technorati and FriendFeed. (Flickr, YouTube and others are also included but snipped from the screenshot)

The Twitter comments aren't pleasant. But better to know what your customers are saying and deal with it than not know until they've all left. I did the same test for a client recently and they found out that somebody had posted a tweet asking if anyone from said client was on Twitter. Nobody was; they are now.

In short, search does matter but no more so than yesteryear whilst the format stays the same – a single page of mushed up results served with a side and topping of ads.

Whether you use Google or Bing will mostly come down to preference. (And some of that preference is more political than technical.) The relevance is ranked slightly differently – I find Bing seems to prefer domain name matches. But the differences are incremental and barely noticeable, as demonstrated in the two images below.


Same search as before – Lloyds TSB. Both display one non-Lloyds TSB domain result and both put it in 4th place. Interesting how they both display the top result the same but with different sub-links to pick from. Decide for yourself which format is best. Looking at them side-by-side here, I actually prefer the user interface for Bing. But people don't switch browsers or search engines to discover what's different. They change when they hear there is an alternative that is much better and easy to use. Microsoft's challenge is that many people don't even know there is a difference between a browser and a search engine, and some think Google is both (and they haven't heard of Chrome).

Final note: this is one of the most rambling posts I’ve written in a while. If you got far enough through it to be reading this, thank you! 🙂 This was one of the posts that’s been floating around my head for months and it just needed to spew out but is far from perfect… I’ll endeavour to make the next one more to the point.

Microsoft vs Google in the Search Wars

Stop the clocks, blogging has recommenced 🙂

A couple of cheat posts coming up, starting with this one – really reproductions of comments I've left on other posts, but with added juice.

Henry Blodget posted the following article on Silicon Valley Insider: It’s time for Microsoft to face the reality about Search and the Internet

It’s a great article and worth a read. Here’s the comment I left there:

I think people make a good point in highlighting that competition in search is a good thing for us as consumers. Just not sure it's a good thing for Microsoft.

Comparing with the likes of SQL and Exchange is comparing apples with pairs*. I always tell people never to underestimate just how hard MS will work to develop the winning product. But past successes have always been about bringing in a lower-cost product with good enough features to compete against an expensive market leader (the business intelligence and systems management markets being two of the latest focus areas gaining ground in the enterprise software market).

Competing against 'free', a product used by one audience and paid for by another, is a completely different challenge and one that MS has yet to succeed in. Time will tell. But I doubt it will come from competing like for like. Google didn't knock AltaVista off the top by copying their business model. To take over a market means doing something different that weakens the incumbent players. Adding to the challenge is that 'free' or 'freemium' models have yet to stand the test of time themselves. Somebody somewhere always has to pay, one way or another. Making money from sales of a product or service still has far more long-term potential than making money from people paying for the attention you've managed to capture.

And all that said, I still wouldn't underestimate MS; Google isn't the only one who can create 'waves'** under cover.

I suspect the ol' Google vs Microsoft debate will rumble on for a few years yet. Steve Ballmer and quite a few influential people within Microsoft would like a big slice of the advertising market, which is a fair bit bigger than the software market. But I'm still not confident that's the right goal.

The argument goes that people are becoming used to not paying for online services. Yet Flickr has done quite well getting people onto premium accounts. Virtual worlds and multi-player online games like World of Warcraft also seem to find plenty of paying customers (I'm one of them, and a girl too – take note, Xbox team. I'm in that market that Nintendo noticed whilst it was considered a has-been at no. 3 in the console market a few years ago). I'm also in that apparently small minority who pays for their music online. Amazon has done quite well just selling stuff that you go looking for rather than have thrown at you in a glitzy banner, and eBay isn't doing so badly either. None of them is dependent on advertising revenue. The last two examples have both made money by closing the distance from customer to seller without advertising interrupting the process.

And who wants to be an advertising company anyway? Google succeeded ‘cos they managed to make the ads as unobtrusive as possible and came up with a great revenue concept (auction the search words) to get companies competing for what little ad space there is. Most people seem to dislike ads unless it is for the exact company they are looking for (I do a search for Hilton Hotels, I want the damn official web site, not a million travel web sites) or something completely original (that ends up in the ‘top 100 ads’ TV chart). Even worse are the fake web sites that get into search results only to present you with a page of even more ads than the search engine dared to show you.

Advertising is not a loved market. Microsoft is not a loved company. I don't know if that's the synergy they see, but it's not a great start. Without doing the research, I'm guessing the margins in advertising are not as healthy as in software. And Google may be raking the cash in from ads but is pouring a fair chunk back out again, not least in running the hardware to support YouTube.

What is right is the desire to create a new market. Monopolies (natural or otherwise) and market domination rarely last for very long… unless funded by government, but let's not go there today. Microsoft needs to keep testing new waters, and it's better to do that whilst there's oodles of cash in the bank than to start when it's running out. And call me biased because I used to work there, but I still hope all this Google chatter is smoke and mirrors whilst they work on something worth paying for.

* Oh dear, didn’t notice I’d used pair instead of pear when posting the original comment, and you can’t go back and edit them. Oops.

** Google cleverly making waves (Techcrunch)

Delicious tags: Microsoft | Google | Business

Microsoft Search Workshop Part 2a

Originally, this series was supposed to be 4 parts. But I started to realise that part 2 in one shot would be a very looooooong blog post. So here is part 2a. (Click here for part 1.)

Key messages from the presentation:

Slide 3. Whilst most documents about indexing in SharePoint include a complicated diagram to explain the indexing process, here's the simple version of how it works (there's a rough code sketch after the list):

  • Connectors (aka Protocol Handlers) connect to a store and suck out its content. This should include security permissions and metadata managed by the store. Hence the connector needs to adhere to an agreed protocol to keep the store happy. Different stores manage files, security and metadata differently and require different connectors.
  • Once the indexing server has its hands on the content, filters are used to strip out the unnecessary gumpf (technical term) from within each item. If you open a MS Word document in Notepad, you’ll see a ton of square boxes before you get to any text. That’s because Notepad doesn’t understand MS Word formatting. Filters don’t care about formatting and chuck it all away to get down to the raw text and any metadata stored within the document. Different file formats need different filters.
  • So, for each item retrieved, the indexing server has a pile of raw text, metadata (some found within the document, some held by the store alongside the document), a link to the original item and security permissions (who can access it). All the metadata becomes 'crawled properties', which are dumped into a property store. All the individual words – the raw text – are dumped into the content index.
  • There is one additional element included – a static rank. We'll cover that in part 3.
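To make that flow concrete, here's a deliberately simplified sketch in Python: a toy store, a toy connector and filter, a property store and a content index. It bears no resemblance to how SharePoint actually stores any of this – the names and data are made up purely to show the shape of the pipeline.

```python
import re

# Toy content store: each item has raw content, store-managed metadata and an ACL.
STORE = {
    "doc1": {"content": "<b>Quarterly finance report</b>",
             "metadata": {"author": "Sam"},
             "acl": {"finance-team"}},
}

property_store = {}   # item -> crawled properties
content_index = {}    # word -> set of items containing that word

def connector(store):
    """Stand-in for a protocol handler: yields items with content, metadata and permissions."""
    for item_id, item in store.items():
        yield item_id, item

def ifilter(raw):
    """Stand-in for a filter: throw the formatting away, keep the raw text."""
    return re.sub(r"<[^>]+>", " ", raw)

def crawl(store):
    for item_id, item in connector(store):
        text = ifilter(item["content"])
        # Crawled properties: store metadata plus permissions and a static rank (see part 3)
        property_store[item_id] = dict(item["metadata"], acl=item["acl"], static_rank=1.0)
        for word in text.lower().split():
            content_index.setdefault(word, set()).add(item_id)

crawl(STORE)
print(content_index["finance"])            # {'doc1'}
print(property_store["doc1"]["author"])    # Sam
```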

Slide 4. Now we have an index, people can start to query it and receive search results. When you type a query into a basic search box – take 'SharePoint and enterprise search' as an example – here is what happens:

  • The search query will be word-broken and noise words removed – language match determines which dictionary and noise word list are used (our example now becomes 'SharePoint', 'Enterprise', 'Search'. Say bye bye to 'and' – it's a noise word)
  • If you have stemming enabled (it is off by default), then the search will probably also include 'searches', 'searching' and 'enterprises'. If the thesaurus has been configured, it may include additional acronyms for SharePoint, such as 'SPS' (not sure about 'MOSS' – technically, that's a dictionary word). See the related post – SharePoint and Stemming – for more information about stemming, noise words and the thesaurus
  • We now have a list of words that form the search query. A list of results is returned from the index, matching any of the words in the query (when performing a basic search – advanced search enables you to return only docs that match all words in the query, among other options)
  • The results are security-trimmed – anything that the user doesn’t have permission to see is removed. They can also be scope-trimmed, if a scope has been selected (e.g. only return docs that are less than one year old)
  • The remaining set of results is relevance-ranked – a dynamic rank (calculated from the search query terms) is added to the static rank held in the index – and returned as an ordered list (there's a sketch of the whole query flow after this list)
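A matching sketch of the query side, again purely illustrative – the noise word list, stems, thesaurus entries and ranking formula below are all made up, but the steps mirror the list above: word break, expand, match, security-trim, then order by static plus dynamic rank.

```python
NOISE_WORDS = {"and", "the", "of"}
STEMS = {"search": ["searches", "searching"], "enterprise": ["enterprises"]}
THESAURUS = {"sharepoint": ["sps"]}

# Toy index: word -> {item: term frequency}, plus per-item static rank and permissions.
INDEX = {"sharepoint": {"doc1": 3}, "enterprise": {"doc1": 1, "doc2": 2}, "search": {"doc2": 4}}
STATIC_RANK = {"doc1": 0.4, "doc2": 0.2}
ACL = {"doc1": {"everyone"}, "doc2": {"finance-team"}}

def run_query(query, user_groups, stemming=False):
    # 1. Word break and drop noise words
    terms = [t for t in query.lower().split() if t not in NOISE_WORDS]
    # 2. Optional stemming, plus thesaurus expansion
    expanded = list(terms)
    for t in terms:
        if stemming:
            expanded += STEMS.get(t, [])
        expanded += THESAURUS.get(t, [])
    # 3. Collect matching items with a crude dynamic rank (sum of term frequencies)
    dynamic = {}
    for t in expanded:
        for item, tf in INDEX.get(t, {}).items():
            dynamic[item] = dynamic.get(item, 0) + tf
    # 4. Security-trim, then 5. order by static + dynamic rank
    visible = {item: score for item, score in dynamic.items() if ACL[item] & user_groups}
    return sorted(visible, key=lambda item: STATIC_RANK[item] + visible[item], reverse=True)

print(run_query("SharePoint and enterprise search", {"everyone"}))   # ['doc1']
```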

Slide 5. One of the most popular areas to customise in SharePoint is property management. An index can contain lots and lots of crawled properties, and you can leverage those properties in search queries – for example, looking for all documents classified as 'finance' (see related post: Classifying content in SharePoint). To do this, you create managed properties – a managed property can be mapped to one or more crawled properties. The example in the slide: you might want people to be able to look up content classified by 'customer name', but 'customer name' may be labelled differently across multiple different content stores. Managed properties can be added to the Advanced Search page and used in scopes. We'll revisit this in part 2b (or 2c, depending on how long 2b ends up).
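As a rough illustration of the mapping idea (the property names here are invented, and in SharePoint you configure this through the search administration pages rather than in code):

```python
# Crawled properties arrive with whatever names each content store uses.
crawled = [
    {"item": "doc1", "ows_Client": "Contoso"},     # e.g. from a SharePoint library column
    {"item": "doc2", "CUST_NAME": "Fabrikam"},     # e.g. from a line-of-business system
]

# A managed property maps one friendly name onto several crawled property names.
MANAGED_PROPERTIES = {"CustomerName": ["ows_Client", "CUST_NAME"]}

def resolve(managed_prop, record):
    """Return the first crawled property on the record that the managed property maps to."""
    for crawled_name in MANAGED_PROPERTIES[managed_prop]:
        if crawled_name in record:
            return record[crawled_name]
    return None

# A single query such as CustomerName:"Contoso" can now match content from both stores.
print([r["item"] for r in crawled if resolve("CustomerName", r) == "Contoso"])   # ['doc1']
```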

Slide 6: In a typical SharePoint deployment, you will have one single central indexing and search server. This server will index all your different content sources. If you’ve got the licences, you will probably separate search from indexing, using query servers. This means the indexing server can focus on doing indexing and propagates index changes up to the query servers. If the indexing server decides to take a break (literally), users can still search for content because copies of the index reside on each query server. It simply means there will be no updates to the index until the indexing server returns from vacation. The new feature introduced this year is federation. An indexing server can include federated connectors to other indexes. Great for accessing content not indexed natively by SharePoint and also great for spreading the indexing load. When a user submits a search query, results are returned from the central index and any other indexes with a federated connector. If you want to see this in action, try performing a search at http://www.infomash.co.uk/. You will see results returned from Flickr, Technorati, Twitter (Summize) and FriendFeed. If you want to have some fun, try http://www.infomash.co.uk/googmsft.aspx – you’ll see results returned from Google and Live side-by-side. Great for comparing how they determine relevance. (Note: it’s a prototype server, no guarantees regarding availability or performance)

Slide 7: Some capacity planning tips:

  • Already mentioned – the first scale issue you will hit is indexing server performance. Indexing will fight with search queries to win RAM and CPU attention. Put them in separate playgrounds (or, if you don't have timezone problems, schedule indexing to only take place out of hours – but that means your index will always be a day old).
  • The most popular indexing question – how big will the index be? The numbers can be quite frightening – up to 50% of the size of the content you are indexing. The average is usually nearer to 20%, but it all depends on your range of vocabulary. Lots of colourful language? Each different word has to go into that index. Lots of metadata? It all gets stored…
  • The required disk space for the index is important – you need to allow 2.5x the index size. This is because you can, for a temporary period, have two copies of the index stored (to do with how changes are managed and propagated). But it is 2.5x the index size, not the size of the corpus being indexed (I've seen the latter stated at MS conferences and in some documents on TechNet – wrong). There's a worked sizing example after this list.
  • Federated connectors give you lots more flexibility in your architecture. Whilst you will mostly hear about how they let you include other indexes in your search results, there is a hidden benefit. You can split up your SharePoint indexes and then federate all the results on a single page. It won’t be one results list – each federated connector displays its results in a separate web part. Required, because each results set will have a different rank calculation. Great potential for branch office scenarios to save bandwidth and keep indexes fresher, and for dropping an indexing server onto niche applications and content stores (if they are running Windows Server 2003 or later, you can use Search Server Express and not have to pay for any extra licences)
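To put some example numbers on that sizing guidance (the corpus size is a made-up figure; the ratios are the ones quoted above):

```python
# Back-of-envelope sizing: index is typically ~20% of the corpus (plan for up to
# ~50% in the worst case), and disk should allow 2.5x the index size.
corpus_gb = 500                       # size of the content being indexed (example figure)
index_ratio = 0.20                    # typical; use 0.50 for a worst-case plan
index_gb = corpus_gb * index_ratio
disk_needed_gb = index_gb * 2.5       # room for two copies during propagation, plus headroom

print(f"Index: ~{index_gb:.0f} GB, disk to allow: ~{disk_needed_gb:.0f} GB")
# Index: ~100 GB, disk to allow: ~250 GB
```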

Note: Federated connectors are currently only available in Search Server 2008. They are due to be released for SharePoint Server 2007, hopefully quite soon.

To download a copy of the presentation (3Mb) – MS-Search-Pt2a.pdf


Filed in Library: Microsoft SharePoint | Microsoft Search

Technorati tags: SharePoint | MOSS 2007 | Search Server | Enterprise Search

Microsoft Search Workshop Part 1

Earlier this year, Joining Dots ran a series of Enterprise Search workshops for Microsoft UK. Its purpose was to help organisations explore what enterprise search means and what Microsoft technologies can do to help implement an effective search solution.

The workshop consisted of four sessions, containing a mix of presentations, hands-on demonstrations and plenty of discussion. Here is part 1:

Part 1 was all about setting the scene. First, exploring ‘what is enterprise search?’ Second, an introduction to the current products in Microsoft’s search portfolio. Note: at the time, the FAST acquisition had not completed.

Key messages from the presentation:

  • The most common question asked is ‘Why can’t our search be just like searching on Google?’ To begin answering that question, we need to define enterprise search – fundamentally different to Internet search. One of the challenges within many organisations has been that there is no dedicated focus on improving search. Instead it is often a feature of a larger project, such as an intranet replacement or new document management system. Before Google came along, that’s how the major Internet portals treated search…
  • Enterprise search technologies usually fit in one of three layers:
    • The simplest solutions help find what you know exists. Products are either free or low-cost and focus on 'unstructured' content, i.e. documents, email and web pages. Desktop search is available from the likes of Microsoft, Google and Yahoo. Network search tools include Microsoft Search Server, the Google Mini appliance and IBM/Yahoo OmniFind
    • The mid-tier typically provides a base platform for enterprise search, relatively inexpensive and focused on common requirements. Solutions should include security trimming (filtering results based on who you are and what you have permission to see) and indexing multiple sources of content. Some solutions start to move beyond unstructured content to also include people search (directories and social networks) and structured data (integrating business applications). This is the hunting ground for SharePoint Server 2007 and the Google Search Appliance.
    • The top-tier provides advanced indexing and search capabilities, such as automatic classification of content, concept-driven search interfaces and integration with business intelligence tools. Leaders in this space include Autonomy, Endeca and FAST.
  • Whilst advanced search is often the goal, many organisations would benefit from first identifying what content needs to be found. Is it just about documents? How accessible are those documents? And should enterprise search also include business applications and the ability to find people? We often prefer to seek answers from each other in the workplace… These are all questions that need to be answered if you want to implement an effective enterprise search solution.
  • Microsoft products and services span three areas of search: the web (Live.com), the desktop (Windows Desktop Search) and the intranet/company web sites (SharePoint Server 2007 and Search Server 2008)
    • Intranet search includes the ability to find documents, business data and people. Federated connectors enable results to be returned not only from multiple different content sources but also from multiple different indexes. The table in the presentation shows what features are available per product.
    • Desktop search enables individuals to query their own content, such as private email and locally-stored documents – content that is often difficult to access by intranet search tools.
    • Web search trends are worth following to see what’s likely to be coming down the line for enterprise search. On Live.com, concept-driven results enable you to refine your search query. If you do a search for videos, hovering over the video will start it playing inside your web browser…

To download a copy of the presentation (3.3Mb): MS-Search-Pt1.pdf

Filed in Library: Microsoft SharePoint | Microsoft Search

Technorati tags: SharePoint | MOSS 2007 | Search Server | Enterprise Search