Jun 11, 2010

Posted by Tom Walker in Search Engine Optimisation (SEO)

Google Caffeine – New Search Index Completed

Last year Google spoke of plans to implement an update that would give near real-time search results in its index. On Tuesday it announced that the work was complete and that the update had been rolled out across all of its data centres, regions and languages. The update is called Caffeine. Caffeine is a new indexing system that, according to Google, provides 50 percent fresher results than the old indexing system and is the largest collection of web content Google has ever offered.

A Little History Lesson

Around 2000, Google updated its index once every 30 days (before then it was once every four months!), but certain events made it clear that this was no longer sufficient. Events like September 11th showed that freshness and real-time results mattered. Many news websites went down under the demand, and Google decided to show cached versions of those pages to meet it.

In 2003 Google introduced the Incremental Indexing System, which would crawl 10 percent of the web and update the index every night with what it had found. This update was called Fritz and crawled the web in batches. Whilst this system ran constantly, every page in a batch had to wait until the entire batch had been completed before it could be added to the index.

How Caffeine is Different

Caffeine is intended to deal better with the ever-expanding and evolving internet. The new system crawls the web constantly, but instead of waiting for other pages to finish processing, it runs each page through the entire indexing system as soon as it is crawled and adds it to the index. This has already resulted in a 50 percent fresher index than before.

“With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before – no matter when or where it was published.”
Carrie Grimes – Google Software Engineer
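
The difference between the two approaches can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not a description of how Google's systems are built: the function names, the crawl times and the `processing_delay` parameter are all hypothetical. In the batch model every page becomes searchable when the last page in its batch has been crawled; in the continuous model each page becomes searchable shortly after its own crawl.

```python
from typing import List


def batch_visibility(crawl_times: List[float]) -> List[float]:
    """Batch model: no page is searchable until the whole batch is done."""
    if not crawl_times:
        return []
    batch_done = max(crawl_times)  # the batch finishes with its last crawl
    return [batch_done for _ in crawl_times]


def continuous_visibility(crawl_times: List[float],
                          processing_delay: float = 0.0) -> List[float]:
    """Continuous model: each page is searchable soon after its own crawl."""
    return [t + processing_delay for t in crawl_times]


def average_wait(crawl_times: List[float], visible_times: List[float]) -> float:
    """Mean time between a page being crawled and becoming searchable."""
    waits = [v - c for c, v in zip(crawl_times, visible_times)]
    return sum(waits) / len(waits)


if __name__ == "__main__":
    # Hypothetical crawl times, in hours since the batch started.
    crawl_times = [1.0, 2.0, 3.0, 4.0]
    print(average_wait(crawl_times, batch_visibility(crawl_times)))       # 1.5
    print(average_wait(crawl_times, continuous_visibility(crawl_times)))  # 0.0
```

In this toy example the page crawled first waits three hours in the batch model, while in the continuous model it is available straight away. That gap is the kind of "freshness" the update is aimed at.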


What This Means for Search Engine Optimisation

While Google’s index hasn’t got significantly larger at the moment, Caffeine will certainly make that possible. The web is ever expanding, and with the new indexing system taking up nearly 100 million gigabytes of storage in one database and adding new information at a rate of hundreds of thousands of gigabytes per day, it’s going to need significant storage capacity.

Head of Web Spam at Google, Matt Cutts, said that, “Caffeine benefits both searchers and content owners because it means that all content (and not just content deemed “real time”) can be searchable within seconds after it’s crawled.” He also added, “It’s important to realize that caffeine is only a change in our indexing architecture. What’s exciting about Caffeine though is that it allows easier annotation of the information stored with documents, and subsequently can unlock the potential of better ranking in the future with those additional signals.”

This indicates that Caffeine is not a change to Google’s algorithm. It does mean that if Google decides to take a different metric or field of data into account in the future (one that Caffeine now gives it the ability to store), it will not have to build new code to take advantage of it. So while Caffeine itself may not directly affect rankings, it could have an impact on them at a later date.



