Google Server types
Google’s server infrastructure is divided into several types, each assigned to a different purpose:
- Web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
- Data-gathering servers are permanently dedicated to spidering the Web, using Google’s web crawler, known as Googlebot. These servers update the index and document databases and apply Google’s algorithms to assign ranks to pages.
- Each index server contains a set of index shards. Given a query word, an index server returns a list of document IDs (“docids”) identifying the documents that contain that word. These servers need less disk space, but carry the greatest CPU workload.
- Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
- Ad servers manage advertisements offered by services like AdWords and AdSense.
- Spelling servers make suggestions about the spelling of queries.
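The query flow described above (fan out to index shards, merge the hits, rank them, then ask a document server for a summary of each hit) can be sketched in a few lines. This is only an illustrative toy, not Google's implementation: the corpus, the shard layout, the occurrence-count ranking, and every function name (`build_shard`, `index_lookup`, `summarize`, `handle_query`) are assumptions made for the example, and the spelling and ad lookups are left out.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for what document servers store.
DOCUMENTS = {
    1: "google file system stores large files",
    2: "bigtable is structured storage built on the file system",
    3: "mapreduce processes large data sets",
    4: "spanner is a globally distributed storage system",
}


def build_shard(docids):
    """Build an inverted index (word -> set of docids) for one shard."""
    index = defaultdict(set)
    for docid in docids:
        for word in DOCUMENTS[docid].split():
            index[word].add(docid)
    return index


# Two index shards, each covering a slice of the corpus (assumed layout).
SHARDS = [build_shard([1, 2]), build_shard([3, 4])]


def index_lookup(shard, words):
    """Index server: docids in this shard that contain every query word."""
    postings = [shard.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()


def score(docid, words):
    """Placeholder rank: number of query-word occurrences in the document."""
    tokens = DOCUMENTS[docid].split()
    return sum(tokens.count(w) for w in words)


def summarize(docid, words, width=3):
    """Document server: a short snippet around the first matching word."""
    tokens = DOCUMENTS[docid].split()
    pos = next((i for i, t in enumerate(tokens) if t in words), 0)
    start = max(0, pos - width)
    return " ".join(tokens[start:pos + width + 1])


def handle_query(query):
    """Web server: fan out to shards, merge, rank, and attach summaries."""
    words = query.lower().split()
    hits = set()
    for shard in SHARDS:
        hits |= index_lookup(shard, words)
    ranked = sorted(hits, key=lambda d: (-score(d, words), d))
    return [(docid, summarize(docid, words)) for docid in ranked]


if __name__ == "__main__":
    for docid, snippet in handle_query("storage"):
        print(docid, snippet)
```

The split mirrors the division of labor in the list: the index lookup touches only small posting lists but runs for every shard on every query, while the summary step needs the full document text and so depends on storage rather than CPU.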
The software that runs the Google infrastructure includes:
- Google Web Server (GWS) – custom Linux-based Web server that Google uses for its online services.
- Storage systems:
  - Google File System and its successor, Colossus
  - BigTable – structured storage built upon GFS/Colossus
  - Spanner – planet-scale structured storage system, next generation of the BigTable stack
  - Google F1 – a distributed, quasi-SQL DBMS based on Spanner, which replaced a custom version of MySQL.
- Chubby lock service
- MapReduce and Sawzall programming language
- Indexing/search systems:
  - TeraGoogle – Google’s large search index (launched in early 2006), designed by Anna Patterson of Cuil fame.
  - Google Caffeine (Percolator) – continuous indexing system (launched in 2010).
  - Google Hummingbird – major search index update, including complex search and voice search.
- Borg declarative process scheduling software
Google has developed several abstractions which it uses for storing most of its data:
- Protocol Buffers – “Google’s lingua franca for data”, a binary serialization format which is widely used within the company.
- SSTable (Sorted Strings Table) – a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of BigTable.
- RecordIO – a sequence of variable sized records.
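To make the last two abstractions concrete, here is a minimal sketch of each. It is an in-memory illustration of the properties stated above, not Google's on-disk formats: the class and function names are hypothetical, and the 4-byte little-endian length prefix used for records is an assumption chosen for the example.

```python
import bisect
import io
import struct


class SSTable:
    """In-memory stand-in for an immutable, sorted map from bytes to bytes."""

    def __init__(self, items):
        # Sort once at construction; the table is never modified afterwards.
        pairs = sorted(dict(items).items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key, default=None):
        """Binary-search the sorted keys for an exact match."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


def write_records(stream, records):
    """Write each record as an assumed 4-byte length prefix plus payload."""
    for rec in records:
        stream.write(struct.pack("<I", len(rec)))
        stream.write(rec)


def read_records(stream):
    """Yield variable-sized records until the stream is exhausted."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("<I", header)
        yield stream.read(length)


if __name__ == "__main__":
    table = SSTable({b"b": b"2", b"a": b"1", b"c": b"3"})
    print(table.get(b"b"), table.scan(b"a", b"c"))

    buf = io.BytesIO()
    write_records(buf, [b"x", b"", b"hello"])
    buf.seek(0)
    print(list(read_records(buf)))
```

Because the keys are kept sorted and never change, both point lookups and range scans reduce to binary search, which is what makes such a table a convenient building block for a larger store.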
Google has around 900,000 servers across its data centers worldwide. Those data centers draw around 260 million watts of power, which accounts for about 0.01% of global energy use. Google doesn’t disclose the size of individual data center buildings, but journalists have managed to learn details of several sites from site plans filed with local planning boards:
- In The Dalles, Oregon, Google’s site includes three 68,680 square foot data center buildings, a 20,000 square foot administration building, a 16,000 square foot “transient employee dormitory” and an 18,000 square foot facility for cooling towers.
- The Google data center in Lenoir, North Carolina includes two buildings, according to permits on file with Caldwell County, which describe one 139,797 square foot data center, with a 337,008 square foot structure to follow. The permits say the smaller building cost $15.4 million and the larger cost $24.5 million.
Data center operators often standardize some of their construction process. The difference in the square footage reports for the data centers in The Dalles and Lenoir suggests that Google doesn’t standardize on a single data center size (at least not on the level of MCI/WorldCom, which once built identical 109,000 square foot data centers in 25 cities). Google spokesman Barry Schnitt says Google data centers are not cookie-cutter designs, as the company is constantly updating its data center design and equipment to take advantage of the latest technological advances and efficiencies.
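A quick back-of-the-envelope calculation puts the figures quoted above side by side. It uses only the numbers given in the text; the variable names are illustrative, and the per-server wattage simply divides the total power figure by the server count, so it includes whatever overhead that total covers.

```python
# Figures quoted in the text above.
SERVERS = 900_000
POWER_WATTS = 260_000_000

# Average power per server, overhead included.
watts_per_server = POWER_WATTS / SERVERS

# The Dalles: three data center buildings plus three support buildings.
dalles_total_sqft = 3 * 68_680 + 20_000 + 16_000 + 18_000

# Lenoir: (square feet, permit cost in dollars) for the two buildings.
lenoir_buildings = [(139_797, 15_400_000), (337_008, 24_500_000)]
lenoir_total_sqft = sum(sqft for sqft, _ in lenoir_buildings)
lenoir_cost_per_sqft = [cost / sqft for sqft, cost in lenoir_buildings]

if __name__ == "__main__":
    print(f"Power per server: {watts_per_server:.0f} W")
    print(f"The Dalles total: {dalles_total_sqft:,} sq ft")
    print(f"Lenoir total: {lenoir_total_sqft:,} sq ft")
    for (sqft, _), per_sqft in zip(lenoir_buildings, lenoir_cost_per_sqft):
        print(f"Lenoir {sqft:,} sq ft building: ${per_sqft:.2f} per sq ft")
```

On these figures the average works out to roughly 289 W per server, the two sites total about 260,000 and 477,000 square feet respectively, and the two Lenoir buildings come to about $110 and $73 per square foot according to the permit values.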
How much do Google data centers cost?
In 2007, Google reported spending $2.4 billion on data centers. By 2015 that figure had risen to $11 billion, and in the first quarter of 2016 alone the company reported spending $2 billion on investments in production equipment, facilities, and data center construction. Google said it will invest an additional $600 million to expand its data center operations in The Dalles, Oregon, where the company built its first data center in 2006. Google will create a second campus in The Dalles on a 23-acre property about a mile from its existing campus, which houses three large data centers. The new campus will push Google’s investment in its Oregon operations to nearly $2 billion.