What does the Google data leak tell us?


Google: 2,500 pages of internal documents leaked

Rand Fishkin, SEO specialist and founder of SparkToro, announced on Tuesday, May 28, 2024 that he had received, from an anonymous source, a leak of 2,500 pages of internal Google documents. These documents likely come from the company’s Content API Warehouse and offer insight into how the search engine’s algorithm works.

They suggest that the firm concealed certain aspects of its ranking system, in particular concerning NavBoost, the tool developed to improve the quality of search results using click data, and the use of Google Chrome data for ranking content. Rand Fishkin specifies that he consulted Google employees as well as technical SEO expert Mike King to confirm the reliability of the documents.

It appears to be a legitimate set of documents from Google’s Search division that contains an extraordinary amount of previously unconfirmed information about the inner workings of Google, Rand Fishkin claims.

At this time, Google has not made any statement regarding this leak.

Google leak: information contained in the documents

The leaked documents are technical in nature and mainly describe the data Google collects about web pages and users. From these elements, certain criteria used for ranking can be deduced. Here is some of the information revealed.

Google uses Chrome data

Google has always said it does not use clickstreams from Chrome to create its rankings, but the documents suggest the opposite. Rand Fishkin says: “In my opinion, Google probably uses the number of page clicks in Chrome browsers to determine the most popular/important URLs on a site, which go into calculating which URLs to include in the sitelinks feature.”

According to the specialist, the desire to analyze clickstreams was one of the main reasons behind the creation of Google Chrome in 2008.

NavBoost uses click data

The existence of NavBoost was revealed in October 2023 by Pandu Nayak, vice president of search at Google, during testimony in the antitrust case brought by the United States Department of Justice. The documents provide additional details on how it works, indicating that NavBoost counts the number of clicks, analyzes the bounce rate on pages and evaluates the reliability of clicks. Here again, Google had until now always denied using click-based user signals.

Filters are added on certain sensitive subjects

For certain sensitive queries, such as those related to COVID or elections, Rand Fishkin points out that Google has implemented “whitelists”. These aim to favor sites deemed reliable, such as government authorities. Such lists can also extend to the private sector, for example travel sites.

Google identifies content authors

According to Rand Fishkin, the EEAT criterion (experience, expertise, authoritativeness and trustworthiness), although put forward by Google, may “not have as direct an importance as some SEOs think”, given that it is not mentioned in any of the leaked documents. However, the leak reveals that Google collects author data, including a field intended to identify whether an entity on the page is the author. Until now, Google claimed that author pages were primarily intended to improve the visitor experience, without influencing rankings.

Link indexes are classified according to three levels

Google classifies its link indexes into three levels: low, medium and high. Depending on the number of clicks and the source of those clicks, links will or will not be taken into account in a site’s ranking. Rand Fishkin illustrates this with the following example:

“- If Forbes.com/Cats/ has no clicks, it enters the low-quality index and the link is ignored,
- If Forbes.com/Dogs/ has a high volume of clicks from verifiable devices […], it enters the high-quality index and the link transmits the ranking signals.”

Links considered “reliable” can transmit PageRank, while those of poor quality are ignored and therefore do not negatively affect the site’s ranking.
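To make the three-level idea concrete, here is a minimal Python sketch of how such a classification could work. It is only an illustration of the logic described above, not Google's implementation: the function names, the `HIGH_CLICK_THRESHOLD` value and the rule used for the medium level are all assumptions, since the leaked documents, as reported here, give no numeric thresholds and do not say how medium-level links are treated.

```python
from enum import Enum


class LinkTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical cutoff, chosen purely for illustration.
# The article reports no actual threshold.
HIGH_CLICK_THRESHOLD = 1000


def classify_linking_page(clicks: int, from_verifiable_devices: bool) -> LinkTier:
    """Assign a linking page to a tier from its click volume and click source."""
    if clicks == 0:
        # No clicks at all: low-quality index (the Forbes.com/Cats/ case).
        return LinkTier.LOW
    if clicks >= HIGH_CLICK_THRESHOLD and from_verifiable_devices:
        # High volume from verifiable devices (the Forbes.com/Dogs/ case).
        return LinkTier.HIGH
    # Assumption: everything in between lands in the medium tier.
    return LinkTier.MEDIUM


def is_ignored(tier: LinkTier) -> bool:
    """Low-tier links are ignored, so they neither help nor hurt."""
    return tier is LinkTier.LOW


def transmits_ranking_signals(tier: LinkTier) -> bool:
    """High-tier links pass ranking signals; medium is unspecified in the source."""
    return tier is LinkTier.HIGH
```

With these assumed rules, a page with no clicks is classified as low and its links are ignored, while a page with a high volume of clicks from verifiable devices is classified as high and its links pass ranking signals.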
