Some believe examining this codebase is a distraction and offers nothing that should influence their SEO decisions. Of course, Yandex is not Google. However, both are state-of-the-art web search engines that remain at the cutting edge of technology.
Software engineers from both companies attend the same conferences (SIGIR, ECIR, etc.) and share findings and innovations in Information Retrieval, Natural Language Processing/Understanding, and Machine Learning. Yandex has had a presence in Palo Alto, and Google previously had one in Moscow.
A quick LinkedIn search reveals several hundred engineers who have worked at both companies, though we don’t know how many are working on Search at either company.
In a more direct overlap, Yandex also uses open-source technologies critical to innovations in Search, such as Google’s TensorFlow, BERT, MapReduce, and, to a much lesser extent, Protocol Buffers.
So, while Yandex is definitely not Google, it’s not just some random research project we’re talking about here. There is a lot we can learn about how a modern search engine is built by examining this codebase.
The Leaked Code Contains 17,854 Ranking Factors
A deep look at the code base reveals that Yandex has a large number of ranking factor files for different subsets of query processing and ranking systems.
When we scan them, we see that there are 17,854 ranking factors in total. These ranking factors include various measures of:
Clicks
Dwell Time
Data obtained using Metrika, Yandex’s equivalent of Google Analytics.
Yandex’s Top Priority Negative Ranking Factors
In summary, these factors suggest that to get the best score, you should:
Avoid ads
Update old content instead of creating new pages.
Make sure that most of the backlinks to your site have branded anchor text.
Yandex’s Top Priority Positive Ranking Factors
For your rankings to be positively affected, you must:
Play word games while creating your domain name
Make sure your domain is .com
Encourage people to search for your target keywords in Yandex Bar
Keep getting clicks
There Are Many Unexpected Initial Ranking Factors
The more interesting initial weighted ranking factors are the unexpected ones. Below is a list of seventeen factors that stand out.
FI_PAGE_RANK: +0.1828678331 — PageRank is Yandex’s 17th highest weighted factor. They had previously completely removed backlinks from their ranking system, so it’s not surprising that it’s this low on the list.
FI_SPAM_KARMA: +0.00842682963 — The spam karma gets its name from “antispammers” and is the probability that the host is spam, based on WHOIS information.
FI_SUBQUERY_THEME_MATCH_A: +0.1786465163 — How closely the query and document match thematically. It is the 19th highest weighted factor.
FI_REG_HOST_RANK: +0.1567124399 — Yandex has a host (or domain) ranking factor.
FI_URL_LINK_PERCENT: +0.08940421124 — The ratio of links whose anchor text is a URL (rather than text) to the total number of links.
FI_PAGE_RANK_UKR: +0.08712279101 — There is a specific PageRank for Ukraine.
FI_IS_NOT_RU: +0.08128946612 — It is a positive thing that the domain name is not .RU. The Russian search engine doesn’t trust Russian sites 🙂
FI_YABAR_HOST_AVG_TIME2: +0.07417219313 — This is the average dwell time reported by YandexBar.
FI_LERF_LR_LOG_RELEV: +0.06059448504 — This is link relevance based on the quality of each link.
FI_NUM_SLASHES: …9417 — The number of slashes in the URL.
FI_ADV_PRONOUNS_PORTION: -0.001250755075 — The proportion of pronouns on the page.
FI_TEXT_HEAD_SYN: -0.01291908335 — Presence of [query] words in the title, taking into account synonyms.
FI_PERCENT_FREQ_WORDS: -0.02021022114 — The ratio of the number of words, which are the 200 most frequently used words of the language, to the total number of words in the text.
FI_YANDEX_ADV: -0.09426121965 — Getting more specific about its dislike of ads, Yandex penalizes pages that contain Yandex ads.
FI_AURA_DOC_LOG_SHARED: -0.09768630485 — The logarithm of the number of non-unique text fields in the document.
FI_AURA_DOC_LOG_AUTHOR: -0.09727752961 — The logarithm of the number of text fields for which this document owner is recognized as the author.
FI_CLASSIF_IS_SHOP: -0.1339319854 — Apparently, Yandex will pay less attention to you if your page is a store.
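To make the signs and magnitudes of these weights more concrete, here is a minimal Python sketch that combines a few of them as a plain weighted sum. The weights are copied from the list above. Everything else is an assumption made for illustration: the linear combination itself (the list does not say how Yandex applies these weights), the `linear_score` helper, the two example pages, and factor values scaled between 0 and 1.

```python
# Initial weights copied from the list above (a subset, for illustration only).
INITIAL_WEIGHTS = {
    "FI_PAGE_RANK": 0.1828678331,
    "FI_REG_HOST_RANK": 0.1567124399,
    "FI_IS_NOT_RU": 0.08128946612,
    "FI_YANDEX_ADV": -0.09426121965,
    "FI_CLASSIF_IS_SHOP": -0.1339319854,
}


def linear_score(factors: dict) -> float:
    """Weighted sum of factor values; factors missing from the page count as 0.0.

    Assumption: a simple linear combination, which is not confirmed as
    Yandex's actual scoring method.
    """
    return sum(
        weight * factors.get(name, 0.0)
        for name, weight in INITIAL_WEIGHTS.items()
    )


# Hypothetical pages; factor values are assumed to lie in [0, 1].
plain_page = {"FI_PAGE_RANK": 0.5, "FI_REG_HOST_RANK": 0.5, "FI_IS_NOT_RU": 1.0}
shop_with_ads = {**plain_page, "FI_YANDEX_ADV": 1.0, "FI_CLASSIF_IS_SHOP": 1.0}

print("plain page:   ", linear_score(plain_page))
print("shop with ads:", linear_score(shop_with_ads))
```

Under these assumptions the second page scores lower than the first, because the two negative weights subtract from an otherwise identical profile. Treat this as a reading aid for the list, not as a model of the real ranking pipeline.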
Taken together, these odd factors and the many others in the Yandex codebase show that a wide range of signals can serve as ranking factors.
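Several of the factors in the list are simple ratios that an SEO crawler could approximate. The sketch below computes two of them from their descriptions: FI_URL_LINK_PERCENT, read here as the share of links whose anchor text is itself a URL, and FI_PERCENT_FREQ_WORDS. The function names, the URL-detection rule, the tokenizer, and the tiny `FREQUENT_WORDS` set (a stand-in for the 200 most frequent words of a language) are all assumptions for illustration, not details taken from the leaked code.

```python
import re

# Assumption: a tiny stand-in for "the 200 most frequently used words of the language".
FREQUENT_WORDS = {"the", "of", "and", "a", "to", "in", "is", "it"}

# Assumption: an anchor counts as "a URL" if it starts with http(s):// or www.
URL_LIKE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)


def url_link_percent(anchor_texts):
    """Share of links whose anchor text looks like a URL rather than words."""
    if not anchor_texts:
        return 0.0
    url_like = sum(1 for text in anchor_texts if URL_LIKE.match(text.strip()))
    return url_like / len(anchor_texts)


def percent_freq_words(text):
    """Share of the words in the text that belong to the frequent-word set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    frequent = sum(1 for word in words if word in FREQUENT_WORDS)
    return frequent / len(words)


anchors = ["https://example.com", "Example Shop", "www.example.com/page", "read more"]
print(url_link_percent(anchors))
print(percent_freq_words("The cat is in the hat"))
```

Measures like these are cheap to compute during a crawl, which makes them reasonable candidates for hypotheses to test against rankings.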
Mike King suspects that the “200 signals” that Google reports are 200 signal classes, and each signal combines many other components. According to King, just as Google Analytics has dimensions associated with many metrics, Google Search probably has classes of ranking signals consisting of many attributes.
Chris Long points out that Yandex prioritizes content close to the homepage.
Yandex Parses Google, Bing, YouTube, and TikTok!
The codebase also reveals that Yandex has many parsers for other websites and their related services, including the ones named in the heading above. Yandex also has parsers for a variety of other services, as well as for its own.
What Can We Add to What We Know About Google from the Yandex Leak?
Naturally, this is still the question on everyone’s mind. While there are certainly many similarities between Yandex and Google, the truth is that only a Google Software Engineer working on Search can definitively answer this question.
Still, this is the wrong question.
Rather, this code should help us expand our thinking about modern search. Much of the collective understanding of search comes from what the SEO community learned through testing in the early 2000s and from what search engineers said when search was much less opaque. Unfortunately, that understanding hasn’t kept up with the fast pace of innovation.
The insights from the Yandex leak’s many features and ranking factors should yield more hypotheses that need to be tested and considered for ranking in Google. They should also offer more that can be parsed and measured by SEO crawling, link analysis, and ranking tools.
Mert Erkal is the founder of Stradiji, which has been providing consultancy services on Search Engine Optimization (SEO), SEO Friendly Content Production and Optimization, and Conversion Optimization since 2009. SEO consultancy of enterprise companies is Mert's unique expertise. He has been sharing and commenting on weekly critical developments from the SEO world for about three years with his newsletter "SEOs Diners Club." With the advantage of remote working, he continues to provide SEO consultancy to English-speaking countries, especially the United States, Australia, and the United Kingdom.