LSI Keywords have been a controversial topic over the last few years. To ensure that their content ranks high on Google, business owners, marketers, and SEO specialists are always on the lookout for Google’s ranking factors. So here comes the highly debated question:
“Are LSI Keywords one of Google’s ranking factors?”
This article seeks to answer this question and even more.
Here’s what you’ll learn:
- What Are LSI Keywords?
- What Problem Do LSI Keywords Solve?
- How Do LSI Keywords Distinguish Polysemic Words and Synonyms?
- Are LSI Keywords One of Google’s Ranking Factors?
- The New Advanced LSI Keywords: Contextual Terms
- How To Find LSI Keywords (Contextual Terms)
- How To Use Contextual Terms
- Does Contextual-Based SEO Work?
What Are LSI Keywords?
LSI keywords are words or phrases that are semantically related to a main keyword and supply it with more context.
Developed in the 1980s, Latent Semantic Indexing (LSI), also known as Latent Semantic Analysis, is a Natural Language Processing (NLP) technique that aims to better understand a search user’s intent when entering a query into the search engine.
For example, if your main keyword is “cake”, some LSI keywords would be batter, flour, butter, oven, and bake. These words help build the context around the word “cake” and show that it means the dessert you eat at birthday parties or weddings.
So what’s the importance of LSI and LSI keywords?
According to Google, it has indexed hundreds of billions of web pages. Out of these billions of possible pages, how do you think Google decides which specific ones to show you when you type something into its search engine?
The first thing that would probably come to mind is that after receiving your query, search engines look for pages that contain terms matching the ones you used and choose those specific pages. For example, if you type in the word “waffles”, you would think that search engines will return pages that contain the exact word you typed in (“waffles”), right?
Although it sounds like the most obvious method, it can be highly inaccurate. Many words have different meanings (polysemy), so matching literal terms risks returning highly irrelevant pages to a search user. Additionally, different words can mean the same thing (synonymy), so matching only exact terms severely limits a search engine’s results. Matching literal terms to a search user’s query is therefore insufficient and inaccurate.
A better, more intuitive way would be to retrieve information based on the conceptual relevance of a topic, which is what LSI tries to achieve. It does this by looking for the underlying (“Latent”) relationship between words (“Semantic”) to improve information retrieval (“Indexing”).
What Problem Do LSI Keywords Solve?
Ultimately, Google’s mission has always been the same: to match the best page to a search user’s query. In order to do this, Google needs to understand what exactly a search user means when they look up a word on the search engine. However, the complexity of human language poses many challenges for Google in achieving this.
One such challenge is when search users use polysemic words and synonyms in their queries.
For example, if I were to type in the word “date” on Google, how does it know whether I’m looking for today’s date, a romantic meeting with someone, or the fruit?
This is what we call polysemic words: words that have different meanings.
For example, the word ‘right’ could mean morally good, justified, or acceptable, or it could also mean the direction opposite left.
Likewise, the word ‘fan’ could refer to an apparatus with rotating blades that creates a current of air for cooling or ventilation, or a person who has a strong interest in or admiration for a particular person or thing.
On the other hand, synonyms are different words that mean the same thing.
Some examples of synonyms are easy and simple, big and huge, test and exam, etc.
A user could run a search for ‘computer’ using the terms ‘pc’ or ‘desktop’, or even ‘laptop’. A search engine that does not understand synonyms would consider those search terms independent of each other, and it would not return all the relevant results the user wants.
The same holds true for regional variations in the words used to describe the same thing. A US user looking for french fries would query ‘fries near my location’ while a UK user would search for ‘chips near my location’.
A competent search engine would have to understand that both users are probably looking for the same kind of food.
How Do LSI Keywords Distinguish Polysemic Words and Synonyms?
Latent semantic indexing (or latent semantic analysis) solves the problem of distinguishing polysemic words and synonyms using statistical methods. Latent semantic analysis examines the statistical co-occurrences of words that appear in proximity to each other across a set of documents to infer whether they are contextually or semantically related. This is what data scientists call Word Embedding in Natural Language Processing.
In that sense, LSI keywords are keywords that are semantically or contextually related to the keyword you are targeting. For example, if your seed keyword is ‘car’, the corresponding LSI keywords would be ‘automobile’, ‘vehicle’, or even ‘cars’.
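To make the statistical idea concrete, here is a toy latent semantic analysis sketch, assuming only NumPy. The four-document “corpus” is invented for illustration; real LSA runs over far larger document sets.

```python
import numpy as np

docs = [
    "car vehicle engine road",
    "automobile vehicle engine driver",
    "cake batter oven flour",
    "bake cake oven butter icing",
]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-document matrix: rows are terms, columns are documents.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[idx[w], j] += 1

# Truncated SVD projects each term into a small "latent" semantic space.
U, s, _ = np.linalg.svd(A, full_matrices=False)
k = 2  # keep the two strongest latent dimensions
term_vecs = U[:, :k] * s[:k]

def similarity(w1, w2):
    a, b = term_vecs[idx[w1]], term_vecs[idx[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms that co-occur with the same neighbours land close together,
# even though "car" and "automobile" never share a document here.
print(similarity("car", "automobile"))  # close to 1: semantically related
print(similarity("car", "flour"))       # close to 0: unrelated contexts
```

Notice that “car” and “automobile” are judged similar purely because they keep the same company (“vehicle”, “engine”), which is exactly the latent relationship LSI exploits.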
Some argue that LSI is not suitable for search engines because it was devised for smaller corpora of documents in the pre-internet era and is impractical to apply to data as vast as the world wide web. Although this is true from a search engine’s perspective, here’s why it is irrelevant: when doing SEO, we’re only looking at a particular keyword, which doesn’t involve such a huge amount of data.
When we’re dealing with specific keywords, we are dealing with only the Top 10-20 search results, and these very small sets of data make LSI a wonderful technique for related-keyword extraction.
Are LSI Keywords One of Google’s Ranking Factors?
A few years ago, discussions were rife that Google uses latent semantic indexing in their search algorithm, and using LSI keywords in your content could improve your content’s ranking in the search results.
It makes sense to assume Google uses LSI in their search algorithm because it is imperative for them to be able to distinguish polysemic words and synonyms in order to accurately decipher their searcher’s intent.
However, in 2019, Google’s Search Advocate, John Mueller, poured cold water over the notion by confirming that Google does not use LSI keywords in their search engine.
On top of that, Google’s papers on its search algorithm make no mention of latent semantic indexing or latent semantic analysis, and neither do whitepapers from other search engines.
So, here comes the question again, “Are LSI Keywords one of Google’s ranking factors?”
The real answer? None of us will ever know for sure (only Google does).
The likely answer? No… but also somehow yes.
To really answer this, we need to revisit the development history of Natural Language Processing (NLP).
In simple terms, NLP is the process of teaching computers to understand human language.
One of the earliest techniques in NLP is Word Embedding, a form of word representation that places words with similar meanings “closer” together. And LSI? It was one of the pioneers of the Word Embedding technique back in the day.
Then in 2013, as researchers tried to push Word Embedding further, Word Vector (Word2Vec) was born. Building upon the concept of Word Embedding, researcher Tomas Mikolov developed Word Vector to help computers understand language faster.
And finally, in 2018, Word Vector further evolved into BERT (Bidirectional Encoder Representations from Transformers). As opposed to the single-directional nature of Word Vector, BERT approaches NLP bidirectionally, enabling Google to understand search terms more the way a human does.
BERT takes into account other words in a search sentence and the relative location of the word in the sentence, to infer the context in which the words are being used, so it would more accurately reflect the searcher’s intent.
The model also classifies search terms into topical entities rather than sorting each word independently. For example, “Bond” and “James Bond” are two different keywords, but they would be classified under the same entity.
By the end of 2020, Google started using BERT in almost every English-language query.
Considering this NLP development from Word Embedding, to Word Vector, and finally to BERT, we can say that no, Google probably doesn’t use LSI per se in their algorithm today.
But BERT, the model so popular among SEOs today? Its history and roots go back to the basics: the Word Embedding technique used by LSI.
A good analogy is the car engine. Rewind 30 years, and every car engine was a gasoline engine; today, engines have evolved into advanced forms like plug-in hybrids, hydrogen engines, and pure EVs.
All these new engines are definitely better, but the plain old gasoline engine, despite being older tech, still gets you from point A to point B.
And LSI? LSI is exactly like the gasoline engine: it may no longer be the fanciest around, but it still does the job of building context for your content and moving it up the SERP rankings.
The New Advanced LSI Keywords: Contextual Terms
Just because Google does not use LSI keywords does not mean that LSI keywords are irrelevant in SEO.
On the contrary, now that Google is able to understand search terms the way humans do, using LSI and semantic keywords has never been more relevant. Having contextually and semantically related terms to your target keyword tells Google that your content is highly relevant.
It also renders your content much more appealing to human readers and by extension, Google’s BERT algorithm. Well-written content does not regurgitate the same keyword over and over again but peppers the article with contextual and related terms to make it more delightful to read.
Google itself has said as much: your blog post about dogs will rank better in search results if it contains related keywords like dog breeds and dog-related peripherals such as leashes or harnesses.
In an effort to keep up with Google’s ever-changing algorithm for understanding search users’ intent and matching the best content to it, we at LSIGraph will always update our algorithms accordingly.
The key idea is still the same: to reinforce content relevancy so that it matches a user’s search intent, consequently helping it rank higher on Google.
Keeping in mind that LSI is still a building block of today’s BERT, LSIGraph has further improved its algorithm to incorporate a BERT-integrated machine learning model that returns contextual and semantic keywords, increasing your content’s relevancy to Google’s ranking algorithms.
This upgrades LSI keywords into their more advanced form: “Contextual Terms”.
How To Find LSI Keywords (Contextual Terms)
Now you might wonder: how do you look for these Contextual Terms to add to your content?
There are 3 ways to do it.
First, you could do it manually. Type a specific keyword of interest into Google and scour the Top 10 pages one by one. From there, extract the words and phrases those pages use that are semantically and contextually related to your chosen keyword. Also take note of each keyword’s usage frequency, as it helps you gauge the most popular ones to include in your own content.
This approach is doable but tedious and time-consuming, and you might still miss some keywords.
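The manual counting step can be sketched in a few lines of Python. The page texts below are stand-ins; in practice you would paste in text collected from the Top 10 results, and the stopword list would be much longer.

```python
import re
from collections import Counter

# Stand-in snippets representing text scraped from top-ranking pages.
top_pages = [
    "Preheat the oven and mix the cake batter with flour and butter.",
    "A good cake starts with quality flour, butter, and a hot oven.",
    "Bake the batter until golden; every cake needs the right oven temperature.",
]

# A tiny illustrative stopword list; real lists are far longer.
STOPWORDS = {"the", "and", "a", "with", "until", "every", "needs", "right"}

def extract_terms(pages, top_n=5):
    counts = Counter()
    for page in pages:
        words = re.findall(r"[a-z]+", page.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    # Most frequent terms across pages approximate the popular related terms.
    return counts.most_common(top_n)

print(extract_terms(top_pages))
```

Terms like “cake” and “oven” surface at the top because they recur across competing pages, which is exactly the frequency signal the manual method relies on.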
Another way is to use Google’s NLP API, an interface that shows you how search engines perceive and evaluate the quality of a piece of text. You feed the API a piece of content (preferably from the Top 10), which it then analyzes with machine learning across four categories:
- Entities: categorizes specific words or phrases into types like location, consumer good, person, event, price, and more.
- Sentiment: analyzes the emotion of the content to determine whether it’s positive, negative, or neutral.
- Syntax: analyzes the language structure of the content and provides linguistic insights.
- Categories: shows you the overall category the content falls under.
Next, you would need to compile this information into a spreadsheet, and based on each of the four categories, decide on the keywords that are relevant to your niche, and that Google deems important.
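As a minimal sketch of that compilation step, the snippet below writes entity results (in the name/type/salience shape the API’s entity analysis returns) into CSV, sorted by salience. The entity rows here are illustrative, not real API output.

```python
import csv
import io

# Illustrative entity rows; real values come from analyzing an actual page.
entities = [
    {"name": "keyword research", "type": "OTHER", "salience": 0.42},
    {"name": "Google", "type": "ORGANIZATION", "salience": 0.21},
    {"name": "SEO", "type": "OTHER", "salience": 0.18},
]

def entities_to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "type", "salience"])
    writer.writeheader()
    # Sort by salience so the terms the API deems most central come first.
    for row in sorted(rows, key=lambda r: r["salience"], reverse=True):
        writer.writerow(row)
    return buf.getvalue()

print(entities_to_csv(entities))
```

The resulting CSV opens directly in a spreadsheet, where you can filter by entity type and prioritize the high-salience terms for your niche.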
The whole process is quite complicated, as Google’s NLP API is a complex tool, so there is a steep learning curve and you would need to spend days analyzing the data provided by the API.
The third, and easiest, way is to use an SEO keyword research tool like LSIGraph. You only need to input a seed keyword, and it returns a list of contextual terms you can add to your content to build relevancy. The best part? The process takes less than 10 seconds! You can skip all the hard work and hassle, sit back, and just let LSIGraph do it for you.
How To Use Contextual Terms
Now that you have your contextual terms (either the easy way or the long-winded way), here comes the next part: Placement and Frequency. How do you know where to place them in your content, and how many should you use? Does it even matter?
There was a time when randomly stuffing as many keywords as possible into your content would work. But those were the old days.
Now, Google is much, much smarter. Randomly placing keywords where they don’t belong and overstuffing your content with them won’t cut it anymore. In fact, you could even get penalized for doing this and risk being banned by Google.
Hence, you need to place your contextual terms strategically: not so many that you overstuff your content, and not so few that Google won’t even notice them.
Generally, you would want to intersperse Contextual Terms naturally throughout your content.
The best places to include them would be:
- Meta description
- Anchor texts
- Image Title, File Name, and Alt Text
- Content body (obviously)
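One rough way to sanity-check frequency is to measure each term’s share of the total word count. This is only an illustrative sketch: the density threshold is an assumption for demonstration, not a published Google rule, and the tiny toy draft forces an unusually high threshold.

```python
import re

def term_density(content, term):
    """Fraction of words in `content` that are exactly `term`."""
    words = re.findall(r"[a-z']+", content.lower())
    if not words:
        return 0.0
    return sum(w == term.lower() for w in words) / len(words)

def check_terms(content, terms, max_density=0.03):
    """Flag terms whose density suggests overstuffing (threshold is illustrative)."""
    return {
        term: "overstuffed" if term_density(content, term) > max_density else "ok"
        for term in terms
    }

draft = "Bake the cake in the oven. The cake batter needs flour. Cake cake cake."
# The toy draft is tiny, so we raise the threshold; on a full article a
# much lower value would be more realistic.
print(check_terms(draft, ["cake", "oven", "flour"], max_density=0.2))
```

A check like this catches the obvious case, repeating one term far more often than its neighbours, which is the overstuffing pattern described above.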
At times, you might struggle to figure out how to add these contextual terms to your content. This is where LSIGraph comes in handy. By clicking on a particular contextual term, you’ll be shown some “Examples of Use”. These show you how the Top 10 pages use the contextual terms in their content, so you can get ideas on how to include them in your own.
Also, you can easily figure out how many times to use the contextual terms in your content so that it’s optimized. LSIGraph tells you how many times you’ve used a contextual term in your content so far, and how many times you should be using it.
LSIGraph takes out all the guesswork and tells you exactly where to place the contextual terms and how many times to use them so that your content is as highly optimized as possible.
Does Contextual-Based SEO Work?
Promising that something works isn’t nearly as convincing as showing that it works.
So, to answer the question as to whether contextual terms actually help in boosting a content’s SEO, let’s look at some data.
Three months ago, we published a new page about keyword research on LSIGraph. In case you don’t know, “keyword research” is a very, very competitive word.
It has the highest keyword difficulty, and a very low opportunity score (OS). This means it would be extremely difficult to rank for this keyword and see any traffic without extremely accurate optimization.
Of course, because we know LSIGraph works, we used our own tool to optimize the page to help it rank.
Previously, we only included more related keywords in our content to optimize our pages. Although this strategy helped us rank for low-hanging-fruit keywords, it barely did anything for the more competitive ones.
So this time, we optimized our page by adding as many contextual terms as possible, according to the suggested frequencies. We also did some on-page optimization based on the suggestion list provided to boost our content score further so that it’s in the well-optimized range.
And the result?
In just three months’ time, our newly published page saw upward-trending traffic on Google Search Console.
Again, this is for an extremely difficult keyword, so gaining any traffic is already a feat, let alone an upward-trending one!
In fact, we even ranked for a few other keywords related to “keyword research” that also had high keyword difficulty.
Circling back to the question “Does contextual-based SEO work?”, the answer is yes.
Based on the data from our own article, optimally adding contextual terms helped us gain upward-trending traffic for a highly competitive keyword and, as a cherry on top, for other high-difficulty keywords as well.
The easy, simple answer to the question, “Are LSI Keywords one of Google’s ranking factors?” would be no, they aren’t.
However, this doesn’t mean that it’s entirely insignificant.
Because LSI is part of what eventually evolved into BERT, which Google now uses ubiquitously, it remains relevant to what helps content rank.
Similarly to how the Word Embedding technique progressed to today’s BERT, LSIGraph has also evolved from using LSI keywords to something even better: Contextual Terms.
Rather than discarding the idea of LSI keywords entirely, Contextual Terms build on it, strengthening your content’s contextual relevancy so that when Google’s bots crawl it, they understand what your content is about and rank it for the right audience.
So instead of wasting time getting hung up on whether LSI keywords matter, we can focus on using Contextual Terms, which have proven to bring actual results to the table. Of course, finding contextual terms is one task; knowing how to use them is another.
To make your life just a tad easier, we highly suggest using an SEO keyword research tool that does all the work for you. It needs to effectively come up with these contextual terms and tell you the best way to use them in your content.
This tool in question? It’s staring at you right in your face – get LSIGraph today!