What is Latent Semantic Indexing? What is the Semantic Web? What is "On-topic" Analysis? Yahoo's Patent Application How can we Utilize Semantic Indexing? Latent "Semantics" Summary
What is Latent Semantic Indexing?

Latent semantic indexing is a technique used by many search engines to group related documents, words, and websites into clusters that belong to the same theme (topic). To do this, search engines such as Google and MS Live use latent semantic analysis, which comes from the analysis of natural languages.

It is a mathematical way to describe a document by listing the words and/or phrases it uses in a matrix. The value in each matrix cell is proportional to the number of times the term appears in the document. Latent semantic analysis transforms the relationships between words and concepts into relationships between concepts and documents. With this relationship you can do the following:
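One thing this representation makes possible is comparing documents by the terms they share. The sketch below is only an illustration under assumptions: the three sample documents, the whitespace tokenizer, and the function names are invented for this example, and it stops at the raw term-document count matrix. Full latent semantic analysis would additionally reduce this matrix (typically with a singular value decomposition), which is not shown here, and nothing in the sketch reflects how any search engine actually implements it.

```python
import math
from collections import Counter

# Hypothetical sample documents, invented for this illustration.
DOCS = {
    "page_a": "search engine ranking and search engine optimization",
    "page_b": "ranking factors for search engine optimization",
    "page_c": "chocolate cake recipe with dark chocolate",
}

def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[term][doc] is a raw term count."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    terms = sorted({term for counter in counts.values() for term in counter})
    matrix = {term: {name: counts[name][term] for name in docs} for term in terms}
    return terms, matrix

def cosine_similarity(matrix, doc_x, doc_y):
    """Cosine of the angle between two document columns of the matrix."""
    dot = sum(row[doc_x] * row[doc_y] for row in matrix.values())
    norm_x = math.sqrt(sum(row[doc_x] ** 2 for row in matrix.values()))
    norm_y = math.sqrt(sum(row[doc_y] ** 2 for row in matrix.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

terms, matrix = term_document_matrix(DOCS)
print(matrix["search"])                                # counts of "search" per page
print(cosine_similarity(matrix, "page_a", "page_b"))   # pages sharing several terms
print(cosine_similarity(matrix, "page_a", "page_c"))   # pages sharing no terms
```

The two pages about search engines share several terms and get a similarity well above zero, while the cake recipe shares none with them and scores zero.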
Latent semantic analysis is used not only to find related words but also to find related phrases. A "normal, good" (spam-free) document generally contains only a limited number of related phrases, while a spam document contains an excessive number (possibly well over a hundred), so this technique can also be used to detect spam. It is rumoured that Google and Yahoo use it this way too.

As semantic analysis spreads among search engines, automatic programs start "judging" the quality of webpages and websites. Because of this, small details grow in importance. A computer program makes no distinction between important and less important errors. An error is an error.

What is the Semantic Web?

Tim Berners-Lee, the "father" of the World Wide Web and the Semantic Web, states that the Semantic Web is:

QUOTE: "... about creating things from data you have compiled yourself, or combining it with volumes (data bases) of data from other sources to make new discoveries." END QUOTE

The goal is to share and process data automatically, i.e. by computers, instead of manually combining documents found by the computer. The Semantic Web is about data, not about documents. It is not about marking up existing HTML documents, and it is not about applying artificial "intelligence".

The Semantic Web is, according to Tim Berners-Lee, about data currently held in relational databases (like search engines' data banks), XML documents, spreadsheets, and other data file formats. "It is not about people encoding webpages". It is about applications (programs) generating machine-readable data on an entirely different, automated scale. The Semantic Web therefore does not require content and webpage owners to encode their information individually. The great bulk of data suitable for the Semantic Web is already sitting in databases.

What is "On-topic" Analysis?

"On-topic" analysis is what the search engines do when comparing key phrases in Internet documents (i.e. webpages).
By analyzing hundreds of documents it is possible to find sentences that tend to occur together in "good" documents ("co-occurrence").

Once the search engines have built their own directories of co-occurring sentences for specific topics, these same sentences are used to determine whether a new document belongs to a certain topic (theme) or not. Usually one or a few sentences from a group can be used to predict the presence of other sentences or concepts from that same topic. If those other concepts are not found in a document, it may be deemed not to be a "good document".

The result of this kind of document analysis is that you have to stay strictly within one subject on each webpage so as not to "blur" the theme. There is a discussion on Webmaster World about whether Google is extending "on-topic analysis" to cover complete websites (or domains). If that is the case, then reducing the number of "ill-fitting" or non-relevant (off-topic) words and phrases should give better ranking for all pages within that website.

In August/September 2007 Google published their Touch Graph Browser, which gives a "picture" of clusters of conceptually related websites. A conceptual analysis like this is based on latent semantic analysis. It can help you check the linkage and the neighbourhood of your own or somebody else's website.

To help determine whether your webpage is "on-topic", you can use keyword analysis. It is quite easy to see whether all your top keywords and top key phrases are within the topic or not. If they are not, you may have to split the webpage into two, each with one specific topic or sub-topic.

Yahoo's Patent Application

Yahoo's patent application [Dec. 2006] was published in February 2008: "System and Method for Determining Concepts in a Content Item Using Context". It is all about the automated use of phrases to rank search results by their relevance to search queries. I have previously stated my opinion on another page (How Find Popular Keywords ?): "It's better to use key phrases instead of keywords".

The intention is, according to the patent application, to take into account how a phrase or concept is related to other phrases or concepts in the same document (webpage). After that, certain phrases or concepts are associated with a certain webpage for indexing purposes. This is another explanation of semantic indexing.

The key phrases are then compared with a database list of user queries. The phrases are also identified by the way they are related to certain topics (co-occurrence). Based on the above, the ranking algorithm calculates a numeric value for the relevance of a page in relation to a certain concept. The frequency of a phrase's occurrence can even be compared with its average occurrence in other web documents and in the query logs.

So it is getting clear. Once Yahoo starts applying this, we need to use related phrases instead of keywords when optimizing our webpages. There is really no difference between Google's and Yahoo's approach.

How can we Utilize Semantic Indexing?

We (webmasters, designers, website owners) don't need to perform semantic indexing ourselves; the search engines will do it. What we can do is use this knowledge when building our websites and webpages.

As noted above, semantic indexing means the search engines start recognizing related words and phrases, depending on a webpage's (or website's) theme (topic). They not only recognize but even expect certain words and certain expressions (key phrases). So instead of using "dictionary synonyms" we would be better off using "semantic synonyms".

Ah, but how? As said, we need to use "related concepts" and then find "semantic synonyms" for these. Now we come to what is called siloing. We start building mini-networks among the pages on our websites. As anchor text (link text) we use these "semantic synonyms", and then we optimize the webpages in each mini-network for those related words and phrases, i.e. "semantic synonyms".

And how do we find these words? Go, for instance, to Google and, as the search term, type "~" before the word or concept for which you want Google's "synonyms" [without the quotation marks]. In the results you get some words in bold; these are the "semantic synonyms" you are looking for.
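The co-occurrence idea from the "on-topic" section can be tried on your own pages in a very rough way. The sketch below assumes you already have a list of phrases related to your topic (for example, collected by hand); the phrase list, the sample page text, and the function name are hypothetical, and the simple substring check is a stand-in for illustration, not the search engines' actual method.

```python
# Hypothetical list of phrases related to one topic, collected by hand.
RELATED_PHRASES = [
    "search engine",
    "key phrases",
    "anchor text",
    "link building",
    "page ranking",
]

def topic_coverage(page_text, related_phrases):
    """Return (found, missing) related phrases for a page, case-insensitively."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    text = " ".join(page_text.lower().split())
    found = [p for p in related_phrases if p.lower() in text]
    missing = [p for p in related_phrases if p.lower() not in text]
    return found, missing

# Hypothetical page text, invented for this illustration.
page = """Good anchor text helps a search engine understand a page.
Use key phrases rather than single keywords."""

found, missing = topic_coverage(page, RELATED_PHRASES)
print("found:", found)
print("missing:", missing)
print("coverage: %d of %d" % (len(found), len(RELATED_PHRASES)))
```

A page that misses most of the related phrases for its topic, or that matches phrases from two unrelated lists, may be a candidate for splitting into two pages.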