In computing, spamdexing (also known as search engine spam, search engine poisoning, black-hat SEO, search spam or web spam) is the deliberate manipulation of search engine indexes. It involves a number of methods, such as repeating unrelated phrases, to manipulate the relevance or prominence of indexed resources in a manner inconsistent with the purpose of the indexing system.
Spamdexing could be considered to be a part of search engine optimization, although there are many SEO methods that improve the quality and appearance of website content and serve content useful to many users. Search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the body text or the URL of a web page. Many search engines check for instances of spamdexing and will remove suspect pages from their indexes. Also, search engine operators can quickly block the results listing from entire websites that use spamdexing, perhaps in response to user complaints of false matches. The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful. Using unethical methods to make websites rank higher in search engine results than they otherwise would is referred to in the SEO (search engine optimization) industry as "black-hat SEO". These methods are more focused on breaking the search-engine promotion rules and guidelines. In addition, the perpetrators run the risk of their websites being severely penalized by the Google Panda and Google Penguin search-results ranking algorithms.
Common spamdexing techniques can be classified into two broad classes: content spam (or term spam) and link spam.
History
The earliest known reference to the term spamdexing is by Eric Convey in his article "Porno sneaks back to the Web," The Boston Herald, May 22, 1996, where he said:
Problems arise when site operators load their web pages with hundreds of foreign terms so that search engines will list them among legitimate addresses. This process is called "spamdexing," a combination of spamming - an Internet term for sending users unsolicited information - and "indexing."
Spamdexing is a form of search engine spam. Where legitimate search engine optimization (SEO) is the art of building websites that appeal to the major search engines for optimal indexing, spamdexing is the practice of creating websites that will be illegitimately indexed with a high position in the search engines. Spamdexing is sometimes used to try to manipulate a search engine's understanding of a category. Web designers aim to create pages that will find favorable rankings in the search engines, and they build their pages according to the standards they believe will help; some of them use spamdexing, often without their clients' knowledge.
Although spamdexing has disrupted searching for information on the Internet, steps have been taken to curb it, with some success.
Content spam
These techniques involve altering the logical view that a search engine has of a page's contents. They all aim at variants of the vector space model for information retrieval on text collections.
Keyword stuffing
Keyword stuffing involves the calculated placement of keywords within a page to raise the keyword count, variety, and density of the page. This is useful to make a page appear relevant to a web crawler in a way that makes it more likely to be found. Example: a promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. Older versions of indexing programs simply counted how often a keyword appeared, and used that to determine relevance levels. Most modern search engines have the ability to analyze a page for keyword stuffing and determine whether the frequency is consistent with other sites created specifically to attract search engine traffic. Also, large web pages are truncated, so that massive dictionary lists cannot be indexed on a single web page.
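As a rough illustration of how an indexer might flag stuffed pages, the sketch below computes keyword density and flags pages whose single most frequent word takes an implausibly large share of the text. The 15% threshold, the tokenizer, and the 50-word minimum are illustrative assumptions, not anything a real search engine is known to use.

```python
import re
from collections import Counter

def keyword_density(text: str, keyword: str) -> float:
    """Fraction of the page's words that are exactly `keyword`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return words.count(keyword.lower()) / len(words) if words else 0.0

def looks_stuffed(text: str, threshold: float = 0.15) -> bool:
    """Flag pages whose single most frequent word dominates the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < 50:               # too short to judge reliably
        return False
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold

page = "cheap tickets " * 40 + "plus a little real content"
print(keyword_density(page, "tickets"))   # ~0.47
print(looks_stuffed(page))                # True
```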
Hidden or invisible text
Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code such as "no frame" sections, alt attributes, zero-sized DIVs, and "no script" sections. People manually screening websites for a search engine company might temporarily or permanently block an entire website for having invisible text on some of its pages. However, hidden text is not always spamdexing: it can also be used to improve accessibility.
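A minimal sketch of how screening tooling might surface one class of hidden text: elements whose inline style hides them outright. The regex-based parsing and the pattern list are simplifying assumptions; same-color-as-background text is harder to catch, since it requires rendering the page and comparing computed foreground and background colors, which this toy does not attempt.

```python
import re

# Inline styles that hide text outright.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|font-size\s*:\s*0",
    re.IGNORECASE,
)

def find_hidden_elements(html: str) -> list:
    """Return opening tags whose inline style suggests hidden text."""
    hits = []
    for match in re.finditer(r'<\w+[^>]*\bstyle="([^"]*)"[^>]*>', html):
        if HIDDEN_STYLE.search(match.group(1)):
            hits.append(match.group(0))
    return hits

html = '<p>Welcome!</p><div style="font-size:0">free mp3 free mp3</div>'
print(find_hidden_elements(html))   # ['<div style="font-size:0">']
```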
Meta-tag stuffing
This involves repeating keywords in the meta tags, and using meta keywords that are unrelated to the site's content. This tactic has been ineffective since 2005.
Doorway pages
"Gateway" or doorway pages are low quality web pages that are made with very little content but are filled with very similar keywords and phrases. They are designed to rank highly in search results, but do not serve the purpose for visitors seeking information. Generally doorway pages will have "click here to enter" on the page. In 2006, Google overthrew BMW for using a "doorway page" into the company's German website, BMW.de.
Scraper sites
Scraper sites are created using programs designed to "scrape" search engine results pages or other sources of content and create "content" for a website. The specific presentation of content on these sites is unique, but is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising (such as pay-per-click ads), or they redirect the user to other sites. It is even feasible for scraper sites to outrank original websites for their own information and organization names.
Article spinning
Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers or automated using a thesaurus database or a neural network.
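Search engines are widely described as using shingle-based near-duplicate detection, which is exactly what spinning tries to evade. Below is a toy sketch of w-shingling with Jaccard similarity; the window size of four words is an arbitrary assumption.

```python
def shingles(text: str, w: int = 4) -> set:
    """All contiguous w-word sequences ('shingles') in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: str, b: str, w: int = 4) -> float:
    """Jaccard similarity between the two documents' shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    return len(sa & sb) / len(sa | sb) if sa and sb else 0.0

original = "the quick brown fox jumps over the lazy dog near the river bank"
spun     = "the fast brown fox leaps over the lazy dog near the river bank"
# Swapping two synonyms still leaves a third of the shingles shared.
print(jaccard(original, spun))   # 0.333...
```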
Machine translation
Similar to article spinning, some sites use machine translation to render their content in several languages, with no human editing, resulting in unintelligible text.
Publishing web pages that contain information unrelated to their titles is a misleading practice known as deception. Despite being a target for penalties from reputable search engines, deception is a common practice across several types of sites, including dictionary and encyclopedia sites.
Link spam
Link spam is defined as links between pages that are present for reasons other than merit. Link spam takes advantage of link-based ranking algorithms, which give websites higher rankings the more other highly ranked websites link to them. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm. There are many kinds of link spam, built for both positive and negative ranking effects on websites. (See Google penalty and negative SEO.)
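To see why link spam pays off against link-based ranking, here is a minimal PageRank power iteration (not Google's production algorithm; the 0.85 damping factor comes from the original PageRank paper, and the tiny graph is invented for illustration). Three farm pages pointing at a target measurably lift its score over an otherwise identical page that nobody links to.

```python
def pagerank(links: dict, d: float = 0.85, iters: int = 50) -> dict:
    """Minimal PageRank by power iteration over an adjacency dict."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            targets = outs or pages       # dangling page: spread evenly
            for q in targets:
                new[q] += d * rank[p] / len(targets)
        rank = new
    return rank

# Three farm pages pointing at "target" lift it above "honest".
graph = {"target": [], "honest": [], "f1": ["target"],
         "f2": ["target"], "f3": ["target"]}
ranks = pagerank(graph)
print(ranks["target"] > ranks["honest"])   # True
```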
Link generation software
A common form of link spam is the use of link-building software to automate the search engine optimization process.
Link farm
Link farms are tightly-knit networks of websites that link to each other for the sole purpose of gaming the search engine ranking algorithms. These are also known facetiously as mutual admiration societies. Use of link farms has greatly declined since Google launched its first Panda update in February 2011, which introduced significant improvements in its spam-detection algorithm.
Private blog networks
A private blog network (PBN) is a group of authoritative websites used as a source of contextual links that point to the owner's main website, in order to achieve higher search engine rankings. Owners of PBN websites use expired domains or auction domains that have backlinks from high-authority websites. Google has targeted and penalized PBN users on several occasions with major deindexing campaigns since 2014.
Hidden links
Putting hyperlinks where visitors will not see them is a way to increase link popularity. Highlighted link text can help rank a webpage higher for matching that phrase.
Sybil attack
A Sybil attack is the forging of multiple identities for malicious intent. A spammer may create multiple websites at different domain names that all link to each other, such as fake blogs, known as spam blogs. Spam blogs, or "splogs", are blogs created solely for commercial promotion and the passage of link authority to target sites. Often these splogs are designed in a misleading manner that will give the effect of a legitimate website, but upon close inspection will often be found to be written using spinning software, or to be very poorly written with barely readable content. They are similar in nature to link farms.
Guest blog spam
Guest blog spam is the process of placing guest blog posts on websites solely to gain a link to another website or websites. Unfortunately, these are often confused with legitimate forms of guest blogging that have motives other than placing links. This technique was made famous by Matt Cutts, who publicly declared "war" against this form of link spam.
Buying expired domains
Some link spammers use expired-domain crawler software, or monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their own pages. However, it is possible, but not confirmed, that Google resets the link data on expired domains. To maintain all previous Google ranking data for the domain, it is advisable for a buyer to grab the domain before it is "dropped". Some of these techniques may be applied to create a Google bomb - that is, to cooperate with other users to boost the ranking of a particular page for a particular query.
Cookie stuffing
Cookie stuffing involves placing an affiliate tracking cookie on a website visitor's computer without their knowledge, which will then generate revenue for the person doing the cookie stuffing. This not only generates fraudulent affiliate sales, but also has the potential to overwrite other affiliates' cookies, essentially stealing their legitimately earned commissions.
Using world-writable pages
Websites that can be edited by users can be used by spamdexers to insert links to spam sites if appropriate anti-spam measures are not taken.
Automated spambots can rapidly render the user-editable portions of a site unusable. Programmers have developed a variety of automated spam-prevention techniques to block, or at least slow down, spambots.
Spam in blogs
Spam in blogs is the placing or solicitation of links randomly on other sites, placing a desired keyword into the hyperlinked text of the inbound link. Guest books, forums, blogs, and any site that accepts visitors' comments are particular targets, and are often victims of drive-by spamming, where automated software creates nonsense posts with links that are usually irrelevant and unwanted.
Comment spam
Comment spam is a form of link spam that has arisen in web pages that allow dynamic user editing, such as wikis, blogs, and guestbooks. It can be problematic because agents can be written that automatically select random user-edited web pages, such as Wikipedia articles, and add spamming links.
Wiki spam
Wiki spam is a form of link spam on wiki pages. Spammers use the open editability of wiki systems to place links from the wiki site to the spam site. The subject of the spam site is often unrelated to the wiki page where the link is added.
Referrer log spamming
Referrer spam takes place when a spam perpetrator or facilitator accesses a web page (the referee) by following a link from another web page (the referrer), so that the referee is given the address of the referrer by the person's Internet browser. Some websites have a referrer log which shows which pages link to that site. By having a robot randomly access many sites enough times, with a message or specific address given as the referrer, that message or Internet address then appears in the referrer logs of those sites that keep them. Since some Web search engines base the importance of sites on the number of different sites linking to them, referrer-log spam may increase the search engine rankings of the spammer's sites. Also, site administrators who notice the referrer log entries in their logs may follow the link back to the spammer's referrer page.
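A sketch of how a site administrator might screen a referrer log for this pattern. The log format regex assumes Apache-style combined logs, and the hit threshold is arbitrary; both are illustrative assumptions.

```python
import re
from collections import Counter

# Assumes Apache-style combined logs; the layout is illustrative only.
LOG_LINE = re.compile(r'"GET [^"]*" \d+ \d+ "(?P<referrer>[^"]*)"')

def suspicious_referrers(log_lines, min_hits: int = 20) -> list:
    """Referrers repeated implausibly often - a crude spam signal."""
    counts = Counter()
    for line in log_lines:
        m = LOG_LINE.search(line)
        if m and m.group("referrer") not in ("-", ""):
            counts[m.group("referrer")] += 1
    return [ref for ref, n in counts.items() if n >= min_hits]

sample = ['1.2.3.4 - - [01/May/2024] "GET / HTTP/1.1" 200 512 '
          '"http://spam.example/"'] * 25
print(suspicious_referrers(sample))   # ['http://spam.example/']
```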
Countermeasures
Because of the large amount of spam posted to user-editable web pages, Google proposed a "nofollow" tag that can be embedded with links. A link-based search engine, such as Google's PageRank system, will not use a link to boost the score of the linked website if that link carries a nofollow tag. This ensures that spam links to user-editable websites will not raise the sites' rankings with search engines. Nofollow is used by several major websites, including WordPress, Blogger, and Wikipedia.
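A minimal sketch of this countermeasure from the publisher's side: rewriting user-submitted HTML so that every link carries rel="nofollow". A production implementation would use a real HTML parser and merge with any existing rel values; the regex approach here is a simplification.

```python
import re

def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that lacks a rel attribute."""
    def rewrite(m):
        tag = m.group(0)
        if re.search(r'\brel\s*=', tag, re.IGNORECASE):
            return tag                    # leave an existing rel alone
        return tag[:-1] + ' rel="nofollow">'
    return re.sub(r"<a\b[^>]*>", rewrite, html, flags=re.IGNORECASE)

comment = 'Nice post! <a href="http://spam.example/">cheap pills</a>'
print(add_nofollow(comment))
# Nice post! <a href="http://spam.example/" rel="nofollow">cheap pills</a>
```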
Other types
Mirror websites
A mirror site is the hosting of multiple websites with conceptually similar content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.
URL redirection
URL redirection is the taking of the user to another page without his or her intervention, e.g., using META refresh tags, Flash, JavaScript, Java, or server-side redirects. However, 301 redirects, or permanent redirects, are not considered malicious behavior.
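The distinction matters to crawlers, which see HTTP-level redirects directly but must parse the page body to find a meta refresh. Below is a sketch of that classification using only the Python standard library; JavaScript or Flash redirects require executing the page and are out of scope here, and the probe User-Agent string is an invented placeholder.

```python
import re
import urllib.request

META_REFRESH = re.compile(r'<meta[^>]+http-equiv=["\']?refresh', re.IGNORECASE)

def redirect_kind(url: str) -> str:
    """Classify a URL as an HTTP redirect vs. an in-page meta refresh."""
    req = urllib.request.Request(url, headers={"User-Agent": "probe/0.1"})
    with urllib.request.urlopen(req) as resp:   # follows HTTP redirects
        final_url = resp.geturl()
        body = resp.read(65536).decode("utf-8", errors="replace")
    if final_url != url:                        # naive URL comparison
        return "HTTP redirect (e.g. 301/302)"
    if META_REFRESH.search(body):
        return "meta refresh redirect"
    return "no redirect detected"
```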
Cloaking
Cloaking refers to any of several means of serving a page to the search engine spider that is different from that seen by human users. It can be an attempt to mislead search engines regarding the content of a particular website. Cloaking, however, can also be used to ethically increase the accessibility of a site to users with disabilities, or to provide human users with content that search engines aren't able to process or parse. It is also used to deliver content based on a user's location; Google itself uses IP delivery, a form of cloaking, to deliver results. Another form of cloaking is code swapping, i.e., optimizing a page for top ranking and then swapping another page in its place once a top ranking is achieved. Google refers to these types of redirects as sneaky redirects.
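For concreteness, here is the mechanism in miniature: a server that sniffs the User-Agent header and serves crawlers a different page from humans, precisely the behavior search engines penalize. The bot token list and page bodies are invented for illustration; real cloaking often keys on IP ranges rather than the User-Agent, which is trivial to spoof.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CRAWLER_TOKENS = ("googlebot", "bingbot")   # naive User-Agent sniffing

class CloakingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "").lower()
        if any(tok in ua for tok in CRAWLER_TOKENS):
            body = b"<h1>Keyword-rich page served only to crawlers</h1>"
        else:
            body = b"<h1>Unrelated page served to human visitors</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("localhost", 8080), CloakingHandler).serve_forever()
```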
Other countermeasures
By search engine operators
Spamdexed pages are sometimes deleted from search results by search engines.
By search engine users
Users can refine their search keywords: for example, prefixing a keyword with "-" (minus) removes from the results any site that contains the keyword in its pages or in the URLs of its pages. As an example, the search term "-naver" removes sites that contain the word "naver" in their pages and pages whose URL domain contains "naver".
Google Chrome Extensions
Google itself launched the Google Chrome extension "Personal Blocklist (by Google)" in 2011 as part of its countermeasures against content farming. As of 2018, the extension works only with the desktop version of Google Chrome.
See also
- Adversarial information retrieval
- Index (search engine) - search engine indexing overview
- TrustRank
- Web scraping
External links
- Other tools and information for webmasters
- AIRWeb series of workshops on Adversarial Information Retrieval on the Web
