Skip User Agents
Crawlers such as those used by Google, Yahoo, and Bing do not need to receive targeted pages, and it is undesirable for their requests to influence scoring and normalization averages. The same holds for link-checker tools and services such as WatchMouse, W3C-checklink, or Xenu Link Sleuth. Likewise, you most likely do not want these crawler and link-checker visits to show up in the Real-Time Visitor Analysis screen. From a crawler's point of view, indexing a targeted page makes little sense either: for example, a "stores nearby" block on the right-hand side of a page adds no value to a search engine's index.
Hence, by default, the Relevance Module does not target these requests at all: it ignores requests from a large set of commonly used crawlers and link checkers. It does this by checking whether the request's User-Agent header contains a string that identifies a robot or link checker. You can add extra user agents to skip via the multi-valued property in the repository at /targeting:targeting/targeting:skipUserAgents.
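The check described above is a plain substring match against the User-Agent header. The following is a minimal sketch of that idea in Java; the class name, method, and hard-coded sample fragments are illustrative assumptions, not the Relevance Module's actual API (in the real module the fragments come from the targeting:skipUserAgents property).

```java
import java.util.List;

// Illustrative sketch only: a request is skipped when its User-Agent
// header contains any of the configured fragments. The class and the
// sample fragment list are hypothetical, not the module's real API.
public class SkipUserAgents {

    // A small sample of fragments like those stored at
    // /targeting:targeting/targeting:skipUserAgents.
    private final List<String> skipFragments =
            List.of("Googlebot", "Bingbot", "Yahoo! Slurp", "Xenu Link Sleuth");

    // Returns true when the User-Agent header contains any configured
    // fragment, i.e. the request should not be targeted.
    public boolean shouldSkip(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        return skipFragments.stream().anyMatch(userAgent::contains);
    }
}
```

Because the match is a substring check, a fragment such as "Googlebot" also covers variants like "Googlebot-Image" and "Googlebot-News".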
It pays to inspect the collected request log data to identify additional agents that do not need to be targeted.
The default list of skipped user agents is shown below, largely taken from http://www.useragentstring.com/pages/useragentstring.php:
- ABACHOBot
- Accoona-AI-Agent
- AddSugarSpiderBot
- AnyApexBot
- Arachmo
- B-l-i-t-z-B-O-T
- Baiduspider
- BecomeBot
- BeslistBot
- BillyBobBot
- Bimbot
- Bingbot
- BlitzBOT
- boitho.com-dc
- boitho.com-robot
- btbot
- CatchBot
- Cerberian Drtrs
- Charlotte
- ConveraCrawler
- cosmos
- Covario IDS
- DataparkSearch
- DiamondBot
- Discobot
- Dotbot
- EARTHCOM.info
- EmeraldShield.com WebBot
- envolk[ITS]spider
- EsperanzaBot
- Exabot
- FAST Enterprise Crawler
- FAST-WebCrawler
- FDSE robot
- FindLinks
- FurlBot
- FyberSpider
- g2crawler
- Gaisbot
- GalaxyBot
- genieBot
- Gigabot
- Girafabot
- Googlebot
- Googlebot-Image
- Googlebot-Mobile
- Googlebot-News
- Googlebot-Video
- GurujiBot
- HappyFunBot
- hl_ftien_spider
- Holmes
- htdig
- iaskspider
- ia_archiver
- iCCrawler
- ichiro
- igdeSpyder
- IRLbot
- IssueCrawler
- Jaxified Bot
- Jyxobot
- KoepaBot
- L.webis
- LapozzBot
- Larbin
- LDSpider
- LexxeBot
- Linguee Bot
- LinkWalker
- lmspider
- lwp-trivial
- mabontland
- magpie-crawler
- Mediapartners-Google
- MJ12bot
- MLBot
- Mnogosearch
- mogimogi
- MojeekBot
- Moreoverbot
- Morning Paper
- msnbot
- MSRBot
- MVAClient
- mxbot
- NetResearchServer
- NetSeer Crawler
- NewsGator
- NG-Search
- nicebot
- noxtrumbot
- Nusearch Spider
- NutchCVS
- Nymesis
- obot
- oegp
- omgilibot
- OmniExplorer_Bot
- OOZBOT
- Orbiter
- PageBitesHyperBot
- Peew
- polybot
- Pompos
- PostPost
- Psbot
- PycURL
- Qseero
- Radian6
- RAMPyBot
- RufusBot
- SandCrawler
- SBIder
- ScoutJet
- Scrubby
- SearchSight
- Seekbot
- semanticdiscovery
- Sensis Web Crawler
- SEOChat::Bot
- SeznamBot
- Shim-Crawler
- ShopWiki
- Shoula robot
- silk
- Sitebot
- Snappy
- sogou spider
- Sosospider
- Speedy Spider
- Sqworm
- StackRambler
- suggybot
- SurveyBot
- SynooBot
- Teoma
- TerrawizBot
- TheSuBot
- Thumbnail.CZ robot
- TinEye
- truwoGPS
- TurnitinBot
- TweetedTimes Bot
- TwengaBot
- updated
- Urlfilebot
- Vagabondo
- VoilaBot
- Vortex
- voyager
- VYU2
- webcollage
- Websquash.com
- wf84
- WoFindeIch Robot
- WomlpeFactory
- Xaldon_WebSpider
- yacy
- Yahoo! Slurp
- Yahoo! Slurp China
- YahooSeeker
- YahooSeeker-Testing
- YandexBot
- YandexImages
- YandexMetrika
- Yasaklibot
- Yeti
- YodaoBot
- yoogliFetchAgent
- YoudaoBot
- Zao
- Zealbot
- zspider
- ZyBorg
- AbiLogicBot
- Link Valet
- Link Validity Check
- LinkExaminer
- LinksManager.com_bot
- Mojoo Robot
- Notifixious
- online link validator
- Ploetz + Zeller
- Reciprocal Link System PRO
- REL Link Checker Lite
- SiteBar
- Vivante Link Checker
- WatchMouse
- W3C-checklink
- Xenu Link Sleuth