Appendix H: List of internet robots, crawlers and spiders

Note: The main Code of Practice document takes precedence in the case of any conflicts between it and this appendix.

The growing use of internet robots, crawlers and spiders has the potential to artificially inflate usage statistics. Only genuine, user-driven usage should be reported in COUNTER usage reports. Usage of full-text articles that is initiated by automatic or semi-automatic bulk download tools, such as RightFind or PubHive, should only be recorded when the user has clicked on the downloaded full-text article in order to open it.

Activity generated by internet robots, crawlers and spiders must be excluded from all COUNTER usage reports.

This list of internet robots, crawlers and spiders was published in April 2016 and last updated in March 2023. Please note that it has been rationalised to remove redundant entries (e.g. entries containing the text ‘bot’, such as msnbot, awbot, bbot and turnitinbot, are now collapsed into a single entry ‘bot’).
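
The rationalisation described above can be illustrated with a minimal sketch. The entries below are a hypothetical sample, not the published list, and the helper name redundant_entries is an assumption introduced for illustration. The sketch also assumes that each entry is a regular expression applied case-insensitively, and it treats the text of each candidate entry as a literal string to be tested against the other entries.

```python
import re

# Hypothetical sample entries for illustration only; the published list
# is the authoritative source.
entries = ["bot", "msnbot", "awbot", "bbot", "turnitinbot", "crawler"]


def redundant_entries(patterns):
    """Return entries whose text is already matched by another entry.

    Assumption: each entry is a regular expression applied
    case-insensitively; the candidate's text is treated as a literal string.
    """
    redundant = []
    for candidate in patterns:
        for other in patterns:
            if other != candidate and re.search(other, candidate, re.IGNORECASE):
                redundant.append(candidate)
                break  # one covering entry is enough
    return redundant


print(redundant_entries(entries))
```

In this sample, every entry that contains the text ‘bot’ is reported as redundant because the single entry ‘bot’ already matches it, while ‘crawler’ is kept.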

The list is displayed below and is also available on the COUNTER Robots repository.

The repository page always shows the readme, which gives potential users of and contributors to the list more information on how to integrate it.

For further information on regular expression matching, see Regular expressions.
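
As a rough sketch of how such a list might be applied, the example below excludes records whose user agent matches any pattern before usage is counted. The pattern sample, the record layout, the user agent strings and the function names (compile_robot_matcher, exclude_robots) are all hypothetical, and case-insensitive matching is an assumption. The readme in the repository remains the authoritative guide to integration.

```python
import re

# Hypothetical sample patterns; load the published list in practice.
SAMPLE_PATTERNS = ["bot", "crawler", "spider"]


def compile_robot_matcher(patterns):
    """Combine the patterns into one case-insensitive regular expression."""
    combined = "|".join(f"(?:{p})" for p in patterns)
    return re.compile(combined, re.IGNORECASE)


def exclude_robots(records, matcher):
    """Keep only records whose user agent matches none of the patterns."""
    return [r for r in records if not matcher.search(r["user_agent"])]


# Hypothetical usage records with made-up user agent strings.
records = [
    {"item": "article-1", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"item": "article-1", "user_agent": "Mozilla/5.0 (compatible; ExampleBot/1.0)"},
    {"item": "article-2", "user_agent": "example-spider/0.9"},
]

matcher = compile_robot_matcher(SAMPLE_PATTERNS)
genuine = exclude_robots(records, matcher)
print(len(genuine), "of", len(records), "records retained")
```

Compiling the patterns once into a single expression avoids re-parsing each pattern for every record, which matters when processing large log files.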

Please let us know of any user agents that should be included in this list, or if you would like to suggest other amendments.