Information collected on and after July 11, 2024 will be used in accordance with the purpose of collection stated below. Details on how information collected before that date will be handled are provided separately.

About ICC-Crawler

ICC-Crawler is a web crawler: a program that automatically traverses the Internet and collects web pages. ICC-Crawler is operated by the Universal Communication Research Institute at the National Institute of Information and Communications Technology (NICT).

We take great care to ensure that ICC-Crawler does not cause problems for target hosts during collection. In the unlikely event that ICC-Crawler does cause a problem, please contact us using the contact information below. We will immediately stop collecting from the target host.

Collection policy

  • We do not place an excessive load on target hosts during collection.
    To avoid making excessive connections, we monitor the number of connections per unit of time during collection. To the extent possible, we also adjust connection schedules so that a single target host is not overloaded even if it uses multiple IP addresses or hostnames.
  • We comply with any instructions given in robots.txt files.
    ICC-Crawler scans the robots.txt file published by the target host, and complies with access restrictions set on the target host.

    If Crawl-Delay is set in the robots.txt file, we access the host at the larger of two intervals: the Crawl-Delay value or the minimum access interval configured for the crawler.
  • If we receive a request not to access a host, we will no longer access that host.
    If we receive a request not to access a host or IP address, we will ensure it is not accessed.
  • We comply with our purpose of use.
    Any information collected will be used within the scope of the purpose of use NICT has defined.
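
The Crawl-Delay rule in the policy above amounts to taking the larger of two values. The following is a minimal Python sketch of that rule, not ICC-Crawler's actual code. The function name effective_interval and the 5-second minimum are assumptions made for illustration; this page does not state the crawler's real minimum interval. The robots.txt parsing uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical value: the crawler's real minimum interval is not published here.
CRAWLER_MIN_INTERVAL = 5.0  # seconds


def effective_interval(robots_txt, user_agent="ICC-Crawler",
                       minimum=CRAWLER_MIN_INTERVAL):
    """Return the number of seconds to wait between requests to one host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    if delay is None:
        return minimum
    # Use whichever interval is larger, as the policy describes.
    return max(float(delay), minimum)


if __name__ == "__main__":
    robots = "User-agent: ICC-Crawler\nCrawl-delay: 10\n"
    print(effective_interval(robots))
```

Under these assumptions, a host that sets a Crawl-Delay shorter than the crawler's minimum is still accessed no more often than the minimum allows.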

Purpose of page collection

NICT uses the information it collects for research and development of advanced information processing technologies including multilingual translation, information analysis, and artificial intelligence technologies. NICT also uses this information for other related activities.

NICT may also provide collected information and/or the products of its research or joint research conducted using collected information to third parties (including private companies and public institutions; the same hereafter), to the extent permitted by law, for joint research with third parties, research and development by third parties, or the use of its research products by third parties. Third parties who are provided with information or research products will use these for their own project purposes.

Refusing collection

1. Use robots.txt
ICC-Crawler supports the REP (Robots Exclusion Protocol). For details on the REP, see RFC 9309.
Adding the following rule to robots.txt disables collection from all pages.
User-agent: ICC-Crawler
Disallow: /
Adding the following rules to robots.txt disables collection from a designated path and everything below it (here, /contact/), as well as from files of a given type (here, .jpg files).
User-agent: ICC-Crawler
Disallow: /contact/
Disallow: /*.jpg
Collection can also be disabled for all pages and then re-enabled for a designated path and everything below it (here, /product/), as well as for files of a given type within a specified path (here, .html files under /service/).
User-agent: ICC-Crawler
Disallow: /
Allow: /product/
Allow: /service/*.html
2. If access continues even after adding the setting described in 1
Please contact us if access by ICC-Crawler continues even after adding the setting described in 1. We will take measures to stop collecting from the host.
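
To check how the Allow and Disallow rules shown under option 1 combine, the matching behaviour described in RFC 9309 can be sketched in Python: "*" matches any sequence of characters, a trailing "$" anchors the end of the path, the longest matching pattern wins, and Allow wins a tie. This is an illustration of the RFC's rules only; how ICC-Crawler implements them internally is not described on this page. The function names are hypothetical, and the sketch ignores percent-encoding.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any sequence of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs for one user-agent group."""
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


if __name__ == "__main__":
    rules = [("Disallow", "/"), ("Allow", "/product/"), ("Allow", "/service/*.html")]
    for p in ("/product/list", "/service/a/b.html", "/service/b.pdf", "/about"):
        print(p, is_allowed(rules, p))
```

With the third rule set from option 1, the paths under /product/ and the .html files under /service/ are allowed, while everything else falls back to the shorter "Disallow: /" rule.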

Crawler IP addresses

202.180.34.186
61.86.246.72
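
Site operators who want to identify ICC-Crawler requests in their access logs could compare the client address against the two addresses listed above. The following Python sketch shows one way to do that; the function name is_icc_crawler_ip is hypothetical, and the address list is copied from this page and would need updating if the page changes.

```python
import ipaddress

# Addresses as published on this page.
ICC_CRAWLER_IPS = frozenset(
    ipaddress.ip_address(ip) for ip in ("202.180.34.186", "61.86.246.72")
)


def is_icc_crawler_ip(remote_addr):
    """Return True if remote_addr is one of the published crawler addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in ICC_CRAWLER_IPS
    except ValueError:
        return False  # not a valid IP address string
```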

Contact information

Common Infrastructure Group, General Planning Office,
Universal Communication Research Institute, National Institute of Information and Communications Technology
Phone: +81-774-98-6300, FAX: +81-774-98-6955