ICC-Crawler Introduction

ICC-Crawler is operated by the Universal Communication Research Institute (UCRI) at NICT. The crawler was developed to collect Web pages for research on Web search and data mining, and we are also planning to use it to crawl weblogs. It is used only by members of UCRI at NICT, and only for research purposes. Our crawling policy follows the generally accepted crawling norms. We understand the concerns of webmasters and would like to assure you that the crawler collects pages solely for research, not for any business use. Please see our crawling policy below for details. We sincerely appreciate your cooperation and support.

Policy

Our crawler always respects the common crawling norms, as follows:

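  • It always reads "robots.txt" and never crawls restricted pages. For example, the following rules keep other crawlers out of /cgi-bin and keep ICC-Crawler out of the entire site: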

User-agent: *
Disallow: /cgi-bin

User-agent: ICC-Crawler
Disallow: /

  • If a Crawl-Delay is given in /robots.txt, our crawler waits for that interval between successive connections. Otherwise, the access rate is controlled so that the crawler does not place excessive load on the servers it accesses.
  • If you do not want your pages to be crawled at all, please contact us, and we will make sure that your request is respected from then on.
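The robots.txt handling described above can be tried out with Python's standard urllib.robotparser module. This is only a minimal sketch of how such rules are commonly interpreted, not the crawler's actual implementation. The example.com URLs, the "OtherBot" agent name, the 10-second Crawl-delay, and the 1-second default pause are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

AGENT = "ICC-Crawler"

# The sample rules shown in the policy list above, verbatim.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin

User-agent: ICC-Crawler
Disallow: /
"""

# Hypothetical variant: the 10-second Crawl-delay is an assumed value.
WITH_DELAY = """\
User-agent: ICC-Crawler
Crawl-delay: 10
Disallow: /cgi-bin
"""


def load_rules(text):
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def seconds_between_requests(parser, agent, default=1.0):
    """Use the site's Crawl-delay if given, else an assumed default pause."""
    delay = parser.crawl_delay(agent)
    return float(delay) if delay is not None else default


sample = load_rules(SAMPLE)
delayed = load_rules(WITH_DELAY)

# Under SAMPLE, ICC-Crawler is barred from the whole site, while other
# agents are barred only from /cgi-bin.
print(sample.can_fetch(AGENT, "http://example.com/index.html"))
print(sample.can_fetch("OtherBot", "http://example.com/index.html"))
print(sample.can_fetch("OtherBot", "http://example.com/cgi-bin/form"))

# Pause to observe between requests, in seconds.
print(seconds_between_requests(delayed, AGENT))
print(seconds_between_requests(sample, AGENT))
```

A webmaster can use the same calls to check a draft robots.txt before publishing it.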

Information on Current Crawling

The IP addresses of the machines currently crawling are:
202.180.34.186
61.86.246.72
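
A webmaster who wants to recognize requests from these machines, for instance when reading an access log, could compare the client address against the list. The sketch below uses Python's standard ipaddress module. The function name is hypothetical, and because the addresses are described as the current ones, they may change.

```python
import ipaddress

# Addresses copied from the list above; they may change over time.
CRAWLER_IPS = frozenset(
    ipaddress.ip_address(a) for a in ("202.180.34.186", "61.86.246.72")
)


def is_listed_crawler_ip(remote_addr):
    """Return True if remote_addr is one of the listed crawler machines."""
    try:
        return ipaddress.ip_address(remote_addr.strip()) in CRAWLER_IPS
    except ValueError:
        # Not a syntactically valid IP address.
        return False
```

An address match shows only where a request came from, so checking the User-agent header as well gives a fuller picture.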

Goal

We would like to clarify again that our crawler collects pages solely for research purposes. We are interested in crawling a large volume of pages for the following ongoing research in our group:

  • Construction of a Web archive
  • Collection of research data for high-level information processing technologies such as multilingual translation and information analysis.

Contact

Strategic Information Gathering Common Infrastructure Team
Planning Office
Universal Communication Research Institute
National Institute of Information and Communications Technology
Phone: +81-774-98-6300 Fax: +81-774-98-6955