Library harvest costs websites dear

Last updated 23:46 26/10/2008

Website owners are complaining that the National Library's attempt to "harvest" more than 100 million New Zealand webpages, some of which will be preserved in its new $24 million digital archive, has slowed access to websites and cost them money.

The archive, which opens on Thursday, will let people access historically significant webpages, electronic files and scanned documents from the library's website.

The National Library contracted a United States company to capture a "snapshot" of the Internet in New Zealand.

The director of web hosting company Netspace Services, Gerard Creamer, says the use of an overseas harvester resulted in costly spikes in his customers' international traffic.

One customer is likely to cop a $2500 bill for the month, and the harvesting has caused congestion on servers as they struggled to respond, he says. "We've had to call in people in the middle of the night to look at our machines."

He says the library's decision to ignore the robots.txt protocol increased the amount of international traffic websites would have to pay for. Website owners can use the protocol to instruct search engines and web harvesters not to index certain files, such as large data files.
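As a rough illustration of how the protocol works, the sketch below uses Python's standard urllib.robotparser module to check whether a crawler may fetch a given address. The robots.txt rules, the "example-harvester" user agent and the example.co.nz addresses are all hypothetical and are not taken from any site or harvester involved in the harvest.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /data/,
# where a site might keep large data files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical crawler name and addresses, for illustration only.
    for url in ("http://example.co.nz/index.html",
                "http://example.co.nz/data/big-file.zip"):
        print(url, allowed(ROBOTS_TXT, "example-harvester", url))
```

A harvester that observes the protocol would skip any address for which this check returns False. One that ignores the protocol, as the library's harvester did by default, fetches everything regardless.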

Message boards run by the New Zealand Network Operators Group ran hot last week as members expressed their dissatisfaction. The library posted information about the harvest on its website but did not notify website owners of its plans.

The National Library says it has a legal mandate to preserve New Zealand's social and cultural history, including websites, blogs and YouTube videos. Its digital archive will preserve webpages so they can still be accessed even if the original technology used to create them becomes obsolete. The Government granted the library $24 million in 2004 to develop the archive and $5.3 million to cover annual costs.

Heavy traffic on servers was a potential side-effect of harvesting but the library said it had only received about 10 complaints. It did not notify website owners of the harvest because "it could not see a good way to do so without effectively becoming spammers", but conceded it could have communicated better.

The National Library says it is using the US harvester because it is the most experienced. "We hope that after observing the experts at work we'll be able to manage future harvests from within New Zealand."

The library ignored the robots.txt protocol because it wanted to "harvest as full a snapshot of the '.nz' domain as possible", but website owners could request the harvester observe the protocol.

- © Fairfax NZ News
