The IWF Metadata Harvester Website

Author: Laurence D. Finston.

The following copyright notice applies to the text and source code of this web site, and any graphics that may appear on it. The software described in this text has its own copyright notice and license, which can be found in the distribution itself.

Copyright (C) 2006, 2007 IWF Wissen und Medien gGmbH

Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 2 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of this license is included in the file COPYING.TXT.

Last updated: February 12, 2007


Table of Contents

Top
An Important Announcement
Introduction
Supported Platforms and Portability
Late Breaking News
Old News
Distribution
Documentation
OAI (Open Archives Initiative)
dc_test (Database)
ATest (C++ Program)
Z39.50 (Pica)
PICA_DB (Database)
ZTest (C++ Program)
Scantest (C++ Program)
LO_DB (Database)
Mailing List
Links
Contact

Back to top

An Important Announcement

2007.02.12.
The author will no longer be maintaining this website, i.e., the IWF Metadata Harvester website.
Please see the LDF Metadata Exchange Utilities website instead.


The IWF Metadata Harvester was developed by the author while participating in a program sponsored by the German government at the IWF Wissen und Medien gGmbH, Göttingen, Germany. Since this program ended on January 31, 2007, he is developing a new package based on the IWF Metadata Harvester under the name LDF Metadata Exchange Utilities.

The author would like to express his appreciation to the IWF Wissen und Medien gGmbH for permitting him to publish his work there under the GNU General Public License and the GNU Free Documentation License.


Back to contents
Back to top

Introduction

2007.02.12.
Please see An Important Announcement.

The IWF Metadata Harvester is a package for retrieving metadata from servers, writing it to databases, and representing it in human-readable form. It currently retrieves data from servers using two different sets of standards: The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and Z39.50.
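As an illustration of the OAI-PMH side, a harvesting request is an ordinary HTTP request whose query string names a verb such as ListRecords. The following is a minimal sketch of how such request URLs might be assembled in C++. It is not taken from ATest; the function names and the base URL used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Percent-encode every character that is not an RFC 3986 "unreserved"
// character, so that a resumption token can be placed in a query string.
std::string percent_encode(const std::string& s)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// First request of a harvest: ListRecords with a metadata prefix.
// "oai_dc" selects unqualified Dublin Core records.
// (Hypothetical helper, not part of ATest.)
std::string build_list_records_url(const std::string& base_url,
                                   const std::string& metadata_prefix = "oai_dc")
{
    assert(!base_url.empty());
    return base_url + "?verb=ListRecords&metadataPrefix=" + metadata_prefix;
}

// Follow-up request: when the server returns a resumption token, the next
// request carries only the verb and the token.
// (Hypothetical helper, not part of ATest.)
std::string build_resume_url(const std::string& base_url,
                             const std::string& resumption_token)
{
    assert(!base_url.empty());
    return base_url + "?verb=ListRecords&resumptionToken="
           + percent_encode(resumption_token);
}
```

For example, calling build_list_records_url with the placeholder base URL "http://example.org/oai" yields "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc". Sending the request and reading the response would require an HTTP library, which this sketch leaves out.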

OAI servers provide records in the form of XML files in a format based on the Dublin Core standard. Z39.50 servers, on the other hand, can provide records in a variety of formats, the most common of which is USMARC. The IWF Metadata Harvester can currently only process records from Z39.50 servers in the Pica format, which is widespread in the Netherlands and Germany. In the future, I hope to extend it to process records in USMARC and other formats as well.
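To give an idea of what a Dublin Core record from an OAI server contains, the sketch below pulls the text of elements such as dc:title out of an XML string. It is a deliberately naive illustration using plain string searching, assuming well-formed input without entities, CDATA sections, or nested elements of the same name. It is not the parsing method used by ATest, and the function name extract_elements is hypothetical; a real harvester would more likely use an XML parser.

```cpp
#include <string>
#include <vector>

// Return the text content of every <name ...>...</name> element in `xml`.
// Naive by design: no entity decoding, no CDATA, no nesting of `name`.
// (Hypothetical helper, not part of ATest.)
std::vector<std::string> extract_elements(const std::string& xml,
                                          const std::string& name)
{
    std::vector<std::string> values;
    const std::string open = "<" + name;
    const std::string close = "</" + name + ">";
    std::string::size_type pos = 0;

    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::string::size_type after = pos + open.size();
        // The tag name must end here, so "<dc:title" does not match
        // a longer name such as "<dc:titles".
        if (after >= xml.size() ||
            (xml[after] != '>' && xml[after] != ' ' &&
             xml[after] != '\t' && xml[after] != '\n')) {
            pos = after;
            continue;
        }
        const std::string::size_type gt = xml.find('>', after);
        if (gt == std::string::npos) break;
        // Skip self-closing elements such as <dc:title xml:lang="de"/>.
        if (xml[gt - 1] == '/') {
            pos = gt + 1;
            continue;
        }
        const std::string::size_type start = gt + 1;
        const std::string::size_type end = xml.find(close, start);
        if (end == std::string::npos) break;
        values.push_back(xml.substr(start, end - start));
        pos = end + close.size();
    }
    return values;
}
```

Applied to a record containing two dc:title elements, extract_elements(record, "dc:title") returns both titles in document order; an element name that does not occur yields an empty vector.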

The IWF Metadata Harvester consists of three programs written in C++ and three databases:

ATest: C++ program for retrieving data from OAI servers (see ATest)
dc_test: Database for storing the data retrieved by ATest (see dc_test)
ZTest: C++ program for retrieving data in Pica format from Z39.50 servers (see ZTest)
PICA_DB: Database for storing the data retrieved by ZTest (see PICA_DB)
Scantest: C++ program implementing an interpreter for controlling the other C++ programs and accessing the databases. (see Scantest)
LO_DB: A database with a reduced set of tables and fields for use in the “Lectures Online” Project of the IWF Wissen und Medien gGmbH (see LO_DB)

Supported Platforms and Portability

The programs ATest and ZTest were developed using Microsoft Visual C++ in Microsoft Visual Studio under Microsoft Windows, and the databases dc_test, PICA_DB and LO_DB were developed using Microsoft SQL Server. The program Scantest was also originally developed using MS Visual C++, but it is currently being developed using the GNU Compiler Collection (GCC) under Windows. Unlike ATest and ZTest, Scantest uses only Standard C++ and the C++ Standard Template Library.



Late Breaking News

2007.02.12.
Please see An Important Announcement.

2007.01.24.
All variables, functions, and parser rules in the Scantest package are now documented in its Texinfo manual. However, at present, there are few explanations. I will try to add more as soon as possible.

The Scantest Manual in HTML format
The Scantest Manual in PDF format
See Documentation for other formats.



Old News



Distribution

The source code for the IWF Metadata Harvester is available from the CVS repository for this project at the Savannah developers' website. The main web page for the IWF Metadata Harvester at Savannah is here.

Snapshots of the individual sub-packages are available in the form of compressed archive files (gzipped tar files):
ATest:  atestsnp.tar.gz
ZTest:  ztestsnp.tar.gz
Scantest:  sctstsnp.tar.gz


Documentation

There are two user manuals for the IWF Metadata Harvester. The first documents the programs ATest and ZTest and the databases dc_test and PICA_DB. The second documents the program Scantest.

User manual I in HTML format
User manual I in HTML format, compressed (gzipped) for downloading
User manual I in PDF format
User manual I in PDF format, compressed (gzipped) for downloading
User manual I in PostScript format, compressed (gzipped) for downloading
User manual I in DVI format, compressed (gzipped) for downloading
The current complete Texinfo sources for User Manual I are available in the form of a snapshot: txinfsnp.tar.gz.

User manual for Scantest in HTML format for browsing.
User manual for Scantest in PDF format.
User manual for Scantest in HTML format, compressed (gzipped) for downloading.
User manual for Scantest in PDF format, compressed (gzipped) for downloading.
User manual for Scantest in PostScript format, compressed (gzipped) for downloading.
User manual for Scantest in DVI format, compressed (gzipped) for downloading.
User manual for Scantest in Info format, compressed (gzipped) for downloading.
The current complete Texinfo sources for the Scantest manual are contained in the snapshot of the complete sub-package: sctstsnp.tar.gz. They are in the subdirectory Scantest-1.0/DOC/TEXINFO/.


OAI (Open Archives Initiative)



dc_test (Database)



ATest (C++ Program)



Z39.50 (Pica)



PICA_DB (Database)



ZTest (C++ Program)



Scantest (C++ Program)



LO_DB (Database)



Mailing List

The iwf-mdh-help mailing list is available for users to ask questions and get help. The address is iwf-mdh-help-*-AT-*-nongnu.org (replace -*-AT-*- with @ for the email address). Please note that you must subscribe to the list in order to post to it; this prevents spam from being sent to the list. In addition, all postings from non-subscribers are discarded without notification, so that innocent parties whose addresses may be being used by spammers will not receive erroneous rejection notifications. You can subscribe to the iwf-mdh-help mailing list here.

Links



Contact

If you want to contact me about the IWF Metadata Harvester, please put “IWF Metadata Harvester” or something similar in the subject line of your email. Otherwise, it's likely to be filtered.

If you want to encrypt an email to me, you can use my public key.
Fingerprint: 0007 566D B0E0 96AE 3F4A EBE5 6213 D0F0 7376 08BA

Laurence Finston
Kreuzbergring 41
D-37075 Göttingen
Germany
email: lfinsto1-*-AT-*-gwdg.de
s246794-*-AT-*-stud.uni-goettingen.de
Please use only one address at a time! (Replace -*-AT-*- with @ for the email addresses.)

IWF Wissen und Medien gGmbH
Nonnenstieg 72
37075 Göttingen
Germany

