ckanext.oaipmh package

Submodules

ckanext.oaipmh.cmdi module

class ckanext.oaipmh.cmdi.CMDIHarvester(**kwargs)[source]

Bases: ckanext.oaipmh.harvester.OAIPMHHarvester

client = None
gather_stage(harvest_job)[source]

See OAIPMHHarvester.gather_stage()

info()[source]

See ckanext.harvest.harvesters.base.HarvesterBase.info().

md_format = 'cmdi0571'
on_deleted(harvest_object, header)[source]

See OAIPMHHarvester.on_deleted(). Marks the package for deletion.

parse_xml(f, context, orig_url=None, strict=True)[source]

ckanext.oaipmh.cmdi_reader module

class ckanext.oaipmh.cmdi_reader.CmdiReader(provider=None)[source]

Bases: object

Reader for CMDI XML data

namespaces = {'cmd': 'http://www.clarin.eu/cmd/', 'oai': 'http://www.openarchives.org/OAI/2.0/'}
read(xml)[source]

Extract package data from given XML.

Parameters:xml – xml element (lxml)
Returns:oaipmh.common.Metadata object generated from xml

read_data(xml)[source]

Extract package data from given XML.

Parameters:xml – xml element (lxml)
Returns:dictionary
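As a rough illustration of how a prefix-to-URI mapping like namespaces can be used, the sketch below looks up one element in a CMDI-like record with prefixed paths. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. The sample record, the choice of the cmd:MdSelfLink element and the function name extract_self_link are assumptions made for illustration; this is not CmdiReader's actual logic.

```python
import xml.etree.ElementTree as ET

# Same prefix-to-URI mapping as CmdiReader.namespaces.
NAMESPACES = {
    'cmd': 'http://www.clarin.eu/cmd/',
    'oai': 'http://www.openarchives.org/OAI/2.0/',
}

# Hypothetical, heavily trimmed record used only for this example.
SAMPLE = """
<oai:record xmlns:oai="http://www.openarchives.org/OAI/2.0/"
            xmlns:cmd="http://www.clarin.eu/cmd/">
  <oai:metadata>
    <cmd:CMD>
      <cmd:Header>
        <cmd:MdSelfLink>urn:example:record-1</cmd:MdSelfLink>
      </cmd:Header>
    </cmd:CMD>
  </oai:metadata>
</oai:record>
"""


def extract_self_link(xml_text):
    """Return the text of cmd:MdSelfLink, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    # The prefixes in the path are resolved through the NAMESPACES mapping.
    node = root.find('oai:metadata/cmd:CMD/cmd:Header/cmd:MdSelfLink', NAMESPACES)
    return node.text if node is not None else None


self_link = extract_self_link(SAMPLE)
```

The same mapping can be passed to lxml's xpath() through its namespaces keyword argument.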

exception ckanext.oaipmh.cmdi_reader.CmdiReaderException[source]

Bases: exceptions.Exception

Raised by the reader on unexpected data or errors.

ckanext.oaipmh.controller module

Serving controller interface for OAI-PMH

class ckanext.oaipmh.controller.OAIPMHController[source]

Bases: ckan.lib.base.BaseController

Controller for OAI-PMH server implementation. Returns only the index page if no verb is specified.

index()[source]

Return the result of the handled request of a batching OAI-PMH server implementation.

ckanext.oaipmh.harvester module

class ckanext.oaipmh.harvester.OAIPMHHarvester(**kwargs)[source]

Bases: ckanext.harvest.harvesters.base.HarvesterBase

OAI-PMH Harvester

fetch_stage(harvest_object)[source]

The fetch stage will receive a HarvestObject object and will be responsible for:

  • getting the contents of the remote object (e.g. for a CSW server, perform a GetRecordById request).
  • saving the content in the provided HarvestObject.
  • creating and storing any suitable HarvestObjectErrors that may occur.
  • returning True if everything went as expected, False otherwise.

Parameters:harvest_object – HarvestObject object
Returns:True if everything went right, False if errors were found
fetch_xml(url, context)[source]

Get XML for import. Shortened from fetch_stage().

Parameters:url (string) – the URL of the metadata file
Returns:an XML file
Return type:string

gather_stage(harvest_job)[source]

The gather stage will receive a HarvestJob object and will be responsible for:

  • gathering all the necessary objects to fetch at a later stage (e.g. for a CSW server, perform a GetRecords request).
  • creating the necessary HarvestObjects in the database, specifying the guid and a reference to its job. The HarvestObjects need a reference date with the last modified date for the resource; this may need to be set in a different stage depending on the type of source.
  • creating and storing any suitable HarvestGatherErrors that may occur.
  • returning a list with all the ids of the created HarvestObjects.

Parameters:harvest_job (HarvestJob) – HarvestJob object
Returns:A list of HarvestObject ids
get_package_ids(set_ids, config, last_time, client)[source]

Get package identifiers from given set identifiers.

import_stage(harvest_object)[source]

The import stage will receive a HarvestObject object and will be responsible for:

  • performing any necessary action with the fetched object (e.g. create a CKAN package). Note: if this stage creates or updates a package, a reference to the package should be added to the HarvestObject.
  • creating the HarvestObject - Package relation (if necessary).
  • creating and storing any suitable HarvestObjectErrors that may occur.
  • returning True if everything went as expected, False otherwise.

Parameters:harvest_object – HarvestObject object
Returns:True if everything went right, False if errors were found
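The return-value contract of the gather, fetch and import stages can be sketched with plain Python stand-ins. Everything in the block below is hypothetical: FakeHarvestObject and SketchHarvester are not ckanext-harvest classes, the remote server is replaced by a dictionary, and no database is involved. The sketch only shows that gathering yields a list of ids, and that fetching and importing each report success as True or False while recording errors on the object.

```python
class FakeHarvestObject(object):
    """Hypothetical stand-in for a HarvestObject; not the ckanext-harvest model."""

    def __init__(self, guid):
        self.guid = guid
        self.content = None
        self.package_id = None
        self.errors = []


class SketchHarvester(object):
    """Illustrates the return values of the three stages only."""

    def __init__(self, remote):
        # `remote` maps record identifiers to raw content, standing in for a server.
        self.remote = remote
        self.objects = {}

    def gather_stage(self, harvest_job):
        # Create one object per remote identifier and return the object ids.
        for guid in sorted(self.remote):
            self.objects[guid] = FakeHarvestObject(guid)
        return sorted(self.objects)

    def fetch_stage(self, harvest_object):
        # Store the remote content on the object; record an error on failure.
        content = self.remote.get(harvest_object.guid)
        if content is None:
            harvest_object.errors.append('no content for %s' % harvest_object.guid)
            return False
        harvest_object.content = content
        return True

    def import_stage(self, harvest_object):
        # Turn fetched content into a "package" and link it to the object.
        if not harvest_object.content:
            harvest_object.errors.append('nothing to import')
            return False
        harvest_object.package_id = 'pkg-' + harvest_object.guid
        return True


harvester = SketchHarvester({'a': '<xml/>', 'b': None})
ids = harvester.gather_stage(harvest_job=None)
results = {}
for object_id in ids:
    obj = harvester.objects[object_id]
    # Import is attempted only when the fetch succeeded.
    results[object_id] = harvester.fetch_stage(obj) and harvester.import_stage(obj)
```

In this sketch the record without content fails at the fetch stage and is never imported, and the failure is recorded on the object rather than raised.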
info()[source]

Harvesting implementations must provide this method, which will return a dictionary containing different descriptors of the harvester. The returned dictionary should contain:

  • name: machine-readable name. This will be the value stored in the database, and the one used by ckanext-harvest to call the appropriate harvester.
  • title: human-readable name. This will appear in the form’s select box in the WUI.
  • description: a small description of what the harvester does. This will appear on the form as a guidance to the user.

A complete example may be:

{
    'name': 'csw',
    'title': 'CSW Server',
    'description': "A server that implements OGC's Catalog Service "
                   "for the Web (CSW) standard"
}
Returns:A dictionary with the harvester descriptors
md_format = 'oai_dc'
metadata_registry(config, harvest_job)[source]
on_deleted(harvest_object, header)[source]

Called when metadata is deleted from server. Return False if dataset is ignored.

parse_xml(f, context, orig_url=None, strict=True)[source]

Parse XML and return package data dictionary.

Parameters:
  • f – data as string
  • context – CKAN context
  • orig_url – original URL
  • strict – Not used here, required by caller
Returns:

package dictionary (used for package creation)

populate_harvest_job(harvest_job, set_ids, config, client)[source]
validate_config(config)[source]

[optional]

Harvesters can provide this method to validate the configuration entered in the form. It should return a single string, which will be stored in the database. Exceptions raised will be shown in the form’s error messages.

Parameters:config – Config string coming from the form
Returns:A string with the validated configuration options
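A minimal sketch of such a validator follows, assuming the form submits its configuration as a JSON string. The 'set' and 'limit' keys and the rule that 'set' must be a list of strings are hypothetical and are used only to show the shape of the method: accept a string, raise an exception on bad input, and return the string that should be stored.

```python
import json


def validate_config(config):
    """Validate a JSON configuration string and return it in normalised form.

    Assumptions of this sketch: the configuration is JSON, and an optional
    'set' key (hypothetical) must hold a list of strings.
    """
    if not config:
        return config
    try:
        options = json.loads(config)
    except ValueError as error:
        # json.JSONDecodeError is a subclass of ValueError.
        raise ValueError('Configuration is not valid JSON: %s' % error)
    if not isinstance(options, dict):
        raise ValueError('Configuration must be a JSON object')
    sets = options.get('set', [])
    if not isinstance(sets, list) or not all(isinstance(s, str) for s in sets):
        raise ValueError("'set' must be a list of strings")
    # Return a single string, as the harvester interface expects.
    return json.dumps(options, sort_keys=True)


validated = validate_config('{"set": ["a", "b"], "limit": 10}')
```

Raising an exception rather than returning an error value matches the description above, where raised exceptions are shown in the form's error messages.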

ckanext.oaipmh.ida module

class ckanext.oaipmh.ida.IdaHarvester(**kwargs)[source]

Bases: ckanext.oaipmh.harvester.OAIPMHHarvester

OAI-PMH Harvester

gather_stage(harvest_job)[source]

See OAIPMHHarvester.gather_stage()

info()[source]

See ckanext.harvest.harvesters.base.HarvesterBase.info().

md_format = 'oai_dc'
parse_xml(f, context, orig_url=None, strict=True)[source]

ckanext.oaipmh.importcore module

ckanext.oaipmh.importcore.dummy_metadata_reader(xml_element)[source]

A test metadata reader that always returns the same metadata

Parameters:xml_element (any) – XML input
Returns:metadata dictionary
Return type:oaipmh.common.Metadata instance
ckanext.oaipmh.importcore.generic_rdf_metadata_reader(xml_element)[source]

Transform RDF/XML documents into metadata dictionaries

This function takes an RDF document in XML format, transforms it into an RDF graph, and traverses that graph to find all nodes in the graph and give them namepaths.

Parameters:xml_element (lxml.etree.Element instance) – RDF/XML document
Returns:metadata dictionary
Return type:oaipmh.common.Metadata instance
ckanext.oaipmh.importcore.generic_xml_metadata_reader(xml_element)[source]

Transform XML documents into metadata dictionaries

Parameters:xml_element (lxml.etree.Element) – XML document
Returns:metadata dictionary with all the content of xml_element
Return type:oaipmh.common.Metadata
ckanext.oaipmh.importcore.is_reverse_relation(rel1, rel2)[source]

Tells whether two elements are mutual reverses

Parameters:
  • rel1 (string) – name of relation
  • rel2 (string) – name of relation
Returns:

whether rel1 and rel2 are reverse relations

Return type:

boolean

ckanext.oaipmh.importcore.namepath_for_element(prefix, name, indices, md)[source]

Helper function to form name paths

This function takes a prefix and name and concatenates them into a “name path”. As a side effect, it also counts the elements with the same name path and gives them unique indices, and marks the count of such elements in the metadata dictionary.

Parameters:
  • prefix (string) – the namepath of the parent element
  • name (string) – the name of the current element
  • indices (a hash from strings to integers (inout)) – a hash to keep counts
  • md (a hash from strings to any type (inout)) – a dictionary of metadata keys (namepaths) and values
Returns:

a new namepath with name appended to prefix

Return type:

string
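The behaviour described above can be sketched as follows. This is not the module's implementation: the '.' separator, zero-based numbering and the '<path>.count' key are assumptions chosen for the example, and the function is named sketch_namepath_for_element to keep it distinct from the real one.

```python
def sketch_namepath_for_element(prefix, name, indices, md):
    """Join prefix and name, numbering repeated paths.

    Assumptions of this sketch: '.' is the separator, the first occurrence
    is numbered 0, and the running count is stored under '<path>.count'.
    """
    base = '%s.%s' % (prefix, name) if prefix else name
    index = indices.get(base, 0)
    indices[base] = index + 1        # `indices` is updated in place
    md[base + '.count'] = index + 1  # `md` is updated in place
    return '%s.%d' % (base, index)


indices, md = {}, {}
first = sketch_namepath_for_element('metadata', 'creator', indices, md)
second = sketch_namepath_for_element('metadata', 'creator', indices, md)
```

Both dictionaries are mutated by the call, which is what the "(inout)" annotation on the parameters refers to.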

ckanext.oaipmh.importcore.namespaced_name(name, namespaces)[source]

Substitutes a namespace prefix in a URL with its short form.

Parameters:
  • name (string) – the URL
  • namespaces (list of (string, string)) – a list of (short prefix, long prefix) pairs
Returns:

the URL, with a short prefix

Return type:

string
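A small sketch of this kind of substitution is shown below. The ':' separator between the short prefix and the rest of the name, and returning the name unchanged when no prefix matches, are assumptions of the example rather than documented behaviour.

```python
def sketch_namespaced_name(name, namespaces):
    """Replace a known long prefix of `name` with its short form.

    `namespaces` is a list of (short prefix, long prefix) pairs. The ':'
    separator and the unchanged fallback are assumptions of this sketch.
    """
    for short, long_prefix in namespaces:
        if name.startswith(long_prefix):
            return '%s:%s' % (short, name[len(long_prefix):])
    return name


NAMESPACES = [
    ('dc', 'http://purl.org/dc/elements/1.1/'),
    ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'),
]
short_name = sketch_namespaced_name('http://purl.org/dc/elements/1.1/title', NAMESPACES)
unknown = sketch_namespaced_name('http://example.org/other', NAMESPACES)
```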

ckanext.oaipmh.importformats module

ckanext.oaipmh.importformats.ExceptReturn(exception, returns)[source]
ckanext.oaipmh.importformats.copy_element(source, dest, md, callback=None)[source]

Copy element in metadata dictionary from one key to another

This function changes the metadata dictionary, md, by copying the value corresponding to key source to the value corresponding to the key dest. It also copies all elements if it is an indexed element, and language information that pertains to the copied element. The parameter callback, if given, is called with any element names formed (indexed or not).

Parameters:
  • source (string) – key to be copied
  • dest (string) – key to copy to
  • md (hash from string to any value (inout)) – a metadata dictionary to update
  • callback (function of (string, string) -> None) – optional callback function, called with source, dest and their indexed versions
ckanext.oaipmh.importformats.create_metadata_registry(harvest_type=None, service_url=None)[source]

Return new metadata registry with all common metadata readers

The readers currently implemented are for metadataPrefixes oai_dc, nrd, rdf and xml.

Returns:metadata registry instance
Return type:oaipmh.metadata.MetadataRegistry
ckanext.oaipmh.importformats.nrd_metadata_reader(xml)[source]

Read metadata in NRD schema

This function takes NRD metadata as an lxml.etree.Element object, and returns the same metadata as a dictionary, with central TTA elements picked to format-independent keys.

Parameters:xml (lxml.etree.Element instance) – RDF metadata as XML-encoded NRD
Returns:a metadata dictionary
Return type:a hash from string to any value
ckanext.oaipmh.importformats.person_attrs(source, dest, result)[source]

Callback for copying person attributes

ckanext.oaipmh.oai_dc_reader module

class ckanext.oaipmh.oai_dc_reader.DcMetadataReader(xml)[source]
read()[source]

Parse metadata and return metadata (oaipmh.common.Metadata) with unified dictionary.

class ckanext.oaipmh.oai_dc_reader.DefaultDcMetadataReader(xml)[source]

Bases: ckanext.oaipmh.oai_dc_reader.DcMetadataReader

class ckanext.oaipmh.oai_dc_reader.IdaDcMetadataReader(xml)[source]

Bases: ckanext.oaipmh.oai_dc_reader.DcMetadataReader

ckanext.oaipmh.oai_dc_reader.dc_metadata_reader(harvest_type)[source]

Get correct reader for given harvest_type. Currently supports ‘ida’ or ‘default’.

ckanext.oaipmh.oaipmh_server module

OAI-PMH implementation for CKAN datasets and groups.

class ckanext.oaipmh.oaipmh_server.CKANServer[source]

Bases: oaipmh.common.ResumptionOAIPMH

An OAI-PMH implementation class for CKAN.

getRecord(metadataPrefix, identifier)[source]

Simple getRecord for a dataset.

identify()[source]

Return identification information for this server.

listIdentifiers(metadataPrefix, set=None, cursor=None, from_=None, until=None, batch_size=None)[source]

List all identifiers for this repository.

listMetadataFormats()[source]

List available metadata formats.

listRecords(metadataPrefix, set=None, cursor=None, from_=None, until=None, batch_size=None)[source]

Show a selection of records, basically lists all datasets.

listSets(cursor=None, batch_size=None)[source]

List all sets in this repository, where sets are groups.
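The methods above correspond to the OAI-PMH verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, GetRecord), which clients send as query parameters. The sketch below builds such request URLs with the standard library. The base URL is hypothetical, since the path under which this server is mounted is not stated here, and oai_request_url is a helper invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real path depends on how the controller is mapped.
BASE_URL = 'http://localhost:5000/oai'


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for `verb`, skipping arguments set to None."""
    params = [('verb', verb)]
    for key, value in sorted(arguments.items()):
        if value is not None:
            # 'from' is a Python keyword, so it is passed as from_ and the
            # trailing underscore is removed here.
            params.append((key.rstrip('_'), value))
    return BASE_URL + '?' + urlencode(params)


identify_url = oai_request_url('Identify')
records_url = oai_request_url(
    'ListRecords', metadataPrefix='oai_dc', from_='2014-01-01', set=None)
```

The from_ spelling mirrors the keyword argument of the same name in listIdentifiers() and listRecords().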

ckanext.oaipmh.plugin module

class ckanext.oaipmh.plugin.OAIPMHPlugin(**kwargs)[source]

Bases: ckan.plugins.core.SingletonPlugin, ckan.plugins.interfaces.IRoutes

OAI-PMH plugin. Maps the controller and uses the template configuration stanza so that the template is rendered when no parameters are passed to the interface.

before_map(map)[source]

Map the controller to be used for OAI-PMH.

update_config(config)[source]

This IConfigurer implementation causes CKAN to look in the `public` and `templates` directories present in this package for any customisations.

It also shows how to set the site title here (rather than in the main site .ini file), and causes CKAN to use the customised package form defined in package_form.py in this directory.

ckanext.oaipmh.rdftools module

RDF reader and writer for OAI-PMH harvester and server interface

ckanext.oaipmh.rdftools.nsow(name)[source]
ckanext.oaipmh.rdftools.nsrdf(name)[source]
ckanext.oaipmh.rdftools.rdf_writer(element, metadata)[source]

ckanext.oaipmh.run_import module

ckanext.oaipmh.run_import.test_fetch(url, record_id, fmt)[source]
ckanext.oaipmh.run_import.test_list(url)[source]

Module contents