ckanext.oaipmh package¶
Submodules¶
ckanext.oaipmh.cmdi module¶
ckanext.oaipmh.cmdi_reader module¶
- class ckanext.oaipmh.cmdi_reader.CmdiReader(provider=None)[source]¶
Bases: object
Reader for CMDI XML data
- namespaces = {'cmd': 'http://www.clarin.eu/cmd/', 'oai': 'http://www.openarchives.org/OAI/2.0/'}¶
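A prefix-to-URI mapping like this is typically passed to XPath-style lookups so that queries can use short prefixes. The following is a minimal sketch using only the standard library's ElementTree; the sample record and the `extract_ids` helper are hypothetical and are not part of CmdiReader.

```python
import xml.etree.ElementTree as ET

# Same prefix-to-URI mapping as CmdiReader.namespaces.
NAMESPACES = {
    'cmd': 'http://www.clarin.eu/cmd/',
    'oai': 'http://www.openarchives.org/OAI/2.0/',
}

# Hypothetical, heavily trimmed OAI-PMH record wrapping a CMDI payload.
SAMPLE = """
<oai:record xmlns:oai="http://www.openarchives.org/OAI/2.0/"
            xmlns:cmd="http://www.clarin.eu/cmd/">
  <oai:header>
    <oai:identifier>oai:example.org:record-1</oai:identifier>
  </oai:header>
  <oai:metadata>
    <cmd:CMD>
      <cmd:Header>
        <cmd:MdSelfLink>http://example.org/record-1</cmd:MdSelfLink>
      </cmd:Header>
    </cmd:CMD>
  </oai:metadata>
</oai:record>
"""


def extract_ids(xml_text):
    """Return (OAI identifier, CMDI self link) from one record (illustrative helper)."""
    root = ET.fromstring(xml_text)
    # The namespaces mapping lets the paths use 'oai:' and 'cmd:' prefixes.
    identifier = root.findtext('oai:header/oai:identifier', namespaces=NAMESPACES)
    self_link = root.findtext('.//cmd:MdSelfLink', namespaces=NAMESPACES)
    return identifier, self_link
```

Calling `extract_ids(SAMPLE)` resolves both prefixes through the mapping, so the queries stay readable even though the elements live in two different namespaces.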
ckanext.oaipmh.controller module¶
Serving controller interface for OAI-PMH
ckanext.oaipmh.harvester module¶
- class ckanext.oaipmh.harvester.OAIPMHHarvester(**kwargs)[source]¶
Bases: ckanext.harvest.harvesters.base.HarvesterBase
OAI-PMH Harvester
- fetch_stage(harvest_object)[source]¶
The fetch stage will receive a HarvestObject object and will be responsible for:
- getting the contents of the remote object (e.g. for a CSW server, perform a GetRecordById request).
- saving the content in the provided HarvestObject.
- creating and storing any suitable HarvestObjectErrors that may occur.
- returning True if everything went as expected, False otherwise.
Parameters: harvest_object – HarvestObject object
Returns: True if everything went right, False if errors were found
- fetch_xml(url, context)[source]¶
Get XML for import; a shortened version of fetch_stage().
Parameters: - url (string) – the URL of the metadata file
Returns: an XML file
Return type: string
- gather_stage(harvest_job)[source]¶
The gather stage will receive a HarvestJob object and will be responsible for:
- gathering all the necessary objects to fetch at a later stage (e.g. for a CSW server, perform a GetRecords request).
- creating the necessary HarvestObjects in the database, specifying the guid and a reference to its job. The HarvestObjects need a reference date with the last modified date for the resource; this may need to be set in a different stage depending on the type of source.
- creating and storing any suitable HarvestGatherErrors that may occur.
- returning a list with all the ids of the created HarvestObjects.
Parameters: harvest_job (HarvestJob) – HarvestJob object
Returns: A list of HarvestObject ids
- get_package_ids(set_ids, config, last_time, client)[source]¶
Get package identifiers from given set identifiers.
- import_stage(harvest_object)[source]¶
The import stage will receive a HarvestObject object and will be responsible for:
- performing any necessary action with the fetched object (e.g. create a CKAN package). Note: if this stage creates or updates a package, a reference to the package should be added to the HarvestObject.
- creating the HarvestObject - Package relation (if necessary).
- creating and storing any suitable HarvestObjectErrors that may occur.
- returning True if everything went as expected, False otherwise.
Parameters: harvest_object – HarvestObject object
Returns: True if everything went right, False if errors were found
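The contract shared by the gather, fetch and import stages can be sketched with in-memory stand-ins. Everything below is hypothetical: `FakeHarvestObject`, `SketchHarvester` and `run` are illustration-only names and do not use ckanext-harvest or its database models. The sketch only shows which stage returns what.

```python
class FakeHarvestObject:
    """In-memory stand-in for a HarvestObject (hypothetical)."""

    def __init__(self, guid):
        self.guid = guid
        self.content = None
        self.package_id = None
        self.errors = []


class SketchHarvester:
    """Illustrates the gather -> fetch -> import contract only."""

    def __init__(self, remote):
        self.remote = remote   # dict guid -> raw record; stands in for the server
        self.objects = {}      # stands in for stored HarvestObjects

    def gather_stage(self, harvest_job):
        # Create one object per remote record and return the list of ids.
        ids = []
        for guid in sorted(self.remote):
            self.objects[guid] = FakeHarvestObject(guid)
            ids.append(guid)
        return ids

    def fetch_stage(self, harvest_object):
        # Save the remote content on the object; record an error on failure.
        try:
            harvest_object.content = self.remote[harvest_object.guid]
        except KeyError:
            harvest_object.errors.append('record not found on remote')
            return False
        return True

    def import_stage(self, harvest_object):
        # Turn fetched content into a "package" and link it to the object.
        if not harvest_object.content:
            harvest_object.errors.append('nothing to import')
            return False
        harvest_object.package_id = 'pkg-' + harvest_object.guid
        return True


def run(harvester, job=None):
    """Drive the three stages in order and report success per object id."""
    results = {}
    for obj_id in harvester.gather_stage(job):
        obj = harvester.objects[obj_id]
        results[obj_id] = harvester.fetch_stage(obj) and harvester.import_stage(obj)
    return results
```

In a real deployment the stages are invoked separately by the harvest queue rather than by a single loop; the `run` helper only makes the ordering explicit.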
- info()[source]¶
Harvesting implementations must provide this method, which will return a dictionary containing different descriptors of the harvester. The returned dictionary should contain:
- name: machine-readable name. This will be the value stored in the database, and the one used by ckanext-harvest to call the appropriate harvester.
- title: human-readable name. This will appear in the form’s select box in the WUI.
- description: a small description of what the harvester does. This will appear on the form as a guidance to the user.
A complete example may be:
{
    'name': 'csw',
    'title': 'CSW Server',
    'description': "A server that implements OGC's Catalog Service for the Web (CSW) standard"
}
Returns: A dictionary with the harvester descriptors
- md_format = 'oai_dc'¶
- on_deleted(harvest_object, header)[source]¶
Called when metadata is deleted from server. Return False if dataset is ignored.
- parse_xml(f, context, orig_url=None, strict=True)[source]¶
Parse XML and return package data dictionary.
Parameters: - f – data as string
- context – CKAN context
- orig_url – original URL
- strict – Not used here, but required by the caller
Returns: package dictionary (used for package creation)
- validate_config(config)[source]¶
[optional]
Harvesters can provide this method to validate the configuration entered in the form. It should return a single string, which will be stored in the database. Exceptions raised will be shown in the form’s error messages.
Parameters: config – Config string coming from the form
Returns: A string with the validated configuration options
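As an illustration of this contract (string in, validated string out, exceptions surfaced as form errors), here is a minimal standalone sketch. It assumes the configuration is a JSON object, which this page does not state, and the `set` option is a hypothetical key used only for the example.

```python
import json


def validate_config(config):
    """Return the configuration as a normalized JSON string, or raise ValueError."""
    if not config:
        # Assumption: an empty form field means "no options".
        return '{}'
    try:
        parsed = json.loads(config)
    except ValueError as exc:
        raise ValueError('Configuration is not valid JSON: %s' % exc)
    if not isinstance(parsed, dict):
        raise ValueError('Configuration must be a JSON object')
    # 'set' is a hypothetical option name, used only for this illustration.
    if 'set' in parsed and not isinstance(parsed['set'], list):
        raise ValueError("'set' must be a list of set identifiers")
    # Re-serialize so the stored string has a stable key order.
    return json.dumps(parsed, sort_keys=True)
```

Returning a re-serialized string rather than the raw input keeps the stored value predictable, which can make later comparisons between configurations simpler.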
ckanext.oaipmh.ida module¶
- class ckanext.oaipmh.ida.IdaHarvester(**kwargs)[source]¶
Bases: ckanext.oaipmh.harvester.OAIPMHHarvester
OAI-PMH Harvester
- md_format = 'oai_dc'¶
ckanext.oaipmh.importcore module¶
- ckanext.oaipmh.importcore.dummy_metadata_reader(xml_element)[source]¶
A test metadata reader that always returns the same metadata
Parameters: xml_element (any) – XML input
Returns: metadata dictionary
Return type: oaipmh.common.Metadata instance
- ckanext.oaipmh.importcore.generic_rdf_metadata_reader(xml_element)[source]¶
Transform RDF/XML documents into metadata dictionaries
This function takes an RDF document in XML format, transforms it into an RDF graph, and traverses that graph to find all nodes in the graph and give them namepaths.
Parameters: xml_element (lxml.etree.Element instance) – RDF/XML document
Returns: metadata dictionary
Return type: oaipmh.common.Metadata instance
- ckanext.oaipmh.importcore.generic_xml_metadata_reader(xml_element)[source]¶
Transform XML documents into metadata dictionaries
Parameters: xml_element (lxml.etree.Element) – XML document
Returns: metadata dictionary with all the content of xml_element
Return type: oaipmh.common.Metadata
- ckanext.oaipmh.importcore.is_reverse_relation(rel1, rel2)[source]¶
Tells whether two elements are mutual reverses
Parameters: - rel1 (string) – name of relation
- rel2 (string) – name of relation
Returns: whether rel1 and rel2 are reverse relations
Return type: boolean
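One way such a check could work is a symmetric lookup in a table of known pairs. The sketch below is an assumption, not the module's implementation: the `REVERSE_PAIRS` table is hypothetical and lists a few reciprocal Dublin Core relation names purely as sample data.

```python
# Hypothetical table of reciprocal relation names (sample data only).
REVERSE_PAIRS = {
    ('isPartOf', 'hasPart'),
    ('isVersionOf', 'hasVersion'),
    ('replaces', 'isReplacedBy'),
}


def is_reverse_relation(rel1, rel2):
    """Return True if rel1 and rel2 are listed as reverses, in either order."""
    return (rel1, rel2) in REVERSE_PAIRS or (rel2, rel1) in REVERSE_PAIRS
```

Checking both orderings keeps the table half the size and guarantees the result does not depend on argument order.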
- ckanext.oaipmh.importcore.namepath_for_element(prefix, name, indices, md)[source]¶
Helper function to form name paths
This function takes a prefix and a name and concatenates them into a “name path”. As a side effect, it also counts the elements with the same name path, gives them unique indices, and records the count of such elements in the metadata dictionary.
Parameters: - prefix (string) – the namepath of the parent element
- name (string) – the name of the current element
- indices (a hash from strings to integers (inout)) – a hash to keep counts
- md (a hash from strings to any type (inout)) – a dictionary of metadata keys (namepaths) and values
Returns: a new namepath with name appended to prefix
Return type: string
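The counting side effect described above can be sketched as follows. The key layout is an assumption made for illustration: the `/` separator, the `.N` index suffix and the `.count` key are hypothetical and may differ from what the module actually produces.

```python
def namepath_for_element(prefix, name, indices, md):
    """Sketch: build an indexed name path and record how many siblings share it."""
    # Assumed separator between the parent path and the element name.
    base = '%s/%s' % (prefix, name) if prefix else name
    # indices is updated in place: it counts how often each base path was seen.
    count = indices.get(base, 0)
    indices[base] = count + 1
    # md is updated in place with the running count (assumed key name).
    md[base + '.count'] = count + 1
    # Assumed index suffix; the first occurrence gets index 0.
    return '%s.%d' % (base, count)
```

Calling the function twice with the same prefix and name yields two distinct paths, while `indices` and `md` both reflect that two elements share the base path.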
- ckanext.oaipmh.importcore.namespaced_name(name, namespaces)[source]¶
Substitutes a namespace prefix in a URL with its short form.
Parameters: - name (string) – the URL
- namespaces (list of (string, string)) – a list of (short prefix, long prefix) pairs
Returns: the URL, with a short prefix
Return type: string
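A minimal sketch of this substitution, assuming the short form is written as `prefix:local-name` (the exact output format is not stated here) and that unmatched URLs are returned unchanged:

```python
def namespaced_name(name, namespaces):
    """Sketch: replace a known long URL prefix with its short prefix."""
    for short, long_prefix in namespaces:
        if name.startswith(long_prefix):
            # Keep only the part after the long prefix (the local name).
            return '%s:%s' % (short, name[len(long_prefix):])
    # Assumption: names with no known prefix are returned as-is.
    return name
```

For example, with the pair `('dc', 'http://purl.org/dc/elements/1.1/')`, the URL `http://purl.org/dc/elements/1.1/title` would be shortened to `dc:title` under these assumptions.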
ckanext.oaipmh.importformats module¶
- ckanext.oaipmh.importformats.copy_element(source, dest, md, callback=None)[source]¶
Copy element in metadata dictionary from one key to another
This function changes the metadata dictionary, md, by copying the value stored under the key source to the key dest. If source is an indexed element, all of its indexed entries are copied as well, along with any language information that pertains to the copied element. The parameter callback, if given, is called with every pair of element names formed (indexed or not).
Parameters: - source (string) – key to be copied
- dest (string) – key to copy to
- md (hash from string to any value (inout)) – a metadata dictionary to update
- callback (function of (string, string) -> None) – optional callback function, called with source, dest and their indexed versions
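The copying behaviour can be sketched under an assumed key layout. In this illustration, indexed entries are keys of the form `source.N` and language information lives under `source.N/language`; both conventions are hypothetical and chosen only to make the example concrete.

```python
def copy_element(source, dest, md, callback=None):
    """Sketch: copy md[source] and its derived keys to the matching dest keys."""
    # Snapshot the keys first, because md is modified while copying.
    for key in list(md):
        if key == source:
            new_key = dest
        elif key.startswith(source + '.') or key.startswith(source + '/'):
            # Assumed layout: indexed and language keys share the source prefix.
            new_key = dest + key[len(source):]
        else:
            continue
        md[new_key] = md[key]
        if callback is not None:
            # Report every (source key, destination key) pair that was formed.
            callback(key, new_key)
```

The originals are left in place, so this is a copy rather than a rename; the callback gives the caller a way to track which destination keys were created.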
- ckanext.oaipmh.importformats.create_metadata_registry(harvest_type=None, service_url=None)[source]¶
Return new metadata registry with all common metadata readers
The readers currently implemented are for metadataPrefixes oai_dc, nrd, rdf and xml.
Returns: metadata registry instance
Return type: oaipmh.metadata.MetadataRegistry
- ckanext.oaipmh.importformats.nrd_metadata_reader(xml)[source]¶
Read metadata in NRD schema
This function takes NRD metadata as an lxml.etree.Element object, and returns the same metadata as a dictionary, with central TTA elements picked to format-independent keys.
Parameters: xml (lxml.etree.Element instance) – RDF metadata as XML-encoded NRD
Returns: a metadata dictionary
Return type: a hash from string to any value
ckanext.oaipmh.oai_dc_reader module¶
ckanext.oaipmh.oaipmh_server module¶
OAI-PMH implementation for CKAN datasets and groups.
- class ckanext.oaipmh.oaipmh_server.CKANServer[source]¶
Bases: oaipmh.common.ResumptionOAIPMH
An OAI-PMH implementation class for CKAN.
- listIdentifiers(metadataPrefix, set=None, cursor=None, from_=None, until=None, batch_size=None)[source]¶
List all identifiers for this repository.
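From a client's point of view, this method answers the OAI-PMH `ListIdentifiers` verb. The sketch below builds such a request URL using the protocol's standard argument names (`verb`, `metadataPrefix`, `set`, `from`, `until`); the helper name and the base URL are hypothetical, and the endpoint path of a given CKAN installation is not specified on this page.

```python
from urllib.parse import urlencode


def list_identifiers_url(base_url, metadata_prefix, set_spec=None, from_=None, until=None):
    """Build a ListIdentifiers request URL (illustrative helper, not part of the package)."""
    params = [('verb', 'ListIdentifiers'), ('metadataPrefix', metadata_prefix)]
    # Optional selective-harvesting arguments are added only when given.
    if set_spec is not None:
        params.append(('set', set_spec))
    if from_ is not None:
        params.append(('from', from_))
    if until is not None:
        params.append(('until', until))
    return base_url + '?' + urlencode(params)
```

The trailing underscore in `from_` mirrors the method signature above, since `from` is a reserved word in Python.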
ckanext.oaipmh.plugin module¶
- class ckanext.oaipmh.plugin.OAIPMHPlugin(**kwargs)[source]¶
Bases: ckan.plugins.core.SingletonPlugin, ckan.plugins.interfaces.IRoutes
OAI-PMH plugin. It maps the controller and uses the template configuration stanza so that the template is rendered when no parameters are passed to the interface.
- update_config(config)[source]¶
This IConfigurer implementation causes CKAN to look in the `public` and `templates` directories present in this package for any customisations.
It also shows how to set the site title here (rather than in the main site .ini file), and causes CKAN to use the customised package form defined in package_form.py in this directory.
ckanext.oaipmh.rdftools module¶
RDF reader and writer for OAI-PMH harvester and server interface