Import Using the koLibRI JAR File¶
Java Version¶
On some installations of koLibRI (especially on Windows operating systems), the Java 8 JDK seems to work better than Java 7, so we recommend using Java 8 right from the start.
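To find out which Java version a machine actually uses, the version string printed by `java -version` can be reduced to its major number. The following is a minimal sketch; the helper name `java_major` is made up for this example and is not part of koLibRI.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI): reduce a Java version string
# to its major number, e.g. "1.8.0_292" -> 8, "11.0.2" -> 11.
java_major() {
  v=$1
  case "$v" in
    1.*) v=${v#1.} ;;        # legacy numbering: 1.8.0_292 -> 8.0_292
  esac
  echo "${v%%[!0-9]*}"       # cut at the first non-digit character
}

# `java -version` writes to stderr; the version is usually the quoted
# part of the first line.
if command -v java >/dev/null 2>&1; then
  raw=$(java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p')
  major=$(java_major "$raw")
  if [ -n "$major" ] && [ "$major" -ge 8 ]; then
    echo "Java $major found"
  else
    echo "Java 8 is recommended (found: ${raw:-unknown})"
  fi
else
  echo "No java executable found on PATH"
fi
```

The parsing is kept in a function of its own so that it can be checked against sample strings without a Java installation.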
Downloading the Software¶
First, please download the configuration and folder ZIP file, which contains all the required config files and templates:
Please extract this ZIP file to your preferred working folder. You will also need the koLibRI Command Line Module prepared for the TextGrid import. Please put this JAR into your working folder and rename it to kolibri-cli-6.4.0.jar (this simplifies things and matches the name used in the kolibri-go scripts):
You should now have a folder structure like the following:
kolibri-addon-textgrid-import
  ⌊ config
    ⌊ tglab_config.xml
    ⌊ tgrep_config.xml
    ⌊ policies.xml
    ⌊ …some more koLibRI config files…
  ⌊ folders
    ⌊ dest
    ⌊ hotfolder
      ⌊ data
    ⌊ hotfolder-dfg-viewer
    ⌊ hotfolder-test
    ⌊ log
    ⌊ metadata-responses
    ⌊ temp
    ⌊ work
  ⌊ kolibri-cli-6.4.0.jar
  ⌊ kolibri-go.bat
  ⌊ kolibri-go.sh
  ⌊ version.txt
Please check that you have downloaded the correct ZIP file: version.txt should contain the current SNAPSHOT version of the koLibRI import tool, kolibri-addon-textgrid-import-6.4.0.
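The folder structure and the version check described above can also be verified with a small script. This is only a sketch: `check_layout` is a hypothetical helper, and the list of checked entries is a selection from the tree shown above, not an exhaustive list of what koLibRI needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI): report entries from the
# documented folder structure that are missing below the given directory.
check_layout() {
  root=$1
  status=0
  for f in config/tglab_config.xml config/tgrep_config.xml config/policies.xml \
           kolibri-cli-6.4.0.jar kolibri-go.sh version.txt; do
    if [ ! -f "$root/$f" ]; then
      echo "missing file: $f"
      status=1
    fi
  done
  for d in folders/hotfolder/data folders/log folders/temp; do
    if [ ! -d "$root/$d" ]; then
      echo "missing folder: $d"
      status=1
    fi
  done
  # version.txt should name the expected release of the import tool.
  if [ -f "$root/version.txt" ] &&
     ! grep -q 'kolibri-addon-textgrid-import-6.4.0' "$root/version.txt"; then
    echo "unexpected content in version.txt"
    status=1
  fi
  return "$status"
}
```

Called as `check_layout .` from inside the working folder, it prints nothing and returns 0 when all listed entries are present; otherwise it names what is missing and returns 1.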
Configuring koLibRI¶
See Configuration.
Starting the koLibRI Workflow Tool¶
If everything is configured correctly and all data has been copied, koLibRI can be started. You need a Java Virtual Machine running Java 7 or higher. Change into your working directory kolibri-addon-textgrid-import, which contains the JAR file, the config and folders directories, and the kolibri-go scripts, and type
./kolibri-go.sh config/tglab_config.xml
in a Linux console or Mac OS terminal, or
kolibri-go.bat config\tglab_config.xml
in a Windows/DOS command shell.
You can check the status of your imports either in the TextGridLab project you imported into or in the TextGridRep Sandbox, depending on your configuration. To ensure the correct charset is used (depending on your local charset configuration, special characters such as ö, ä, ü may otherwise not be processed correctly), the -D options in the kolibri-go scripts already set the encoding to UTF-8. The scripts also raise the limit of XML child objects from the default 50,000 to 500,000, which is needed for handling projects with many objects.
Configuration¶
Choosing Configuration File from Template¶
There are two template configuration files in the config/ folder:
tglab_config.xml¶
is to be used to import data into the TG-lab, so you can work with your data inside the chosen TextGrid project. The data will not be visible to users other than you and the users you decide to share it with. All non-public services are preconfigured in this file.
tgrep_config.xml¶
is to be used to import directly to the TG-rep. Your data is visible to the public immediately (at first in the TextGrid Repository Sandbox only, and after final publication to everyone, everywhere).
Please choose one of these files according to your import plans.
Editing the Config File¶
Commonly Used Settings¶
<field>defaultPolicyName</field>
Setting the import policy: The parameter defaultPolicyName can refer to the following policies (as defined in the policies.xml file). Edit the config file of your choice and choose a value. Depending on your import policy, other configuration values have to be set; please see below.
- aggregation_import
- This policy is used to automatically create TextGrid metadata for each file out of the file name and the detected file format. For every folder a TextGrid aggregation is created and imported, so the folder structure will appear in TextGrid the same as in the import folder itself.
- complete_import
- If you use this policy, all given files are simply imported and no additional metadata is created, so you need a complete set of TextGrid objects including TextGrid metadata. TextGrid URIs are taken from TG-crud whenever needed, so your files must be linked to each other (such as aggregation references) by local file paths. File extensions for existing TextGrid editions, collections, works, aggregations, XML and metadata files can be configured if needed, but it is recommended to use the default ones and not change them.
- continue_import
- Use this policy to continue a broken or stopped import (e.g. in case of an error). Just configure the hotfolder to be the temp folder in which the files were processed.
- delete_import
- An already imported set of objects can be deleted from the sandbox again. Uses the TG-crud service directly. This can be used with a URI list (as a file) or by giving a root URI. Please see the configuration of the class DeleteFiles.
- publish_import
- An already imported set of objects will be finally published. Uses the TG-publish service. This can be used with a URI list (as a file) or by giving a root URI. Please see the configuration of the class PublishFiles.
- dfgviewermets_import
- Takes as input one or more DFG Viewer METS files according to the DFG Viewer METS Specification and creates a folder structure from the physical and logical StructMap, which is then imported into TextGrid. MODS and/or TEI metadata is mapped to TextGrid metadata via the existing MODS/TEI XSL transformation files or via custom XSL files.
<field>rbacSessionId</field> and <field>projectId</field>
Authentication and project settings: Please set these two values to your TextGrid project ID (projectId) and your session ID (rbacSessionId).
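Before starting an import, it can help to print the values that are currently set in the chosen config file. The sketch below ASSUMES that each setting is stored as a `<field>NAME</field>` element followed by a `<value>…</value>` element; only the `<field>` part is shown in this documentation, so please compare with your actual config file. `config_value` is a hypothetical helper, not a koLibRI command.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI). ASSUMPTION: each setting is
# stored as <field>NAME</field> followed by <value>...</value>.
config_value() {
  # $1 = config file, $2 = field name
  # Look only between the matching <field> line and the next </value>,
  # and print the text enclosed in <value>...</value>.
  sed -n "/<field>$2<\/field>/,/<\/value>/s/.*<value>\(.*\)<\/value>.*/\1/p" "$1" |
    head -n 1
}
```

For example, `config_value config/tglab_config.xml defaultPolicyName` would print the configured policy name, and an empty result suggests that the field is missing or laid out differently than assumed here.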
Aggregation Import Configuration¶
If you are using aggregation_import, just set the data as described above and run the koLibRI.
<field>hotfolderDir</field>
Choosing a hotfolder: ./folders/hotfolder/ is pre-configured as the hotfolder. Just copy the data you want to publish into the data/ folder. The data is copied before processing starts, so the original data will not be touched. If you have chosen aggregation_import as policy, please put only ONE folder containing the files and folders to import into the hotfolder (this would be the already existing data/ folder). All those files will be imported into ONE TextGrid project as files and aggregations.
Complete Import Configuration¶
If you are using complete_import, just set the data as described above and run the koLibRI.
<field>hotfolderDir</field>
Choosing a hotfolder: ./folders/hotfolder/ is pre-configured as the hotfolder. Just copy the data you want to publish into the data/ folder. The data is copied before processing starts, so the original data will not be touched. All data will be imported exactly as prepared by the user. Please note that you also need metadata files according to the TextGrid Metadata Schema. Everything else works as described in the aggregation import hotfolderDir documentation.
<field>createNewRevisions</field>
[since version 6.7.0-SNAPSHOT on dev.textgridlab.org]
Set this flag to true if you want to import new revisions of all your (existing) files to TG-lab or TG-rep. We recommend testing this on the test system first; it is not yet deployed on the production system. Some issues to note:
- For a revision import, all objects must have TextGrid URIs instead of local file paths. You need to have revision URIs!
- Only existing files can be revisioned; new files still have to be dealt with separately
- For each object to be revisioned, the old PID is replaced by a new one
- PIDs must still be added to the TEI files. Where could we do that? Maybe koLibRI could add them to each TEI file? Or just add the TextGrid URI to the corresponding file and rewriting in koLibRI does the rest (GetPidsAndRewrite)?
- Navigator doesn’t show revisioned objects correctly right now
- Revisions will also not be displayed correctly in the TG-lab’s revision view (right-click and choose Show Revisions)
- Please report issues to me: mailto:funk@sub.uni-goetingen.de
DFG Viewer METS Import Configuration¶
If you are using dfgviewermets_import, just set the data as described above and run the koLibRI.
<field>hotfolderDir</field>
Choosing a hotfolder: ./folders/hotfolder/ is pre-configured as the hotfolder. Put all your METS files directly into the hotfolder/ folder. A root Aggregation/Edition/Collection is created for each METS file, please see below. It is possible to put more than one METS file into the hotfolder. koLibRI then processes the import concurrently with a configurable number of threads (please see the general configuration options in the koLibRI configuration file).
<field>rootAggregationMimetype</field>
DFG Viewer aggregations: For the DFG Viewer import you can choose the format of your root aggregation (there is one root aggregation for every METS file). It can be imported as a TextGrid Aggregation (text/tg.aggregation+xml), Edition (text/tg.edition+tg.aggregation+xml) or Collection (text/tg.collection+tg.aggregation+xml).
Please note: Custom XSLT stylesheets for metadata creation can be specified in the properties of <class name="actionmodule.textgrid.DfgViewerMetadataProcessor">.
Publish Configuration¶
To finally publish your objects after sandbox publishing (every koLibRI import is published to the sandbox first), you must use the policy publish_import.
<field>objectUri</field>
Please use an import mapping file, a project ID, or the root URI of a TextGrid object to select the TextGrid objects to be published, such as
- file:./folders/temp/1470065621459_data_URI.imex (URI mapping file)
- file:./folders/temp/1470065621459_data_PID.imex (PID mapping file)
- textgrid:12345.0 (TextGrid URI)
- project:TGPR-f1867520-4a53-9ced-9da5-503762ba0f61 (project ID)
If you use a TextGrid URI as the object URI, all objects of an edition or collection are published, including the collection itself. If a single TextGrid item is referenced (no aggregation), only this item will be published.
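The four kinds of objectUri values listed above differ only in their prefix and, for mapping files, their suffix. As a small illustration, the hypothetical helper below (not part of koLibRI) tells them apart, which can be handy for checking a value before copying it into the config file.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI): name the kind of an
# objectUri value, based on the forms listed in this documentation.
object_uri_kind() {
  case "$1" in
    file:*_URI.imex) echo "URI mapping file" ;;
    file:*_PID.imex) echo "PID mapping file" ;;
    textgrid:*)      echo "TextGrid URI" ;;
    project:*)       echo "project ID" ;;
    *)               echo "unknown"; return 1 ;;
  esac
}
```

Any value that matches none of the documented forms is reported as unknown with a non-zero exit status, so the function can be used in an `if` condition.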
<field>dryrun</field>
Use to check what will happen before publishing anything. Will not publish anything unless set to false (recommended)!
Delete Configuration¶
Already published data can still be deleted, if it was imported into the TextGrid Sandbox and has not yet been finally published using the publish_import policy. To delete objects from the sandbox, change the policy to delete_import.
<field>objectUri</field>
Please use an import mapping file, a project ID, or the root URI of a TextGrid object to select the TextGrid objects to be deleted, such as
- file:./folders/temp/1470065621459_data_URI.imex (URI mapping file)
- file:./folders/temp/1470065621459_data_PID.imex (PID mapping file)
- textgrid:12345.0 (TextGrid URI)
- project:TGPR-f1867520-4a53-9ced-9da5-503762ba0f61 (project ID)
If you use a TextGrid URI as the object URI, all objects of an edition or collection are deleted, including the collection itself. If a single TextGrid item is referenced (no aggregation), only this item will be deleted.
<field>dryrun</field>
Use to check what will happen before deleting anything. Will not delete anything unless set to false (recommended)!
Editing the Metadata Template File (Optional)¶
The metadata template file (textgrid_metadata_template.xml) configures metadata generation in the module textgrid.TextgridMetadataProcessor, which is used by some policies such as aggregation_import and dfgviewermets_import. It is used to create the metadata for every file to be imported! The metadata stated in this file can be edited according to the TextGrid Metadata Schema. Metadata that does not fit the schema will not be accepted.
Logging and Keeping¶
All imports are logged to the file ./folders/log/kolibri.log. Please keep all the folders in the ./folders/temp/ folder, and especially all the files with the suffix _URI.imex, for later use with the publish and delete policies. If PIDs are created, the PID mapping is stored in _PID.imex files. This file format is also used in the TG-lab import and export module.
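Since the URI mapping files are needed again for publish_import and delete_import, it may be convenient to list them in the form used for objectUri. A minimal sketch, assuming the mapping files are stored somewhere below the temp folder; `mapping_object_uris` is a hypothetical helper, not a koLibRI command.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI): list all URI mapping files
# below a folder, each prefixed with "file:" as in the objectUri examples.
mapping_object_uris() {
  dir=${1:-./folders/temp}
  find "$dir" -type f -name '*_URI.imex' | sort | sed 's/^/file:/'
}
```

Run from the working folder without arguments, it prints one line per mapping file in the file:./folders/temp/… form; PID mapping files and other files are left out.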
Change More Parameters?¶
DON’T!
There is some more information for every config file value in the description tags of each value in the config file’s module class definitions. But: Do not change anything else unless you are REALLY sure about it!
Hints and Tricks¶
If the configured hotfolder is a directory that contains only files, the import will do nothing, because koLibRI imports the one and only directory from WITHIN the hotfolder. If you also want to import the files contained in the hotfolder itself, just set the readDirectoriesOnly flag of processstarter.MonitorHotfolder to FALSE! Beware: All rewriting is then restricted to single files (so no rewriting will happen at all!), because every file is handled one after another!
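Whether a hotfolder falls into the case described above can be checked before starting koLibRI. The following sketch counts the folders and loose files directly inside the hotfolder; `check_hotfolder` is a hypothetical helper and does not read the koLibRI configuration, so it cannot know how readDirectoriesOnly is actually set.

```shell
#!/bin/sh
# Hypothetical helper (not part of koLibRI): count the folders and loose
# files directly inside a hotfolder and warn if there is no folder.
check_hotfolder() {
  hot=${1:-./folders/hotfolder}
  dirs=0
  files=0
  for entry in "$hot"/*; do
    [ -e "$entry" ] || continue          # empty hotfolder: glob did not match
    if [ -d "$entry" ]; then
      dirs=$((dirs + 1))
    else
      files=$((files + 1))
    fi
  done
  echo "folders: $dirs, loose files: $files"
  if [ "$dirs" -eq 0 ]; then
    echo "no folder to import; loose files are skipped unless readDirectoriesOnly is set to false"
    return 1
  fi
  return 0
}
```

Hidden files (names starting with a dot) are not counted, because the shell glob does not match them.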