Glossary

Definitions of the terms and acronyms used in this documentation.

PGCN: Plateforme de Gestion de Contenu Numérisé (Digitized Content Management Platform). Original name of NumaHOP.

OCR: Optical Character Recognition. Process by which text is recognized in images.

Metadata standards and file formats

METS: Metadata Encoding and Transmission Standard. An XML standard.

EAD: Encoded Archival Description.

DC: Dublin Core. A small set of core metadata elements for bibliographic units.

SIP: Submission Information Package. A file describing a package to be archived.

AIP: Archival Information Package. A file representing an archived package.

XSD: XML Schema Definition. An XML file that describes an XML format, allowing an XML parser to verify that a file is valid against that format (and not merely well formed).
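To illustrate the XSD entry above, here is a minimal sketch of schema validation using the standard `javax.xml.validation` API. The class name `XsdCheck` and the toy schema are assumptions made up for this example; they are not taken from NumaHOP.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XsdCheck {
    // Hypothetical toy schema: a <notice> element holding exactly one <title>.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"notice\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"title\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true only if the XML parses and conforms to the schema.
    // A malformed schema or a malformed document also yields false.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            // Not expected when reading from in-memory strings.
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, `XsdCheck.isValid(XsdCheck.TOY_XSD, "<notice><author>X</author></notice>")` is rejected even though the document is well formed, because the schema requires a `title` element.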

MARC: Machine-Readable Cataloging. A format for bibliographic data.

ALTO: Analyzed Layout and Text Object.

Protocols

OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
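As a rough illustration of OAI-PMH, requests are plain HTTP calls carrying a `verb` parameter. The sketch below builds two such request URLs. `ListRecords`, `GetRecord`, and the `oai_dc` metadata prefix come from the OAI-PMH specification; the class name, the base URL, and the record identifier used in the usage note are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class OaiPmhRequest {
    // URL of a ListRecords request: harvest all records in the given format.
    static String listRecords(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + encode(metadataPrefix);
    }

    // URL of a GetRecord request: fetch a single record by its identifier.
    static String getRecord(String baseUrl, String identifier, String metadataPrefix) {
        return baseUrl + "?verb=GetRecord&identifier=" + encode(identifier)
            + "&metadataPrefix=" + encode(metadataPrefix);
    }

    // Percent-encodes a query parameter value.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `OaiPmhRequest.listRecords("https://example.org/oai", "oai_dc")` produces a URL that asks the (hypothetical) repository for its records in Dublin Core.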

Z39.50: A client-server protocol for searching and retrieving data from remote databases over TCP/IP.

IIIF: International Image Interoperability Framework. A set of REST APIs for requesting documents and their images.
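To make the IIIF entry more concrete: the IIIF Image API addresses an image through a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The sketch below only assembles such a URL; the class name, base URL, and identifier are hypothetical, and it assumes the identifier needs no percent-encoding.

```java
final class IiifImageUrl {
    // Assembles an IIIF Image API request URL from its path segments.
    // Assumption: the identifier contains no characters requiring percent-encoding.
    static String build(String baseUrl, String identifier, String region,
                        String size, String rotation, String quality, String format) {
        return baseUrl + "/" + identifier + "/" + region + "/" + size
            + "/" + rotation + "/" + quality + "." + format;
    }

    // Convenience form: the whole image, at maximum size, unrotated, as JPEG.
    static String fullImage(String baseUrl, String identifier) {
        return build(baseUrl, identifier, "full", "max", "0", "default", "jpg");
    }
}
```

For example, `IiifImageUrl.fullImage("https://example.org/iiif", "page-001")` yields `https://example.org/iiif/page-001/full/max/0/default.jpg`.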

NumaHOP Vocabulary

Digitalization service provider: The company (for external digitalization) or service (for internal digitalization) that carries out the digitalization process.

Document Unit: Also referred to as Doc Unit or DU. A document, or part of a document, from a collection to be processed by NumaHOP.

Workflow: The steps a Document Unit goes through in NumaHOP. Can be defined at the project level or at the bundle level.

Notice: Bibliographic metadata attached to a DU.

Condition Report: A check of the document's state after it has been handled during the digitalization process.

Delivery: The step where the digitalized documents are deposited into NumaHOP and attached to the correct Document Unit.

Import: The step where the Document Unit is created.

Grouping of Document Units

Project: A project is usually a set of documents to be digitalized by the same provider.

Bundle: A physical set of Document Units, typically 20 to 50 documents.

Train: As in digitalization train. A smaller set of documents, used for insurance reasons.

Not every grouping level is useful for every use case. For projects with a small volume of documents, the train is less useful.

Design Pattern jargon

DTO: Data Transfer Object. An object traveling between the view and the controller.

Service: Class containing business logic.

Repository: Class abstracting a storage method (database, Elasticsearch, or file system).

Mapper: Class that maps one domain object to another.
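The four terms above can be sketched together as follows. This is a simplified illustration only: every class name (`DocUnit`, `DocUnitDto`, `DocUnitRepository`, `InMemoryDocUnitRepository`, `DocUnitMapper`, `DocUnitService`) is hypothetical and is not claimed to match the NumaHOP code base, and the in-memory map stands in for a real storage method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Domain object (hypothetical and heavily simplified).
final class DocUnit {
    final String id;
    final String label;
    DocUnit(String id, String label) { this.id = id; this.label = label; }
}

// DTO: flat data carrier exchanged between the view and the controller.
final class DocUnitDto {
    final String id;
    final String displayLabel;
    DocUnitDto(String id, String displayLabel) {
        this.id = id;
        this.displayLabel = displayLabel;
    }
}

// Repository: hides the storage method behind an interface.
interface DocUnitRepository {
    Optional<DocUnit> findById(String id);
    void save(DocUnit unit);
}

// Stand-in implementation backed by a map instead of a database.
final class InMemoryDocUnitRepository implements DocUnitRepository {
    private final Map<String, DocUnit> store = new HashMap<>();
    @Override public Optional<DocUnit> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }
    @Override public void save(DocUnit unit) { store.put(unit.id, unit); }
}

// Mapper: converts a domain object into its DTO.
final class DocUnitMapper {
    static DocUnitDto toDto(DocUnit unit) {
        return new DocUnitDto(unit.id, unit.label.trim());
    }
}

// Service: business logic, relying on the repository and the mapper.
final class DocUnitService {
    private final DocUnitRepository repository;
    DocUnitService(DocUnitRepository repository) { this.repository = repository; }

    Optional<DocUnitDto> findOne(String id) {
        return repository.findById(id).map(DocUnitMapper::toDto);
    }
}
```

A controller would call `new DocUnitService(repository).findOne("du-1")` and hand the resulting DTO to the view, without ever touching the storage layer directly.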