In Support of Knowledge: Synergy and the Integration of Cultural Heritage Data

The relationship between information and computers has always been limited to some extent by the latter, and this has been particularly apparent in cultural heritage systems. However, interest in cultural heritage computing started as early as the 1960s, and the first museums and computers conference was held at the Metropolitan Museum in New York, sponsored by IBM, in April 1968. The curatorial reviewer of that event, Edward F. Fry, then an assistant curator at the Guggenheim, and later an Andrew W. Mellon distinguished Professor of Art History, set out a positive vision for how computers might become useful to cultural heritage experts. This included a different type of ‘standard’, one that was flexible and could support developing knowledge across all disciplines, including the organisation of bibliographic material alongside material culture and other historical information. The ultimate aim, according to Fry, would be to provide scholars with a resource that allowed the pursuit of new knowledge from existing and connected facts.

The success of modern institutional information applications relies heavily on human knowledge and engagement. This often produces an uneasy relationship because knowledge is compromised when it is translated into rows and columns, often for simple catalogue and inventory formats. While user experience (UX) techniques and general computing have developed considerably over the years, systems are still limited by the lack of expression and flexibility available at the point at which knowledge is encoded into structured digital data. While these limitations can be managed within the closed environment of the organisation, information systems do not come close to achieving Fry’s criteria, now 45 years old.

Software applications attempt, with the help of technologists, to represent data stripped of context and placed within artificial schemas. Systems that carry data derived from information that is semantically complex (like much humanities data) rely on the knowledge of users to fill in the gaps left implicit, ambiguous or missing by computer systems. The majority of databases in which information is stored have no ability to store semantics effectively and therefore the layers of software that make up an application try to compensate for this through programmatic interpretation. People are therefore an integral part of any internal information system, both in its production and its final utility and effectiveness. But surely it makes more sense for subject experts to specify the semantics?

This model is also highly expensive. Ongoing knowledge production requires changes to the database; and changes to the database require changes to the layers of software that interpret and present the information. Often more training is required for users to understand the changes, both formal and informal. These are processes and costs that have been established and normalised over a long period of time, and are broadly accepted or often deep rooted.

Despite these limitations we happily publish data (orphaned from all of the original layers that make it useful internally) to open environments for others to use. Traditional standards help to some extent but in many knowledge organisations working with richer datasets (particularly cultural heritage), making this information useful to wider audiences is difficult and consumes vast amounts of money with little understanding of the benefits. The people most likely to engage with raw data are technologists who are often not in a position to fully interpret and present the data alone, and conversations with knowledge providers are often limited by traditional database development processes. For example, projects compensate by only consuming data that conforms to a ‘core’ model that is easier to understand without much additional interpretation and processing (or so it seems).

By only processing core fields, a mindset is gradually generated that there is no need to consult with the original producers of the data, which is perceived as an overhead. In many cases producers of data are asked to only provide that which conforms to the core model, reducing the aggregator’s overheads further. In all of this, aggregators disseminate a notion that publishing any data in any form is always a good thing. Yet aggregation systems have failed to provide the value and benefits that their original mission statements promised and in some cases do not accurately represent source data. A core model has, divorced from its wider context, a limited set of use cases beyond providing simple finding aids, and as data repositories increase in size these finding aids come under increasing strain, burying information rather than making it more discoverable.

Imagine if you turned this established model on its head! Imagine that instead of spending vast amounts of money on software layers which require high levels of technical skill to create, manage, support and change (this in itself limiting progress) more of this effort was instead focussed on the data. Instead of building databases that force the meaning and context out of information just so that it conforms to a database structure, and instead of commissioning technologists to reassemble the meaning using programming code – imagine a method of representation designed for information experts that could incorporate the semantics of the information they produce into the data itself, making the implicit, explicit. Just as Edward F. Fry suggested that it should. What would this mean?

Firstly it would mean that much more of the effort required to build an information system would be placed in the hands of the people who understand the information and who would also use and develop it, since it would better reflect their information needs. That doesn’t mean that technologists are not important, but it would allow a more appropriate use of skills. Secondly, with meaning and context made explicit, fewer resources would be needed to build the application software. Technical experts would focus, not on trying to make sense of abstract artificial data and data structures, but instead on materialising the meaning already embedded in the data. Thirdly, the data would be far more useful for others because, with its context intact, it becomes suitable for research and engagement alike – researchers need context, but so do general-interest digital visitors – and it would become accessible to a far greater number of potential users. Finally, as information evolves it becomes easier and less costly to change the software, which itself becomes more flexible. Additionally, data would not need to rely on software providing both a preservation object and a meaningful practical digital object at the same time.

The final part of this imagined world is a data environment that allows the flexible assertion of patterns of information which carry meaning and context. If the language used to provide this were based on universal concepts rather than artificial standards and specialised terminology, and if this framework supported existing data rather than replaced it, then many other benefits start to arise. In this knowledge orientated information world new assertions could be added without breaking the systems that carry them and this would provide the foundation for a very large number of use cases and help connect organisations in a more significant way to communities of people and organisations also interested in improving and using similar or related information. Such a semantic framework would also provide the basis for integration, not by attempting to individually link pieces of information as a technical exercise – an impossible and error prone endeavour – but through an alignment of universal semantics, with provenance, able to transcend organisations, sectors and national borders. Such a change in thinking requires a new set of processes that reflect a greater emphasis on information and the people who create it, and a re-alignment of the relationship between knowledge producers and the producers/developers of information systems.

The Synergy system is about changing the way that we think about digital information and promoting more useful, higher-quality information. It comes from a realisation that without more emphasis on the data, information will remain a second class citizen in a so-called ‘information world’, and serve very limited aims and objectives. The Synergy model understands that effective global information networks need to represent more fully, and be connected to, the knowledge organisations that produce data on a daily basis and who know how data should be represented effectively with all its crucial features intact – a Web of Knowledge.

While large amounts of effort have been directed into digitisation, generating huge amounts of digital information, less attention has been given to how that information should be represented to a varied and growing audience. Whether used for research, education or engagement purposes, a key challenge is to represent data in a way that fully conveys its original context so that it can be effectively built upon and related to other connected and supporting information. For people outside cultural heritage institutions to understand the information and use it effectively, the language and concepts used to help convey or represent it must be more universal (and conform to real world logic) but also true to its original institutional meaning. The Synergy system describes how organisations can change and influence their engagement with digital information services and communities on the World Wide Web and elsewhere. It puts a far greater emphasis on the curation and representation of data. The key elements of Synergy are these:

  1. It describes in full the ecosystem required for sustainable data provisioning between data providers and data aggregators as an ongoing and collaborative undertaking.
  2. It addresses the lack of functionality and flexibility in current aggregation systems and therefore the lack of user orientated tools necessary to generate data including data cleaning, data mapping definitions & associated metadata transfer.
  3. It describes the necessary knowledge and input needed from providers (or provider communities) to create quality sustainable aggregations with meaning and context.
  4. It defines a modular architecture that can be developed and optimized by different contributors with minimal inter-dependencies.
  5. It supports the ongoing management of data transfers between providers and target data repositories and the delivery of transformed data at defined times, including updates.

In doing this it describes a different model to the one currently implemented as an attempt to move away from short term thinking about data provisioning, to more long term and sustainable systems that are more cost effective, deliver better quality data and elevate information to a first class citizen.

The Synergy reference model is described at

Dominic Oldman

Contextual Cultural Heritage at the American Institute of Indian Studies, Gurgaon, India

This two day workshop in Gurgaon, New Delhi, was organised by Professor Donna Kurtz of the e-Research Centre of University of Oxford and Vandana Sinha, the Academic Director (Center for Art and Archaeology) at the American Institute of Indian Studies (AIIS). It was made possible through an Arts and Humanities Research Council (AHRC) grant and also through the kind support and generosity of Oberoi Hotels & Resorts. The workshop was held at the headquarters of the AIIS which houses a truly astonishing photographic archive of Indian art and culture and hosts a world leading specialist library.

The event provided the opportunity to exchange knowledge on cultural heritage digital representation and integration using the Virtual Museum of Images and Sound (VMIS) Project from the American Institute, Oxford University’s CLAROS system and the Andrew W. Mellon funded ResearchSpace project at the British Museum as a backdrop for discussions. It also encouraged a mutual exchange of knowledge on meaningful knowledge representation, the contextual ontology CIDOC CRM (Conceptual Reference Model), and the various tools under development aimed at scholars and subject experts. The messages of the workshop centred on information sharing and linking with the following key points.

  • Cultural heritage organisations, through the World Wide Web, reach a wide range of audiences. These communities need to be better served by these institutions in the representation and integration of data.


    A CIDOC Conceptual Reference Model semantic representation of AIIS spreadsheet data

  • The Web of Knowledge relies on the fluidity of data. While barriers on the Web may be frustrating for human browsing they prevent the use of computers to make sense of the vast amounts of knowledge encoded as data. Some organisations, like the Yale Center for British Art, the National Gallery in Washington and the Rijksmuseum in Amsterdam have removed barriers to their digital images and data and already work towards a more fluid and knowledge driven Web.
  • Cultural Heritage institutions use their collections to provide particular perspectives on history and therefore provide a particular commentary about contemporary society. However, each institution provides only particular perspectives, and only by making information accessible (particularly in a computer readable and reusable form) can these perspectives be combined, compared and studied. Without releasing institutional knowledge, perspectives are not represented in important information channels and, at worst, perspectives can be misrepresented.
  • The World Wide Web contains an increasing amount of information containing visual inaccuracies and contextual and semantic mistakes. Only by source organisations engaging with the Web of Data can these misrepresentations be confronted.
  • The value of cultural heritage resources lies in their availability for continued scholarship and creativity and ultimately the dissemination of the results of this work in accessible, meaningful and engaging ways.
  • While the Web has exposed the extent of the audiences interested in cultural heritage data, cultural heritage organisations have yet to respond to this by reviewing the type of information they record and publish. Information about the significance and relevance of items in a collection is becoming the important factor in serving new communities and audiences. Cultural heritage organisations can address these issues by reusing the results of external research (based on information they have made accessible) to enrich their own knowledge systems.
  • Only when meaningful contextual links are established between different cultural heritage organisations will audiences be able to see a more complete picture of history that combines local knowledge with national, and harmonises data across collections, archives, bibliographic information and other scholarly sources.

Sebastian Rahtz (Oxford University) and Dominic Oldman (British Museum) at the Taj in Agra

The American Institute of Indian Studies and the Oxford University e-Research Centre are committed to supporting these principles and strive to promote knowledge collaboration and help develop the Web of Knowledge into a serious resource for academic researchers and enthusiasts alike.

Presentation Slides from Professor Kurtz, Dominic Oldman, Head of ResearchSpace at the British Museum, and Sebastian Rahtz, Chief Data Architect at Oxford University are available at                         .


Bridging the Gap – Reviewing the Getty / WMF Arches System

This week I participated in a panel (at the Computer Applications and Quantitative Methods in Archaeology Conference) talking about Linked Open Data in the context of the Arches Heritage Inventory and Management System, jointly developed by the Getty Conservation Institute (GCI) and World Monuments Fund (WMF). Designed initially to support immovable cultural heritage, the Arches system uses a new approach to information system design by implementing Open World ontologies at the core of a self-contained data management application. The first implementations of Arches use the CIDOC CRM for the underlying data models (for example, input screens are generated directly from the ontology model) to make full use of the rich relationships and semantics that CIDOC CRM has to offer, and to show the full potential of the system. It represents a significant milestone in data management design because it confronts the issue of how internal information systems can more easily and meaningfully connect to Open Data environments, necessary for collaboration.

We typically use traditional information systems that contain meaning and context in many different layers of the application. Meaning exists in the data, the business rules and application logic, the user interface; and some meaning is also typically left implicit and only understood by the internal users of the system who ‘fill in’ the semantic gaps.

Research data requires context. Actually, engagement data requires context! Typically relatively little of this context makes it through to Linked Data publications. Data publication often means “Raw Data” straight from database files, with the essential meaning, locked away in the other application layers, missing. People who publish data and create the models are often not the data experts and they don’t collaborate sufficiently with data ‘knowledge’ experts. As a result, target LOD schemas/ontologies, and therefore open data, lack context and meaning.

The Arches system is different in that the system is based on graph database principles and creates a user oriented data management environment based on real world ontologies like the CIDOC CRM. This means that context and meaning are built into the underlying database at the start (when the need for modelling is generally accepted) rather than trying to interpret and extract the meaning at the end. Yes, it still means modelling, but all new database applications require data modelling. With Arches, and other CIDOC CRM applications, CRM models (templates) can potentially be published and reused, making the modelling process far more collaborative than traditional artificial Closed World modelling.
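
As a rough illustration of what building context in "at the start" can look like, the Python sketch below turns a flat catalogue row into event-based statements. It is not taken from Arches. The record fields (`id`, `maker`, `made_from`, `made_to`), the identifiers and the function name are hypothetical; only the CRM class and property names are real.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def record_to_triples(record: Dict[str, str]) -> List[Triple]:
    """Express a flat row as explicit, event-based statements.

    The production event and its time span become nodes in their own
    right, so "who made it" and "when" stay attached to the event rather
    than being loose columns on the object.
    """
    obj = record["id"]
    production = obj + "/production"   # hypothetical identifier scheme
    timespan = production + "/timespan"
    return [
        (obj, "rdf:type", "crm:E22_Man-Made_Object"),
        (obj, "crm:P108i_was_produced_by", production),
        (production, "rdf:type", "crm:E12_Production"),
        (production, "crm:P14_carried_out_by", record["maker"]),
        (production, "crm:P4_has_time-span", timespan),
        (timespan, "crm:P82a_begin_of_the_begin", record["made_from"]),
        (timespan, "crm:P82b_end_of_the_end", record["made_to"]),
    ]


# A made-up flat row of the kind a closed world inventory might hold.
flat_row = {
    "id": "object/1",
    "maker": "actor/42",
    "made_from": "-0035-01-01",
    "made_to": "-0010-12-31",
}

if __name__ == "__main__":
    for triple in record_to_triples(flat_row):
        print(triple)
```

The flat row and the statements carry the same values. The difference is that the relationships between them are now stated in the data instead of being left to the application or the reader.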

By using an underlying data model that natively supports context and semantics, and not locking it away in other components that can’t be easily (or are not) transformed into data, Arches bridges the gap between the Closed World information systems and contextual Open World requirements. Arches now joins a growing portfolio of innovative CIDOC CRM applications.












Museum Documentation – Moving from the Closed, to the Open World


The Open World

As cultural heritage organisations increase their engagement with the Web and the ‘open’ digital world it is becoming increasingly important that they don’t simply apply the same methods and practices that they continue to use in their internal, closed world. This is particularly important for the documentation of cultural objects which must radically change if museums are to become valued open world digital organisations.

Current collection management systems are based on standards and techniques designed for a closed world environment. They record information in ways that, when combined with the knowledge of internal curators and experts, are useful for internal purposes. However, the technical transfer or publishing of this data to the Web effectively creates a flat linear resource which is separated from this internal knowledge, significantly limiting its uses and value.

Using knowledge representation methods that attempt to transfer some of the missing context and semantics enhances the data considerably but these methods (Semantic Web ontologies) could provide a far better representation if the original method of documentation was not so affected by the closed world mindset. However, in terms of existing documentation this is the legacy that we have, and no-one involved in the past in defining collection digitisation anticipated or understood the potential that open world environments might provide for collection data. Revisiting that documentation raises a number of issues.

In new digitisation projects, however, these closed world mindsets no longer have to be applied. Yet in the same way that much of the cultural heritage world has so far failed to realise the potential of the Web beyond electronic publishing for human consumption and replicating the same things they did with hard copy publishing, we also seem intent on continuing with the same type of closed world documentation even though we know that open world requirements and benefits are different. Even for special projects we assume that we need to use the same approach and documentation standards (albeit with different tools) that we are using with our internal collection system – a misplaced assumption about making new digitisation conform to legacy data and systems.

This is a big mistake. When we look at new digitisation we need to use approaches that enhance the possibilities of the data, not give it the same ‘closed world’ limitations. We need new approaches to documentation that are not based on the premise of creating an internal inventory catalogue, but rather ones that directly embed more of the experts’ knowledge (curators, archivists, librarians, academics) into the data and therefore provide a richer source for knowledge representation methods that can benefit a wider range of users – including cultural institutions themselves.

Museum curators need to understand these new possibilities, take the initiative and insist that documentation and technical departments that are still working with legacy closed world standards and approaches do not continue to limit the possibilities of new data. Ultimately, the way that we document objects in museums needs to change to reflect the fact that we no longer digitise simply to keep an internal record, but instead to provide a valuable, rich and engaging resource for a range of different uses. Without recognising these necessary changes we risk having a far larger legacy of data that we will inevitably need to re-visit.

Dominic Oldman




How should we treat data? Like we were Humanists



It strikes me how we (digital humanists) have a very different relationship with structured data compared to the one we have with text and literature. This seems to me to be reflected in the way that we treat data. While many different initiatives around the world attempt to bring data from cultural organisations together, we seem intent on accepting a narrow view about the possibilities of data, computers and the interaction of people, and as a result are happy to ignore the possibilities and benefits of capturing the context (and meaning) attached to data by the experts (people) who produced it and who continually update and develop it. If humanist researchers digitise a book to learn more about it, isn’t the objective to discover more, to discover the hidden relationships and meanings and make connections with other evidence that we have? Do we seek to exclude the elements of it that would give us this insight and throw them away? If not, then why do we accept this situation with cultural data?

Many cultural information systems were designed as closed systems to be used internally in union with the knowledge of the institution and its experts. The original data schemas were often produced to create a functional inventory or reference, and as an internal system they offer a valuable resource – but they are used in combination with other internal knowledge about the data (a knowledge built up over time). If you separate data from its institutional knowledge and context then you lose this essential part of the overall ‘information system’. This is why, when we represent data it should not be just a technical process. It should involve and add institutional knowledge to ensure that the data carries with it as much of this additional and valuable local meaning as possible. Data providers, the institutions themselves, could be providing data that is far more expressive and far more likely to help people (researchers, teachers, ‘the public’ and the institutions themselves) understand their relationship with the past – the type of representations that we take for granted when working on digital literature projects.

“…what would we be without memory? We would not be capable of ordering even the simplest thoughts, the most sensitive heart would lose the ability to show affection, our existence would be a mere never-ending chain of meaningless moments, and there would not be the faintest trace of a past.”   Max Sebald (The Rings of Saturn)

Let’s not make data meaningless and technical, devoid of memory and perspective. Let’s treat it in such a way that it can also evoke meaningful and long lasting memories, and let’s allow it to make connections between different memories (perhaps ones separated by time and place) many of which have been long since forgotten and locked away in our knowledge/memory silos. Let’s use data to produce powerful narratives about history – like we do with literature. Let’s treat data like we were humanists.

For a more formal version of this blog see: 


*By Kathy Kimpel (Flickr: IMG_0327) [CC-BY-2.0]

A British Museum Endpoint CIDOC CRM Example Query

An example of a query from the Endpoint (link


#The query retrieves gold coins from a particular time span and cultural period which have the word Augustus inscribed on them.
PREFIX crm: <>                  #Prefix for the CIDOC-CRM
PREFIX skos: <>               #Prefix for SKOS terminologies
PREFIX xsd: <>                #Prefix for XML Schema datatypes

PREFIX rdf: <>
PREFIX rdfs: <>
PREFIX bmo: <>    #British Museum Ontology PX_
PREFIX fts: <>                #OWLIM Fast Text search

SELECT
?object                        #Object URI
?objectphy                     #Object physical description (sub property of P3_has_note)
?inscriptext                   #Inscription text
?startdate                     #xsd start date for range
?enddate                       #xsd end date for range
?comment                       #curatorial comments on the object (sub property of P3_has_note)

WHERE {

?object crm:P2_has_type ?objecttype .                        #An object has an object type
?objecttype skos:prefLabel "coin" .                          #the object type is "coin"

?object bmo:PX_physical_description ?objectphy .             #has a physical description
?object bmo:PX_curatorial_comment  ?comment .                #has a curatorial comment

?object crm:P130_shows_features_of ?inscription .            #object has a feature
?inscription rdf:type crm:E34_Inscription .                  #the feature is an inscription
?inscription rdfs:label ?inscriptext .                       #the inscription label has some text

?object crm:P45_consists_of ?materialid .                    #The object is made of a material
?materialid skos:prefLabel ?metal .                          #The material label has some text

?object crm:P108i_was_produced_by ?production .              #The object was produced through a production event
?production crm:P9_consists_of ?productionpart  .            #The production event consists of parts
?productionpart crm:P4_has_time-span ?timespandate  .        #One part describes a time span
?timespandate crm:P82a_begin_of_the_begin ?startdate .       #The time span has a beginning date
?timespandate crm:P82b_end_of_the_end ?enddate .             #The time span has an end date

?object crm:P108i_was_produced_by ?production .              #The object was produced through a production event
?production crm:P9_consists_of ?productionpart1  .           #The production event consists of parts
?productionpart1 crm:P10_falls_within ?matcultureid .        #One part describes the cultural period
?matcultureid skos:prefLabel ?period .                       #The period has a descriptive label
#FILTER regex(str(?period),"Roman Imperial","i")             #You can look the id up using regex
#<Imperial:Roman:> fts:exactMatch ?period .                  #You can also do this with OWLIM FTS
?matcultureid skos:prefLabel "Roman Imperial" .              #You can also use the exact literal if you know it

FILTER ((?startdate >= "-0035-01-01"^^xsd:date) &&           #the date range
        (?enddate <= "-0010-12-31"^^xsd:date))

FILTER (regex(?inscriptext, "AVGVST","i"))                   #the inscription contains the text AVGVST (Augustus)

FILTER (regex(?metal, "GOLD","i"))                           #the coin is made of gold

}
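
For readers who want to run a query like this from a script rather than a web form, the following is a minimal Python sketch using only the standard library. It assumes the endpoint follows the standard SPARQL Protocol (query sent as a `query` parameter, results requested as SPARQL JSON). The endpoint address, the function names and the sample result are placeholders for illustration, not details of the British Museum service.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Placeholder only: the real endpoint address is not given in the text.
ENDPOINT_URL = "https://example.org/sparql"


def build_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL Protocol GET request for a query."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results_json: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into one plain dict per result row."""
    data = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# A made-up response in the SPARQL JSON results format, used here so the
# sketch can be tried without network access.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["object", "inscriptext"]},
    "results": {"bindings": [
        {
            "object": {"type": "uri", "value": "http://example.org/object/1"},
            "inscriptext": {"type": "literal", "value": "AVGVSTVS"},
        }
    ]},
})

if __name__ == "__main__":
    for row in rows_from_results(SAMPLE_RESULTS):
        print(row)
```

To query a live service, the request returned by `build_request` would be passed to `urllib.request.urlopen` and the response body handed to `rows_from_results`.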



A Museum Mapping for the Real World

The BM Mapping in a Nutshell

The contextual generalisations of the Conceptual Reference Model (CRM) have been derived by examining a wide range of data models particularly from the cultural heritage and museums sector. This means that many of the concepts are familiar to those working with museum documentation. The following provides a summary of the mapping for a typical British Museum object based on some key CRM properties (it is not comprehensive and does not describe all elements). As you will see the CRM closely follows the events and concepts that many museum staff find familiar and uses real world descriptions rather than the technical labels commonly found in database tables. The CRM differentiates itself from other aggregating frameworks because it does not attempt to strip down data sources into a common set of fields but instead provides a model for harmonising rich and complex data sets, from whatever source and however formed, to derive the maximum benefit.

The CRM requires an understanding of the domain and range of the properties that it uses. Every CRM property has a carefully selected scope of use. It will only be applicable for a specified domain (the subject of an RDF triple) and a specified range (the object of a triple). The CRM has a structure of classes and, while the domain and range are specified at the highest appropriate class, its sub-classes are also implied. Therefore the range of E39_Actor includes the sub-classes E21_Person and E74_Group. A domain of E5_Event includes the sub-class E7_Activity and its sub-class E8_Acquisition – and so on.
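
The way sub-classes are implied can be sketched in a few lines of Python. This is only an illustration: the dictionary holds just the classes named above, and the names `SUPERCLASS` and `is_a` are invented for the example rather than part of any CRM tooling.

```python
from typing import Dict, Optional

# Fragment of the class hierarchy, limited to the classes named in the
# text. Each class maps to its direct superclass.
SUPERCLASS: Dict[str, str] = {
    "E21_Person": "E39_Actor",
    "E74_Group": "E39_Actor",
    "E7_Activity": "E5_Event",
    "E8_Acquisition": "E7_Activity",
}


def is_a(cls: str, ancestor: str) -> bool:
    """Return True if cls is the ancestor itself or one of its sub-classes."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERCLASS.get(current)  # None once the top is reached
    return False


if __name__ == "__main__":
    # A range of E39_Actor accepts a person or a group.
    print(is_a("E21_Person", "E39_Actor"))
    # A domain of E5_Event accepts an acquisition, two levels down.
    print(is_a("E8_Acquisition", "E5_Event"))
```

The check only runs upwards: an acquisition counts as an event, but an event does not count as an acquisition.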

The following summary is not an instruction in mapping CRM but an exercise in demonstrating the appropriateness of the ontology to cultural heritage objects. The narrative underlines terms that are reflected in the labels of properties and classes that are listed (again not comprehensively or in any order) next to the narratives. The summary is taken from a larger document that does describe the mapping process in more detail.

Narrative Properties Classes
Museums hold objects that tell the history of the world. These objects sometimes have a title and are recorded with an identifier (an accession number). Some objects form part of a sub-collection with a collection title.
Properties:
  • P102_has_title
  • P1_is_identified_by
  • P46i_forms_part_of
Classes:
  • E22_Man-Made_Object
  • E35_Title
  • E42_Identifier
  • E78_Collection
CRM Mapping Note: The domain of the property P102_has_title is E71_Man-Made_Thing. A BM object is typed as an E22_Man-Made_Object which is part of the E71_Man-Made_Thing class hierarchy. Therefore P102_has_title can be used with a man-made object. The range is E35_Title. Therefore the node that the triple uses as an object node (as in subject – predicate – object) must be of type E35_Title.
The domain of P1_is_identified_by is E1_CRM_Entity (so any entity in the CRM could have an identifier). The range is E41_Appellation. E41_Appellation has sub-classes that include E42_Identifier. Therefore to make the triple using P1_is_identified_by valid the object node in the triple is, and is typed as, an E42_Identifier.
Lastly, P46i_forms_part_of is the inverse of the property P46_is_composed_of, and is used to show that the object forms part of a collection. It has a domain and a range of E18_Physical_Thing. This class includes the sub-class E24_Physical_Man-Made_Thing which in turn has the sub-class E78_Collection. Therefore a collection is a type of E18_Physical_Thing and is valid for the mapping.
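
The domain and range reasoning in this note can be expressed as a small check. The Python sketch below assumes a deliberately simplified hierarchy: intermediate CRM classes are omitted and only the classes mentioned in the note appear. The names `PARENTS`, `PROPERTIES` and `triple_is_valid` are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified hierarchy: each class maps to its direct superclasses.
# Intermediate CRM classes are left out, so the links to E1_CRM_Entity
# are shortcuts rather than the full chain.
PARENTS: Dict[str, List[str]] = {
    "E22_Man-Made_Object": ["E24_Physical_Man-Made_Thing"],
    "E78_Collection": ["E24_Physical_Man-Made_Thing"],
    "E24_Physical_Man-Made_Thing": ["E18_Physical_Thing", "E71_Man-Made_Thing"],
    "E42_Identifier": ["E41_Appellation"],
    "E18_Physical_Thing": ["E1_CRM_Entity"],
    "E71_Man-Made_Thing": ["E1_CRM_Entity"],
    "E41_Appellation": ["E1_CRM_Entity"],
    "E35_Title": ["E1_CRM_Entity"],
}

# Property name -> (domain, range), as stated in the mapping note.
PROPERTIES: Dict[str, Tuple[str, str]] = {
    "P102_has_title": ("E71_Man-Made_Thing", "E35_Title"),
    "P1_is_identified_by": ("E1_CRM_Entity", "E41_Appellation"),
    "P46i_forms_part_of": ("E18_Physical_Thing", "E18_Physical_Thing"),
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or inherits from it through any parent."""
    if cls == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in PARENTS.get(cls, []))


def triple_is_valid(subject_type: str, prop: str, object_type: str) -> bool:
    """Check the subject against the domain and the object against the range."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_type, domain) and is_a(object_type, range_)


if __name__ == "__main__":
    print(triple_is_valid("E22_Man-Made_Object", "P102_has_title", "E35_Title"))
    print(triple_is_valid("E22_Man-Made_Object", "P102_has_title", "E42_Identifier"))
```

An object with a title passes. An identifier offered where a title is expected is rejected, and this is the integrity the mapping relies on.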
Most collection catalogue databases will allow curators to write some comments or notes about the object.
  • P3_has_note
  • E62_String
CRM Mapping Note: As you might expect, P3_has_note has the domain of E1_CRM_Entity and can therefore apply to any triple subject (you can write notes about anything). Its range is E62_String and therefore the node it points to must be of type E62_String. Straightforward, yes? (Note: You may be starting to understand how the CRM ensures integrity of mapping. This is essential for the end product to make sense, but also ensures data harmonisation.)
Museums will record where the object came from and therefore the details of the various transfers of it from one person or organisation to another, and ultimately to the current owner. However, the current owner could be a third party if the object is on loan, and the acquisition may simply be a transfer of custody rather than of ownership.
  • P23_transferred_title_from
  • P51_has_former_or_current_owner
  • P52_has_current_owner
  • P28_custody_surrendered_by
  • E22_Man-Made_Object
  • E8_Acquisition
  • E10_Transfer_of_Custody
CRM Mapping Note: P23_transferred_title_from is a predicate that uses a subject node with a type of E8_Acquisition (domain) but must refer (range) to an E39_Actor (e.g. a person, E21, or a group, E74). This makes sense because the object must come from some sort of group or person. For P51_has_former_or_current_owner we are talking about the object’s (E22_Man-Made_Object) former owner: the domain is E18_Physical_Thing and the range is E39_Actor again (acquisitions work around people or organisations). Likewise the property P52_has_current_owner also operates in the domain of the physical thing (E18_Physical_Thing) and the range of an actor. P28_custody_surrendered_by has a range of E39_Actor but the domain is E10_Transfer_of_Custody. This triple operates between the acquisition node (typed as a transfer of custody as well as an acquisition) and the actor from which the object was transferred. Other forms of transfer exist, like P24_transferred_ownership_through (rather than ‘from’). The semantics are different and therefore there will be different forms of acquisition mapping. We call these different constructs.
In some cases details of where an object was originally found are known and recorded. The find itself is an event at which the object was present.
  • P12i_was_present_at
  • EX_Discovery (BM specialisation)
CRM Mapping Note: P12i_was_present_at is the inverse of the property P12_occurred_in_the_presence_of, which has the domain E5_Event. The Museum has created a sub-class of E5_Event called EX_Discovery to describe the event of discovery of an object. If the CRM doesn’t have a class that describes your entity fully then you can usually create a sub-class of an existing CRM class. The BM has limited the number of class specialisations to the absolute minimum and instead made use of typing by vocabularies.
Further investigation of the object will often provide more information about how the object was created or produced in the first place. Like an acquisition or a find, a production is an event with a range of useful information, for example the technique used to produce the object. The BM records the broad production types to support precise searching.
  • P108i_was_produced_by
  • P32_used_general_technique
  • E12_Production
  • E55_Type
CRM Mapping Note: P108i_was_produced_by provides the initial relationship between the collection item and the production event node. Therefore the domain must be the classes that describe an object, in this case E24_Physical_Man-Made_Thing, which clearly denotes that this is an artificial thing that has been produced (and encompasses E22_Man-Made_Object). P32_used_general_technique works with activities (E7_Activity being the domain) and production is indeed an activity because it is a sub-class of E11_Modification which is, in turn, a sub-class of E7_Activity. The British Museum then uses a thesaurus of technique terms in a SKOS format. The term itself is typed as E55_Type, which is the range of P32_used_general_technique.
The period within which production falls is a key piece of information and may be accompanied by a date or specific time period.
  • P10_falls_within
  • P4_has_time_span
  • E52_Time_Span
CRM Mapping Note: P10_falls_within has both a domain and a range of E4_Period. An example of a period is an E5_Event, and all activities are therefore within the sub-classes of E4_Period, including, say, an E8_Acquisition. Therefore in the mapping we can use P10_falls_within with any event object but must ensure that the triple subject comes within the scope of E4_Period before defining the details of the period. This is done by creating an appropriate date URI, typed as a time span, to hold the date information.
People and places are commonly associated with production information. The people, groups or artistic schools who carried out the production of the object and the locations where production took place (which might be various) are important material aspects of the object.
  • P14_carried_out_by
  • P7_took_place_at
  • E21_Person
  • E39_Actor
  • E74_Group
  • E53_Place
CRM Mapping Note: The most frequent use of the generalisation P14_carried_out_by in the Museum’s mapping is in production, and in particular the relationship with people and places (using P7_took_place_at). Unsurprisingly, P14_carried_out_by has a domain of E7_Activity (as production is also an event) and a range of E39_Actor. P7_took_place_at applies to the same production node (its declared domain is the broader E4_Period, which includes activities) but the range is, of course, E53_Place.
Other people indirectly involved in the process might be those who influenced the production (another artist for example) or were the motivation for it, like a coin minted for an emperor. Otherwise an object might have been made for an event.
  • P15_was_influenced_by
  • P17_was_motivated_by
  • E21_Person
  • E39_Actor
  • E74_Group
  • E53_Place
  • E5_Event
CRM Mapping Note: P15_was_influenced_by and P17_was_motivated_by are again talking about the domain of ‘activities’. In the BM mapping these might be people or groups who have influenced or motivated a production process (although P15 and P17 have a range which includes all CRM entities, E1_CRM_Entity). These are generalisations that are very open to reification with internal vocabularies. For example, the motivation of something through an authority, like an emperor, is useful information to add to the mapping.
If the object has an inscription on it then this is a type of visual item which might directly refer to a subject, place, person or group.
  • P65_shows_visual_item
  • P67_refers
  • E65_Creation
  • E34_Inscription
CRM Mapping Note: P65_shows_visual_item has a domain of E24_Physical_Man-Made_Thing and a range of E36_Visual_Item. In the mapping this is a node created for the purpose of using P67_refers, which has the domain of E89_Propositional_Object, a class containing immaterial objects like information objects (E73_Information_Object). An information object includes a visual item (E36_Visual_Item) that itself contains a class for ‘markings’ (E37_Mark), and this has the sub-class for inscriptions (E34_Inscription), which is therefore a valid range for P65_shows_visual_item.
An inscription is a type of production process (a creation) in its own right and therefore we may record specific production information against it including the person who carried out the inscription (who may be different from the object producer).
  • P14_carried_out_by
  • E21_Person
CRM Mapping Note: See production above.
The object may directly depict, or visually represent as an image, a place, person, group and so on.
  • P62_depicts
  • P65_shows_visual_item
  • P138_represents
  • E38_Image
  • E21_Person
  • E39_Actor
  • E53_Place
CRM Mapping Note: Depiction is a shortcut for a visual image (picture) representation. P65_shows_visual_item can refer to a picture (image) on the object itself (the domain and range above). Instead of P67_refers, a pictorial representation uses the property P138_represents. This operates with (you guessed it) an E36_Visual_Item and any other CRM entity (E1_CRM_Entity).
The object may also have more indirect associations to these things and carry references. The object may also have conceptual subjects (information object) which can tell people more about the object and its meaning.
  • P128_carries
  • P67_refers
  • P129_is_about
  • E73_Information_Object
CRM Mapping Note: An object may refer to something in a more indirect, conceptual way. The Museum has invented a URI node called ‘concept’ and typed it as an E73_Information_Object. P128_carries can use this node as its range (with a domain of E24_Physical_Man-Made_Thing, the physical object) and this provides the basis for a reference to an E39_Actor (like an ethnic group, e.g. this visual design alludes to a particular culture) or a recorded event. P67_refers has the domain E89_Propositional_Object (a sub-class of E28_Conceptual_Object) and can have a range of any concept. P128 has the range E90_Symbolic_Object (including the “aggregation of symbols”), covering subject terms.
Other more technical information is also recorded against the object like the material it was made out of (or that it consists of).
  • P45_consists_of
  • E57_Material
CRM Mapping Note: This one is straightforward. P45_consists_of has a domain of E18_Physical_Thing and a range of E57_Material. The material would be a thesaurus identifier leading to a SKOS schema for the term.
Dimension measurements are taken for the object and these are stored as values and units.
  • P43_has_dimension
  • P90_has_value
  • P91_has_unit
  • E54_Dimension
CRM Mapping Note: P43_has_dimension operates over E70_Thing (our object again) and has the range of an E54_Dimension. E54 provides the domain for P90_has_value, which has the range of E60_Number. P91_has_unit has the same domain but the range E58_Measurement_Unit.
Objects will be documented in bibliographic material which is created through publishing and authoring. This includes journals (a component of a series) and references that are part (components) of a collection.
  • P70i_is_documented_in
  • P94i_was_created_by
  • P148i_is_a_component_of
  • E31_Document
  • EX_Bibliographic_Series
  • E65_Creation
CRM Mapping Note: Not surprisingly, P70_documents (inverse: P70i_is_documented_in) has the domain E31_Document and can apply to any CRM entity. P94_has_created (inverse: P94i_was_created_by) has the domain E65_Creation and applies to E28_Conceptual_Object. In this case the concept is ‘Authoring’. P148i_is_a_component_of refers to a document being part of an E89_Propositional_Object.
People might be identified with different names (or appellations) and the Museum records people who belong to or were members of a school (of art, for example). People also belong to different national groups.
  • P131_is_identified_by
  • P107i_is_current_or_former_member_of
  • E39_Actor
  • E21_Person
  • E74_Group
CRM Mapping Note: P131 is used specifically to identify the name of an E39_Actor, with the range E82_Actor_Appellation. P107 deals with members of a group, with the domain being E74_Group and a range of E39_Actor.
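
To make the summary above more concrete, the following sketch writes a few of the rows as subject, predicate, object triples and then walks from an object to the person who produced it. It is an illustration only: every `ex:` identifier is invented, the triples are plain Python tuples rather than a real RDF store, and the helper names `objects_of` and `follow` are hypothetical.

```python
# A hypothetical object record expressed as (subject, predicate, object) triples.
# All "ex:" identifiers are invented for illustration; class and property
# labels are the ones used in the summary above.
TRIPLES = [
    ("ex:object/1", "rdf:type", "E22_Man-Made_Object"),
    ("ex:object/1", "P102_has_title", "ex:object/1/title"),
    ("ex:object/1/title", "rdf:type", "E35_Title"),
    ("ex:object/1", "P108i_was_produced_by", "ex:object/1/production"),
    ("ex:object/1/production", "rdf:type", "E12_Production"),
    ("ex:object/1/production", "P14_carried_out_by", "ex:person/42"),
    ("ex:person/42", "rdf:type", "E21_Person"),
    ("ex:object/1/production", "P7_took_place_at", "ex:place/7"),
    ("ex:place/7", "rdf:type", "E53_Place"),
]


def objects_of(subject, predicate, triples=TRIPLES):
    """Return every object node linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow(start, path, triples=TRIPLES):
    """Follow a chain of predicates from start and return the nodes reached."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in objects_of(n, predicate, triples)]
    return nodes
```

Calling `follow("ex:object/1", ["P108i_was_produced_by", "P14_carried_out_by"])` walks from the object through its production event to the actor. The point of the sketch is that the producer is not a field on the object; it is reached through an event node, which is where the time, place and technique of production can also be attached.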

The Costs of Cultural Heritage Data Services: The CIDOC CRM or Aggregator formats?

Martin Doerr (Research Director at the Information Systems Laboratory and Head of the Centre for Cultural Informatics, FORTH)
Dominic Oldman (Principal Investigator of ResearchSpace, Deputy Head IS, British Museum)

June 2013

Many larger cultural institutions are gradually increasing their engagement with the Internet and contributing to the growing provision of integrated and collaborative data services. This occurs in parallel with the emerging so-called aggregation services, which seemingly strive to achieve the same goal. On closer inspection, however, there are quite fundamental differences that produce very different outcomes.

Traditional knowledge production occurred in an author’s private space or a lab with local records, notwithstanding field research. This space or lab may be part of an institution such as a museum. The author (scholar or scientist) would publish results, and once the content was made accessible it would be collected by libraries. The author ultimately knows how to interpret the statements in his/her publication and relate them to the reality referred to in the publication, from the field, from a lab or from a collection. Many authors are also curators of knowledge and things.

The librarian would not know this context, would not be a specialist in the respective field, and therefore must not alter the content in any way. However, (s)he would integrate the literature under common dominant generic concepts and references, such as “Shakespeare studies”, and preserve the content.

In the current cultural-historical knowledge life-cycle, we may distinguish three levels of stewardship of knowledge: (1) the curator or academic, (2) the disciplinary institution (such as the Smithsonian, the British Museum or smaller cultural heritage bodies), and (3) the discipline-neutral aggregator (such as Europeana or IMLS-DCC). Level (2) typically acts as “provider” to the “aggregator”.

Obviously, the highest level can make the fewest assumptions about common concepts, in particular a data model, in order to integrate content. Therefore, it can offer services only for very general relationships in the provided content. On the other hand, questions needing such a global level of knowledge will be equally generic. Therefore, the challenge is NOT to find the most common fields in the provider schemata (“core fields”), but the most relevant generalizations (such as “refers to”), avoiding overgeneralizations (such as “has date”). These generalizations are for accessing content, but should NOT be confused with the demands of documenting knowledge. At that level some dozens of generic properties may be effective.
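
One way to picture a “relevant generalization” is as a super-property that more specific properties roll up to. The sketch below assumes, as we understand the CRM definition, that P70_documents, P129_is_about and P138_represents are all sub-properties of the “refers to” property (written P67_refers_to here). The table is a small fragment, the `ex:` identifiers are invented, and the helper names are hypothetical.

```python
# Sketch: querying by a generic property while data stays specialised.
# The sub-property table is an assumed fragment, not a complete CRM listing.
SUBPROPERTY_OF = {
    "P70_documents": "P67_refers_to",
    "P129_is_about": "P67_refers_to",
    "P138_represents": "P67_refers_to",
}

# Invented example data using specialised properties only.
TRIPLES = [
    ("ex:image/1", "P138_represents", "ex:person/42"),
    ("ex:article/9", "P70_documents", "ex:object/1"),
    ("ex:object/1", "P45_consists_of", "ex:material/bronze"),
]


def generalises_to(prop, general):
    """Return True if prop is general itself or one of its sub-properties."""
    while prop is not None:
        if prop == general:
            return True
        prop = SUBPROPERTY_OF.get(prop)
    return False


def query(triples, general_property):
    """Return (subject, object) pairs whose property rolls up to general_property."""
    return [(s, o) for s, p, o in triples if generalises_to(p, general_property)]
```

Here `query(TRIPLES, "P67_refers_to")` returns the image and the article statements but not the material statement, so a generic question is answered without the provider having to flatten its data first.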

The preoccupation of providers and aggregators with a common set of fields has the result that they only support rudimentary connections between the datasets they collect and, as a result, reduce the ability of researchers to determine where the most relevant knowledge may be located. As with the library, the aggregator’s infrastructure can only support views of the data (search interfaces) that reflect their own limited knowledge because the data arrives with little or no context and over-generalized cross-correlations (“see also”, “relation”, “coverage”).

The common aggregation process itself strips context away from the data, creating silos within the aggregator’s repository. Without adequate contextual information, searching becomes increasingly inadequate the larger the aggregation becomes. This limitation is passed on through any Application Programming Interfaces that the aggregator offers. Aggregators are slowly beginning to understand that metadata is an important form of content, and not only a means to query according to current technical constraints. Some aggregators, such as the German Digital Library, store and return rich “original metadata” received from providers and derive indexing data at the aggregator side, rather than asking providers to strip down their data.

The institution actually curating content must document it so that it will not only be found, but understood in the future. It therefore needs an adequate [1] representation of the context objects come from and their meaning. This representation already has some disciplinary focus, and ultimately allows for integrating the more specialized author knowledge or lab data. For instance, chronological data curves from a carbon dating (C14) lab should be integrated at a museum level (2) by exact reference to the excavation event and records, but on an aggregator level (3) may be described just by a creation date.

The current practice of provider institutions spending millions of pounds, dollars or euros to manually normalize their data directly to aggregator formats appears to be an unbelievable waste of money and knowledge. The cost of doing so far exceeds the cost of the software, of whatever sophistication. It appears much more prudent to normalize data at an institutional level to an adequate representation, from which the generic properties of a global aggregator service can be produced automatically, rather than producing, in advance of the aggregation services, another huge set of simplified data for manual integration.

This is precisely the relationship between the CRM and aggregation formats like the EDM (Europeana Data Model). The EDM is the minimal common generalization at the aggregator level, a form to index data at a first level. The CRM is a container, open for specialization, for data about cultural-historical contexts and objects. The CRM is not a format prescription. Concepts of the CRM are used as needed when the respective data appear at the provider side. There is no notion of any mandatory field. Each department can select what it regards as mandatory for its own purpose, and even specialize further, without losing the capacity for consistent global querying by CRM concepts. CRM data can automatically be transformed to other data formats, but even quite complex data in a CRM compatible form can effectively be queried by quite simple terms [3].
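
The claim that generic aggregator fields can be produced automatically from richer CRM data can be sketched as a path-based transformation. This is a minimal illustration under stated assumptions: the generic field names (`creator`, `production_place`, `material`) are invented placeholders and are not actual EDM terms, the `ex:` identifiers are made up, and a real transformation would work over an RDF store rather than a Python list.

```python
# Sketch: deriving a flat, generic record from CRM-style triples.
# Each generic field is defined as a path of CRM properties. The field
# names are hypothetical placeholders, not terms from the EDM.
GENERIC_FIELDS = {
    "creator": ["P108i_was_produced_by", "P14_carried_out_by"],
    "production_place": ["P108i_was_produced_by", "P7_took_place_at"],
    "material": ["P45_consists_of"],
}

# Invented sample data for one object.
SAMPLE = [
    ("ex:object/1", "P108i_was_produced_by", "ex:production/1"),
    ("ex:production/1", "P14_carried_out_by", "ex:person/42"),
    ("ex:object/1", "P45_consists_of", "ex:material/bronze"),
]


def follow_path(triples, start, path):
    """Return the set of nodes reached from start by following path."""
    nodes = {start}
    for predicate in path:
        nodes = {o for s, p, o in triples if s in nodes and p == predicate}
    return nodes


def to_generic_record(triples, subject):
    """Build a flat record holding only the generic fields that have values."""
    record = {}
    for field, path in GENERIC_FIELDS.items():
        values = follow_path(triples, subject, path)
        if values:
            record[field] = sorted(values)
    return record
```

For the sample data, `to_generic_record(SAMPLE, "ex:object/1")` yields a creator and a material and simply omits the production place, since nothing is mandatory. The rich triples remain the source of truth, and the flat record is a derived view that can be regenerated whenever the aggregator format changes.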

Similarly, institutions may revise their data formats such that the more generic CRM concepts can automatically be produced from them, i.e., make their formats specializations of the CRM to the degree this is needed for more global questions. For instance, the features of the detailed curve of a C14 measurement are not a subject for a query at an institutional level. Researchers would rather query to retrieve the curve as a whole.

The British Museum understands this fundamental distinction and therefore understands the different risks and costs. This means both the long term financial costs of providing data services, important to organizations with scarce resources, but also the cost to cultural heritage knowledge communities and to society in general. As a consequence they publish using the CRM standard. They also realize that data in the richer CRM format is much more likely to be comprehensible in the future than in “core metadata” form.

Summarizing, we regard publishing and providing information in a CRM compatible form [2] at the institutional or disciplinary level to be much more effective in terms of research utility (and the benefits of this research to other educational and engagement activities). The long-term costs are reduced even with further specializations of such a form, and the costs of secondary transformation algorithms to aggregation formats like EDM are marginal.

Dominic Oldman


[1] Smith, B. (2003). Ontology. In Floridi, L. (ed.), The Blackwell Guide to the Philosophy of Computing and Information, pages 155–166. Oxford: Blackwell.

[2] Official Version of the CIDOC CRM, the version 5.0.4 of the reference document.
Nick Crofts, Martin Doerr, Tony Gill, Stephen Stead, Matthew Stiff (editors), Definition of the CIDOC Conceptual Reference Model, December 2011.

[3] Tzompanaki, K., & Doerr, M. (2012). A New Framework For Querying Semantic Networks. Museums and the Web 2012: the international conference for culture and heritage on-line. April 11-14, San Diego, CA, USA.