News SWORD v2

Discussing the scope of SWORDv2

As part of the SWORD v2 developments, the Technical Advisory Panel have been busy discussing many aspects of the proposed new version of the standard.  This has been a lively and engaging process.  If you would like to read these discussions and contribute any feedback, you would be very welcome!

One particularly interesting thread came from the project’s technical lead.  The message concerns the scope of SWORD v2 (which areas it should contribute to, and which it should not):

Hi Folks,

There’s been some great discussion on the list this past week or two, and I thought it might be time for a summary of what looks to me to be a key sticking point: the scope of sword.

There are two distinct sides to this argument as it’s been articulated on this list:

a) That we should adopt the approach of a content management API like CMIS or, more likely, GData

b) That SWORD should not say anything about what happens to the content once it is sent to the server.

In general, I am against (a) for a number of reasons.  First, I am concerned that the idioms that are associated with GData are not /necessarily/ appropriate.  The hierarchical file system is a common idiom but an idiom nonetheless, and it wouldn’t be SWORD’s place to therefore build itself over the top of it.  CMIS I have a harder time refuting or accepting, so am open to persuasion either way.  Secondly, I don’t see a reason to re-create a content management standard, since they already exist.  SWORD should, instead, provide support for the things that these standards don’t provide for our sector/use cases, while not preventing the use of them.

From a purist’s perspective of (b), the main thing that SWORD offers, then, is support for Packaging (with a capital P).  This is a valuable addition to the community since it is both common in our sector and expressly not covered at least by GData and, I believe, not by CMIS (though again, open to correction).  The support for packaging, though, needs to extend to a full CRUD implementation of AtomPub, which is a large part of what the profile attempts to do.  I think we have had some good technical discussion which will allow the next draft of the profile to do better at that.

In the meantime, there are some grey-area parts of the profile, particularly In Progress and Suppress Metadata, which are more content management than they are deposit.  I, personally, think these are important; they are light touch, the profile doesn’t mandate the server to obey them, and they help fulfill known use cases.  Likewise the Statement could be viewed as more content management than not, although we have tried to pitch that as an informational resource rather than an operational one (i.e. read but not write).

What I’m going to suggest for the next draft is as follows:  we’ll put some more time into analysing the appropriate ways of updating and overwriting deposit packages using the feedback on this list.  And we will extend the profile to cover how you would use the SWORD headers in content management operations /if that’s what your implementation wants/ (e.g. how you might use Suppress Metadata or In Progress with GData).  There will, obviously, be plenty of time for comment.

In conclusion: we must constrain the scope of sword to something which doesn’t tread on anyone’s toes and is of value to the community.  Too far one way or the other and we’ll either be superseded or of no value.




Decisions regarding the challenges of SWORDv2

Following some great recent discussions by the SWORD Technical Advisory Panel, we’re pleased to announce a few decisions that have been made regarding some of the details for the new version 2 of SWORD.  The full email announcing the decisions is shown below, or can be seen in the list archives of the technical advisory group:

The decisions came about from discussions within the group over the past few weeks.  They relate to the following questions:

  1. Whether the Statement should be embedded in the Deposit Receipt or be a separate document referenced in an atom:link element: In order to allow SWORD v2 to move from a fire-and-forget methodology to one where a SWORD client can interact with the deposit through what we’re calling the ‘deposit lifecycle’, some form of feedback is required where the client can ask the server for details of what has happened to the deposited item(s).  The proposal is to support this via the provision of a ‘statement’.  Think of it a bit like a bank account statement: you can see what has gone into the bank account (deposits), what might have happened to the deposit (e.g. interest being added), and full details of the item.  The question here was whether a copy of the statement should be given to the SWORD client when it makes the deposit(s), or whether the client should ask for a copy of the statement whenever it wants it.
  2. Whether to use OAI-ORE for the Statement format or an Atom Feed (as per CMIS and GData): There is a decision to be made as to how the statement should be formatted.  Should it be formatted as an OAI-ORE resource map, or as an Atom Feed?  There are pros and cons for each method.
  3. How the client and server should negotiate over the format of the content returned by the edit-media link (EM-URI): If multiple formats of statement are allowed, how should the client and server come to an agreement as to which is the best format to send, based upon a combination of the server’s capabilities and the client’s preferences?  This problem is known as content negotiation.

The full email below outlines these problems and the decisions made.  The next job is to attempt an implementation of the standard; based on the experiences of the developers and initial users, the standard will likely be refined further.

Dear All,

Thanks for your extensive feedback on the various issues that we have been discussing on this list; it has been really valuable for the project team to get this input.  We have, we think, identified 3 particular issues of contention:

1/ Whether the Statement should be embedded in the Deposit Receipt or be a separate document referenced in an atom:link element

2/ Whether to use OAI-ORE for the Statement format or an Atom Feed (as per CMIS and GData)

3/ How the client and server should negotiate over the format of the content returned by the edit-media link (EM-URI)

The project team has gone through each of these issues carefully, and attempted to extract the simplest solutions but with a view to keeping the SWORD 2.0 specification quite open at this stage, so that community best practices can actually inform the standard itself in the long run.

Therefore, we’re proposing the following approaches to these issues:

1/ Whether the Statement should be embedded in the Deposit Receipt or be a separate document referenced in an atom:link element

If the Statement is to be embedded in the Deposit Receipt, then it needs really to be in OAI-ORE form, for the purposes of being clear foreign markup.  Nonetheless, bearing in mind that there is a question as to whether the Statement should be an Atom Feed, it is clear that this solution will not be adequate by itself.  We therefore propose that the standard provided to the project’s funded developers to code against says that an OAI-ORE serialisation MAY be embedded in the Deposit Receipt (the Deposit Receipt will not be required to meet the OAI-ORE spec for being a resource map itself).

Alongside this, or instead of it, there MAY be one or more atom:link elements in the Deposit Receipt which link to an external Statement.  These atom:link elements can specify their type attribute to say whether they are application/rdf+xml or application/atom+xml;type=feed.  It will be a requirement of the spec that there MUST be an embedded Statement or at least one separate Statement.

Therefore, you may see a Deposit Receipt like:

  <atom:link rel="" type="application/rdf+xml" href="http://....."/>
    <!-- ORE statement goes here -->
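The requirement in (1), that a Deposit Receipt carries an embedded Statement or at least one linked one, could be checked by a client along the following lines.  This is only a sketch under several assumptions that the email does not settle: that the receipt is an atom:entry, that an embedded ORE Statement appears as an rdf:RDF child element, and that statement links are recognised purely by their type attribute (the rel value is left empty in the example above).  The example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The two link types named in the email.
STATEMENT_TYPES = {"application/rdf+xml", "application/atom+xml;type=feed"}


def normalise(media_type):
    """Drop whitespace and lowercase, so 'a/b; type=feed' matches 'a/b;type=feed'."""
    return "".join(media_type.split()).lower()


def has_statement(receipt_xml):
    """Return True if the receipt embeds a Statement or links to at least one."""
    entry = ET.fromstring(receipt_xml)
    # Assumption: an embedded ORE Statement is an rdf:RDF child of the entry.
    embedded = entry.find(f"{{{RDF_NS}}}RDF") is not None
    # Assumption: statement links are identified by their type attribute alone.
    linked = any(
        normalise(link.get("type", "")) in STATEMENT_TYPES
        for link in entry.iter(f"{{{ATOM_NS}}}link")
    )
    return embedded or linked


# Hypothetical receipts for illustration only.
LINKED = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="" type="application/rdf+xml" href="http://example.org/statement.rdf"/>
</atom:entry>"""

EMBEDDED = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF><!-- ORE statement goes here --></rdf:RDF>
</atom:entry>"""

NEITHER = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"/>"""

if __name__ == "__main__":
    for name, doc in [("linked", LINKED), ("embedded", EMBEDDED), ("neither", NEITHER)]:
        print(name, has_statement(doc))
```

A receipt with neither form would fail the check, which is the case the MUST in the proposal is meant to rule out.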

2/ Whether to use OAI-ORE for the Statement format or an Atom Feed (as per CMIS and GData)

Another good reason for the approach in (1) is that this means we can provide different Statement URIs with different type attributes.  We plan to ask developers to produce an ORE and an Atom Feed Statement format under the project funding.  So you may see a Deposit Receipt like:

  <atom:link rel="" type="application/rdf+xml" href="http://....."/>
  <atom:link rel="" type="application/atom+xml;type=feed" href="http://....."/>
      <!-- ORE statement goes here -->
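To illustrate how a client might make use of several Statement links with different type attributes, here is a minimal sketch that picks one link according to the client’s own order of preference.  The wrapping atom:entry element, the example.org URLs and the function names are assumptions made for the example; the email only shows the atom:link elements themselves.

```python
import xml.etree.ElementTree as ET

ATOM_LINK = "{http://www.w3.org/2005/Atom}link"

# Hypothetical Deposit Receipt; only the two atom:link elements follow the email.
RECEIPT = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="" type="application/rdf+xml" href="http://example.org/statement.rdf"/>
  <atom:link rel="" type="application/atom+xml;type=feed" href="http://example.org/statement.atom"/>
</atom:entry>"""


def statement_links(receipt_xml):
    """Map each link's type attribute to its href, skipping incomplete links."""
    entry = ET.fromstring(receipt_xml)
    return {
        link.get("type"): link.get("href")
        for link in entry.iter(ATOM_LINK)
        if link.get("type") and link.get("href")
    }


def choose_statement(receipt_xml, preferred_types):
    """Return the href of the first preferred type the server offers, else None."""
    links = statement_links(receipt_xml)
    for media_type in preferred_types:
        if media_type in links:
            return links[media_type]
    return None


if __name__ == "__main__":
    # A client that would rather read an Atom Feed, falling back to ORE.
    print(choose_statement(RECEIPT, ["application/atom+xml;type=feed",
                                     "application/rdf+xml"]))
```

Because the choice is made on the client side from what the receipt advertises, a server is free to offer only one of the two formats.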

The combination of approaches in (1) and (2) may seem woolly or indecisive, but we believe that we can’t determine in advance which of these approaches is better, and that it should be up to the community of users and implementers to decide which approach works best based on actual usage of the developed software.  Therefore, while the burden of implementation is placed on the funded portion of the project, we expect community driven implementations/usages to favour one approach over another (possibly taking into account things like compatibility with GData and CMIS, or preferring the more semantic web approach of ORE). We can then use this information later in deriving a SWORD spec which is based on best practices.

3/ How the client and server should negotiate over the format of the content returned by the edit-media link (EM-URI)

The Content Negotiation issue arises from the fact that AtomPub requires at most one edit-media URI with a given type to be available in the Atom Entry (Deposit Receipt).  Since the SWORD server may contain multiple files rather than the one file that AtomPub assumes, what this EM-URI returns under GET is unclear.  We initially considered 2 approaches:

a/    A separate HTTP header like Accept-Packaging to allow content negotiation on a package format
b/    A separate HTTP header like Accept-Media-Features to allow general content negotiation on feature sets

As we discussed, both of these have pros and cons, and neither approach is backed by any established best practice, which makes the project team unwilling to commit to anything too complex or substantial at a risk to the simplicity and overall success of SWORD.  Instead we are suggesting adopting a much simpler approach:

The Deposit Receipt can already contain a sword:package element (as per SWORD 1.3), and SWORD 2 plans to allow an arbitrary number of such elements.  These elements will describe the packaging formats supported by the server, so the client will know in advance what the capabilities of the server are.  Therefore, instead of engaging in a content negotiation process, the client will just specify a separate HTTP header indicating what package format should be returned.  Whether this header re-uses the Packaging header used during deposit or specifies a new header has yet to be decided.
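As a rough sketch of this simpler approach, a client could read the advertised package formats from the Deposit Receipt and then name the one it wants in a request header.  Several things here are assumptions and not part of the proposal: the header name is still undecided, so "Packaging" is only a placeholder; the SWORD namespace URI, the package format identifiers and the EM-URI are illustrative.  The request is built but never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SWORD_NS = "http://purl.org/net/sword/"  # assumed namespace URI
PACKAGING_HEADER = "Packaging"           # placeholder; the real name is undecided

# Hypothetical receipt advertising two package formats.
RECEIPT = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:sword="http://purl.org/net/sword/">
  <sword:package>http://example.org/packaging/format-a</sword:package>
  <sword:package>http://example.org/packaging/format-b</sword:package>
</atom:entry>"""


def supported_packages(receipt_xml):
    """List the package formats the server says it supports."""
    entry = ET.fromstring(receipt_xml)
    return [
        el.text.strip()
        for el in entry.iter(f"{{{SWORD_NS}}}package")
        if el.text and el.text.strip()
    ]


def build_package_request(em_uri, wanted, receipt_xml):
    """Build (but do not send) a GET on the EM-URI asking for one package format."""
    if wanted not in supported_packages(receipt_xml):
        raise ValueError(f"server does not advertise package format {wanted!r}")
    return urllib.request.Request(
        em_uri, headers={PACKAGING_HEADER: wanted}, method="GET"
    )


if __name__ == "__main__":
    req = build_package_request(
        "http://example.org/em-uri",
        "http://example.org/packaging/format-b",
        RECEIPT,
    )
    print(req.get_method(), req.full_url, req.header_items())
```

Since the client already knows the server’s capabilities from the receipt, the unsupported case can be caught locally before any request is made.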

Hopefully these approaches make sense to the group.  We are interested in how you think these will go down both during the project and beyond in the community, and if there are any obvious problems with what we’re proposing here as the way forward for SWORD.

All the best,

(On-Behalf-Of the SWORD project team)


SWORD Technical Advisory Panel

As part of the development of SWORD v2 a Technical Advisory Panel has been formed.  This panel consists of experts from across the SWORD and general Digital Repository domains, along with experts in related fields. The purpose of the panel is to ensure that the standard develops in a way that meets the needs of its user community, that it exhibits best-practice in the area of Internet standards, that developers are able to work with it, and that it tries to be generic enough to allow interoperability with other types of systems whilst maintaining its focus on repository resource deposit.  The panel consists of people from universities, national libraries, research funders, commercial companies, developers, repository domain experts, and repository managers.

The following people have generously donated their time and expertise to be on this panel:

  • Julie Allinson (The University of York)
  • Tim Brody (University of Southampton)
  • Pablo de Castro (SONEX / Universidad Carlos III de Madrid)
  • Charles Duncan (Intrallect)
  • Reinhard Engels (Harvard University Library)
  • David Flanders (JISC)
  • John Fearns (Symplectic)
  • Kathi Fletcher (Shuttleworth Foundation Fellow)
  • Steve Hitchcock (University of Southampton)
  • Jason Hoyt (Mendeley)
  • Bill Ingram (University of Illinois at Urbana-Champaign)
  • Richard Jones (SWORD Technical Lead)
  • Graham Klyne (University of Oxford)
  • Stuart Lewis (SWORD Community Manager / The University of Auckland Library)
  • Mark MacGillivray (Developer)
  • Andrea Marchitelli (CILEA)
  • Alistair Miles (The Wellcome Trust Centre for Human Genetics)
  • Ben O’Steen (Developer)
  • Glen Robson (National Library of Wales)
  • Richard Rodgers (MIT)
  • Robert Sanderson (LANL)
  • Peter Sefton (Australian Digital Futures Institute, University of Southern Queensland)
  • Nick Sheppard (UKCoRR / Leeds Metropolitan)
  • Eddie Shin (MediaShelf)
  • Alec Smecher (Public Knowledge Project)
  • Adrian Stevenson (UKOLN)
  • Ian Stuart (Repository Junction / EDINA)
  • Ed Summers (Library of Congress)
  • David Tarrant (University of Southampton)
  • Robin Taylor (The University of Edinburgh)
  • Graham Triggs (BioMed Central)
  • Alex Wade (Microsoft External Research)
  • Paul Walk (UKOLN)
  • Simeon Warner (arXiv)
  • Scott Wilson (CETIS)
  • Nathan Yergler (Creative Commons)

In the interests of openness, the group discussions are being archived in an open mail archive: