SWORD V1 Case Study – arXiv

Authors: Simeon Warner and Thorsten Schwander

Abstract

The arXiv e-print archive (http://arxiv.org/) has for many years used an ad-hoc automated upload interface to accept submissions from proxies (overlay journals, and other journals posting open-access copies of articles to arXiv) and from our remote submission site in France (HAL, http://hal.archives-ouvertes.fr/). In collaboration with Microsoft, we are working to produce a new interface to allow automated upload of conference articles from the Conference Management Toolkit site (CMT, http://cmt.research.microsoft.com/cmt/) and Microsoft eJournals. We are implementing the SWORD profile of the Atom Publishing Protocol at arXiv to facilitate these uploads.

Introduction

The arXiv e-print archive was started in 1991 and has grown to serve the physics, mathematics, computer science, quantitative biology and statistics communities. It has more than 450,000 full text articles, many with multiple revisions, and accepted more than 60,000 new submissions in 2007. While most of these submissions are from individual scientists and are made via an interactive web interface, there are a few other ingest routes:

  1. Our French mirror site (http://fr.arxiv.org/) is operated in conjunction with the Hyper Articles en Ligne (HAL) repository of the CCSD in Lyon, France. Submissions made to HAL that are in appropriate subject areas are automatically uploaded to arXiv.
  2. Several journals deposit copies of accepted papers in arXiv. In some cases arXiv is a secondary distribution and archiving mechanism, e.g. journals of the IMS such as Annals of Statistics (http://www.imstat.org/aos/), where the publisher archives copies on arXiv (see http://www.imstat.org/publications/copyright.htm). In others, the primary distribution mechanism or original publication venue is arXiv although local copies may also be stored (see, for example, earlier issues of Advances in Theoretical and Mathematical Physics (ATMP), http://www.intlpress.com/ATMP/archive/vol07-1.htm). In the latter case we can characterize the journal as an overlay on arXiv.
  3. In a few cases a local report series, working paper series, or conference proceedings is copied to arXiv by a single person/authority on behalf of the individual authors. We refer to this as proxy submission.

These ingest routes are facilitated by a scripted submission that has evolved from direct use of arXiv’s human interface (simulating HTTP POST form data, scraping HTML results) to an HTTP POST form data submission with custom XML responses. We have supplied a Perl library to proxy submitters to ease local implementation.

We envisage an improved upload API for arXiv not only as improving the ingest routes described above, but also as an enabling technology in the following three areas:

  1. Conference proceedings are an important communication and publication venue in some disciplines, notably in computer science. Thus to serve the computer science community well arXiv should provide facilities to support the convenient upload of conference proceedings. (This situation is quite different from physics and mathematics where conference papers are often somewhat second-class versions of material published in journals.) In partnership with Microsoft and the Conference Management Toolkit (CMT) team we wish to develop an automated and straightforward way for a conference organizer to upload the articles from a conference en masse to arXiv for long term availability and preservation. We anticipate making the interface used available to other common conference management systems.
  2. The evolution and proliferation of Institutional Repositories (IR) means that there are often two or more places that an author may wish to store a copy of their work: the appropriate subject repository and their local IR (maybe several local IRs for multiple authored papers). It clearly makes sense to automate any transfer between repositories so that the authors/submitters do not have to waste time re-entering metadata etc.
  3. In 1991 arXiv was conceived as a service or system that individual humans interacted with directly. Since then the web has arrived and we have become used to searching many sites/resources via popular search engines such as Google. There have been a number of other developments such as the OAI-PMH, RSS etc. which automate extraction of information from arXiv. We will no doubt see continued development through work such as OAI-ORE that will allow services and agents to interact with the content of repositories in additional ways. An obvious question in this context is how services and agents might improve interaction with the ingest process too? Imagine a plug-in for your word processor and bibliography tool that connects to appropriate repositories so that `Clippy’ might pop up a notice about a relevant paper you should read — based on the content and citations of the paper you are working on — instead of asking whether you want help writing a letter. The same tool might also assist in the deposit, along with accurate lineage and citation data, to arXiv or another repository via a standard interface. A well defined and standard repository ingest interface will be necessary to achieve this.

This report is a case study of efforts to implement a SWORD ingest interface for the first case above: conference proceedings upload from CMT and eJournals. We imagine and intend that the same interface will be appropriate for the other cases too, hopefully with very little change or customization.

Starting in Summer 2007, arXiv developed an API for automated clients/agents to query the repository content (see http://arxiv.org/help/api). This API is also based around the Atom Syndication Format so the chance of reusing and combining expertise added to the appeal of SWORD.

Implementation of SWORD at arXiv

We have followed the SWORD specification SWORD_APP_Profile_1.2 for level 1 compliance. As such arXiv supports facilities of SWORD such as MD5 checksums, X-On-Behalf-Of, X-Format-Namespace, verbose and noOp. In the eight sub-sections that follow I describe and discuss various choices that we have made in implementing SWORD at arXiv. I highlight areas where there has been some difficulty or where architectural choices were necessary. The interface as described is currently in testing both by ourselves and by the CMT and eJournals teams. We expect to deploy it to accept real article uploads from CMT and eJournals in the next few weeks. When that is working we will review our choices and contact others who may wish to use the interface.

Authentication, authorization, and security

For security and authentication, the SWORD specification (see SWORD_APP_Profile_1.2#14._Securing_the_Atom_Publishing_Protocol) defers to the APP requirement: `At a minimum, client and server implementations MUST be capable of being configured to use HTTP Basic Authentication RFC2617 in conjunction with a TLS connection as specified by RFC2818.’

For the implementation at arXiv we opted for implementation of HTTP Basic Authentication over TLS/SSL using Apache configured with mod_ssl and the OpenSSL library. We had to work through some issues with the acceptance of our self-signed certificates by the libraries used on the client side by our collaborators. However, since these issues have been resolved the process has worked well and cleanly separates these security aspects from the workings of the ingest API. The HTTP Basic Authentication is hooked into our user database and no modifications to that were required to support the SWORD interface. The user database already has fields to indicate whether a particular (authenticated) user is authorized to make proxy submissions for example.
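As an illustration of the client side of this arrangement, the following is a minimal Python sketch, not the code used by arXiv or CMT. It builds the HTTP Basic Authentication header value and a TLS context that verifies the server certificate. The function names and the optional ca_file parameter (a way for a client to trust a specific, for example self-signed, certificate) are assumptions made for this example.

```python
import base64
import ssl


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header (RFC 2617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def tls_context(ca_file=None) -> ssl.SSLContext:
    """Return a TLS context that verifies the server certificate.

    ca_file is a hypothetical path to a PEM file holding a certificate
    the client should trust in addition to relying on system defaults.
    """
    return ssl.create_default_context(cafile=ca_file)
```

Keeping authentication in the header and certificate handling in the TLS context mirrors the separation described above: neither concern needs to appear in the code that constructs deposits.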

Upload granularity

SWORD_formats lists METS as the common packaging format for Fedora, DSpace and Eprints software, and IMS CP for the Intrallect IntraLibrary. Because the arXiv interface has grown out of a human mediated web upload, it accepts one or more files that may be individual files or tar or zip archives. All the uploaded files are unpacked and combined before validation and any ingest processing are done. While we could extend arXiv to support METS, our initial thought was to use zip for the SWORD uploads. However, on consultation with the CMT team there was concern over possible large sizes of uploads, concomitant long waits for replies, and associated network problems (CMT team members quoted experience with `articles and supplementary material of up to 5 GB though more usually around 100 MB’). We thus decided to allow files to be uploaded separately (which is what CMT will do), but to also support multiple files packaged as a zip file. We feel this approach is perhaps closer to the feeling and intent of the Atom Publishing Protocol than the examples given in the SWORD document. The final upload is a wrapper document containing the metadata and link elements with rel="related" pointing to all of the already uploaded media entries; it initiates the `article’ upload into arXiv.

In APP/SWORD terms, each file and the wrapper upload are separate uploads. The file uploads and the wrapper upload are differentiated by the media type of the upload, and the treatment indicated in the response. The client must record the atom:entry:link[@rel="edit"] values returned for each file uploaded and then use these when constructing the wrapper. We could allow browsing/checking of media entries uploaded via the media links returned, but this has not been implemented.

The URI returned in the HTTP Location header as part of any deposit action gives access to the media link entry associated with the media deposit, the atom entry response to a wrapper deposit, or an atom entry conveying error information.

If desired, the MD5 checksum of the payload of each deposit can be transmitted in the SWORD HTTP Header extension Content-MD5 RFC2616 for an integrity check of the transferred payload. If this header is present, arXiv will validate the checksum and issue the appropriate error should there be a mismatch.
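A minimal Python sketch of the checksum handling on either side of the exchange might look as follows. It assumes the hexadecimal digest form shown in the example transactions in Appendix A (RFC 2616 itself describes a base64 encoded digest), and the function names are hypothetical.

```python
import hashlib


def content_md5(payload: bytes) -> str:
    """Return the MD5 digest of the payload as lowercase hexadecimal."""
    return hashlib.md5(payload).hexdigest()


def checksum_matches(payload: bytes, header_value: str) -> bool:
    """Compare a received Content-MD5 header value with the payload."""
    return content_md5(payload) == header_value.strip().lower()
```

A client would send content_md5(data) in the Content-MD5 header; a server could call checksum_matches on receipt and report a checksum error when it returns False.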

Finally, when all media entries have been uploaded a special wrapper upload is made with media type

  • application/atom+xml;type=entry

The media links for the component files of the article are enumerated in the metadata as

  • atom:entry:link[@rel="related"]

elements. When the wrapper is received then the ownership of each media entry is checked (to match the submitter of the wrapper), and, if specified, the MIME type is checked against that stored for the media entry.

If all the media entries and the metadata are OK then they are combined into a submission package and piped into arXiv’s submission queue. As implemented now, it is only at this stage that any zip files would be unpacked.
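The checks made when a wrapper arrives can be sketched roughly as below. This is an illustrative Python sketch under stated assumptions, not arXiv's implementation: the MediaEntry record, the dictionary used as a store of uploaded media entries, and the message texts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaEntry:
    owner: str       # account that uploaded the file
    media_type: str  # MIME type recorded at upload time


def check_wrapper_links(submitter, links, store):
    """Return a list of problems found; an empty list means all checks pass.

    links: (href, declared_type or None) pairs from the rel="related" links
    store: mapping from media entry URI to MediaEntry (hypothetical)
    """
    problems = []
    for href, declared_type in links:
        entry = store.get(href)
        if entry is None:
            problems.append(f"unknown media entry: {href}")
        elif entry.owner != submitter:
            problems.append(f"media entry not owned by submitter: {href}")
        elif declared_type is not None and declared_type != entry.media_type:
            problems.append(f"media type mismatch for {href}")
    return problems
```

Only when the list is empty would the media entries and metadata be combined into a submission package.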

Choice of metadata format

arXiv uses its own metadata format (since 1991 with minor extensions) rather than any standard such as Dublin Core. The Atom Syndication Format includes elements within a feed that bear a striking resemblance to metadata typically associated with a scholarly article (title, author, summary…), and these elements map quite well onto the internal arXiv metadata set. Because of this, and because the rest of SWORD is based around Atom, we decided to adopt Atom as our metadata exchange format for the SWORD interface. We added a few arXiv extensions (which is permitted in Atom) to allow us to express our complete metadata in an Atom entry document.

There is a slight awkwardness in the semantics of authorship. Atom has two elements associated with authorship: atom:author and atom:contributor. We have decided to use atom:author to convey the submitter information (see #Submission_on_behalf_of_a_third_party) and atom:contributor to convey authorship information for the article. The standard Atom elements used are:

  • atom:title – The title of the article (mandatory).
  • atom:contributor – The authors of the article (mandatory).
  • atom:summary – The abstract of the article (mandatory).
  • atom:category – The `categories’ the article belongs to. This maps to three classes of category in arXiv, each with a different namespace:
    • arXiv’s internal subject categories (at least one mandatory),
    • the ACM Computing Classification System (ACM) (optional), and the
    • Mathematics Subject Classification (MSC) (optional).

The following elements are arXiv specific extensions (in the arXiv namespace) and are the same set of extensions used for the arXiv API:

  • arxiv:primary_category – The primary arXiv category (mandatory).
  • arxiv:comment – The author’s comment if present (optional).
  • arxiv:affiliation – The author’s affiliation included as a sub-element of atom:contributor (optional).
  • arxiv:journal_ref – A bibliographic journal reference (optional).
  • arxiv:doi – A URL to the DOI resolver for an external resource (optional).

There is a restriction that the arxiv:primary_category must be permitted as a primary category of the particular collection posted to. The list of secondary categories can be any from the list of available categories for the collection (which happens to be the same for all collections in arXiv).

Following the SWORD specification, we accept the X-Format-Namespace HTTP header to indicate the namespace of the metadata format. Since the format is Atom plus extensions and the metadata is a valid Atom entry document we denote the Atom format using the Atom namespace URI (http://www.w3.org/2005/Atom).
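To make the mapping concrete, here is a minimal Python sketch of how a client might assemble such a metadata entry with the standard library. It follows the element usage described in this section and the wrapper shown in Appendix A; the function name and argument layout are assumptions, and the atom:id and atom:updated elements that a complete Atom entry requires are omitted for brevity.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ARXIV = "http://arxiv.org/schemas/atom"  # arXiv extension namespace as in Appendix A
TERMS = "http://arxiv.org/terms/arXiv/"  # scheme for arXiv subject categories


def build_wrapper(title, abstract, submitter, authors, primary, media_links):
    """Return a wrapper entry serialized as UTF-8 bytes.

    authors: list of (name, email, affiliation) tuples
    media_links: list of (href, media_type) for already uploaded files
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = abstract
    # atom:author carries the submitter, not the article's authors
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = submitter
    # atom:contributor carries each author of the article
    for name, email, affiliation in authors:
        person = ET.SubElement(entry, f"{{{ATOM}}}contributor")
        ET.SubElement(person, f"{{{ATOM}}}name").text = name
        ET.SubElement(person, f"{{{ATOM}}}email").text = email
        ET.SubElement(person, f"{{{ARXIV}}}affiliation").text = affiliation
    ET.SubElement(entry, f"{{{ARXIV}}}primary_category",
                  scheme=TERMS, term=TERMS + primary)
    ET.SubElement(entry, f"{{{ATOM}}}category",
                  scheme=TERMS, term=TERMS + primary)
    for href, media_type in media_links:
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      rel="related", href=href, type=media_type)
    return ET.tostring(entry, encoding="utf-8")
```

The namespace prefixes chosen by the serializer may differ from those in the hand-written examples, which does not matter to a namespace-aware parser.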

Submission on behalf of a third party

At first glance, the SWORD notion of submission `on behalf of’ (see SWORD_APP_Profile_1.2#5._Protocol_Operations) matches very well with arXiv’s notion of a `proxy submitter’. However, when it came to implementation with the CMT system we realized that in some sense we have an extra level of proxying: there is the CMT system which is a proxy on behalf of the conference organizer, and the conference organizer is a proxy on behalf of the author of the paper. This is more complicated than we wish to record or administer so we decided to insist that each conference or conference organizer create an account with arXiv and it is that account through which the submissions will be made. The CMT system will not have an identity within arXiv; it will simply provide the services. Thus, the CMT system will authenticate with arXiv as the conference account and then submit with X-On-Behalf-Of set to the real corresponding author of the article (and similarly for the eJournals system). This follows the SWORD model cleanly.
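The two identities involved in a proxy deposit can be kept apart in the request headers, as in the following Python sketch. It assumes the `Name <email>’ form of the X-On-Behalf-Of value used in the Appendix A example; the function name is hypothetical.

```python
from email.utils import formataddr


def proxy_deposit_headers(auth_header, author_name, author_email, content_type):
    """Build headers for a deposit made by a conference account for an author.

    auth_header identifies the conference account that authenticates;
    X-On-Behalf-Of names the corresponding author of the article.
    """
    return {
        "Authorization": auth_header,
        "X-On-Behalf-Of": formataddr((author_name, author_email)),
        "Content-Type": content_type,
    }
```

The intermediary system (CMT in our case) does not appear in the headers at all, which reflects the decision not to give it an identity within arXiv.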

Interaction sequence

The first request of a client interacting with arXiv’s SWORD interface is expected to be to get the service document (/sword-app/servicedocument). The service document includes the usual SWORD information indicating level of compliance and some other features available:

...
  <sword:level>1</sword:level>
  <sword:verbose>true</sword:verbose>
  <sword:noOp>true</sword:noOp>
...

The service document also includes the list of collections that the authenticated user may submit to. The collections listed will vary by user as we have an endorsement system that permits certain users access only to certain subject areas. An excerpt for the Statistics collection (subject area) of arXiv might be:

 ...
    <collection href="https://arXiv.org/sword-app/stat-collection">
      <atom:title>The Statistics archive</atom:title>
      <accept>application/atom+xml;type=entry</accept>
      <accept>application/zip</accept>
      <accept>application/xml</accept>
      <accept>application/pdf</accept>
      <accept>application/postscript</accept>
      <accept>application/vnd.openxmlformats-officedocument.wordprocessingml.document</accept>
      <accept>text/xml</accept>
      <accept>image/jpeg</accept>
      <accept>image/jpg</accept>
      <accept>image/png</accept>
      <accept>image/gif</accept>
      <sword:formatNamespace>http://www.w3.org/2005/Atom</sword:formatNamespace>
      <sword:collectionPolicy>Open Access</sword:collectionPolicy>
      <dcterms:abstract>The Statistics e-print archive at http://arXiv.org/</dcterms:abstract>
      <sword:mediation>true</sword:mediation>
      <sword:treatment>will be posted pending moderator approval</sword:treatment>
      <categories fixed="yes">
        <atom:category term="http://arxiv.org/terms/arXiv/stat.AP"
                       scheme="http://arxiv.org/terms/arXiv/"
                       label="Statistics - Applications"/>
        <atom:category term="http://arxiv.org/terms/arXiv/stat.CO"
                       scheme="http://arxiv.org/terms/arXiv/"
                       label="Statistics - Computation"/>
 ...

where we also see the set of categories permitted for submissions to the Statistics collection. This is a very clean match to the arXiv model. However, we have a requirement that all articles have a primary category from a somewhat more restricted set. We have had to add additional information to convey this to a client that understands arXiv-specific additions:

...
  <arxiv:primary_category fixed="yes">
    <atom:category term="http://arxiv.org/terms/arXiv/stat.AP"
                   scheme="http://arxiv.org/terms/arXiv/"
                   label="Statistics - Applications"/>
    <atom:category term="http://arxiv.org/terms/arXiv/stat.CO"
                   scheme="http://arxiv.org/terms/arXiv/"
                   label="Statistics - Computation"/>
    <atom:category term="http://arxiv.org/terms/arXiv/stat.ML"
                   scheme="http://arxiv.org/terms/arXiv/"
                   label="Statistics - Machine Learning"/>
    <atom:category term="http://arxiv.org/terms/arXiv/stat.ME"
                   scheme="http://arxiv.org/terms/arXiv/"
                   label="Statistics - Methodology"/>
    <atom:category term="http://arxiv.org/terms/arXiv/stat.TH"
                   scheme="http://arxiv.org/terms/arXiv/"
                   label="Statistics - Theory"/>
  </arxiv:primary_category>
...

Except for the name of the arxiv:primary_category element, this list has the same form as the atom:categories element. In the example we see that a submission to the Statistics collection must have one of the five primary categories listed. It may also belong to any of the dozens of other categories listed in the atom:categories element.

Having verified repository information the client must then upload one or more files (media entries in APP terms; see #Upload_granularity) and then a wrapper entry (see #Choice_of_metadata_format). These uploads are made to the particular collection using the URI indicated in the collection element’s href attribute (https://arxiv.org/sword-app/stat-collection in this case).
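A client might read the service document along the lines of the Python sketch below. The excerpts above omit namespace declarations and surrounding structure, so several details here are assumptions: the APP namespace URI, the binding of the arxiv prefix, the placement of arxiv:primary_category inside the collection element, and the abbreviated SAMPLE document itself.

```python
import xml.etree.ElementTree as ET

APP = "http://www.w3.org/2007/app"       # assumed APP namespace
ATOM = "http://www.w3.org/2005/Atom"
ARXIV = "http://arxiv.org/schemas/atom"  # assumed binding of the arxiv prefix

# Abbreviated, hypothetical service document modelled on the excerpts above.
SAMPLE = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom"
         xmlns:arxiv="http://arxiv.org/schemas/atom">
  <workspace>
    <collection href="https://arxiv.org/sword-app/stat-collection">
      <atom:title>The Statistics archive</atom:title>
      <accept>application/atom+xml;type=entry</accept>
      <accept>application/pdf</accept>
      <arxiv:primary_category fixed="yes">
        <atom:category term="http://arxiv.org/terms/arXiv/stat.AP"
                       scheme="http://arxiv.org/terms/arXiv/"
                       label="Statistics - Applications"/>
      </arxiv:primary_category>
    </collection>
  </workspace>
</service>"""


def parse_collections(service_xml):
    """Map each collection URI to its title, accepted types and primary terms."""
    root = ET.fromstring(service_xml)
    collections = {}
    for coll in root.iter(f"{{{APP}}}collection"):
        primary = coll.find(f"{{{ARXIV}}}primary_category")
        collections[coll.get("href")] = {
            "title": coll.findtext(f"{{{ATOM}}}title"),
            "accept": [a.text for a in coll.findall(f"{{{APP}}}accept")],
            "primary_terms": [] if primary is None else [
                c.get("term") for c in primary.findall(f"{{{ATOM}}}category")
            ],
        }
    return collections
```

With this information a client can choose the deposit URI, confirm that a file's media type is accepted, and validate the primary category before attempting any upload.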

Error reporting

The SWORD profile provides for a limited set of error codes to be reported via HTTP status codes and an additional X-Error-Code response header. The error codes detailed in the specification (see SWORD_APP_Profile_1.2#HTTP_Header_extensions) include `ErrorContent’, `ErrorChecksumMismatch’, `TargetOwnerUnknown’ etc. There is no place for additional human readable information and no guidance for extension. There is some guidance in the `Use of HTTP Response Codes’ section of the SWORD specification as to which HTTP status code should be used for which error condition. There is no guidance as to how the recommended `human-readable explanations are supplied along with HTTP response codes’ should be implemented. There is room for SWORD development work here (see #Improved_method_for_error_reporting).

For arXiv interaction with CMT we determined the need for: detailed human readable messages, indication of additional error conditions, and well defined codes for each specific condition that can be machine extracted. Examples of additional error conditions include `summary element empty or missing’ (we demand an abstract), `no surname for author’, `no primary author’ (the primary or contact author must be indicated), `no contact email for primary author’, `primary category term invalid’ (each submission must have a primary category), `more than one primary category specified’, `no primary category specified’. All of the additional errors are indicated with HTTP status code `400 Bad Request’.

Our solution for explicit (and extensible) coding and the inclusion of detailed human-readable messages is to return an Atom entry document for all 4xx error conditions. This is very convenient for programmers of client applications as they will then always expect to receive and parse an Atom entry document. It is also convenient for developers/testers as access via most web clients will sensibly display an Atom entry. (This is also the approach taken with the arXiv API).

While using Atom naturally deals with the requirement for a human readable description, there is no standard element suitable for including error status codes for programmatic parsing. We have thus used an extension element in the Atom entry, <arxiv:errorcode>, with numeric error codes. A typical error response looks like:

HTTP/1.1 400 Bad Request

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <author>
    <name>arXiv</name>
  </author>
  <title>ERROR</title>
  <id>info:arxiv/E43FA902-DF3B-11DC-978A-443549719828</id>
  <updated>2008-02-19T09:34:27Z</updated>
  <source>
    <generator uri="https://arxiv.org/sword-app/"
               version="0.9">sword@arxiv.org</generator>
  </source>
  <arxiv:errorcode>123</arxiv:errorcode>
  <summary>Here is the detailed error explanation of exactly what created code 123</summary>
  <sword:treatment>processing failed</sword:treatment>
  <sword:formatNamespace>http://www.w3.org/2005/Atom</sword:formatNamespace>
  <link rel="alternate" href="https://arxiv.org/help" type="text/html"/>
</entry>
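Because every 4xx response carries an Atom entry, a client can extract both the numeric code and the explanation with a standard XML parser. The following Python sketch assumes the namespaces shown in the example response; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def parse_error_entry(body):
    """Return (numeric code or None, human readable message)."""
    entry = ET.fromstring(body)
    code = entry.findtext("arxiv:errorcode", namespaces=NS)
    message = entry.findtext("atom:summary", default="", namespaces=NS)
    return (int(code) if code is not None else None, message)
```

A client would typically branch on the code and log or display the message.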

Data types accepted

SWORD uses the service document to specify the media types (MIME types) accepted by a repository. Although not very fine-grained, this is probably a good compromise between specificity and simplicity. The arXiv SWORD interface lists the following media types as acceptable:

...
    <accept>application/xml</accept>
    <accept>application/pdf</accept>
    <accept>application/postscript</accept>
    <accept>application/vnd.openxmlformats-officedocument.wordprocessingml.document</accept>
    <accept>image/jpg</accept>
    <accept>image/png</accept>

    <accept>application/zip</accept>

    <accept>application/atom+xml;type=entry</accept>
...

Workflow

SWORD has no notion of workflow or of a callback for interaction beyond a single HTTP exchange. The submission process at arXiv, however, has several stages, some of which may involve significant delay:

  1. upload of the files and metadata (a POST in the current web UI and in the old arXiv machine interface, and two or more uploads in SWORD).
  2. checks on metadata (e.g. title collision to reject accidental duplicates), file size, permission to submit to different collections/categories. This stage is rolled into the metadata and wrapper upload in SWORD.
  3. processing (docx to PDF, TeX/LaTeX to PS and/or PDF) and/or validation of the uploaded files.
  4. manual checks and moderation for appropriateness leading either to rejection/removal or announcement on the usual arXiv schedule (8pm EST/EDT Tue,Wed,Thu,Fri,Sun).

Only the first two of these are implemented within the SWORD API. If these two stages are successful then the article is entered into the arXiv submission queue. It may be removed, delayed, renumbered or reclassified by administrator actions. With some proxy submission sites we have automated ways to pass messages back about changes but the default is to email the submitter. In this case it would be the conference organizer. There is room for SWORD development work here (see #Improved_integration_with_more_complex_workflows).

Ideas for further development of SWORD

Improved self description

It seems that any carefully programmed SWORD interface will impose some limits on upload file size to protect the repository system, comply with local file size constraints and/or implement local file size policies. For arXiv, where worldwide accessibility is considered important, we set fairly low submission size limits. It might be useful to be able to express these in the service document so that a client could act accordingly rather than getting an error message for an oversize attempt. Perhaps:

<sword:maxUploadSize>10000000</sword:maxUploadSize> <!-- 10MB limit -->

The implementation at arXiv assumes several files will be uploaded via multiple SWORD deposits before finally being submitted to the repository. If such a model were to be widely adopted then it may also be worthwhile to specify a total submission size if different.
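If such an element existed, a client could check file sizes before attempting a deposit. The Python sketch below is purely illustrative: sword:maxUploadSize is the element proposed above and is not part of the SWORD specification, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SWORD = "http://purl.org/net/sword/"


def max_upload_size(collection_xml):
    """Return the advertised limit in bytes, or None if none is given.

    Looks for the proposed (non-standard) sword:maxUploadSize element.
    """
    root = ET.fromstring(collection_xml)
    text = root.findtext(f".//{{{SWORD}}}maxUploadSize")
    return int(text) if text is not None else None


def within_limit(size_bytes, limit):
    """True if the upload fits, or if the server advertises no limit."""
    return limit is None or size_bytes <= limit
```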

Improved method for error reporting

In #Error_reporting I have outlined the mechanism that we have adopted for extending error reporting. I think requiring the use of an Atom entry return instead of a weakly specified recommendation to include a human-readable message is a distinct improvement and would be even more powerful if adopted within the SWORD specification. It actually simplifies client code to use Atom, because the processing of responses can use standard parsing libraries.

The use of an arxiv:errorcode is clearly too implementation specific. If this method were adopted in the SWORD specification then it would make sense to have something like sword:errorcode and perhaps the best way for a restricted and extensible vocabulary would be to use URIs for the error codes, and have the URI as an attribute, as is customary, instead of relaying it in element content. There could then be a set of standard SWORD error codes (including the ones already defined), for example:

<sword:errorcode href="http://purl.org/net/sword/error/ChecksumMismatch" />

and particular implementations would be free to use errors not in the SWORD namespace (and thus easily recognized as non-standard), e.g.:

<sword:errorcode href="http://arxiv.org/schemas/sword/error/NoPrimaryCategory" />

If this were adopted then the HTTP header extension X-Error-Code could sensibly be dropped.
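With URI-valued codes, a client could tell standard errors from implementation-specific ones by prefix alone. The following Python sketch assumes the proposed sword:errorcode element and the example URIs above, none of which are part of the current specification; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SWORD = "http://purl.org/net/sword/"
SWORD_ERROR_PREFIX = "http://purl.org/net/sword/error/"  # from the example above


def error_uri(entry_xml):
    """Return the href of the proposed sword:errorcode element, or None."""
    entry = ET.fromstring(entry_xml)
    elem = entry.find(f"{{{SWORD}}}errorcode")
    return None if elem is None else elem.get("href")


def is_standard_error(uri):
    """True if the error URI falls within the SWORD error namespace."""
    return uri.startswith(SWORD_ERROR_PREFIX)
```

A client that does not recognize a non-standard URI can still fall back on the human readable summary in the same entry.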

Improved integration with more complex workflows

I imagine that there are many repositories that have workflows at least as complex as arXiv’s (see #Workflow). Some way for the submitting client to indicate a callback mechanism better than email may well be broadly appropriate. Specification of such a mechanism would first require an overview of the methods that would work with a range of repository systems.

Acknowledgements

The work discussed in this report was carried out by the arXiv.org team in collaboration with the Microsoft CMT and eJournals teams. Thorsten Schwander led the work at arXiv and coded our implementation.

Appendix A: An example of transactions for a deposit to arXiv

Assume that the service document at https://arxiv.org/sword-app/servicedocument has been retrieved if necessary and the client is to submit an article to the cs.CE category comprising one PDF file and metadata. First, the PDF must be uploaded with a POST to https://arxiv.org/sword-app/cs-collection

POST /sword-app/cs-collection HTTP/1.1
Host: arxiv.org
Authorization: Basic ZGFmSjgrc2VjZXJlrB==
Content-Length: 342567
Content-Type: application/pdf
Content-MD5: ec9d04f8ab478c4fe7bc84578f3e3f36
X-On-Behalf-Of: Fred Bloggs <fred@bloggs.nowhere>
X-Format-Namespace: http://arxiv.org/schemas/arxiv-meta
X-Verbose: true

... binary data ...

and the response will be similar to

HTTP/1.x 201 Created
Date: Fri, 29 Feb 2008 15:07:03 GMT
Location: https://arxiv.org/sword-app/getid?id=deposit.0712/14
Content-Type: application/atom+xml;type=entry

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:arxiv="http://arxiv.org/schemas/arXiv-meta">
  <author>
    <name>a conference</name>
  </author>
  <title>Accepted media deposit to arXiv</title>
  <id>info:arxiv/app/08020394</id>
  <updated>2008-02-19T05:55:00Z</updated>
  <content type="application/pdf"
           src="https://arxiv.org/sword-app/edit/08020394"/>
  <source>
    <generator uri="https://arxiv.org/sword-app/"
               version="0.9">sword@arxiv.org</generator>
  </source>
  <summary>A media deposit of type "application/pdf"
    was stored in the author&apos;s workspace</summary>
  <sword:treatment>stored in author&apos;s workspace</sword:treatment>
  <sword:noOp>false</sword:noOp>
  <sword:formatNamespace>http://arxiv.org/schemas/arXiv-meta</sword:formatNamespace>
  <arxiv:primary_category scheme="http://arxiv.org/terms/arXiv/"
    term="http://arxiv.org/terms/arXiv/cs.CE">Computer Science -
      Computational Engineering, Finance, and Science</arxiv:primary_category>
  <link rel="edit-media" href="https://arxiv.org/sword-app/edit/08020394"/>
  <link rel="edit" href="https://arxiv.org/sword-app/edit/08020394.atom"/>
</entry>
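The client needs the link values from this response when it builds the wrapper. A minimal Python sketch for collecting them by relation is given below; the function name is hypothetical, and the same approach would apply to any Atom entry returned by the interface.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def links_by_rel(entry_xml):
    """Group the href values of atom:link elements by their rel attribute."""
    entry = ET.fromstring(entry_xml)
    links = {}
    for link in entry.findall(f"{{{ATOM}}}link"):
        links.setdefault(link.get("rel"), []).append(link.get("href"))
    return links
```

Values are kept in lists because an entry may carry several links with the same relation.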

As this article has just one file then the next upload will be of the wrapper entry with metadata. Again, a POST to https://arxiv.org/sword-app/cs-collection.

POST /sword-app/cs-collection HTTP/1.1
Host: arxiv.org
Authorization: Basic ZGFmSjgrc2VjZXJlrB==
Content-Length: 342567
Content-Type: application/atom+xml;type=entry
Content-MD5: e65d04f8ab478c4fe7bc86e78a2e3f31
X-On-Behalf-Of: Fred Bloggs <fred@bloggs.nowhere>
X-Format-Namespace: http://arxiv.org/schemas/arxiv-meta
X-Verbose: true

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:arxiv="http://arxiv.org/schemas/atom">
  <updated>2008-01-21T12:20:21Z</updated>
  <source>
    <generator uri="http://www.microsoft.com/sword-app/"
               version="0.1">test</generator>
  </source>
  <title>Of elephants and triangles</title>
  <summary>We consider the similarities and differences between elephants and triangles.</summary>
  <id>https://arxiv.org/sword-app/edit/08020394</id>
  <author>
    <name>aconference@somewhere.edu</name>
  </author>
  <contributor>
    <name>Fred Bloggs</name>
    <email>fred@bloggs.nowhere</email>
    <arxiv:affiliation>Somewhere University</arxiv:affiliation>
  </contributor>
  <arxiv:primary_category scheme="http://arxiv.org/terms/arXiv/"
    term="http://arxiv.org/terms/arXiv/cs.CE">Computer Science -
      Computational Engineering, Finance, and Science</arxiv:primary_category>
  <category term="http://arxiv.org/terms/arXiv/cs.CE"
            scheme="http://arxiv.org/terms/arXiv/"/>
  <link rel="related" href="https://arxiv.org/sword-app/edit/08020394"
        type="application/pdf"/>
  <link rel="alternate" href="https://arxiv.org/sword-app/edit/08020394"
        type="application/pdf"/>
</entry>

and, assuming this is successful, then the reply from arXiv will be like:

HTTP/1.x 202 Accepted
Date: Fri, 29 Feb 2008 15:08:32 GMT
Location: https://arxiv.org/sword-app/getid?id=deposit.0712/14
Content-Type: application/atom+xml;type=entry

<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:sword="http://purl.org/net/sword/"
       xmlns:arxiv="http://arxiv.org/schemas/arXiv-meta">
  <author>
    <name>aconference</name>
  </author>
  <title>Accepted deposit wrapper to arXiv</title>
  <id>info:arxiv/app/08020395</id>
  <updated>2008-02-19T05:55:01Z</updated>
  <content type="application/atom+xml" src="https://arxiv.org/sword-app/edit/08020395"/>
  <source>
    <generator uri="https://arxiv.org/sword-app/" version="0.9">sword@arxiv.org</generator>
  </source>
  <summary>We consider the similarities and differences between elephants and triangles.</summary>
  <sword:treatment>atom wrapper used to initiate ingestion into arXiv</sword:treatment>
  <sword:noOp>false</sword:noOp>
  <sword:formatNamespace>http://arxiv.org/schemas/arXiv-meta</sword:formatNamespace>
  <arxiv:primary_category scheme="http://arxiv.org/terms/arXiv/"
                          term="http://arxiv.org/terms/arXiv/cs.CE"/>
  <category scheme="http://arxiv.org/terms/arXiv/"
            term="http://arxiv.org/terms/arXiv/cs.CE"/>
  <link rel="edit-media" href="https://arxiv.org/sword-app/edit/08020395"/>
  <link rel="edit" href="https://arxiv.org/sword-app/edit/08020395.atom"/>
  <link rel="alternate" href="http://arxiv.org/resolve/app/08020395"/>
</entry>

At this point the SWORD mediated communication is complete. The response title `Accepted deposit wrapper to arXiv’ and the sword:treatment element `atom wrapper used to initiate ingestion into arXiv’ indicate that all initial checks passed and the package was entered into the arXiv submission queue. The

  • atom:entry:link[@rel="alternate"]

element gives the URL that may be used by the client to poll later to check status of the submission through arXiv’s workflow.