XProc 3.0: file steps

Editor's Draft

This Version:
https://ndw.github.io/steps/master/head/file/
Latest Version:
http://spec.xproc.org/master/head/file/
Editors:
Norman Walsh
Achim Berndzen
Gerrit Imsieke
Erik Siegel
Repository:
This specification on GitHub
Report an issue
Changes:
Diff against current “status quo” draft
Commits for this specification

This document is also available in these non-normative formats: XML, automatic change markup from the previous draft courtesy of DeltaXML.


Abstract

This specification describes the file-related steps for XProc 3.0: An XML Pipeline Language.

Status of this Document

This document is an editor's draft that has no official standing.

This section describes the status of this document at the time of its publication. Other documents may supersede this document.

This document is derived from XProc: An XML Pipeline Language published by the W3C.


1 Introduction

This specification describes the file-related XProc steps. A machine-readable description of these steps may be found in steps.xpl.

Familiarity with the general nature of [XProc 3.0] steps is assumed; for background details, see [XProc 3.0 Steps].

2 p:directory-list

The p:directory-list step produces a list of the contents of a specified directory.

<p:declare-step type="p:directory-list">
     <p:output port="result" content-types="application/xml"/>
     <p:option name="path" required="true" as="xs:anyURI"/>        
     <p:option name="detailed" as="xs:boolean" select="false()"/>  
     <p:option name="max-depth" as="xs:string?" select="'1'"/>     
     <p:option name="include-filter" as="xs:string*"/>             
     <p:option name="exclude-filter" as="xs:string*"/>             
</p:declare-step>

Conformant processors must support directory paths whose scheme is file. It is implementation-defined what other schemes are supported by p:directory-list, and what the interpretation of 'directory', 'file' and 'contents' is for those schemes. It is a dynamic error (err:XC0102) if an implementation does not support directory listing for a specified scheme.

If path is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:directory-list in the case of a syntactic shortcut value). It is a dynamic error (err:XC0017) if the absolute path does not identify a directory. It is a dynamic error (err:XC0012) if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run.

If the detailed option is true, the pipeline author is requesting additional information about the matching entries; see Section 2.1, “Directory list details”.

The max-depth option may contain either the string “unbounded” or a string that may be cast to a non-negative integer. An integer value of 0 means that only information about the directory that is given in the path option is returned. A max-depth of 1, which is the default, means that information about the top-level directory’s immediate children is also included. For larger values of max-depth, the content of directories is considered recursively up to the maximum depth and is included as children of the corresponding c:directory elements.
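As a non-normative illustration, the following minimal pipeline sketches how max-depth might be set. The directory name docs/ is a hypothetical placeholder. A value of 2 requests the directory itself, its immediate children, and the children of any immediate subdirectories.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- "docs/" is a hypothetical relative path; it is resolved against
       the base URI of this p:directory-list element. -->
  <p:directory-list path="docs/" max-depth="2"/>
</p:declare-step>
```

Replacing "2" with "unbounded" would request the complete tree below docs/, and "0" would return only the top-level c:directory element.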

If present, the value of the include-filter or exclude-filter option must be a sequence of strings, each one representing a regular expression as specified in [XPath and XQuery Functions and Operators 3.1], section 5.6.1 “Regular Expression Syntax”.

The regular expressions will be matched against an item’s file system path relative to the top-level path that was given in the path option. If the item is a directory, a trailing slash will be appended.

Examples: A file file.txt in the directory specified by path will remain file.txt, a relative path dir1/file.txt will remain dir1/file.txt, while a relative path dir1/dir2 will become dir1/dir2/ if dir2 is a directory.

Regular expressions that match a/a/b/file.txt are, for example, ^(\w+/){2,3}.+\.txt$, a/a/b/, or /file\.[^/]+$.

If any include-filter pattern matches the slash-augmented relative path, the entry is included in the output. If a directory’s path matches an include-filter pattern, the directory’s content is not automatically included as well; each item in the directory must itself match a pattern. For example, the filter regex ^dir/ matches both the directory and its content, whereas ^dir/$ matches only the directory itself, so the directory’s content is not included in the result.

If a relative path is matched by an include filter, all its ancestor directories starting from the initial directory (but not their content if not included explicitly) will be included, too.

Example 1. Sample Directory List Output for a Single File

For a file a/a/b/file.txt below the initial directory /home/jane, this output will be produced, omitting content that might be present in the intermediate directories:

<c:directory xml:base="file:///home/jane/" name="jane">
  <c:directory xml:base="a/" name="a">
    <c:directory xml:base="a/" name="a">
      <c:directory xml:base="b/" name="b">
        <c:file xml:base="file.txt" name="file.txt"/>
      </c:directory>
    </c:directory>
  </c:directory>
</c:directory>

If any exclude-filter pattern matches the slash-augmented relative path, the entry (and all of its content in the case of a directory) is excluded from the output.

If both options are provided, the include filter is processed first, then the exclude filter. As a result, an item is included if it matches (at least) one of the include-filter values and none of the exclude-filter values.

If no include-filter is given, that is, if include-filter is an empty sequence, any item will be included in the result (unless it is excluded by exclude-filter).

Note

There is no way to specify a list of values using attribute value templates. If the option shortcut syntax is used to provide the include-filter or exclude-filter option, it will consist of a single regular expression. To specify a list of regular expressions, you must use the p:with-option syntax.
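The following non-normative sketch shows how several filter expressions might be supplied with p:with-option. The directory src/, the subdirectory build/, and the file extensions are hypothetical placeholders chosen for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <p:directory-list path="src/" max-depth="unbounded">
    <!-- A sequence of two regular expressions: an entry is a candidate
         if its relative path matches at least one of them. -->
    <p:with-option name="include-filter" select="('\.xml$', '\.xpl$')"/>
    <!-- Entries whose relative path starts with "build/" are dropped,
         together with everything below that directory. -->
    <p:with-option name="exclude-filter" select="'^build/'"/>
  </p:directory-list>
</p:declare-step>
```

With the shortcut syntax, only a single expression can be given, for example include-filter="\.xml$" as an attribute on p:directory-list.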

The result document produced for the specified directory path has a c:directory document element whose base URI, attached as an xml:base attribute, is the absolute directory path (expressed as a URI that ends in a slash) and whose name attribute (without a trailing slash) is the last segment of the directory path.

<c:directory
  name = string
  xml:base = anyURI>
    (c:file |
     c:directory |
     c:other)*
</c:directory>

Its contents are determined as follows, based on the entries in the directory identified by the directory path. For each entry in the directory and subject to the rules that are imposed by the max-depth, include-filter, and exclude-filter options, a c:file, a c:directory, or a c:other element is produced, as follows:

  • A c:directory is produced for each subdirectory not determined to be special. Depending on the values of the three options, it may contain child elements for the directory’s content.

  • A c:file is produced for each file not determined to be special.

    <c:file
      name = string
      xml:base = anyURI
      content-types? = ContentTypes />

  • Any file or directory determined to be special by the p:directory-list step may be output using a c:other element but the criteria for marking a file as special are implementation-defined.

    <c:other
      name = string
      xml:base = anyURI />

Each of the elements c:file, c:directory, and c:other has a name attribute, whose value is a relative IRI reference, giving the (local) file or directory name.

Each of these elements also contains the corresponding resource’s URI in an xml:base attribute, which may be a relative URI for any but the top-level c:directory element. In the case of c:directory, it must end in a trailing slash. This way, users will always be able to compute the absolute URI for any of these elements by applying fn:base-uri() to it.

2.1 Directory list details

If detailed is false, then only the name and xml:base attributes are expected on c:file, c:directory, or c:other elements.

If detailed is true, then the pipeline author is expecting additional details about each entry. The following attributes should be provided by the implementation:

readable

“true” if the entry is readable.

writable

“true” if the entry is writable.

hidden

“true” if the entry is hidden.

last-modified

The last modification time of the entry, expressed as a lexical xs:dateTime in UTC.

size

The size of the entry in bytes.

The precise meaning of these properties is implementation-defined and may vary according to the URI scheme of the path. If the value of an attribute is “false” or if it has no meaningful value, the attribute may be omitted.

Any other attributes on c:file, c:directory, or c:other are implementation-defined.
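As a non-normative sketch, the pipeline below requests a detailed listing and then keeps only the files reported as writable. The directory docs/ is a hypothetical placeholder, and the use of the standard p:delete step is merely one possible way to post-process the listing. Because an attribute whose value is “false” may be omitted, the test compares against 'true' rather than checking for 'false'.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">
  <p:output port="result"/>

  <!-- Ask for the additional attributes (readable, writable, ...). -->
  <p:directory-list path="docs/" detailed="true"/>

  <!-- Remove every c:file that is not explicitly marked writable;
       a missing attribute is treated like "false" here. -->
  <p:delete match="c:file[not(@writable = 'true')]"/>
</p:declare-step>
```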

Document properties

No document properties are preserved.

3 p:file-copy

The p:file-copy step copies a file or directory.

<p:declare-step type="p:file-copy">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="target" required="true" as="xs:anyURI"/>      
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-copy step copies the file or directory named in href to the new name specified in target. If the target is a directory, the step attempts to copy the file or directory into that directory, preserving its base name.

It is a dynamic error (err:XD0064) if the href or target option value is not a valid xs:anyURI. If the href or target is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-copy in the case of a syntactic shortcut value).

If the copy is successful, the step returns a c:result element containing the absolute URI of the target.

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

  • It is a dynamic error (err:XD0011) if the resource referenced by the href option does not exist, cannot be accessed or is not a file or directory.

  • It is a dynamic error (err:XC0050) if the URI scheme of the target option is not supported or the file or directory cannot be copied to the specified location.
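A minimal, non-normative sketch of a copy into a directory follows. The names report.xml and backup/ are hypothetical, and backup/ is assumed to be an existing directory so that the copy keeps the base name report.xml.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- Both URIs are relative and are resolved against the base URI
       of this p:file-copy element. -->
  <p:file-copy href="report.xml" target="backup/"/>
</p:declare-step>
```

On success, the result port carries a c:result element containing the absolute URI of the target.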

Document properties

No document properties are preserved.

4 p:file-delete

The p:file-delete step deletes a file or a directory.

<p:declare-step type="p:file-delete">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="recursive" as="xs:boolean" select="false()"/> 
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-delete step attempts to delete the file or directory named in href.

It is a dynamic error (err:XD0064) if the href option value is not a valid xs:anyURI. If the href option is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-delete in the case of a syntactic shortcut value).

If href specifies a directory, it can only be deleted if the recursive option is true or if the specified directory is empty.

If the delete is successful, the step returns a c:result element containing the absolute URI of the deleted file or directory.

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

  • It is a dynamic error (err:XD0011) if the resource referenced by the href option does not exist, cannot be accessed or is not a file or directory.

  • It is a dynamic error (err:XXXXXX) if an attempt is made to delete a non-empty directory and the recursive option was set to false.
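The following non-normative sketch deletes a directory tree. The directory name build/ is a hypothetical placeholder. Without recursive="true", the same step would only succeed if build/ were empty.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- recursive="true" allows the deletion of a non-empty directory. -->
  <p:file-delete href="build/" recursive="true"/>
</p:declare-step>
```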

Document properties

No document properties are preserved.

5 p:file-info

The p:file-info step returns information about a file, directory or other file system object.

<p:declare-step type="p:file-info">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-info step returns information about the file, directory or other file system object named in the href option.

It is a dynamic error (err:XD0064) if the href option value is not a valid xs:anyURI. If the href option is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-info in the case of a syntactic shortcut value).

If the href option is not a file: URI, the result is implementation-defined.

If the href option is a file: URI, the step returns:

  • If the href option references a file: a c:file element with standard attributes (see below).

  • If the href option references a directory: a c:directory element with standard attributes (see below).

  • If the href option references any other file system object: implementation-defined (for example, a c:other or c:device element). Using the standard attributes (see below) is advised where applicable.

The following attributes are standard on a returned c:file or c:directory element. All attributes are optional and must be absent if not applicable. Additional implementation-defined attributes may be present, but they must be in a namespace.

Attribute        Type          Description
readable         xs:boolean    true if the object is readable.
writable         xs:boolean    true if the object is writable.
hidden           xs:boolean    true if the object is hidden.
last-modified    xs:dateTime   The last modification time of the object, expressed in UTC.
size             xs:integer    The size of the object in bytes.

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

  • It is a dynamic error (err:XD0011) if the resource referenced by the href option does not exist, cannot be accessed or is not a file, directory or other file system object.
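As a non-normative sketch, the pipeline below uses p:file-info with fail-on-error="false" and branches on the kind of element that is returned. The path data/input.xml and the kind element are hypothetical and serve only as an illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">
  <p:output port="result"/>

  <!-- With fail-on-error="false", a failure yields c:error instead of
       raising a dynamic error. -->
  <p:file-info href="data/input.xml" fail-on-error="false"/>

  <p:choose>
    <p:when test="/c:file">
      <p:identity>
        <p:with-input>
          <kind>file</kind>
        </p:with-input>
      </p:identity>
    </p:when>
    <p:when test="/c:directory">
      <p:identity>
        <p:with-input>
          <kind>directory</kind>
        </p:with-input>
      </p:identity>
    </p:when>
    <p:otherwise>
      <!-- c:error or an implementation-defined element is passed on. -->
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```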

Document properties

No document properties are preserved.

6 p:file-mkdir

The p:file-mkdir step creates a directory.

<p:declare-step type="p:file-mkdir">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-mkdir step creates the directory named in the href option. If this includes more than one directory component, all of the intermediate components are created. The path separator is implementation-defined.

It is a dynamic error (err:XD0064) if the href option value is not a valid xs:anyURI. If the href option is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-mkdir in the case of a syntactic shortcut value).

If the create is successful, the step returns a c:result element containing the absolute URI of the directory created.
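A minimal, non-normative sketch of creating a nested directory is shown below. The path out/reports/archive/ is a hypothetical placeholder; if out/ and out/reports/ do not exist yet, they are expected to be created as intermediate components.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- The relative URI is resolved against the base URI of this
       p:file-mkdir element. -->
  <p:file-mkdir href="out/reports/archive/"/>
</p:declare-step>
```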

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

Document properties

No document properties are preserved.

7 p:file-move

The p:file-move step moves a file or directory.

<p:declare-step type="p:file-move">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="target" required="true" as="xs:anyURI"/>      
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-move step moves the file or directory named in href to the new location specified in target. If the target option specifies an existing directory, the step attempts to move the file or directory into that directory, preserving its base name.

It is a dynamic error (err:XD0064) if the href or target option value is not a valid xs:anyURI. If the href or target is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-move in the case of a syntactic shortcut value).

If the href option specifies a device or other special kind of object, the results are implementation-defined.

If the move is successful, the step returns a c:result element containing the absolute URI of the target.

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

  • It is a dynamic error (err:XD0011) if the resource referenced by the href option does not exist, cannot be accessed or is not a file or directory.

  • It is a dynamic error (err:XXXXXX) if the resource referenced by the target option is an existing file or other file system object.

  • It is a dynamic error (err:XC0050) if the URI scheme of the target option is not supported or the file or directory cannot be moved to the specified location.
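The following non-normative sketch moves a file into a directory. The names draft.xml and archive/ are hypothetical, and archive/ is assumed to exist already so that the file keeps its base name.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- Because the target is assumed to be an existing directory, the
       file would end up as archive/draft.xml. -->
  <p:file-move href="draft.xml" target="archive/"/>
</p:declare-step>
```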

Document properties

No document properties are preserved.

8 p:file-create-tempfile

The p:file-create-tempfile step creates a temporary file.

Editorial Note

TBD: This is an almost direct port from EXProc. Details might still change.

Check the error codes. And is the name ok? p:file-create-tempfile sounds better to me?

I added the option to leave out the href option and have the tempfile created in the system's temp directory. Ok?

<p:declare-step type="p:file-create-tempfile">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" as="xs:anyURI?"/>                       
     <p:option name="suffix" as="xs:string?"/>                     
     <p:option name="prefix" as="xs:string?"/>                     
     <p:option name="delete-on-exit" as="xs:boolean" select="false()"/>
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-create-tempfile step creates a temporary file. The temporary file is guaranteed not to already exist when the step is called.

If the href option is specified, it must be the URI of an existing directory, and the temporary file is created in that directory. If the href option is not specified, the location of the temporary file is implementation-defined; it is usually the operating system's default location for temporary files.

It is a dynamic error (err:XD0064) if the href option value is not a valid xs:anyURI. If the href is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-create-tempfile in the case of a syntactic shortcut value).

If the prefix option is specified, the filename will begin with that prefix. If the suffix option is specified, the filename will end with that suffix.

If the delete-on-exit option is true, an attempt will be made to automatically delete the temporary file when the processor terminates the pipeline. No error will be raised if this is unsuccessful.

If the temporary file creation is successful, the step returns a c:result element containing the absolute URI of this file.
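As a non-normative sketch, the pipeline below creates a temporary file in the implementation-defined default location by leaving out href. The prefix job- and the suffix .xml are hypothetical values.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- No href: the location is chosen by the implementation.
       delete-on-exit="true" requests a best-effort cleanup when the
       processor terminates the pipeline. -->
  <p:file-create-tempfile prefix="job-" suffix=".xml"
                          delete-on-exit="true"/>
</p:declare-step>
```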

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

Document properties

No document properties are preserved.

9 p:file-touch

The p:file-touch step updates the modification timestamp of a file.

<p:declare-step type="p:file-touch">
     <p:output port="result" primary="true" content-types="application/xml"/>
     <p:option name="href" required="true" as="xs:anyURI"/>        
     <p:option name="timestamp" as="xs:dateTime?"/>                
     <p:option name="fail-on-error" as="xs:boolean" select="true()"/>
</p:declare-step>

The p:file-touch step updates the modification timestamp of the file specified in the href option.

It is a dynamic error (err:XD0064) if the href option value is not a valid xs:anyURI. If the href option is relative, it is made absolute against the base URI of the element on which it is specified (p:with-option or p:file-touch in the case of a syntactic shortcut value).

If the timestamp option is set, the file's timestamp is set to this value. Otherwise the file's timestamp is set to the system's current date and time.

If an error occurs and fail-on-error is false, the step returns a c:error element which may contain additional, implementation-defined, information about the nature of the error.

If an error occurs and fail-on-error is true, one of the following errors is raised:

  • It is a dynamic error (err:XD0011) if the resource referenced by the href option does not exist and cannot be created or exists and cannot be accessed.
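A minimal, non-normative sketch of setting an explicit timestamp follows. The file name build.stamp and the date are hypothetical. Omitting the timestamp attribute would use the current date and time instead.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:output port="result"/>

  <!-- The attribute value is converted to the declared xs:dateTime type. -->
  <p:file-touch href="build.stamp" timestamp="2024-01-01T00:00:00Z"/>
</p:declare-step>
```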

Document properties

No document properties are preserved.

10 Step Errors

These steps can raise dynamic errors.

[Definition: A dynamic error is one which occurs while a pipeline is being evaluated.] Examples of dynamic errors include references to URIs that cannot be resolved, steps which fail, and pipelines that exhaust the capacity of an implementation (such as memory or disk space). For a more complete discussion of dynamic errors, see Dynamic Errors in XProc 3.0: An XML Pipeline Language.

If a step fails due to a dynamic error, failure propagates upwards until either a p:try is encountered or the entire pipeline fails. In other words, outside of a p:try, step failure causes the entire pipeline to fail.
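The following non-normative sketch shows how a pipeline author might catch one of the errors listed in this section with p:try. It assumes the p:try and p:catch syntax of XProc 3.0. The file names and the status element are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:err="http://www.w3.org/ns/xproc-error"
                version="3.0">
  <p:output port="result"/>

  <p:try>
    <!-- fail-on-error defaults to true, so a failed copy raises an error. -->
    <p:file-copy href="report.xml" target="backup/"/>

    <!-- Only err:XC0050 is handled here; any other error still
         propagates upwards and fails the pipeline. -->
    <p:catch code="err:XC0050">
      <p:identity>
        <p:with-input>
          <status>copy failed</status>
        </p:with-input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

Setting fail-on-error="false" on the step is the alternative when a c:error document on the result port is preferred over a raised error.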

The following errors can be raised by these steps:

err:XC0012

It is a dynamic error if the contents of the directory path are not available to the step due to access restrictions in the environment in which the pipeline is run.

See: p:directory-list

err:XC0017

It is a dynamic error if the absolute path does not identify a directory.

See: p:directory-list

err:XC0050

It is a dynamic error if the URI scheme of the target option is not supported or the file or directory cannot be copied to the specified location.

See: p:file-copy, p:file-move

err:XC0102

It is a dynamic error if an implementation does not support directory listing for a specified scheme.

See: p:directory-list

A Conformance

Conformant processors must implement all of the features described in this specification except those that are explicitly identified as optional.

Some aspects of processor behavior are not completely specified; those features are either implementation-dependent or implementation-defined.

[Definition: An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.]

[Definition: An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.]

A.1 Implementation-defined features

The following features are implementation-defined:

  1. Conformant processors must support directory paths whose scheme is file. It is implementation-defined what other schemes are supported by p:directory-list, and what the interpretation of 'directory', 'file' and 'contents' is for those schemes. See Section 2, “p:directory-list”.
  2. Any file or directory determined to be special by the p:directory-list step may be output using a c:other element but the criteria for marking a file as special are implementation-defined. See Section 2, “p:directory-list”.
  3. The precise meaning of these properties is implementation-defined and may vary according to the URI scheme of the path. See Section 2.1, “Directory list details”.
  4. Any other attributes on c:file, c:directory, or c:other are implementation-defined. See Section 2.1, “Directory list details”.

A.2 Implementation-dependent features

The following features are implementation-dependent:

B References

[XProc 3.0] XProc 3.0: An XML Pipeline Language. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XProc 3.0 Steps] XProc 3.0 Steps: An Introduction. Norman Walsh, Achim Berndzen, Gerrit Imsieke and Erik Siegel, editors.

[XPath and XQuery Functions and Operators 3.1] XPath and XQuery Functions and Operators 3.1. Michael Kay, editor. W3C Recommendation. 21 March 2017.

C Glossary

dynamic error

A dynamic error is one which occurs while a pipeline is being evaluated.

implementation-defined

An implementation-defined feature is one where the implementation has discretion in how it is performed. Conformant implementations must document how implementation-defined features are performed.

implementation-dependent

An implementation-dependent feature is one where the implementation has discretion in how it is performed. Implementations are not required to document or explain how implementation-dependent features are performed.

D Ancillary files

This specification includes by reference a number of ancillary files.

steps.xpl

An XProc step library for the declared steps.