About dctap

dctap is a Python package for parsing and normalizing spreadsheets or CSV files that follow the model for DC Tabular Application Profiles (DCTAP); see the installation instructions.

The dctap package includes a command-line tool for viewing the normalized contents of a given TAP in one of three interchangeable formats: a verbose, indented text format for human readers, or YAML or JSON for machines. The tool checks a CSV file for potential violations of the DCTAP model and emits warnings or helpful suggestions.

An Application Profile describes models, vocabularies, and usage patterns that are expected or required to be found in instance data. Developing a shared profile can help data providers capture consensus models on the “shape” of data in a given domain and improve the coherence or interoperability of data in that domain. Developing that profile in a simple spreadsheet, using DCTAP, can make it easier for people to participate in that process and use its results.

An Application Profile is also commonly used as a basis for data validation. The dctap package itself does not support validation or any other operation that touches instance data, but it can serve as a preprocessor for validation applications downstream. The normalized representation of a DCTAP CSV in JSON, for example, can be converted into validation schemas expressed in Shape Expressions Language (ShEx) or Shapes Constraint Language (SHACL).
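To make the preprocessor idea concrete, here is a minimal sketch of a downstream converter that turns a normalized TAP into a ShExC fragment. It is not part of dctap. The JSON layout (the keys "shapes", "shapeID", and "statement_templates", and boolean "mandatory"/"repeatable" values) is an assumption made for illustration, and the functions cardinality and tap_to_shexc are hypothetical. Prefix declarations are omitted, so the output is a fragment rather than a complete schema.

```python
import json

# Hypothetical normalized TAP. The key names and value types here are
# assumptions for illustration; check them against real dctap output.
NORMALIZED_TAP = json.loads("""
{
  "shapes": [
    {
      "shapeID": ":book",
      "statement_templates": [
        {"propertyID": "dc:title", "valueDataType": "xsd:string",
         "mandatory": true, "repeatable": false},
        {"propertyID": "dc:creator", "valueNodeType": "iri",
         "mandatory": false, "repeatable": true}
      ]
    }
  ]
}
""")


def cardinality(mandatory, repeatable):
    """Map mandatory/repeatable flags onto a ShExC cardinality suffix."""
    if mandatory and repeatable:
        return " +"   # one or more
    if mandatory:
        return ""     # exactly one (the ShExC default)
    if repeatable:
        return " *"   # zero or more
    return " ?"       # zero or one


def tap_to_shexc(tap):
    """Render each shape as a ShExC shape with one triple constraint per statement."""
    lines = []
    for shape in tap.get("shapes", []):
        lines.append(f"{shape['shapeID']} {{")
        constraints = []
        for st in shape.get("statement_templates", []):
            if st.get("valueDataType"):
                value = st["valueDataType"]          # datatype constraint
            elif st.get("valueNodeType"):
                value = st["valueNodeType"].upper()  # node kind, e.g. IRI
            else:
                value = "."                          # any value
            card = cardinality(st.get("mandatory", False),
                               st.get("repeatable", False))
            constraints.append(f"  {st['propertyID']} {value}{card}")
        lines.append(" ;\n".join(constraints))
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(tap_to_shexc(NORMALIZED_TAP))
```

Run as a script, the sketch prints a single shape, :book, with one constraint for dc:title (exactly one xsd:string) and one for dc:creator (zero or more IRIs). A SHACL converter could follow the same pattern with a different rendering step.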

dctap aims to catch a few of the more obvious inconsistencies in a given TAP, such as malformed regular expressions or the use of literal datatypes with nonliteral values. These checks are documented below and in extensive unit tests. The checks err on the side of tolerance, and error messages are meant as helpful hints to editors of early drafts. Users are free to customize the DCTAP model with local extensions. Any part of a given TAP not recognized by dctap as a built-in or customized feature of the DCTAP model is simply ignored.
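The two checks named above can be illustrated with a short sketch. This is not dctap's own implementation: the function check_statement is hypothetical, and a statement is assumed to be a plain dictionary keyed by DCTAP element names (valueNodeType, valueDataType, valueConstraint, valueConstraintType). In keeping with the tolerant approach described above, it collects warnings instead of raising errors.

```python
import re


def check_statement(statement):
    """Return a list of warning strings for one statement (hypothetical helper)."""
    warnings = []

    # Check 1: a value constraint of type "pattern" should compile as a regex.
    pattern = statement.get("valueConstraint")
    if statement.get("valueConstraintType") == "pattern" and pattern:
        try:
            re.compile(pattern)
        except re.error as err:
            warnings.append(
                f"'{pattern}' is not a valid regular expression: {err}"
            )

    # Check 2: a datatype only makes sense when the value is a literal.
    node_type = (statement.get("valueNodeType") or "").lower()
    if statement.get("valueDataType") and node_type in {"iri", "bnode"}:
        warnings.append(
            f"Datatypes apply only to literals, but valueNodeType is '{node_type}'."
        )

    return warnings


if __name__ == "__main__":
    draft = {
        "propertyID": "dc:date",
        "valueNodeType": "IRI",
        "valueDataType": "xsd:date",
        "valueConstraint": "[0-9",  # unterminated character set
        "valueConstraintType": "pattern",
    }
    for message in check_statement(draft):
        print("Warning:", message)
```

Because the function only reports problems, a draft profile with several issues can still be processed in full and all hints shown to the editor at once.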