DATA SOURCE (COLLECTION)

In the Data Source area you manage the data in your project. The data items (documents) you import are organized in collections. A project can contain multiple collections.

Documents

On the NLG platform, each data item is called a document. It should include a uid field that uniquely identifies the document (any unique string that could act as a primary key on your side) and a human-readable, meaningful name field. The web frontend uses uid and name to display the document.
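
As an illustration, a document could look like the following minimal sketch. Only the uid and name fields come from the description above; the other fields, their values and the helper missing_identity_fields are hypothetical and exist only for this example.

```python
import json

# Hypothetical product record. Only "uid" and "name" are named by the platform;
# "brand" and "weight_grams" are made-up payload fields.
document = {
    "uid": "sku-10042",
    "name": "Trail Running Shoe X2",
    "brand": "Acme",
    "weight_grams": 285,
}


def missing_identity_fields(doc: dict) -> list[str]:
    """Return which of "uid" and "name" are absent, None or blank."""
    missing = []
    for key in ("uid", "name"):
        value = doc.get(key)
        if value is None or not str(value).strip():
            missing.append(key)
    return missing


# Serialize the document as JSON, the format the platform works with.
payload = json.dumps(document)
```

Running a check like this before an import is one way to catch records that would be hard to find again in the web frontend.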

Structured Data

The AX NLG platform processes structured data, in contrast to unstructured information such as text, full text or images.

This is how the structured data is handled on the platform:

  • All data is transmitted, accessed and interpreted as JSON.
  • Pre-processing options such as CSV or XLSX upload are available, but uploaded files are automatically converted to JSON as well.
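
To make the idea of tabular data becoming JSON concrete, here is a minimal sketch using only the Python standard library. It assumes one document per CSV row with the header row supplying the field names; how the platform maps columns and types internally is not described here, so csv_to_documents and the sample data are illustrative assumptions.

```python
import csv
import io
import json


def csv_to_documents(csv_text: str) -> list[dict]:
    """Turn CSV text into a list of dicts, one per row (assumed mapping)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Made-up sample data with the uid and name columns mentioned above.
sample = "uid,name,color\n1,Red Mug,red\n2,Blue Mug,blue\n"
documents = csv_to_documents(sample)

# Every value is a string at this point; csv.DictReader does not infer types.
print(json.dumps(documents, indent=2))
```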

Collections

Collections are the target buckets for your documents. You can use collections to group documents that share a trait, such as text category, language or keyword density, or to separate them for different delivery options. Each collection gets its own API endpoint via the collection ID.

Collections can either be deleted completely or be emptied. Emptying removes only the documents from the collection; the collection and its settings are retained. The content ruleset also remains unchanged.

Languages

When you create a collection, you have to assign it a language from the language pool you selected in the project settings. Then the language rules for the assigned language apply to this collection. This means that you can have multiple languages in a project, but each collection is assigned to exactly one language.

Text length

At this point, maximum and minimum limits for the text length can be set. The maximum text length specifies the number of characters that must not be exceeded in a text. Setting a value enables the following steps:

  1. The text is generated as normal. If length limits for individual statements and branchings are given, they apply first.
  2. If the text length exceeds the maximum, branchings marked as optional are trimmed, working from the end of the text towards the beginning.
  3. If the text length still exceeds the maximum, statements marked as non-obligatory are removed, again starting from the end.

Unlike with the upper limit, the software cannot correct texts that fall short of the minimum length on its own. Texts below the limit are classed as low quality and their status is set to insufficient. This allows them to be identified and corrected if required.
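
The trimming steps and the minimum check can be sketched as follows. This is a simplified model, not the platform's implementation: the Statement class, the fit_to_length function and the status value "ok" are assumptions made for the example, and per-statement length limits from step 1 are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    core: str                                   # part that is always rendered
    optional_branches: list[str] = field(default_factory=list)
    obligatory: bool = True

    def render(self) -> str:
        return " ".join([self.core, *self.optional_branches])


def render_text(statements: list[Statement]) -> str:
    return " ".join(s.render() for s in statements)


def fit_to_length(statements, max_len=None, min_len=None):
    # Work on copies so the caller's statements stay untouched.
    work = [Statement(s.core, list(s.optional_branches), s.obligatory)
            for s in statements]
    if max_len is not None:
        # Step 2: drop optional branchings, last statement first.
        for stmt in reversed(work):
            while stmt.optional_branches and len(render_text(work)) > max_len:
                stmt.optional_branches.pop()
        # Step 3: drop non-obligatory statements, again from the end.
        for i in range(len(work) - 1, -1, -1):
            if len(render_text(work)) <= max_len:
                break
            if not work[i].obligatory:
                del work[i]
    text = render_text(work)
    # The minimum is only checked, never repaired automatically.
    too_short = min_len is not None and len(text) < min_len
    return text, ("insufficient" if too_short else "ok")


statements = [
    Statement("The mug holds 300 ml.", ["It is dishwasher safe."]),
    Statement("It comes in red.", ["Other colors are available on request."],
              obligatory=False),
]
text, status = fit_to_length(statements, max_len=70)
```

In this sketch a text that still exceeds the maximum after both steps is returned as it is; the description above does not say what the platform does in that case.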

Further Settings for Collections

The rest of the settings control when to generate text and how to deliver it:

  • webhook URL: Where to send new texts.

  • webhook secret (optional, but recommended): Webhook pushes can be signed with this key to verify authenticity.

  • used ruleset version: draft / published. This setting affects the publishing workflow, for which you have to create two collections in your project. One collection is for testing your draft version. The other is configured to use the published version of your project and holds the production documents you want to publish content for.

  • autogenerate new documents: When you add new documents, text generation will start automatically.

  • autogenerate existing documents after changes: When you change existing documents, text generation will start automatically.

  • export format: Select your preferred file export format for this collection (JSON, CSV, XLSX). Only the selected format will be generated. Select no exports to stop generating exports.

    This applies only to file exports; API results are always in JSON format.
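
On the receiving side, a signed webhook push is typically checked by recomputing the signature with the shared webhook secret. The sketch below assumes an HMAC-SHA1 hex digest over the raw request body; the algorithm, the way the signature is transported and the function name is_authentic are assumptions for illustration, so check the platform's webhook documentation for the actual scheme.

```python
import hashlib
import hmac


def is_authentic(raw_body: bytes, received_signature: str, secret: str) -> bool:
    """Recompute the signature and compare it in constant time.

    Assumes an HMAC-SHA1 hex digest of the raw body (not confirmed here).
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha1).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))


# Self-contained demonstration with made-up values.
secret = "example-webhook-secret"
body = b'{"uid": "sku-10042", "text": "Generated text"}'
signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha1).hexdigest()

accepted = is_authentic(body, signature, secret)
rejected = is_authentic(body + b" ", signature, secret)
```

The comparison uses hmac.compare_digest rather than == so that the check does not leak timing information about how much of the signature matched.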

Import Data

To add documents to your collection, you can upload them via the API (see "How to use our API" for a manual) or add data manually:

  • either copy-paste tabular content or JSON data into the text field,
  • or upload a .csv or .xlsx (Microsoft Excel) file.