DATA SOURCE (COLLECTION)
In the Data Source area you manage the data in your project. The data items (documents) you import are organized in collections. A project can contain multiple collections.
On the NLG platform, each data item is called a document. Each document should include a uid field that uniquely identifies it (any unique string that could act as a primary key on your side) and a relevant, human-readable name field. The uid and name are later used to display the document in the web frontend.
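As an illustration, the following Python sketch builds two documents and checks the uid requirement described above. Only uid and name come from this section. The extra fields color and price, the sample values, and the helpers make_document and check_unique_uids are hypothetical.

```python
import json


def make_document(uid: str, name: str, **fields) -> dict:
    """Build a document dict with the recommended uid and name fields (hypothetical helper)."""
    if not uid:
        raise ValueError("uid must be a non-empty string")
    if not name:
        raise ValueError("name must be a non-empty, human-readable string")
    return {"uid": uid, "name": name, **fields}


def check_unique_uids(documents: list[dict]) -> None:
    """Raise ValueError if two documents share the same uid."""
    seen = set()
    for doc in documents:
        if doc["uid"] in seen:
            raise ValueError(f"duplicate uid: {doc['uid']!r}")
        seen.add(doc["uid"])


# Hypothetical product data; any other fields are up to you.
docs = [
    make_document("SKU-1001", "Trail running shoe Alpha", color="blue", price=89.90),
    make_document("SKU-1002", "Trail running shoe Beta", color="red", price=99.90),
]
check_unique_uids(docs)
print(json.dumps(docs[0]))
```

Reusing the primary key from your own system as the uid makes it easy to map generated texts back to your records.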
The AX NLG platform processes structured data, as opposed to unstructured information such as text, full text or images. Structured data is handled on the platform as follows:
- All data is transmitted, accessed and interpreted as JSON.
- Some pre-processing steps, such as CSV or XLSX uploads, are available, but their content is also transformed to JSON automatically.
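To show what such a transformation amounts to, here is a minimal Python sketch that turns CSV rows into JSON objects, one per row, with headers as field names. It is an assumption-based illustration, not the platform's converter. In particular, this sketch keeps all values as strings, and the platform may handle types differently. XLSX would need a third-party library, so only CSV is shown. The function name csv_to_documents and the sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_documents(csv_text: str) -> list[dict]:
    """Turn CSV text into a list of JSON-ready dicts, one per data row.

    Each header becomes a field name; all values stay strings
    (assumption: the platform's own type handling may differ).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample upload with uid and name columns.
sample = """uid,name,color
SKU-1001,Trail running shoe Alpha,blue
SKU-1002,Trail running shoe Beta,red
"""
documents = csv_to_documents(sample)
print(json.dumps(documents, indent=2))
```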
Collections are the target buckets for your documents. Use them to group documents that share a trait, such as text category, language or keyword density, or to separate documents for different delivery options. Each collection gets its own API endpoint, identified by its collection ID.
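Before uploading, you might bucket your data by the trait you want to group on, with one bucket per target collection. This Python sketch groups documents by a single field. The field name category, the sample data and the helper group_by_trait are hypothetical.

```python
from collections import defaultdict


def group_by_trait(documents: list[dict], trait: str) -> dict[str, list[dict]]:
    """Bucket documents by the value of one field, e.g. a category or language."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for doc in documents:
        buckets[doc[trait]].append(doc)
    return dict(buckets)


documents = [
    {"uid": "1", "name": "Shoe Alpha", "category": "shoes"},
    {"uid": "2", "name": "Jacket Beta", "category": "jackets"},
    {"uid": "3", "name": "Shoe Gamma", "category": "shoes"},
]
groups = group_by_trait(documents, "category")
for category, docs in groups.items():
    # Each bucket would go to its own collection (hypothetical mapping).
    print(category, [d["uid"] for d in docs])
```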
Collections can either be deleted completely or emptied. Emptying a collection removes only its documents: the collection, its settings and the content ruleset remain unchanged.
When you create a collection, you have to assign it a language from the language pool you selected in the project settings. The language rules for that language then apply to the collection. This means that a project can contain multiple languages, but each collection is assigned to exactly one language.
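The rule "several languages per project, exactly one per collection" can be modeled as a simple validation. The following Python sketch is a hypothetical data model, not the platform's API. The language codes in PROJECT_LANGUAGES and the Collection class are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical language pool, as chosen in the project settings.
PROJECT_LANGUAGES = {"en-US", "de-DE", "fr-FR"}


@dataclass(frozen=True)
class Collection:
    """Minimal model: every collection has exactly one language."""
    name: str
    language: str

    def __post_init__(self) -> None:
        # Reject languages that are not part of the project's pool.
        if self.language not in PROJECT_LANGUAGES:
            raise ValueError(
                f"language {self.language!r} is not in the project's language pool"
            )


# Several languages in one project, one language per collection.
collections = [Collection("Products EN", "en-US"), Collection("Products DE", "de-DE")]
```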
When creating a collection, you can also set minimum and maximum text lengths. The maximum text length is the number of characters a text must not exceed. To stay within this limit, non-mandatory sentences are omitted one by one.
Unlike the maximum, the minimum text length cannot be enforced automatically, because the software cannot add content on its own. Instead, texts shorter than the minimum are flagged as low quality and their status is set to insufficient. This lets you identify and correct them if required.
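The two limits behave asymmetrically. The maximum is enforced by dropping optional sentences, while the minimum only produces a flag. The following Python sketch illustrates that logic under stated assumptions. The section does not say in which order non-mandatory sentences are omitted, so this sketch drops them from the end first. The Sentence class, the "ok" status value and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    mandatory: bool


def render(sentences: list[Sentence]) -> str:
    return " ".join(s.text for s in sentences)


def apply_length_limits(sentences: list[Sentence], min_len: int, max_len: int) -> tuple[str, str]:
    """Return (text, status) under a collection's length limits.

    Assumption: optional sentences are dropped last-first; the platform's
    actual omission order may differ. "ok" is a hypothetical status name.
    """
    kept = list(sentences)
    # Upper limit: omit non-mandatory sentences one at a time until the text fits.
    for i in range(len(kept) - 1, -1, -1):
        if len(render(kept)) <= max_len:
            break
        if not kept[i].mandatory:
            del kept[i]
    text = render(kept)
    # Lower limit: nothing can be added automatically, so the text is only flagged.
    status = "insufficient" if len(text) < min_len else "ok"
    return text, status


sentences = [
    Sentence("The Alpha is a trail running shoe.", mandatory=True),
    Sentence("It comes in blue.", mandatory=False),
    Sentence("Customers love it.", mandatory=False),
]
print(apply_length_limits(sentences, min_len=20, max_len=60))
```

Note that a text can still exceed the maximum if its mandatory sentences alone are too long. The sketch leaves that case unresolved.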
Further Settings for Collections
You can choose these settings when you create a new collection, and you can edit them at any time later:
- webhook URL: The URL to which new texts are sent.
- webhook secret (optional): If set, webhook pushes are signed with this key so that their authenticity can be verified.
- used ruleset version (draft / published): This setting affects the publishing workflow, which requires two collections in your project. Use one collection to test your draft version. Configure the other to use the published version of your project; it holds the productive documents you want to publish content for.
- autogenerate new documents: When you add new documents, text generation will start automatically.
- autogenerate existing documents after changes: When you change existing documents, text generation will start automatically.
- export format: Select the file export format for this collection (json, csv or xlsx). Only the selected format is generated; if you select no format, no file exports are generated. This setting applies only to file exports; API results are always returned in JSON format.
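On the receiving side, a webhook secret is typically used to recompute a signature over the request body and compare it with the one sent. The following Python sketch shows that general pattern. It does not document the platform's scheme. The algorithm (HMAC-SHA256 over the raw body, hex-encoded), the header that carries the signature, and the payload fields are all assumptions, so check the platform's webhook documentation for the actual details.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (assumed signing scheme)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_authentic(secret: str, body: bytes, received_signature: str) -> bool:
    """Compare in constant time so the check does not leak timing information."""
    return hmac.compare_digest(sign(secret, body), received_signature)


secret = "my-webhook-secret"  # value entered as webhook secret (hypothetical)
body = b'{"uid": "SKU-1001", "text": "Generated text"}'  # hypothetical payload
# In practice the signature is read from the incoming request; the header name is not documented here.
signature = sign(secret, body)
print(is_authentic(secret, body, signature))
print(is_authentic(secret, b'{"tampered": true}', signature))
```

Always verify against the raw bytes you received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.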
To add documents to your collection, you can upload them via the API (see "How to use our API" for instructions) or add data manually:
- either copy-paste tabular content or JSON data into the text field,
- or upload a .csv or .xlsx (Microsoft Excel) file.