# 5 DATA SOURCE/COLLECTION


# Documents

Each data item is called a document. Every document should include a uid field that uniquely identifies it (any string that is unique and could act as a primary key on your side) and a human-readable, relevant name field; this name is later used in the web frontend to display the document.
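As a minimal sketch, a document could look like the following JSON; uid and name are the fields described above, while the remaining fields are purely illustrative placeholders and not required by the platform.

```json
{
  "uid": "product-4711",
  "name": "Example Product 4711",
  "price": 19.99,
  "color": "blue"
}
```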

# Collections

Collections are the target buckets for your data (documents). Each collection carries its own configuration for text production, such as language, text length, or keyword density. Each collection also gets its own endpoint via the collection ID.

You can use collections to group documents that share a trait, such as text category, or to separate documents that require different delivery options.
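Since documents are addressed per collection, pushing an individual document could look like the sketch below. The base URL, endpoint path, authorization header, and payload fields beyond uid and name are assumptions for illustration only; consult the API reference for the exact call.

```python
import requests

# All values below are placeholders for illustration; the real base URL,
# API token and collection ID come from your account / the API reference.
BASE_URL = "https://api.example.com"      # assumed placeholder, not the documented endpoint
API_TOKEN = "your-api-token"              # assumed placeholder
COLLECTION_ID = "1234"                    # the collection ID that the endpoint is derived from

document = {
    "uid": "product-4711",                # unique string acting as primary key on your side
    "name": "Example Product 4711",       # human-readable name shown in the web frontend
    "price": 19.99,                       # illustrative payload field
}

# Push a single document to the collection-specific endpoint (the path is an assumption).
response = requests.post(
    f"{BASE_URL}/v2/collections/{COLLECTION_ID}/documents",
    json=document,
    headers={"Authorization": f"Token {API_TOKEN}"},
)
response.raise_for_status()
```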

# 5.1 IMPORT DATA

# Upload your data manually

In parallel to the API, manual file upload is possible as well.

Inside the Import feature in a collection, you can

  • either copy and paste tabular content or JSON data into the text field, or
  • upload a CSV or XLSX (Excel) file (a minimal sample is sketched below)
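For a quick test of the manual import, a flat file with one row per document is usually enough. The sample below is purely illustrative: only the uid and name columns correspond to fields described above, the other columns are hypothetical.

```csv
uid,name,price,color
product-4711,Example Product 4711,19.99,blue
product-4712,Example Product 4712,24.99,red
```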

# File upload via API

In cases where your system exports a file instead of individual data items, individual API calls for each document cannot be implemented. For those cases, the file upload feature ("bulk upload") is also available via the API.

The API upload uses the same transformations as the manual upload.

  1. Prepare the import file: Export the data from your source system in a format that is suitable for the upload. You can use the manual upload feature to test this.

  2. Send the file to the API: Once you have a working file format, you can send it to the Bulk Upload API (see the example below).

See for the API call.

The hint parameter specifies the file format. Available values are:

  • json_magic: JSON (a list of elements)
  • csv_magic: CSV
  • xlsx_magic: Excel (XLSX)

  3. See the results: The upload is processed in the background, and after a few seconds your documents start to appear in the collection. Please note that since the whole file has to be parsed in a bulk operation, this process is not suitable for real-time processing.
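As a sketch of step 2, a bulk upload with the hint parameter could look like the following. The base URL, endpoint path, authentication header, and whether hint travels as a form field are assumptions for illustration only; the hint values themselves (json_magic, csv_magic, xlsx_magic) are the ones listed above.

```python
import requests

# All values below are placeholders for illustration; the real base URL,
# API token and collection ID come from your account / the API reference.
BASE_URL = "https://api.example.com"      # assumed placeholder, not the documented endpoint
API_TOKEN = "your-api-token"              # assumed placeholder
COLLECTION_ID = "1234"

# Send an exported CSV file as a bulk upload.
# hint=csv_magic tells the parser how to interpret the file (see the list above);
# sending it as a form field is an assumption for this sketch.
with open("export.csv", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/v2/collections/{COLLECTION_ID}/bulk_upload",   # path is an assumption
        data={"hint": "csv_magic"},
        files={"file": f},
        headers={"Authorization": f"Token {API_TOKEN}"},
    )
response.raise_for_status()
# Processing happens in the background; documents appear in the collection after a few seconds.
```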

# Custom File Types, XML, Data Processing and Cleanup

For some enterprise customers, certain custom file types and processing chains are implemented on a per-customer basis. Those are specified via the hint parameter. Your account manager provides you with your custom hint, if available. The process and API follow the same procedure as above.

# Upload Report

For manual uploads or uploads via the API, a processing report is produced after the file has been transformed into documents.

This report shows the completed documents and any errors that occurred.

The uploaded file can also be downloaded for up to 7 days, which helps you debug any file format issues on your side.

# 5.2 STRUCTURED DATA

  • The NLG Cloud processes structured data (in contrast to unstructured information like full text or images). Unstructured information can still be transmitted in the data and passed into parts of the text: e.g. a video or image URL can be placed at a certain position in the text (see the sketch after this list).
  • Inside the platform, all data is transmitted and interpreted as JSON.
  • Certain pre-processing steps, like a CSV or XLSX upload, are available as well; those files are automatically transformed to JSON.
  • If you want to learn how to create a data model within an upload file, please refer to the guide.
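For illustration, a structured document that also carries an unstructured field could look like the sketch below; none of the field names beyond uid and name are prescribed by the platform, and the image URL is a hypothetical example of unstructured content that can be passed into the text.

```json
{
  "uid": "product-4711",
  "name": "Example Product 4711",
  "specs": {
    "weight_kg": 1.2,
    "material": "aluminium"
  },
  "image_url": "https://example.com/images/4711.jpg"
}
```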