Skip to main content

Data Batching

In scenarios like generating individual documents for each customer, data batching is a more efficient option than sending separate document generation requests. This is accomplished by providing SQL that returns multiple filenames along with corresponding data sections.

There are two methods to implement data batching:

  1. Manual Data Batching
  2. AOP Internal Data Batching

Manual Data Batching

In this approach, SQL is structured to generate multiple filenames and corresponding data sections. AOP then creates separate documents for each dataset. The generated output files will be returned in a ZIP file.

Example

The following example demonstrates SQL that generates three separate documents, one for each dataset.

ZIP and PDF download

For data batching with template types such as DOCX, XLSX, and PPTX, where the output type is same as template type, the generated documents will be returned in a ZIP file containing the respective generated documents for each batch.

However, if the output type is PDF, it is possible to merge all the generated document by performing the merge operation using the following Init PL/SQL.

aop_api_pkg.g_output_merge := 'true'; 

Data Source

The SQL query generates distinct objects with a filename and data for each selected customer with IDs 1, 2, and 3. The data contains customer information, which AOP uses for data replacement via tags. A visual representation of the batching process can be seen in the following image.

Demo SQL and resultant Data Structure

select
'file_' || to_char(d.customer_id) as "filename",
cursor(
select
c.cust_first_name as "cust_first_name",
c.cust_last_name as "cust_last_name",
c.cust_city as "cust_city",
cursor(select o.order_total as "order_total",
'Order ' || rownum as "order_name",
cursor(select p.product_name as "product_name",
i.quantity as "quantity",
i.unit_price as "unit_price", APEX_WEB_SERVICE.BLOB2CLOBBASE64(p.product_image) as "image"
from aop_sample_order_items i, aop_sample_product_info p
where o.order_id = i.order_id
and i.product_id = p.product_id
) "product"
from aop_sample_orders o
where c.customer_id = o.customer_id
) "orders"
from aop_sample_customers c
where c.customer_id = d.customer_id
) as "data"
from aop_sample_customers d
where d.customer_id in (1,2,3)

Template

The template is a standard invoice format that can be used for each customer's details.

 IR-template.docx  

Manual Batching Template

Output

After processing the provided data source and template with an output type of docx, AOP returns a ZIP file containing three generated output documents.

 Manual-Batch-Output.zip  

When used the output type pdf and Init PL/SQL aop_api_pkg.g_output_merge := 'true';, the resultant following merged PDF output will be generated.

 output.pdf  

AOP Internal Data Batching

Available From: v24.3

When the provided SQL generates a very large dataset and AOP is unable to generate a single document for DOCX, XLSX, PPTX, or PDF, AOP Internal Data Batching can be useful.

Some applications and use cases for AOP Internal Data Batching are listed below:

  1. Generating a single file per customer.
  2. For PDF output, large datasets with output type PDF may cause a timeout error. In such cases, batching the data and merging is the best solution.
  3. AOP Internal Batching provides the following internal tags, which can be used as conditional tags to add separate sections based on requirements (as shown in the example):
    • batch_number – returns the batch number starting from 1 to batch_size.
    • batch_condition – returns the conditions on which the batch is generated.
    • last_batch – returns true or false.

There are two ways to perform batching, which are listed as follows:

  1. Batch by Size
  2. Batch by Condition

To enable batching, the following Init PL/SQL should be provided.

aop_api_pkg.g_output_batch_selector      varchar2(500 char) := null;  -- The hierarchy of data selector to point the data key that needs to be batch/split, e.g: orders:products 
aop_api_pkg.g_output_batch_size number := null; -- Number of batches to generate; refers to number of files to split into
aop_api_pkg.g_output_batch_condition varchar2(500char) := null; -- Batch condition to generate batch E.g-1: category , E.g-2: unit_price > 100? "Expensive" : unit_price < 50 ? "Cheap" : "Medium"
Batch Selector

Enabling the batch selector activates data batching. The batch selector is a hierarchy of keys or columns that AOP uses to split the data into batches. For instance, if the batch selector is customer:order:product, the final selector product (the product cursor in SQL) must always return one or more data arrays, while the preceding selectors customer and order must each return an array containing a single data entry. The hierarchy must also be maintained, meaning the customer cursor should include order, and the order cursor should include product.

If no batch size or condition is specified, two batches will be created by default. When a batch size is provided, AOP will evenly distribute the data on each batch based on the data length of the final selector.

Batch by Size

Batch by size refers to batching of the data based on the number of files to generate. Following Init PL/SQL is used to enable data batch by size.

aop_api_pkg.g_output_batch_selector   varchar2(500 char) := null;  -- The hierarchy of data selector to point the data key that needs to be batch/split, e.g: orders:products 
aop_api_pkg.g_output_batch_size number := null; -- Number of batches to generate; refers to number of files to split into

Example 1

Following example demonstrates the data batching by size for the template type DOCX and XLSX and the output type is same as template type resulting all outputs in the ZIP.

Data Source

The data source for batching by size is given as:

select
'file1' as "filename",
cursor(
select
c.cust_first_name as "cust_first_name",
c.cust_last_name as "cust_last_name",
c.cust_city as "cust_city",
cursor(select o.order_total as "order_total",
'Order ' || rownum as "order_name",
cursor(select p.product_name as "product_name",
p.category as "category",
i.quantity as "quantity",
i.unit_price as "unit_price",
APEX_WEB_SERVICE.BLOB2CLOBBASE64(p.product_image) as "image",
40 as "image_max_width"
from aop_sample_order_items i, aop_sample_product_info p
where o.order_id = i.order_id
and i.product_id = p.product_id
) "product"
from aop_sample_orders o
where c.customer_id = o.customer_id
) "orders"
from aop_sample_customers c
where customer_id = 1
) as "data"
from dual

Following Init PL/SQL is used to provide the batch selector and size:

aop_api_pkg.g_output_batch_selector   := 'orders:products'; 
aop_api_pkg.g_output_batch_size := 3;
Template

Here the following DOCX and XLSX template are used.

 template.docx    template.xlsx  

Output

Upon processing the above template and data source, AOP will return the ZIP file including the three generated DOCX and XLSX files with evenly distrubuted products.

 batch-docx.zip    batch-xlsx.zip  

Example 2

Following example demonstrates the data batching by size for the template type DOCX and output type PDF.

Data Source

Data Source SQL is same as previous example's data Source.

Following Init PL/SQL is used to provide the batch selector, batch size and merge:

aop_api_pkg.g_output_batch_selector   := 'orders:products'; 
aop_api_pkg.g_output_batch_size := 3;
aop_api_pkg.g_output_merge := 'true';
Template

The DOCX template includes the tags like {#batch_number==1} ... {/} which encloses the section that needs to be printed in 1st batch file only, and {#last_batch} ... {/} encloses the texts that needs to be enclosed in the last batch only. The values for the batch_number and last_batch is created by AOP when creating the data batches.

 template.docx  

batch-by-size-pdf

Output

Upon processing the given data source and template, following output is obtained.

 batch-pdf.pdf  

Batch by Condition

Batch by condition refers to batching of the data based on the condition. Condition can be provided as an angular expressions which results in true or false, ternary operator result values, or the distinct values etc.

Some of the examples of conditional expressions are:

"batch_condition":"unit_price > 80"
// this creates two batches, first batch which satisfies the condition (batch with `batch_condition: true`) and other batch which doesnot satisfies the condition (batch with `batch_condition: false`)

"batch_condition":"category"
// this will batch the data on the basis of uniqueness/distinct values of category. For each unique value of category, separate batches will be generated.

"batch_condition":"unit_price > 100? \"Expensive\" : unit_price < 50 ? \"cheap\" : \"Medium\""
// Three batches will be generated with batch_condition:"Expensive", batch_condition:"cheap" and batch_condition:"Medium"

"batch_condition":"product_name.includes(\"Shoes\")"
// two batches, which gives true and false for the statement.

The conditional expression/statement supports all the conditional operator and functions documented at Simple Angular like expressions

Keys in the condition

The keys which are used in the condition expression/statement should belong to the cursor of the final selector.

Following Init PL/SQL is used to enable data batch by condition:

aop_api_pkg.g_output_batch_selector      varchar2(500 char) := null;  -- The hierarchy of data selector to point the data key that needs to be batch/split, e.g: orders:products 
aop_api_pkg.g_output_batch_condition varchar2(500char) := null; -- Batch condition to generate batch E.g-1: category , E.g-2:

Example 1

Following example demonstrates the data batching by condition that uses ternary operator for the template type DOCX and the output type PDF with the merge operation.

Data Source

The data source for batching by condition is similar to the previous data source.

Following Init PL/SQL is used to provide the batch selector and condition:

aop_api_pkg.g_output_batch_selector   := 'orders:products'; 
aop_api_pkg.g_output_batch_condition := 'unit_price > 110? "Expensive" : unit_price < 80 ? "Cheap" : "Medium"';
aop_api_pkg.g_merge := 'true';
Template

For the template, following DOCX template format is used.
 template.docx  

Output

Upon processing the above template and data source, AOP will return the PDF file including the merged document contents based on the given condition.

 batch-pdf.pdf  

Example 2

Following example demonstrates the data batching by condition based on the distinct values of the key and uses template type DOCX and output type PDF with the merge operation.

Data Source

Data Source SQL is same as of previous example's data source.

Following Init PL/SQL is used to provide the batch selector, batch condition and merge:

aop_api_pkg.g_output_batch_selector   := 'orders:product'; 
aop_api_pkg.g_output_batch_condition := 'category';
aop_api_pkg.g_output_merge := 'true';
Template

For the template, following template format the which includes the tags like {#batch_number==1} ... {/} which enclosed the section that needs to be printed in 1st file only, and {#last_batch} ... {/} encloses the texts that needs to be enclosed in the last batch only. The values for batch_condition, batch_number and last batch will be created by AOP when creating the batches.

 template.docx  

Output

Upon processing the given data source and template, following PDF output is obtained.

 batch-pdf.pdf