Metadata Field-Level Lineage


Last updated 1 year ago


Algoreus can retrieve the lineage of data entities. A data entity can have an associated schema, which defines the entity's fields along with their data types. Field-level lineage gives a more granular view of a data entity's lineage. For a given data entity and time range, field lineage shows all the fields that were computed for the entity and the fields from source entities that participated in computing them. It also shows the detailed operations that transformed the fields of a source data entity into the fields of the given data entity.


Concepts and Terminology

  • Field: A field identifies a column in a data entity. It has a name and a data type.

  • EndPoint: An EndPoint defines the source or destination of the data, along with its namespace, that fields are read from or written to.

  • Field Operation: A field operation defines a single computation on a field. It has a name and a description.

  • Read Operation: A type of operation that reads from the source EndPoint and creates a collection of fields.

  • Transform Operation: A type of operation that transforms a collection of input fields into a collection of output fields.

  • Write Operation: A type of operation that writes a collection of fields to the destination EndPoint.

  • Origin: The origin of a field is the name of the operation that produced it. Because a field with the same name can appear in the outputs of multiple operations, the <origin, fieldName> pair is used to uniquely identify a field.
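
To see why a field name alone is not enough to identify a field, consider the following minimal sketch. FieldRef and OriginSketch are hypothetical stand-in types written for this illustration; they are not part of the Algoreus API. The sketch only shows that two fields with the same name stay distinct once the origin is part of their identity.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for an <origin, fieldName> pair; not an Algoreus class.
final class FieldRef {
  final String origin;     // name of the operation that produced the field
  final String fieldName;  // name of the field itself

  FieldRef(String origin, String fieldName) {
    this.origin = origin;
    this.fieldName = fieldName;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof FieldRef)) {
      return false;
    }
    FieldRef other = (FieldRef) o;
    return origin.equals(other.origin) && fieldName.equals(other.fieldName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(origin, fieldName);
  }

  @Override
  public String toString() {
    return "<" + origin + ", " + fieldName + ">";
  }
}

class OriginSketch {
  // Collects distinct field references; equality uses both origin and name.
  static Set<FieldRef> distinctFields() {
    Set<FieldRef> fields = new LinkedHashSet<>();
    fields.add(new FieldRef("Read", "address"));       // raw address from the source
    fields.add(new FieldRef("Normalize", "address"));  // same name, different origin
    fields.add(new FieldRef("Read", "address"));       // duplicate of the first entry
    return fields;
  }

  public static void main(String[] args) {
    // prints [<Read, address>, <Normalize, address>]
    System.out.println(distinctFields());
  }
}
```

Keyed by name only, the raw and the normalized address would collapse into a single entry, and the lineage could no longer tell which operation a downstream consumer actually read from.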


Field Lineage for Algoreus

// Requires java.util.ArrayList, Arrays, Collections, and List in addition to the field lineage classes.
@Override
public void initialize() throws Exception {
  MapReduceContext context = getContext();
  List<Operation> operations = new ArrayList<>();

  // Read four fields from the source EndPoint
  Operation read = new ReadOperation("Read", "Read passenger information", EndPoint.of("ns", "passengerList"),
                                     "id", "firstName", "lastName", "address");
  operations.add(read);

  // Combine firstName and lastName, both produced by "Read", into fullName
  Operation concat = new TransformOperation("Concat", "Concatenated fields",
                                            Arrays.asList(InputField.of("Read", "firstName"),
                                            InputField.of("Read", "lastName")), "fullName");
  operations.add(concat);

  // Normalize address; the output field keeps the same name as the input field
  Operation normalize = new TransformOperation("Normalize", "Normalized field",
                                               Collections.singletonList(InputField.of("Read", "address")),
                                               "address");
  operations.add(normalize);

  // Write the fields to the destination EndPoint, naming the operation that produced each one
  Operation write = new WriteOperation("Write", "Wrote to passenger dataset", EndPoint.of("ns", "passenger"),
                                       Arrays.asList(InputField.of("Read", "id"),
                                                     InputField.of("Concat", "fullName"),
                                                     InputField.of("Normalize", "address")));
  operations.add(write);

  // Record all field operations for this run
  context.record(operations);
}
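
The operations recorded in the example above form a chain that can be followed backwards from each written field to the source fields it depends on. The following is a minimal sketch of that idea, not the way Algoreus computes lineage internally. Ref, Op, and LineageTraceSketch are hypothetical stand-in types, and the sketch assumes that operation names are unique, that every input refers to a recorded operation, and that each output of an operation depends on all of its inputs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class LineageTraceSketch {
  // Hypothetical stand-in for an <origin, fieldName> input reference.
  static final class Ref {
    final String origin;
    final String field;

    Ref(String origin, String field) {
      this.origin = origin;
      this.field = field;
    }
  }

  // Hypothetical stand-in for a recorded operation; not an Algoreus class.
  static final class Op {
    final String name;
    final List<Ref> inputs;      // empty for a read operation
    final List<String> outputs;  // empty for a write operation

    Op(String name, List<Ref> inputs, List<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // The four operations from the example above, modelled with the stand-in types.
  static List<Op> passengerOps() {
    return Arrays.asList(
        new Op("Read", Collections.<Ref>emptyList(),
               Arrays.asList("id", "firstName", "lastName", "address")),
        new Op("Concat", Arrays.asList(new Ref("Read", "firstName"), new Ref("Read", "lastName")),
               Collections.singletonList("fullName")),
        new Op("Normalize", Collections.singletonList(new Ref("Read", "address")),
               Collections.singletonList("address")),
        new Op("Write", Arrays.asList(new Ref("Read", "id"), new Ref("Concat", "fullName"),
                                      new Ref("Normalize", "address")),
               Collections.<String>emptyList()));
  }

  // Walks backwards from <origin, field> until an operation without inputs (a read) is reached.
  static Set<String> sourceFields(Map<String, Op> byName, String origin, String field) {
    Set<String> sources = new TreeSet<>();
    Op op = byName.get(origin);
    if (op.inputs.isEmpty()) {
      sources.add(field);  // the field comes straight from the source EndPoint
      return sources;
    }
    for (Ref input : op.inputs) {
      sources.addAll(sourceFields(byName, input.origin, input.field));
    }
    return sources;
  }

  // Maps each field handed to the "Write" operation to the source fields it depends on.
  static Map<String, Set<String>> lineage(List<Op> ops) {
    Map<String, Op> byName = new LinkedHashMap<>();
    for (Op op : ops) {
      byName.put(op.name, op);
    }
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (Ref input : byName.get("Write").inputs) {
      result.put(input.field, sourceFields(byName, input.origin, input.field));
    }
    return result;
  }

  public static void main(String[] args) {
    // prints {id=[id], fullName=[firstName, lastName], address=[address]}
    System.out.println(lineage(passengerOps()));
  }
}
```

The result shows that fullName in the destination depends on firstName and lastName in the source, while id and address each depend on the source field of the same name.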

Field Lineage for Algoreus Nodes

Nodes in Algoreus data Axons can also record field lineage. A node records lineage in its prepareRun() method, using the context passed to that method.

// Batch source node: records the fields read from the source
@Override
public void prepareRun(BatchSourceContext context) throws Exception {
  // Record lineage only when the schema and its fields are non-null
  if (config.getSchema() != null && config.getSchema().getFields() != null) {
    List<Schema.Field> fields = config.getSchema().getFields();
    FieldOperation operation = new FieldReadOperation("Read", "Read from files",
                                                      EndPoint.of(context.getNamespace(), config.referenceName),
                                                      fields.stream().map(Schema.Field::getName)
                                                        .collect(Collectors.toList()));
    context.record(Collections.singletonList(operation));
  }
}

// Transform node: records the concatenation of two input fields into one output field
@Override
public void prepareRun(StageSubmitterContext context) throws Exception {
  FieldOperation operation = new FieldTransformOperation("Concatenate", "Concatenated fields",
                                                         Arrays.asList(config.fieldToConcatenate1,
                                                                       config.fieldToConcatenate2),
                                                         config.newFieldName);
  context.record(Collections.singletonList(operation));
}

// Transform node: records the normalization of a single field
@Override
public void prepareRun(StageSubmitterContext context) throws Exception {
  FieldOperation operation = new FieldTransformOperation("Normalize", "Normalized field",
                                                         Collections.singletonList(config.fieldToNormalize),
                                                         config.fieldToNormalize);
  context.record(Collections.singletonList(operation));
}

// Batch sink node: records the fields written to the destination
@Override
public void prepareRun(BatchSinkContext context) throws Exception {
  if (schema.getFields() != null) {
    FieldOperation operation = new FieldWriteOperation("Write", "Wrote to Algoreus Table",
                                                       EndPoint.of(context.getNamespace(), "passenger"),
                                                       schema.getFields().stream().map(Schema.Field::getName)
                                                         .collect(Collectors.toList()));
    context.record(Collections.singletonList(operation));
  }
}
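
Unlike the earlier initialize() example, the node-level operations above name their input fields without an origin. One plausible way to turn such plain names into <origin, fieldName> pairs is to walk the operations in Axon order and remember which operation last produced each field. The sketch below illustrates that idea only; it is an assumption about how such resolution could work, not a description of the Algoreus implementation. StageOp and OriginResolutionSketch are hypothetical stand-in types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OriginResolutionSketch {
  // Hypothetical stand-in for a node-level operation that names fields without origins.
  static final class StageOp {
    final String name;
    final List<String> inputs;
    final List<String> outputs;

    StageOp(String name, List<String> inputs, List<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // Operations modelled on the four nodes above, listed in Axon order.
  static List<StageOp> passengerStages() {
    return Arrays.asList(
        new StageOp("Read", Collections.<String>emptyList(),
                    Arrays.asList("id", "firstName", "lastName", "address")),
        new StageOp("Concatenate", Arrays.asList("firstName", "lastName"),
                    Collections.singletonList("fullName")),
        new StageOp("Normalize", Collections.singletonList("address"),
                    Collections.singletonList("address")),
        new StageOp("Write", Arrays.asList("id", "fullName", "address"),
                    Collections.<String>emptyList()));
  }

  // Pairs each plain input name with the operation that most recently produced it.
  static List<String> resolve(List<StageOp> opsInAxonOrder) {
    Map<String, String> latestProducer = new HashMap<>();
    List<String> resolved = new ArrayList<>();
    for (StageOp op : opsInAxonOrder) {
      for (String input : op.inputs) {
        // An input that no earlier operation produced resolves to a null origin here.
        resolved.add(op.name + " reads <" + latestProducer.get(input) + ", " + input + ">");
      }
      for (String output : op.outputs) {
        latestProducer.put(output, op.name);
      }
    }
    return resolved;
  }

  public static void main(String[] args) {
    for (String line : resolve(passengerStages())) {
      System.out.println(line);
    }
  }
}
```

In this model the write operation reads address from Normalize rather than from Read, because Normalize is the most recent operation to have produced a field with that name.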
