
Transaction Service Maintenance


Last updated 1 year ago


Pruning Invalid Transactions in Algoreus

The Transaction Service in Algoreus Cerebellum keeps track of all invalid transactions so that their writes are excluded from future reads. Over time, this invalid list can grow large and degrade performance. To address this, you can manually prune the invalid list once major HBase compactions of the transactional tables have removed the data written by those invalid transactions.


To manually prune the invalid list

Follow these steps:

  1. Find the minimum transaction state cache reload time across all HBase region servers. To do this, locate the last occurrence of the following line in each HBase region server log:

    [<instance.name>] Transaction state reloaded with snapshot

    Here, <instance.name> represents the unique identifier for the Algoreus instance being pruned, as defined in the algoreus-site.xml configuration file.

  2. On each region server, run the following command to retrieve its most recent transaction state cache reload time, replacing <instance.name> with your instance name and <region-server-log-file> with the path to that server's log file:

    grep -F "[<instance.name>] Transaction state reloaded with snapshot" <region-server-log-file> | tail -1

    This command prints lines similar to the following (one line per region server, each being the last matching entry in that server's log):

    15/08/22 00:22:34 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202895873
    15/08/22 00:22:42 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
    15/08/22 00:22:44 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
    15/08/22 00:22:47 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
    15/08/22 00:23:34 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306

  3. Identify the minimum time across all region servers and note it as the pruneTime. In the sample output above, the minimum is 1440202895873.

  4. Perform a flush and a major compaction on all Algoreus transactional tables.

  5. Wait for the major compaction to complete.

  6. Use the minimum time recorded in step 3, before the compaction, as the pruneTime for this cluster.

  7. If the Algoreus tables are replicated to other clusters, follow the steps in Pruning Invalid Transactions in a Replicated Cluster below to obtain the pruneTime for each slave cluster.

  8. The final pruneTime is the minimum pruneTime across the master cluster and all replicated clusters. With t as the final pruneTime, the invalid transaction list can be safely pruned up to (t - 1 day).

  9. To check how large the invalid transaction list has become, use the Transaction Service call that returns the number of invalid transactions.
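Steps 1 to 3 can be scripted. The following is a minimal sketch, not part of Algoreus: the function names last_reload_time and min_prune_time are hypothetical, and the code assumes each matching log line ends with the snapshot timestamp, as in the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 1 to 3 (not Algoreus tools). They assume each
# matching log line ends with the snapshot timestamp, as in the sample output.

# Print the snapshot timestamp from the last reload line in one region server log.
# Usage: last_reload_time <instance.name> <region-server-log-file>
last_reload_time() {
  # -F treats the brackets around the instance name as literal characters.
  grep -F "[$1] Transaction state reloaded with snapshot" "$2" \
    | tail -n 1 \
    | awk '{ print $NF }'
}

# Read timestamps (one per region server) on stdin and print the smallest,
# which is the pruneTime for this cluster. Non-numeric or empty lines are ignored.
min_prune_time() {
  grep -E '^[0-9]+$' | sort -n | head -n 1
}
```

For example, run last_reload_time on each region server, collect the printed values in a single file, and pipe that file into min_prune_time. Filtering out empty lines matters because sort -n would otherwise place a blank line (from a server with no matching entry) first.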


Pruning Invalid Transactions in a Replicated Cluster

If the Algoreus tables are replicated to a slave cluster, follow these steps to obtain the pruneTime for that slave cluster:

  1. Copy over the latest transaction snapshots from the master cluster to the slave cluster.

  2. Wait three to four minutes for the latest transaction state to be reloaded from the copied snapshot.

  3. Run steps 1 to 5 of the previous section (To manually prune the invalid list) on the slave cluster to find the pruneTime for that slave cluster.
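Once you have a pruneTime for the master cluster and for each slave cluster, the calculation in step 8 of the manual procedure reduces to a minimum and a subtraction. The sketch below is a hypothetical helper (prune_until is not an Algoreus command) and assumes the timestamps are milliseconds since the Unix epoch, as the 13-digit sample values suggest, so one day is 86,400,000 ms.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 8 (not an Algoreus tool). Assumes every
# pruneTime is an integer timestamp in milliseconds since the Unix epoch.

ONE_DAY_MS=86400000  # 24 * 60 * 60 * 1000

# Print the cutoff (t - 1 day), where t is the minimum of the given pruneTimes.
# Usage: prune_until <pruneTime-master> [<pruneTime-slave> ...]
prune_until() {
  if [ "$#" -eq 0 ]; then
    echo "usage: prune_until <pruneTime> [<pruneTime> ...]" >&2
    return 1
  fi
  local min="$1" t
  for t in "$@"; do
    # Keep the smallest pruneTime seen so far.
    if [ "$t" -lt "$min" ]; then
      min="$t"
    fi
  done
  echo $(( min - ONE_DAY_MS ))
}
```

For example, prune_until 1440202895873 1440202956306 prints 1440116495873, the point up to which the invalid transaction list could be pruned under these assumptions.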


Automated Pruning of Invalid Transactions

Algoreus supports automated pruning of the invalid transactions list, but it is turned off by default.

Note that for automated pruning to work in a secure Hadoop cluster with authorization enabled, Algoreus must be able to list all Algoreus tables and their table descriptors in HBase. If Algoreus cannot list the table descriptors, automated pruning can result in data inconsistency.

