Transaction Service Maintenance
Pruning Invalid Transactions in Algoreus
The Transaction Service in Algoreus Cerebellum keeps track of all invalid transactions to exclude their writes from future reads. However, the invalid list can grow over time and potentially impact performance. To address this, you can manually prune the invalid list after the data of invalid transactions has been removed during major HBase compactions of the transactional tables.
To manually prune the invalid list
Follow these steps:
Find the minimum transaction state cache reload time across all HBase region servers by locating the last occurrence of the following line in each HBase region server log:
[<instance.name>] Transaction state reloaded with snapshot
Here, <instance.name> is the unique identifier of the Algoreus instance being pruned, as defined in the algoreus-site.xml configuration file.
Run the following command on each region server, replacing <instance.name> with the appropriate value, to retrieve the transaction state cache reload time:
grep -F "[<instance.name>] Transaction state reloaded with snapshot" <region-server-log-file> | tail -1
Collected from all region servers, the output looks similar to the following (each line is the entry from one region server log):
15/08/22 00:22:34 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202895873
15/08/22 00:22:42 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
15/08/22 00:22:44 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
15/08/22 00:22:47 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
15/08/22 00:23:34 INFO coprocessor.TransactionStateCache: [algoreus] Transaction state reloaded with snapshot from 1440202956306
Identify the minimum time across all region servers. In the example above, the minimum is 1440202895873; note this value as the pruneTime.
Perform a flush and a major compaction on all Algoreus transactional tables.
Wait for the major compaction to complete.
Obtain the minimum time again to determine the pruneTime. Use the value obtained in step 3 as the pruneTime.
If the Algoreus tables are replicated to other clusters, refer to the section below (Pruning Invalid Transactions in a Replicated Cluster) to obtain the pruneTime for the slave clusters.
The final pruneTime is the minimum pruneTime across all replicated clusters. The invalid transaction list can be safely pruned up to (t - 1 day), where t is this final pruneTime.
You can check the current length of the invalid transaction list with the call that returns the number of invalid transactions.
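The arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration, not part of Algoreus: the helper names min_reload_time and safe_prune_bound are hypothetical, and the sketch assumes the snapshot values in the log lines are epoch timestamps in milliseconds, which their magnitude suggests. It takes the log lines gathered from the region servers, picks the minimum reload time, and computes the (t - 1 day) bound.

```python
import re

# Captures the timestamp at the end of a "Transaction state reloaded" log line.
SNAPSHOT_RE = re.compile(r"Transaction state reloaded with snapshot from (\d+)")

# Assumption: snapshot times are epoch milliseconds.
ONE_DAY_MS = 24 * 60 * 60 * 1000


def min_reload_time(log_lines):
    """Return the smallest snapshot timestamp found in the given log lines."""
    times = []
    for line in log_lines:
        match = SNAPSHOT_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    if not times:
        raise ValueError("no 'Transaction state reloaded' lines found")
    return min(times)


def safe_prune_bound(prune_time):
    """Return (t - 1 day) for a pruneTime t given in epoch milliseconds."""
    return prune_time - ONE_DAY_MS


# One line per region server, as collected with the grep command.
lines = [
    "15/08/22 00:22:34 INFO coprocessor.TransactionStateCache: "
    "[algoreus] Transaction state reloaded with snapshot from 1440202895873",
    "15/08/22 00:22:42 INFO coprocessor.TransactionStateCache: "
    "[algoreus] Transaction state reloaded with snapshot from 1440202956306",
]

prune_time = min_reload_time(lines)
print(prune_time)                    # 1440202895873
print(safe_prune_bound(prune_time))  # 1440116495873
```

Raising an error when no matching line is found is a deliberate choice in this sketch: a missing reload entry should stop the procedure rather than silently produce a pruneTime.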
Pruning Invalid Transactions in a Replicated Cluster
If the Algoreus tables are replicated to a slave cluster, follow these steps to obtain the pruneTime for that slave cluster:
Copy over the latest transaction snapshots from the master cluster to the slave cluster.
Wait for three to four minutes for the latest transaction state to be reloaded from the snapshot.
Run steps 1 to 5 from the previous section (Pruning Invalid Transactions) on the slave cluster to find the pruneTime specific to that slave cluster.
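Once each cluster has its own pruneTime, combining them is a single minimum. The following sketch assumes the per-cluster values have already been collected by hand; the function name final_prune_time, the cluster names, and the values are hypothetical and serve only as an illustration.

```python
def final_prune_time(prune_times_by_cluster):
    """Return the minimum pruneTime across the master and all slave clusters."""
    if not prune_times_by_cluster:
        raise ValueError("at least one cluster pruneTime is required")
    return min(prune_times_by_cluster.values())


# Hypothetical cluster names and pruneTime values, for illustration only.
prune_times = {
    "master": 1440202956306,
    "slave-1": 1440202895873,
}

print(final_prune_time(prune_times))  # 1440202895873
```

Taking the minimum means the cluster whose transaction state is oldest determines how far the invalid list can be pruned.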
Automated Pruning of Invalid Transactions
Starting from Algoreus, automated pruning of the invalid transactions list is supported. However, it is turned off by default.
Note that for automated pruning to work in a secure Hadoop cluster with authorization enabled, Algoreus needs to have the ability to list all Algoreus tables and their table descriptors in HBase. If Algoreus cannot list the table descriptors, running automated pruning can result in data inconsistency.