Logging and Monitoring

Algoreus compiles logs and metrics for all its internal services and user applications. This functionality is crucial in debugging applications within Algoreus and assessing their performance. Access to these logs, metrics, and additional monitoring information is provided through Algoreus Cerebellum.

Within Hadoop clusters, the programs executing inside containers create individual log files as part of the container's components. Since an application can comprise multiple programs dispersed across the cluster nodes, the logs for the application may also be scattered. These files are typically transient and become unavailable once the container's lifecycle ends, which makes them unsuitable for post-mortem diagnostics, troubleshooting, or performance analysis.

To remedy these issues, the Algoreus log framework was designed to:

  • Centralize the location of logs, merging the logs of the individual containers of a program into one;

  • Ensure logs are both persistent (available for future use and analysis) and accessible during the program's operation;

  • Be extensible through custom log axons; and

  • Allow adjustment of the logging behavior at the level of an individual application as well as the entire cluster.


Logging Example

This diagram exemplifies the steps Algoreus follows when logging a program of an application:

  1. Logs are collected from a specific program operating in a YARN container.

  2. YARN records the log messages produced by containers to files within the container.

  3. Additionally, Algoreus programs broadcast these messages to the Message Broker.

  4. The Algoreus Log Saver Service reads log messages from the Message Broker, groups them by program or application, buffers and sorts them in memory, and finally persists them to files in HDFS. Each of these files corresponds to one program or application, depending on how the grouping is configured (this is set by the property log.publish.partition.key).

  5. Apart from persisting logs to files, the Log Saver also reports metrics about the number of log messages produced by each program. These metrics can be retrieved by querying the Algoreus metrics system.

  6. For security purposes, the files written out to persistent storage in HDFS have permissions set so they are only accessible by the Algoreus user.

Logging is configured using instances of Logback's "logback" file, which consists of log axons with log appenders:

  • A log axon is a process that consumes log events from the Message Broker, buffers them, groups them by application or program, sorts them, and then triggers the log appenders defined in its configuration.

  • A log appender (or appender) is a Java class, responsible for consuming and processing messages. This typically includes persisting the log events. It can also, for example, collect metrics, maintain metadata about the storage, or emit alerts when it identifies certain messages.
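
The division of labor between a log axon and its appenders can be sketched in a few lines of Python. This is only an illustrative model, not Algoreus code: LogEvent, CollectingAppender, CountingAppender, and run_axon are hypothetical names, and a real axon reads from the Message Broker rather than from an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    # Hypothetical event shape; the real log event format is not shown here.
    timestamp: int
    program: str
    message: str


class CollectingAppender:
    """Stands in for an appender that persists events (here: keeps them in memory)."""

    def __init__(self):
        self.lines = defaultdict(list)

    def append(self, event):
        self.lines[event.program].append(event.message)


class CountingAppender:
    """Stands in for an appender that collects metrics about the events it sees."""

    def __init__(self):
        self.counts = defaultdict(int)

    def append(self, event):
        self.counts[event.program] += 1


def run_axon(events, appenders):
    """Buffer events, group them by program, sort each group, then call every appender."""
    groups = defaultdict(list)
    for event in events:  # buffering and grouping
        groups[event.program].append(event)
    for program in sorted(groups):
        # Sort each group by timestamp before handing events to the appenders.
        for event in sorted(groups[program], key=lambda e: e.timestamp):
            for appender in appenders:
                appender.append(event)


events = [
    LogEvent(3, "worker", "third"),
    LogEvent(1, "worker", "first"),
    LogEvent(2, "service", "second"),
]
collector, counter = CollectingAppender(), CountingAppender()
run_axon(events, [collector, counter])
print(dict(collector.lines))
print(dict(counter.counts))
```

Keeping the appenders behind a single append method is what lets one axon drive several independent consumers, such as a persisting appender and a metrics appender, from the same sorted stream.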


Retrieving Log Messages from a Program

The logging of an application's programs is configured by the logback-container.xml file, bundled with the Algoreus distribution. This "logback" file performs log rotation once a day at midnight and discards logs older than 14 days. Changes can be made to logback-container.xml; they affect only programs that are started after the change, so applications or programs that are already running must be restarted for the modified file to take effect.
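
The retention rule can be illustrated with a small date calculation. The sketch below is not how Logback implements retention; the expired_logs function is a hypothetical helper, and treating a file that is exactly 14 days old as still retained is an assumption.

```python
from datetime import date, timedelta


def expired_logs(rolled_dates, today, max_age_days=14):
    """Return the roll dates of files that are older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    # Assumption: a file rolled exactly max_age_days ago is still kept.
    return sorted(d for d in rolled_dates if d < cutoff)


today = date(2024, 3, 20)
rolled = [today - timedelta(days=n) for n in (1, 14, 15, 30)]
print(expired_logs(rolled, today))
```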

Algoreus system services run either on cluster edge nodes or in YARN containers; their logging and its configuration depend on the service and where it is located. The log messages emitted by Algoreus system services can be retrieved by:

  • Downloading the logs emitted by a system service through the Algoreus Cerebellum.

  • Viewing the log messages of system services on the Algoreus Administration page.


Configuring System Service Logs

Algoreus system services that run in YARN containers, such as the Metrics Service, are configured by the same logback-container.xml that configures user application program logging. Algoreus system services that run on cluster edge nodes, such as Algoreus Master or Router, are configured by the logback.xml. Changes can be made to logback.xml; afterwards, the service(s) affected will need to be restarted for the modified "logback" file to take effect.

When running under Distributed Algoreus, the log levels of system services can be changed at runtime without either modifying the logback.xml or restarting Algoreus. The Algoreus Logging Microservices can be used to set the log levels of a system service while it is running. Once changed, they can be reset back to what they were originally by using the reset endpoint.

Note: The Logging Microservices for changing system service log levels can only be used with system services that are running under Distributed Algoreus.


Organizing the Log Storage Node

The Log Storage Node is the Algoreus service that reads log messages from the Message Broker, processes them in log axons, persists them to HDFS, and sends metrics about logging to the Cerebellum.

In addition to the built-in Algoreus Log Axon, you can define custom log axons that are run by the Log Storage Node to carry out specific tasks.

The algoreus-site.xml file contains properties that control the delivery of logs to the Message Broker, the Log Storage Node, the Algoreus log axon, and any custom log axons that have been configured.


Delivering Logs to the Message Broker

These properties control the delivery of logs to the Message Broker:

  • log.messagebroker.topic: Message Broker topic name used to publish logs.

  • log.publish.num.partitions (default: 10): Number of Algoreus Message Broker service partitions to dispatch the logs to.

  • log.publish.partition.key (default: node): Dispatch logs from an application or a node to the same partition. Valid values are "application" or "node". If set to "application", logs from all the nodes of an application go to the same partition. If set to "node", logs from the same node go to the same partition. Changes to this property require a restart of all Algoreus applications.

If an external Message Broker service is used (instead of the Algoreus Message Broker service), the number of partitions used for log.publish.num.partitions must match the number set in the external service for the topic being used to publish logs (log.messagebroker.topic).

By default, log.publish.partition.key is set to node, which means that all logs for the same node go to the same partition. Set this to application if you want all logs from an application to go to the same instance of the Log Storage Node.
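
The effect of log.publish.partition.key can be sketched as a key-to-partition mapping. The document does not describe how Algoreus actually maps a key to a partition, so the CRC32-modulo scheme, the choose_partition function, and the way a node is qualified by its application are all assumptions made for illustration.

```python
import zlib


def choose_partition(application, node, partition_key="node", num_partitions=10):
    """Pick a partition index for a log event.

    partition_key mirrors log.publish.partition.key ("application" or "node").
    The CRC32-modulo scheme is an assumption, not the documented behavior.
    """
    if partition_key == "application":
        key = application
    elif partition_key == "node":
        # Assumption: a node is identified within its application.
        key = f"{application}/{node}"
    else:
        raise ValueError('partition_key must be "application" or "node"')
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# With "application", every node of the same application lands on one partition.
print(choose_partition("app1", "nodeA", "application"))
print(choose_partition("app1", "nodeB", "application"))
# With "node", only events from the same node are guaranteed to share a partition.
print(choose_partition("app1", "nodeA", "node"))
print(choose_partition("app1", "nodeB", "node"))
```

Because the mapping depends only on the key and the partition count, changing either one moves logs to different partitions, which is consistent with the requirement to match the partition count of an external Message Broker service.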


Log Storage Node

These properties control the Log Storage Node:

  • Maximum number of log storage instances to run in YARN.

  • Number of log storage instances to run in YARN.

  • Memory, in megabytes, for each log storage instance to run in YARN.

  • Number of virtual cores for each log storage instance in YARN.

Log storage instances should range from a minimum of one to a maximum of ten. The maximum is set by the number of Message Broker partitions, which by default is 10.
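
One way to see why the partition count caps the instance count is to model each partition as being read by exactly one instance. That one-reader-per-partition model, the round-robin distribution, and the assign_partitions function are assumptions made for this sketch and are not taken from the Algoreus implementation.

```python
def assign_partitions(num_instances, num_partitions=10):
    """Spread partition indexes round-robin across log storage instances."""
    if not 1 <= num_instances <= num_partitions:
        raise ValueError(
            f"instances must be between 1 and {num_partitions}, got {num_instances}"
        )
    assignment = {instance: [] for instance in range(num_instances)}
    for partition in range(num_partitions):
        assignment[partition % num_instances].append(partition)
    return assignment


print(assign_partitions(3))
```

Under this model, an eleventh instance would have no partition left to read, so the function rejects it.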


Log Axon Setup

The Algoreus log axon is configured by settings in the algoreus-site.xml file.

Custom log axons are configured by a blend of the settings in the algoreus-site.xml file and a "logback" file used to specify the custom axon. The XML file is placed in the log.process.axon.config.dir, a local directory on the Algoreus Master node that is scanned for log processing axon configurations. Each axon is defined by a file in the Logback XML format, with .xml as the file name extension.


Configuring Custom Log Axons

In Algoreus, custom log axons are configured by creating "logback" files in a directory designated in the algoreus-site.xml file by the log.process.axon.config.dir property.

Here is an example of a "logback" file for a custom log axon, demonstrating the usage of two appenders (STDOUT and rollingAppender). The file should be located in the designated directory with a .xml file extension:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n</pattern>
    </encoder>
  </appender>

  <property name="df.log.saver.instance.id" value="instanceId"/>

  <appender name="rollingAppender" class="io.algoreus.algoreus.logging.plugins.RollingLocationLogAppender">

    <!-- log file path will be created by the appender as: <basePath>/<namespace-id>/<application-id>/<filePath> -->
    <basePath>plugins/applogs</basePath>
    <filePath>securityLogs/logFile-${df.log.saver.instance.id}.log</filePath>

    <!-- df is the owner of the log files directory, so df will get read/write/execute permissions.
    Log files will be read-only for others. -->
    <dirPermissions>744</dirPermissions>

    <!-- df is the owner of the log files, so df will get read/write permissions.
    Log files will be read-only for others -->
    <filePermissions>644</filePermissions>

    <!-- It is an optional parameter, which takes the number of milliseconds.
    The appender will close a file if it is not modified for the fileMaxInactiveTimeMs
    period of time. Here, it is set for thirty minutes. -->
    <fileMaxInactiveTimeMs>1800000</fileMaxInactiveTimeMs>

    <rollingPolicy class="io.algoreus.algoreus.logging.plugins.FixedWindowRollingPolicy">
      <!-- Only specify the file name without a directory, as the appender will use the
      appropriate directory specified in filePath -->
      <fileNamePattern>logFile-${df.log.saver.instance.id}.log.%i</fileNamePattern>
      <minIndex>1</minIndex>
      <maxIndex>9</maxIndex>
    </rollingPolicy>

    <triggeringPolicy class="io.algoreus.algoreus.logging.plugins.SizeBasedTriggeringPolicy">
      <!-- Set the maximum file size appropriately to avoid a large number of small files -->
      <maxFileSize>100MB</maxFileSize>
    </triggeringPolicy>

    <encoder>
      <pattern>%-4relative [%thread] %-5level %logger{35} - %msg%n</pattern>
      <!-- Do not flush on every event -->
      <immediateFlush>false</immediateFlush>
    </encoder>
  </appender>

  <logger name="io.algoreus.algoreus.logging.plugins.RollingLocationLogAppenderTest" level="INFO">
    <appender-ref ref="rollingAppender"/>
  </logger>

  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>

</configuration>
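
The octal values given for dirPermissions and filePermissions in the example can be decoded with Python's standard stat module. The describe function is a hypothetical helper written for this illustration; it only renders the mode bits and does not touch HDFS.

```python
import stat


def describe(mode, is_dir=False):
    """Render an octal permission value the way `ls -l` would show it."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)


print(describe(0o744, is_dir=True))  # value used for dirPermissions
print(describe(0o644))               # value used for filePermissions
```

The first value renders as drwxr--r-- and the second as -rw-r--r--: the owner gets full access, while group and others get read access only.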

For custom log appenders, you can utilize any existing Logback appender. The RollingLocationLogAppender, an extension of the Logback FileAppender, allows the use of HDFS locations within log axons. If necessary, you can also develop and implement your own custom appender. Ensure that your custom appender implements the Appender interface and has access to Algoreus Cerebellum system components through the AppenderContext.


Enabling Access Log

Access logging can be enabled in Distributed Algoreus with security enabled. It logs each HTTP access through the Authentication Server and Router in the standard access log format.

To enable access logging, follow these steps:

  1. In the logback.xml file located in /etc/algoreus/conf, uncomment and configure the following appenders and loggers:

<appender name="AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>access.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>access.log.%d{yyyy-MM-dd}</fileNamePattern>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%msg%n</pattern>
  </encoder>
</appender>
<logger name="http-access" level="TRACE" additivity="false">
  <appender-ref ref="AUDIT" />
</logger>

<appender name="EXTERNAL_AUTH_AUDIT" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>external_auth_access.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
    <fileNamePattern>external_auth_access.log.%d{yyyy-MM-dd}</fileNamePattern>
    <maxHistory>30</maxHistory>
  </rollingPolicy>
  <encoder>
    <pattern>%msg%n</pattern>
  </encoder>
</appender>
<logger name="external-auth-access" level="TRACE" additivity="false">
  <appender-ref ref="EXTERNAL_AUTH_AUDIT" />
</logger>

  2. By default, the access.log and external_auth_access.log files will be available under the /home/algoreus directory. You can configure the log paths by modifying the logback.xml file. For example:

<file>/var/log/algoreus/access.log</file>

  3. After modifying the logback.xml file, restart the algoreus-router and algoreus-auth-server services using the appropriate commands:

$ /etc/init.d/algoreus-router restart
$ /etc/init.d/algoreus-auth-server restart


Monitoring Utilities

Algoreus can be monitored using Nagios. A Nagios-style plugin is available for checking the status of Algoreus applications, programs, and the Algoreus instance itself.
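
Nagios plugins report their result through an exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and a one-line message. The sketch below shows that convention only; it is not the Algoreus plugin, and the check_programs function and the "RUNNING" and "STOPPED" status strings are hypothetical.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_programs(statuses):
    """Map {program name: status string} to a Nagios (exit code, message) pair.

    The status strings ("RUNNING", "STOPPED") are hypothetical.
    """
    if not statuses:
        return UNKNOWN, "UNKNOWN - no program status available"
    stopped = sorted(name for name, s in statuses.items() if s != "RUNNING")
    if stopped:
        return CRITICAL, "CRITICAL - not running: " + ", ".join(stopped)
    return OK, f"OK - {len(statuses)} program(s) running"


code, message = check_programs({"ingest": "RUNNING", "report": "STOPPED"})
print(code, message)
```

A real plugin would print the message and then exit with the returned code so that Nagios can pick up the state.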

