User-Defined Log

This chapter describes the concept of user-defined logs and how to write the collector templates that Machbase uses to load them.

Concept of Data Conversion

The collector processes data in the following sequence.

First, the raw data source is parsed with the regular expression given in the collector's template file, which divides each record into multiple pieces of text.
Then, the pieces are mapped to the columns of a database table through the COL_LIST described in the template file.
Therefore, creating a template that inserts an arbitrary text file into the database comes down to writing the COL_LIST and the regular expressions used for parsing.
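
The two steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the collector's implementation: the sample line, the pattern, and the dictionary that stands in for COL_LIST are assumptions made for the example, and Python's re module stands in for PCRE.

```python
import re

# Hypothetical record and pattern, chosen only to illustrate the flow.
line = "[2014-08-18 13:51:19] spiderman message-1"
pattern = re.compile(r"\[([0-9-: ]+)\]\s(\S+)\s+(.*)")

# Stand-in for COL_LIST: piece number -> column name.
col_list = {0: "tm", 1: "user", 2: "msg"}

def to_row(text):
    """Split text with the pattern and map each piece to its column."""
    match = pattern.match(text)
    if match is None:
        return None  # such a record would be counted as a failure
    pieces = match.groups()
    return {name: pieces[no] for no, name in col_list.items()}

row = to_row(line)
print(row)
```

Step one is the call to pattern.match, and step two is the dictionary comprehension that pairs each piece number with a column name.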

machregex

Creating a regular expression that parses your text, as described in the previous section, is the key to creating custom templates. For that reason, Machbase provides the utility "machregex". It is a very useful tool for checking whether the data is parsed properly by a user-defined regular expression.
machregex comes with sample patterns for SYSLOG, the ACCESS LOG of the Apache web server, and the TRACE LOG of Machbase. You can also write new regular expressions based on these samples.
To support regular expressions, Machbase uses the PCRE (Perl Compatible Regular Expressions) library.

Execute machregex

[mach@localhost ~/mach_collector_home/bin]$ ./machregex 
=================================================================
     Machbase Collector Regex Utility
     Release Version 3.0.0.8634.official
     Copyright 2015, Machbase Inc. or its subsidiaries.
     All Rights Reserved.
=================================================================

Usage> ./machregex Pattern NewlinePattern

Result file : machregex.ok machregex.err

<< APACHE access log >>
  => machregex "^([0-9.:]+)\\s([\\w.-]+)\\s([\\w.-]+)\\s(\\[[^\\[\\]]+\\])\\s\"((?:[^\"]|\")+)\"\\s(\\d{3})\\s(\\d+|-)\\s\"((?:[^\"]|\")*)\"\\s\"((?:[^\"]|\")*)\"$" "^([0-9.:]+)\s" < DATA.LOG

<< MACH trace log >>
  => machregex "^\\[(\\d+[-]\\d+[-]\\d+\\s\\d+[:]\\d+[:]\\d+)+\\s([P][-]\\d+)+\\s([T][-]\\d+)+\\]\\s((?:[^\\0])*)$" "^\\[" < DATA.LOG

<< syslog >>
  => machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" < DATA.LOG

The machregex has three default test sets as above.

Test machregex

The following shows the result of running machregex against syslog. Only the important parts of the output are shown.

[mach@localhost bin]$ machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" </var/log/syslog
machregex "^(([a-zA-Z]+)\\s+([0-9]+)\\s+([0-9:]*))\\s(\\S*)\\s+((?:[^\\0])*)$" ".*" </var/log/syslog
Pattern => (^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$)
========================================================================
.............
========================================================================
SUCCESS[107] (rc=7)(Aug 19 18:17:01 localhost CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
)
  ALL (0:110) => [Aug 19 18:17:01 localhost CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
]
  0 (0:15) => [Aug 19 18:17:01]
  1 (0:3) => [Aug]
  2 (4:6) => [19]
  3 (7:15) => [18:17:01]
  4 (16:37) => [localhost]
  5 (38:110) => [CRON[6553]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
]
=======================================================================
SUCCESS[107] (rc=7)(Aug 19 18:39:01 localhost CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && 
[ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
)
  ALL (0:232) => [Aug 19 18:39:01 localhost CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && 
[ -d /var/lib/php5 ] && /usr/lib/php5/sephp5/maxlifetime))
]
  0 (0:15) => [Aug 19 18:39:01]
  1 (0:3) => [Aug]
  2 (4:6) => [19]
  3 (7:15) => [18:39:01]
  4 (16:37) => [localhost]
  5 (38:232) => [CRON[6616]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sphp5/maxlifetime))
]
Summary : Success(107), Failure(0) <== It shows that all of them were successfully completed.
[mach@localhost bin]$

The regular expression splits each record of the syslog text file into six pieces, numbered 0 to 5, as you can see above.
Of these, the pieces that are useful as they stand are numbers 0, 4, and 5. The link between the machregex results and the table columns is made by the COL_LIST variable in syslog.tpl.
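
As a rough cross-check outside machregex, the same split can be reproduced with Python's re module. This is a sketch under assumptions: Python's regex engine is close to PCRE for this pattern but is not the engine the collector uses, and the column names tm, host, and msg are hypothetical, since syslog.tpl is not reproduced here.

```python
import re

# The syslog pattern from the machregex sample, written as a Python raw string.
SYSLOG = re.compile(
    r"^(([a-zA-Z]+)\s+([0-9]+)\s+([0-9:]*))\s(\S*)\s+((?:[^\0])*)$"
)

line = (
    "Aug 19 18:17:01 localhost CRON[6553]: (root) CMD "
    "(   cd / && run-parts --report /etc/cron.hourly)\n"
)

match = SYSLOG.match(line)
# machregex numbers the pieces from 0; Python numbers capture groups from 1,
# so match.groups() gives a tuple indexed the same way as machregex.
pieces = match.groups()

# Keep pieces 0, 4 and 5; the column names are hypothetical.
wanted = {0: "tm", 4: "host", 5: "msg"}
row = {name: pieces[no] for no, name in wanted.items()}
print(row)
```

Pieces 1, 2, and 3 (month, day, and time) are already contained in piece 0, which is why they are not stored separately.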

Example of Creating Custom Template

This section takes a custom text log file as an example and describes, step by step, how to make a template file for it.

test.log

The log file is prepared as below.

[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
[2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
[2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
[2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
[2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
[2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
[2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
[2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
[2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.

The log file above is converted into three columns: tm, user, and msg. The tm column is of type datetime, user is varchar(16), and msg is varchar(512). In addition, keyword indexes are created on the user and msg columns.

Create Regular Expressions Through machregex

Create a Regular Expression

First, the date field is parsed with the following expression.
\[([0-9-: ]+)\]
Second, the user field is matched by (\S+).
Third, the rest of the string is captured by ([^\0]*).
Combined, these become \[([0-9-: ]+)\]\s(\S+)\s+([^\0]*).
When the expression is passed as a quoted string argument, each backslash must be doubled, as follows.

"\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]*)"

For the newline pattern, which identifies where a new record begins, use the expression below.

"^\\["

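The assembly of the pattern and the doubling of backslashes can be checked with a short Python sketch. Python's string escaping is used here only as an analogy for the quoted command-line argument, and the multi-line sample text is hypothetical; it is included to show what the record-start pattern is for.

```python
import re

# The three parts described above, as Python raw strings.
date_part = r"\[([0-9-: ]+)\]"
user_part = r"(\S+)"
rest_part = r"([^\0]*)"
combined = date_part + r"\s" + user_part + r"\s+" + rest_part

# Inside an ordinary quoted string every backslash is doubled.
escaped = "\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]*)"
print(combined == escaped)

# A new record starts wherever a line begins with "[".
record_start = re.compile("^\\[", re.MULTILINE)
text = (
    "[2014-08-18 13:51:19] spiderman message-1 : first line\n"
    "  continuation of message-1\n"
    "[2014-08-18 13:51:19] superman  message-2 : single line\n"
)
starts = [m.start() for m in record_start.finditer(text)] + [len(text)]
records = [text[a:b] for a, b in zip(starts, starts[1:])]
print(len(records))
```

Because the continuation line does not begin with "[", it stays attached to the first record instead of being treated as a record of its own.
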
Verify a Regular Expression

[mach@localhost ~/mach_collector_home/bin]$ machregex "\\[([0-9-: ]+)\\]\\s(\\S+)\\s+([^\\0]+)" "\\[" <test.log
Pattern => (\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+))
============================================================================
SUCCESS[2] (rc=4)([2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:19] spiderman message-1 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:19]
  1 (22:31) => [spiderman]
  2 (32:85) => [message-1 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[3] (rc=4)([2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:19] superman  message-2 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:19]
  1 (22:30) => [superman]
  2 (32:85) => [message-2 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[4] (rc=4)([2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:33] spiderman message-3 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:33]
  1 (22:31) => [spiderman]
  2 (32:85) => [message-3 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[5] (rc=4)([2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:33] superman  message-4 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:33]
  1 (22:30) => [superman]
  2 (32:85) => [message-4 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[6] (rc=4)([2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:51:34] batman    message-5 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:51:34]
  1 (22:28) => [batman]
  2 (32:85) => [message-5 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[7] (rc=4)([2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:52:34] superman  message-6 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:52:34]
  1 (22:30) => [superman]
  2 (32:85) => [message-6 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[8] (rc=4)([2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:53:34] batman    message-7 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:53:34]
  1 (22:28) => [batman]
  2 (32:85) => [message-7 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[9] (rc=4)([2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:54:31] superman  message-8 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:54:31]
  1 (22:30) => [superman]
  2 (32:85) => [message-8 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[10] (rc=4)([2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
)
  ALL (0:85) => [[2014-08-18 13:55:30] batman    message-9 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:55:30]
  1 (22:28) => [batman]
  2 (32:85) => [message-9 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
)
  ALL (0:86) => [[2014-08-18 13:56:44] spiderman message-10 : This is the best machine data DBMS ever.
]
  0 (1:20) => [2014-08-18 13:56:44]
  1 (22:31) => [spiderman]
  2 (32:86) => [message-10 : This is the best machine data DBMS ever.
]
============================================================================
SUCCESS[11] (rc=4)([2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.)
  ALL (0:85) => [[2014-08-18 13:57:59] superman  message-11 : This is the best machine data DBMS ever.]
  0 (1:20) => [2014-08-18 13:57:59]
  1 (22:30) => [superman]
  2 (32:85) => [message-11 : This is the best machine data DBMS ever.]
Summary : Success(11), Failure(0)
[mach@localhost ~/mach_collector_home/bin]$
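
The offsets in the machregex output can be reproduced with Python's re module, which is a convenient way to experiment with a pattern before running the real tool. This is a sketch under the assumption that Python's engine behaves like PCRE for this pattern; it is not a replacement for machregex.

```python
import re

PATTERN = re.compile(r"\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)")

record = (
    "[2014-08-18 13:51:19] spiderman message-1 : "
    "This is the best machine data DBMS ever.\n"
)

match = PATTERN.search(record)
print("ALL", match.span(0))
# machregex piece N corresponds to Python capture group N + 1.
for number in range(3):
    start, end = match.span(number + 1)
    print(number, (start, end), repr(match.group(number + 1)))
```

For this record the spans come out as (1, 20), (22, 31), and (32, 85), the same offsets machregex reports for the first record of test.log.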

Create test.rgx

Create the test.rgx file from the regular expression verified above. An example is provided in "$MACHBASE_HOME/collector/samples/test.rgx".

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation (Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
#  This file is for Machbase trace collector regex file.
#

LOG_TYPE=custom

COL_LIST= (
     (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
         ),
     (
        REGEX_NO = 1
        NAME = user
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
         ),
     (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
         )
)

REGEX="\[([0-9-: ]+)\]\s(\S+)\s+([^\0]+)"

END_REGEX="\["
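
What the COL_LIST entries express can be sketched in Python as a conversion from captured strings to typed values. This is an illustration only: the list-of-dictionaries layout is an assumption of the example, DATE_FORMAT is assumed to follow strptime-style directives (which the sample value suggests), and truncating to SIZE is a guess at how an oversized value might be handled, not documented collector behavior.

```python
from datetime import datetime

# Stand-in for the COL_LIST entries of test.rgx (layout is an assumption).
COL_LIST = [
    {"regex_no": 0, "name": "tm", "type": "datetime",
     "date_format": "%Y-%m-%d %H:%M:%S"},
    {"regex_no": 1, "name": "user", "type": "varchar", "size": 16},
    {"regex_no": 2, "name": "msg", "type": "varchar", "size": 512},
]

def convert(pieces):
    """Turn captured strings into typed values, one per column."""
    row = {}
    for col in COL_LIST:
        raw = pieces[col["regex_no"]]
        if col["type"] == "datetime":
            row[col["name"]] = datetime.strptime(raw, col["date_format"])
        else:
            # Truncating to SIZE is an assumption of this sketch.
            row[col["name"]] = raw[:col["size"]]
    return row

row = convert([
    "2014-08-18 13:51:19",
    "spiderman",
    "message-1 : This is the best machine data DBMS ever.\n",
])
print(row["tm"].isoformat())
```

REGEX_NO selects which captured piece feeds each column, so the order of the entries in COL_LIST does not have to match the order of the capture groups.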

Create test.tpl

Copy "$MACHBASE_HOME/collector/custom.tpl" to "test.tpl" and edit it as shown below. The file is provided as an example in "$MACHBASE_HOME/collector/samples/test.tpl".

###############################################################################
# Copyright of this product 2013-2023,
# Machbase Corporation(Incorporation) or its subsidiaries.
# All Rights reserved
###############################################################################

#
#  This file is for Machbase collector template file.
#

###################################################################
# Collect setting
###################################################################

COLLECT_TYPE=FILE
LOG_SOURCE=/home/mach/machbase_home/collector/samples/test.log

###################################################################
# Process setting
###################################################################

REGEX_PATH=/home/mach/machbase_home/collector/samples/test.rgx

###################################################################
# Output setting
###################################################################

DB_TABLE_NAME = "custom_table"
DB_ADDR       = "127.0.0.1"
DB_PORT       = 5656
DB_USER       = "SYS"
DB_PASS       = "MANAGER"

# 0: Direct insert
# 1: Prepared insert
# 2: Append
APPEND_MODE=2

# 0: None, just append.
# 1: Truncate.
# 2: Try to create table. If table already exists, warn it and proceed.
# 3: Drop and create.
CREATE_TABLE_MODE=2
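
Before registering a template, it can help to read it back and confirm that every setting has the intended value. The helper below is a hypothetical convenience script, not part of Machbase, and it assumes only the simple KEY=VALUE and # comment layout visible in the listing above.

```python
def parse_template(text):
    """Collect KEY=VALUE settings, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Values are kept as strings; surrounding double quotes are dropped.
        settings[key.strip()] = value.strip().strip('"')
    return settings

sample = '''
# Output setting
DB_TABLE_NAME = "custom_table"
DB_PORT       = 5656
APPEND_MODE=2
CREATE_TABLE_MODE=2
'''

settings = parse_template(sample)
print(settings["DB_TABLE_NAME"], settings["DB_PORT"])
```

A check like this makes it easy to see at a glance which table the collector will write to and which insert and table-creation modes are selected.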

Create and Execute a Collector

Create the collector "myclt" from the template and start it.

mach> create collector localhost.myclt from "/home/mach/mach_collector_home/collector/samples/test.tpl";
Created successfully.
Elapsed Time : 0.106
mach>
mach> alter collector localhost.myclt start;
Altered successfully.
Elapsed Time : 0.051
mach>

Debug a Collector

However, the table has not been created yet.

mach> select * from custom_table;
[ERR-02025 : Table CUSTOM_TABLE does not exist.]

Because an error is suspected, stop the collector and start it again in TRACE mode for debugging, as shown below.

mach> alter collector localhost.myclt stop;
Altered successfully.
Elapsed Time : 1.438
mach> alter collector localhost.myclt start trace;
Altered successfully.
Elapsed Time : 0.030
mach>

Discover and Solve Problems via Trace Log

The collector is now running in TRACE mode. Check $MACHBASE_HOME/trc/machbase.trc.

[2016-03-13 23:44:35 P-29741 T-139982693979904][INFO] PREPARE Error [create table custom_table ( collector_type varchar(32), collector_addr ipv4, collector_origin varchar(512), collector_offset long, tm datetime, user varchar(16), msg varchar(512))] (100007DA:Error in parse (syntax): near token (user varchar(16), msg varchar(512))).)

The message above shows that the reserved keyword "user" was used as a column name, which causes a parsing error when the table is created. Change "user" to "myuser" in test.rgx and run the collector again.

Partial contents of "test.rgx"
...........

COL_LIST= (
     (
        REGEX_NO = 0
        NAME = tm
        TYPE = datetime
        SIZE = 8
        DATE_FORMAT="%Y-%m-%d %H:%M:%S"
         ),
     (
        REGEX_NO = 1
        NAME = myuser   <== Modified!
        TYPE = varchar
        SIZE = 16
        USE_INDEX = 1
         ),
     (
        REGEX_NO = 2
        NAME = msg
        TYPE = varchar
        SIZE = 512
        USE_INDEX = 1
         )
)
..................
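
This kind of error can be caught before the collector is started by checking the column names against a list of reserved words. The sketch below is a hypothetical pre-check, not a Machbase feature, and its word list is an illustrative subset: only "user" is confirmed by the trace log above, and the full list would have to come from the Machbase documentation.

```python
# Illustrative subset only; "user" is the one word confirmed by the trace log.
RESERVED_WORDS = {"user", "table", "select", "from", "where"}

def find_reserved(column_names):
    """Return the column names that collide with a reserved word."""
    return [name for name in column_names if name.lower() in RESERVED_WORDS]

print(find_reserved(["tm", "user", "msg"]))
print(find_reserved(["tm", "myuser", "msg"]))
```

The first call reports the original column list as problematic, while the renamed list passes without any hits.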

Execute and Check Results

Stop the collector that is running in TRACE mode, then start it again in normal mode with the modified test.rgx file and confirm the result.

mach> alter collector localhost.myclt stop; <== Stop the TRACE mode.
Altered successfully.
Elapsed Time : 1.246
mach> alter collector localhost.myclt start; <== Execute it again in a normal mode after the modification 
Altered successfully.
Elapsed Time : 0.036
mach>

The result of CUSTOM_TABLE is displayed as shown below.

mach> select tm, myuser, msg from custom_table;
tm                              myuser            
-----------------------------------------------------
msg                                                                               
------------------------------------------------------------------------------------
2014-08-18 13:57:59 000:000:000 superman          
 message-11 : This is the best machine data DBMS ever.

2014-08-18 13:56:44 000:000:000 spiderman         
message-10 : This is the best machine data DBMS ever.

2014-08-18 13:55:30 000:000:000 batman            
   message-9 : This is the best machine data DBMS ever.

2014-08-18 13:54:31 000:000:000 superman          
 message-8 : This is the best machine data DBMS ever.

2014-08-18 13:53:34 000:000:000 batman            
   message-7 : This is the best machine data DBMS ever.

2014-08-18 13:52:34 000:000:000 superman          
 message-6 : This is the best machine data DBMS ever.

2014-08-18 13:51:34 000:000:000 batman            
   message-5 : This is the best machine data DBMS ever.

2014-08-18 13:51:33 000:000:000 superman          
 message-4 : This is the best machine data DBMS ever.

2014-08-18 13:51:33 000:000:000 spiderman         
message-3 : This is the best machine data DBMS ever.

2014-08-18 13:51:19 000:000:000 superman          
 message-2 : This is the best machine data DBMS ever.

2014-08-18 13:51:19 000:000:000 spiderman         
message-1 : This is the best machine data DBMS ever.

[11] row(s) selected.
Elapsed Time : 0.000
mach>
