Adding a custom log format to the Skyfence Cloud Discovery Free tool
The Skyfence Cloud Discovery Free tool can analyze multiple log files, with no practical limit on the size of each file or on their combined size. The tool supports many popular web proxy and firewall log formats from major vendors (e.g. Bluecoat, Palo Alto Networks, GreenPlum, Websense), standard formats like the NCSA Common Log Format (CLF), and the format of the free Squid web proxy. Still, you may need to scan a format that Skyfence does not currently support out of the box. The tool provides a straightforward mechanism for adding a format when the need arises.
User traffic is usually recorded in the logs of security devices such as firewalls, web proxies, and network security monitors (e.g. IDS). It is therefore recommended to work with the network administrators who manage these devices to export traffic logs into .csv or table-formatted files that contain the specific fields required for analysis (see below).
More than 90% of all medium and large organizations have one or more of the devices above capturing the majority of their users' Internet activity. This means almost all organizations have the log data required to use the Skyfence Discovery Tool and generate a detailed report.
Side note: For best scan results, it is recommended to add enough log data to a single scan to cover at least a week, and preferably a month or more, of well-distributed user traffic. Avoid log files that were collected during times of low user activity (e.g. when a large portion of the users from a location are offsite).
There are three basic steps to add a custom format to the Skyfence tool. First, create the XML that defines the log file format. Second, ensure that all the required fields are in the log data and that each required field is mapped to the relevant column. Third, create the .format file so that your new log format can be selected from the tool's drop-down menu before running your discovery scan.
The three sections below will explain each of the three steps.
- Create log file format definition (.format)
- Identification and mapping of fields to columns
- Creation of .format file
1. Create log file format definition
The tool requires the six fields below in order to analyze the logs.
Please note that if any one of these fields is missing, the tool cannot analyze the data: it fails to produce a scan result and warns the user rather than producing a partial or inaccurate result.
- EVENT TIME - Time of the event
- SOURCE ADDRESS (IP or Hostname)
- TRAFFIC VOLUME (in bytes or Kb)
- DESTINATION HOST/HOSTNAME or IP (DNS name or IP)
- ACTION - Whether traffic was allowed or denied
- USER - Application user details
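Before asking for an export, it can help to check a planned column mapping against the six required fields. The snippet below is a minimal sketch of such a check, not part of the Skyfence tool. The labels are copied from the list above, and the `mapping` dictionary and the function name are hypothetical.

```python
# Labels copied from the list of required fields above; the identifiers the
# tool uses inside a .format file may differ.
REQUIRED_FIELDS = (
    "EVENT TIME",
    "SOURCE ADDRESS",
    "TRAFFIC VOLUME",
    "DESTINATION HOST",
    "ACTION",
    "USER",
)


def missing_required_fields(mapping):
    """Return the required fields that have no column assigned in `mapping`.

    `mapping` is a hypothetical dict of required-field label -> zero-based
    column index in the exported log file.
    """
    return [field for field in REQUIRED_FIELDS if field not in mapping]


# Example: a planned export that lacks a column for the action taken.
planned = {
    "EVENT TIME": 2,
    "SOURCE ADDRESS": 4,
    "TRAFFIC VOLUME": 5,
    "DESTINATION HOST": 6,
    "USER": 1,
}
print(missing_required_fields(planned))
```

Here the check reports `ACTION` as missing, which is the situation in which the tool would refuse to produce a scan result.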
The structure of the .format file is described next.
The log format definition file (.format) is an XML file with nested tags/elements. Five tags/elements are used to define a format.
1. <name> - Name of the format. This name appears in the drop-down menu of the discovery tool, and any string can be assigned.
2. <type> - Type of log file. Supported values are TABLE and CSV.
3. <separator> - Delimiter/separator between the columns of the log file. The separator can be written as a numeric character reference, in decimal form (&#nn;) or hexadecimal form (&#xhh;). In the example below, the reference denotes a non-printing space.
4. <commentIndicator> - Character that denotes a comment. Comments are ignored by the tool.
5. <fields> - Log file fields/columns. Within this tag, each nested <field> tag/element defines one column of the log file. See some common fields and their definitions in the example below.
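To illustrate the two numeric character reference forms mentioned for <separator>, the sketch below asks Python's standard XML parser which character each reference produces. It assumes the tool reads .format files with ordinary XML parsing rules. The function name `decode_separator` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def decode_separator(reference):
    """Return the character an XML parser yields for a <separator> value."""
    return ET.fromstring(f"<separator>{reference}</separator>").text


# Decimal and hexadecimal references to the same character decode identically.
print(repr(decode_separator("&#9;")))    # tab, decimal form
print(repr(decode_separator("&#x9;")))   # tab, hexadecimal form
print(repr(decode_separator("&#32;")))   # space, decimal form
```

Writing the separator as a reference keeps whitespace delimiters such as tab or space visible and unambiguous in the file.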
Below is an example XML file with each element/tag explained in comments. This definition can be used as a starting point for your own custom formats.
<logFormat> <!-- Top level tag -->
<name>my custom format</name> <!-- Name of .format file - This name will appear in drop down menu of discovery tool and any string(s) can be assigned -->
<type>TABLE</type> <!-- Type of log file, supported formats. Values are TABLE, CSV -->
<separator> </separator> <!-- Delimiter/separator between the columns of the log file. It can be written as a numeric character reference in decimal form (&#nn;) or hexadecimal form (&#xhh;). In this example, the separator is a non-printing space. -->
<commentIndicator>#</commentIndicator> <!-- Character denoting a comment and is ignored by the tool -->
<fields> <!-- Log file fields/columns -->
<field> <!-- Field tag -->
<type>SOURCE</type> <!-- Field/column type - This value is hard coded and should not be changed. In this example, this field represents source of network or application traffic -->
<name>Origin address</name> <!-- Name of the field from the log file which maps to SOURCE. In this example, "Origin address" field/column will be parsed as SOURCE address within discovery tool. -->
<index>4</index> <!-- Sequence number of field/column starting from 0. In this example, field/column number 5 with a header name "Origin address" will be mapped to SOURCE. -->
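</field>
<field> <!-- Field tag -->
<type>TIME</type> <!-- Field/column type - In this example, this field represents the time of the event -->
<name>Event time</name> <!-- Name of the field from the log file which maps to TIME -->
<index>2</index> <!-- In this example, field/column number 3 with a header name "Event time" will be mapped to TIME. -->
<!-- Format of values. In this example, date time format is set -->
</field>
<!-- The remaining required fields are defined with further field elements in the same way -->
</fields>
</logFormat>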
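A quick way to catch a malformed definition before loading it into the tool is to parse it and confirm that the five tags described above are present. The following is a rough sketch using Python's standard library. It is an independent check, not how the tool itself validates files, and the sample definition (including the `&#32;` separator) is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the article; the root element name follows the example.
EXPECTED_TAGS = ("name", "type", "separator", "commentIndicator", "fields")


def missing_tags(format_xml):
    """Return the expected top-level tags absent from a .format definition.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(format_xml)
    return [tag for tag in EXPECTED_TAGS if root.find(tag) is None]


# Hypothetical definition that forgets the comment indicator.
sample = """
<logFormat>
  <name>my custom format</name>
  <type>TABLE</type>
  <separator>&#32;</separator>
  <fields>
    <field><type>SOURCE</type><name>Origin address</name><index>4</index></field>
  </fields>
</logFormat>
"""
print(missing_tags(sample))
```

For this sample the function reports that `commentIndicator` is missing. An unclosed tag would instead surface as a parse error.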
2. Identification and mapping of fields to columns:
As mentioned above, please make certain that the mandatory columns are available in the exported log files. The columns in the log file are indexed from the leftmost (0) to the rightmost. Therefore, if there are ten columns in the file, the index number of the first column is 0 and the index number of the 10th column is 9. The six required fields do not need to be in a specific sequence, but they must exist in the log file. The name of a column does not need to be the same as the relevant field, but it must be mapped properly.
For example, if the time data is in column number 3 with a header name "Event time", that column is mapped to the required field "TIME" and has an index number of 2.
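Because indexes start at 0, it is easy to be off by one when filling in the <index> values. A small helper like the one below can list the index of every column from the header row of an export. This is only a sketch: the header row is hypothetical, and it assumes the export has a single header line with a known delimiter.

```python
import csv


def column_indexes(header_line, delimiter=","):
    """Map each column header to its zero-based index (leftmost column is 0)."""
    headers = next(csv.reader([header_line], delimiter=delimiter))
    return {name.strip(): index for index, name in enumerate(headers)}


# Hypothetical header row of an exported CSV log file.
header = "User,Action,Event time,Bytes,Origin address,Destination"
indexes = column_indexes(header)
print(indexes["Event time"])      # third column
print(indexes["Origin address"])  # fifth column
```

With this header, "Event time" is the third column and gets index 2, and "Origin address" is the fifth column and gets index 4, matching the numbering rule described above.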
3. Creation of .format file:
If the log file or exported data comes from one of the supported vendors or standards, use the drop-down menu to pick the matching format and run the scan.
Note: Refer to KB article How to run a scan in the Skyfence Cloud Discovery Tool?
To create a new format file, follow these steps:
1. Go to the folder C:\Program Files (x86)\CloudDiscovery\formats (Windows 64-bit) and copy one of the existing .format files to the custom format directory: C:\Users\<name of user>\Documents\CloudDiscovery\formats\ . This folder contains all custom formats.
2. Rename the copied file to a name of your choice (e.g. "my own.format"). If the log file is a table with a delimiter, copy and rename blue_coat.format. If the log file is a CSV file, copy and rename palo_alto.format.
3. Open the copied .format file in a text editor and change the value of the <name> tag (e.g. <name>my custom format</name>) so that it matches the name of the format file. Please note that the new format name is listed in the drop-down menu only after this step.
4. Change the relevant values so that the log file column headers are mapped to the required fields used by the tool.
5. For help with the mapping, please refer to section 2 of this article.
6. Once the mapping of fields to columns in the log file is complete, the format file is ready to be used for a scan. The new format also appears in the "Log Type" drop-down menu that is shown when the user clicks the “Add a file” or “Add a folder” button.
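Steps 1 and 2 above can also be scripted. The sketch below copies a shipped format file into the custom formats folder under a new name. The directory constants follow the paths given in step 1, but they are assumptions about your installation (for example, the Documents folder may be redirected), and `copy_format` is a hypothetical helper, not something provided by the tool.

```python
import shutil
from pathlib import Path

# Directories as given in the steps above; adjust them if your installation
# or Documents folder is located elsewhere (both are assumptions here).
SHIPPED_FORMATS = Path(r"C:\Program Files (x86)\CloudDiscovery\formats")
CUSTOM_FORMATS = Path.home() / "Documents" / "CloudDiscovery" / "formats"


def copy_format(template_name, new_name,
                source_dir=SHIPPED_FORMATS, target_dir=CUSTOM_FORMATS):
    """Copy a shipped .format file into the custom folder under a new name."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / new_name
    if destination.exists():
        # Refuse to overwrite a custom format that is already in place.
        raise FileExistsError(f"{destination} already exists")
    shutil.copyfile(Path(source_dir) / template_name, destination)
    return destination


# Example call for a delimited table log, starting from the Blue Coat file:
# copy_format("blue_coat.format", "my own.format")
```

The copied file still needs its <name> value and field mappings edited by hand, as described in steps 3 and 4.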
Running a scan with new .format file:
Refer to article How to run a scan in the Skyfence Cloud Discovery Tool? and follow the steps.
During step 5, choose "Log Type" from the available drop-down options and select the new format you created.
Continue further to finish the scan as described in article How to run a scan in the Skyfence Cloud Discovery Tool?