masscan logging demystified

Deep dive into logging options for the ultimate scanning tool

Posted by Emre Bastuz on January 20, 2016

Introduction

Recently I got around to doing some testing with masscan, a great tool by Robert Graham for port scanning large networks.

I got some inspiration from an article on blog.erratasec.com that describes how the scan results can be saved in a binary format and later converted to other supported log formats such as "grepable" or "json".

In this article I describe what I learned about the logging features that masscan provides and also show a patch that I created to add the "Timestamp" field to the "grepable" log format.

During testing I also noticed a possible bug in the feature that logs results into a Redis database, and I will describe what I found.

Capabilities and samples

"masscan" can log scan results in the following formats:

  • binary
  • xml
  • grepable
  • json
  • null
  • redis
  • text (a.k.a. "list")
  • unicornscan

(see the "src" folder in the masscan distribution, particularly the files starting with "out-").

The best logging option for doing a large scale scan is to use the binary format, as it is the most detailed one and also requires the least disk space.

This binary format can then be converted to another format later on.

Doing a scan with binary logging looks like this:

masscan 192.168.0.0/16 -p443 --banners -oB mynetwork.bin

where the option "-oB" specifies the binary format, followed by the output filename.

Once the scan is completed, the logging data can be output to another format like so:

masscan --readscan mynetwork.bin --output-filename mynetwork.json --output-format json
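If the same binary file has to be converted into several formats, the conversion command can be generated by a small script. The following is only a sketch: it builds the command line from the three options shown above ("--readscan", "--output-filename", "--output-format") and does not run masscan itself. The function name and the rule for deriving the output filename are my own assumptions.

```python
from pathlib import Path


def build_convert_command(binary_file, fmt):
    """Build the masscan command that converts a binary log to `fmt`.

    Assumption: the output file is named after the binary file, with the
    format name as its extension (mynetwork.bin -> mynetwork.json).
    """
    output_file = str(Path(binary_file).with_suffix("." + fmt))
    return [
        "masscan",
        "--readscan", binary_file,
        "--output-filename", output_file,
        "--output-format", fmt,
    ]


if __name__ == "__main__":
    # Print one command per target format; pass the list to
    # subprocess.run() to actually execute it.
    for fmt in ("json", "grepable", "xml"):
        print(" ".join(build_convert_command("mynetwork.bin", fmt)))
```

Keeping the command as a list rather than a string avoids quoting problems once it is handed to subprocess.run().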

Depending on the scan result for a particular host, masscan writes one log line per finding type (shown here in grepable format):

  • Port is open:

Host: 192.168.0.1 () Ports: 443/open/tcp////

  • Port is open, service is identified, banner is available:

Host: 192.168.0.1 () Port: 443 Service: ssl Banner: TLS/1.1 cipher:0xc013, www.mysite.com, www.mysite.com

  • Port is open, service is identified and involves an x509 certificate (the x509 certificate logged as "banner" field):

Host: 192.168.0.1 () Port: 443 Service: X509 Banner: MIIFAjCCA+qg...r1O0=

Differences in Log Formats

Not all of the available log formats contain the same level of detail.

The following table gives an overview of which fields each format records:

format       time  ip   port  state/status  ip_proto  reason  ttl  proto/service  owner  sunrpc  version  banner  cert
binary       yes   yes  yes   yes           yes       yes     yes  yes            n/a    n/a     n/a      yes     yes
xml          no    yes  yes   yes           yes       yes     yes  yes            n/a    n/a     n/a      yes     yes
grepable     no    yes  yes   yes           yes       no      no   yes            n/a    n/a     n/a      yes     yes
json         no    yes  yes   yes           yes       yes     yes  yes            n/a    n/a     n/a      yes     yes
null         n/a   n/a  n/a   n/a           n/a       n/a     n/a  n/a            n/a    n/a     n/a      n/a     n/a
redis        no    yes  yes   no            yes       no      no   no             n/a    n/a     n/a      no      no
text         yes   yes  yes   yes           yes       no      no   yes            n/a    n/a     n/a      yes     yes
unicornscan  no    yes  yes   yes           yes       no      yes  no             n/a    n/a     n/a      no      no

Please note that the fields "sunrpc", "owner" and "version" are mentioned in the source code for grepable and unicornscan output but are not actually used by the software.

Please also note that some of the formats mentioned above are available during the scanning process (binary, xml, grepable, json and text/list), while others (redis, unicornscan) are additionally available during conversion.

Picking the right format

As seen in the table above, the most detailed log format is the "binary" format. I strongly recommend using this for saving the scan results.

In case the data needs to be converted to be fed into a parser, this can easily be accomplished by using the "--readscan" option and letting masscan do the conversion.

I found the "grepable" format to be the easiest to parse with regular expressions, as the data fields are preceded by a field identifier like "Host: ", "Ports: ", etc.

Unfortunately, the "grepable" format does not provide the "Timestamp" field. However, this field can easily be added with a small modification to the source code.
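To illustrate the point about field identifiers, here is a minimal sketch of such a parser. It assumes the labels seen in the sample lines of this article ("Timestamp", "Host", "Ports", "Port", "Service", "Banner"), treats everything after "Banner: " as the banner, and drops the empty "()" hostname placeholder. The function name is hypothetical, and the regular expression has only been checked against the samples shown here.

```python
import re

# Field labels as they appear in the sample log lines of this article.
FIELD_RE = re.compile(r"(?:^|\s)(Timestamp|Host|Ports|Port|Service|Banner):\s")


def parse_grepable_line(line):
    """Split one grepable log line into a {label: value} dict.

    Returns an empty dict for lines without any known label.
    """
    line = line.strip()
    matches = list(FIELD_RE.finditer(line))
    record = {}
    for i, match in enumerate(matches):
        label = match.group(1)
        if label == "Banner":
            # The banner is free text, so take the rest of the line as-is.
            record[label] = line[match.end():]
            break
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        record[label] = line[match.end():end].strip()
    if record.get("Host"):
        # Keep only the address and drop the "()" hostname placeholder.
        record["Host"] = record["Host"].split()[0]
    return record


if __name__ == "__main__":
    print(parse_grepable_line("Host: 192.168.0.1 () Ports: 443/open/tcp////"))
```

The value of "Ports" is left as a raw string; it can be split on "/" if the individual parts (port, state, protocol) are needed.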

The patch that I have used for this can be found here:

--- out-grepable.c   2016-01-19 23:39:22.000000000 +0100
+++ new.out-grepable.c  2016-01-07 18:35:56.000000000 +0100
@@ -130,12 +130,14 @@
 grepable_out_status(struct Output *out, FILE *fp, time_t timestamp,
     int status, unsigned ip, unsigned ip_proto, unsigned port, unsigned reason, unsigned ttl)
 {
-    UNUSEDPARM(timestamp);
+/***    UNUSEDPARM(timestamp); ***/
     UNUSEDPARM(out);
     UNUSEDPARM(reason);
     UNUSEDPARM(ttl);
 
-    fprintf(fp, "Host: %u.%u.%u.%u ()",
+    fprintf(fp, "Timestamp: %lu", timestamp);
+
+    fprintf(fp, "\tHost: %u.%u.%u.%u ()",
         (unsigned char)(ip>>24),
         (unsigned char)(ip>>16),
         (unsigned char)(ip>> 8),
@@ -167,11 +169,13 @@
     char banner_buffer[4096];
 
     UNUSEDPARM(ttl);
-    UNUSEDPARM(timestamp);
+/***    UNUSEDPARM(timestamp); ***/
     UNUSEDPARM(out);
     UNUSEDPARM(ip_proto);
 
-    fprintf(fp, "Host: %u.%u.%u.%u ()",
+    fprintf(fp, "Timestamp: %lu", timestamp);
+
+    fprintf(fp, "\tHost: %u.%u.%u.%u ()",
         (unsigned char)(ip>>24),
         (unsigned char)(ip>>16),
         (unsigned char)(ip>> 8),

To use it, just download the file as "out-grepable.c-diff" and do a

 patch out-grepable.c out-grepable.c-diff 

in the folder "/path/to/masscan/src/".

After a recompilation of "masscan", the field "Timestamp" will be available in the log format "grepable":

Timestamp: 1453243603 Host: 192.168.0.1 () Ports: 443/open/tcp////
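The patch prints the time_t value as a plain number, so the field holds seconds since the Unix epoch. A parser will usually want a readable date instead. The following sketch pulls the value out of a log line and converts it to UTC; the function name is hypothetical, and treating the value as UTC epoch seconds is my assumption based on the patch.

```python
import re
from datetime import datetime, timezone

TIMESTAMP_RE = re.compile(r"^Timestamp:\s+(\d+)")


def line_timestamp_utc(line):
    """Return the "Timestamp" field of a log line as an aware UTC datetime.

    Returns None if the line does not start with a Timestamp field.
    """
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


if __name__ == "__main__":
    sample = "Timestamp: 1453243603 Host: 192.168.0.1 () Ports: 443/open/tcp////"
    print(line_timestamp_utc(sample).isoformat())  # 2016-01-19T22:46:43+00:00
```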

Output to Redis

When I played around with the output option "Redis" I noticed some odd behavior.

To reproduce it, just fire up "masscan" to write binary scan results to a Redis database:

masscan --readscan myscan.bin --output-format redis --redis 127.0.0.1:6379

After all data has been transferred to the DB, masscan sends a "+PING" and expects a confirmation string of "+PONG".

In my case the DB sent a ":0" instead, which caused masscan to exit with an error message: redis: unexpected response from redis server: :0

As outputting log data to Redis was just an experiment for me, I did not dig deeper into the cause of the error.
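One hint for anyone who wants to dig further: in the Redis wire protocol, the first character of a reply identifies its type. "+" marks a simple string such as "+PONG", while ":" marks an integer reply, which is what commands that return a count produce. So ":0" may be a leftover answer to an earlier command rather than the answer to the ping. This is only my reading of the protocol and has not been checked against the masscan source. The sketch below classifies a reply by its first character; the function name is hypothetical.

```python
# Reply type markers of the Redis serialization protocol (RESP).
RESP_TYPES = {
    "+": "simple string",
    "-": "error",
    ":": "integer",
    "$": "bulk string",
    "*": "array",
}


def classify_reply(reply):
    """Return (type_name, payload) for the first line of a RESP reply."""
    first_line = reply.split("\r\n", 1)[0]
    if not first_line:
        raise ValueError("empty reply")
    kind = RESP_TYPES.get(first_line[0], "unknown")
    return kind, first_line[1:]


if __name__ == "__main__":
    print(classify_reply("+PONG\r\n"))  # ('simple string', 'PONG')
    print(classify_reply(":0\r\n"))     # ('integer', '0')
```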

For the interested reader: the Redis version used was 2.8.17-1, the one shipped with Debian Jessie at the time of writing.

Where to go from here

After having converted the scanning data to the format of choice, it was quite simple to move the data into a database for further analysis.

I chose to use a combination of Elasticsearch, Logstash and Kibana for this.

The details of that particular setup will be covered in a follow-up post.