Good UNIX tools

aka: Small things done well

We spend a lot of time sweating the details when we build Canary. From our user flows to our dialogues, we try hard to make sure that there are very few opportunities for users to get stuck or confused.

We also never add features just because they sound cool.
Do you “explode malware”? No. 
Export to STIX? No. 
Darknet AI IOCs? No. No. No. 

Vendors add rafts of checklist-driven features as a crutch. They hope that one more integration (or one more buzzword) can help make the sale. This is why enterprise software looks like it does, and why it’s probably the most insecure software on your network.

This also leads to a complete lack of focus. To quote industry curmudgeon (and all around smartypants) Kelly Shortridge: "it is better to whole-ass one thing than to half-ass many". We feel this deeply.

Most of us cut our teeth on UNIX and UNIX clones and cling pretty fastidiously to the original Unix philosophies¹:
  • Make each program do one thing well
  • Expect the output of every program to become the input to another
This is pretty unusual for modern security software. Everybody wants to be your “single pane of glass”. Everybody wants to be a platform.

We don’t. 

Tired: Vendors trying to be an island.
Wired: Vendors who work well together.
Inspired: Let’s get ready to Rumble...

Rumble, HD Moore’s take on network discovery, shares a similar perspective, and provides effortless network inventory visibility without credentials, tap ports, or heavy appliances. Rumble tries to provide the best inventory possible through a single agent and a light (but smart) network scan that is safe to run in nearly any environment. (If you are a fan of the quick deployment and light touch of Canaries, you should check out Rumble’s similar approach to network asset inventory!)

It's fast, It has a free tier, and now It integrates with your Canary Console too.

To illustrate this integration, assume someone reaches out to a (fake) Windows Server called \\BackupFS1 and copies \\Salaries\2020\Exco-Salaries.xlsx. Your Canary will send a single, high-fidelity message to let you know that skullduggery is afoot. We can tell you that host-192.168.2.136 accessed the Canary, and that AcmeCorp/Bob (or his creds) accessed the share.


We give you some details on the attacker, but what if you were also running a Rumble inventory of this network? Well, then we can simply hand you over to Rumble.

From June, Canary customers who are also running Rumble will notice a new integration option under their Flocks Settings.

    
Rumble Integration Settings

Once this is turned on, IP addresses in alerts will include a quick link that allows you to investigate the address inside of Rumble.


The integration is light and non-obtrusive, but should immediately add value. It also affords us room for a slight flourish. It’s possible that you could use both Canary and Rumble, and never visit the settings page to enable the feature. We have users with hundreds of birds who only visit their Console once or twice a year (when there’s an actual alert). It’s ok. We got you!


The Canary Console will automatically detect if you have a valid Rumble login, and if you do, will enable the integration to show you the link². You won’t have to think about it, it will “just work”.



____________

¹ https://archive.org/details/bstj57-6-1899/page/n3/mode/2up

² If you hate this, you can stop it from happening by setting the integration to “never” in your settings.


Why control matters

In March we moved from Groove to Zendesk, and with this migration our Knowledge Base (KB) moved too.

The challenge we faced was name-spacing: KB articles hosted on Groove lived under http://help.canary.tools/knowledge_base/topics/, but the /knowledge* namespace is reserved on Zendesk and is not available for our use. This forced us to migrate all KB pages to new URLs and update the cross-references between articles. That addressed the user experience for anyone who lands at our KB portal by clicking a valid URL or by typing https://help.canary.tools into a browser.

What isn’t resolved, though, is the thousands of Canaries in the field with URLs that now point to the old namespace. We design Canary to be dead simple, but realise that users may sometimes look for assistance. To this end, the devices will often offer simple “What is this?” links in the interface that lead a user to a discussion of the feature.

With the move (and with Zendesk stealing the namespace), a customer who clicked on one of those links would get an amateurish, uninformative white screen saying “Not Found”.

This is a terrible customer experience! 

The obvious way forward is a redirect mechanism which maps https://help.canary.tools/knowledge_base/* URLs to the Zendesk namespace. By implication, the DNS entry help.canary.tools cannot point directly to Zendesk; it needs to point to a system that is less opaque to us, so we can configure it at will.


That’s straightforward! At a first approximation, we should have something up and running in minutes with AWS CloudFront, which lets us map the https://help.canary.tools/* namespace onto https://thinkst.zendesk.com/* with minimal effort.

Step 1 (client request to help.canary.tools):
URL: https://help.canary.tools/some/uri
GET /some/uri HTTP/1.1

Step 2 (CloudFront request to the Zendesk origin):
URL: https://thinkst.zendesk.com/hc/en-gb/some/uri
GET /hc/en-gb/some/uri HTTP/1.1

The next step is to intercept requests to the /knowledge_base namespace and return an HTTP 301 redirect to the correct URL in Zendesk. We make use of the Lambda@Edge functionality to implement a request handler.
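As a rough sketch (this is illustrative rather than our production code, and the mapping shown is a placeholder borrowed from the Nginx config later in this post), a Lambda@Edge viewer-request handler in Python looks something like this:

# Illustrative Lambda@Edge viewer-request handler. The REDIRECTS table is a placeholder;
# the real mapping has one entry per legacy KB article.
REDIRECTS = {
    "/knowledge_base/topics/url1": "/hc/en-gb/articles/360002426777",
}

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    if uri.startswith("/knowledge_base/"):
        # Known legacy URLs get a permanent redirect; unknown ones fall back to the KB home page.
        target = REDIRECTS.get(uri, "/")
        return {
            "status": "301",
            "statusDescription": "Moved Permanently",
            "headers": {
                "location": [{"key": "Location", "value": "https://help.canary.tools" + target}],
            },
        }

    # Everything else passes through to the origin untouched.
    return request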


Thirty minutes and a few lines of Python later, it seemed like we had it all figured out, but for one not-so-minor detail: images in KB articles weren’t loading. What was going on?

The DNS record help.canary.tools was pointing to CloudFront while the origin for the content was configured as thinkst.zendesk.com, so when CloudFront requested an image it got an HTTP redirect back to itself, causing an infinite redirect loop.

Surely this is fixable by adding the correct Host: help.canary.tools header to the request? Nope! Instead of a redirect, we were now getting a 403 from CloudFlare (N.B. NOT CloudFront) - Zendesk uses CloudFlare for its own content delivery. WTF?!?

After a few iterations the “magic” (read: plain obvious) incantation was discovered. (Note: 104.16.55.111 is the IP address behind thinkst.zendesk.com.)
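In essence: pin help.canary.tools to Zendesk’s IP, but keep presenting help.canary.tools as the SNI and Host value. Something along the lines of the following curl invocation (illustrative, not the exact command we ran) demonstrates the idea:

curl -v https://help.canary.tools/hc/en-gb --resolve help.canary.tools:443:104.16.55.111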

This is somewhat expected, since Zendesk is configured to think it's serving requests for help.canary.tools.

Without this - i.e. without the requests presenting themselves as help.canary.tools - Zendesk rewrites all relative URIs in the KB to yet another namespace, https://thinkst.zendesk.com/*, which brought its own set of challenges, complexity and non-deterministic behavior.

To avoid confusion and further issues down the line we imposed a design constraint on ourselves - a simplifying assumption: the browser’s address bar should only ever display help.canary.tools, and the thinkst.zendesk.com namespace should never leak to customers.

Committed to this approach, the next hurdle we faced was Server Name Indication (SNI). 

Server Name Indication (SNI) is an extension to the Transport Layer Security (TLS) computer networking protocol by which a client indicates which hostname it is attempting to connect to at the start of the handshaking process. This allows a server to present multiple certificates on the same IP address and TCP port number and hence allows multiple secure (HTTPS) websites (or any other service over TLS) to be served by the same IP address without requiring all those sites to use the same certificate.

CloudFront was doing exactly what it was configured to do. It connected to (and negotiated SNI for) thinkst.zendesk.com, which resulted in a 403 error because Zendesk is configured for the SNI name help.canary.tools.

For any of this to work, what CloudFront needed to do was connect to thinkst.zendesk.com (104.16.55.111), but negotiate SNI for help.canary.tools. By any other name - we needed “SNI spoofing” (not really a thing - I just coined the phrase).
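Expressed as an openssl one-liner (purely for illustration), this is what we needed CloudFront to do on our behalf: connect to one address, but present a different server name.

openssl s_client -connect 104.16.55.111:443 -servername help.canary.tools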

Can CloudFront do that? No, it can’t :'( And just like that we had to rethink our approach - CloudFront was not the solution.

Another failed approach was setting the Host mapping field in Zendesk to kb.canary.tools. It may have worked, but for a bug in Zendesk: it fails to re-generate SSL certificates when the Host mapping field is updated in their admin console, so browsing to https://kb.canary.tools was met with certificate validation errors. How long does it take for Zendesk to rotate certificates? We don't know (but it's more than 30 minutes).

There were just too many moving parts in too many systems to allow us to sanely and consistently reason about the customer experience.

Retrospectively, the root cause of all our problems was still related to name-spacing: both CloudFront and Zendesk (rightfully) believed they were authoritative for the hostname help.canary.tools.

  • From the perspective of the entire Internet, help.canary.tools points to CloudFront.
  • From the perspective of CloudFront, help.canary.tools points to Zendesk.
So if both systems share the same name in public, how do the two systems address each other in private?
The answer was some form of Split-Horizon DNS. The least sophisticated version would've been to simply hack the /etc/hosts file on the host serving requests for help.canary.tools, but this functionality exists natively in Nginx's upstream{} block. Of course, those IP addresses could change, but this is a manageable risk that can be remediated in minutes. In contrast, round-trip times on tickets to Zendesk are measured in days.

In the config below, the proxy_ssl_server_name option enables SNI, and the $kb_uri variable uses Nginx's map module (ngx_http_map_module) to perform lookups/rewrites on URLs in the https://help.canary.tools/knowledge_base/* namespace.

In the end, the Nginx configuration necessary to address our needs was as simple as this:

map $request_uri $kb_uri {
    default "";
    /knowledge_base/topics/url1 /hc/en-gb/articles/360002426777;
    # ... one entry per legacy KB article ...
}

upstream help.canary.tools {
    # dig thinkst.zendesk.com - privately, help.canary.tools resolves to Zendesk's edge IPs
    server 104.16.51.111:443;
    server 104.16.52.111:443;
    server 104.16.53.111:443;
    server 104.16.54.111:443;
    server 104.16.55.111:443;
}

server {
    listen 443 ssl;
    server_name help.canary.tools;
    # ssl_certificate directives omitted here; certificates are managed by Certbot (see below)

    location / {
        proxy_pass https://help.canary.tools/;
        proxy_ssl_server_name on;
    }

    location /knowledge_base/ {
        if ($kb_uri != "") {
            return 301 https://help.canary.tools$kb_uri;
        }
        return 302 https://help.canary.tools;
    }
}

Where are we now?

https://help.canary.tools is now Nginx running on EC2. It's all Dockerized and Terraformed, so the configuration and deployment are reproducible in minutes.



Nginx SSL certificate renewals and refreshes are automated using Certbot (thanks to this guide). Down the line we can add ModSecurity, giving us a level of visibility into potential attacks against the KB itself - a level of visibility we could never have had before, even if CloudWatch had been a viable option with CloudFront.

Using Docker's native support for AWS CloudWatch all Nginx access logs land up in CloudWatch which  gives us dashboarding, metrics and alarming for free.


We now get alerted every time a customer attempts to access a missing URL under https://help.canary.tools/knowledge_base/*. Meanwhile, the customer doesn't get an ugly “Not found” error message; they are redirected to our Knowledge Base home page, where they can simply use the search function. This has already paid dividends in helping us rectify missing mappings.


From CloudWatch we can directly drill down into Nginx access logs to examine any anomalous behavior.



This is in stark contrast to the world where the application layer was opaque to us: bad user experiences and broken links would have gone completely unnoticed.

Control matters. This is why.