FT8/FT4 spots & backend scalability

Hi Manuel,

I’m curious about the backend algorithmic implementation and the scalability of the spot ingestion/trigger matching. I’m an avid FT8 user, and most of my triggers are on DXCCs that are dropped entirely. It would be awesome to process the full pipe of spots from RBN & PSK Reporter again someday.

I work on backend systems processing high volumes of event data for my day job, so I’m naturally curious where the bottleneck in the current architecture is. In an older thread (“Do you need any programming help?”), you gave some details about the current stack & some great ideas for scaling things out. I imagine the trigger matching is much more compute-intensive than the gathering & preprocessing, so decoupling with a queue makes a lot of sense. Did you ever give rewriting part of the code in Rust a go?

73
David KD0BTO

Hello David,

The challenge is matching the > 20 million spots per day (without FT4/FT8 DXCC filtering) against the 100k+ triggers (with even more conditions) currently defined, without requiring a big expensive cluster with lots of CPU cores. Currently HamAlert runs on 16 dedicated cores, which is enough for the filtered spots, but not for the full feed.

Obviously a naive linear search of all triggers/conditions for each spot is long out of the question, and direct queries against MongoDB, where the triggers are stored, are also inefficient due to the query complexity, even with lots of indexes.

Therefore, the HamAlert backend uses a scheme that involves translating all the trigger definitions into hash tables, one for each spot attribute (e.g. band, mode, DXCC etc.). The keys of the hash tables are the possible attribute values (like ‘10m’, ‘20m’, ‘cw’, ‘ssb’, or DXCC numbers), and the values are lists of trigger IDs (integers, in ascending order) that match this particular attribute value. So if somebody defined a trigger with a band condition for 40m, 20m and 10m, then that trigger ID would appear in the value lists of the band hash table under the keys ‘40m’, ‘20m’ and ‘10m’. If a trigger doesn’t have a particular condition, then it is included under a special ‘null’ key in that hash table, which is always appended to the list obtained by looking up the actual spot attribute value.
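To give a rough idea of the shape, here is a much-simplified sketch in Rust (the direction I have in mind for the rewrite, see below) – the Trigger struct, its fields and the single “band” attribute are made up for illustration; the real schema has many more conditions:

```rust
use std::collections::HashMap;

// Hypothetical, much-simplified trigger definition: an ID plus an optional list
// of allowed values per attribute (None = no condition on that attribute).
struct Trigger {
    id: u32,
    bands: Option<Vec<String>>,
}

// One inverted index per attribute: attribute value -> sorted list of trigger IDs.
// Triggers without a condition on this attribute go under the special "null" key.
type Index = HashMap<String, Vec<u32>>;

fn build_band_index(triggers: &[Trigger]) -> Index {
    let mut index: Index = HashMap::new();
    for t in triggers {
        match &t.bands {
            Some(bands) => {
                for band in bands {
                    index.entry(band.clone()).or_default().push(t.id);
                }
            }
            None => index.entry("null".to_string()).or_default().push(t.id),
        }
    }
    // Keep every ID list sorted so the matcher can intersect them efficiently.
    for ids in index.values_mut() {
        ids.sort_unstable();
    }
    index
}
```

The same construction is repeated for every other attribute.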

In this way, the backend only has to do one hash table lookup for each spot attribute, resulting in about 30 long lists of trigger IDs. On those, it performs set intersection, i.e. finding trigger IDs that appear in all the lists. This is made much more efficient by the fact that the trigger IDs are already sorted, so a “galloping lookahead” using binary search can be used.
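As a toy illustration of that galloping intersection on two sorted ID lists (not the actual production code, which is currently Node.js – just the general technique):

```rust
/// Intersect two sorted, deduplicated slices of trigger IDs.
/// For each element of the smaller list, gallop (exponential search) through
/// the larger list, then finish with a binary search on the bracketed window.
fn galloping_intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let mut result = Vec::with_capacity(small.len());
    let mut lo = 0usize; // search window start in `large`; only ever moves forward
    for &id in small {
        if lo >= large.len() {
            break;
        }
        // Gallop: grow the step exponentially until large[lo + step] >= id.
        let mut step = 1usize;
        while lo + step < large.len() && large[lo + step] < id {
            step *= 2;
        }
        let hi = usize::min(lo + step + 1, large.len());
        // Binary search within the bracketed window [lo, hi).
        match large[lo..hi].binary_search(&id) {
            Ok(pos) => {
                result.push(id);
                lo += pos + 1;
            }
            Err(pos) => lo += pos,
        }
    }
    result
}
```

In practice the matcher folds something like this over all ~30 lists.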

The result is a list of matching trigger IDs, which can then be fetched to send the notifications to the corresponding users.

This is a simplified description; the reality is a bit more complex due to things like ‘not’ conditions or time ranges. Rate limiting the notifications also adds processing expense. But you get the general idea. It also shows why HamAlert does not like it when users define overly general triggers (like one with only a ‘20m’ band condition), as those generate many thousands of matches per day, only to be weeded out afterwards by the rate limiting. Such mostly useless triggers are automatically disabled to reduce the load.

If you have an entirely different idea on how to do trigger matching, I’m all ears. My current plan, as already outlined in the other post, is to rewrite the matching engine in Rust with the same basic algorithm, to hopefully gain enough performance over the current Node.js implementation to warrant turning off the DXCC filtering again. The basic problem of FT8 and digimode spots in general remains though: the number of spots tends to increase with the square of the number of users, as almost every user is also a spotter.

73,

Manuel HB9DQM

Hi Manuel,

Your approach of using hash tables/inverted indexes, one per attribute, including the ‘null’ key for triggers without a condition on that attribute, makes complete sense – thank you for the detailed reply.

My best suggestion is to replace the attribute hash table values – the lists of triggers that contain a certain attribute value – with “roaring bitmaps”. See https://roaringbitmap.org and https://arxiv.org/pdf/1603.06549 for info on these data structures; they are quite popular for inverted indexes like those in the trigger attribute hash tables.

For set intersections, these can be hundreds of times faster than alternatives. Also, these compressed bitsets can represent sets of integers (IDs of triggers) in a very space-efficient way, making it easier to keep everything in-memory and take better advantage of CPU caching. For the “null” entries, where there are probably many trigger IDs in large contiguous runs, run-length encoding should cut the size significantly.

It sounds like the triggers already have integer IDs, and you mentioned they’re ascending, which makes me think they’re probably relatively dense. You’d add these unsigned 32-bit ints to the roaring bitmap for each attribute value. Then, for each report, fetch the ~30 corresponding bitmaps and compute the set intersection + union with ‘null’, almost exactly like the current approach. If the ~100k trigger IDs are relatively close to each other (dense), the set intersections happen between dense “chunks” internally, using fast bit operations on the CPU. Fig. 1(c) of the paper shows that this higher density can buy another order of magnitude or more of speedup compared to sparse integer sets, on top of everything else that makes these specialized bitsets fast.
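To make that concrete, here’s a rough sketch of the per-spot matching with roaring bitmaps in Rust. I’m using the pure-Rust roaring crate here just because it’s quick to try (the bindings I recommend below expose an equivalent Bitmap type), and the struct/field names are only my guess at the shape of your index, not the actual HamAlert schema:

```rust
use std::collections::HashMap;

use roaring::RoaringBitmap; // pure-Rust Roaring implementation (`roaring` crate)

// One bitmap per attribute value, plus the "null" bitmap for triggers that
// have no condition on this attribute. Bitmaps are filled with
// RoaringBitmap::insert(trigger_id) when the triggers are (re)loaded.
struct AttributeIndex {
    by_value: HashMap<String, RoaringBitmap>,
    null: RoaringBitmap,
}

impl AttributeIndex {
    // Candidates for one spot attribute: triggers that list this exact value,
    // union the triggers that don't constrain this attribute at all.
    fn candidates(&self, value: &str) -> RoaringBitmap {
        let mut c = self.null.clone();
        if let Some(bm) = self.by_value.get(value) {
            c |= bm;
        }
        c
    }
}

// Intersect the per-attribute candidate sets across all ~30 attributes of a spot.
// `spot` maps attribute name -> value, e.g. "band" -> "20m", "mode" -> "ft8".
fn matching_triggers(
    indexes: &HashMap<String, AttributeIndex>,
    spot: &HashMap<String, String>,
) -> Option<RoaringBitmap> {
    let mut result: Option<RoaringBitmap> = None;
    for (attr, index) in indexes {
        let value = spot.get(attr).map(String::as_str).unwrap_or("");
        let candidates = index.candidates(value);
        result = Some(match result {
            Some(mut acc) => {
                acc &= &candidates; // keep only triggers matching every attribute so far
                acc
            }
            None => candidates,
        });
    }
    result // IDs of triggers whose conditions are all satisfied by this spot
}
```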

I recommend using the croaring-rs crate (RoaringBitmap/croaring-rs on GitHub), which is a set of Rust FFI bindings over the official CRoaring library (RoaringBitmap/CRoaring: Roaring bitmaps in C and C++, with SIMD (AVX2, AVX-512 and NEON) optimizations, used by Apache Doris, ClickHouse, and StarRocks). The CRoaring implementation (https://arxiv.org/pdf/1709.07821v4) has many specialized tweaks – SIMD instructions and clever internal algorithms – to make set intersections and other operations even more blazingly fast.

It also looks like there’s a Node.js implementation, roaring-node (SalvatorePreviti/roaring-node on GitHub). I’ve never worked with Node, but as much as I like Rust, it’d probably be easier/faster to initially experiment with that package and see how far we get without a rewrite.

A few more ideas:

  • It sounded like you might already be doing exactly this, but just to make sure: an is_disabled boolean in the DB trigger schema could be set on those overly general triggers and used to skip them when building the inverted indexes. A UI change that prevents creating the most obviously bad ones – like a trigger with only a single common band condition, or only a condition on one of the 30 most common DXCCs – could be even better (e.g. I just created and then deleted a test trigger with only the “80m” band condition). If it’s easy to update the inverted indexes dynamically, maybe most (all?) of the rate limiting could be done by removing triggers and adding timers, but that’s probably more complication than it’s worth.

  • If we have the dataset of recent reports plus the set of active triggers, it wouldn’t be too hard to figure out a nearly optimal order of attribute processing that shrinks the intermediate candidate set the fastest – by choosing the attribute with the largest “information gain” at each step. This would be a fun data exploration (rough sketch after this list).

  • The high space-efficiency of bitmaps (even aside from any compression in roaring bitmaps) would let us cache the intersection/union sets for the first ~4-6 optimally-ordered attributes. This way, the few instructions required for a hash & lookup would cut out all the instructions for the most intensive set operations, for a tradeoff of ~1GB of RAM.
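As a crude stand-in for the information-gain idea in the second bullet, a first cut could simply rank attributes by the average cardinality of the candidate bitmaps they produce over a sample of recent spots, so the most selective attributes get intersected first. This reuses the hypothetical AttributeIndex from my sketch above; the real ordering could of course be smarter:

```rust
use std::collections::HashMap;

// Rank attributes so that the most selective ones (smallest average candidate
// set over a sample of recent spots) are intersected first. Reuses the
// hypothetical `AttributeIndex` and its `candidates()` from the sketch above.
fn selectivity_order(
    indexes: &HashMap<String, AttributeIndex>,
    sample_spots: &[HashMap<String, String>],
) -> Vec<String> {
    let mut avg_cardinality: Vec<(String, f64)> = indexes
        .iter()
        .map(|(attr, index)| {
            let total: u64 = sample_spots
                .iter()
                .map(|spot| {
                    let value = spot.get(attr).map(String::as_str).unwrap_or("");
                    index.candidates(value).len()
                })
                .sum();
            (attr.clone(), total as f64 / sample_spots.len().max(1) as f64)
        })
        .collect();
    // Smallest average candidate set first.
    avg_cardinality.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    avg_cardinality.into_iter().map(|(attr, _)| attr).collect()
}
```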

73,

David KD0BTO


Hello David,

Thank you very much for these excellent inputs! I went ahead and implemented Roaring Bitmaps in the matching engine, replacing the existing sorted-array-based union and intersection operations. And boy, has it made a difference! Those operations are now at least 30x faster.

I’ll let the CPU graph speak for itself:

…and this is with the FT4/FT8 DXCC filtering already turned off along with the change.

Much of the remaining base load is for the rebuilding of the hash tables from the trigger definitions, which occurs once a minute. There is more room for optimization there, once I manage to work around the limitations of Node.js IPC and worker threads. And of course, with careful tinkering, delta updates based on a change stream would also be possible (and allow for trigger changes to take effect immediately).

Yes, that (skipping disabled triggers when building the indexes) is/was already the case.

Makes sense – as a more generic solution to preventing overly general triggers, I’ve always wanted to implement some kind of prediction in the UI (“this trigger will match about xxx spots per day”), but never got around to doing it.

Your other ideas (attribute processing order optimization, caching) sound very good as well. But it looks like for the moment, after the Roaring Bitmaps implementation, the low-hanging fruit is elsewhere :wink:

73 and thanks again,

Manuel HB9DQM


If anybody’s still following this, here’s a quick update:

The new backend version with Roaring Bitmaps is working well, and HamAlert is processing close to 30 million spots per day again (and sending out > 600k notifications). Initially there was a memory leak, which caused some downtime on Friday morning. I spent a few hours debugging it and ultimately traced it to a bug in roaring-node. PR filed, problem solved.


Hi Manuel,

I’m so happy the roaring bitmaps dramatically improved the performance – 5x+ the spots with less than half the CPU is amazing. I noticed a bunch of my triggers started notifying again since the change; I’m sure quite a few other people have seen the same :slight_smile: Also, very cool that you found the memory leak bug in the Node bindings and got it fixed. Thanks for the work to test and deploy the new approach so quickly!

– 73, David, KD0BTO
