What Is a Mining Pool
As you may know, 2Miners is quite an intensive operation: we run almost 40 pools, most of which are specialized software running on dedicated physical servers in three locations around the world.
To clarify, each pool is a combination of four essential components:
- A cryptocurrency node that hosts the blockchain, submits new blocks found by the pool miners to the network and processes the payouts to the miners.
- The specialized custom-written pool software that accepts and verifies miner solutions, distributes new computing work and analyzes miner statistics.
- A high-performance database that accumulates all submitted shares, stores the miner-related statistics and every other relevant data.
- A web frontend that displays the miner data, graphs, payout history and so on.
As you may guess, any of these four components can run into problems, and if any one of them fails, the operation of the pool is affected. For background, see our post How the Mining Pool Works: PPLNS vs. SOLO.
Most Common Mining Pool Problems
The way cryptocurrencies work is by no means error-proof, and all kinds of bad situations may occur. Here are the most common ones, which we face on a more or less regular basis:
- There is a network split that makes the cryptocurrency node mine on the wrong chain, resulting in rejected blocks and a loss of profit for our miners;
- A bug in the cryptocurrency node affects or interrupts the mining process;
- There are issues with the network connection between our servers in different data centers around the world (we currently host our servers in Europe, Asia, and America);
- Malicious attackers run a DDoS attack against our servers to disrupt operations;
- Disk space or RAM on one of our servers is running low, or a hardware component has failed, so there is a risk of losing data;
- A ghost of Vitalik Buterin is trying to steal some blocks from our ETH pool (okay, just kidding about this one).
We have designated people who react to threats and incidents 24/7/365. But how do those specialists know there’s a problem that needs to be solved?
Enter Telegram. We use it extensively for our work-related chats, including problem monitoring. Custom-developed bots watch every aspect of day-to-day pool operation and report any problems and irregularities to us. Not every warning means a real problem, so the DevOps team carefully checks each message and reacts to the ones that need attention.
What do we actually monitor? Let me describe in more detail which aspects of the pools are watched.
Zabbix is a powerful open-source monitoring system that can be extended with custom plugins to watch specific aspects of each machine we operate. Zabbix watches disk space, CPU and RAM usage, current traffic and network load, and the health status of our web frontends and microservices. We also have a set of custom plugins that watch the availability of mining ports, mining luck, and payment issues.
Each incident is forwarded to a special private Telegram chat and then analyzed and acted upon if needed.
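To make the alert path more concrete, here is a minimal sketch of how an incident could be pushed into a Telegram chat through the public Bot API `sendMessage` method. This is not our production code: the function names, the message layout, and the token and chat ID values are assumptions made purely for illustration.

```python
import json
import urllib.request

TELEGRAM_API = "https://api.telegram.org"


def build_alert_request(token: str, chat_id: str, host: str, problem: str) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request for one incident.

    The "[host] problem" message layout is an assumption for this sketch.
    """
    payload = {"chat_id": chat_id, "text": f"[{host}] {problem}"}
    return urllib.request.Request(
        url=f"{TELEGRAM_API}/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(token: str, chat_id: str, host: str, problem: str, timeout: float = 10.0) -> bool:
    """Send the alert and return True if Telegram reports success."""
    request = build_alert_request(token, chat_id, host, problem)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("ok", False))
```

Building the request separately from sending it keeps the formatting logic easy to check without touching the network. A call such as `send_alert(token, chat_id, "eth-node-1", "disk space low")` would then post the message to the chat, with the token and chat ID supplied by whoever runs the bot.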
It is worth noting that we try to keep the CPU load of our servers below 20% to allow for unexpected short spikes in usage and to provide a good mining experience for all of our users. If there is a steady increase in the CPU load that is not going away, it is a good reason to enable backup hardware infrastructure to mitigate the load.
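The difference between a short spike and a steady increase can be sketched as a sliding-window check. The snippet below is only an illustration: the 20% threshold comes from the text, while the class name, the window size, and the sampling scheme are assumptions.

```python
from collections import deque
from typing import Deque


class SustainedLoadDetector:
    """Flags CPU load that stays above a threshold for a whole window of samples."""

    def __init__(self, threshold: float = 20.0, window: int = 5) -> None:
        self.threshold = threshold
        # Only the most recent `window` samples are kept.
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, cpu_percent: float) -> bool:
        """Record a sample; return True only if the window is full and
        every sample in it exceeds the threshold."""
        self.samples.append(cpu_percent)
        return (
            len(self.samples) == self.samples.maxlen
            and all(sample > self.threshold for sample in self.samples)
        )
```

With `window=3`, a single reading of 85% surrounded by low values never triggers the detector, while three consecutive readings of 25%, 30%, and 27% do. That is the kind of signal that would justify bringing backup hardware online.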
Orphan Blocks Monitoring
Some Ethereum-based cryptocurrencies have a mechanism called “orphans” for blocks that were mined with a correct solution but lost out because the network picked a block from another miner or pool that was mined a few seconds earlier. These orphans are usually still rewarded, but in smaller amounts. You can find more information in our post Orphan, Stale & Uncle Blocks in Bitcoin and Ethereum. In the Bitcoin world, the term “orphan” usually refers to blocks that have a correct solution but were not accepted onto the main blockchain for various reasons (mined later than somebody else’s block, or mined onto an orphaned chain); these are not rewarded.
Either way, the appearance of an orphan is a reason to do a quick check: are the nodes healthy, or are they in the early stages of a split? Is network connectivity stable enough for solutions to be delivered to the rest of the blockchain network as fast as possible?
Blockchain Split Monitoring
So-called “network splits” happen when some network nodes disconnect from the others and start producing a separate chain of blocks. Once they reconnect, the network chooses one of the chains, making the other chain invalid (or orphaned). This is bad for our miners, as blocks mined on the discarded chain mean a loss of profit. This is why we try to detect these situations as early as possible and eliminate them. A chain split can happen accidentally, or it can be the result of a 51% attack.
A special monitoring bot takes various sources of information (our own nodes, some well-known explorers on the Web, data from other pools, etc.) and estimates whether our cryptocurrency nodes are on the correct chain. If not, an alert is emitted so our engineers can verify it. It is worth mentioning that an official coin explorer can itself often be in a split, so we trust the pools’ data more (after all, pools are the source of new blocks, so they often have more accurate information).
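One way to picture this kind of check is a weighted vote over the block hash that each source reports for the same height. The sketch below is an assumption-laden illustration, not the logic of our bot: the `Observation` type, the source kinds, and the weights (pools counting twice as much as explorers) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Observation(NamedTuple):
    source: str      # hypothetical source name, e.g. "pool-x"
    kind: str        # "pool" or "explorer"
    block_hash: str  # hash the source reports at the height being compared


# Assumed weights: pool data is trusted more than explorer data.
WEIGHTS: Dict[str, float] = {"pool": 2.0, "explorer": 1.0}


def reference_hash(observations: List[Observation]) -> Optional[str]:
    """Return the block hash with the highest total weight, or None if there is no data.

    Ties resolve to the hash that was seen first.
    """
    votes: Dict[str, float] = defaultdict(float)
    for obs in observations:
        votes[obs.block_hash] += WEIGHTS.get(obs.kind, 1.0)
    if not votes:
        return None
    return max(votes, key=votes.get)


def node_in_split(our_hash: str, observations: List[Observation]) -> bool:
    """True if our node's hash at this height disagrees with the weighted majority."""
    reference = reference_hash(observations)
    return reference is not None and our_hash != reference
```

A result of `True` here would only be a hint that triggers an alert. As described above, an engineer still has to verify whether the node is really on the wrong chain.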
Other Emergency Mining Pool Monitoring
The error conditions above are far from an exhaustive list of what can go wrong with a cryptocurrency mining pool. There are many cases when, due to programming bugs, the cryptocurrency nodes report an error and stop working, requiring a restart. All of these cases are handled by a separate bot, so whenever something that’s not in the predefined list goes wrong, this bot picks it up and reports it to the DevOps team. Some issues are as easy as “restart the node”, while others require a thoughtful investigation, sometimes resulting in a bug report to the cryptocurrency maintainers or even a pull request to fix the bug.
Last but not least, while all of the things mentioned above are private and used internally, we do have a Telegram bot that’s accessible to any miner: @Pool2MinersBot. It is quite a powerful bot that latches onto the same backend as our internal tooling. Not only can it tell you whether one or more of your workers has gone online or offline, but it also notifies you when your worker finds a block and serves as a useful source of statistics on your mining operations.
I believe a professional and well-thought-out monitoring process is key to a successful mining pool business. Almost anyone with a modest programming background could assemble a simple pool using open-source software. Making it handle a heavy load efficiently while watching for all the issues a pool may have takes a team of highly skilled professionals, just like anything else. We at 2Miners care about our users, as our own income depends on how well we do so.
That’s it for now. Peace!