Centralisation in Bitcoin Mining: A Data-Driven Investigation

While building our pipelines to produce metrics for bitcoin miners, we made some interesting observations that we want to share in this blog post. We'll start with some background, then give an overview of our process along with some metrics on labelled addresses and network hashrate. Finally, we'll look at some of the anomalies and quirks we observed.

Bitcoin is a decentralised, nearly trustless system that lets individuals safely store and transfer value to one another without let or hindrance from any formal governance other than that encoded in the bitcoin protocol itself.

However, in 2020, bitcoin has also become a highly centralised system that places an increasing amount of trust in a small number of large entities.

A few large crypto exchanges dominate the crypto landscape, whilst mining is now dominated by a small number of entities with a power base (both figuratively and literally) in China.

At TokenAnalyst we offer multiple metrics related to mining entities, including the balance they hold, how much bitcoin they send and receive, and the percentage of the total network hashrate they control. Our metrics are calculated by identifying which addresses are controlled by mining entities (i.e. addresses for which they hold the private keys) and identifying the blocks they mined. To provide a best-in-class data offering, we have integrated multiple sources of information, carefully investigated and eliminated inconsistencies, and applied our own modelling techniques to create the most comprehensive and up-to-date dataset on miners possible.

Block tagging

Each block added to the bitcoin blockchain contains a set of transactions. The first transaction in each block is known as the coinbase transaction. It differs from the other transactions in the block because it does not spend any unspent transaction outputs (UTXOs) created by previous transactions. It does, however, create outputs: these are the miner's reward for successfully confirming the transactions in the block. The coinbase transaction can therefore create new bitcoins up to the value of the current block reward (currently 12.5 BTC). It is also the transaction in which the miner claims any fees that the senders of the other transactions were willing to pay (when space in a block is limited, miners typically fill it first with the transactions paying the highest fee rates, i.e. fee per byte of block space).

A typical bitcoin transaction. Two addresses (or public-key hashes) spend 3 UTXOs in a transaction that produces a single new unspent transaction output worth 5.0 BTC. The difference between the inputs and the output, 0.1 BTC, is available to the miner to claim as fees.
A coinbase transaction. No UTXOs are spent, but the miner can use this transaction to claim the block reward of 12.5 BTC and to collect the fees from the other transactions included in the block. The coinbase data field often contains a miner-specific tag; this transaction includes the BTC.com tag.
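The subsidy part of the block reward is fixed by the protocol: it started at 50 BTC and halves every 210,000 blocks, which is why it stood at 12.5 BTC at the time of writing. Below is a minimal Python sketch of that schedule. Like the protocol, it works in satoshis (1 BTC = 100,000,000 satoshis). The function and constant names are illustrative only:

```python
SATOSHIS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000                 # blocks between subsidy halvings
INITIAL_SUBSIDY = 50 * SATOSHIS_PER_BTC    # subsidy of the first era, in satoshis


def block_subsidy(height: int) -> int:
    """Newly created satoshis a coinbase may claim at `height` (fees excluded)."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:                     # mirrors Bitcoin Core's explicit cut-off
        return 0
    return INITIAL_SUBSIDY >> halvings     # integer halving: one right shift per era


# Block 614,884 (the example block later in this post) is in the third era.
print(block_subsidy(614_884) / SATOSHIS_PER_BTC)  # 12.5
```

Integer arithmetic matters here. Shifting right truncates fractions of a satoshi, so later subsidies round down rather than accumulating floating-point error.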

It is up to the miner to decide how to distribute this reward: it could be paid in its entirety to a single address as a single output, split into many smaller outputs to the same address, or spread across multiple addresses. The miner could claim the whole reward or only part of it, either by design or by unfortunate accident! The only restriction is that the value claimed cannot exceed the block reward plus the fees.
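The restriction above can be sketched as a simple check. The `Tx` type and the function names are hypothetical, and all values are in satoshis. The split of the example transaction's 5.1 BTC of inputs across three UTXOs is an assumption made for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tx:
    input_values: list[int]    # satoshis spent from previous UTXOs
    output_values: list[int]   # satoshis assigned to the new outputs


def fee(tx: Tx) -> int:
    """The fee left for the miner is whatever the inputs do not pay out."""
    return sum(tx.input_values) - sum(tx.output_values)


def coinbase_is_valid(coinbase_outputs: list[int], subsidy: int, txs: list[Tx]) -> bool:
    """Outputs may total at most subsidy + fees; claiming less is allowed
    (the unclaimed amount is simply never created)."""
    return sum(coinbase_outputs) <= subsidy + sum(fee(tx) for tx in txs)


BTC = 100_000_000
# Hypothetical split of the example's 5.1 BTC of inputs across three UTXOs.
example = Tx(input_values=[2 * BTC, 2 * BTC, 110_000_000], output_values=[5 * BTC])
subsidy = 1_250_000_000  # 12.5 BTC

print(fee(example))                                                       # 10000000
print(coinbase_is_valid([1_260_000_000], subsidy, [example]))             # True
print(coinbase_is_valid([630_000_000, 630_000_000], subsidy, [example]))  # True
print(coinbase_is_valid([1_270_000_000], subsidy, [example]))             # False
```

The check only looks at the total of the coinbase outputs. Whether the reward goes to one address or is split across many makes no difference to its validity.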

Instead of a normal input script, the coinbase transaction's single input carries a data field that may contain up to 100 bytes of arbitrary data. Given the sheer number of hashes required to successfully mine a block, this field is an important source of entropy (extra nonce space). Another common use of the field nowadays is to embed a human-readable string (or tag) that acts as a kind of identity signature and appears in hexadecimal form when the raw data is displayed. Many miners insert a tag to show they mined a block; for example, the hex data in the coinbase field of block 614,884 is:

03e46109049b03305e455530322f4254432e434f4d2ffabe6d6d6b30cb489fb6061752ac81e
93d4d7943d6cfd5fa63ca8afabad6c97c9c76015c08000000ceeed33d27a5939ca51d030000
000000

Decoded as UTF-8, this reads:

# note the 'BTC.COM' embedded into the translation
�a �0^EU02/BTC.COM/��mmk0�H��R���=MyC����cʊ����|�v\\���='����

The /BTC.COM/ tag indicates this block was probably mined by BTC.com. There is no requirement to do this, and indeed since any miner is free to insert any data up to 100 bytes in length in a block they mined, they could easily tag the block with the name of another mining pool if they wished. It is therefore important not to rely solely on the tag as a source of truth. A mining pool may want to try to obfuscate the amount of network hash power they control by tagging blocks with different names and thus appearing as multiple entities when they are really one and the same entity.
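The decoding step can be sketched in a few lines of Python. The slash-delimited tag pattern is an assumption that happens to fit the BTC.com tag; real tags follow no fixed format. The height parsing relies on BIP34, which requires the coinbase data to begin by pushing the block height as a length-prefixed, little-endian number:

```python
from __future__ import annotations

import re

# Coinbase data of block 614,884 as quoted above (split only for line length).
COINBASE_HEX = (
    "03e46109049b03305e455530322f4254432e434f4d2ffabe6d6d6b30cb489fb6061752ac81e"
    "93d4d7943d6cfd5fa63ca8afabad6c97c9c76015c08000000ceeed33d27a5939ca51d030000"
    "000000"
)

# Assumed tag format: printable ASCII (excluding '/') wrapped in slashes.
TAG_RE = re.compile(rb"/[\x20-\x2e\x30-\x7e]+/")


def decode_coinbase(hex_data: str) -> str:
    """Render raw coinbase bytes as text; undecodable bytes become U+FFFD."""
    return bytes.fromhex(hex_data).decode("utf-8", errors="replace")


def extract_tags(hex_data: str) -> list[str]:
    """Return every slash-delimited ASCII tag found in the coinbase data."""
    raw = bytes.fromhex(hex_data)
    return [m.group().decode("ascii") for m in TAG_RE.finditer(raw)]


def coinbase_height(hex_data: str) -> int:
    """Read the BIP34 height: a push-length byte, then a little-endian integer."""
    raw = bytes.fromhex(hex_data)
    length = raw[0]
    return int.from_bytes(raw[1:1 + length], "little")


print(extract_tags(COINBASE_HEX))     # ['/BTC.COM/']
print(coinbase_height(COINBASE_HEX))  # 614884
```

The leading bytes `03 e4 61 09` decode to 614,884, which matches the block the data came from. This gives a cheap sanity check that the coinbase data was copied correctly. As noted above, the extracted tag is a hint, not proof of who mined the block.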

Hexadecimal (left) coinbase strings and the decoded UTF-8 equivalent (right) showing miner tags in red.

Address labelling

We already have a pipeline for labelling exchange addresses, which also includes conducting transactions with exchanges to track on-chain movements of funds. When we label an address as belonging to an entity we mean:

that entity alone controls the private key for that address

When you send bitcoin to an exchange, for instance, you usually have a unique deposit address; however, once you deposit that bitcoin, you no longer have custody of it. The exchange is free to distribute and store your funds anywhere within its wallet infrastructure.

Mining pools may have many participants who contribute their hashpower to the pool. In return, the pool pays them a proportion of the bitcoin it receives from successfully mined blocks. The participants' reward addresses are not controlled by the mining pool: participants are free to switch their hash capacity to another pool at any time, and can use the same reward address to receive their share of the mining rewards from multiple pools. A mining pool with potentially thousands of users can therefore easily operate with relatively few addresses of its own if necessary.

This is, of course, a simplification: different mining pools use a variety of models to pay rewards to their participants. One assumption our pipelines do make (following Romiti et al.) is that when a single address is used to claim the block reward, that address must belong to the mining pool; otherwise the pool would be giving up custody of its revenue stream. For those blocks we can use the coinbase tag to extract the probable miner name and then give the address that label. A downstream process then back-checks which other addresses the labelled address has ever been spent together with. Using the common-input ownership heuristic, if the address clusters exclusively with labelled addresses of the same entity, we have more certainty that it is controlled by that entity.
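The single-payout-address rule can be sketched as follows. The tag-to-miner mapping, the input format and the placeholder addresses are hypothetical. Because tags can be spoofed, the sketch also drops any address that picks up conflicting labels:

```python
from collections import defaultdict

# Hypothetical mapping from coinbase tag to miner label.
TAG_TO_MINER = {"/BTC.COM/": "btc-com", "/BTC.TOP/": "btc-top"}


def label_single_payout_addresses(blocks):
    """blocks: iterable of (coinbase tag, payout addresses) per block.
    Label an address only when a known-tag coinbase pays exactly one distinct address."""
    labels = defaultdict(set)
    for tag, payout_addresses in blocks:
        distinct = set(payout_addresses)
        miner = TAG_TO_MINER.get(tag)
        if miner is not None and len(distinct) == 1:
            labels[next(iter(distinct))].add(miner)
    # An address that collected more than one miner label is ambiguous: leave it unlabelled.
    return {addr: next(iter(miners)) for addr, miners in labels.items() if len(miners) == 1}


blocks = [
    ("/BTC.COM/", ["addr_A"]),            # single payout address: labelled
    ("/BTC.COM/", ["addr_A", "addr_A"]),  # two outputs, same address: still a single payee
    ("/BTC.TOP/", ["addr_B", "addr_C"]),  # several payees: skipped
    ("/unknown/", ["addr_D"]),            # unrecognised tag: skipped
]
print(label_single_payout_addresses(blocks))  # {'addr_A': 'btc-com'}
```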

In the two transactions above, assuming the common-input ownership heuristic holds, public-key hashes (pkh) A, B, C and E must all be controlled by the same entity.
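Clustering under the common-input ownership heuristic is commonly implemented with a union-find (disjoint-set) structure, in which each transaction merges its input addresses into a single cluster. Below is a minimal sketch. The placeholder names stand in for the pkhs in the figure, and the exact transactions are assumed:

```python
class UnionFind:
    """Disjoint-set forest over arbitrary hashable keys (here: addresses)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(tx_inputs):
    """tx_inputs: one list of input addresses per transaction.
    Under the heuristic, all inputs of a transaction share an owner."""
    uf = UnionFind()
    for inputs in tx_inputs:
        for addr in inputs:
            uf.find(addr)                # register single-input addresses too
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in uf.parent:
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())


def cluster_is_consistent(cluster, labels):
    """True if every labelled address in the cluster carries the same entity label."""
    return len({labels[a] for a in cluster if a in labels}) <= 1


print(cluster_by_common_inputs([["A", "B"], ["B", "C", "E"], ["D"]]))
```

This yields two clusters, {A, B, C, E} and {D}. The set ordering in the printed output may vary. A labelled address whose cluster fails `cluster_is_consistent` would warrant a closer look.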

One of the problems we faced when doing this kind of checking is that at least one major mining pool has, at some point in its history, participated in a so-called CoinJoin transaction (in which several independent users combine their inputs into a single transaction), which breaks the common-input ownership assumption. One way around this is to identify probable CoinJoin transactions and exclude them when assessing whether private keys are related.
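The choice of CoinJoin detector is a design decision. The sketch below uses a deliberately simple, hypothetical rule: a transaction is flagged if it has at least `min_equal` outputs of identical value, funded by at least as many distinct input addresses. Flagged transactions are dropped before clustering. Real detectors are more sophisticated, and this is not necessarily the rule applied in the pipeline described here:

```python
from collections import Counter


def is_probable_coinjoin(input_addresses, output_values, min_equal=3):
    """Hypothetical heuristic: many equal-valued outputs, funded by at least
    as many distinct input addresses, suggest a CoinJoin."""
    if not output_values:
        return False
    most_common_count = Counter(output_values).most_common(1)[0][1]
    return (most_common_count >= min_equal
            and len(set(input_addresses)) >= most_common_count)


def filter_coinjoins(transactions, min_equal=3):
    """transactions: (input addresses, output values in satoshis) pairs.
    Return the input lists of transactions NOT flagged as probable CoinJoins."""
    return [inputs for inputs, outputs in transactions
            if not is_probable_coinjoin(inputs, outputs, min_equal)]


txs = [
    (["A", "B"], [500_000_000]),                                          # ordinary spend: kept
    (["P", "Q", "R", "S"], [10_000_000, 10_000_000, 10_000_000, 3_000]),  # 3 equal outputs: dropped
]
print(filter_coinjoins(txs))  # [['A', 'B']]
```

The surviving input lists can then be fed into common-input clustering. Excluding a genuine single-owner transaction only loses some linkage, whereas keeping a CoinJoin can merge unrelated entities, so erring on the side of exclusion is the safer failure mode.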

Metrics

At the time of writing, there are almost 276k distinct addresses that have ever received coinbase rewards. After applying our labelling algorithms and meticulously QA'ing our metrics for every miner, we were able to label just over 148k miner addresses across the whole history of the bitcoin blockchain. Of these 148k labelled addresses, only 6.3k have ever received a coinbase reward. Miners use the others for other purposes, such as internal routing and change wallets that facilitate payments to mining pool participants.

Of the 276k addresses that have ever received coinbase rewards, 270k are unlabelled, and of these, 225k (81.5% of the total) were only ever used once. Of the 6.3k labelled addresses that received block rewards, just 517 received block rewards on more than one occasion (0.2% of the total). However, these 517 addresses have received a vast proportion of the block rewards in recent years.
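Reuse statistics like these can be computed from a list of payout addresses containing one entry per reward received. Below is a minimal sketch with toy addresses, followed by a check of the rounded percentages quoted above. The input counts are approximate:

```python
from collections import Counter


def reuse_stats(payout_addresses):
    """payout_addresses: one entry per block reward received (addresses repeat when reused)."""
    counts = Counter(payout_addresses)
    distinct = len(counts)
    used_once = sum(1 for n in counts.values() if n == 1)
    return {
        "distinct": distinct,
        "used_once": used_once,
        "reused": distinct - used_once,
        "used_once_share": used_once / distinct if distinct else 0.0,
    }


print(reuse_stats(["a", "b", "b", "c", "c", "c"]))
# {'distinct': 3, 'used_once': 1, 'reused': 2, 'used_once_share': 0.3333333333333333}

# Percentages quoted above, relative to all ~276k coinbase-receiving addresses.
print(round(225_000 / 276_000 * 100, 1))  # 81.5
print(round(517 / 276_000 * 100, 1))      # 0.2
```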

The top 10 most re-used addresses for receiving block rewards all belonged to known entities (some of which are now defunct) and were used for more than 10,000 separate block rewards each, shown in the table below:

Rank | Times used | Address | Miner
1 | 51,362 | 1KFHE7w8BhaENAswwryaoccDb6qcT6DbYY | f2pool
2 | 26,204 | 14cZMQk89mRYQkDEj8Rn25AnGoBi5H6uer | btcguild
3 | 23,083 | 1CjPR7Z5ZSyWk6WtXvSFgkptmpoi4UM9BC | ghash-io
4 | 19,733 | 1CK6KHY6MHgYvmRQ4PAafKYDrg1ejbH1cE | slushpool
5 | 16,858 | 152f1muMCNa7goXYhYAQC61hxEgGacmncB | btcc
6 | 16,387 | 18cBEMRxXHqzWWCxZNtU91F5sbUNKhL5PX | viabtc
7 | 13,749 | 1Hz96kJKF2HLPGY15JWLB5m9qGNxvt8tHJ | btc-top
8 | 13,391 | bc1qjl8uwezzlech723lpnyuza0h2cdkvxvh54v3dn | btc-com
9 | 13,296 | 1Nh7uHdvY6fNwtQtM1G5EZAFPLC33B59rB | antpool
10 | 10,339 | 18d3HV2bm94UyY4a9DrPfoZ17sXuiDQq2B | eligius

This table is not a proxy for the number of blocks mined, except in the case of F2Pool, where 1KFHE7w8BhaENAswwryaoccDb6qcT6DbYY appears to be the only address F2Pool uses to receive block rewards. Other miners use multiple addresses. We are now tracking over 100 different mining entities, and obtained the most addresses for BTC China (BTCC), HaoBTC (Bixin), Poolin and Slushpool.

Today, 95% of blocks are mined by pools that TokenAnalyst has labelled.
The proportion of hashrate attributed to different mining pools. The miners who contributed the largest proportion of the hashrate over the last 12 months are labelled.
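Hashrate cannot be observed directly, so a miner's share is commonly estimated from its share of blocks over a window. For example, 144 blocks is roughly one day at the 10-minute target. The sketch below assumes that block-share estimate, which is not necessarily how the hashrate metric here is computed. The miner labels are toy data:

```python
from collections import Counter


def hashrate_shares(block_miners, window=None):
    """Estimate each miner's share of network hashrate as its share of blocks.
    block_miners: miner label per block, oldest first (None if unlabelled).
    window: if given, only the most recent `window` blocks are counted."""
    recent = block_miners[-window:] if window else block_miners
    counts = Counter(m if m is not None else "unlabelled" for m in recent)
    total = len(recent)
    return {miner: n / total for miner, n in counts.items()}


blocks = ["f2pool", "antpool", None, "f2pool"]
shares = hashrate_shares(blocks)
print(shares)                              # {'f2pool': 0.5, 'antpool': 0.25, 'unlabelled': 0.25}
print(1 - shares.get("unlabelled", 0.0))   # 0.75, i.e. the share of blocks from labelled pools
print(hashrate_shares(blocks, window=2))   # {'unlabelled': 0.5, 'f2pool': 0.5}
```

Block counts are noisy over short windows because block discovery is random. A longer window trades responsiveness for stability.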

What's a mining pool anyway?

At TokenAnalyst, we perform numerous QA checks on our data. One thing that stood out was that the list of blocks claimed by pool.btc.com did not exactly match the set of blocks containing the /BTC.COM/ tag in the coinbase data. The most recent block (as of 27th January 2020) to contain a /BTC.COM/ tag without being claimed by the BTC.com pool was 614,297. A couple of possibilities sprang to mind:

  • Someone is putting the BTC.com tag into the block to obfuscate their hash power
  • An unknown mining pool is using BTC.com's backend software, which adds that tag by default, without being aware of it
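The QA comparison amounts to a set difference between tagged blocks and the pool's claimed list, plus one extra check on where the reward went. Below is a sketch with hypothetical heights and placeholder addresses:

```python
def classify_unclaimed(blocks, claimed_heights, tag, pool_addresses):
    """blocks: dict of height -> (decoded coinbase text, payout addresses).
    Returns (paid_to_pool, paid_elsewhere): heights that carry `tag` but are
    missing from the pool's claimed list, split by where the reward went."""
    claimed = set(claimed_heights)
    paid_to_pool, paid_elsewhere = [], []
    for height in sorted(blocks):
        text, payouts = blocks[height]
        if tag not in text or height in claimed:
            continue
        if set(payouts) & set(pool_addresses):
            paid_to_pool.append(height)
        else:
            paid_elsewhere.append(height)
    return paid_to_pool, paid_elsewhere


blocks = {
    100: ("/BTC.COM/", ["pool_addr"]),         # tagged and claimed by the pool
    101: ("/BTC.COM/BitDeer", ["pool_addr"]),  # tagged, unclaimed, paid to the pool
    102: ("/BTC.COM/", ["other_addr"]),        # tagged, unclaimed, paid elsewhere
    103: ("/other/", ["other_addr"]),          # different tag: ignored
}
print(classify_unclaimed(blocks, [100], "/BTC.COM/", {"pool_addr"}))  # ([101], [102])
```

A block carrying a spoofed tag would typically land in the second list. A block whose reward goes to the pool's own address lands in the first.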

Upon further investigation, however, we found that the same address (bc1qjl8uwezzlech723lpnyuza0h2cdkvxvh54v3dn), which BTC.com has used to claim block rewards more than 13k times, also received the block reward for this block. This could have been an error on the pool's block listing page, but to date a total of 658 blocks carry the /BTC.COM/ tag without being claimed on pool.btc.com.

Neither of the above scenarios seemed to fit in this case. Another possibility arose:

someone was mining on behalf of BTC.com using hardware that was not directly controlled by the pool.

In December 2018 the computing power-sharing platform BitDeer launched, announcing partnerships with AntPool and BTC.com. It allows users to rent mining power on the BitDeer platform without buying or setting up crypto mining hardware themselves. Inspecting the coinbase data for that particular block revealed the tag /BTC.COM/BitDeer. Since pool.btc.com does not list this as a mined block, BitDeer must run the hardware externally to the BTC.com pool infrastructure and assign the block reward to the BTC.com reward address. BTC.com must then initiate payments to the miners using the BitDeer platform. The chain of trust extends from the consumer to the BitDeer computing power-sharing platform to the mining pool operators and back again. Furthermore, BitDeer announced a partnership with chip manufacturer Bitmain (which owns AntPool); by agreeing to collaborate, Bitmain gains a powerful way to ensure that its AntMiner line of products is used by a cabal of major mining pools at a time when it appeared to be struggling. The level of co-operation and trust required between the mining pools, Bitmain and BitDeer to operate this mining model blurs the line between distinct entities. At present, BitDeer's partnership model looks like this:

Links from chip manufacturer to mining pool entities via computing power-sharing platform BitDeer
On 27th January 2020 these 5 mining entities controlled 49.9% of the hashrate of the bitcoin network.
The mining pools available via the BitDeer platform collectively control around 50% of the bitcoin network hashrate.

Collaborating and co-operating would allow the participating mining pools to hedge against risk: the risk of a single mining facility going offline, the risk of customers switching between pools, and the risk of energy price fluctuations in particular regions. Once a consumer has paid BitDeer for a plan, they can assign that hashrate to any partner mining pool. Does the choice of pool really make a difference any more? What is to stop the entities merging into a single large entity that maintains a degree of separation only through its address structure?

We spotted another anomaly when we QA'ed our data for BTC.top and 1THash&58Coin. Addresses used by these two miners to receive coinbase payouts were subsequently spent together, which means either that the common-input ownership heuristic breaks down for them or that they are in fact controlled by the same underlying entity. One of these addresses has received coinbase outputs since block 474,000 and has been the sole payout address throughout a period in which the miner tag changed from /BTC.TOP/ to /canoepool/ and then to the current /1THash&58COIN/:

Blocks | Payout address | Tag
474,000 - 497,286 | 147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq | /BTC.TOP/
500,540 - 539,426 | 147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq | /canoepool/
591,615 - present | 147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq | /1THash&58COIN/
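An address whose tag changes over time, as in the table above, can be spotted by grouping blocks by payout address and collapsing consecutive identical tags into runs. The input format here is assumed. The heights and address are the boundary blocks from the table:

```python
def tag_history(blocks):
    """blocks: iterable of (height, payout_address, tag), in any order.
    Returns {address: [(first_height, last_height, tag), ...]} in height order."""
    history = {}
    for height, address, tag in sorted(blocks):
        runs = history.setdefault(address, [])
        if runs and runs[-1][2] == tag:
            runs[-1] = (runs[-1][0], height, tag)   # extend the current run
        else:
            runs.append((height, height, tag))      # tag changed: start a new run
    return history


def addresses_with_changing_tags(blocks):
    """Keep only addresses that were paid under more than one distinct tag."""
    return {addr: runs for addr, runs in tag_history(blocks).items()
            if len({tag for _, _, tag in runs}) > 1}


ADDR = "147SwRQdpCfj5p8PnfsXV2SsVVpVcz3aPq"
blocks = [
    (474000, ADDR, "/BTC.TOP/"), (497286, ADDR, "/BTC.TOP/"),
    (500540, ADDR, "/canoepool/"), (539426, ADDR, "/canoepool/"),
    (591615, ADDR, "/1THash&58COIN/"),
]
for run in addresses_with_changing_tags(blocks)[ADDR]:
    print(run)
```

Runs reflect only the blocks supplied. Blocks in the gaps between runs may have paid other addresses, so the runs mark where each tag was seen on this address rather than proving continuous use.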

Conclusions

Mining has changed dramatically over bitcoin's first decade. Early enthusiasts running GPU rigs were joined by a larger consumer base who moved on to FPGA and ASIC miners once they were released, eventually forming mining pools as mining a block became ever more difficult. Today's large-scale cloud mining operations have lowered the barrier to entry further by allowing individuals to purchase hashrate plans at a level and for a duration they can afford. There's no need to spend your own capital on hardware, no complicated set-up and admin, and no maintenance required. Computing power-sharing platforms like BitDeer have reduced the barrier to entry for the average consumer who may just want to get mining, and this is a good thing. However, any centralisation of bitcoin network hash power should be of concern, as it erodes the trustless model of the network. Consumers looking to enter the mining market (and those already participating) should always be aware of how much real competition there is between mining entities, and should help ensure that no single entity controls too great a proportion of the network's hashrate. This is difficult if you no longer know what constitutes an entity. So what steps can you take to help ensure the security of the bitcoin network?

The original model of bitcoin mining was to have a large number of independent miners, which would make it all but impossible to attack the network by reversing bitcoin transactions. However, because of the competition for highly valuable block rewards, the only way to earn a steady, predictable share of mined BTC is to join a mining pool. So do your research before you connect your hardware to a pool or pay for a hashrate plan on a particular pool. Joining a more independent mining pool, or a pool with a smaller proportion of the hashrate, will help ensure the integrity of the network. If all mining pools are forced to co-operate (as, under normal conditions, they are), your returns for a given contributed hashrate are the same whichever pool you join. Another option is to run a Casa self-hosted node.

Written by Simon O'Hanlon, Bitcoin Research Lead @TokenAnalyst