--

## “Loaded” PoW: A New Direction in Proof-of-Work Algorithms

While there are many who criticize the energy consumption of crypto-currency mining operations, it remains the only time tested and *proven* method of securing a blockchain against attackers. Other proposals, such as proof-of-stake or proof-of-importance, all suffer from the same flaw: an attacker — a hostile government, for example — could strike early and quietly, buying up a sufficient quantity of the coin at a low enough price such that it could *cost-effectively *undermine the security of the system at some later date — say, by implementing some form of a Sybil or “sock-puppet” attack, where the attacker mimics the appearance of a large group of unrelated users.

But Bitcoin and other crypto-currencies such as Z-Cash and Dash use a proof-of-work (“PoW”) method that does not suffer from this flaw: an attacker must expend real-world resources in the form of electricity and capital costs, and no one can make electricity for free. There is nothing that the attacker can do beforehand to place themselves in a particularly advantageous situation to attack the network. They must rely only on the brute strength of having more computing power than every other miner in the network put together. For a network like Bitcoin, the cost of waging such an attack would be in the tens of millions of dollars per day for the power alone.

The problem with PoW is that all of the primary hash algorithms used for major crypto-currencies have either already been (or will soon be) mastered by the monopolistic Chinese company, Bitmain. Since the primary purpose of crypto-currencies is censorship resistance, this is worrisome to say the least — especially since it’s obvious that Bitmain is *not* resistant to the influences of the Chinese Goverment, which has the exact opposite goals when it comes to fostering the free and open development of uncontrollable digital money.

The speed with which Bitmain has conquered the crypto mining industry is somewhat shocking, and has led to a difficult situation for thousands of smaller scale GPU mining operations. To give you a sense of the speed/efficiency advantage of a modern ASIC design versus a bunch of computers filled with the latest AMD/Nvidia GPUs, the latest release from Bitmain for Z-Cash can do the same amount of compute for under $2,000, using 300 watts of power, as the GPUs could do using at least 7,000 watts of power and costing over $25,000. As GPU miners have gotten more desperate — especially in the current bear market for crypto -currency— many have turned to renting out their mining rigs on sites such as miningrigrentals.com and nicehash.com.

On these sites, which are relatively unknown to many in the crypto-currency world, buyers place their bids in bitcoin to gain control of large amounts of computing power for various popular hashing algorithms (e.g., Equihash, X11, SHA-256, etc.) on an hourly basis. The combination of desperate miners with a glut of now-unprofitable GPUs, together with weak crypto prices, has led to the recent surge in double-spending attacks, which have collectively resulted in millions of dollars worth of losses for exchange companies. Such attacks are very negative events for the crypto projects that are targeted; in the worst case, they can lead to exchange delistings and the loss of market confidence in the security of the currency.

There is widespread disagreement about the best way to move forward from here. The Monero community recently decided to completely change the PoW system used once it became clear that Bitmain had developed an ASIC. Now that Z-Cash’s hash algorithm also has a new Bitmain ASIC available, that community is deeply divided over whether they should fight like Monero, or accept the inevitability of ASICs and hope for a more competitive market to develop, with numerous ASIC manufacturers keeping the system dynamic and decentralized, and with more hashes per second securing the network at a lower overall cost and carbon footprint. This optimistic path was the direction taken by the SiaCoin project, as detailed in a recent blog post by their lead developer. In the case of SiaCoin, they actually developed their own ASIC. But in a surprise blow, they were beaten to market by Bitmain, leading to unhappy equipment buyers who will end up earning a fraction of what they once projected.

Others have attempted a purely technological solution that would theoretically allow for the continued use of PoW while limiting the encroachment from ASIC based miners. This was the initial design goal of the PoW algorithm used for Ethereum, which was designed to be “memory hard” and thus less profitable to execute as an ASIC. The basic idea here is that computer memory (DRAM) is expensive, so if you require a large amount of memory to perform the computations — which Ether accomplishes by utilizing a large graph data structure called the DAG, which is multiple gigabytes in size — you can make the memory costs the dominant part of the overall cost structure. This a reasonable idea, and it seemed to work for a while, which meant that the only way to mine Ether or Z-Cash was to use a bunch of GPUs. While the GPU market is nearly as concentrated as the ASIC market by AMD and Nvidia, the situation there seems infinitely preferable to the Bitmain monoculture found in crypto-mining ASICs.

Unfortunately, the “memory hard” approach suffers from an obvious flaw: if you could find a way to store all that data in memory once, using a lot of expensive DRAM, but then share this data across a large group of inexpensive processors, you would effectively share the cost across a large number of processors and thus undermine the supposed difficulty of the problem. And this is exactly what has happened recently. Indeed, it is unclear exactly when it happened, since it is alleged that Bitmain mines with their new products for months before reselling them to the public, but they are now selling these units to the public, and “memory hardness” has turned out to not be particularly hard.

As I work on the development of Animecoin, my views on this dilemma and the best strategy to deal with it have evolved. Initially, I planned on standardizing on the same hashing algorithm, Equihash, used by Z-Cash and many other coins such as Bitcoin Private, Z-Classic, ZenCash, etc. But the recent and highly publicized double-spend attacks — which previously had been dismissed as a serious risk by many market participants — made it obvious to me that choosing a popular hashing algorithm, especially one that is available for rent in large quantities by anyone on the internet with a few bitcoins to invest, is basically an open invitation for attackers to go after your project.

A couple of weeks ago, I set out to develop a new PoW system for the project. My initial plan was that it should be similar to existing systems, so that it would be easy to understand and vouch for the security of the system. But it had to be different enough so that it wouldn’t be readily available for some time on the mining rig rental websites. The idea was to protect the project during its nascent stages, when it is particularly vulnerable to such an attack, since there aren’t many computers mining a given coin in the beginning, making it easy for an attacker to overpower them using rented computing power.

But as I started to write the prototype code in the Python language, one idea led to another, and I ended up writing a piece of software that I believe has the potential to change the whole discussion about ASICs and PoW. And perhaps it could return us — at least for a while — to the early days of Bitcoin, when anyone could mine some coins with an ordinary desktop computer: Satoshi’s dream.

I began by considering what gave ASICs such a leg up when it came to price/performance versus GPUs and CPUs. The answer, of course, is that it was a trade-off — like most things in engineering. The more flexible you make a microprocessor, so that it can adapt to whatever computation you decide you want to do — like the general purpose processors found in laptop computers and cell phones — the less efficient it will be in terms of speed and energy consumption.

By greatly constraining the type and form of the computation you are doing, you can optimize the performance of algorithms in all sorts of ways. Take the algorithm used to mine Bitcoin, SHA-256. It turns out that this is an incredibly short and simple algorithm to describe, despite the fact that it produces complex looking output. In fact, the entire implementation of SHA-256 is about 100 lines of Python code. Ultimately, each step described in that code, from the bit rotations to the XOR operations, must be laid out in the circuits etched in the surface of silicon chips in order for the algorithm to make the leap from software code on the screen to a physical ASIC chip that can run in the real world.

Thus, one way to make a new PoW algorithm that is more difficult and expensive to design an ASIC for would be to go in the opposite direction: make the algorithms themselves more complicated, so that more logic gates are required to lay out the circuits. At some point, as you keep adding more and more complexity, the circuitry required begins to approximate the complexity of a general purpose computing device, which means that the cost advantage of the ASIC is reduced versus CPUs and GPUs. At the very least, you can make your PoW the least attractive house on the block from the perspective of a potential ASIC manufacturer.

It is essential here to remember that there are certain requirements for a good PoW algorithm. The primary one is that you don’t want a situation where the fastest computer always wins; rather, mining should be like a lottery in which your probability of winning is proportionate to the number of tickets that you purchase (i.e., the amount of computing power you can demonstrate by computing hashes). You also want a PoW system where you can easily dial up or down the difficulty level (i.e., how much computing power is required to mine a valid block) so that the network can adjust to changes in price and available computing power, which is needed to keep the inflation rate of the coin stable. And you also want a system where it is hard to find the special numbers (“nonces”) that all the miners are searching for, but quick and easy for anyone else to verify that a proposed nonce actually works and leads to a valid block.

The other thing you should understand at this point is the basics of how computers handle numbers in terms of decimals, or at least their limitations. The numbers that you use in most programs on a computer, such as Microsoft Excel, are really a fiction — an approximation that is good enough for most things, but which occasionally breaks down in spectacular ways.

Mostly, we use 32 and 64 bit numerical representations, which you can think of as describing the number of digits used to represent the number before we round or truncate the number. If you are doing calculations that iteratively apply the same function many times in a row, even small inaccuracies start to pile up over time. For cases such as these, which might be something like a nuclear simulation done by a government lab, even 64 bits of precision isn’t good enough.

Just like in the case of microprocessor complexity, numerical precision is a trade-off. By restricting ourselves to 32 bits or 64 bits, we allow modern computer hardware to really shine at doing math quickly. Each number has a pre-set amount of space that it takes up at every stage in the computation, which allows compiler software to heavily optimize the memory layout of programs for better utilization of processor cache, the special high-speed memory located directly on the processor chip.

If we insist on 500 digits of precision, then we could still do the computation, given the right kind of software, but it would be *much*, much slower. And that’s because the software we would need to use would have to involve a dynamic amount of memory storage. Depending on the nature of the calculation being done and the particular numbers being expressed (for example, numbers like 1/7 and 1/3 are notoriously hard for computers to approximate accurately), the amount of digits that you need to store at each step could vary wildly. It makes it hard to plan and execute a dynamic computation like that, where the “shape” of the computation is constantly changing and morphing.

Unsurprisingly, ASICs are not good at arbitrary precision math. In fact, a google search for *arbitrary precision ASIC**s* shows a distinct lack of recent research and content in this area. And this makes sense, because this is exactly the sort of application where you generally *wouldn’t *use an ASIC: it’s just not what they are good at or what they tend to be used for.

In the regular world, this situation tends not to come up that often, because *hardly anyone seems to care about arbitrary precision math*. That is, if you look around the world, there are hordes of people out there trying to optimize calculations involving the standard double- or single-precision floating point numbers (say, for better video game graphics or for faster machine learning model training). But who really needs all those digits in arbitrary precision math?

At most, people tend to care about decimals to the extent that they are dealing with money (e.g., accounting software) and they need the rounding to be correct. For example, if Citibank is calculating the interest owed on a balance of $350, they need to be accurate to more than a penny of precision. But very few applications outside of pure mathematics and some engineering professions really need to know that their answers are correct to, say, 100 decimal places — let alone 2,000 decimal places! After all, 15 digits of Pi is enough precision to allow one to measure the width of the universe to within 2 inches of accuracy— why in the world would anyone need 2,000 digits?

You probably see where this is going. We need a way to combine a huge amount of complex code and algorithms — something that would be an absolute *nightmare* for an ASIC designer to have to lay-out — with arbitrary precision numerical computations, which ASICs are generally bad at. And we need a way to make sure that these computations are done with 100% accuracy, so that we never have consensus problems; that is, all nodes on the network have to compute the same outputs given the same inputs in order to get complete agreement about the state of the system. And we want to write it all in a high-level language like Python to keep it short and get it done quickly (the final version of my code clocks in at about 530 lines of Python).

All of these constraints taken together will lead you directly to the highly regarded open source MPMath library. This library, which is written in pure Python mostly by a single author (the library is tested extensively for accuracy against the results of other software systems like Mathematica), can also take advantage of the equally respected GMP/GMPy2 projects to increase the speed of calculations dramatically.

The MPMath library is massive in terms of complexity, although it takes just under a single megabyte in terms of storage space. In fact, if you take every code file from MPMath and combine it into a single file, with all the comments removed, it comes to a whopping ~25,000 lines of Python code (I know because I spent many hours unraveling the complex dependencies in order to make it a single file). And that’s not even so much when you step back and consider how large a chunk of modern math is captured by the library, which allows you to do arbitrarily accurate calculations with real numbers, complex numbers, and matrices as inputs.

Basically, MPMath comes built-in with any sort of function that would tend to come up in engineering or physics, from the usual exponentials, logarithms, and trigonometric functions that we are taught in high school, to the more esoteric functions such as the Bessel J function or the Airy Bi function (pictured above). There are functions of all shapes and sizes, which are computed based on fundamentally different kinds of calculations. Although the specific implementation details of these different functions might be complicated, the essence of it is pretty simple: each of these functions takes an ordinary number as an input and gives you a number back as output, and you can specify how many digits of accuracy you want in the output. MPMath does the rest, ensuring that you get an accurate result relatively quickly.

So how does this all fit into proof-of-work and crypto-currencies? To see why, let’s quickly review what hash functions are. A hash function is simply a one-way function that takes input data of any length and gives you back a number as an output. If you change even a tiny detail in the input data and recompute the hash, the number will change completely. A good hash function is one that is very quick and easy to compute in one direction, but nearly impossible to do in the reverse direction (that is, it’s really hard to find data that will lead to a particular hash as output).

Hashes are usually presented as being fixed-length strings, including letters and numbers, such as the following:

00000000000000000a3ad90658c08cd4afdc24e3e36fda1a767c86bed17826

This is the hash of a recent Bitcoin block. You can see that it begins with many zeros. This is the whole goal of the proof-of-work system in Bitcoin: to find some number (the nonce) such that, when you append it to the list of all the transaction data for that block, and then calculate the result of applying the hash function to the combined data (i.e., transactions+nonce), it gives you a result that begins with a large number of zeros.

I believe that this depiction of hashes creates an unnecessary barrier to the understanding of the mining process by the public. There is nothing mysterious about the string above. It is simply a large number (an integer), but it is being expressed in hexadecimal format, where you count to 16 instead of to 10. Also, we are keeping the zeros at the beginning so that the hash string will always have the same length in terms of the number of characters. If we were using ordinary numbers, it would look very odd to see zeros at the beginning, at least without a decimal point.

If you think about it, this is just another way of saying that what we are looking for in bitcoin mining are *small* numbers coming out of the hash function. What makes the nonces that lead to such outputs rare is that the result of applying the hash function to most input data will get you a result that is generally the same size. Of course, you *could* get a number that is much smaller than that. It’s just that your chances of that happening are extremely low — so low that you’d need to compute billions of hashes to even have a small probability of finding such a nonce in time to mine the block before anyone else.

And that’s it — that’s where bitcoin mining complexity ends: Find a small output from the hash function. Even the more complex hash algorithms are basically using variants of this same basic idea: compute a single hash function again and again (and maybe also access the elements of a large data structure in memory), and try to get a really small number as an output, which is hard to do. Some projects, such as the PoW algorithm used in Dash, chain together a series of hash functions, so that the output of the first is used as the input of the second hash function, and so on. While this may seem secure to the naive observer, it turns out to be a bad idea, since the overall security of the system hinges on the *least* secure hash function used (i.e., the one where it is easiest to compute the hash function in reverse). Chaining more hash algorithms together in this way makes it so that if any of the algorithms are broken, your system is broken too!

A better idea might be to independently generate several hashes. Indeed, other projects (notably, Verge) take this approach, and allow miners to produce valid blocks using multiple different hashing algorithms, each of which has a separately tracked difficulty level. The problem with this approach is that it introduces various new vectors of attack, which have resulted in millions of dollars of “counterfeit” coins generated in recent months (and lots of frustrated miners who were wasting their time and electricity mining invalid blocks). If you think about it, this scheme just doesn’t get you anything in terms of security, and in fact it might even hurt you, since it increases the number of different pools of rentable computing power that are available to potential attackers of the network.

So what’s my better idea? I’ll explain it now. If you want to follow along with the code, which has been written in as clear a way as possible with long, descriptive variable names, you can find it here.

Basically, I start off where the previous methods end. Instead of requiring a single hash output, my proposed system (which I’m calling “loaded” proof-of-work, since it is loaded with additional computations) uses 5 distinct hash algorithms, each of which will take as input the same data (i.e., the transaction data plus a nonce). In order to ensure that all the resulting hash outputs are the same length, I took the shorter ones (the 256 bit outputs) and appended a reversed copy of the number to the end, so that they would all be 512 bit outputs.

Here is the list of hash functions I selected, along with an example of what they look like in the prototype software:

Those long strings of numbers and letters may look cryptic, but just remember this: they are simply big whole numbers, nothing more. So we now have 5 really big numbers. What do we do to them? How do we combine them? How can we require some very unlikely condition involving all 5 of these numbers — one that would have all the desirable properties for a proof-of-work function that we described above? This is where MPMath comes into play.

Here is a list of the 100 mathematical functions that I am using from MPMath:

Although I had to stretch a bit by counting things like the cube root of x as being separate from the 4th root of x, there is still an enormous breadth of computation on display in the functions listed above. So what do we do with all of these functions? How do we pick which ones to use? I’ll get to that in a moment, but first suppose that we have somehow selected five of the functions listed above — one for each of the hash results we calculated from a new nonce we are testing.

The basic idea of my system is that you use each of these hash outputs as the input to a different function. Remember, these are just mathematical functions that take a number as input (always a whole number in our case) and then return as output a number with a bunch of decimal places. The number might be huge, or it might be very small — it depends on the function and the input numbers (i.e., the 5 hash outputs) that we happen to be using.

We then take the fractional parts of these 5 numbers, ignoring any decimal points and trimming off any leading zeros, so that we are left with 5 very large whole numbers.

We then take the last several digits (the exact number is some multiple of the difficulty level, which I will explain later) of each of these 5 numbers, leaving us with 5 smaller whole numbers. Why do we take the last digits instead of the first digits? This is to ensure that miners are indeed performing all calculations to the required level of numerical accuracy.

Finally, we take these 5 smaller whole numbers and multiply them all together to get a single large whole number, which is called the *final hash.*

Unlike in Bitcoin mining, the goal here is not to find a *small* final hash result as output. Instead, we define a new goal: we want the final hash to begin with a particular pattern of numbers — the numbers 1 through 9, repeated in order. So at a difficulty level of 5, for example, we would require that the final hash start with the digits “12345”. Obviously, the vast majority of final hashes will not begin with such a pattern — it’s a freak accident arising out of the carrying and the arithmetic working out in a particular way.

In addition to this difficulty level, which controls the length of the digit pattern we must match in order to mine a valid block, we have another lever that we can pull to make the computations easier or harder: the number of decimal places of accuracy we require of everyone’s computations. In my prototype system, I have this starting at a relatively low level, say 50 digits of precision, and then gradually increasing by a certain percentage that is designed to keep the blocks generated at a consistent rate over time. Importantly, the final levels of precision employed before the system ratchets back down to the lower precision levels could be as high as 2,000 digits or more — far more than could ever be done using fixed precision computations of the sort where ASICs are the most advantaged versus general purpose processors such as GPUs and CPUs.

So how do we choose which functions to apply? There is a simple trick we can use to ensure that for each new block, we have a procedure in place so that every node agrees on the particular functions that will be selected for the next block. The trick is that we take the final hash of the preceding block that was just mined. This is a long number which might look something like this:

12342669353192025399050788645218821036796365857533579892578046344296766311405078019320129330745000

Notice that it begins with “1234”; this is from a block that was mined at a difficulty level of 4. Since the beginning of this number will always have the same pattern, we instead reverse the order of the digits and trim the leading zeros away, so that we are left with a series of numbers that no one could have selected in a special way — they are the end result of a huge computation, the mere accident of circumstance. Anyway, we take this reversed final hash and start reading off digits from it, two digits at a time, until we have selected 5 two-digit numbers; in the example above, these would be:

54, 70, 33, 92, 10

Remember that there were 100 different mathematical functions in that list above? This gives us a simple way to associate a different function with each of the 100 possibilities from 00 to 99 — each one goes to a different one of the 100 functions, using the order presented above to number each function. This means that any of those functions *could *be selected for the next block: they are all fair game, and miners need to be prepared to compute all of them quickly.

This is what it all looks like using the colorful terminal output of the prototype mining software:

Once in a while, if you are lucky, you will stumble across a nonce that has the very rare property you are looking for: that the end result of multiplying together the outputs of all these randomly selected functions applied to all these hash results, just happens to start with “1234”.

If enough computers were mining the coin, so that the difficulty level went to a very high level, perhaps the final hash would need to begin with “1234567891234567” so that the blocks aren’t mined too often.

Here is what it looks like in the software when you find such a nonce:

Again, we can dynamically flex up the required accuracy to adjust how hard it is to mine new blocks in another way besides just using the difficulty level:

To give a sense of what the miner looks like in operation, here is a short video clip of the prototype miner running on an iPhone (it’s easy to do this yourself — just copy the code linked above on Github and paste it into the Pythonista app):

And here, running on a Windows desktop:

Let’s take a step back and consider what this all means. We now have a proof-of-work system that essentially inherits the security of the existing methods, since it includes the basic idea of the existing methods as a primary component in its design.

But it then take an addition step — involving arbitrary precision for a large, complex set of algorithms. Observe that it wouldn’t be very hard to increase the number of functions, say from 100 to 1,000. There are plenty of other large open-source projects filled with mathematical functions that will take arbitrary numbers as inputs and generate numerical outputs at any desired level of precision, including the following:

Unlike bitcoin mining, where the mining hardware is so fast that billions of hashes are computed per second by a single piece of equipment, the speed of this “loaded” PoW is much, much slower. On a top-of-the-line desktop machine, you might get a few thousand iterations per second. That is, you would be able to test out a few thousand possible nonces per second to see if any of them worked.

But it all depends on the luck of the draw in terms of which 5 functions were selected for use in that block. Some of the functions are very quick and easy to compute — usually because there is some quirk in the input or a symmetry in the function that allows it to be greatly simplified. Others functions might require far more CPU cycles in order to carry out the computation at sufficient accuracy. In such a case, you might only be able to test out 10 or 12 nonces per second on a fast machine. Still, if enough machines are mining, and the difficulty level is high enough, the time between blocks should settle down to a long term average, and can easily be forced by awarding the finder of the “closest” final hash that is not quite valid after a certain period has elapsed without a new block being mined.

What is somewhat surprising, at least to me, is how quickly the software runs on an iPhone using the Pythonista app available on the iOS App Store. It is thrilling to see the computational horsepower present in most of our pockets, but which is generally left unused in terms of proof-of-work (at least by reputable projects!). Seeing it run, I can envision a world where people open up a mining app on their smartphone before they go to sleep, and wake up in the morning having earned a few coins in return for contributing to the security of the network.

After all, isn’t it a waste that these fancy phones, which are truly advanced super computers at this point, sit around doing nothing while we are sleeping? A distributed, decentralized network of small-time miners using their laptops and phones is obviously a lot more of a secure foundation for a crypto-currency than the current situation, in which all participants are caught in the gravitational pull of Bitmain and the Chinese Government.

There are those who will object, “but you can make an ASIC for any algorithm that you want!”. My response is, let people try. First of all, it would objectively be more expensive and difficult to make an ASIC for the system outlined above than for a more traditional PoW system like the ones used in Bitcoin or Z-Cash. Why would a company like Bitmain choose to go after a new algorithm like this one, when there is more money to be made by going after lower-hanging fruit, where the R&D cost of developing a new ASIC can be recouped quickly? This is analogous to the joke about being in the woods with a friend when a bear begins to chase you: you might not need to be faster than the bear in order to escape — you just have to be faster than your friend!

But I would go further than that. I suspect that the system outlined above, suitably generalized (for example, by including a dynamic and growing list of mathematical functions beyond the 100 functions currently defined), can *always* outrun the bear. Because I suspect that in order to get an ASIC chip that could do such a large and growing list of computational tasks — at arbitrary precision no less — will require so many transistors and logic gates, not to mention the time and energy of the best chip designers and engineers, that you would end up with something that would basically be analogous to a general purpose processor. And at that point you’d be better off just making a general purpose processor so that you could at least get the economies of scale from making a huge number of identical components for multiple markets (cell phones, tablets, TVs, etc.), which is how the market has historically organized itself.

But even if I’m wrong, and the ASIC manufacturers are able to figure out a clever way to implement all of this in cost effective chips — what are the implications? Well, that would mean that Bitmain or its competitors had figured out a way to make chips that perform a vast array of useful engineering computations at a much lower cost and energy footprint than traditional computers. And while I might be upset that Bitmain was back in a dominant market position, it would be good for the world to have cheap and powerful chips that could do those sorts of useful calculations. Think of it as a way of harnessing all this capital and ingenuity for good, while enhancing the security of the network.

If I’m right, users will be excited to see a return to a time when mining was not the domain of specialized operators, but an area where even new users and non-technical people could contribute their computational resources and get back a non-trivial amount of coins in return— at least if they are early enough in a project’s development!