Choosing an Optimal CRC Polynomial
Most of the time engineers just have to implement communications protocols that are given to them: industry standards, decided by committees or established by the dominant players. But surprisingly often there is the opportunity to create a new protocol, either for proprietary internal use or as part of inventing a new standard for the industry, and in those cases we have to choose a CRC polynomial.
Although the process sounds complicated, in many cases it can be quite simple. But first we must make sure we don't fall into one of the common pitfalls which leads to sub-optimal performance.
Common Pitfalls
1. Just because a particular polynomial has been used in a standard for some other use case doesn't mean it is the best of its kind or suitable for your use case.
Be wary of the many excuses for using a standard CRC
- "We already have the software routines for it"
- "It's been proven in the industry so is now trusted."
- "We haven't got time/expertise to implement a new one."
- "We do not want the risk that a new one brings."
- "We don't have the analysis done on which is the best CRC for each situation."
Before you go ahead with a standard polynomial know that it is not the only choice and may well not be the best choice. What you are doing by choosing a standard one is saying don't blame me, everyone uses this. As it says on mathpages.com if you use a standard polynomial and subsequently it turns out to be particularly unsuitable for your circumstances,
"This would be incredibly bad luck, but if it ever happened, you'd like to at least be able to say you were using an industry standard generator, so the problem couldn't be attributed to any unauthorized creativity on your part."
This argument is based on the faulty statement that
"However, the fact remains that our overall estimate for the probability of an error going undetected by an n-bit CRC is 1/(2^n), regardless of which (n+1)-bit generator polynomial we use."
As we can see from these plots (my calculation), the Pud only approaches 1/(2^n) at very high BERs which should not be seen in practice, and is much more likely to be 1e18 times better at "normal" BERs.
Are you sure you want to stick with a standard polynomial even after the work is publicly available that shows that it is sub-optimal?
As soon as you accept that you do not have to stick to standard polynomials, the possibility of choosing a better one appears. If you have already decided to use a CRC calculation rather than a simpler checksum method, then the choice of polynomial only makes the slightest of difference to the computational cost (if in software) or component cost (if in hardware) which it will more than make up for by catching more errors.
Analysis of 8 bit CRC polynomials by P Koopman from his FAA presentation.
The analysis has been done, it was done for all possible polynomials so is exhaustive and the computation was cross checked using different method and verified independently, and has been published many years ago.
In any case, be careful to double-check your polynomial because some documents have been found to have misprints or incorrect information.
2. Assuming that a standard polynomial is close enough to optimal for your use-case, and that any improvement can only be minor.
Even if you are using the given polynomial within it's data length and the HD meets your needs, there could many orders of magnitude improvement available in the probability of undetected errors. These examples show what kinds of improvement are available just by changing the polynomial, without incurring any other costs. Probabilities of undetected errors can be reduced by 10^7 times.
3. Not taking data length into account.
Either because you are unaware that the protection afforded by a CRC only goes up to a certain data length, or because you have already decided that a standard polynomial is going to be used without considering what data length it protects.
The Hamming Distance drops at larger data lengths so you do not get as much protection as you get at shorter data lengths. If your maximum data length is fixed for all time, then you can safely choose a polynomial that covers it at the Hamming Distance you require. If there is some possibility for messages to grow in future, consider going up to a bigger bit-size polynomial which will cover the increased length at the required HD. Or consider using different polynomials for different data lengths as part of your protocol design.
4. Thinking that all CRCs of the same size are equivalent because they will all catch burst errors up to the CRC size.
Burst errors are not the only types of error. Different polynomials of the same size have been shown to have different probabilities of undetected errors. Don't settle for an inferior polynomial when better ones are now available.
5. Choosing based on odd-parity considerations.
Polynomials divisible by (x+1) detect all odd-numbered bit errors but at the expense of doubling the undetected fraction on even-numbers of bit-errors. Do you have a good reason for assuming that odd-numbers of bit-errors are more common in your use case? If not, you are sacrificing overall performance for this capability and ending up with a worse result.
6. Being satisfied with a low Hamming Distance.
You can end up with HD=3 for the CRC and data length in use, but this is only a small advantage over simpler checksum algorithms which can have HD=2 or sometimes HD=3, so you should aim for at least HD=4. Note that high-integrity systems use HD=6.
7. Assuming that a bigger bit-size CRC protects a bigger data length
Whilst this is a general rule, I have found a few exceptions which can be seen from the summary table generated in my spreadsheet or more clearly from plots of CRC bit-size against maximum data length protected for each HD.
A few examples:
- A 13-bit CRC at HD=5 only protects 52 bits, where a 12-bit CRC protects 53 bits.
- A 25-bit CRC at HD=5 only protects 4072 bits, where a 24 bit CRC protects 4073 bits.
- A 29-bit CRC at HD=5 only protects 16356 bits, where a 28 bit CRC protects 16357 bits.
- A 18-bit CRC at HD=7 only protects 45 bits, where a 17 bit CRC protects 46 bits.
This phenomenon seems to only occur with odd-numbered Hamming Distances.
Method for Choosing an Optimal CRC Polynomial
This method is for designs where you do not have an undetected error rate limit to work within. If your scenario has such a limit, use section 6.16 of this document.
1. Determine the data length.
This is the payload being protected by the CRC, in bits. Decide if you need to allow for some headroom in this value in case your message has variable size and could be expanded beyond it's current maximum. If you need to be able to make guarantees about probability of undetected error rates then you may need to ensure all possible data lengths are covered by your polynomial, but in some situations there can be an argument for accepting that larger messages will suffer a lower HD when the frequency or importance of those messages is low.
2. Find the smallest polynomial with the largest Hamming Distance that covers your data length.
While doing this, bear in mind these points
- HD=3 is barely an improvement over good checksum methods, so try to achieve HD=4 minimum to make the computational expense of CRC worthwhile. HD=3 means that all 2-bit errors are detected.
- aiming higher than HD=6 is probably overkill given that this value is considered sufficient for high-integrity systems, but you might be able to justify it in specialized cases.
- You might want to only choose between 8-bit, 16-bit, 24-bit and 32-bit polynomials because these fit your processor data word sizes or your message has space for a whole number of bytes for the FCS.
- If you need to steal a bit from your FCS field for other purposes, bear in mind the effect it has on the protected data length or HD, and it reduces the burst error detection length as well as the fraction of undetected errors.
3. Allow for expansion of data word size.
If you are close to a boundary value of data length for your chosen polynomial, think about whether it is a good idea to go up one step now to avoid problems later.
4. If two polynomials of the same size offer the same length protection at your chosen HD, chose the one where the length at HD+1 is better.
If those are the same, look up the Hamming Weights of the two polynomials at that length and HD and choose the one with the lowest HW.
Example
Let's say we have a 7 byte data word. This could come from the limit of the 8 byte payload found in CAN frames for example, leaving one byte for our FCS. (I'm ignoring the fact that CAN has it's own 15 bit CRC and that this is not the optimal solution to improve Pud, it is just for the sake of an example). The "Answers" page of my spreadsheet shows that we can get HD=4 for a 7-bit CRC, but need 14 bits to do any better than that.
We can't use a 14-bit CRC since that would eat into our 7 bytes payload that we have decided on. But we can also see from the table below that this is making full use of HD=4 with a 7-bit CRC.
In this case we can see that 7-bit CRC is the smallest polynomial for our requirements, so there is no point looking for a smaller one. We could consider using an 8-bit but in this scenario there is no way that the data length could grow due to the CAN standard being fixed. If this was a serial protocol that was still being designed, we should consider going one step up to allow for growth.
Now it is time to look at the available 7-bit CRC polynomials and their performance at 56-bit data words:
Out of the 11 known 7-bit polynomials, there are three which are suitable for data length 56 bit at HD=4. None of them achieve more than 56 bits so we look at how they perform at HD+1 which is HD=5. Polynomial 0x5B has the highest value there so is the best for this application.
We can see the same thing expressed on Koopman's page for 7-bit CRCs although the notation is a bit more difficult to read - you have to know that the values in curly braces are for HD=3 onwards, so you are looking for the second value in the curly braces for HD=4.
Spreadsheet Solutions
Use my spreadsheet (compiled from Koopman's data files) to look up the solutions for your application. The OutputLengthvHDist sheet is best used with Excel's auto filter drop downs to select one Size (Bits) and to sort by HD (the columns labelled 3 to 20).
The Answers page has been set up with look up formulas to allow you to quickly answer these questions
- What is the shortest CRC for this data length?
- given Data Length in bytes; for HD=3 to 10
- Find me a CRC Poly with these requirements.
- given CRC bit size and Data Length in bits; for HD=3 to 10.
- What is the maximum Hamming Distance available for ... bytes of data length?
- given a number of bytes; for any bit-size CRC, up to HD=10, provides CRC bit-size and a polynomial that provides it.
- What HD do I get for ...?
- given a data length in bits and a polynomial in Koopman format
- Given Polynomial in hex (in Koopman format), what data length is protected?
- Given Poly K (Hex); for HD=3 to 10 gives data lengths in bits
- What is the smallest size CRC for HD ...?
- Given HD 3 to 20; gives size of smallest CRC in bits and data length protected in bits for that CRC size at that HD
- What HD do I get given Poly and Data Length in bits?
- Given Poly K (Hex) and Data Length in bits; provides HD
Compare the solutions to Koopman's selection tables.
All the source material Data files Copyrighted 2015-2018 by Philip Koopman, Carnegie Mellon University and licenced under Creative Commons Attribution 4.0 International License.