This article and the attached document are the results of a cryptographic lab during the summer semester 2011 at the Technische Universität Darmstadt. The task of this lab was to describe how exactly constant weight words are encoded in an implementation of the Hybrid McEliece Encryption Scheme (HyMES) written in C by Bhaskar Biswas and Nicolas Sendrier at Project SECRET, INRIA, Rocquencourt in 2008. The encoding of constant weight words into binary information of fixed length is implemented as an algorithm consisting of three distinct components : a dichotomic model, an enumerative encoder and an arithmetic encoder. We give a short introduction into HyMES and then describe these components in detail.

## HyMES

The first public key encryption scheme based on coding theory was proposed by Robert J. McEliece in 1978. The idea behind the McEliece cryptosystem exploits the hardness of decoding a linear code. For the encryption scheme a binary (n x k) Goppa code that is able to detect t errors efficiently is selected and disguised as a general linear code.

For encryption the cleartext is encoded and t errors are randomly added. Without knowledge of the Goppa code correcting these errors decryption of the message is NP hard. With sufficient parameters this system has resisted cryptanalysis so far and is seen as secure against quantum attacks.

McEliece has some advantages over other public key encryption schemes, for example the encryption and decryption are faster than the widely spread RSA algorithm. However some drawbacks exist : a large key ( >100kb for reasonable security parameters) and a lower information rate compared to RSA.

Nicolas Sendrier and Bhaskar Biswas addressed these problems with the Hybrid McEliece Encryption Scheme (HyMES). They increased the information rate by storing information in the error and reduced the public key size by using a generator matrix in systematic form $(ID | R)$.

This process of storing binary information in the error is best described as a function $\phi$ and it’s inverse.

$$\phi : \{0,1\}^k \mapsto W_{n,t}$$

Earlier implementations of $\phi$, for example an enumerative encoder, had either a high computation cost or were based on source encoding mechanisms that produced binary sequences of variable length. Biswas and Sendrier developed an efficient new encoder based on a recursive dichotomic model, an enumerative encoder and an arithmetic encoder.

**Dichotomic model :**

Each constant weight word of length $n=2^m$ is split in two constant weight words of equal length $n=2^{m-1}$. By splitting a word recursively and storing the Hamming weight of the first half a constant weight word can be encoded as a finite sequence of integers.

**Enumerative Coding :**

Enumerative coding solves the problem to find an information optimal bijective mapping between a set of constant weight words $W_{n,t}$ and binary words of the length $l = \log_2{n\choose t}$. Hence Enumerative coding is a concrete implementation of $\phi$. Yet for high values of n and t the computation cost of an enumerative encoder is too high for an efficient implementation of HyMES.

**Arithmetic Coding :**

Arithmetic coding is a coding algorithm that converts an input stream to a single number or a binary sequence. Under the assumption that the used model of the distribution of symbols in the input stream matches the reality arithmetic coding can be proven to almost reach a perfect compression ratio, which is limited by the entropy of the data. Also with a given model arithmetic encoding achieves the best possible compression ratio and therefore is far better than algorithms using a bit pattern to represent specific symbols.

## Implemented function :

**Encoding :** The Encoding begins with the Dichotomic Model, where the constant weight word is recursively split into smaller parts and a sequence of Hamming weights of those is constructed. This recursion stops when the size of the parts (chunks) is small enough to be efficiently encoded with an Enumerative Encoder. The produced sequence of Hamming weights is written out at the beginning of the output string by using an Arithmetic Encoder. The Arithmetic Encoder encodes the Hamming weights according a normal distribution in a constant weight word. In the final step, all chunks are sequentially encoded with an Enumerative Encoder and added to the binary string with the Arithmetic Encoder this time using a uniform distribution.

**Decoding :** The Decoding process is almost the same. Given a binary string, at first the tree/sequence of Hamming weights is read through the Arithmetic Encoder using a normal distribution of Hamming weights. After reading the tree, the Decoder knows the sizes of the chunks and sequentially reads those from the binary string using the Uniform Version of the Arithmetic Encoder. In the final step, all chunks are concatenated and the constant weight word is constructed.

For a more detailed description including the mathematical background and direct reference to the source code please have a look at the attached document.

Link to the HyMES implementation : HyMES