In this work, image compression is achieved by using transformation based on modified lifting wavelet structure followed by quantization based on modified set partition in hierarchical tree (SPIHT) algorithm.
2.1 Modified Lifting Wavelet Structure
The DWT technique based on convolution, involves filtering the input signal by two independent filters asHPF (High pass filter) &LPF (low pass filter). The output of both these filters is then decimated by two, so as to get low pass & high pass sub bands; also called as approximation &detailed sub bands respectively. The speed of convolution based DWT is the major constraint. Tian et al. [16] proposes efficient & high speed 2-dimensional lifting discrete wavelet transform. The lifting wavelet transform divides the given sequence in to its two poly-phase components, these poly-phase components are operated in parallel. So lifting wavelet deals with an operation on poly-phase components as an operation on ‘2 X 2’ sequences instead of original sequence. This is illustrated in figure 5.
From above figure 5, it is cleared that, the decomposition is achieved by arithmetic addition & subtraction between two adjacent samples of the given sequence, followed by multiplication with ½; so as to get the \({Y_{{V_0}}}\) and \({Y_{{W_0}}}\), with number of samples are exactly half in \({Y_{{V_0}}}\) and \({Y_{{W_0}}}\) as compare to Original sequence. This is illustrated by two filters followed by decimator as shown in figure 6. This structure is called as Analysis filter bank, as it decomposes or analyses the original sequence in to its components.
These filters are then described in terms of their system functions rather in time domain, for this purpose the Z-transform is used, as the Z-Transform is a linear operator i.e. the linear combination of sequences results in same linear combination in Z-domain, with region of convergence are at least intersection of region of convergence of individual sequences, which are linearly combined. The Z-transform of first filter (LPF) is given below,
\(b(n)=\frac{1}{2}\left[ {a(n)+a(n - 1)} \right]\)
Taking Z-transform on both side
\(\begin{gathered} B(Z)=\frac{1}{2}\left[ {A(Z)+{Z^{ - 1}}A(Z)} \right] \hfill \\ \hfill \\ \end{gathered}\)
System Function=\({H_{Low}}(Z)=\frac{{B(Z)}}{{A(Z)}}=\frac{1}{2}\left[ {1+{Z^{ - 1}}} \right]\)
Similarly, the system function for second filter (HPF) \({H_{High}}(Z)\)is given below,
\(\begin{gathered} {H_{High}}(Z)={Z^{ - 1}}{H_{Low}}( - {Z^{ - 1}}) \hfill \\ {H_{High}}(Z)=\frac{1}{2}{Z^{ - 1}}(1 - Z) \hfill \\ {H_{High}}(Z)=\frac{1}{2}( - 1+{Z^{ - 1}}) \hfill \\ \hfill \\ \end{gathered}\)
In general the factor ½ is not considered while analyzing, as it is compensated by selecting the multiplying factor of 2 at synthesis side. The Analysis Filter Bank in terms of its system function is as shown in figure 7.
This is represented in terms signal flow graph as given below figure 8.
Thus the complete image can be decomposed by using above homogeneous repetitive structure called a lattice. This lattice structure is analyzed using poly phase matrix so as to get the lifting structure as given in figure 9.
Thus the crisscross operation in figure 8 is broken in to the two cascade operations shown in figure 9. This lifting structure improves the speed of DWT computations, but it involves multiplication operations. Shahadi et al. [17] proposes multiplier free efficient & fast lifting wavelet transform as shown in figure 10.
As shown in figure 10 (a), the analysis filter bank involves breaking the sequence in to the odd & even sequences followed by subtraction & addition operations. This modified structure replaces the repetitive analysis filter bank in figure 7, so as obtain the different sub bands, with much more faster rate as compare to conventional convolution based DWT.
Analyzing the whole image in to four sub bands doesn’t means compressing the image. For still image compression, it must be seen that how this individual sub bands as; LL, LH, HL & HH are individually compressed. Now as discussed earlier images are rich in low frequency region i.e. LL sub band contain more information and other sub bands will be having a sparsity of information, hence the high frequency coefficients can be quantized more severely or more coarsely as compared to low frequency coefficients, just like DCT but in DCT truncation of high frequency coefficients is done by exploiting psycho-visual redundancy. For effective compression, it is very essential to exploit the inter-relationship between the different sub bands. Consider the first level partitioning of ‘8 X 8’ pixel array, so as to have each sub band of size, ‘4 X 4’ pixel. Now consider the pixel at coordinate position (1,1) in LL sub band, this pixel will have some relation with the pixels presents in high frequency sub bands at same coordinate position (1,1); as shown in below figure 11.
In two level of partitioning, each pixel in HL2, LH2& HH2 sub bands corresponds to four pixels in HL1, LH1& HH1 respectively, with each pixel in LL2 sub band corresponds to a pixel in HL2, LH2& HH2., as shown in figure 12.
Now in this form, what emerges is a kind of data structure representation, with one pixel in LL2 sub band as a starting point with corresponding pixel position in LH2, HL2 & HH2; as its descendants. Individually each of these three pixels in LH2, HL2 & HH2 sub bands will have four descendants in LH1, HL1 &HH1 sub bands respectively. If the LL2 sub band is partitioned further then the generalized data structure evolved in the form of a tree. Brahimi et al. [18] proposes improved embedded zero tree wavelet (EZW) technique for mage coding. This EZW technique achieves compression by exploiting redundancy between the coefficients, which are there at same pixel location but in the successive sub bands. In order to understand this EZW technique, consider the three level of decomposition, in this decomposition first of all the coefficient from top most LL3 sub band is picked up, then the data structure tree is formed with HL3, LH3& HH3 coefficients in same resolution; as a descendants. This sequence of picking the coefficients is repeated in finer and finer sub bands., as shown in figure 13.
As shown in figure 13, the coefficients at the same spatial position are picked-up at different resolutions. The biggest advantage of traversing in this manner is that; as most of the coefficients are highly significant in low frequency sub bands, that means in LL sub band almost all the coefficients are highly significant and going towards finer & finer resolution that too in the high frequency sub bands, the coefficients will become lesser & lesser significant. The significant coefficients are represented as a binary ‘1’ & insignificant coefficients are represented as binary ‘0’. In EZW technique, while traversing wavelet coefficients as depicted in figure 13, then if at particular position any coefficient found to be zero along with all of its descendants also to be zero, then there is no point in encoding all the coming descendants as beyond this everything is going to be zero, that is called as zero tree. The zero tree is a part of a tree structure, where all the elements starting with some intermediate level of parent node will have its own value as zero and all subsequent descendants to it, as zero. The Embedded coding deals with generating the bit stream in prioritized manner that means more significant coefficients are transmitted first followed by lesser significant coefficients. This embedded encoding performs very well when the image is transmitted over dynamic bandwidth of internet. The beauty of this scheme is that, if the available bandwidth is more, then all the higher & lesser priority coefficients are transmitted for the best quality. In case of limited bandwidth, after passing priority coefficients and if the bit budget is exhausted & don’t have bandwidth then bit stream is truncated; this results a reasonably good quality image, it would not be best quality image.
The significance of any coefficient (x) is decided as given below,
\(\begin{gathered} \left| X \right|<T=Insignficant \hfill \\ \left| X \right| \geqslant T=Significant \hfill \\ \end{gathered}\)
Where, ‘T’ is a threshold value
The EZW is a multi-pass algorithm, initially the threshold value is kept high, so the significance & insignificance is decided in more harsh way, but with every successive pass the value of threshold is halved. This ensures that, at the end of first pass only few very significant coefficients are encoded; later followed by next pass, if the bit budget permits; so as to encode lesser significant coefficients. If still the bit budget permits, then more & more passes are undertaken. The technique of successive approximation is used for encoding the significant coefficients. Although the EZW is a very standard technique for wavelet encoding, but there are certain limitations with it as; this technique deals with splitting the LL sub bands only & do not consider other sub bands splitting, another limitation is that, it is exploiting redundancy which is present at a particular spatial position but across different scale, but it is not exploiting the redundancy that exists among neighborhood coefficients of the same sub band. This issue associated with EZW is solved using Set partitioning in Hierarchical tree (SPIHT) algorithm [19].
2.2 Modified SPIHT Algorithm
SPIHT algorithm organizes the partitioned image in the form of data structure trees called as spatial orientation trees. This tree(for three level partitioned image) is organized in such a way that, top most LL3 sub band is picked up as a route node, then the data structure tree is formed with HL3, LH3& HH3 coefficients in same resolution; as a descendants. This sequence of picking the coefficients is repeated in finer and finer sub bands in such a way that one coefficient in HL3, LH3& HH3 sub-bands corresponds to four coefficients in HL2, LH2& HH2 sub bands and ultimately one coefficient in HL3, LH3& HH3 corresponds to 16 coefficients inHL1, LH1& HH1 sub bands, as shown in figure 14.
The SPIHT algorithm includes three sets as given,
-
\(P(j,k)\) = Set of all descendants of coefficient\((j,k)\), this set is also known as Type A set.
-
\(Q(j,k)\) = Set of all direct child of coefficient\((j,k)\), this set is also known as Type B set.
-
\(R(j,k)\) = Set of all descendants except direct child of coefficient\((j,k)\).
In SPIHT algorithm, each wavelet coefficient is categorized in to the significant & insignificant coefficient. The significance of any coefficient (x) is decided as given below,
\(\begin{gathered} \left| X \right|<T=Insignficant \hfill \\ \left| X \right| \geqslant T=Significant \hfill \\ \end{gathered}\)
With, \(T={2^n}\)
\(n=\left[ {lo{g_2}{C_{max}}} \right]\) ; \({C_{max}}\)= Value of coefficient with maximum value.
The SPIHT is a multi-pass algorithm and with every sub sequent pass, the value of threshold is reduced by decrementing the value of by 1. The significant coefficients are represented by logic ‘1’, whereas the insignificant coefficients are represented by logic ‘0’. In general the significant coefficients are kept in a set called as list of significant pixel (LSP) and insignificant pixels are placed in set called as list of insignificant pixels (LIP). Apart from these two sets, one more set named list of insignificant set (LIS) is used to place the set (Sub bands) which do not contain any significant coefficient. Initially the set LIP is loaded with coefficients in highest level, set LIS is loaded with set having descendants & set LSP is kept empty. SPIHT encoding begins with MSB plane, for every bit plane, it checks all the three lists as LIP, LIS & LSP. If any coefficient in LIP becomes significant, then it is transferred to LSP. Ultimately all the coefficients in LSP (excluding the coefficients becomes significant in last round) is refined in each sub sequent round by successive approximation.
As discussed in conventional algorithm for SPIHT, if a type A set (Set of all descendants) is significant, but with corresponding set of offspring is insignificant coefficients; in such a case SPIHT algorithm puts four continuous zero in bit stream; this results in larger bit stream. This problem is solved as,
Then logic ‘1’ is used to represent set significant & logic ‘0’ is used to represent offspring insignificant. So this saves 3 bits.
Then logic ‘1’ is used to represent set significant& logic ‘1’ is used to represent offspring significant. So it takes one additional bit.