Multi-focus image fusion based on block matching in 3D transform domain

1. Introduction

Image fusion [1] refers to the process of obtaining a single synthesized image from two or more images. The fused image provides a more comprehensive, accurate and reliable image description, which is widely useful in other image processing and computer vision tasks. Pixel-level fusion methods can be broadly classified into two groups [2]: spatial domain and transform domain fusion.

Currently, the most frequently used methods are based on multi-scale transforms, where fusion is performed on several different scales and directions independently. The most typical transform is the discrete wavelet transform (DWT) [3], which is widely used because of its favorable time-frequency characteristics. After the DWT, a series of improved multi-scale transforms took the stage, e.g., Ridgelet [4], Curvelet [5], and Contourlet [6]. Among these, the non-subsampled Contourlet transform (NSCT) [7,8] has been widely adopted owing to its multi-resolution and multi-directional properties. With the introduction of the non-subsampled Shearlet transform (NSST) [9], there is no longer a limit on the number of decomposition directions, so both the effectiveness and the efficiency of image fusion have been enhanced.

Usually, the transform domain fusion framework is relatively fixed. The first step is the multi-scale transform. The coefficients obtained from this step can be divided into a low-frequency component and several high-frequency components. These components are fused by different fusion rules, because the low- and high-frequency components represent approximate and detailed information respectively. The final fused image is constructed by the inverse transform of all composite coefficients.

With the introduction of more transform domain methods, the fusion framework has been greatly enriched. For example, the intensity, hue and saturation (IHS) transform was widely introduced into fusion frameworks to achieve color image fusion [10], and the pulse coupled neural network (PCNN) structure was combined into the selection of coefficients [11]. However, in these frameworks the source images are transformed directly into the transform domain, which leads to the loss of specific spatial-domain characteristics such as edge contours and spatial similarity. Since the spatial information cannot be further used, this usually causes distortion or artificial textures in the fused images.

The proposed framework's improvement shows mainly in two aspects. First, prior to the transform, it adds some spatial domain pre-processing steps, namely blocking and grouping. Second, it changes the 2D multi-scale geometric transform into a new type of 3D transform. The other steps are basically the same as in the existing framework, except for an aggregation procedure after the inverse transform. This structure is partially similar to that of an image de-noising algorithm, block-matching and 3D filtering (BM3D) [12,13]. The BM3D algorithm is so far one of the most effective image de-noising algorithms and has been widely used in image and video noise reduction.

Thus, the improvement should focus not only on modifying transforms, but also on introducing more spatial features. By using similarity in the spatial domain, e.g., block distance, we can group image patches with salient similar features into a series of 3D arrays. The transform domain processing can then have more specific and suitable options for its parameters. These improvements enhance the effect on each block, and thus improve the overall result.

This paper is organized as follows. In Section 2, we introduce the proposed fusion framework in detail. The specific procedures of blocking, matching and grouping can be found in Section 3, and the 3D transform and the other transform domain processes are given in Section 4. Experimental results and analysis are presented in Section 5. Finally, Section 6 contains the relevant conclusions.

2. Fusion framework

Spatial and transform domain techniques are the two major families of pixel-level techniques. In terms of structure, spatial domain techniques are usually varied and ad hoc, because they combine the input images in a linear or non-linear fashion using weighted-average or variance-based algorithms [16]. The structure of most transform domain methods, however, is relatively fixed, because much of the innovation happens in the transform domain, where the actual fusion takes place. The main motivation for moving into the transform domain is to work within a framework where the salient features are more clearly depicted.

2.1 Improvement of existing frameworks

A typical transform domain fusion framework is described in Fig. 1. Both source images A and B are first transformed into a transform domain (e.g., the DWT domain). Each input image is thereby divided into a low-frequency coefficient and a series of high-frequency coefficients. The fused low- and high-frequency components are obtained from the corresponding coefficients of both images through different fusion rules. The final fused image F is obtained by taking the inverse transform of the composite representation.

Fig.1 Block diagram of a typical fusion framework of existing transform domain image fusion algorithms

Such a transform domain fusion framework has been widely used. In most cases, innovations appear in two aspects: one is the innovation of transform methods, i.e., replacing the transform; the other is the creation of new fusion rules. However, the transform coefficients are operated on by the fusion rules directly, which easily produces distortion and artificial texture. It is worth noting that these problems are not only a matter of the multi-scale transform itself, but also a deficiency of not making use of spatial information.

The transforms adopted here are: the 3D transform with 2D discrete cosine transform (DCT) (BMDCT), the 3D transform with 2D DWT (BMDWT), the 3D transform with 2D NSCT (BMNSCT), and the 3D transform with 2D NSST (BMNSST). As for fusion rules, we mainly adopt max and mean, image region energy [14] and choose-max intra-scale [15].

The proposed framework can be seen in Fig. 2. The main improvements are: blocking the input images A and B with a fixed sliding window; setting a fixed search area around the current block and matching blocks by their similarity; grouping the chosen blocks and arranging them into a 3D array; and transforming the 3D arrays into the transform domain with a 3D transform. After the fusion in the transform domain, the coefficients are transformed back by the inverse 3D transform; the 3D arrays are separated and the image blocks returned to their original positions; the overlapped pixels are then recalculated by the aggregation algorithm to obtain the fused image F.

Fig.2 Block diagram of the proposed image fusion framework

2.2 Comparing with BM3D

As mentioned previously, the idea of BM3D is borrowed in the proposed framework. Before the transform-domain processing, the blocking and grouping procedure of BM3D is added; analogously, the aggregation step of BM3D is adopted after the inverse transform. These procedures are boxed by blue dashed lines in Fig. 2.

Compared with BM3D in image de-noising, there are a lot of differences in the proposed framework. The BM3D algorithm can be divided into two stages. The first is the basic estimate: the input noisy image is processed by successively extracting reference blocks from it, and for each block the algorithm finds its similar blocks and stacks them together to form a 3D array [13]. After the 3D transform, a threshold is used to help reduce noise. The second stage is the final estimate: the previous result is grouped again by the same process; a 3D transform is then applied to both groups (one from the noisy image, the other from the basic estimate) and Wiener filtering is performed on the noisy one, using the energy spectrum of the basic estimate as the true (pilot) energy spectrum [12]. Finally, all blocks obtained after the inverse transform are returned to their original positions.

From the above description, the overall structure of the two estimate stages in BM3D is basically the same, so there is no need to adopt a second stage in image fusion. Besides, as the processing differs between image fusion and de-noising, the Wiener filter is also not useful for the current coefficients. To further optimize the coefficients, a threshold shrinkage method is adopted here. In the subsequent sections, the spatial domain processing, the 3D transform and the aggregation are all introduced in detail.

3. Spatial domain processes

The proposed framework can be divided into two parts: the spatial domain section and the transform domain section; the spatial domain processing further falls into blocking and grouping. For blocking, a sliding-window method is adopted. Grouping means stacking 2D blocks with high similarity together to form 3D arrays, where similarity is measured by the block distance.

These steps turn the 2D images into 3D image arrays, and it is the potential similarity (correlation, affinity, etc.) across the blocks that is exploited in the arrays. In this way, a better estimate of each block can be obtained from data with this potential relevance. The approach of grouping low-dimensional data into higher-dimensional data sets enables high-dimensional filtering to process these data, hence it is called collaborative filtering [12].

3.1 Blocking and matching

According to a certain window size and a fixed sliding step, a series of image blocks can be obtained. We then filter out blocks within the search area in accordance with a pre-selected searching rule and threshold. The algorithm applied to each image block is illustrated in Fig. 3(a). The white boxed area, enlarged on the right, is the search area centered on the reference block (marked R), and the similar blocks (marked S) are pointed out by black dashed arrows. The block matching process is as follows (a code sketch is given after the list):

(i) Select the current block as the reference block;

(ii) Draw a fixed search area centered on the reference block (for 16×16-pixel image blocks, an 80×80-pixel search area is reasonable);

(iii) Denote each image block contained in the area as a candidate block, and calculate the distance metric between each candidate block and the reference block;

(iv) Rank the distances of all blocks in the region; the block with the smallest distance is defined as the most similar one;

(v) Compare the distance of each block with a pre-set threshold; all blocks whose distance is less than the threshold are defined as similar;

(vi) Arrange these similar blocks into an array sorted by their similarity.
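
The following is a minimal sketch of steps (i)–(vi), not the authors' implementation: the helper name match_blocks, the squared-l2 distance normalized by block area, and the threshold value are illustrative assumptions.

```python
import numpy as np

def match_blocks(img, x_ref, N=16, search=80, tau_max=2500.0):
    """Find blocks similar to the reference block whose top-left corner
    is at x_ref = (row, col), inside a search-by-search window.
    Distance: squared l2-norm divided by block area (an assumption)."""
    H, W = img.shape
    r0, c0 = x_ref
    ref = img[r0:r0 + N, c0:c0 + N].astype(np.float64)
    half = search // 2
    matches = []
    for r in range(max(0, r0 - half), min(H - N, r0 + half) + 1):
        for c in range(max(0, c0 - half), min(W - N, c0 + half) + 1):
            cand = img[r:r + N, c:c + N].astype(np.float64)
            d = np.sum((ref - cand) ** 2) / (N * N)
            if d <= tau_max:              # step (v): threshold test
                matches.append((d, (r, c)))
    matches.sort(key=lambda t: t[0])      # step (vi): sort by similarity
    return [pos for _, pos in matches]    # includes x_ref itself (d = 0)
```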

Fig.3 Schematic diagram of the procedures in spatial domain

3.2 Grouping by similarity

To better reflect similarity, a typical convention is to use the inverse of some distance measure: for image blocks, the smaller the distance to the reference block, the more similar the blocks are. Typical distance measures are norms, such as the Euclidean distance (p = 2) [17] and the lp-norm of signal fragments used in de-noising, as well as the Kullback-Leibler distance used in texture retrieval [18].

In fact, approaches for choosing similar blocks are diverse, and the task can be considered a clustering or classification problem. A series of works systematically introduces the classic methods, e.g., K-means clustering [19], fuzzy clustering [20], and vector quantization [21]. These classification approaches produce results with no cross terms, because their idea of classification is based on segmentation or partitioning; in other words, one block can only belong to a single group. To construct disjoint groups whose elements have high mutual similarity, the conventional methods require many recursive computation cycles and thus vast computing power. Moreover, such screening leads to an unequal distribution of fragments, because a fragment close to the centroid will be judged more similar than a farther one; this happens even in the exceptional circumstance that all fragments are equidistantly distributed.

The proposed matching method implements a classification of mutually similar signal fragments in which the groups may intersect. This is done by a pairwise test of the similarity between the reference and candidate blocks. In such a classification, the chosen similarity measure can be regarded as the classification function, and the chosen reference block acts as the centroid of the group. Thereby, the approach avoids the problem of enforcing disjoint groups.

The grouping and matching process is shown in Fig. 3(b). To process the whole image, we traverse all blocks using the same procedure: each block in turn is used as the reference block to find its similar blocks.

3.3 Similarity measurement

As mentioned above, the proposed framework adopts an lp-norm as the similarity measurement. The two input images A and B are processed by the same spatial-domain steps, so we only use A as an example for illustration; the final fused image is denoted by F.

The main reason for the excellent performance of BM3D is that it makes good use of the similarity of the noise signal among similar blocks. Since matching and grouping image blocks achieves a better separation of the noise signal within similar regions, the algorithm outperforms traditional ones. Thus, spatial domain information can assist transform domain processing in achieving outstanding performance. With a BM3D-like structure, i.e., introducing blocking and grouping steps before switching to the transform domain, the similarity in the spatial domain can be taken full advantage of.

3.3.1 Modeling and notation

For image A, we denote by x a 2D spatial coordinate whose values belong to the 2D image domain X ⊂ Z². Any fixed-size N×N block split out of A can then be expressed as A_x, where x is the coordinate of the top-left corner of the block; i.e., A_x is the block of image A anchored at location x. Groups of image blocks are represented as sets, denoted by a bold-face capital letter whose subscript is the set of all coordinates in the group. For example, A_S represents a 3D array composed of several blocks A_x whose positions x form the coordinate set S.

Transform-domain quantities are written with decorated letters: T_3D(·) represents the 3D transform, and a decorated letter denotes the set of coefficients after the 2D transform.

In addition, we define d as the calculated distance measure between blocks. To distinguish different parameter selections, we use the superscript "ideal" to denote the distance under an ideal condition, i.e., d^ideal, and "real" for the practical situation, i.e., d^real.

3.3.2 Block distance

As introduced in Section 3.1, the block-distance (dissimilarity) is a pairwise calculation between the reference and candidate blocks. Thus, we define A_xR as the reference block and A_xC as the currently selected candidate block, where x_R ∈ X and x_C ∈ X.

Block similarity is determined by the given reference block and a fixed threshold: a block is deemed similar when its distance to the reference block is smaller than the threshold. The distance is obtained through an l2-norm calculation between the blocks.

In an ideal situation, the block-distance of the input defocused image A should be determined by the corresponding blocks in the true-image T, i.e., as if A equaled T. Therefore, it can be calculated as
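
The formula itself is lost in this transcription; following the block-distance convention of BM3D [12,13], a plausible reconstruction (normalization by the block area N² is an assumption) is

$$ d^{\mathrm{ideal}}\left(A_{x_R}, A_{x_C}\right) = \frac{\left\| T_{x_R} - T_{x_C} \right\|_2^2}{N^2} $$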

where ‖·‖₂ denotes the l2-norm, and T_xR and T_xC denote the blocks at the corresponding locations in the true-image of the reference and candidate blocks, respectively.

Obviously, the true-image T is unavailable, and as the best estimate of T, the fused image F is equally unknown beforehand. Therefore, the distance can only be obtained from A_xR and A_xC themselves, as
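
By the same convention as above, the elided practical distance presumably replaces the true-image blocks with the observed ones:

$$ d^{\mathrm{real}}\left(A_{x_R}, A_{x_C}\right) = \frac{\left\| A_{x_R} - A_{x_C} \right\|_2^2}{N^2} $$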

However, such a calculation does not account for the difference between the ideal and the real case. If the gap between d^ideal and d^real does not exceed the threshold margin, the grouping result is unaffected; if it exceeds the boundary, a grouping error occurs. In practice, a too-small block size or sliding step, or a search area that happens to lie in the defocused region, will all cause differences between d^ideal and d^real. In such cases, a block may still be matched as similar because the real distance is smaller than the threshold even though the distance in the true-image has already crossed it; analogously, a block may be excluded as dissimilar although the ideal distance is smaller than the threshold.

To address this, we employ a coarse 2D linear prefilter [12] to preprocess the two blocks. The prefiltering applies a normalized 2D linear transform to both blocks, and a threshold is then applied to the obtained coefficients. This approach reduces false positives, and the final distance can be calculated as
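
Equation (3) is elided here; it plausibly mirrors the prefiltered distance of BM3D [12], i.e., the normalized squared l2-distance computed on prefiltered blocks:

$$ d_d\left(A_{x_R}, A_{x_C}\right) = \frac{\left\| f_{2D}\left(A_{x_R}\right) - f_{2D}\left(A_{x_C}\right) \right\|_2^2}{N^2} $$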

where f_2D(·) represents the 2D linear prefilter.

As mentioned before, the result calculated by the d_d-distance in (3) is presented in the form of a set. The set of all coordinates x of the blocks similar to the reference block A_xR can therefore be expressed as
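
A hedged reconstruction of the elided set definition, using the symbols defined above:

$$ S_{x_R} = \left\{ x \in X : d_d\left(A_{x_R}, A_x\right) \leqslant \tau_{\max} \right\} $$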

where τ_max is a threshold representing the maximum d_d-distance at which two blocks are still considered similar. The selection of τ_max is based on the acceptable ideal difference for natural images.

Since the reference block itself is also in the search area and d(A_xR, A_xR) = 0, each reference block A_xR has at least one similar block (itself), so S_xR is never empty. After obtaining the coordinate set S_xR, we use the similar blocks A_x ∈ A_S, x ∈ S_xR, to form a 3D array of size N×N×N_S, denoted A_SxR, where N_S is the number of similar blocks. We thereby obtain a collection of 3D arrays, as shown in Fig. 3(b). The length of each 3D array in the collection is not fixed, but is decided by the number of similar blocks N_S.

3.3.3 Block-matching effect

In practical applications, for natural images of 512×512-pixel resolution, it is suitable to set the sliding window between 8×8 and 16×16 pixels, so that the blocks contain enough local edge features. Besides, to reduce blocking effects, the step length of the sliding window is typically smaller than the window size. Fig. 4 shows the selection of similar blocks. In each image, the red translucent square marks the reference block and the green translucent squares mark the similar blocks found in the search area. The window size in the first row is 16×16 pixels and in the second row 8×8 pixels.

Fig.4 Illustration of the selection of similar blocks in natural images

It can be found that similar details exist extensively in natural images, in the form of small edge segments. In addition, the similar blocks are scattered around the same focal plane or at the junction of different focal planes. This can assist the subsequent algorithm in further integrating information, thus optimizing the fusion effect.

The selection of similar blocks between multi-focus image groups can be seen in Fig. 5. For the two groups of images (Clock and Pepsi), we search for similar blocks both at a focused position of one image and at the corresponding position of the defocused image. The first row of the illustration shows the focused position and the second row its defocused counterpart. All images have gone through a rigorous registration process. The window size in the left image of each group is 8×8 pixels and in the right one 16×16 pixels.

Fig.5 Illustration of the selection of similar blocks in multi-focus image groups

As can be seen, the selection of similar blocks is approximately the same between the focused region of one image and the defocused region of its counterpart. Moreover, since the similarity measures are much closer there, the defocused image usually has more similar blocks. Therefore, whether the current group represents the focused region can be determined by comparing the number of similar blocks in this group with that of its counterpart. This observation can guide subsequent work, especially the design of fusion rules.

4. Transform domain processes

After the grouping step, the transform coefficients are obtained by a 3D transform. The high- and low-frequency components are then processed by different rules, since they carry detail and approximation information respectively. Because returned blocks overlap, an aggregation process is finally used to calculate the definitive pixel value at each position.

4.1 3D transform

The 3D transform is a combination of 2D and 1D transforms: for each 2D image block in the 3D array we adopt a traditional 2D transform, followed by a 1D transform along each column of the array (i.e., the third dimension). This paper uses several 2D transforms for a comparative analysis. For the 1D transform, we adopt the DCT, since it reduces the number of significant coefficients.

4.1.1 Theory of 3D transform

Collaborative filtering, as mentioned in Section 3, is very effective for multi-focus images because of the use of spatial correlation in the filtering and the sparsity created by the shrinkage after the transform. These processes reduce the uncertainty of the fused image and create the possibility of optimizing the result.

The correlation here means both the correlation within a single image block (intra-block correlation) and within the whole group (intra-group correlation). The intra-block correlation refers to the connection between different pixel values in one block. The intra-group correlation reflects the similarity relevance of blocks and their corresponding spatial regions.

The reason for adding a 1D transform in the third dimension is to further optimize the coefficients. For the n blocks in one group, a 2D transform alone yields n·λ similar significant coefficients, where λ denotes the number of coefficients of one block. Such a method is not only less efficient but also fails to use the intra-group similarity in the transform domain. If we add a 1D transform across the transformed blocks (i.e., a 1D transform on each column of pixels at the same position in the blocks), only about λ significant coefficients remain to represent the results of the entire group. For the coefficients after the 3D transform, a shrinkage is applied before the fusion rules; to facilitate subsequent calculation, we use a hard-threshold operator to rapidly filter out the significant values.

For the 3D transform of one grouped array, the process can be divided into a 2D transform T_2D(·) followed by a 1D transform T_1D(·) across all the blocks.

The process is presented in Fig. 6, wherein the red cross-arrow marks the unfolding surfaces of the 2D transform and the one-way arrow indicates the direction of the 1D transform.

Fig.6 Schematic diagram of 3D transform
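
As a concrete illustration of this separable structure, the sketch below implements a 3D transform with a 2D DCT per block and a 1D DCT across the stack (the BMDCT variant); the function names and the choice of orthonormal DCT are assumptions, not the authors' code.

```python
import numpy as np
from scipy.fft import dctn, idctn, dct, idct

def t3d(group):
    """Separable 3D transform T_3D = T_1D . T_2D for a stack of
    N_S blocks shaped (N_S, N, N): 2D DCT on each block, then a
    1D DCT along the stacking (third) dimension."""
    coeffs = dctn(group, axes=(1, 2), norm='ortho')  # T_2D per block
    return dct(coeffs, axis=0, norm='ortho')         # T_1D across blocks

def it3d(coeffs):
    """Inverse 3D transform: undo the 1D DCT, then the 2D DCT."""
    blocks = idct(coeffs, axis=0, norm='ortho')
    return idctn(blocks, axes=(1, 2), norm='ortho')

# Round trip on a random group of eight 16x16 blocks:
group = np.random.rand(8, 16, 16)
assert np.allclose(it3d(t3d(group)), group)
```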

Since the set of grouped blocks is denoted A_SxR, its 3D transform coefficients can be expressed as
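
Consistent with the separable definition above, the elided expression is presumably

$$ \hat{A}_{S_{x_R}} = T_{3D}\left(A_{S_{x_R}}\right) = T_{1D}\left(T_{2D}\left(A_{S_{x_R}}\right)\right) $$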

The decorated letter Â_x denotes the 2D transform coefficients of the intra-group block A_x, that is, Â_x = T_2D(A_x). In the transform domain, the first step is threshold shrinkage: the coefficients are processed through a fixed hard-threshold filter f_ht(·), and then different fusion rules are applied to the high- and low-frequency components of the two coefficient groups. The hard-threshold filter may be defined as

f_ht(τ) = τ if |τ| ≥ τ_ht, and 0 otherwise,

where τ represents the current input coefficient and τ_ht is the fixed threshold parameter. The transform coefficients used by the fusion rules can then be expressed as

where the decorated coefficients are those obtained after the hard-threshold operation. To achieve a higher peak signal-to-noise ratio (PSNR) without affecting image clarity, and according to the practical applications in [12,13], we set τ_ht = 40 for 256-level gray images.

4.1.2 NSST

For the 2D transform, we compare several transforms widely used in existing fusion work, including the 2D-DCT, the DWT and the NSCT, as well as the NSST, the most effective and efficient among them, which is mainly used in this paper. The following is a brief introduction to it.

The NSST is constructed through affine systems with composite dilations. When the dimension n = 2, the affine system can be defined as follows:
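
The elided definition is presumably the standard composite-dilation affine system from the shearlet literature [9] (the exact notation is an assumption):

$$ M_{DS}(\psi) = \left\{ \psi_{j,l,k}(x) = |\det D|^{j/2}\, \psi\!\left(S^{l} D^{j} x - k\right) : j, l \in \mathbb{Z},\; k \in \mathbb{Z}^2 \right\} $$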

where ψ ∈ L²(R²), and D and S are both 2×2 invertible matrices with |det S| = 1. The matrix D is known as the dilation matrix, while S stands for the shear matrix.

If the tight-frame condition Σ_{j,l,k} |⟨f, ψ_{j,l,k}⟩|² = ‖f‖² is met for any f ∈ L²(R²), then M_DS(ψ) forms a tight frame, which can be compactly supported, and the elements of M_DS(ψ) are called composite wavelets. The elements of M_DS(ψ) are called Shearlets only when the values of D and S are defined as follows:
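
The elided matrices are presumably the standard shearlet choices of anisotropic dilation and shear:

$$ D = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}, \qquad S = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} $$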

The discretization of NSST consists of two phases: multi-scale and multi-direction decomposition. For the multi-scale decomposition, NSST adopts the non-subsampled pyramid (NSP).

Using the NSP, one low-frequency sub-band image and k high-frequency sub-band images are obtained from the source image through k levels of decomposition: each level decomposes out one low- and one high-frequency sub-image, and every subsequent decomposition is applied iteratively to the low-frequency sub-image of the previous level. The NSST decomposition process is illustrated in Fig. 7, where "SF" abbreviates "shearing filter".

The multi-direction decomposition is realized through a modified SF in NSST. Roughly speaking, the conventional SF is realized by translating a window function on the pseudo-polar grid, while the non-subsampled SF maps the pseudo-polar grid back to the Cartesian grid, so the entire process can be completed directly by 2D convolution. The support zone of an SF is a pair of trapeziform zones of size about 2^(2j) × 2^j, as shown in Fig. 8.

Fig.7 Schematic diagram of multi-scale decomposition of NSST

Fig.8 Trapeziform frequency support zones of an SF

4.2 Fusion rules

Since this paper focuses on the fusion framework, we only try some common fusion rules to integrate with it, and we do not discuss in depth the influence of different fusion rules.

As described before, the main purpose of the 1D column transform is to utilize correlation, reduce the number of significant coefficients and facilitate calculation; it does not destroy the positional distribution of the frequency components obtained by the preceding 2D multi-scale transform. The fusion takes place on the 2D surfaces of the 3D array. Hence, the high- and low-frequency components and the fusion rules described below still operate on 2D surfaces.

In addition, because the numbers of similar blocks found for corresponding reference blocks in the two images are not always equal, the smaller number is used as the final number of fused blocks. That is, if there are n blocks in A_SxR and n+m blocks in B_SxR, only the first n blocks of B_SxR are used. As mentioned in Section 3.3.3, the defocused area usually has more similar blocks; choosing the smaller number therefore provides more focus information for the rules.

4.2.1 High frequency fusion rules

High-frequency coefficients usually contain salient features such as contours and edges: the higher a high-frequency coefficient's value, the more decisively it represents change in the region. The basic high-frequency fusion rule is "choose max" (CM), which selects the larger absolute value as the result. Another rule improved from CM is "choose max by intra-scale grouping" (CMIS) [15], which introduces a rule across different decomposition levels.

(i) CM

The CM rule selects the higher-energy coefficient as the fused decomposed representation. Accordingly, the fused coefficient F_x at position (i,j) in the l-th decomposition level and the k-th sub-band can be represented as
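
The elided rule is presumably the standard absolute-maximum selection:

$$ F_x^{l,k}(i,j) = \begin{cases} A_x^{l,k}(i,j), & \left|A_x^{l,k}(i,j)\right| \geqslant \left|B_x^{l,k}(i,j)\right| \\ B_x^{l,k}(i,j), & \text{otherwise} \end{cases} $$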

where A_x^{l,k} and B_x^{l,k} denote the magnitude coefficients of the respective input blocks.

(ii) CMIS

Since each high-frequency coefficient is correlated with others across scales and directions, the simple CM rule does not combine well with the multi-scale decomposition. Therefore, all the high-frequency coefficients at different scales and directions should be compared through a composite result, that is
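
A hedged reconstruction, with the judgment condition summed over the k sub-bands as stated below:

$$ F_x^{l,k}(i,j) = \begin{cases} A_x^{l,k}(i,j), & \sum_k \left|A_x^{l,k}(i,j)\right| \geqslant \sum_k \left|B_x^{l,k}(i,j)\right| \\ B_x^{l,k}(i,j), & \text{otherwise} \end{cases} $$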

where the judgment condition is the summation over the k sub-band coefficients, which connects each decomposition level and direction in determining the fused coefficients.

4.2.2 Low frequency fusion rules

The low-frequency coefficient fusion uses two kinds of fusion rules: one is the simple averaging rule, and the other is an effective rule based on region energy.

(i) Averaging

The low-frequency coefficients reflect background information, so significant salient features may not be obtained from them even with a high-pass requirement. Hence, the averaging operation is usually used:
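
The elided formula is presumably the plain mean of the two approximation coefficients:

$$ F_x(i,j) = \frac{A_x(i,j) + B_x(i,j)}{2} $$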

where A_x and B_x denote the approximation coefficients of the respective input blocks.

(ii) Region energy

For the low-frequency component, taking only an algebraic method like averaging easily loses approximate information and causes larger gray-level differences [22]. Therefore, we adopt the fusion rule based on region energy [14]. The low-frequency sub-image of each image block is subdivided again into several 3×3 or 5×5-pixel regions, and the region energy is calculated. The region energy centered on coordinate (i,j) is denoted E_n(i,j), where n stands for coefficient A_x or B_x. The formula could be
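
A hedged reconstruction of the region energy over an N×N window, following [14]:

$$ E_n(i,j) = \sum_{p=-(N-1)/2}^{(N-1)/2} \; \sum_{q=-(N-1)/2}^{(N-1)/2} \left[\, n(i+p,\, j+q) \,\right]^2 $$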

where N is the side length of the region in pixels. Therefore, the fusion rule is
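
The elided rule presumably selects the coefficient with the larger region energy:

$$ F_x(i,j) = \begin{cases} A_x(i,j), & E_{A_x}(i,j) \geqslant E_{B_x}(i,j) \\ B_x(i,j), & \text{otherwise} \end{cases} $$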

where F_x represents the fused low-frequency coefficients, and E_Ax and E_Bx are the region energies of coefficients A_x and B_x.

4.3 Aggregation of blocks

A series of fused 3D arrays F_SxR, sharing the analogous structure of A_SxR, is obtained by the inverse 3D transform. By restoring the image blocks in these arrays to the 2D plane, the fused image F is obtained. In general, pixels will overlap between blocks; here, an averaging operation is adopted to aggregate the overlapping blocks into the 2D image.

Overlap is caused by block selection: the same image area may be selected as part of several similar blocks, and after the transform domain processing there may be variance among the corresponding pixels.

For example, F_xm is an image block located at x_m that belongs to the array F_SxM obtained from the reference block at x_M, while the same block F_xm also appears in another array F_SxN obtained from the reference block at x_N.

To resolve this, we take the mean value of the overlapping pixels as the final value. Reference [23] gives a more in-depth explanation of overlapping blocks: roughly speaking, different arrays containing overlapping image blocks are statistically correlated and biased, and each pixel involved has a different variance. In image de-noising, a weighted averaging is used, with weights inversely proportional to the total sample variance so as to down-weight noise [13]. In image fusion, however, such weighting easily leads to edge smoothing, so plain averaging is adopted. The coefficient ω_xR for each image block's pixel values is accordingly defined in terms of n_xR, the number of retained non-zero coefficients in F_SxR, so the final fused image F may be loosely expressed as

where x_M is the coordinate of an arbitrary reference block, and x_m denotes the position of a similar block included in the group located at x_M.
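
A minimal sketch of the aggregation step: fused blocks are written back to their positions while value and weight buffers accumulate, and the final image is their ratio. The helper name and the uniform default weights (plain averaging) are assumptions; per-group weights such as ones derived from n_xR could be passed in instead.

```python
import numpy as np

def aggregate(blocks, positions, shape, weights=None, N=16):
    """Return fused N x N blocks to their top-left positions (row, col)
    and resolve overlaps by a (weighted) average.  Uniform weights
    reproduce the plain averaging adopted in the text."""
    if weights is None:
        weights = [1.0] * len(blocks)     # plain averaging
    num = np.zeros(shape)                 # weighted sum of pixel values
    den = np.zeros(shape)                 # sum of weights per pixel
    for blk, (r, c), w in zip(blocks, positions, weights):
        num[r:r + N, c:c + N] += w * blk
        den[r:r + N, c:c + N] += w
    return num / np.maximum(den, 1e-12)   # avoid division by zero
```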

5. Experimental results

To provide an effective evaluation of the proposed framework, we carry out three groups of comparative experiments. Four different sets of 256-gray-level multi-focus natural images are employed, and the proposed algorithm is analyzed comparatively through subjective visual effects and objective evaluation criteria.

In the subsequent experiments, the size of the image blocks is 16×16 pixels and the step of the sliding window is 8 pixels. For the DWT, we use a three-level db2 wavelet. The decomposition level of the NSCT is 4, with 2, 8, 16 and 16 directional sub-bands at the respective levels; for the non-subsampled filter banks, we use "9-7" as the pyramid filter and "pkva" as the direction filter. For the NSST, we use a three-level multi-scale decomposition whose levels have 10, 10 and 18 directional sub-bands, with "maxflat" as the pyramid filter. For the NSCT in BMNSCT and the NSST in BMNSST, the decompositions are both two-level. All experiments are implemented on an Intel Core i5 at 2.27 GHz with 4 GB RAM; the simulation software is MATLAB 2014a.

5.1 Evaluation criteria

The experiments use image entropy (EN) [7], average gradient (AVG) [24], normalized mutual information (MI) [25], the edge-based similarity measure (Q^{AB/F}) [26], structural similarity (SSIM) [27] and standard deviation (STD) [28] as the evaluation criteria.

EN represents the richness of information.The larger the value of entropy is,the more information the image includes.

AVG is an indicator of contrast.The larger the AVG is,the more gradation the image reveals.

MI calculates how much information in source images is transferred to the fusion result.Higher MI means the fused image contains more information about the source images.

Q^{AB/F}, computed with the Sobel operator, measures how well the edges of the source images are transferred during the fusion process. A higher Q^{AB/F} value indicates that more edge information is preserved.

SSIM measures the structural similarity between fused and source images.An SSIM value which is closer to 1 means a better fusion.

STD indicates the distribution of pixels.The larger the STD is,the more discretely the pixel values distribute,the more information the image contains.

5.2 Experiment with DCT

The first experiment compares DCT with BMDCT, to show the advantages of the proposed framework in transform domain fusion. DCT serves as the control group and BMDCT as the experimental group. The results are shown in Fig. 9.

As can be seen from Fig. 9, especially from the enlarged view, BMDCT has obvious advantages over DCT. The proposed framework significantly weakens the artificial texture that appears in the DCT fusion, making the BMDCT result smoother, flatter and more natural.

Fig.9 Source images and fusion results of DCT and BMDCT

5.3 Experiments with different transforms

This experiment evaluates the proposed framework together with various transforms. Four improved algorithms are used: BMDCT, BMDWT, BMNSCT and BMNSST, with CM and averaging as the fusion rules. The pair of source images is shown in Fig. 10(a) and Fig. 10(b).

In Fig. 11, Fig. 11(a) is the result of BMDCT, Fig. 11(b) of BMDWT, Fig. 11(c) of BMNSCT and Fig. 11(d) of BMNSST; Fig. 11(e) and Fig. 11(i) are the difference maps of Fig. 11(a) against Fig. 10(a) and Fig. 10(b), and the same pattern applies to the next three columns.

Fig.10 Source images of experiment

Fig.11 Fusion results of different transforms on source image“Lab”

In terms of subjective visual effects, among the four transforms with block matching and 3D transform, the BMNSST approach is the best, owing to its clearer edges, more abundant textures and better-retained details. The difference maps show that BMNSST achieves not only the best integration of the focused area but also the fewest artificial textures and blocking effects, followed by BMNSCT.

We also examine the transforms through objective criteria. As shown in Table 1, the BMNSST fusion has the best scores on EN, STD, MI and Q^{AB/F}, and the second best on AVG. The fused image of BMNSST therefore retains the maximum amount of information from the source images.

Table 1 Objective criteria comparison of different fusion algorithms with different transforms on source image “Lab”

Method    EN       STD       AVG      MI       Q^{AB/F}   SSIM
BMDCT     7.0281   46.7941   3.0706   5.4234   0.4696     0.8563
BMDWT     7.0399   47.3108   3.7070   7.0899   0.7189     0.8958
BMNSCT    7.0687   48.0977   3.7741   7.1964   0.7203     0.8940
BMNSST    7.0701   48.2425   3.7678   7.2076   0.7292     0.8928

5.4 Comparing with classic methods

This experiment uses two groups of images to compare some classic fusion algorithms with the BMDWT, BMNSCT and BMNSST algorithms using the improved fusion rules, i.e., CMIS and region energy (RE). It is thus a horizontal comparison between the best combinations in this paper (e.g., BMNSST-CMIS) and some existing transform domain fusion methods (e.g., DWT-MAX, NSCT-MAX and NSST-MAX).

The first pair of source images, "Pepsi", is shown in Fig. 10(c) and Fig. 10(d), and the second pair, "Clock", in Fig. 10(e) and Fig. 10(f). The experimental results for "Pepsi" can be seen in Fig. 12: Fig. 12(a) is the result of DWT-MAX; Fig. 12(b)–(f) are the results of NSCT-MAX, NSST-MAX, BMDWT-CMIS, BMNSCT-CMIS and BMNSST-CMIS respectively; Fig. 12(g) and Fig. 12(m) are the difference maps of Fig. 12(a) against the source images, and the same pattern applies to the next five columns. Correspondingly, the results for "Clock" can be seen in Fig. 13.

Fig.12 Fusion effects of each transform domain method on source image “Pepsi”

Fig.13 Fusion effects of each transform domain method on source image“Clock”

As can be seen from the subjective visual effects, the proposed algorithm (BMNSST-CMIS) performs better on edge details than the existing algorithms, and some salient features in its result are clearer. The comparison of the difference maps shows that the fused image of the proposed algorithm is more similar to the source images, i.e., the proposed method better restores the focused areas of the source images. Besides, BMNSCT and BMDWT also perform better than their original transforms.

In terms of objective criteria, the results for "Pepsi" and "Clock" are listed in Table 2 and Table 3 respectively. Compared with the existing algorithms, the proposed algorithm performs relatively well on four of the six evaluation indexes. Especially on EN and MI, BMNSCT-CMIS and BMNSST-CMIS improve significantly over NSCT-MAX and NSST-MAX, which shows that their results are more similar to both input images in edge structure.

Table 2 Objective criteria comparison of different transform domain fusion algorithms on source image “Pepsi”


Table 3 Objective criteria comparison of different transform domain fusion algorithms on source image “Clock”


6. Conclusions

In this paper, a multi-focus image fusion framework based on block matching and 3D transform is proposed. Compared with existing frameworks, the use of blocking and grouping makes it possible to further exploit spatial domain correlation within transform domain fusion. The algorithm forms similar blocks into 3D arrays through block-matching steps; a 3D transform consisting of a 2D and a 1D transform then converts the blocks into transform coefficients, which are processed by fusion rules. The final fused image is obtained from the series of fused 3D image block groups after the inverse transform, using an aggregation process. Experimental results show that the proposed algorithm outperforms traditional algorithms in both qualitative and quantitative evaluations. However, owing to the extensive blocking and matching work, the efficiency of the algorithm remains to be improved, so reducing the time complexity will be a main direction of future research. Besides, the fusion rules are not discussed in depth in this paper and also require further study.

References

[1] HAGHIGHAT M B A, AGHAGOLZADEH A, SEYEDARABI H. Multi-focus image fusion for visual sensor networks in DCT domain. Computers & Electrical Engineering, 2011, 37(5): 789–797.

[2] ZHANG Z, BLUM R S. A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application. Proceedings of the IEEE, 1999, 87(8): 1315–1326.

[3] PAJARES G, CRUZ J M D L. A wavelet-based image fusion tutorial. Pattern Recognition, 2004, 37(9): 1855–1872.

[4] CANDES E J. Ridgelets: theory and applications. Stanford, USA: Stanford University, 1998.

[5] COHEN R A, SCHUMAKER L L. Curves and surfaces. Nashville: Vanderbilt University Press, 2000.

[6] DO M N, VETTERLI M. The Contourlet transform: an efficient directional multi-resolution image representation. IEEE Trans. on Image Processing, 2005, 14(12): 2091–2106.

[7] ZHANG Q, GUO B L. Multifocus image fusion using the nonsubsampled contourlet transform. Signal Processing, 2009, 89(7): 1334–1346.

[8] WANG J, PENG J Y, FENG X Y, et al. Image fusion with nonsubsampled contourlet transform and sparse representation. Journal of Electronic Imaging, 2013, 22(4): 043019.

[9] GUO K, LABATE D. Optimally sparse multidimensional representation using shearlets. SIAM Journal on Mathematical Analysis, 2007, 39(1): 298–318.

[10] NUNEZ J, OTAZU X, FORS O, et al. Multiresolution-based image fusion with additive wavelet decomposition. IEEE Trans. on Geoscience and Remote Sensing, 1999, 37(3): 1204–1211.

[11] GENG P, WANG Z Y, ZHANG Z G, et al. Image fusion by pulse couple neural network with shearlet. Optical Engineering, 2012, 51(6): 067005-1–067005-7.

[12] DABOV K, FOI A, KATKOVNIK V, et al. Image denoising with block-matching and 3D filtering. Proc. of SPIE-IS&T Electronic Imaging: Algorithms and Systems V, 2006, 6064: 606414-1–606414-12.

[13] DABOV K, FOI A, KATKOVNIK V, et al. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. on Image Processing, 2007, 16(8): 2080–2095.

[14] TIAN J, CHEN J, ZHANG C. Multispectral image fusion based on fractal features. Proceedings of SPIE, 2004, 5308: 824–832.

[15] BHATNAGAR G, WU Q M J, LIU Z. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Trans. on Multimedia, 2013, 15(5): 1014–1024.

[16] KUMAR M, DASS S. A total variation-based algorithm for pixel-level image fusion. IEEE Trans. on Image Processing, 2009, 18(9): 2137–2143.

[17] BUADES A, COLL B, MOREL J M. A review of image denoising algorithms, with a new one. SIAM Journal on Multiscale Modeling and Simulation, 2005, 4(2): 490–530.

[18] DO M N, VETTERLI M. Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Trans. on Image Processing, 2002, 11(2): 146–158.

[19] MACQUEEN J B. Some methods for classification and analysis of multivariate observations. Proc. of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 1967: 281–297.

[20] HÖPPNER F, KLAWONN F, KRUSE R, et al. Fuzzy cluster analysis. Chichester: Wiley, 1999.

[21] GERSHO A. On the structure of vector quantizers. IEEE Trans. on Information Theory, 1982, 28(2): 157–166.

[22] JIANG P, ZHANG Q, LI J, et al. Fusion algorithm for infrared and visible image based on NSST and adaptive PCNN. Laser and Infrared, 2014, 44(1): 108–112. (in Chinese)

[23] GULERYUZ O. Weighted overcomplete denoising. Proc. of the 37th Asilomar Conference on Signals, Systems and Computers, 2003, 2: 1992–1996.

[24] LIU S, ZHU Z, LI H, et al. Multi-focus image fusion using self-similarity and depth information in nonsubsampled shearlet transform domain. International Journal of Signal Processing, Image Processing and Pattern Recognition, 2016, 9(1): 347–360.

[25] QU G H, ZHANG D L, YAN P F. Information measure for performance of image fusion. Electronics Letters, 2002, 38(7): 313–315.

[26] XYDEAS C S, PETROVIC V. Objective image fusion performance measure. Electronics Letters, 2000, 36(4): 308–309.

[27] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans. on Image Processing, 2004, 13(4): 600–612.

[28] MIAO Q G, SHI C, XU P F, et al. Multi-focus image fusion algorithm based on shearlets. Chinese Optics Letters, 2011, 9(4): 25–29.

YANG Dongsheng, HU Shaohai, LIU Shuaiqi, MA Xiaole, SUN Yuchao
Journal of Systems Engineering and Electronics, 2018, No. 2
