An Extended Version of the NLMF Algorithm Based on Proportionate Krylov Subspace Projections

Yasin Yilmaz
Koc University, Istanbul, Turkey (email: yayilmaz@ku.edu.tr)

Suleyman S. Kozat
Koc University, Istanbul, Turkey (email: skozat@ku.edu.tr)
Abstract

The Krylov proportionate normalized least mean square (KPNLMS) algorithm extended the proportional-update idea of the PNLMS (proportionate normalized LMS) algorithm to non-sparse (dispersive) systems. This paper deals with the minimization of the mean fourth error and proposes the Krylov proportionate normalized least mean fourth (KPNLMF) algorithm. First, the PNLMF (proportionate NLMF) algorithm is derived; then, the Krylov subspace projection technique is applied to the PNLMF algorithm to obtain the KPNLMF algorithm. While fully exploiting the fast convergence property of the PNLMF algorithm, the system to be identified does not need to be sparse in the KPNLMF algorithm due to the Krylov subspace projection technique. In our simulations, the KPNLMF algorithm converges faster than the KPNLMS algorithm when both algorithms converge to the same system mismatch value. The KPNLMF algorithm achieves this without any increase in computational complexity. Further numerical examples comparing the KPNLMF with the NLMF and the KPNLMS algorithms support the fast convergence of the KPNLMF algorithm.
Keywords: Krylov subspaces, NLMF, proportional update
I. INTRODUCTION
In this paper, we introduce two novel adaptive filtering algorithms and investigate their performance on the system identification task. System identification is chosen for the clarity of the problem statement; in machine learning terms, it is simply supervised linear regression. The first algorithm we introduce is the proportionate normalized least mean fourth (PNLMF) algorithm. In developing the PNLMF algorithm, we are inspired by the proportionate normalized least mean squares (PNLMS) algorithm presented by Duttweiler in [1]. The PNLMF algorithm is the mean fourth error version of the PNLMS algorithm. The PNLMS algorithm has been proposed for the identification of sparse systems and is known to exhibit faster convergence than the standard NLMS algorithm in certain setups [1]. We derive the PNLMF algorithm from the PNLMS algorithm in the same way the LMF algorithm is obtained from the LMS algorithm. We note that for the derivation of the PNLMF algorithm, the improved PNLMS (IPNLMS) algorithm [2] is used instead of the original PNLMS algorithm from [1].

The second algorithm presented in this paper is the Krylov proportionate normalized least mean fourth (KPNLMF) algorithm. Here, the Krylov subspace projection technique is incorporated within the framework of the PNLMF algorithm. The KPNLMS algorithm introduced in [3] extends the use of the PNLMS algorithm to the identification of dispersive systems by benefiting from the Krylov subspace projection technique. The KPNLMF algorithm inherits the advantageous features of the KPNLMS algorithm for dispersive systems [3]. In addition, the KPNLMF algorithm is shown to outperform the KPNLMS and NLMF algorithms with numerical examples in Section IV.

Both algorithms are studied for the system identification problem throughout the paper. We particularly investigate the system identification framework since signal processing problems such as noise canceling, echo canceling, and channel equalization share the same system setup. In this framework, an unknown system is modeled adaptively by minimizing a certain statistical measure of the error between the output of the system to be identified and the output of the model system. We emphasize that although minimizing the mean square error is the most widely used criterion because of its tractability and simple analysis, there are other ways to minimize the estimation error. The expected value of the fourth power of the error is a popular alternative in adaptive filtering due to its sensitivity to long-tailed noise distributions. A modified steepest descent algorithm for the mean fourth error case is studied, and the least mean fourth (LMF) algorithm is proposed as an adaptive filtering technique, in [4]. The PNLMF and KPNLMF algorithms are derived starting from the LMF algorithm in Section III. Section IV contains the simulation results for the sample cases, followed by the conclusion in Section V.

II. SYSTEM DESCRIPTION
In this paper, all vectors are column vectors denoted by boldface lowercase letters, (·)^T is the transpose operation, ‖·‖ is the l_2 norm, and |·| is the absolute value. For a vector x, x(i) is the i-th entry. Matrices are denoted by boldface capital letters. For a random variable x (or a random vector x), E[x] is the expectation.

2009 International Conference on Machine Learning and Applications, 978-0-7695-3926-3/09 $26.00 © 2009 IEEE, DOI 10.1109/ICMLA.2009.47

Figure 1. Block diagram of system identification

In this paper, we consider the system identification task presented in Fig. 1. In this figure,
x_t ∈ R^m is the input regressor with zero mean and covariance matrix R = E[x_t x_t^T]. With this input regressor, the output of the desired unknown system is given by

    d_t = w_o^T x_t + v_t,   t ∈ N,                                   (1)

where w_o ∈ R^m is the coefficient vector of the unknown system to be identified. Here, v_t is i.i.d. noise with zero mean and variance σ_v^2. We assume that the input regressor and the noise signal are uncorrelated. The input regressor x_t and the output signal d_t are available to estimate the filter coefficients of the unknown system. Given the input regressor, the estimate of the desired signal is given by

    d̂_t = w_t^T x_t,   t ∈ N,                                        (2)

where w_t = [w_t^(1), w_t^(2), ..., w_t^(m)]^T is the adaptive weight vector that estimates w_o.

In this framework, our aim is to minimize a specific statistical measure of the error between the desired signal d_t and the estimate d̂_t produced by the adaptive algorithm, i.e., e_t = d_t − d̂_t. For this, we minimize the mean fourth power of the estimation error via the LMF adaptive algorithm. The mean square error (MSE) and the mean fourth error (MFE) are given by

    MSE = E[e_t^2],                                                   (3)
    MFE = E[e_t^4],

respectively. Given this setup, in the next section we introduce two different adaptive algorithms constructed based on the MFE criterion using the proportionate update and Krylov-subspace-based projections.

Here,
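As a concrete illustration, the data model in (1)-(2) can be simulated in a few lines. This is only a sketch of the setup: the filter length, noise level, and random seed below are our illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m = 8                                  # filter length (illustrative)
w_o = rng.standard_normal(m)           # unknown system coefficients w_o
sigma_v = 0.1                          # noise std, so variance sigma_v^2

x_t = rng.standard_normal(m)           # zero-mean input regressor x_t
v_t = sigma_v * rng.standard_normal()  # i.i.d. observation noise v_t
d_t = w_o @ x_t + v_t                  # desired output, Eq. (1)

w_t = np.zeros(m)                      # adaptive weight vector w_t
d_hat_t = w_t @ x_t                    # filter estimate, Eq. (2)
e_t = d_t - d_hat_t                    # estimation error e_t
```

With the weights initialized to zero, the first error sample simply equals the desired output; the adaptive algorithms in the next section drive this error down over the iterations.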
R = E[x_t x_t^T] is the autocorrelation matrix of the input regressor x_t, and p = E[d_t x_t] is the cross-correlation vector between the input regressor x_t and the output d_t at time t. Both R and p, which are statistical measures of the input and the output, can be treated as known parameters for the exactness of the algorithm; otherwise, they can be estimated at the beginning of the algorithm. We assume they are known beforehand for the sake of simplicity.

III. PROPOSED ADAPTIVE FILTERING ALGORITHM
In this section, we first derive the update procedure for the filter coefficients of the PNLMF algorithm, inspired by the IPNLMS algorithm. Then, the Krylov subspace projection technique is incorporated within the PNLMF framework to yield the KPNLMF algorithm. We note that this order of construction is chosen for clarity of presentation; it is also the chronological order of the previously developed algorithms following [1], [3]. However, the logical flow of the proposed algorithms runs in the reverse direction: the unknown system is thought of as projected onto the Krylov subspace in order to attain "sparseness", and then the PNLMF algorithm is applied to the resulting system. We try to give further intuition in the following subsections.
A. Derivation of the PNLMF algorithm
The standard LMS algorithm updates the filter coefficients as

    w_(t+1) = w_t + 2μ e_t x_t,                                       (4)

where 2 e_t x_t is a stochastic approximation of the negative gradient −∇_(w_t) E[e_t^2]. Here, μ is the step size of the update, usually constrained to 0 < μ < 1/(m E[x_t^2]), where m is the filter length. The normalized LMS (NLMS) algorithm is introduced in order to alleviate the dependence of the standard LMS algorithm on the statistics of the input data. The NLMS algorithm uses the update equation

    w_(t+1) = w_t + 2μ e_t x_t / (‖x_t‖^2 + ε),                       (5)

where ε
is a small regularization factor. In [4], instead of the approximate gradient of E[e_t^2], the approximate gradient of E[e_t^4] is used to construct the update for the LMF algorithm as

    w_(t+1) = w_t + 4μ e_t^3 x_t,                                     (6)

where μ is the same step size as in the LMS algorithm. Following this, the NLMF algorithm readily comes out to be

    w_(t+1) = w_t + 4μ e_t^3 x_t / (‖x_t‖^2 + ε),                     (7)

as stated in [5]. We emphasize that, by using the fourth power of the error signal in the update, the LMF algorithm is shown to yield faster convergence at the start of adaptation or with long-tailed noise distributions [4].
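The NLMF update in (7) can be sketched as follows. The function and parameter names, and the default step-size and regularization values, are our own illustrative choices rather than values from [5].

```python
import numpy as np

def nlmf_update(w, x, d, mu=0.01, eps=1e-4):
    """One NLMF iteration, Eq. (7): w <- w + 4*mu*e^3*x / (||x||^2 + eps)."""
    e = d - w @ x                 # estimation error e_t
    return w + 4 * mu * e**3 * x / (x @ x + eps), e

# A single step moves the filter toward the desired response:
w = np.zeros(2)
x = np.array([1.0, 0.0])
w_next, e = nlmf_update(w, x, d=1.0, mu=0.05)
# |d - w_next @ x| is now smaller than the pre-update error |e|
```

Note the cubed error: when the error is large (e.g., early in adaptation), the step is much more aggressive than in NLMS, which is the source of both the fast initial convergence and the greater divergence risk discussed later in the paper.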
We next incorporate the proportional update rule [2] into the NLMF framework. The motivation behind the IPNLMS algorithm is to update each filter coefficient individually with a different step size. Each filter coefficient is updated according to the absolute value of the current filter coefficient in a proportional manner, which is where the name proportionate stems from. In this sense, by using an update proportional to the absolute values of the current filter coefficients, the IPNLMS algorithm distinguishes between frequently used, rarely used, and unused coefficients, and updates them separately as follows:
    w_(t+1) = w_t + 2μ e_t G_t x_t / (x_t^T G_t x_t + ε),             (8)
    G_t = diag(g_t^1, g_t^2, ..., g_t^m),
    g_t^k = (1 − η)(1/m) + η |w_t^(k)| / (‖w_t‖_1 + δ),   t ∈ N,  k ∈ [1, ..., m],
where
η
is the proportionality factor [2]. This idea of using a proportional update is applied to the NLMF algorithm following the IPNLMS algorithm. For the PNLMF algorithm, the matrix multiplying the individual filter coefficients, i.e., G_t, remains unchanged, while only the first equation in (8) is changed to

    w_(t+1) = w_t + 4μ e_t^3 G_t x_t / (x_t^T G_t x_t + ε).           (9)

In the next subsection, we extend the PNLMF algorithm using Krylov subspaces.
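The PNLMF step in (9), together with the IPNLMS gains from (8), can be sketched as below. The helper names and default constants (η = 0.5, δ, ε) are illustrative assumptions on our part.

```python
import numpy as np

def ipnlms_gains(w, eta=0.5, delta=1e-4):
    """Diagonal of G_t, Eq. (8): g^k = (1-eta)/m + eta*|w^(k)| / (||w||_1 + delta)."""
    m = len(w)
    return (1 - eta) / m + eta * np.abs(w) / (np.linalg.norm(w, 1) + delta)

def pnlmf_update(w, x, d, mu=0.05, eta=0.5, eps=1e-4):
    """One PNLMF iteration, Eq. (9)."""
    e = d - w @ x
    gx = ipnlms_gains(w, eta) * x     # G_t x_t (G_t is diagonal, so a vector suffices)
    return w + 4 * mu * e**3 * gx / (x @ gx + eps), e
```

Since G_t is diagonal, it is stored and applied as a vector, keeping the per-iteration cost linear in m. The gains sum to roughly one, so coefficients that are currently large receive a proportionally larger share of the step size.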
B. Projection over Krylov subspace
It has been shown in [2] through simulations that the IPNLMS algorithm outperforms the NLMS algorithm in the system identification task when the underlying unknown channel is sparse. We project the unknown system, which is potentially dispersive, onto the Krylov subspace to benefit from this advantage of the IPNLMS algorithm on sparse systems. In [3], the author demonstrates that projecting the impulse response of the unknown system onto the Krylov subspace yields a sparse representation if the input regressor is nearly white, i.e., E[x_t x_t^T] ≈ I. More precisely, if the autocorrelation matrix R = E[x_t x_t^T] of the input regressor x_t has clustered eigenvalues, or a condition number close to one, then any unknown system will have a sparse representation in the new Krylov subspace coordinates. In case the input is not white, a preconditioning process can be applied to the input regressor before applying the KPNLMF algorithm.

Since we work in the new Krylov subspace coordinates, where our unknown system has a sparse structure, the PNLMF algorithm is altered to work in the new coordinates. This algorithm will be called the Krylov PNLMF (KPNLMF) algorithm and has the update
    w_(t+1) = w_t + 4μ e_t^3 Q G_t Q^T x_t / (x_t^T Q G_t Q^T x_t + ε),   (10)

where the orthogonal matrix
Q
represents the Krylov subspace coordinates. The columns of the matrix Q form a set of orthonormal basis vectors for the Krylov subspace, which is spanned by the Krylov vectors

    p, Rp, R^2 p, ..., R^(m−1) p.                                     (11)

The orthogonal matrix Q is formed by orthonormalizing the Krylov vectors in (11). This orthonormalization can be performed via the well-known Gram-Schmidt method
. Note that since we now work in the space spanned by the columns of Q, the step size matrix is updated accordingly. Defining the projected weight vector w̃_t = Q^T w_t, the new scaling matrix is defined as
    G_t = diag(g_t^1, g_t^2, ..., g_t^m),                             (12)
    g_t^k = (1 − η)(1/m) + η |w̃_t^(k)| / (‖w̃_t‖_1 + δ),   t ∈ N,  k ∈ [1, ..., m].
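Putting (10)-(12) together, a full-complexity KPNLMF iteration might be sketched as follows. This is an illustration under our own naming; we use NumPy's QR factorization in place of explicit Gram-Schmidt, which yields an equivalent orthonormal basis (possibly with sign flips, which the absolute values in (12) make irrelevant).

```python
import numpy as np

def krylov_basis(R, p):
    """Orthonormal basis Q for span{p, Rp, ..., R^(m-1)p}, Eq. (11)."""
    m = len(p)
    K = np.empty((m, m))
    v = p.astype(float)
    for j in range(m):
        K[:, j] = v
        v = R @ v                     # next Krylov vector
    Q, _ = np.linalg.qr(K)            # orthonormalize (Gram-Schmidt equivalent)
    return Q

def kpnlmf_update(w, x, d, Q, mu=0.05, eta=0.5, eps=1e-4, delta=1e-4):
    """One KPNLMF iteration, Eqs. (10)-(12)."""
    e = d - w @ x
    w_tilde = Q.T @ w                 # weights in Krylov coordinates
    m = len(w)
    g = (1 - eta) / m + eta * np.abs(w_tilde) / (np.linalg.norm(w_tilde, 1) + delta)
    u = Q @ (g * (Q.T @ x))           # Q G_t Q^T x_t, applied without forming the matrix
    return w + 4 * mu * e**3 * u / (x @ u + eps), e
```

Note that applying Q G_t Q^T as three matrix-vector products already avoids forming the m-by-m matrix; the O(m) scheme described next reduces the cost further by computing only b of the m gains.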
Obtaining G_t and forming Q G_t Q^T at every iteration is expensive when the formula for G_t in (12) is used directly; each iteration then has a computational complexity of O(m^2). To attain linear computational complexity per iteration, only a few components of the diagonal matrix G_t are actually computed from the formula for g_t^k in (12). Suppose only the first b of the m g_t values are computed. It has been demonstrated in [3] that such a limited number of coefficients, i.e., b ≪ m, is usually enough to represent w_o in the projected subspace. The remaining (m − b) g_t values are set to a small constant value, since they correspond to the rarely used coefficients. Similarly, in the formation of Q, only the first b Krylov vectors are orthonormalized or constructed, and the remaining (m − b) orthonormal vectors needed to span the complete m-dimensional Krylov subspace are generated randomly. Details of the above procedure for attaining linear computational complexity can be found in [3].

IV. SIMULATION RESULTS
This section contains numerical examples comparing the performance of the KPNLMF, NLMF, and KPNLMS algorithms. The system to be identified has a random dispersive impulse response of length m in the examples, where m = 50 in the comparison between the NLMF and KPNLMF algorithms. A different system with a short random impulse response, m = 5, is used in the comparison between the KPNLMS and KPNLMF algorithms to test the KPNLMF algorithm with different filter lengths. Here, b = 4 and b = 2 are used for the NLMF-KPNLMF (m = 50) and the KPNLMS-KPNLMF (m = 5) comparisons, respectively. Independent trials are performed with a uniformly distributed, zero-mean noise process and white inputs. We set SNR = 10 log_10(E[x_t^2] / E[v_t^2]) to 20 dB. The mean square error and the system mismatch (normalized weight error power), averaged over 250 trials, are plotted as performance measures. The figures are in dB scale. As the regularization constant, we use ε = 0.0001 for all algorithms, and η = 0.5 for the KPNLMS and KPNLMF algorithms.

Fig. 2 compares the KPNLMF and NLMF algorithms in terms of the MSE. The KPNLMF algorithm surpasses the NLMF algorithm in MSE performance. The two methods are also compared in the system mismatch setup, and again the KPNLMF algorithm yields considerably superior performance with respect to the NLMF algorithm. The system mismatch performances of both algorithms are shown in Fig. 3.

The KPNLMF and KPNLMS algorithms are compared in terms of the system mismatch (average normalized weight error power). In Fig. 4, the KPNLMF algorithm settles into the steady state earlier than the KPNLMS algorithm when their steady-state system mismatch values are the same. We note that the KPNLMS algorithm can converge much faster than the KPNLMF algorithm when the step size μ is increased as much as possible. However, the KPNLMS algorithm attains this fast convergence at the expense of a high steady-state mismatch value; Fig. 5 summarizes this fact: the KPNLMS algorithm converges faster than the KPNLMF algorithm, but to a much higher system mismatch value. The step size μ, and hence the convergence rate, of the KPNLMF algorithm cannot be increased as much as for the KPNLMS algorithm, because the KPNLMF algorithm always carries a higher risk of divergence. This is due to the use of the fourth power of the error in the KPNLMF algorithm, as opposed to the square of the error in the KPNLMS algorithm.
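For reference, the two performance measures reported in this section can be computed as follows. These small helpers are our own illustration of the standard definitions; the paper additionally averages the resulting curves over 250 independent trials.

```python
import numpy as np

def mse_db(errors):
    """Mean square error in dB over a batch of error samples."""
    return 10 * np.log10(np.mean(np.asarray(errors) ** 2))

def mismatch_db(w, w_o):
    """System mismatch (normalized weight error power) in dB:
    10*log10(||w - w_o||^2 / ||w_o||^2)."""
    return 10 * np.log10(np.sum((w - w_o) ** 2) / np.sum(w_o ** 2))

w_o = np.array([1.0, -2.0, 0.5])
# a filter that has captured 90% of every coefficient sits at -20 dB mismatch:
mm = mismatch_db(0.9 * w_o, w_o)   # = 10*log10(0.01) = -20 dB, up to float rounding
```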
Figure 2. MSE comparison between KPNLMF and NLMF (MSE in dB vs. iterations)

Figure 3. System mismatch comparison between KPNLMF and NLMF (weight error power in dB vs. iterations)

Figure 4. System mismatch comparison between KPNLMF and KPNLMS (weight error power in dB vs. iterations)
V. CONCLUSION

The proportionate and Krylov-proportionate normalized least mean fourth algorithms are proposed in this paper. It is shown through simulations that the KPNLMF algorithm outperforms the NLMF algorithm in convergence while having only linear computational complexity. The KPNLMF algorithm performs better than the NLMF algorithm since it benefits from both the sparse structure of the Krylov-projected unknown system and the proportional update technique. Although the KPNLMF algorithm is similar to the KPNLMS algorithm, simulation results demonstrate that the KPNLMF algorithm converges faster than the KPNLMS algorithm when both converge to the same system mismatch value.
Figure 5. System mismatch comparison between KPNLMF and KPNLMS (weight error power in dB vs. iterations)
REFERENCES

[1] D. L. Duttweiler, "Proportionate normalized least-mean-squares adaptation in echo cancelers," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 508-518, Sep. 2000.
[2] J. Benesty and S. Gay, "An improved PNLMS algorithm," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2002, pp. 1881-1884.
[3] M. Yukawa, "Krylov-proportionate adaptive filtering techniques not limited to sparse systems," IEEE Trans. Signal Process., vol. 57, no. 3, Mar. 2009.
[4] E. Walach and B. Widrow, "The least mean fourth (LMF) adaptive algorithm and its family," IEEE Trans. Inf. Theory, vol. 30, no. 2, Mar. 1984.
[5] A. Zerguine, "Convergence behavior of the normalized least mean fourth algorithm," in Conf. Record of the Thirty-Fourth Asilomar Conf. on Signals, Systems and Computers, 2000.