BOUNDS ON DIVERGENCE IN NONEXTENSIVE STATISTICAL MECHANICS

We focus on bounds, an important property of the generalized Kullback-Leibler divergence used in nonextensive statistical mechanics. We explicitly show upper and lower bounds on it in terms of existing familiar divergences, based on the finite range of the probability distribution ratio. This provides a link between observed distribution functions based on histograms of events and parameterized distance measures in the physical sciences. Values of the characterizing parameter with $q < 0$ and $q > 1$ are excluded from the consideration of bounded divergence.


T. Yamano
2000 Mathematics Subject Classification: 62B10, 94A15, 94A17. Received: 03-12-2011, revised: 28-10-2012, accepted: 10-06-2013.

INTRODUCTION
The current upsurge of interest in divergence measures determined by two probability distributions is due to both their usefulness and their necessity for the practical discrimination of different states, and also for discovering how much they differ from each other. Such scenarios appear in many areas that use statistical methods. Especially in statistical physics, the H-theorem is the notion most relevant to divergence measures, which probe proximity toward a stationary distribution in the course of a given dynamics. Usually it is specified in terms of the Kullback-Leibler divergence (or relative entropy) [11]. For Markovian processes, however, the proof of the H-theorem is valid for a wide class of divergences (the Csiszár-Morimoto f-divergences) [2,15]. For specific forms of the generalized divergence, it is presented in [1,18,24]. Historically, an attempt to build a parameterized divergence measure in a statistical mechanics context was presented in [14] without using a notion of averaged information; there, instead of the term 'divergence', the phrase 'a relative degradation function of nth order' was used.
The study of the properties that arise upon generalization of the conventional relative entropy is becoming an interesting research topic in its own right, because generalizations of a conventional measure provide insight into the original one. Among these properties, the ranges or bounds of divergences can be regarded as fundamental, since they contain structural information reflecting the geometric manifold governed by the parameter used in the generalization. The availability of bounds for the distance is also important in physics and in statistical inference tests, where a bound can be used to give an estimate for specific states (usually equilibrium states). Therefore, upper and lower bounds for distance measures can in general be useful information and provide a clue for interpreting the meaning of the parameter.
The purpose of the present article is to provide such bounds for the Tsallis relative entropy, which have not been presented in the literature [23] so far. The approach here is based on the fact that a detector which produces statistical distributions of occurred events has a finite dynamic range and consequently a finite probability distribution ratio. Therefore, the bounds obtained should be more relevant from an observational point of view. Our presentation proceeds as follows. First, we review the definition of the Tsallis relative entropy. Then we consider bounds on it in terms of the usual Kullback-Leibler divergence, which indicate the degree of change induced by the generalization parameter $q$. We next consider the upper bound in terms of the $l_1$-norm and the lower bound in terms of it (the so-called Pinsker-like inequality). Bounds in terms of Hellinger's distance and the $\chi^2$-divergence are presented as simple applications of inequalities that hold for generic f-divergences. We summarize our considerations and present a discussion in the last section.

THE NONEXTENSIVE RELATIVE ENTROPY
The generalization of the relative entropy that is consistent with the nonextensive entropy of Tsallis [21] is provided by taking the linear mean (the so-called f-divergence [2]) of the corresponding distance measure $f$ between two probability distributions [22],

$$D_q(p\|r) = \sum_i r_i f(t_i), \qquad f(t) = \frac{t^q - t}{q-1},$$

where $t_i$ denotes the ratio of the two probability distributions, i.e., $t_i = p_i/r_i$, throughout this paper. It was provided in [22] in the context of consistent testing, and some of its properties were investigated in [1]. This generalization keeps the nonextensive thermostatistics picture [18] and belongs to the relative information of type $s$, which was proposed in [17]. When $\operatorname{supp} p \not\subseteq \operatorname{supp} r$, where $\operatorname{supp} p = \{\omega \in \Omega;\ p(\omega) > 0\}$ in a σ-finite measure space $\Omega$, divergences become infinite, and this applies to the later considerations [25]. Alternatively, $D_q(p\|r)$ is produced by taking a biased average, with weights $p_i^q$, of the quantity $(p_i^{1-q} - r_i^{1-q})/(1-q)$. This can be expressed as

$$D_q(p\|r) = \sum_i p_i^q\, \frac{p_i^{1-q} - r_i^{1-q}}{1-q}.$$
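As a minimal numerical sketch of this definition, the following code assumes the f-divergence form $D_q(p\|r) = \sum_i r_i f(p_i/r_i)$ with $f(t) = (t^q - t)/(q-1)$ and compares it with the biased-average form $\sum_i p_i^q (p_i^{1-q} - r_i^{1-q})/(1-q)$. The function names and the example distributions are hypothetical, chosen only for illustration.

```python
import math

def tsallis_relative_entropy(p, r, q):
    """f-divergence form: sum_i r_i * f(p_i / r_i), f(t) = (t**q - t) / (q - 1), q != 1."""
    return sum(ri * ((pi / ri) ** q - pi / ri) / (q - 1.0) for pi, ri in zip(p, r))

def tsallis_biased_average(p, r, q):
    """Biased-average form with weights p_i**q (assumes all p_i > 0 and r_i > 0)."""
    return sum(pi ** q * (pi ** (1.0 - q) - ri ** (1.0 - q)) / (1.0 - q)
               for pi, ri in zip(p, r))

def kl_divergence(p, r):
    """Usual Kullback-Leibler divergence from p to r."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

# Hypothetical three-bin histograms, already normalized.
p = [0.5, 0.3, 0.2]
r = [0.4, 0.4, 0.2]

for q in (0.5, 2.0):
    print(q, tsallis_relative_entropy(p, r, q), tsallis_biased_average(p, r, q))

# Near q = 1 the value should approach the Kullback-Leibler divergence.
print(tsallis_relative_entropy(p, r, 1.0 + 1e-6), kl_divergence(p, r))
```

The two forms agree up to rounding, and the value near $q = 1$ approaches the Kullback-Leibler divergence, which is the limit the generalization is expected to recover.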

BOUNDS IN TERMS OF USUAL KULLBACK-LEIBLER DIVERGENCE
We first describe our setting. In discrete cases, it is common to regard a histogram of observed values as a probability distribution associated with the system under study. This means that the value obtained in a single measurement falls into one of the finitely many bins of a detector that has a finite dynamic range. The measuring apparatus consists of a limited number of bins; therefore the probability distribution also has a finite support reflecting the dynamic range. It is therefore highly probable that when we compare two different probability distributions constructed in this manner, their ratio within the identical bin $i$ has a finite range. Let us call this quantity the ratio range hereafter. The ratio range becomes null if there is no detected event for the distribution $r$ in the $i$th bin. Furthermore, we denote the minimum and the maximum values of this ratio range by $u$ and $U$, respectively: $0 < u \le p_i/r_i \le U < \infty$ for each $i$. Under this setting, we consider bounds for the generalized relative entropy $D_q(p\|r)$.
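Under this setting, we can use Theorem 6 provided in [4]. It was proved that when $f \in C^2(u, U)$ and when $t f''(t)$ is bounded from below and from above by constants $m \in \mathbb{R}$ and $M \in \mathbb{R}$, respectively, an inequality holds in terms of the Kullback-Leibler divergence $D_{KL}(p\|r)$ from $p$ to $r$. More concretely, for the general f-divergences $D_f(p\|r)$ we have the inequality

$$m\, D_{KL}(p\|r) \le D_f(p\|r) \le M\, D_{KL}(p\|r).$$

Note that $D_{KL}(p\|r)$ is the best-known member of the f-divergence class, and it is obtained by choosing $f(t) = t \log t$ $(t > 0)$. For the divergence of Tsallis, we have $t f''(t) = q t^{q-1}$, and then

$$q u^{q-1} \le t f''(t) \le q U^{q-1} \quad (q > 1), \qquad q U^{q-1} \le t f''(t) \le q u^{q-1} \quad (0 < q < 1).$$

Therefore, we obtain the following inequalities,

$$q u^{q-1} D_{KL}(p\|r) \le D_q(p\|r) \le q U^{q-1} D_{KL}(p\|r), \quad (q > 1), \tag{5}$$

$$q U^{q-1} D_{KL}(p\|r) \le D_q(p\|r) \le q u^{q-1} D_{KL}(p\|r), \quad (0 < q < 1).$$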
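The bounds in terms of the Kullback-Leibler divergence can be checked numerically. The sketch below assumes $f(t) = (t^q - t)/(q-1)$ for the Tsallis divergence, takes $u$ and $U$ as the smallest and largest bin ratios $p_i/r_i$, and uses the constants $q u^{q-1}$ and $q U^{q-1}$ from the inequalities above; the helper names and the example histograms are hypothetical.

```python
import math

def kl_divergence(p, r):
    """Kullback-Leibler divergence from p to r."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

def tsallis_relative_entropy(p, r, q):
    """sum_i r_i * f(p_i / r_i) with f(t) = (t**q - t) / (q - 1), q != 1."""
    return sum(ri * ((pi / ri) ** q - pi / ri) / (q - 1.0) for pi, ri in zip(p, r))

def kl_bounds(p, r, q):
    """Return (lower, D_q, upper), where lower and upper are multiples of D_KL."""
    ratios = [pi / ri for pi, ri in zip(p, r)]
    u, U = min(ratios), max(ratios)          # ratio range of the two histograms
    c_u, c_U = q * u ** (q - 1.0), q * U ** (q - 1.0)
    m, M = min(c_u, c_U), max(c_u, c_U)      # the ordering flips between q > 1 and 0 < q < 1
    d_kl = kl_divergence(p, r)
    return m * d_kl, tsallis_relative_entropy(p, r, q), M * d_kl

# Hypothetical three-bin histograms with ratio range u = 0.75, U = 1.25.
p = [0.5, 0.3, 0.2]
r = [0.4, 0.4, 0.2]

for q in (0.5, 2.0):
    lower, d_q, upper = kl_bounds(p, r, q)
    print(q, lower, d_q, upper)
```

Taking the minimum and maximum of the two constants avoids branching on whether $q$ lies above or below 1, at the cost of hiding which case applies.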

UPPER BOUNDS IN TERMS OF VARIATIONAL DISTANCE
The variational distance is also one of the f-divergences, since we can choose $f(t) = |t - 1|$ with $t \in \mathbb{R}_+$. Dragomir showed in [5] that the following inequality holds on
where
In the range $0 < u \le p_i/r_i \le U < \infty$ for each $i$, the quantity for the divergence of Tsallis is found to be
Therefore, we obtain the corresponding inequalities by substituting these into Eq. (7).

LOWER BOUNDS BY PINSKER TYPE INEQUALITY
The Pinsker inequality provides a lower bound on the Kullback-Leibler divergence in terms of the variational distance $V(p, r) = \sum_i |p_i - r_i|$ as $D_{KL}(p\|r) \ge V^2/2$ [16]. However, for other divergences, the corresponding inequality with higher-order terms in the variational distance was not known until recently. We present it by using the recent progress on the fourth-order extended Pinsker inequality proved in [9], which gives a lower bound for the f-divergence measures. Under a certain condition (for details see Theorem 7 in [9]) the following bound holds
where the coefficients must be positive and are best possible in the sense that there exist no larger constants. Applying this bound to the divergence of Tsallis, we obtain
When $q \to 1$, we recover the fourth-order Pinsker inequality for the Kullback-Leibler divergence, i.e., $D_{KL}(p\|r) \ge V^2/2 + V^4/36$, whose proof was provided in [10]. Note that the above bounds are valid when $q > 0$, since the coefficient of $V^2$, viz. $f''(1)$, must be positive. The refinement of the Pinsker inequality with best possible coefficients up to eighth order with respect to $V$ has been obtained [20,8], but its connection with physics remains unexplained so far, whereas the positivity $D_q(p\|r) \ge 0$ (the information inequality or Gibbs inequality) has a clear physical interpretation in terms of the second law of thermodynamics.
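The $q \to 1$ limit quoted above can be traced through a short worked computation. The sketch below assumes that the fourth-order bound has the form $D_f \ge c_2 V^2 + c_4 V^4$ with $c_2 = f''(1)/2$ and $c_4 = \frac{1}{72}\left[3 f''''(1) - 4 f'''(1)^2 / f''(1)\right]$; this form is an assumption made here for illustration and should be checked against Theorem 7 of [9]. It also takes $f(t) = (t^q - t)/(q-1)$ for the divergence of Tsallis.

```latex
% Assumed form of the fourth-order bound (to be verified against [9]):
%   D_f(p\|r) \ge c_2 V^2 + c_4 V^4
\begin{align*}
f(t) &= \frac{t^{q} - t}{q - 1}, \qquad
f''(1) = q, \qquad f'''(1) = q(q-2), \qquad f''''(1) = q(q-2)(q-3), \\
c_2 &= \frac{f''(1)}{2} = \frac{q}{2}, \\
c_4 &= \frac{1}{72}\left[ 3 f''''(1) - \frac{4 f'''(1)^{2}}{f''(1)} \right]
     = \frac{q(q-2)}{72}\,\bigl[ 3(q-3) - 4(q-2) \bigr]
     = \frac{q\,(q+1)\,(2-q)}{72}.
\end{align*}
% Limit q -> 1: c_2 -> 1/2 and c_4 -> 2/72 = 1/36,
% which reproduces D_KL >= V^2/2 + V^4/36.
```

Under this assumed form, the coefficient of $V^2$ is positive for $q > 0$, in line with the remark on $f''(1)$, while the coefficient of $V^4$ is positive only for $0 < q < 2$.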

BOUNDS IN TERMS OF HELLINGER'S DISTANCE
Hellinger's distance also belongs to the f-divergence class and is obtained if we set
We apply these bounds for our present consideration. For the divergence of Tsallis, we have t .
Therefore, $M$ and $m$ are determined as
We then have the inequalities,

BOUNDS IN TERMS OF χ²-DIVERGENCE
It is useful to give bounds on $D_q(p\|r)$ in terms of the $\chi^2$-divergence because it provides a bound for the mixing time of Markov chains [3]. When we set $f(t) = (t-1)^2$, $D_{\chi^2}(p\|r) = \sum_i (p_i - r_i)^2 / r_i$ is also found to be an f-divergence. The Kullback-Leibler divergence is asymmetric under the exchange of the two probability distributions $p$ and $r$. The difference quantifies the extent to which the symmetry breaks. It was shown that the absolute value of the difference between that measure from $p$ to $r$ and from $r$ to $p$ is bounded from above in terms of the $\chi^2$-divergence [7],
where $p_i$ and $r_i$ satisfy $0 < u \le p_i/r_i \le U < \infty$ for each $i$. The derivation of Eq. (16) comes from a trapezoid inequality, which holds for any f-divergence
where the function $f^{\#}$ is defined for $t \in (0, \infty)$ as $f^{\#}(t) = (t-1) f'(t)$ and $f''(t)$ is assumed to be bounded by $\gamma = \inf_{t \in [u,U)} f''(t)$ and $\Gamma = \sup_{t \in [u,U)} f''(t)$. This inequality provides bounds for the difference of various f-divergences and enables us to evaluate and compare them in terms of the characteristics (the infimum and the supremum) of the second derivative of the function $f$. We now apply it to the Tsallis divergence $D_q(p\|r)$. We have

$$f^{\#}(t) = (t-1)\,\frac{q t^{q-1} - 1}{q-1}.$$

With this $f^{\#}(t_i)$, where $t_i = p_i/r_i$, we obtain for the second term of Eq. (17)
We note that when $q \to 1$, we have the Jeffreys divergence $\sum_i (p_i - r_i)\log(p_i/r_i)$, which is also one of the f-divergences. Then, we have for the l.h.s. of the inequality Eq. (17),
Since $f''(t) = q t^{q-2}$ and $0 < u \le t_i \le U < \infty$, $\forall i$, we have

$$q u^{q-2} \le f''(t) \le q U^{q-2} \quad (q > 2), \qquad q U^{q-2} \le f''(t) \le q u^{q-2} \quad (0 < q < 2).$$
Noting the relation,
and using $\Gamma$ and $\gamma$ obtained from Eq. (21) and applying them to the inequality Eq. (17), we finally have the inequalities
When one needs bounds for the l.h.s. of Eq. (23) in terms of other f-divergences, one needs to know the corresponding bounds on the $\chi^2$-divergence. Taneja and Kumar [19] provided upper bounds on the $\chi^2$-divergence in terms of the Kullback-Leibler divergence $D_{KL}(r\|p)$ and of Hellinger's distance $h(p\|r)$, the latter as $D_{\chi^2}(p\|r) \le 8\sqrt{U^3}\, h(p\|r)$. With these inequalities, the l.h.s. of Eq. (23) can be bounded by using $D_{KL}(p\|r)$ and $h(p\|r)$. In any case, the upper bounds in terms of the $\chi^2$-divergence are tighter than those in terms of the others.
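Two statements of this section lend themselves to a quick numerical check: the $q \to 1$ limit of the f-divergence generated by $f^{\#}(t) = (t-1) f'(t)$, and the bound $D_{\chi^2}(p\|r) \le 8\sqrt{U^3}\, h(p\|r)$. The sketch below assumes $f(t) = (t^q - t)/(q-1)$, the Jeffreys divergence in the form $\sum_i (p_i - r_i)\log(p_i/r_i)$, and the normalization $h(p\|r) = \frac{1}{2}\sum_i(\sqrt{p_i} - \sqrt{r_i})^2$; the last of these is an assumption, as are all function names.

```python
import math

def chi2_divergence(p, r):
    """chi^2-divergence: sum_i (p_i - r_i)**2 / r_i."""
    return sum((pi - ri) ** 2 / ri for pi, ri in zip(p, r))

def hellinger(p, r):
    """Assumed normalization: h = (1/2) * sum_i (sqrt(p_i) - sqrt(r_i))**2."""
    return 0.5 * sum((math.sqrt(pi) - math.sqrt(ri)) ** 2 for pi, ri in zip(p, r))

def jeffreys(p, r):
    """Jeffreys divergence: sum_i (p_i - r_i) * log(p_i / r_i)."""
    return sum((pi - ri) * math.log(pi / ri) for pi, ri in zip(p, r))

def d_f_sharp(p, r, q):
    """f-divergence generated by f#(t) = (t - 1) * (q*t**(q-1) - 1) / (q - 1), q != 1."""
    total = 0.0
    for pi, ri in zip(p, r):
        t = pi / ri
        total += ri * (t - 1.0) * (q * t ** (q - 1.0) - 1.0) / (q - 1.0)
    return total

# Hypothetical three-bin histograms.
p = [0.5, 0.3, 0.2]
r = [0.4, 0.4, 0.2]
U = max(pi / ri for pi, ri in zip(p, r))

print(d_f_sharp(p, r, 1.0 + 1e-6), jeffreys(p, r))                 # should nearly coincide
print(chi2_divergence(p, r), 8.0 * math.sqrt(U ** 3) * hellinger(p, r))
```

The first line of output compares the two sides of the $q \to 1$ limit, and the second compares the $\chi^2$-divergence with its Hellinger-based upper bound for this particular example.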

AN UPPER BOUND ON AN OVERLAP BETWEEN DIVERGENCES
Here we are concerned with how much the values of divergences differ from each other when we measure them with an identical reference distribution $r$. Namely, we want to know an upper bound on the overlap between the quantities $D_q(p\|r)$ and $D_q(p'\|r)$ for a given $q$. To this end we define the following quantity
where the two functions are respectively defined as $f(t) = (t^q - t)/(q-1)$ and $g(u) = (u^q - u)/(q-1)$ with $t = p/r$ and $u = p'/r$. The case $\lambda = 0$ corresponds to the usual overlap of the functions $f$ and $g$ (or the inner product of two real functions).
Otherwise it can give a normalized overlap tempered by $\lambda$. We are concerned with the bound on it. Recall that for $\alpha, \beta > 1$ and $0 < \lambda < n$ with the relation $1/\alpha + \lambda/n + 1/\beta = 2$, the Hardy-Littlewood-Sobolev inequality [12] reads
where a sharp constant $C(n, \lambda, p)$, independent of the functions $f$ and $g$, when $\alpha = \beta = 2n/(2n - \lambda)$ is given as,
Our problem is a special case obtained by setting $n = 1$ and $f = g$, with its form given by $(t^q - t)/(q-1)$. The desired upper bound is the product of the value of Eq. (26) and that of $\|f\|^2_{2/(2-\lambda)}$ given in the Appendix.
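For orientation, the Hardy-Littlewood-Sobolev inequality is commonly stated in the form sketched below. The statement and the sharp constant are written here from the standard textbook formulation and are assumptions as far as this article is concerned: they should be checked against [12], and the overlap quantity is assumed to carry the kernel $|t-u|^{-\lambda}$.

```latex
% Standard form of the Hardy-Littlewood-Sobolev inequality on R^n
% (assumed; to be verified against [12]):
\[
\left| \int_{\mathbb{R}^n}\!\int_{\mathbb{R}^n}
   f(x)\, |x-y|^{-\lambda}\, g(y)\, dx\, dy \right|
\le C(n,\lambda,\alpha)\, \|f\|_{\alpha}\, \|g\|_{\beta},
\qquad \frac{1}{\alpha} + \frac{\lambda}{n} + \frac{1}{\beta} = 2 .
\]
% Sharp constant in the diagonal case alpha = beta = 2n/(2n - lambda):
\[
C(n,\lambda) = \pi^{\lambda/2}\,
  \frac{\Gamma\!\left(\tfrac{n}{2} - \tfrac{\lambda}{2}\right)}
       {\Gamma\!\left(n - \tfrac{\lambda}{2}\right)}
  \left\{ \frac{\Gamma\!\left(\tfrac{n}{2}\right)}{\Gamma(n)} \right\}^{-1 + \lambda/n} .
\]
% Special case n = 1, f = g, so that alpha = beta = 2/(2 - lambda):
\[
\left| \int\!\!\int f(t)\, |t-u|^{-\lambda}\, f(u)\, dt\, du \right|
\le C(1,\lambda)\, \|f\|_{2/(2-\lambda)}^{2},
\qquad
C(1,\lambda) = \pi^{\lambda - 1/2}\,
  \frac{\Gamma\!\left(\tfrac{1-\lambda}{2}\right)}{\Gamma\!\left(1 - \tfrac{\lambda}{2}\right)} .
\]
```

The last line uses $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(1) = 1$, and it makes explicit why the upper bound is a product of the constant and the squared norm $\|f\|^2_{2/(2-\lambda)}$.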

SUMMARY AND CONCLUDING REMARKS
We have presented fundamental bounds on the generalized KL divergence used in nonextensive statistical mechanics in terms of several known divergences. Bounds for the parameterized divergence are indispensable, since without them most (if not all) of the nonextensive structures in a system would not be fully understood. Our starting assumption was the existence of maximum and minimum values for the ratio of probability distributions, which originates from the finite dynamic range of the measuring apparatus for events associated with a physical system. The bounds depend on the parameter range that bears nonextensivity. A natural question then arises concerning the parameter range, i.e., which value of $q$ we should use.