URL: https://en.wikipedia.org/wiki/Convex_conjugate

Convex conjugate

From Wikipedia, the free encyclopedia
Generalization of the Legendre transformation

In mathematics and mathematical optimization, the convex conjugate of a function is a generalization of the Legendre transformation which applies to non-convex functions. It is also known as the Legendre–Fenchel transformation, Fenchel transformation, or Fenchel conjugate (after Adrien-Marie Legendre and Werner Fenchel). The convex conjugate is widely used for constructing the dual problem in optimization theory, thus generalizing Lagrangian duality.

Definition

Let {\displaystyle X} be a real topological vector space and let {\displaystyle X^{*}} be the dual space to {\displaystyle X}. Denote by

{\displaystyle \langle \cdot ,\cdot \rangle :X^{*}\times X\to \mathbb {R} }

the canonical dual pairing, which is defined by {\displaystyle \left\langle x^{*},x\right\rangle =x^{*}(x).}

For a function {\displaystyle f:X\to \mathbb {R} \cup \{-\infty ,+\infty \}} taking values on the extended real number line, its convex conjugate is the function

{\displaystyle f^{*}:X^{*}\to \mathbb {R} \cup \{-\infty ,+\infty \}}

whose value at {\displaystyle x^{*}\in X^{*}} is defined to be the supremum:

{\displaystyle f^{*}\left(x^{*}\right):=\sup \left\{\left\langle x^{*},x\right\rangle -f(x)~\colon ~x\in X\right\},}

or, equivalently, in terms of the infimum:

{\displaystyle f^{*}\left(x^{*}\right):=-\inf \left\{f(x)-\left\langle x^{*},x\right\rangle ~\colon ~x\in X\right\}.}

This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[1]
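
The supremum in the definition can be approximated numerically in one dimension by maximizing over a finite grid. The following is a minimal sketch, not a general-purpose implementation: the helper name `conjugate`, the example function f(x) = x^2/2 and the grid bounds are illustrative assumptions, and the grid must be wide enough to contain the maximizer.

```python
def conjugate(f, p, xs):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over the grid xs."""
    return max(p * x - f(x) for x in xs)


def f(x):
    # Illustrative example function (an assumption): f(x) = x^2 / 2.
    return 0.5 * x * x


# Hypothetical grid on [-10, 10] with step 0.01.
xs = [i / 100 for i in range(-1000, 1001)]

# For f(x) = x^2 / 2 the exact conjugate is f*(p) = p^2 / 2.
values = {p: conjugate(f, p, xs) for p in (-2.0, 0.0, 1.5)}
for p, value in values.items():
    print(p, value, 0.5 * p * p)
```

Each printed line shows the slope, the grid approximation and the closed-form value, which should agree closely because the maximizer x = p lies on the grid.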

Examples

For more examples, see § Table of selected convex conjugates.

The convex conjugate and the Legendre transform of the exponential function agree, except that the domain of the convex conjugate is strictly larger: the Legendre transform is defined only for positive real numbers, whereas the convex conjugate is also defined at 0.
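
As a worked illustration of this statement, the conjugate of the exponential can be computed directly from the definition by distinguishing the sign of the slope:

```latex
\begin{aligned}
f(x) &= e^{x}, \qquad f^{*}(x^{*}) = \sup_{x\in\mathbb{R}} \left( x^{*}x - e^{x} \right), \\
x^{*} > 0 &: \quad \text{the supremum is attained at } x = \log x^{*}, \text{ so } f^{*}(x^{*}) = x^{*}\log x^{*} - x^{*}, \\
x^{*} = 0 &: \quad f^{*}(0) = \sup_{x} \left( -e^{x} \right) = 0 \quad (\text{approached as } x \to -\infty,\ \text{not attained}), \\
x^{*} < 0 &: \quad x^{*}x - e^{x} \to +\infty \text{ as } x \to -\infty, \text{ so } f^{*}(x^{*}) = +\infty.
\end{aligned}
```

The value at 0 exists only as a supremum that is not attained, which is why the derivative-based Legendre transform does not cover it.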

Connection with expected shortfall (average value at risk)

See this article for example.

Let F denote a cumulative distribution function of a random variable X. Then (integrating by parts), {\displaystyle f(x):=\int _{-\infty }^{x}F(u)\,du=\operatorname {E} \left[\max(0,x-X)\right]=x-\operatorname {E} \left[\min(x,X)\right]} has the convex conjugate {\displaystyle f^{*}(p)=\int _{0}^{p}F^{-1}(q)\,dq=(p-1)F^{-1}(p)+\operatorname {E} \left[\min(F^{-1}(p),X)\right]=pF^{-1}(p)-\operatorname {E} \left[\max(0,F^{-1}(p)-X)\right].}
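
A simple sanity check of this identity is possible for a concrete distribution. The sketch below assumes X is uniform on [0, 1] (an illustrative choice, not taken from the article), so that F(u) = u on [0, 1], f(x) = x^2/2 on [0, 1], and the integral of the quantile function from 0 to p equals p^2/2. The grid-based `conjugate` helper is hypothetical.

```python
def conjugate(f, p, xs):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over the grid xs."""
    return max(p * x - f(x) for x in xs)


def f(x):
    """Integral of the uniform(0, 1) CDF from -infinity to x."""
    if x <= 0.0:
        return 0.0
    if x <= 1.0:
        return 0.5 * x * x
    return x - 0.5


def f_conj_closed_form(p):
    """Integral of the quantile function F^{-1}(q) = q from 0 to p, for 0 <= p <= 1."""
    return 0.5 * p * p


# Hypothetical grid on [-2, 3] with step 0.001.
xs = [i / 1000 for i in range(-2000, 3001)]
ps = [0.0, 0.25, 0.5, 0.9, 1.0]
errors = [abs(conjugate(f, p, xs) - f_conj_closed_form(p)) for p in ps]
print(max(errors))
```

The printed maximum discrepancy should be at the level of floating-point rounding, since each maximizer x = p lies on the grid.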

Ordering

A particular interpretation has the transform {\displaystyle f^{\text{inc}}(x):=\arg \sup _{t}t\cdot x-\int _{0}^{1}\max\{t-f(u),0\}\,du,} as this is a nondecreasing rearrangement of the initial function f; in particular, {\displaystyle f^{\text{inc}}=f} for f nondecreasing.

Properties

The convex conjugate of a closed convex function is again a closed convex function. The convex conjugate of a polyhedral convex function (a convex function with polyhedral epigraph) is again a polyhedral convex function.

Order reversing

Declare that {\displaystyle f\leq g} if and only if {\displaystyle f(x)\leq g(x)} for all {\displaystyle x.} Then convex-conjugation is order-reversing, which by definition means that if {\displaystyle f\leq g} then {\displaystyle f^{*}\geq g^{*}.}

For a family of functions {\displaystyle \left(f_{\alpha }\right)_{\alpha }} it follows from the fact that supremums may be interchanged that

{\displaystyle \left(\inf _{\alpha }f_{\alpha }\right)^{*}(x^{*})=\sup _{\alpha }f_{\alpha }^{*}(x^{*}),}

and from the max–min inequality that

{\displaystyle \left(\sup _{\alpha }f_{\alpha }\right)^{*}(x^{*})\leq \inf _{\alpha }f_{\alpha }^{*}(x^{*}).}
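
These three relations can be checked on a grid. The sketch below is only an illustration under stated assumptions: the functions f, g, f1, f2 and the grids are arbitrary choices, and `conjugate` is a hypothetical helper that replaces the supremum by a maximum over grid points.

```python
def conjugate(h, p, xs):
    """Approximate h*(p) = sup_x (p*x - h(x)) by a maximum over the grid xs."""
    return max(p * x - h(x) for x in xs)


xs = [i / 100 for i in range(-1000, 1001)]  # grid on [-10, 10]
ps = [i / 2 for i in range(-6, 7)]          # slopes -3, -2.5, ..., 3


def f(x):
    return x * x


def g(x):
    return x * x + abs(x)  # g >= f pointwise


# Order reversal: f <= g implies f* >= g*.
order_reversed = all(conjugate(f, p, xs) >= conjugate(g, p, xs) for p in ps)


def f1(x):
    return (x - 1.0) ** 2


def f2(x):
    return (x + 1.0) ** 2


def lower(x):
    return min(f1(x), f2(x))


def upper(x):
    return max(f1(x), f2(x))


# (inf f_a)* = sup f_a*  and  (sup f_a)* <= inf f_a*.
inf_rule_gap = max(
    abs(conjugate(lower, p, xs) - max(conjugate(f1, p, xs), conjugate(f2, p, xs)))
    for p in ps
)
sup_rule_holds = all(
    conjugate(upper, p, xs) <= min(conjugate(f1, p, xs), conjugate(f2, p, xs)) + 1e-12
    for p in ps
)
print(order_reversed, inf_rule_gap, sup_rule_holds)
```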

Biconjugate

The convex conjugate of a function is always lower semi-continuous. The biconjugate {\displaystyle f^{**}} (the convex conjugate of the convex conjugate) is the closed convex hull of {\displaystyle f} (often described as its closed convex envelope), i.e. the greatest lower semi-continuous convex function with {\displaystyle f^{**}\leq f.} That {\displaystyle f^{**}\leq f} follows from the Fenchel inequality (below). For proper functions {\displaystyle f,} the Fenchel–Moreau theorem states that {\displaystyle f=f^{**}} if and only if {\displaystyle f} is convex and lower semi-continuous.[2][3]
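
The envelope interpretation can be illustrated numerically by conjugating twice on grids. The sketch below uses the non-convex double-well function f(x) = (x^2 - 1)^2, an assumed example whose closed convex envelope vanishes on [-1, 1] and coincides with f outside that interval. The grid ranges are illustrative and must cover the relevant maximizers and slopes.

```python
def f(x):
    # Non-convex double-well example (an illustrative choice).
    return (x * x - 1.0) ** 2


xs = [i / 100 for i in range(-300, 301)]  # grid for x in [-3, 3]
ps = [i / 10 for i in range(-100, 101)]   # grid for slopes p in [-10, 10]

# Tabulate f*(p) on the slope grid.
f_conj = {p: max(p * x - f(x) for x in xs) for p in ps}


def biconjugate(x):
    """Approximate f**(x) = sup_p (p*x - f*(p)) over the slope grid."""
    return max(p * x - f_conj[p] for p in ps)


print(f(0.0), biconjugate(0.0))  # f exceeds its envelope inside the well
print(f(1.5), biconjugate(1.5))  # f agrees with its envelope where it is convex
```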

Fenchel's inequality

For any function {\displaystyle f} and its convex conjugate {\displaystyle f^{*}}, Fenchel's inequality (also known as the Fenchel–Young inequality) holds for every {\displaystyle x\in X} and {\displaystyle p\in X^{*}}:

{\displaystyle \left\langle p,x\right\rangle \leq f(x)+f^{*}(p).}

Furthermore, for proper {\displaystyle f,} equality holds if and only if {\displaystyle p\in \partial f(x)}, where {\displaystyle \partial f(x)} is the subdifferential of {\displaystyle f} at {\displaystyle x}. The proof of the inequality follows from the definition of the convex conjugate: {\displaystyle f^{*}(p)=\sup _{\tilde {x}}\left\{\langle p,{\tilde {x}}\rangle -f({\tilde {x}})\right\}\geq \langle p,x\rangle -f(x).}
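
A quick numerical illustration of the inequality and of its equality case, using the exponential function and its conjugate p log p - p (with value 0 at p = 0) as listed in the table of selected convex conjugates. The sample grids and the name `fenchel_gap` are assumptions made for this sketch.

```python
import math


def f(x):
    return math.exp(x)


def f_conj(p):
    """Conjugate of exp for p >= 0: p*log(p) - p, with value 0 at p = 0."""
    return p * math.log(p) - p if p > 0 else 0.0


def fenchel_gap(x, p):
    """f(x) + f*(p) - p*x, which Fenchel's inequality says is nonnegative."""
    return f(x) + f_conj(p) - p * x


xs = [i / 10 for i in range(-30, 31)]  # x in [-3, 3]
ps = [i / 10 for i in range(0, 51)]    # p in [0, 5]

min_gap = min(fenchel_gap(x, p) for x in xs for p in ps)

# Equality case: p = f'(x) = exp(x), here at x = 1.
tight_gap = fenchel_gap(1.0, math.exp(1.0))
print(min_gap, tight_gap)
```

The gap is zero exactly when p equals the derivative of f at x, which is the differentiable special case of the subdifferential condition.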

Convexity

For two functions {\displaystyle f_{0}} and {\displaystyle f_{1}} and a number {\displaystyle 0\leq \lambda \leq 1} the convexity relation

{\displaystyle \left((1-\lambda )f_{0}+\lambda f_{1}\right)^{*}\leq (1-\lambda )f_{0}^{*}+\lambda f_{1}^{*}}

holds. The {\displaystyle {*}} operation is a convex mapping itself.
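
A small worked instance of the convexity relation, with two quadratics chosen purely for illustration and λ = 1/2:

```latex
\begin{aligned}
f_{0}(x) &= \tfrac{1}{2}x^{2}, & f_{0}^{*}(x^{*}) &= \tfrac{1}{2}(x^{*})^{2}, \\
f_{1}(x) &= x^{2}, & f_{1}^{*}(x^{*}) &= \tfrac{1}{4}(x^{*})^{2}, \\
\left(\tfrac{1}{2}f_{0} + \tfrac{1}{2}f_{1}\right)(x) &= \tfrac{3}{4}x^{2}, &
\left(\tfrac{1}{2}f_{0} + \tfrac{1}{2}f_{1}\right)^{*}(x^{*}) &= \tfrac{1}{3}(x^{*})^{2}, \\
\tfrac{1}{2}f_{0}^{*}(x^{*}) + \tfrac{1}{2}f_{1}^{*}(x^{*}) &= \tfrac{3}{8}(x^{*})^{2}, &
\tfrac{1}{3}(x^{*})^{2} &\leq \tfrac{3}{8}(x^{*})^{2}.
\end{aligned}
```

Here the conjugate of c x^2 with c > 0 is (x^*)^2 / (4c), and the inequality is strict for every nonzero x^*.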

Infimal convolution

The infimal convolution (or epi-sum) of two functions {\displaystyle f} and {\displaystyle g} is defined as

{\displaystyle \left(f\operatorname {\Box } g\right)(x)=\inf \left\{f(x-y)+g(y)\mid y\in \mathbb {R} ^{n}\right\}.}

The operation {\displaystyle \operatorname {\Box } } is symmetric (commutative) and associative, i.e.

{\displaystyle f\Box g=g\Box f,\qquad (f\Box g)\Box h=f\Box (g\Box h).}

Let {\displaystyle f_{1},\ldots ,f_{m}} be proper, convex and lower semicontinuous functions on {\displaystyle \mathbb {R} ^{n}.} Then the infimal convolution is convex and lower semicontinuous (but not necessarily proper),[4] and satisfies

{\displaystyle \left(f_{1}\operatorname {\Box } \cdots \operatorname {\Box } f_{m}\right)^{*}=f_{1}^{*}+\cdots +f_{m}^{*},}

or, equivalently,

{\displaystyle \left(f_{1}+\cdots +f_{m}\right)^{*}=f_{1}^{*}\operatorname {\Box } \cdots \operatorname {\Box } f_{m}^{*},}

which expresses the behaviour of convex conjugation with respect to sums of functions.

The infimal convolution of two functions has a geometric interpretation: The (strict) epigraph of the infimal convolution of two functions is the Minkowski sum of the (strict) epigraphs of those functions.[5]
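
The rule that conjugation turns an infimal convolution into a sum can be checked on a one-dimensional example. In the sketch below both functions are x^2/2 (an illustrative assumption), so the infimal convolution is x^2/4 and both sides of the identity equal p^2. Grids and helper names are hypothetical.

```python
def f(x):
    return 0.5 * x * x


def g(x):
    return 0.5 * x * x


ys = [i / 40 for i in range(-160, 161)]  # grid for y in [-4, 4]
xs = [i / 20 for i in range(-80, 81)]    # grid for x in [-4, 4]


def inf_conv(x):
    """Approximate (f box g)(x) = inf_y (f(x - y) + g(y)) over the grid ys."""
    return min(f(x - y) + g(y) for y in ys)


# Tabulate the infimal convolution once on the x grid.
box = {x: inf_conv(x) for x in xs}


def conjugate(h, p):
    """Approximate h*(p) = sup_x (p*x - h(x)) over the grid xs."""
    return max(p * x - h(x) for x in xs)


ps = [-1.0, 0.0, 0.5, 1.0]
lhs = [conjugate(lambda x: box[x], p) for p in ps]    # (f box g)*(p)
rhs = [conjugate(f, p) + conjugate(g, p) for p in ps]  # f*(p) + g*(p)
print(lhs)
print(rhs)
```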

Maximizing argument

If the function {\displaystyle f} is differentiable, then its derivative is the maximizing argument in the computation of the convex conjugate:

{\displaystyle f^{\prime }(x)=x^{*}(x):=\arg \sup _{x^{*}}{\langle x,x^{*}\rangle }-f^{*}\left(x^{*}\right)}

and

{\displaystyle f^{{*}\prime }\left(x^{*}\right)=x\left(x^{*}\right):=\arg \sup _{x}{\langle x,x^{*}\rangle }-f(x);}

hence

{\displaystyle x=\nabla f^{*}\left(\nabla f(x)\right),}

{\displaystyle x^{*}=\nabla f\left(\nabla f^{*}\left(x^{*}\right)\right),}

and moreover

{\displaystyle f^{\prime \prime }(x)\cdot f^{{*}\prime \prime }\left(x^{*}(x)\right)=1,}

{\displaystyle f^{{*}\prime \prime }\left(x^{*}\right)\cdot f^{\prime \prime }\left(x(x^{*})\right)=1.}
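
These relations can be verified by hand for the exponential function, used here only as an illustrative example, with its conjugate restricted to positive slopes:

```latex
\begin{aligned}
f(x) &= e^{x}, & f^{*}(x^{*}) &= x^{*}\log x^{*} - x^{*} \quad (x^{*} > 0), \\
f'(x) &= e^{x}, & (f^{*})'(x^{*}) &= \log x^{*}, \\
(f^{*})'\left(f'(x)\right) &= \log e^{x} = x, & f'\left((f^{*})'(x^{*})\right) &= e^{\log x^{*}} = x^{*}, \\
f''(x) &= e^{x}, & (f^{*})''(x^{*}) &= \frac{1}{x^{*}}, \\
f''(x)\cdot (f^{*})''\left(x^{*}(x)\right) &= e^{x}\cdot \frac{1}{e^{x}} = 1.
\end{aligned}
```

The derivatives of f and f^* are inverse functions of each other, and their second derivatives are reciprocal at corresponding points.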

Scaling properties

If {\displaystyle g(x)=\alpha +\beta x+\gamma \cdot f\left(\lambda x+\delta \right)} for some {\displaystyle \gamma >0} and {\displaystyle \lambda \neq 0}, then

{\displaystyle g^{*}\left(x^{*}\right)=-\alpha -\delta {\frac {x^{*}-\beta }{\lambda }}+\gamma \cdot f^{*}\left({\frac {x^{*}-\beta }{\lambda \gamma }}\right).}
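
The scaling formula can be checked against a direct grid computation. The sketch below assumes f(x) = x^2/2 (so f*(p) = p^2/2) and arbitrary illustrative parameter values. None of these choices come from the article, and `conjugate` is a hypothetical helper.

```python
def conjugate(h, p, xs):
    """Approximate h*(p) = sup_x (p*x - h(x)) by a maximum over the grid xs."""
    return max(p * x - h(x) for x in xs)


def f(x):
    return 0.5 * x * x


def f_conj(p):
    return 0.5 * p * p


# Illustrative parameters (assumptions); gamma > 0 and lam != 0 are required.
alpha, beta, gamma, lam, delta = 1.0, 2.0, 3.0, 0.5, -1.0


def g(x):
    return alpha + beta * x + gamma * f(lam * x + delta)


def g_conj_formula(p):
    """Closed form from the scaling property."""
    s = p - beta
    return -alpha - delta * s / lam + gamma * f_conj(s / (lam * gamma))


xs = [i / 100 for i in range(-2000, 2001)]  # grid on [-20, 20]
ps = [0.5, 2.0, 3.5]
errors = [abs(conjugate(g, p, xs) - g_conj_formula(p)) for p in ps]
print(errors)
```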

Behavior under linear transformations

Let {\displaystyle A:X\to Y} be a bounded linear operator. For any convex function {\displaystyle f} on {\displaystyle X,}

{\displaystyle \left(Af\right)^{*}=f^{*}A^{*}}

where

{\displaystyle (Af)(y)=\inf\{f(x):x\in X,Ax=y\}}

is the preimage of {\displaystyle f} with respect to {\displaystyle A} and {\displaystyle A^{*}} is the adjoint operator of {\displaystyle A.}[6]

A closed convex function {\displaystyle f} is symmetric with respect to a given set {\displaystyle G} of orthogonal linear transformations,

{\displaystyle f(Ax)=f(x)} for all {\displaystyle x} and all {\displaystyle A\in G}

if and only if its convex conjugate {\displaystyle f^{*}} is symmetric with respect to {\displaystyle G.}

Table of selected convex conjugates

The following table provides Legendre transforms for many common functions as well as a few useful properties.[7]

| {\displaystyle g(x)} | {\displaystyle \operatorname {dom} (g)} | {\displaystyle g^{*}(x^{*})} | {\displaystyle \operatorname {dom} (g^{*})} |
| --- | --- | --- | --- |
| {\displaystyle f(ax)} (where {\displaystyle a\neq 0}) | {\displaystyle X} | {\displaystyle f^{*}\left({\frac {x^{*}}{a}}\right)} | {\displaystyle X^{*}} |
| {\displaystyle f(x+b)} | {\displaystyle X} | {\displaystyle f^{*}(x^{*})-\langle b,x^{*}\rangle } | {\displaystyle X^{*}} |
| {\displaystyle af(x)} (where {\displaystyle a>0}) | {\displaystyle X} | {\displaystyle af^{*}\left({\frac {x^{*}}{a}}\right)} | {\displaystyle X^{*}} |
| {\displaystyle \alpha +\beta x+\gamma \cdot f(\lambda x+\delta )} | {\displaystyle X} | {\displaystyle -\alpha -\delta {\frac {x^{*}-\beta }{\lambda }}+\gamma \cdot f^{*}\left({\frac {x^{*}-\beta }{\gamma \lambda }}\right)\quad (\gamma >0)} | {\displaystyle X^{*}} |
| {\displaystyle {\frac {\lvert x\rvert ^{p}}{p}}} (where {\displaystyle p>1}) | {\displaystyle \mathbb {R} } | {\displaystyle {\frac {\lvert x^{*}\rvert ^{q}}{q}}} (where {\displaystyle {\frac {1}{p}}+{\frac {1}{q}}=1}) | {\displaystyle \mathbb {R} } |
| {\displaystyle {\frac {-x^{p}}{p}}} (where {\displaystyle 0<p<1}) | {\displaystyle \mathbb {R} _{+}} | {\displaystyle {\frac {-(-x^{*})^{q}}{q}}} (where {\displaystyle {\frac {1}{p}}+{\frac {1}{q}}=1}) | {\displaystyle \mathbb {R} _{--}} |
| {\displaystyle {\sqrt {1+x^{2}}}} | {\displaystyle \mathbb {R} } | {\displaystyle -{\sqrt {1-(x^{*})^{2}}}} | {\displaystyle [-1,1]} |
| {\displaystyle -\log(x)} | {\displaystyle \mathbb {R} _{++}} | {\displaystyle -(1+\log(-x^{*}))} | {\displaystyle \mathbb {R} _{--}} |
| {\displaystyle e^{x}} | {\displaystyle \mathbb {R} } | {\displaystyle {\begin{cases}x^{*}\log(x^{*})-x^{*}&{\text{if }}x^{*}>0\\0&{\text{if }}x^{*}=0\end{cases}}} | {\displaystyle \mathbb {R} _{+}} |
| {\displaystyle \log \left(1+e^{x}\right)} | {\displaystyle \mathbb {R} } | {\displaystyle {\begin{cases}x^{*}\log(x^{*})+(1-x^{*})\log(1-x^{*})&{\text{if }}0<x^{*}<1\\0&{\text{if }}x^{*}=0,1\end{cases}}} | {\displaystyle [0,1]} |
| {\displaystyle -\log \left(1-e^{x}\right)} | {\displaystyle \mathbb {R} _{--}} | {\displaystyle {\begin{cases}x^{*}\log(x^{*})-(1+x^{*})\log(1+x^{*})&{\text{if }}x^{*}>0\\0&{\text{if }}x^{*}=0\end{cases}}} | {\displaystyle \mathbb {R} _{+}} |
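
Individual table entries can be spot-checked numerically. The sketch below compares three of the closed forms above with a grid approximation of the supremum at one slope each. The chosen slopes, the grid and the dictionary layout are illustrative assumptions, and agreement is only up to the grid resolution.

```python
import math


def conjugate(g, p, xs):
    """Approximate g*(p) = sup_x (p*x - g(x)) by a maximum over the grid xs."""
    return max(p * x - g(x) for x in xs)


xs = [i / 1000 for i in range(-20000, 20001)]  # grid on [-20, 20]

# name -> (g, closed-form g*, slope at which to compare)
entries = {
    "|x|^3/3": (lambda x: abs(x) ** 3 / 3,
                lambda p: abs(p) ** 1.5 / 1.5,  # q = 3/2 since 1/3 + 1/q = 1
                2.0),
    "sqrt(1+x^2)": (lambda x: math.sqrt(1 + x * x),
                    lambda p: -math.sqrt(1 - p * p),
                    0.6),
    "log(1+e^x)": (lambda x: math.log(1 + math.exp(x)),
                   lambda p: p * math.log(p) + (1 - p) * math.log(1 - p),
                   0.25),
}

errors = {
    name: abs(conjugate(g, p, xs) - g_conj(p))
    for name, (g, g_conj, p) in entries.items()
}
for name, err in errors.items():
    print(name, err)
```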

References

  1. "Legendre Transform". Retrieved April 14, 2019.
  2. Rockafellar 1970.
  3. Zălinescu 2002, pp. 75–79.
  4. Phelps, Robert (1993). Convex Functions, Monotone Operators and Differentiability (2 ed.). Springer. p. 42. ISBN 0-387-56715-1.
  5. Bauschke, Heinz H.; Goebel, Rafal; Lucet, Yves; Wang, Xianfu (2008). "The Proximal Average: Basic Theory". SIAM Journal on Optimization. 19 (2): 766. CiteSeerX 10.1.1.546.4270. doi:10.1137/070687542.
  6. Ioffe, A.D.; Tichomirov, V.M. (1979). Theorie der Extremalaufgaben. Deutscher Verlag der Wissenschaften. Satz 3.4.3.
  7. Borwein, Jonathan; Lewis, Adrian (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples (2 ed.). Springer. pp. 50–51. ISBN 978-0-387-29570-1.
