Reimagining Internal Family Systems Through the Lens of AI

IFS Concepts Internal Family Systems (IFS) is a psychotherapy model proposed by Richard Schwartz in the 1980s. In this model, every person is a combination of different parts, like the characters in the film Inside Out. The difference is that the animation anthropomorphizes only emotions, whereas in IFS a part can also be a thought pattern or a mindset. For example, if you have social appearance anxiety, there is always a voice in your mind criticizing your appearance....

August 8, 2024 · 8 min

Notes on Sauer's Lemma

Introduction Every binary classifier is a function mapping its input, an element of an enumerable dataset, to 0 or 1. Equivalently, we can regard the classifier as a function $ f : \mathbb{N} \rightarrow \{ 0, 1 \} $. We have a set of hypotheses $\mathcal{H}$ from which a function is chosen to maximize classification accuracy. The accuracy is perfect if $\mathcal{H}$ contains all possible functions $ f : \mathbb{N} \rightarrow \{ 0, 1 \} $, which amounts to a universal approximator....
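For context, the lemma these notes build toward can be stated as follows (a standard textbook formulation; the symbols $d$ and $m$ are not taken from the post itself):

```latex
% Sauer's lemma: if \mathcal{H} has VC dimension d, then its growth
% function is polynomially, not exponentially, bounded in the sample size m.
\Pi_{\mathcal{H}} (m)
  = \max_{x_1, \ldots, x_m}
    \bigl| \{ (h (x_1), \ldots, h (x_m)) : h \in \mathcal{H} \} \bigr|
  \leqslant \sum_{i = 0}^{d} \binom{m}{i}
  = O (m^d)
```

The gap between this polynomial bound and the $2^m$ labelings available to an unrestricted hypothesis class is what makes generalization bounds possible.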

July 26, 2024 · 6 min

A Succinct Proof of Decoupled Parallel Backpropagation Convergence Lemma

The meaning of the notations below complies with the original paper. $$ \begin{align*} \mathbb{E} [f (w^{t + 1})] - f (w^t) & \leqslant \nabla f (w^t)^{\top} \mathbb{E} [(w^{t + 1} - w^t)] + \frac{L}{2} \mathbb{E} [| w^{t + 1} - w^t |^2]\\ & = - \gamma_t \nabla f (w^t)^{\top} \mathbb{E} \left[ \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right] + \frac{L \gamma_t^2}{2} \mathbb{E} \left[ \left| \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right|^2 \right]\\ & = - \gamma_t | \nabla f (w^t) |^2 - \gamma_t \nabla f (w^t)^{\top} \left( \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right) + \frac{K L \gamma_t^2}{2} \sum_{k = 1}^K \mathbb{E} [| \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) |^2]\\ & \leqslant - \gamma_t | \nabla f (w^t) |^2 + \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \frac{\gamma_t}{2} \left| \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K | \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f_{\mathcal{G} (k)} (w^t) |^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K | \nabla f (w^{t - K + k}) - \nabla f (w^t) |^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K L^2 \gamma_t}{2} \sum_{k = 1}^K | w^{t - K + k} - w^t |^2\\ & \leqslant - \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K^4 L^2 M^2 \sigma \gamma_t^2}{2}\\ & = - \frac{\gamma_t}{2} | \nabla f (w^t) |^2 + \gamma_t^2 \frac{K^2 L M}{2} (1 + K^2 L M \sigma) \end{align*} $$

April 21, 2024 · 2 min

Intuition of Universal Approximation Theorem

The universal approximation theorem states that an infinite-width single-layer neural network with a squashing activation function can uniformly approximate an arbitrary continuous function. There are also stronger statements covering approximation of Borel measurable functions, but continuous functions suffice in our case, and we may intuitively expect that the space of all continuous functions approximates the space of Borel measurable functions almost surely in the sense of probability....
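The theorem's claim is easy to probe numerically. Below is a minimal sketch, assuming random sigmoid features with output weights fit by least squares; the target function, interval, and hidden width are illustrative choices, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target continuous function on a compact interval.
f = np.sin
x = np.linspace(-3.0, 3.0, 200)

# Single hidden layer with a squashing (sigmoid) activation:
# random input weights and biases, output weights fit by least squares.
n_hidden = 300
w = rng.normal(scale=2.0, size=n_hidden)
b = rng.normal(scale=2.0, size=n_hidden)
hidden = 1.0 / (1.0 + np.exp(-(np.outer(x, w) + b)))  # (200, 300) feature matrix

alpha, *_ = np.linalg.lstsq(hidden, f(x), rcond=None)
approx = hidden @ alpha

max_err = np.max(np.abs(approx - f(x)))
```

Widening the hidden layer drives the uniform error down, which is exactly the behavior the theorem guarantees in the infinite-width limit.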

April 4, 2024 · 8 min

Brainstorm

Here are some immature ideas that I came up with while taking a walk or having a meal. Feel free to draw inspiration from this post and write some papers; I would be glad to see studies on these topics. This post will be updated from time to time. The research field underestimates the importance of optimization. You cannot expect one to have perfect memory. What we do every day is just adjusting our cognition....

March 16, 2023 · 5 min

Existence of Optimal Policy in Markov Decision Process

In this blog, we will prove the following theorem. Optimal Policy Existence Theorem: For any Markov decision process, there exists an optimal policy $\pi_\star$ that is better than or equal to all other policies, $\pi_\star \geq \pi, \forall \pi$; all optimal policies achieve the optimal value function, $V_{\pi_\star}(s) = V_\star(s)$; and all optimal policies achieve the optimal action-value function, $Q_{\pi_\star}(s,a) = Q_\star(s,a)$. Definition To simplify the exposition, we first define some basic concepts....
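The flavor of the result can be seen numerically with value iteration: the Bellman optimality backup is a $\gamma$-contraction, so it converges to a unique $V_\star$, and any policy greedy with respect to $V_\star$ attains it. The two-state MDP below is a made-up toy example, not one from the post:

```python
import numpy as np

# Toy MDP: 2 states, 2 actions.
# P[a][s, s'] is the transition probability, R[a][s] the expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
R = np.array([[1.0, 0.0],                 # action 0
              [0.0, 2.0]])                # action 1
gamma = 0.9

# Value iteration: V <- max_a (R_a + gamma * P_a V).
V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)   # Q[a, s]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

pi_star = Q.argmax(axis=0)  # a greedy, hence optimal, deterministic policy
```

Evaluating the greedy policy exactly (solving its linear Bellman equation) recovers the same $V_\star$, which is the content of the theorem for this toy case.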

September 12, 2022 · 8 min