Neumo’s Blogs

Let’s do some deep learning!

Reimagining Internal Family Systems Through the Lens of AI

IFS Concepts Internal Family Systems (IFS) is a psychotherapy model proposed by Richard Schwartz in the 1980s. In this model, every person is a combination of different parts, like the characters in the film Inside Out. The difference is that only emotions are anthropomorphized in the animation, while in IFS a part can also be a thought pattern or mindset. For example, if you have social appearance anxiety, there is always a voice in your mind criticizing your appearance....

August 8, 2024 · 8 min

Notes on Sauer's Lemma

Introduction Every binary classifier is a function mapping its input, an element of an enumerable dataset, to 0 or 1. Equivalently, we can regard the classifier as a function $ f : \mathbb{N} \rightarrow \{ 0, 1 \} $. We have a set of hypotheses $\mathcal{H}$ from which a function is chosen to maximize the classification accuracy. The hypothesis set is perfect if $\mathcal{H}$ contains all possible functions $ f : \mathbb{N} \rightarrow \{ 0, 1 \} $, which makes it a universal approximator....
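For reference, a standard statement of the lemma the notes discuss (assuming $\mathcal{H}$ has finite VC dimension $d$): the growth function $\Pi_{\mathcal{H}}(m)$, the maximum number of distinct labelings $\mathcal{H}$ induces on any $m$ points, satisfies

$$ \Pi_{\mathcal{H}} (m) \leqslant \sum_{i = 0}^{d} \binom{m}{i} $$

which grows only polynomially, as $O(m^d)$, rather than as $2^m$.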

July 26, 2024 · 6 min

A Succinct Proof of Decoupled Parallel Backpropagation Convergence Lemma

The notation below follows the original paper. $$ \begin{align*} \mathbb{E} [f (w^{t + 1})] - f (w^t) & \leqslant \nabla f (w^t)^{\top} \mathbb{E} [(w^{t + 1} - w^t)] + \frac{L}{2} \mathbb{E} [\| w^{t + 1} - w^t \|^2]\\ & = - \gamma_t \nabla f (w^t)^{\top} \mathbb{E} \left[ \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right] + \frac{L \gamma_t^2}{2} \mathbb{E} \left[ \left\| \sum^K_{k = 1} \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \right\|^2 \right]\\ & = - \gamma_t \| \nabla f (w^t) \|^2 - \gamma_t \nabla f (w^t)^{\top} \left( \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right) + \frac{K L \gamma_t^2}{2} \sum_{k = 1}^K \mathbb{E} [\| \nabla f_{\mathcal{G} (k), x_i (t - K + k)} (w^{t - K + k}) \|^2]\\ & \leqslant - \gamma_t \| \nabla f (w^t) \|^2 + \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{\gamma_t}{2} \left\| \sum_{k = 1}^K \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f (w^t) \right\|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K \| \nabla f_{\mathcal{G} (k)} (w^{t - K + k}) - \nabla f_{\mathcal{G} (k)} (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K \gamma_t}{2} \sum_{k = 1}^K \| \nabla f (w^{t - K + k}) - \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2}\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K L^2 \gamma_t}{2} \sum_{k = 1}^K \| w^{t - K + k} - w^t \|^2\\ & \leqslant - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \frac{K^2 L M \gamma_t^2}{2} + \frac{K^4 L^2 M^2 \sigma \gamma_t^2}{2}\\ & = - \frac{\gamma_t}{2} \| \nabla f (w^t) \|^2 + \gamma_t^2 \frac{K^2 L M}{2} (1 + K^2 L M \sigma) \end{align*} $$

April 21, 2024 · 2 min

Intuition of Universal Approximation Theorem

The universal approximation theorem states that an infinite-width single-layer neural network with a squashing activation function can approximate an arbitrary continuous function uniformly. There are also stronger statements for the approximation of Borel measurable functions, but continuous functions are enough in our case. And we may intuitively expect that the space of all continuous functions could approximate the space of Borel measurable functions almost surely in the sense of probability....
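One common form of the theorem (following Cybenko and Hornik et al.): for any continuous $f$ on a compact set $K \subset \mathbb{R}^n$, any squashing function $\sigma$, and any $\varepsilon > 0$, there exist $N$ and parameters $v_i, b_i \in \mathbb{R}$, $w_i \in \mathbb{R}^n$ such that

$$ \sup_{x \in K} \left| f (x) - \sum_{i = 1}^{N} v_i \, \sigma (w_i^{\top} x + b_i) \right| < \varepsilon. $$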

April 4, 2024 · 8 min

Optimal Code Existence for Countably Infinite Sources

Huffman coding demonstrates the existence of optimal codes, and gives a concrete construction, for sources with a finite alphabet. However, the construction fails when the alphabet is countably infinite. We will prove the existence of an optimal code for sources with a countably infinite alphabet. Notations Without loss of generality, we only use 0 and 1 to construct codewords, and the base of $\log$ is 2 by default. $X$ is the random variable from the source, whose probability distribution is $p_1 , p_2 , p_3 , \cdots$....
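A key tool in such existence proofs is the Kraft inequality, which holds for countably infinite alphabets as well as finite ones: a binary prefix code with codeword lengths $l_1, l_2, l_3, \cdots$ exists if and only if

$$ \sum_{i = 1}^{\infty} 2^{- l_i} \leqslant 1. $$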

March 30, 2024 · 3 min

A Problem on `va_list` in C Language

What is the output of the following code, and why?

#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("%#018llx\n", (char)0x80);
    printf("%#018llx\n", (unsigned char)0x80);
    return 0;
}

(You might encounter warnings about the inconsistency between the specified format and the given arguments. Let's ignore them.) The answer is

0x00000000ffffff80
0x0000000000000080

Questions We have two questions: Is it overloading that contributes to the different behaviors when different types of arguments are passed? Why is the first output 0x00000000ffffff80 instead of 0xffffffffffffff80?...

November 11, 2023 · 5 min

My Paper-Reading Workflow in 2023

Main The conventional approach to storing a file involves fitting it into a hierarchical structure, which necessitates a comprehensive overview of the corresponding field before the very first paper is read. You may instead place papers flat in an inbox folder and postpone the tedious task of reindexing and categorizing hundreds of them hierarchically; otherwise, the overwhelming folder becomes your first obstacle to retrieving information. Either way, both methods impose an additional mental burden....

September 1, 2023 · 3 min

Strictness of Markov Properties

A stochastic process $\{X_i\}_{i=0}^\infty $ is $n$-Markov if $$P(X_{t+n}|X_{t+n-1}, X_{t+n-2}, \cdots , X_{t}) = P(X_{t+n}|X_{t+n-1})$$ for any $t \ge 0$. We will prove that an $n$-Markov stochastic process must be $m$-Markov but is not necessarily $l$-Markov, where $l > n > m$. N+1 to N First, we prove that an $(n+1)$-Markov stochastic process must be $n$-Markov. Proof: Suppose $\{X_i\}_{i=0}^\infty$ is an $(n+1)$-Markov stochastic process. We have $$P(X_{t+n+1}|X_{t+n}, X_{t+n-1}, \cdots, X_t) = P(X_{t+n+1} | X_{t+n})$$ for any $t \ge 0$, deriving...
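The key step of that proof can be sketched with the tower property of conditional expectation (for $t \ge 1$):

$$ \begin{align*} P(X_{t+n} \mid X_{t+n-1}, \cdots, X_t) & = \mathbb{E} [P(X_{t+n} \mid X_{t+n-1}, \cdots, X_{t-1}) \mid X_{t+n-1}, \cdots, X_t]\\ & = \mathbb{E} [P(X_{t+n} \mid X_{t+n-1}) \mid X_{t+n-1}, \cdots, X_t]\\ & = P(X_{t+n} \mid X_{t+n-1}) \end{align*} $$

where the second equality applies the $(n+1)$-Markov property to the window $X_{t-1}, \cdots, X_{t+n}$.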

August 28, 2023 · 2 min

Some QR Codes Generated by ControlNet

I have tried HARD to generate a photo of a brain or a neuron that can be scanned. But unfortunately, either it cannot be recognized as a QR code or it is irrelevant to my prompts.😭 It is much easier to generate images of girls or natural scenery, though. Perhaps it's better to choose another Stable Diffusion checkpoint. (By the way, thanks to Stable Diffusion WebUI, it is quite easy to deploy famous diffusion models....

August 4, 2023 · 1 min

Build Singularity/Docker Image on a Singularity Server without `sudo` Privileges

In a Docker container, you have full privileges to build a Singularity or Docker image inside it. But if only Singularity is installed on the server, the root user has set up neither --fakeroot nor proot, and you have exhausted your remote-build minutes, what trick can you play to work around those restrictions? Software Selection To solve the problem, we need a virtual machine under our control on the server, giving us enough privileges to execute singularity build (or docker build), which requires sudo under such tough conditions as mentioned before....

July 31, 2023 · 4 min