This is the knowledge section
...and this section is all about what I've learned through the years.
Data viz
Cool data viz
Two interesting code survey tools
2023-04-21
What are the tools?
Here are the two tools I'm talking about:
- Visualizing multiple repos: link
- Visualizing a target repo: intro link | direct link
Why is it interesting?
Because this is a unique way of thinking about code. It is also a helpful way for me to understand code.
I always wanted to make contributions to open source projects, but often failed to do so due to the complexity of the code bases. As a beginner I often feel stuck and don't know where to begin. Most of the "tutorials" will just tell you to "start from checking typos", which doesn't really help at all.
These two tools help me understand the landscape of open source projects in a certain language and give me an overview of the structure of the code.
So have you used them? What's your feedback?
So far I've been using them to read the codebases of various open source projects. As a person who wants to get in on the action, this eases my way into understanding what the heck is going on.
As an example, here's the visualization of one of the projects I'm interested in (github):
Here's another one for twitter's algorithm:
It's from this visualization that I learned that Twitter uses Rust in one of its services, which motivates me to learn more about the language. It seems that a lot of companies have been adopting Rust.
So to explore Rust, we can also use the multi-repo visualization tool:
So..
I hope I can use these tools more often in the future. With an easier way of reading code, I'll be way more motivated to try to write and contribute code to open source projects as well.
CS-related
- rust-notes/
- Quake-3-fast-inverse-sqrt
- procedural-generation-note
- Linux-learning
- game-engine-architecture-note
- AI/
- fibonacci-in-O(n)
- OpenGL-triangle-without-vertex
- leanring-CRDT-note
- things-i-learned-from-the-creator-of-summatra-pdf
- webGPU/
- leetcode-grind/
- errors/
- Cool-tech-blogs
- apply-ML-concept
The following are all my Rust notes.
String::from() in Rust
2023-04-18
String::from() basically turns a static string (a string literal) into a mutable, heap-allocated String.
This is what phined has to say:
String::from in Rust is a function that creates a String from a string literal or a &str. A string literal, which is of type &'static str, is an immutable string slice that is embedded into the executable. On the other hand, String is a mutable and growable string buffer allocated on the heap. The String::from function creates a heap-allocated String and copies the data from the static string into it (Source 4).
Here's an example:
fn main() {
    let static_str = "Hello, world!";
    let string = String::from(static_str);
}
In this example, static_str is of type &'static str, and string is of type String. The String::from function is used to create a mutable String from the immutable &'static str.
You can also use .to_string() or .to_owned() on a string literal to achieve the same result:
fn main() {
    let static_str = "Hello, world!";
    let string = static_str.to_string();
    let string = static_str.to_owned();
}
Both .to_string() and .to_owned() work similarly to String::from. In Rust versions before 1.9.0, String::from was faster than .to_string(), but from version 1.9.0 onwards, they have the same performance (Source 3).
to_vec() in Rust
2023-04-18
to_vec is a crate that provides specialized implementations of collect for common use cases when collecting into Vec, HashSet, or HashMap containers. The main functionality can be broken down into:
- ToVec: collects an iterator's items into a Vec. For example:

use to_vec::ToVec;

let v = "one two three".split_whitespace().to_vec();
assert_eq!(v, &["one", "two", "three"]);
- ToVecResult: collects an iterator of Result<T, E> into a Result<Vec<T>, E>, where the error is the first error encountered.

use to_vec::ToVecResult;

let numbers = "23E 5F5 FF00".split_whitespace()
    .map(|s| u32::from_str_radix(s, 16)).to_vec_result().unwrap();
assert_eq!(numbers, &[0x23E, 0x5F5, 0xFF00]);
Some other traits from the same crate:
- ToSet: collects an iterator of references into a HashSet, implicitly cloning the items.

use to_vec::ToSet;

let colours = ["green", "orange", "blue"].iter().to_set();
let fruit = ["apple", "banana", "orange"].iter().to_set();
let common = colours.intersection(&fruit).to_set();
assert_eq!(common, ["orange"].iter().to_set());
- ToMap: collects an iterator of references to key-value pairs into a HashMap, implicitly cloning the keys and values.

use to_vec::ToMap;

const VALUES: &[(&str, i32)] = &[("hello", 10), ("dolly", 20)];
let map = VALUES.iter().to_map();
assert_eq!(map.get("hello"), Some(&10));
assert_eq!(map.get("dolly"), Some(&20));
These specialized forms provide a more ergonomic and efficient way to collect iterators into commonly used containers.
This was written with the help of phined.
Difference between struct and enum
2023-04-19
According to phined:
In Rust, struct and enum serve different purposes and have different behaviors:
- Memory layout:
  - struct: the size of a struct is the sum of the sizes of its members, plus any padding for alignment. stackoverflow.com
  - enum: the size of an enum is the size of the largest variant plus the variant identifier. stackoverflow.com
- Usage:
  - struct: represents an "AND" combination of its fields. It is used to encapsulate related properties into one unified data type. reddit.com
  - enum: represents an "OR" combination of its variants. It is used when a value can be one of several possible types. reddit.com
- Type aliases: type aliases can be used with enums to simplify long or generic names. doc.rust-lang.org
- Optimizations: Rust's compiler can optimize enums for memory usage and safety, whereas structs do not have the same level of optimization. reddit.com
- Match statement: enums can be used directly with match statements, which is one of the primary use cases for enums. reddit.com
Here are examples of each:
// Struct example
struct Point {
    x: i32,
    y: i32,
}

// Enum example
enum Shape {
    Circle(f32),
    Rectangle(f32, f32),
    Triangle(f32, f32),
}
In summary, use struct when you want to group related properties into a single data type, and use enum when you want to represent a value that can be one of several possible types.
So it basically means a struct in Rust is like a Java object, and an enum is kind of like a superclass with a fixed set of subclasses.
What is string.to_owned()
Phined answer:
to_owned is a method provided by the ToOwned trait in Rust, which is a generalization of the Clone trait for borrowed data. While both clone() and to_owned() can be used to create a copy of a value, they differ in their behavior for certain types, such as string slices (&str) and array slices (&[T]) (Source 2).
When using clone() on borrowed types like &str or &[T], it returns a copy with the same borrowed type (&str or &[T]). However, when using to_owned() on these borrowed types, it returns an owned version of the type, such as a String for &str or a Vec<T> for &[T] (Source 2).
For example:
let str = "a";                // type &str
let cloned_str = str.clone(); // type &str
let str = "a";                  // type &str
let owned_str = str.to_owned(); // type String
In general, you can use clone() when you want a copy of the same type, and to_owned() when you want to create an owned copy from borrowed data. The ToOwned trait ensures that the duplicate value has ownership, even if it requires using a different data type, such as a String or a Vec<T> for a &str or &[T] (Source 2).
As mentioned in Source 4, some developers prefer using clone() when they want an object of the same type and to_owned() when the types are different. However, this distinction is mostly stylistic, and both methods can be used interchangeably in many cases.
Quake 3 fast inverse square root algorithm
This blog is inspired by this video made by YouTuber Nemean explaining the fast inv-sqrt algorithm written for the Quake 3 game.
I want to put what he has said into note form, while simultaneously exploring similar solutions to this algorithm in the Rust language.
This will be in three major sections:
Context | Breakdown | Solution in Rust
The original C program
Here is the original C code for the algorithm:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
Context
Section: who cares
This algorithm is designed to normalize light ray vectors to length 1. This is needed to calculate the light physics in game.
The length of a light ray vector is calculated through:

$\|v\| = \sqrt{x^2 + y^2 + z^2}$

And since we want to normalize it down to length 1, we divide each component by the length of the vector, which leads to things like this (we are using the x axis as an example):

$x' = \frac{x}{\sqrt{x^2 + y^2 + z^2}}$

which leads to

$x' = x \cdot \frac{1}{\sqrt{x^2 + y^2 + z^2}}$

which leads to the need of calculating the inverse square root.
Section: Variable declaration and IEEE 754
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
The first variable declarations are basically set out to represent the number in binary scientific notation form.
The variable i is a 32-bit long integer.
The variables x2 and y are two float (decimal) numbers.
As we know, the binary scientific notation is in the form of:
0 00000000 00000000000000000000000
which consists of:
- 1 bit for the positive or negative sign (- or +)
- 8 bits for the exponent (the $x$ in $2^x$)
- 23 bits, called the mantissa, for the decimal places of the main number (the .xx in 1.xx)
And since we are normalizing a number to the range of 0 to 1, the following is given:
- the number will always be positive
- the number will be in normalized form
Section: bits and numbers
This section basically points out that the bit representation of the exponent and mantissa can be roughly equal to the logged version of the number, up to a scale factor and an additional constant. Meaning: the bit representation of x can mostly stand in for log2(x).
The bit representation of a float with mantissa $M$ and exponent $E$ can be written as $I = E \cdot 2^{23} + M$, which in number form is

$y = \left(1 + \frac{M}{2^{23}}\right) \cdot 2^{E - 127}$

For us to get the log version, we first wrap it with $\log_2$:

$\log_2(y) = \log_2\!\left(\left(1 + \frac{M}{2^{23}}\right) \cdot 2^{E - 127}\right)$

We then simplify to get

$\log_2(y) = \log_2\!\left(1 + \frac{M}{2^{23}}\right) + E - 127$.

At this point, it is a bit stuck, since we have to figure out how to get rid of the $\log_2$ in the front.

However, it was found that given a small enough $x$, $\log_2(1 + x) \approx x$.

It was also found that the approximation has the smallest error if we add to $x$ a constant $\mu$ of about 0.0430.

Hence, we get the following formula when we apply the findings:

$\log_2(y) \approx \frac{M}{2^{23}} + \mu + E - 127$

and after re-arranging we get:

$\log_2(y) \approx \frac{1}{2^{23}}\left(M + E \cdot 2^{23}\right) + \mu - 127$

where we can find what we've discovered earlier: the bit representation $I = M + E \cdot 2^{23}$. In other words, $\log_2(y) \approx \frac{I}{2^{23}} + \mu - 127$.
Breakdown
Section: evil bit hack
i = * ( long * ) &y;
We want to enable bit manipulation on a float. This is necessary because we want to use bit manipulation to realize the number division (the halving of the exponent) cheaply.
However, we don't want to convert the variable itself from float to long, because that would round the value and lose information. We want the bit representation of the float to be read as a long as-is.
So instead, this line of code converts the address of the float into an address of a long.
Break down:
- &y is the address of the float y.
- ( long * ) converts the address of the float y into the address of a long. So the C language will think there is now a long number living at that address instead of a float.
- The leading * then reads what's at the address.
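For comparison, here's a quick mock of the same bit reinterpretation in Python (my own sketch, not from the video; Python's struct module stands in for the pointer cast):

import struct

y = 2.0
# reinterpret the float's bits as a 32-bit unsigned integer,
# like `i = * ( long * ) &y` in C
(i,) = struct.unpack('<I', struct.pack('<f', y))
print(hex(i))  # 0x40000000: sign 0, exponent 128, mantissa 0

# and back again, like `y = * ( float * ) &i`
(y2,) = struct.unpack('<f', struct.pack('<I', i))
assert y2 == y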
Section: WTF
i = 0x5f3759df - ( i >> 1 );
This line of code uses bit shifting to manipulate the exponent part of the number in order to achieve the inverse square root.
So basically, given $y = 2^e$:

If you halve the exponent (which can easily be done using bit shifting), you get $2^{e/2} = \sqrt{y}$.

And if you negate the exponent, you get $2^{-e/2}$, which is what we want: $\frac{1}{\sqrt{y}}$.

However, since directly calculating the inverse square root is what we want to avoid through this algorithm, we need to work it out some other way. This is where what we talked about in the previous section comes in. If we get y in the IEEE 754 form, the bit representation of y can be interpreted as the log of itself.

Hence instead of calculating $y^{-1/2}$, we calculate $\log_2\!\left(y^{-1/2}\right)$, which can be turned into $-\frac{1}{2}\log_2(y)$.

There is still a division in the equation, which can be dealt with using a bit shift: ( i >> 1 ).

Since we want to find out the inverse square root of a number, we can then draw an equation like this:

$\Gamma = \frac{1}{\sqrt{y}}$

We can then use the previously discussed equation and establish a new equation (writing $I_\Gamma$ and $I_y$ for the bit representations of $\Gamma$ and $y$):

$\frac{I_\Gamma}{2^{23}} + \mu - 127 = -\frac{1}{2}\left(\frac{I_y}{2^{23}} + \mu - 127\right)$

After solving for $I_\Gamma$, we get this:

$I_\Gamma = \frac{3}{2} \cdot 2^{23} \cdot (127 - \mu) - \frac{I_y}{2}$

where $\frac{3}{2} \cdot 2^{23} \cdot (127 - \mu)$ = 0x5f3759df.
y = * ( float * ) &i;
Finally, we perform the evil bit hack again, but in reverse.
Section: Newton iteration
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
These lines of code utilize Newton iteration; we need this because the result from the reverse evil bit hack is still not accurate enough.

Newton iteration finds the root of a function. This means it finds the x in f(x) = 0 through many iterations of approximation.

The basic idea of Newton iteration is guessing a new value of x through f(x) and its derivative, which leads to the following equation:

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

The Quake 3 algorithm only does the iteration one time because the result already brings the error to within 1%. Applied to $f(y) = \frac{1}{y^2} - x$ (whose root is the inverse square root of x), the Newton step works out to $y_{n+1} = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^2\right)$, which is exactly what the code computes, with x2 being $\frac{x}{2}$. And when y is the root of the function, y is also the inverse square root of x.
Some solutions of this algorithm in Rust
Here's a version of this code written in Rust, found on this Stack Overflow post (rust playground):
fn inv_sqrt(x: f32) -> f32 {
    let i = x.to_bits();
    let i = 0x5f3759df - (i >> 1);
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y)
}
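And here's a small Python port I find handy for playing with the algorithm (my own sketch, mirroring the Rust version above, with struct doing the bit reinterpretation):

import struct

def inv_sqrt(x):
    # evil bit hack: read the float's bits as a 32-bit unsigned int
    (i,) = struct.unpack('<I', struct.pack('<f', x))
    i = 0x5f3759df - (i >> 1)           # the WTF line
    (y,) = struct.unpack('<f', struct.pack('<I', i))
    return y * (1.5 - 0.5 * x * y * y)  # one Newton iteration

for x in (1.0, 2.0, 4.0, 10.0):
    print(x, inv_sqrt(x), x ** -0.5)    # errors stay well under 1%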
Note for procedural generation
I was playing Deep Rock Galactic and was wondering how procedural generation works, so I'mma read up on it and take notes.
Reading list
Generating Random Fractal Terrain
The concept behind fractals is self-similarity, meaning the shape copies itself in a miniature version, and so on and so forth.
This is like human blood vessels: they start from the main arteries, then branch out into smaller veins.
Midpoint displacement in one dimension
Start with a single horizontal line segment.
Repeat for a sufficiently large number of times {
Repeat over each line segment in the scene {
Find the midpoint of the line segment.
Displace the midpoint in Y by a random amount.
Reduce the range for random numbers.
}
}
(The accompanying figures show the line subdividing: one segment, then two, then four, getting progressively rougher.)
This is a recursive operation.
It is also a simple algorithm that creates complex results.
A roughness constant H determines how quickly the random range shrinks each pass, and therefore how rough the fractal is.
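Here's a minimal Python sketch of the 1D version (my own, following the pseudocode above; the number of passes and the roughness constant are parameters):

import random

def midpoint_displacement(passes, roughness=1.0):
    # start with a single horizontal line segment: two endpoints at height 0
    heights = [0.0, 0.0]
    scale = 1.0
    for _ in range(passes):
        new_heights = []
        for left, right in zip(heights, heights[1:]):
            new_heights.append(left)
            # find the midpoint and displace it in Y by a random amount
            new_heights.append((left + right) / 2 + random.uniform(-scale, scale))
        new_heights.append(heights[-1])
        heights = new_heights
        # reduce the range for random numbers; H controls how fast
        scale *= 2 ** -roughness
    return heights

print(midpoint_displacement(3))  # 9 heights describing a jagged line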
Height maps
The same algorithm can be used to generate the heights of a 3D space; combining them with the x and y coordinates, you get a height map.
diamond-square algorithm
The diamond step: Taking a square of four points, generate a random value at the square midpoint, where the two diagonals meet. The midpoint value is calculated by averaging the four corner values, plus a random amount. This gives you diamonds when you have multiple squares arranged in a grid.
The square step: Taking each diamond of four points, generate a random value at the center of the diamond. Calculate the midpoint value by averaging the corner values, plus a random amount generated in the same range as used for the diamond step. This gives you squares again.
Implementing this algorithm recursively will cause generation with insufficient data at some point. For example, after the first pass, the square step will not receive all four corners of the diamond in this implementation:
Do diamond step.
Do square step.
Reduce random number range.
Call myself four times.
So the iterative implementation is:
While the length of the side of the squares
is greater than zero {
Pass through the array and perform the diamond
step for each square present.
Pass through the array and perform the square
step for each diamond present.
Reduce the random number range.
}
This algorithm can also be used to generate clouds by determining the color value instead of the height value.
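Here's a compact Python sketch of the iterative loop above (my own; it assumes a square grid of side 2^n + 1 and, at the edges, averages only the neighbours that exist):

import random

def diamond_square(n, roughness=1.0):
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]
    # seed the four corners with random values
    for r in (0, size - 1):
        for c in (0, size - 1):
            grid[r][c] = random.uniform(-1, 1)
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # diamond step: each square's midpoint = average of its four corners + random
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (grid[r - half][c - half] + grid[r - half][c + half] +
                       grid[r + half][c - half] + grid[r + half][c + half]) / 4
                grid[r][c] = avg + random.uniform(-scale, scale)
        # square step: each diamond's midpoint = average of available neighbours + random
        for r in range(0, size, half):
            for c in range(half if r % step == 0 else 0, size, step):
                total, count = 0.0, 0
                for dr, dc in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    if 0 <= r + dr < size and 0 <= c + dc < size:
                        total += grid[r + dr][c + dc]
                        count += 1
                grid[r][c] = total / count + random.uniform(-scale, scale)
        # reduce the random number range
        step, scale = half, scale * 2 ** -roughness
    return grid

terrain = diamond_square(4)  # a 17x17 height map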
Linux learning
2020-07-05
This is the post where I'm going to summarize my entry-level knowledge of Linux. I'm going to keep updating this post until I have a basic idea of Linux as a whole.
The basic syllabus
There are three main utilizations of Linux/purposes for learning Linux:
Utilization type | Basic description | Key phrases/Stuff to learn |
---|---|---|
Operation and Maintenance, O&M | The craft of keeping servers running. This mostly deals with server maintenance, set-up, surveillance, etc. | FTP, DNS, APACHE, IPTABLE, ORACLE, MYSQL, etc. Understanding most of the Linux commands. Shell & Python |
Application developer | Same as app dev on any other OS, but on Linux. | Understanding the Linux environment, Linux API, IO handling, Python & Shell, C/C++ |
Kernel programming | Dealing with the core of Linux as an OS. | Get to know the relationship between hardware and software. Read manuals on computer chip sets. |
Although it would be best to learn everything, I find the first option, O&M, more suited for my case (since I have rented servers). So this post will be focusing more on the server-side knowledge (I will try to learn the other options afterward).
Linux basics
- What is Linux?
- Linux distros, and the difference between them
- Most common Linux commands
What is Linux:
Linux is a free and open source operating system created by Linus Torvalds. It is Unix-like, and is used on most servers.
Linux distros:
Since Linux is open sourced, people are allowed to modify it and produce their own versions of Linux; thus Linux distros were born. The most common Linux distros are:
- Ubuntu Linux
- Red Hat Enterprise Linux
- Linux Mint
- Debian
- Fedora
Although Linux distros usually come with a GUI, the majority of Linux user interaction is done through the CLI (command line interface).
Linux basic commands:
The absence of a GUI in most cases means that operating Linux requires navigating through working directories, e.g. /home/usr/etc/etc. This means that the most used commands are the following:
Command | Meaning | Example |
---|---|---|
cd | Change directory – this command allows you to go from one directory to another. | cd /home/exampleDir |
ls | List – this command lists all the files & directories in the current directory. | ls /home/exampleDir |
pwd | Print working directory – this command prints the path of your current directory; it lets you know where you are. | pwd Output: /home/exampleDir |
After going through the directories, we can also manipulate the directories with these commands:

Command | Meaning | Example |
---|---|---|
mkdir | Make directory – create a new directory under the current directory | mkdir /exampleDir |
rmdir | Remove directory – remove a directory under the current directory. | rmdir /exampleDir |
Files can also be manipulated in a similar way:
Command | Meaning | Example |
---|---|---|
touch | create an empty file of any type | touch example.txt |
rm | remove a file of any type | rm example.txt |
rm -r | remove a directory and the files inside it (recursively) | rm -r /exampleDir |
In a GUI, file movement is usually done by drag and drop. In Linux, file movement is done with:
Command | Meaning | Example |
---|---|---|
cp | copy a file to a directory | cp example.txt /home/exampleDir |
mv | move a file to a different directory, or rename a file | mv example.txt /home/exampleDir; mv example.txt example1.txt |
There is also a search command:
locate
which acts as the search tool for files. For example:
locate example.txt
#output: /home/exampleDir/example.txt
Linux also has a built-in manual, which helps users if they forget the meaning of a certain command. To open the manual, use
man
#or
--help
Examples could be:
man ls
#or
ls --help
After creating a file or files, they can be examined and edited using these commands:
Command | Meanings | Examples |
---|---|---|
cat | cat displays the content of a file | cat example.txt Output: Example content |
echo | echo prints data, usually text; with >> it appends the text into a file | echo Example content >> example.txt |
vi | vi, or visual editor, is the default text editor for Unix-based systems. When using it, press i to edit, press esc to exit editing mode, type :wq or :x to write and quit, type :q to just quit, and add ! if permission is denied. | vi example.txt |
nano | nano is a simpler, more beginner-friendly text editor than vi. It is a "what you see is what you get" text editor, with handy functions such as inserting the content of another file. | nano example.txt nano /home/exampleDir/example.txt |
There are also some system-side commands that are very useful in day-to-day usage.
For example
sudo
sudo stands for SuperUser Do. It is a command allowing a non-root user to have administrative privileges when running other commands. It is usually placed at the very front:
sudo cd
sudo vi example.txt
sudo mkdir
#etc, etc
Managing disks is another important part of Linux. The following two commands give you information about disk space:

Command | Meaning | Example |
---|---|---|
df | df checks the disk space information of each mounted partition of the system, e.g. total disk space, used disk space, available disk space. | df; df -m shows it in megabytes |
du | du tells the disk usage of a particular file or directory | du /home/exampleDir; du /home/exampleDir/example.txt |
There’s a whole other topic on disk mounting, which is something that I will touch on later.
In terms of file compression, Linux uses the tar archive, which is represented with .tar. The commands to compress and decompress are as follows:
Command | Meaning | Example |
---|---|---|
tar -cvf | create a tar archive | tar -cvf example.tar /home/exampleDir/ |
tar -xvf | decompress a tar archive | tar -xvf /home/exampleDir/example.tar |
tar -tvf | list the content of the archive | tar -tvf /home/exampleDir/example.tar |
Each letter in the “-cvf” in “tar -cvf” has its own meaning:
- c is for create tar file
- v is for verbosely show tar file progress
- f is for file name
The old-fashioned “zip” file is also available in Linux; we can zip and unzip using the following commands:
Command | Meaning | Example |
---|---|---|
zip | to compress a file into a zip archive | zip example.zip example.txt |
unzip | to decompress a zip archive | unzip example.zip |
There are also some commands that retrieve information such as the OS version, network connection, etc.
Command | Meaning | Example |
---|---|---|
uname | retrieve information about the Linux distro/system. | uname -a |
hostname | retrieve your host name & IP address | hostname |
ping | ping checks your connection with a certain website. It is also used to check the overall internet connection. | ping example.com |
Then, to install something, we use
apt-get
or
yum
The difference between these two is their type of installation package. Linux has two main branches of distros:
Distros branch | examples |
---|---|
Redhat | Redhat, CentOS, fedora |
Debian | Debian, Ubuntu |
and each has its own type of installation package.
Distros branch | Type of installation package |
---|---|
Redhat | rpm(redhat package manager) |
Debian | deb(dpkg) |
yum and apt-get are wrappers around these package formats. They help manage all the packages from an online repository.
So to use them, for example to download the "sudo" package, we just need to type:
apt-get install sudo
or
yum install sudo
depending on your version of Linux distro.
Finally, files in Linux have privilege/permission settings. A user with root/sudo privileges will be able to change these permissions.
To understand these permissions, we first need to go back to the ls command.
There is a variant of ls:
ls -l
It displays the information in a “long” format, which gives more information about a file. An example output would be:
-rwxrw-r-- 1 exampleUsr exampleUsr 780 Aug 20 11:11 example.txt
#or
drwxr-xr-x 2 exampleUsr exampleUsr 4096 Aug 21 08:03 exampleDir
The string of text in front of each line displays the file type, the permissions for the file owner, the permissions for users in the same group as the file owner, and the permissions for everyone else.
- | this shows that this is a file |
---|---|
rwx | the first segment is for the file owner |
rw- | the second segment is for users in the same group |
r-- | the third segment is for everyone else |
The first symbol represents the file type. The most common symbols are - and d, which are file and directory.
Apart from the first symbol, the following letters follow a pattern; each letter represents a permission status:
- r = read
- w = write
- x = execute
- "-" (if not the first symbol/letter) = denied permission on whatever is supposed to be there
So, rwx means readable, writable, executable; rw- means readable and writable, but not executable; and r-- means only readable.
Now, to change the permission of these files, we use the command
chmod
This command can only be used with root/sudo privileges, and the user must provide a “permission statement”, which includes the information of
- who: whose permission is changed
- what: what changes in permission
- which: what permission is going to change
This is represented with indicators after “chmod”, which are summarized in the table below:

Slot | Indicators | | | |
---|---|---|---|---|
who | u: “user”/“file owner” | g: group, members in the same user group | o: others | a: all |
what | - : remove permission | + : grants permission | = : set permission and remove others | |
which | r: see above | w: see above | x: see above | |
This permission statement can also be represented using numerical and binary values. In that case
Meanings | Numerical value(Decimal) | read digit(binary) | write digit(binary) | execute digit(binary) |
---|---|---|---|---|
No permission | 0 | 0 | 0 | 0 |
Execute permissions | 1 | 0 | 0 | 1 |
Write permissions | 2 | 0 | 1 | 0 |
Write and Execute | 3 | 0 | 1 | 1 |
Read permission | 4 | 1 | 0 | 0 |
Read and execute | 5 | 1 | 0 | 1 |
Read and write | 6 | 1 | 1 | 0 |
Read, write, execute | 7 | 1 | 1 | 1 |
So, to grant full permissions on a file to everybody, we can either use
chmod a=rwx example.txt
or
chmod 777 example.txt
However, usually we just give the full permissions to the file owner, and other people the read and execute permissions, like this:
chmod u=rwx,og=rx example.txt
or
chmod 755 example.txt
and finally, we add
-R
between chmod and the permission statement if we want to change the permissions of the files in the subdirectories as well.
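To double-check how the letter form maps to the numeric form, here's a tiny Python sketch (the helper name is my own):

def perm_to_octal(perm):
    """Convert a 9-character permission string like 'rwxr-xr-x' to '755'."""
    digits = []
    for i in range(0, 9, 3):
        r, w, x = perm[i:i + 3]
        value = (4 if r == 'r' else 0) + (2 if w == 'w' else 0) + (1 if x == 'x' else 0)
        digits.append(str(value))
    return ''.join(digits)

print(perm_to_octal('rwxr-xr-x'))  # 755
print(perm_to_octal('rwxrw-r--'))  # 764, matching the ls -l example above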
References
This is where all my references are going to go. This section will move to the end after I finish this post.
- Linux introduction https://www.zhihu.com/question/397371213
- What is kernel programming https://www.quora.com/What-is-Kernel-programming-What-does-a-Kernel-programmer-actually-do#:~:text=Kernel%20programming%20is%20nothing%20but%20providing%20functionality%20in,kernel%20programmer%20would%20be%20doing%20the%20following%20things%3A
- What is Linux https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners
- Basic vi command https://www.cs.colostate.edu/helpdocs/vi.html
- The Beginner’s Guide to Nano, the Linux Command-Line Text Editor https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/
- 18 Tar Command Examples in Linux https://www.tecmint.com/18-tar-command-examples-in-linux/
- What is the difference between yum, apt-get, rpm, ./configure && make install? https://superuser.com/questions/125933/what-is-the-difference-between-yum-apt-get-rpm-configure-make-install
- The difference between yum and apt-get (in Chinese) https://www.cnblogs.com/siyuli2019/p/11252419.html
- How to Use the chmod Command on Linux https://www.howtogeek.com/437958/how-to-use-the-chmod-command-on-linux/
- Linux chmod command https://www.computerhope.com/unix/uchmod.htm
- Linux基础入门 | 目录结构 https://mp.weixin.qq.com/s?__biz=MzU3NTgyODQ1Nw==&mid=2247485351&idx=1&sn=c1a56193a2fa9fa40eaace1c220d8279&source=41#wechat_redirect
- The Linux Directory Structure, Explained https://www.howtogeek.com/117435/htg-explains-the-linux-directory-structure-explained/
Game engine architecture book note
This is a bunch of notes from me reading the Game Engine Architecture book by Jason Gregory.
2023-05-24
The main parts of a game engine include:
- third-party SDKs, like DirectX
- platform independence layer, which targets multiple platforms, like a wrapper
- core systems, such as memory allocation and object handles
- resource manager, including game assets
- rendering engine, rendering graphics
- low-level renderer, e.g. shaders, lighting, etc.
- graphics SDKs
- other components
- culling optimization
- Visual effects
- front end
- collision and physics
- skeletal animation
- audio
AI-related
- Bayes-Networks
- stable-matching-and-five-representative-problems
- Probabilistic-Context-Free-Grammar(PCFG)
- A-star-search
- heuristic-values
Bayes Network
What is a Bayes network
A Bayesian network is a representation of a joint probability distribution of a set of random variables with a possible mutual causal relationship. The network consists of nodes representing the random variables, edges between pairs of nodes representing the causal relationship of these nodes, and a conditional probability distribution in each of the nodes. The main objective of the method is to model the posterior conditional probability distribution of outcome (often causal) variable(s) after observing new evidence. Bayesian networks may be constructed either manually with knowledge of the underlying domain, or automatically from a large dataset by appropriate software.
The goal
The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence.
Any node in a Bayesian network is always conditionally independent of all its non-descendants, given that node's parents.
Difference between Markov and Bayesian networks
A Markov model is an example of a graph which represents only one random variable, and the nodes represent possible realizations of that random variable at distinct time points. In contrast, a Bayesian network represents a whole set of random variables, with the edges representing the causal relationships among them.
Six rules of Bayes nets / six rules of d-separation
The d in d-separation stands for directional.
A path between nodes means:
any consecutive sequence of edges, disregarding their directionalities.
An unblocked path means:
a path that can be traced without traversing a pair of arrows that collide "head-to-head".
The head-to-head nodes are called "colliders".
Rule 1
x and y are d-connected if there is an unblocked path between them.
Rule 2
x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z. If no such path exists, we say that x and y are d-separated by Z. We also say then that every path between x and y is "blocked" by Z.
Rule 3
If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that traces this collider.
Without knowing anything, for the three basic structures (chain A → B → C, common cause A ← B → C, collider A → B ← C):
- chain: C is dependent on A
- common cause: A and C both depend on B, which makes them dependent, but conditionally independent given B
- collider: A and C are independent
Given B:
- chain: A and C are independent
- common cause: A and C are independent
- collider: A and C are dependent
Stable matching and five representative problems
Stable matching problem
Solution pseudocode:
Initially all m in M and w in W are free
While there is a free m
    let w = the highest-ranked on m's list that m has not proposed to
    if w is free, then match (m, w)
    else
        suppose (m2, w) is matched
        if w prefers m to m2
            unmatch (m2, w)
            match (m, w)
Python implementation:
from collections import deque

def pref_to_rank(pref):
    # helper referenced below; turns preference lists into
    # {person: {candidate: rank}} dicts for O(1) comparisons
    return {
        a: {b: rank for rank, b in enumerate(a_pref)}
        for a, a_pref in pref.items()
    }

def gale_shapley(*, A, B, A_pref, B_pref):
    """Create a stable matching using the
    Gale-Shapley algorithm.
    A -- set[str].
    B -- set[str].
    A_pref -- dict[str, list[str]].
    B_pref -- dict[str, list[str]].
    Output: list of (a, b) pairs.
    """
    B_rank = pref_to_rank(B_pref)
    ask_list = {a: deque(bs) for a, bs in A_pref.items()}
    pair = {}

    remaining_A = set(A)
    while len(remaining_A) > 0:
        a = remaining_A.pop()
        b = ask_list[a].popleft()
        if b not in pair:
            pair[b] = a
        else:
            a0 = pair[b]
            b_prefer_a0 = B_rank[b][a0] < B_rank[b][a]
            if b_prefer_a0:
                remaining_A.add(a)
            else:
                remaining_A.add(a0)
                pair[b] = a

    return [(a, b) for b, a in pair.items()]
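A toy run (the names and preferences are made up):

A = {'ann', 'bob'}
B = {'xavier', 'yolanda'}
A_pref = {'ann': ['xavier', 'yolanda'], 'bob': ['xavier', 'yolanda']}
B_pref = {'xavier': ['bob', 'ann'], 'yolanda': ['ann', 'bob']}

print(gale_shapley(A=A, B=B, A_pref=A_pref, B_pref=B_pref))
# [('bob', 'xavier'), ('ann', 'yolanda')] -- pair order may vary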
Use case
Matching internship applicants to companies. The matching should be self-enforcing and have less chaos.
We want both parties to either:
- prefer their choice of matching, or
- be satisfied with the current selection and not want to change.
In the applicant and companies example, this means: (i) E prefers every one of its accepted applicants to A; or (ii) A prefers her current situation over working for employer E.
The algorithm terminates after at most n^2 iterations of the while loop.
The set S returned by the algorithm is a stable matching.
Five representative problems
#Algorithm
- Interval scheduling
- Weighted interval scheduling
- Bipartite Matching
- Independent Set
- Competitive facility location
Interval scheduling
Usually solved using some kind of greedy algorithm.
Weighted interval scheduling
Usually solved using dynamic programming.
Bipartite Matching
Given an arbitrary bipartite graph G, find a matching of maximum size. If |X|=|Y| = n, then there is a perfect matching if and only if the maximum matching has size n
It is like stable matching but without preferences.
there is not necessarily an edge from every x ∈ X to every y ∈ Y, so the set of possible matchings has quite a complicated structure. In other words, it is as though only certain pairs of men and women are willing to be paired off, and we want to figure out how to pair off many people in a way that is consistent with this.
Usually solved using augmentation, which is the key to a class of problems called network flow problems.
Independent Set
Given a graph G = (V, E), we say a set of nodes S ⊆ V is independent if no two nodes in S are joined by an edge.
For example, the maximum size of an independent set in the graph in Figure 1.6 is four, achieved by the four-node independent set {1, 4, 5, 6}.
This belongs to the class of problems called NP-complete problems.
Competitive facility location
Thus our game consists of two players, P1 and P2, alternately selecting nodes in G, with P1 moving first. At all times, the set of all selected nodes must form an independent set in G. Suppose that player P2 has a target bound B, and we want to know: is there a strategy for P2 so that no matter how P1 plays, P2 will be able to select a set of nodes with a total value of at least B? We will call this an instance of the Competitive Facility Location Problem.
It is considered to be in the class of problems called PSPACE-complete problems, which are harder than NP-complete problems.
Probabilistic-Context-Free-Grammar(PCFG)
A probabilistic context free grammar (PCFG) is a type of formal grammar that contains probabilistic functions in order to assign a probability to each production rule. PCFGs are a type of generative grammar, which means they can be used to generate sentences from a given set of rules. Unlike traditional context free grammars, PCFGs use probabilities instead of just binary values (true or false) to determine the likelihood for each production rule. This allows for the generation of more complex and natural-sounding sentences.
A generic PCFG consists of a set of terminal symbols, a set of non-terminal symbols, a start symbol, and production rules, each annotated with a probability (the probabilities of all rules with the same left-hand side sum to 1).
Abbreviation meanings:
- NN --> common nouns
- NNS --> common nouns (plural)
- NNP --> proper nouns (singular), like names of people, places and things
- VB --> verbs, base form
- VBG --> verbs with -ing
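To make the "generate sentences from rules with probabilities" idea concrete, here's a toy sampler I sketched in Python (the grammar and probabilities are made up):

import random

# each nonterminal maps to a list of (expansion, probability) pairs
GRAMMAR = {
    'S':   [(['NP', 'VP'], 1.0)],
    'NP':  [(['the', 'NN'], 0.7), (['NNP'], 0.3)],
    'VP':  [(['VB'], 0.4), (['VB', 'NP'], 0.6)],
    'NN':  [(['dog'], 0.5), (['park'], 0.5)],
    'NNP': [(['Alice'], 1.0)],
    'VB':  [(['runs'], 0.5), (['sees'], 0.5)],
}

def generate(symbol='S'):
    if symbol not in GRAMMAR:  # terminal: just a word
        return [symbol]
    expansions = [e for e, _ in GRAMMAR[symbol]]
    weights = [p for _, p in GRAMMAR[symbol]]
    chosen = random.choices(expansions, weights=weights)[0]
    return [word for s in chosen for word in generate(s)]

print(' '.join(generate()))  # e.g. "the dog sees Alice"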
probabilistic context free grammar
[[Heuristic values]]
A* search algorithm:
The A* search algorithm is a popular search algorithm used in pathfinding and graph traversal. It combines the strengths of Dijkstra's algorithm (which guarantees a shortest path in a graph with non-negative edge weights by expanding nodes in order of path cost) and greedy best-first search (which is fast because it steers toward the target using a heuristic).
- Initialize an empty list of nodes to be explored, called the "open list"
- Initialize a closed list of already-explored nodes
- Set the initial node as the current node and add it to the open list
- While the open list is not empty:
  a. Select the node in the open list with the lowest f score (cost function)
  b. Remove it from the open list and add it to the closed list
  c. Generate its successors (neighboring nodes)
  d. For each successor:
     i. If it is not in either list, compute its f score and add it to the open list
     ii. If it is already in either list, check if using this path is a better route and update accordingly
- When all successors of the current node have been evaluated, pick the next lowest-cost node from the open list and repeat until the goal state is reached
def a_star(graph, start_node, end_node, heuristic):
    # create a set to store explored nodes
    explored = set()
    # create a set to store unexplored nodes
    unexplored = set()
    # create a dictionary to store the cost of getting to each node
    cost = {}
    # create a dictionary to store the best previous node for each node
    previous = {}
    # create a dictionary to store the estimated cost of getting to the end node from each node
    estimated_cost = {}
    # set the initial cost of getting to each node to infinity, since we don't know any better at the start
    for node in graph:
        cost[node] = float('inf')
    # set the initial estimated cost of getting to the end node from each node to the heuristic cost
    for node in graph:
        estimated_cost[node] = heuristic(node, end_node)
    # the start node costs nothing to reach
    cost[start_node] = 0
    # set the initial node to the start node and add it to the unexplored set
    unexplored.add(start_node)
    # loop until we either find the end node or there are no more unexplored nodes
    while len(unexplored) > 0:
        # find the node in the unexplored set with the lowest estimated cost
        lowest_cost = float('inf')
        lowest_cost_node = None
        for node in unexplored:
            if estimated_cost[node] < lowest_cost:
                lowest_cost = estimated_cost[node]
                lowest_cost_node = node
        # if we've found the end node, we're done
        if lowest_cost_node == end_node:
            break
        # move the current node from the unexplored set to the explored set
        unexplored.remove(lowest_cost_node)
        explored.add(lowest_cost_node)
        # update the cost of getting to each neighbor of the current node
        for neighbor in graph[lowest_cost_node]:
            # skip any neighbors that are already in the explored set
            if neighbor in explored:
                continue
            # calculate the cost of getting to this neighbor
            new_cost = cost[lowest_cost_node] + graph[lowest_cost_node][neighbor]
            # if the new cost is lower than the previous cost, update the cost and set the previous node for this neighbor
            if new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                previous[neighbor] = lowest_cost_node
                estimated_cost[neighbor] = new_cost + heuristic(neighbor, end_node)
            # if the neighbor is not in the unexplored set, add it
            if neighbor not in unexplored:
                unexplored.add(neighbor)
    # create an empty list to store the path
    path = []
    # set the current node to the end node
    current_node = end_node
    # loop until we get to the start node
    while current_node != start_node:
        # insert the current node at the start of the list
        path.insert(0, current_node)
        # set the current node to the previous node
        current_node = previous[current_node]
    # the start node itself completes the path
    path.insert(0, start_node)
    # return the path
    return path
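A quick usage sketch with a toy graph (made up for illustration); a zero heuristic makes A* behave exactly like Dijkstra:

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1},
}
print(a_star(graph, 'A', 'D', lambda n, goal: 0))  # ['A', 'B', 'C', 'D']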
Heuristic values
These are the values given to each node in relation to the goal. A simple example would be distance: if each city is a node, the heuristic value of a node can be the straight-line distance between this city node and the goal city node. This value is a measurement on top of the path values in a graph.
Admissibility
An admissible heuristic is a heuristic that never overestimates the true cost to the goal state. It estimates a number that is at or below the true cost.
Consistency
A consistent heuristic is a heuristic that "adds up" along edges: a node A's heuristic value is at most the path value between A and a neighbour B plus B's heuristic value, i.e. h(A) <= cost(A, B) + h(B).
A consistent heuristic function is also admissible. An admissible heuristic is not necessarily consistent.
Fibonacci in time complexity of O(n)
# bottom-up dynamic programming
def fib_dp(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
fib_list = [0, 1]
for i in range(2, n + 1):
fib_list.append(fib_list[i-1] + fib_list[i-2])
return fib_list[n]
Explanation:
First, the function takes in an integer n as an argument, which represents the position of the Fibonacci number we want to calculate.
The function starts by checking if n is equal to 0 or 1. If it is, it immediately returns 0 or 1, respectively. This is because the first two numbers in the Fibonacci sequence are 0 and 1, so we can just return them directly without calculating anything.
Otherwise, the function creates a list called fib_list and assigns the values [0, 1] to it. This list will be used to store the previously calculated Fibonacci numbers, so that we can use them to calculate the next number in the sequence.
Then, the function enters a for loop that starts at index 2 and goes until n+1. This is because the first two numbers in the sequence have already been added to fib_list.
Inside the for loop, the function appends the sum of the previous two numbers in fib_list to the list. It does this by using the indices i-1 and i-2 to access the two previous numbers, and then adding them together.
At the end of the for loop, the function returns the last element of fib_list, which is the Fibonacci number at position n.
This implementation uses a bottom-up dynamic programming approach, where we store the results of previously computed Fibonacci numbers in a list and use them to calculate the next number in the sequence. This avoids the need to re-compute the same numbers multiple times, which reduces the overall time complexity and makes it O(n).
Example with time complexity of O(2^n)
def fib_rec(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
return fib_rec(n-1) + fib_rec(n-2)
This implementation has a time complexity of O(2^n) because each call to the function generates two new calls, one for n-1 and one for n-2. Therefore, the total number of calls grows exponentially with the input size. For example, to calculate the 30th Fibonacci number, we would need to make more than 2 billion function calls.
The problem with this implementation is that it recalculates the same Fibonacci numbers over and over again, leading to a lot of redundant computation. For example, to calculate fib_rec(4) we need to calculate fib_rec(3) and fib_rec(2), but to calculate fib_rec(3) we also need to calculate fib_rec(2), which is redundant.
This is why the bottom-up dynamic programming approach has a better time complexity. It avoids the redundant computation by storing the already calculated values in an array and uses them to calculate the next values in the sequence.
For fib_dp(30), the time it takes to run is 0.0000169000s on my machine.
For fib_rec(30), the time it takes to run is 0.2176796000s on my machine.
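For reference, a quick way to reproduce the comparison (assuming both functions above are defined in the same file; absolute numbers will differ per machine):

import timeit

print(timeit.timeit(lambda: fib_dp(30), number=1))   # ~1e-5 s
print(timeit.timeit(lambda: fib_rec(30), number=1))  # ~0.2 s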
Triangle without vertices
Here is the code:
void main()
{
float x = -1.0 + float((gl_VertexID & 1) << 2);
float y = -1.0 + float((gl_VertexID & 2) << 1);
gl_Position = vec4(x, y, 0, 1);
}
This code allows us to draw a triangle without providing specific vertex coords.
Basics on rendering
Shaders are programs users write to run on some stage of the GPU.
The two most vital shader components are the vertex and fragment shaders.
The vertex shader handles coordinates in the virtual space. The fragment shader handles the color, depth, and other stuff.
How does this program work
Overview
Through a method, we get the x and y values; we then put them together with the predetermined z and w values, which are set to 0 and 1, the standard for 2D graphics.
layer 1.
We get both x and y by adding the output of float(things in here) to -1.0.
layer 2
We get the output of x in float(things in here) by bit-shifting (gl_VertexID & 1) to the left by 2. This means that we multiply the number we get from (gl_VertexID & 1) by 4.
We get the output of y in float(things in here) by bit-shifting (gl_VertexID & 2) to the left by 1. This means that we multiply the number we get from (gl_VertexID & 2) by 2.
layer 3
gl_VertexID: this is a built-in variable in GLSL that provides the index of the currently processed vertex.
(gl_VertexID & 1) extracts the least significant bit (the bit on the far right of the binary number). This is because it ANDs the binary number of gl_VertexID with the binary number of 1, which is 0001.
So if gl_VertexID is 2, which is 0010 in binary, and we do an AND operation with 0001, we get 0000, which is 0, matching the least significant bit of 0010.
The same logic goes for (gl_VertexID & 2), but now we are extracting the second least significant bit.
So..
if this logic is run for gl_VertexID equal to 0, 1, and 2, we get the coordinates {-1, -1}, {3, -1}, and {-1, 3}, which form a triangle.
learning CRDT notes
what is CRDT:
CRDT stands for Conflict-free replicated Data Type, its a way to ensure concurrent data update between clients. They guaranteed convergence eventually as long as concurrent updates are commutative.
This is a topic in distributed systems.
CRDT address the problem of concurrency in distributed system. However it has limitations, such as the lack of consensus. It only address part of the problem since a lot of update operation is not communicative.
Types of CRDT:
- CvRDTs - Convergent replicated Data Types
- CmRDTs - Commutative replicated Data Types
State based replication
A replica receives an update from a client, and then some time later it sends its full state to the other replicas.
A replica receiving another's full state will merge its current state with the incoming state.
Every replica occasionally does the above actions, hence every update eventually reaches every replica in the system.
IF:
- the set of values of the state forms a semi-lattice (a partially ordered set with a join/least-upper-bound operation)
- updates are increasing
- the merge function computes the least upper bound
THEN the replicas are guaranteed to converge to the same value.
IF:
- the set of all possible states is a semi-lattice
THEN the merge operation has to be idempotent, associative, and commutative.
Idempotent - repeating the operation on an element produces the same result.
IF:
- a replica satisfies the above points
THEN:
- the replica is a CvRDT.
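The classic toy example of a CvRDT is a grow-only counter, where merge is an element-wise max, which is exactly an idempotent, associative, commutative least upper bound. A minimal Python sketch of my own:

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas  # one slot per replica

    def increment(self):
        # updates are increasing: the state only ever grows
        self.counts[self.id] += 1

    def merge(self, other):
        # least upper bound: element-wise max
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3  # both replicas converge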
Operation based replication
Don't send the whole state; broadcast update operations to all replicas and expect each replica to replay them.
This may cause replicas to receive update operations in different orders, so the operations have to be commutative.
Resources followed
- https://www.farley.ai/posts/causal
- https://medium.com/@istanbul_techie/a-look-at-conflict-free-replicated-data-types-crdt-221a5f629e7e
- https://en.wikipedia.org/wiki/Semilattice
- https://www.youtube.com/watch?v=3UkC3sXLqhQ
- https://www.youtube.com/watch?v=LCFf2DBTVmo
- https://www.youtube.com/watch?v=KbyVjwmzlpk
- https://www.youtube.com/watch?v=XJQqDDTNvJA
Things I learned from the creator of Sumatra PDF
2023-10-24
Yesterday while browsing through Hacker News, I encountered this article written by the creator of Sumatra PDF, who summarized some of his insights from maintaining an open source application with very few individuals over a long period of time. Here are some of my takeaways.
- Sometimes good things come out of accidents. Sumatra was an accidental app, and an accidental success. It was not intentionally created, but was created at a convenient time.
What do you know: I had a simple PDF reader for Windows. I released it on my website. It couldn’t do much so I tagged it as version 0.1.
Getting early users, learning what features they want the most beats toiling for months or years and implementing lots of features before you know anyone even cares.
- Release frequently, but also release notable changes.
On one hand I don’t want to release too often but I also do want the users to get new features as quickly as possible. My policy of new releases is: release when there’s at least one notable, user-visible improvement.
- Documentation and presentation are important for everything, even open source software.
From day one I created a website for the app. It had screenshots, it had documentation, it was easy to download and install.
- Simplicity sells.
- Oftentimes your product differs from the rest by simple things.
Thinking outside of the box is hard because the box is invisible. SumatraPDF wasn’t the first PDF reader application ever written. But most PDF readers do not become multi-format readers.
- With the limited amount of energy you have, do a few things excellently instead of many things mediocrely.
- Overnight success takes decades.
Success often takes a long time. Unfortunately, at that stage it’s undistinguishable from (eventual) failure so this wisdom doesn’t help you if you’re working on a not-yet-successful project and debating if you should continue or abandon
- Maintaining an open source project for such a long time takes true passion, because open source does not make money.
Here is all the stuff I learned about webGPU
I wanted to explore the possibility of web-based graphics applications. So this folder is going to be about all the webGPU stuff I learned and tried out.
Instead of a recording approach, I'm going to employ a "plan and execute" approach to the knowledge on webGPU, where I plan out what I want to learn, and go explore those particular topic areas. This way, I can avoid side-tracking.
Then here are the notes for the webGPU study...
What is webGPU?
This is the first step of my learning. This post will focus on what webGPU is and how to get it in your browser.
What is webGPU
WebGPU is a new API for the web, which exposes modern hardware capabilities and allows rendering and computation operations on a GPU, similar to Direct3D 12, Metal, and Vulkan.
Why webGPU
WebGPU offers access to more advanced GPU features and provides first-class support for general computations on the GPU. The API is designed with the web platform in mind, featuring an idiomatic JavaScript API, integration with promises, support for importing videos, and a polished developer experience with great error messages.
webGPU abstraction
WebGPU is a very simple system. All it does is run 3 types of functions on the GPU: vertex shaders, fragment shaders, and compute shaders.
Running webGPU.
Here is the webGPU examples page I found.
To see webGPU working, you need either Chrome 113 or Edge 113. Go to the examples page and check out the simple renders. They should show up on the screen.
plan
need to read
- https://toji.dev/webgpu-best-practices/
- https://webgpufundamentals.org/webgpu/lessons/webgpu-fundamentals.html#a-drawing-triangles-to-textures
- https://google.github.io/tour-of-wgsl/
- https://wiki.nikiv.dev/computer-graphics/webgpu
- https://codelabs.developers.google.com/your-first-webgpu-app#0
- https://jack1232.github.io/webgpu00/
- https://www.freecodecamp.org/news/learn-webgpu-a-next-generation-graphics-api-for-the-web/
these are the tasks I need to do
- Learn how to set up a webGPU project
- render a triangle
- render Conway's game of life
keywords
vertex shader - Vertex shaders are functions that compute vertex positions for drawing triangles/lines/points
Fragment shader - Fragment shaders are functions that compute the color (or other data) for each pixel to be drawn/rasterized when drawing triangles/lines/points
This is where I upload my notes on leetcode
- longest-repeating-character-replacement
- longest-palindromic-substring
- two-sum
- find-minimum-in-rotated-sorted-array
- contains-duplicate
- k-largest-element-in-array
- longest-substring-without-repeating-characters
- best-time-to-buy-a-stock
- longest-consecutive-sequence
- valid-palindrome
- valid-anagram
- 3sum
- group-anagrams
- valid-parentheses
- max-point-on-a-line
- container-with-most-water
- top-k-frequent-elements
- task-scheduler
- count-vowel-substrings-of-a-string
- maximum-subarray
- product-of-array-except-self
Longest Repeating Character Replacement
class Solution(object):
def characterReplacement(self, s, k):
"""
:type s: str
:type k: int
:rtype: int
"""
count = {}
res = 0
l = 0
for r in range(len(s)):
count[s[r]] = 1 + count.get(s[r] , 0)
while (r - l + 1) - max(count.values()) > k:
count[s[l]] -= 1
l += 1
res = max(res, r-l+1)
return res
Longest Palindromic Substring
class Solution(object):
def longestPalindrome(self, s):
"""
:type s: str
:rtype: str
"""
res = ""
resLen = 0
for i in range(len(s)):
l, r = i, i
while l >= 0 and r < len(s) and s[l] == s[r]:
if (r - l + 1) > resLen:
res = s[l:r+1]
resLen = r - l + 1
l -= 1
r += 1
l, r = i, i + 1
while l >= 0 and r < len(s) and s[l] == s[r]:
if (r - l + 1) > resLen:
res = s[l:r+1]
resLen = r - l + 1
l -= 1
r += 1
return res
Two Sum
description easy
accepted(one bang)
class Solution(object):
def twoSum(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: List[int]
"""
for i in nums:
we_want = target - i
temp = nums[nums.index(i)+1:]
if we_want in temp:
index = temp.index(we_want) + nums.index(i)+1
return [nums.index(i), index]
faster solution using hashmap:
class Solution(object):
def twoSum(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: List[int]
"""
map = {}
for i,n in enumerate(nums):
diff = target - n
if diff in map:
return(map[diff], i)
map[n] = i
return
Like what this image portrayed, we keep a record of value:index in a hashmap. We iterate through the array, and once we find the second value, we are guaranteed to find the solution to the problem.
Find Minimum in Rotated Sorted Array
my solutions
class Solution:
def findMin(self, nums: List[int]) -> int:
res = min(nums)
return res
class Solution:
def findMin(self, nums: List[int]) -> int:
res = float("inf")
for i in range(len(nums)):
if nums[0] < res:
res = nums[0]
temp = nums[-1]
if temp > nums[0]:
return nums[0]
nums = nums[0:len(nums)-1]
nums.insert(0, temp)
# print(nums)
return int(res)
binary search solution
class Solution:
def findMin(self, nums: List[int]) -> int:
res = nums[0]
l, r = 0, len(nums) - 1
while l <= r:
if nums[l] < nums[r]:
res = min(res, nums[l])
            m = l + (r - l) // 2
res = min(res, nums[m])
if nums[m] >= nums[l]:
l = m + 1
else:
r = m - 1
return res
Contains Duplicate
easy description, accepted solution:
class Solution(object):
def containsDuplicate(self, nums):
"""
:type nums: List[int]
:rtype: bool
"""
unique = set()
for i in nums:
if i in unique:
return True
unique.add(i)
return False
failed solution:
class Solution(object):
def containsDuplicate(self, nums):
"""
:type nums: List[int]
:rtype: bool
"""
unique = []
for item in nums:
if item not in unique:
unique.append(item)
if len(unique) < len(nums):
return True
else:
return False
Why? A set is faster than a list: it can find an item directly (via hashing), without searching through everything. In the failed solution, the
if item not in unique:
check on a list does another linear search on every iteration, which slowed the program down a lot.
set and list are both data structures in Python, but they have some key differences:
- Duplicates: set automatically removes duplicates, while list allows duplicates.
- Order: list preserves the order of elements, while set does not guarantee the order of elements.
- Indexing: list supports indexing, while set does not.
- Mutability: both list and set are mutable (can be changed), but the methods available to change each are different.
- Membership testing: both list and set allow you to check if an item is in the collection, but sets provide a faster way of checking for membership due to the hash table implementation.
In general, you would use a list when you need to preserve order and potentially have duplicates, while you would use a set when you want to eliminate duplicates and do not care about the order of elements.
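A quick timing sketch makes the membership-testing difference visible (absolute numbers depend on your machine):

import timeit

nums = list(range(100_000))
as_list = nums
as_set = set(nums)

print(timeit.timeit(lambda: 99_999 in as_list, number=100))  # linear scan each time
print(timeit.timeit(lambda: 99_999 in as_set, number=100))   # hash lookup each time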
K largest element in an array
class Solution(object):
def findKthLargest(self, nums, k):
"""
:type nums: List[int]
:type k: int
:rtype: int
"""
k = len(nums) - k
def quickSelect(l, r):
#choose splitter
pivot, p = nums[r], l
# partition numbers
for i in range(l, r):
if nums[i] <= pivot:
nums[p], nums[i] = nums[i], nums[p]
p += 1
nums[p], nums[r] = nums[r], nums[p]
        # choose which side to do the recursion
if p > k:
return quickSelect(l, p-1)
elif p < k:
return quickSelect(p + 1, r)
else:
return nums[p]
return quickSelect(0, len(nums) - 1)
Longest Substring Without Repeating Characters
accepted answer:
class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        charSet = set()
        l = 0
        res = 0
        for r in range(len(s)):
            while s[r] in charSet:
                charSet.remove(s[l])
                l += 1
            charSet.add(s[r])
            res = max(res, r - l + 1)
        return res
Best Time to Buy and Sell Stock
accepted answer:
class Solution(object):
    def maxProfit(self, prices):
        """
        :type prices: List[int]
        :rtype: int
        """
        l, r = 0, 1
        res = 0
        while r < len(prices):
            if prices[l] < prices[r]:
                profit = prices[r] - prices[l]
                res = max(res, profit)
            else:
                l = r
            r += 1
        return res
Longest Consecutive Sequence
accepted answer (fifth try):
class Solution(object):
    def longestConsecutive(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if len(nums) != 0:
            count = 1
        else:
            count = 0
            return count
        nums = sorted(nums)
        print(nums)
        temp = 1
        for i in range(len(nums)):
            print("temp", temp)
            if i != len(nums)-1:
                first = nums[i]
                second = nums[i+1]
                print(first, second)
                if second - first == 1:
                    temp += 1
                    if temp > count:
                        count = temp
                elif second - first == 0:
                    continue
                else:
                    print("here")
                    temp = 1
        return count
A good answer:
class Solution(object):
    def longestConsecutive(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        numSet = set(nums)
        longest = 0
        for n in nums:
            if (n - 1) not in numSet:  # n is the start of a sequence
                length = 0
                while (n + length) in numSet:
                    length += 1
                longest = max(length, longest)
        return longest
Valid Palindrome
accepted answer
class Solution(object):
    def isPalindrome(self, s):
        """
        :type s: str
        :rtype: bool
        """
        # process string
        s = s.lower()
        s = str(s)
        s = ''.join(filter(str.isalnum, s))
        # check
        new_s = ""
        for i in range(len(s)-1, -1, -1):
            new_s += s[i]
        if new_s == s:
            return True
        else:
            return False
another solution with better run time:
class Solution(object):
    def alnum(self, c):
        return (ord('A') <= ord(c) <= ord('Z') or ord('a') <= ord(c) <= ord('z') or ord('0') <= ord(c) <= ord('9'))

    def isPalindrome(self, s):
        """
        :type s: str
        :rtype: bool
        """
        l, r = 0, len(s) - 1
        while l < r:
            while l < r and not self.alnum(s[l]):
                l += 1
            while r > l and not self.alnum(s[r]):
                r -= 1
            if s[l].lower() != s[r].lower():
                return False
            l, r = l + 1, r - 1
        return True
Valid Anagram
easy
description; accepted (one try):
class Solution(object):
    def isAnagram(self, s, t):
        """
        :type s: str
        :type t: str
        :rtype: bool
        """
        if len(s) != len(t):
            return False
        letter = {}
        letter2 = {}
        for i in s:
            if i not in letter:
                letter[i] = 1
            else:
                letter[i] += 1
        for i in t:
            if i not in letter2:
                letter2[i] = 1
            else:
                letter2[i] += 1
        if letter != letter2:
            return False
        else:
            return True
3Sum
description
accepted answer
class Solution(object):
    def threeSum(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        res = []
        nums.sort()
        for i, a in enumerate(nums):
            # we don't want duplicates
            if i > 0 and a == nums[i - 1]:
                continue
            l, r = i+1, len(nums) - 1
            while l < r:
                threeSum = a + nums[l] + nums[r]
                if threeSum > 0:
                    # too big, move r left to get smaller
                    r -= 1
                elif threeSum < 0:
                    # too small, move l right to get bigger
                    l += 1
                else:
                    # found a triplet, record it
                    res.append([a, nums[l], nums[r]])
                    # move l right for the next iteration
                    l += 1
                    # keep moving l while it points at a duplicate
                    while nums[l] == nums[l - 1] and l < r:
                        l += 1
        return res
Group Anagrams
medium
submitted (failed: exceeded the time limit):
class Solution(object):
    def isAnagram(self, s, t):
        if len(s) != len(t):
            return False
        letter = {}
        letter2 = {}
        for i in s:
            if i not in letter:
                letter[i] = 1
            else:
                letter[i] += 1
        for i in t:
            if i not in letter2:
                letter2[i] = 1
            else:
                letter2[i] += 1
        if letter != letter2:
            return False
        else:
            return True

    def groupAnagrams(self, strs):
        """
        :type strs: List[str]
        :rtype: List[List[str]]
        """
        output = []
        searched = []
        for i in strs:
            i_index = strs.index(i)
            temp_strs = strs[i_index+1:]
            if i in searched:
                continue
            tempList = []
            tempList.append(i)
            for j in temp_strs:
                if self.isAnagram(i, j) == True:
                    tempList.append(j)
                    if j not in searched:
                        searched.append(j)
            print(tempList)
            output.append(tempList)
        return output
accepted answer:
from collections import defaultdict

class Solution(object):
    def groupAnagrams(self, strs):
        """
        :type strs: List[str]
        :rtype: List[List[str]]
        """
        res = defaultdict(list)
        for s in strs:
            count = [0] * 26
            for c in s:
                count[ord(c) - ord("a")] += 1
            res[tuple(count)].append(s)
        return res.values()
Explanation: the program counts the letters of each word and turns the counts into a key for the dictionary. Every word with the same letter counts produces the same key, so when the key generated from the next word matches, that word gets put into the same list, which is what gets output in the end. The dictionary will look like:

[key]: [list of words with the same key]
...

This has a runtime of O(number of strings * number of characters).

count[ord(c) - ord("a")] += 1

This line takes the ASCII number of the current character and subtracts the ASCII number of "a", which is the equivalent of getting this character's index in the alphabet.
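As a tiny illustration of the counting key (the words here are made up, not from the problem):

def count_key(word):
    # 26 zeros, one slot per letter of the alphabet
    count = [0] * 26
    for c in word:
        count[ord(c) - ord("a")] += 1
    return tuple(count)

print(count_key("eat") == count_key("tea"))  # True: anagrams share a key
print(count_key("eat") == count_key("bat"))  # False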
Valid Parentheses
class Solution(object):
    def isValid(self, s):
        """
        :type s: str
        :rtype: bool
        """
        stack = []
        closeToOpen = {")": "(", "]": "[", "}": "{"}
        for c in s:
            if c in closeToOpen:
                if stack and stack[-1] == closeToOpen[c]:
                    stack.pop()
                else:
                    return False
            else:
                stack.append(c)
        return True if not stack else False
Max Points on a Line
import math
import collections

class Solution(object):
    def maxPoints(self, points):
        """
        :type points: List[List[int]]
        :rtype: int
        """
        n = len(points)
        if n == 1:
            return 1
        result = 2
        for i in range(n):
            # group the other points by the angle they make with points[i]
            count = collections.defaultdict(int)
            for j in range(n):
                if j != i:
                    count[math.atan2(points[j][1] - points[i][1], points[j][0] - points[i][0])] += 1
            result = max(result, max(count.values()) + 1)
        return result
Container with most water
accepted answer:
class Solution(object):
    def maxArea(self, height):
        """
        :type height: List[int]
        :rtype: int
        """
        res = 0
        l, r = 0, len(height)-1
        while l < r:
            area = (r - l) * min(height[l], height[r])
            res = max(res, area)
            if height[l] < height[r]:
                l += 1
            elif height[r] < height[l]:
                r -= 1
            else:
                l += 1
        return res
Top K Frequent Elements
Accepted answer (third try):
class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        dic = {}
        for i in nums:
            dic[i] = 0
        for i in nums:
            dic[i] += 1
        sorted_dic = sorted(dic.items(), key=lambda x: x[1], reverse=True)
        output = []
        for i in range(k):
            output.append(sorted_dic[i][0])
        return output
An answer that supposedly has a better runtime of O(n), but actually ran slower than my code:
class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        count = {}
        freq = [[] for i in range(len(nums) + 1)]
        for n in nums:
            count[n] = 1 + count.get(n, 0)  # if n is not in count, start from 0; otherwise add 1 to the old value
        for n, c in count.items():
            freq[c].append(n)  # a list where the index is the occurrence count, and the sublist holds the numbers with that many occurrences
        output = []
        for i in range(len(freq) - 1, 0, -1):  # go in descending order from length-1 down to 1
            for n in freq[i]:
                output.append(n)
                if len(output) == k:
                    return output
Here is a visual representation (sketched in the comments below):
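To make the bucket idea concrete, here is a small made-up example of what count and freq look like:

# For nums = [1, 1, 1, 2, 2, 3] (an illustrative input):
# count == {1: 3, 2: 2, 3: 1}
# freq  == [[], [3], [2], [1], [], [], []]
#                ^    ^    ^
# index:         1    2    3   ... index = occurrence count
# Walking freq from the back yields 1 first (it appeared 3 times),
# then 2, then 3, so the top-2 answer is [1, 2].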
Task Scheduler
import heapq
from typing import List
from collections import Counter, deque

class Solution:
    def leastInterval(self, tasks: List[str], n: int) -> int:
        count = Counter(tasks)
        maxHeap = [-cnt for cnt in count.values()]
        heapq.heapify(maxHeap)
        time = 0
        q = deque()
        while maxHeap or q:
            time += 1
            if maxHeap:
                cnt = 1 + heapq.heappop(maxHeap)  # processing one occurrence, so decrease the count by 1; here adding is subtracting, because we are treating a min-heap as a max-heap
                if cnt:
                    q.append([cnt, time + n])
            if q and q[0][1] == time:
                heapq.heappush(maxHeap, q.popleft()[0])
        return time
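The "min-heap as max-heap" trick in the comment above is worth spelling out; a minimal sketch:

import heapq

counts = [3, 1, 5]
max_heap = [-c for c in counts]      # store negated values
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)   # popping the smallest negated value gives the largest original
print(largest)  # 5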
Count Vowel Substrings of a String
class Solution {
    public int countVowelSubstrings(String word) {
        int vow = 0;
        int n = word.length();
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < n - 4; i++) {
            set.clear();
            for (int j = i; j < n; j++) {
                char ch = word.charAt(j);
                if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
                    set.add(ch);
                    if (set.size() == 5) {
                        vow++;
                    }
                } else {
                    break;
                }
            }
        }
        return vow;
    }
}
Maximum Subarray
class Solution(object):
    def maxSubArray(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        maxArray = nums[0]
        currArray = 0
        for n in nums:
            if currArray < 0:
                currArray = 0  # a negative running sum can only hurt, so reset it
            currArray += n
            maxArray = max(maxArray, currArray)
        return maxArray
Product of Array Except Self
Medium, attempt 1 (exceeded the time limit):
class Solution(object):
    def productExceptSelf(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        dic = {}
        for i in nums:
            temp = nums[:]
            temp.remove(i)
            dic[i] = temp
        print(dic)
        output = []
        for i in nums:
            start = 1
            for j in dic[i]:
                start = start * j
            output.append(start)
        return output
accepted:
class Solution(object):
    def productExceptSelf(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        output = [1] * (len(nums))
        prefix = 1
        for i in range(len(nums)):
            output[i] = prefix
            prefix *= nums[i]
        postfix = 1
        for i in range(len(nums) - 1, -1, -1):
            output[i] *= postfix
            postfix *= nums[i]
        return output
This basically means that we multiply every number before the current number and every number after the current number together.

Create a list named output and initialize it with 1s
create variable prefix = 1
for every index in nums:
    output[index] = prefix
    prefix multiplies itself with nums[index]
create variable postfix = 1
for every index in nums, backwards:
    output[index] multiplies itself with postfix
    postfix multiplies itself with nums[index]
return the output list
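A worked trace on a small made-up input helps see the two passes:

# Trace on nums = [1, 2, 3, 4] (illustrative input):
# prefix pass  (left to right): output == [1, 1, 2, 6]    # product of everything before i
# postfix pass (right to left): output == [24, 12, 8, 6]  # times product of everything after i
# Result: [24, 12, 8, 6], the product of the array except self at each index.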
List of errors I've encountered
- Having-X11-Wayland-on-wsl
- Trouble-shooting-Armoury-crate-Intergrated-and-deciated-GPU-switch
- ref.on-is-not-a-function
2023-03-25
During this spring break I've decided to learn Rust, and what better way to do it than writing small projects. One of them is a small defender game I want to write following this tutorial.
However, I encountered an error, which contains the following:
thread 'main' panicked at 'Failed to initialize any backend! Wayland status: XdgRuntimeDirNotSet X11 status: XOpenDisplayFailed'
This turns out to be a problem with the Windows Subsystem for Linux. To counter it, I found that there is a GitHub repo, wslg, dedicated to this problem.
This brings up another problem, which relates to the wsl version I was using. During previous usage, especially during web development, wsl2 showed really slow performance, so this whole time I had been using wsl1. However, wslg requires wsl2 in order to function, which means I'll have to find a workaround for that.
Fortunately, I did find a possible solution.
Troubleshooting Armoury Crate integrated and dedicated GPU switch
2022-06-29
I've been having trouble switching between my integrated and dedicated GPU using Armoury Crate. There is a process called "textinputhost.exe" that cannot be terminated, and it is therefore blocking Armoury Crate from executing the switch.
I've tried the solution proposed in this post, and it worked the last time, but this time it did not solve the problem.
I then unplugged my computer, force shut it down instead of using the default reboot (at this point the Touch Keyboard service is shut down) and restarted it after about two minutes. Immediately after the restart I opened Armoury Crate and switched to the integrated GPU mode, and it worked.
When I restarted the Touch Keyboard service and rebooted the computer, the fix still held. However, when I plugged the computer into power, textinputhost.exe popped back up.
I noticed that in Task Manager, a TabletInputService was using the Touch Keyboard service. So I went into settings and disabled it.
That did not work.
I then went into settings and forced textinputhost.exe to use the integrated GPU: but it did not work.
I finally went to the desktop, right-clicked into NVIDIA's graphics settings, and turned the preferred GPU to the integrated GPU. And it worked.
This article also helped.
#error #javascript #firebase The error means that it recognizes the ref object, but does not recognize the method .on after the ref.
code that caused this problem
blog.ref('blogs').on('value', (snapshot) => {
    const data = snapshot.val();
    for (let key in data) {
        blogData.push(data[key]);
    }
});
code that fixed the problem
let blogData_firebase = get(blogRef).then((snapshot) => {
    let blogData_temp = snapshot.val();
    return blogData_temp;
}).catch((error) => {
    console.error(error);
});

blogData_firebase.then((value) => {
    setBlogData(value);
})
better code
useEffect(() => {
    const db = getDatabase();
    let blogRef = ref(db, 'blogs/blogs');
    const offFunction = onValue(blogRef, (snapshot) => {
        const valueObj = snapshot.val();
        const objKeys = Object.keys(valueObj);
        const objArray = objKeys.map((keystring) => {
            const blogObj = valueObj[keystring];
            blogObj.key = keystring;
            return blogObj;
        });
        setBlogData(objArray);
    });
    function cleanup() {
        offFunction();
    }
    return cleanup;
}, []);
Cool tech blogs
This is a collection of some interesting tech blogs I found:
- uwe-arzt.de
- tw93
- table.dog
- evan.gg
- web.dev
- youngforest
- nvbn
- wattenberger
- taniarascia
- strml
- acko
- lynnandtonic
- jvns
- matthew.science
- mrmr
- fourmilab
- hacker news outliers
- danluu
- jeffhuang -> he's a fellow UW student! (PhD in information science)
- mary.codes
- ciechanow
Applying ML concept on my own data
2024-02-08
So I just got off class and want to fiddle around with the concepts I just learned: bag of words, term-document matrix, context vectors, and word embeddings.
So I trained a model using the text from here. Here is the code.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk
from sklearn.feature_extraction.text import CountVectorizer

d_Making_it_some_what_dynamic = """Making it some what dynamic..."""
d_Making_it_properly_dynamic = """Making it properly dynamic..."""
d_detection_script_debug = """What's new detection script debug and fix..."""
d_Website_server_migration = """Website server migration..."""
d_Creating_mdbook_summary_maker = """Creating mdbook-summary-maker - a summary.md automation tool..."""
d_creating_this_website = r"""Creating this website..."""

documents = [d_Creating_mdbook_summary_maker, d_creating_this_website, d_detection_script_debug, d_Making_it_properly_dynamic, d_Making_it_some_what_dynamic, d_Website_server_migration]

vrizer = CountVectorizer(stop_words="english")
vrizer.fit(documents)
X = vrizer.transform(documents)
print(X.shape)

dtm = pd.DataFrame(X.toarray(), columns=vrizer.get_feature_names_out())
dtm
(6, 909)
00 | 02 | 05 | 07 | 10 | 12 | 16 | 17 | 200 | 2020 | ... | wraped | write | writing | written | www | wwwroot | xxxx | year | yes | your_website_url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 3 | 0 | 3 | 2 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
6 rows × 909 columns
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim_matrix_for_antzed = cosine_similarity(dtm)
cosine_sim_matrix_for_antzed.shape

document_names = [
    "d_Creating_mdbook_summary_maker",
    "d_creating_this_website",
    "d_detection_script_debug",
    "d_Making_it_properly_dynamic",
    "d_Making_it_some_what_dynamic",
    "d_Website_server_migration"
]

# Initialize variables to store the maximum similarity and the document indices
max_similarity = 0
doc_index_1 = 0
doc_index_2 = 0

# Iterate over the matrix to find the pair with the highest similarity
for i in range(6):
    for j in range(i+1, 6):  # Ensure no self-comparison
        if cosine_sim_matrix_for_antzed[i, j] > max_similarity:
            max_similarity = cosine_sim_matrix_for_antzed[i, j]
            doc_index_1 = i
            doc_index_2 = j

# Print the document names with the highest similarity and their similarity score
print(f"The documents with the highest similarity are \"{document_names[doc_index_1]}\" and \"{document_names[doc_index_2]}\" with a similarity score of {max_similarity:.4f}.")
The documents with the highest similarity are "d_detection_script_debug" and "d_Making_it_some_what_dynamic" with a similarity score of 0.4253.
import gensim
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords

nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/jovyan/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import preprocess_string

def get_wordnet_pos(word):
    """Map NLTK's part of speech tags to wordnet's"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def preprocess_text(text):
    tokens = simple_preprocess(text, deacc=True)  # Tokenize into words
    tokens_no_stops = [token for token in tokens if token not in stop_words]
    lemmatized_tokens = [lemmatizer.lemmatize(token, get_wordnet_pos(token)) for token in tokens_no_stops]
    return lemmatized_tokens
# Tokenize and preprocess all documents
processed_sentences = [preprocess_text(doc) for doc in documents]

# Flatten the list if your model expects a list of sentences as input
flat_list_of_words = [word for sublist in processed_sentences for word in sublist]

print(flat_list_of_words[:5])
['create', 'mdbook', 'summary', 'maker', 'summary']
from gensim.models import Word2Vec

# Now that we have our sentences preprocessed and lemmatized, we train the model
model = Word2Vec([flat_list_of_words], vector_size=100, window=5, min_count=1, workers=4)

# Save the model for later use
model.save("lemmatized_word2vec_model.model")

model.wv.most_similar('rust', topn=10)
[('ssh', 0.30321794748306274),
('target', 0.28839021921157837),
('enviroment', 0.28243470191955566),
('examine', 0.2718653678894043),
('succeful', 0.26447442173957825),
('touppercase', 0.2616710662841797),
('assist', 0.2575630247592926),
('true', 0.25681009888648987),
('familiar', 0.25594884157180786),
('doen', 0.2559070289134979)]
model.wv.most_similar('dynamic', topn=10)
[('top', 0.25962546467781067),
('variable', 0.2449072003364563),
('semi', 0.23804989457130432),
('later', 0.23783016204833984),
('could', 0.23667342960834503),
('lastindexof', 0.22947242856025696),
('indeed', 0.22467540204524994),
('easy', 0.22087810933589935),
('detection', 0.21729730069637299),
('directory', 0.21623274683952332)]
model.wv.most_similar('website', topn=10)
[('avaiable', 0.3144473731517792),
('fail', 0.3111109435558319),
('probably', 0.3086855113506317),
('wm', 0.306361585855484),
('host', 0.28776368498802185),
('save', 0.2712177634239197),
('backup', 0.26919832825660706),
('code', 0.2683839201927185),
('folder', 0.2653118073940277),
('etc', 0.2603006362915039)]
model.wv.most_similar('antzed', topn=10)
[('in_file_path', 0.3186494708061218),
('template', 0.30744123458862305),
('relationship', 0.29583120346069336),
('port', 0.295200377702713),
('filename', 0.2635174095630646),
('allows', 0.24324464797973633),
('preprocesor', 0.2392539530992508),
('item', 0.2337876856327057),
('treat', 0.22798201441764832),
('malfunction', 0.22339922189712524)]
model.wv.most_similar('problem', topn=10)
[('empty', 0.25634294748306274),
('us', 0.24955253303050995),
('template', 0.2491423785686493),
('junk', 0.24703580141067505),
('filter', 0.2332962602376938),
('bash', 0.23113256692886353),
('enter', 0.22585271298885345),
('sign', 0.2233218401670456),
('node', 0.21544311940670013),
('website', 0.21240629255771637)]
We can see that the detection script debugging post has the highest similarity to the "making it somewhat dynamic" post, which was surprising.
We can also see that some of the words, such as 'rust', are used alongside 'ssh'. Although it seems like there isn't a lot of close context between words in my blogs.
Financial blog
Quantitative trading
This is my note on all the things I learned while reading the book Quantitative Trading by Ernest Chan.
What is Quantitative trading
Trading of stocks through algorithmic strategies, automated by programming.
Basic
how to find the right strategy for me
work hours
Automation can turn a seemingly full-time endeavour into a part-time one.
programming skill
You need some programming experience to trade at high frequency. Otherwise, stick to once-a-day trades.
Trading capital
Preferably $50,000+; $100,000+ is considered high capital. Capital also determines how you are going to trade.
Benchmark returns are the returns of the market index that the securities you trade belong to.
This is the Opinion section
...and this is where I share my opinions on things.
- a-reading-system-for-social-media-addicts
- today-i-learned/
- the-strange-tale-behind-my-phone-number
- a-curated-list
- MHW-log
- yearly-summary/
- personal-management/
- 随笔/
A reading system for a social media addict
2023-04-10
I am but two things: a dedicated student who wants to read everything he can get his hands on, and a social media junkie.
This post will lay out how I used my newly gained interest in self-hosting to help me read more as a social media addict who can't stop scrolling on his phone.
Table of Contents
The idea
Scrolling. Scrolling is the key to getting me off social media and into reading books. Since social media has made me develop the habit of scrolling non-stop for hours, a corresponding system where books can be read through non-stop scrolling can be used to displace social media. It's like a nicotine addict using vaping as a substitute for cigarettes.
The system
The main component of this system is Kavita, an open source library service.
On top of that you also need a place to get books (my favorite is libgen) and a tool to put the books in.
Depending on where you host your Kavita, the method of putting books into your virtual library will be different. Since I wanted to avoid the hassle of renting a server of my own, I used PikaPods, a great platform for hosting a variety of open source services.
If you're hosting it on your own machine, you will need to download the installer from their website, and you should then be able to access a version of Kavita on your local machine at http://localhost:5000.
If you decide to use PikaPods, you can first try it out by registering a free account. They'll gift you a 5 dollar coupon upon registration. Just search for Kavita on this page, click the "run your own" button, then follow the instructions to create a new pod. The default server configuration should be enough for Kavita's needs.
PikaPods also allows SFTP file transfer. So get to a terminal on your machine, or download an FTP app on your phone, to log in to the server. Your credentials are specified in the settings of the server under Files
Epub, never pdf
This is a key point in this system. Always prioritize Epub over pdf, because Epub allows the text to be reformatted to one chapter per page, so you can scroll as much as you want while reading. Pdf will have too many pages. Too many pages means too many page breaks, which means too many distractions.
PDFs are also large, which can be expensive to store. Epub files usually come in smaller sizes.
The flow
- Get hold of the name of a book
- Search on sites like Libgen, and download the book (preferably epub; if not, pdf)
- Upload it to the server through various means
- Scan the library through Kavita to pick up the new books
Some things to note
Here are the file structure guides and naming conventions
Today I learned..
2023/05/25
Today I learned that if you put a space in front of a command, it will not be recorded in shell history (for most shells, that is; in bash this behavior comes from the HISTCONTROL=ignorespace setting).
Today I learned: 2023-1-8
..that you can tell if an egg has gone bad by placing it in a bowl of cold tap water. If the egg sinks to the bottom and lies on its side, it's good. If it floats in any shape or form, it's bad (because gas has formed inside it).
The strange tale behind my phone number
2023-05-04
Table of Contents
The start
I got my phone number back in Boston along with my dad at an AT&T store. It was during a summer trip to Boston for vacation and college visits. We went to this small AT&T store with lots of round tables and got myself a phone number and a data plan.
What's going on with my phone number?
Since last year, I've been constantly receiving texts from various people asking to sell my "house". Of course, it is not me who owns the house, but a fellow named John. I always ignored the texts, until they became somewhat unbearable. So I decided to look into it for some fun.
Findings so far
The first thing to look into is the house:
After a few quick searches, I discovered that the house is a 3 bedroom, 3 bathroom, 2300 sqft condo that was built in 1910. The house was sold 3 times: once in 1987, once in 1989 and once in 2001.
Also according to some searches, the house is undergoing the process of foreclosure.
Interestingly, it was planned to appear in an online auction recently, but that was cancelled. I wonder if that has anything to do with me not answering their texts.
The second thing to look into is the person, John:
Apparently, he, along with Madeline (I'm assuming that's his wife), owns that condo. They are both in their 50s, and currently live somewhere else in Boston. They have three companies under their name, with a focus on servicing limos.
Some other lore
My phone number also seems to have been used by some Chinese doctor named Doctor Zhou. So far, not a lot of texts about that have been sent my way, so I'm going to come back later if more come in.
A curated list of my writings
Updated on 2024-02-18:
- About how I manage my life
- Creating and maintaining this website
- Some interesting blogs that I follow
- A closer look at Quake 3's fast inverse square root algorithm
- Some interesting findings about my writings through the lens of machine learning
MHW weapon builds and tips for fighting the various monsters
2020-07-16
Here I record my experience playing MHW. I currently have about 100 hours of playtime; the main story is fully cleared, and my MR and HR are both around 50. Right now I mainly play the hammer, though I may explore the light bowgun more deeply later.
Hammer
Hammer builds
I don't have a complete hammer build yet. After many discussions with veteran players, I received the following suggestions:
Hammer build 1:
Hammer build 2:
Both builds use the helm beta and chest armor beta crafted from Raging Brachydios. Where the two diverge is in the arms, waist and legs.
Build 1 goes the route of a three-piece Alatreon set, but since my hammer is the Safi'jiiva blast hammer, which has no elemental attribute, fighting Alatreon is extremely painful, so this set is currently out of reach.
Build 2 goes the route of a three-piece Master Rank Nergigante set. I can't realize this build yet either, but that's mainly because Master Rank Nergigante only becomes available at MR 100. Compared to build 1, build 2 is easier to achieve.
The decorations for both builds are about the same: the main skills to max out are Attack Boost, Critical Eye and Agitator, while Earplugs can be raised with an Earplugs charm and similar items.
Hammer technique
I play with a controller, so PC players will need to find the corresponding keys.
The hammer is a relatively simple weapon and doesn't demand difficult execution; your main target should be the monster's head. A common combo is:
RT + B, charge to level three, then the standing triple pound (run up to the monster and stop + release the left stick + release RT) + LT clutch claw onto the monster's head + Y
This combo not only deals damage, it also softens the monster's hide and builds up its stun value.
Combos that include the upswing:
Y + Y + Y + Y + Y
RT + B, charge to level three, into the five-hit spinning attack (run up to the monster without releasing the left stick + release RT) + Y
RT + B, charge two levels
All of these can nicely hit the monster's chin from below, building up its stun value.
Yearly summaries
2023-2024 yearly summary
2024-06-16
Hi, today I turn 22! Many things happened this year, and I think this was one of the most important years of my life so far.
My graduation ceremony, by my mom
School and work
First of all, I finally graduated from college! It was an interesting experience. Covid cut my in-person college time short by one year, so it feels rather quick that all of a sudden I'm exiting college.
It is also the first time that I've experienced a LARGE graduation ceremony. There were 7000 graduating students this year, and they spent almost 30 minutes on giving certificates to the PhD students alone.
However, graduating does not mean that my academic journey is over, because for the next 1-2 years I will be studying for a master of computer science degree at Northwestern University. I will be going from one purple school to another.
Looking back at the past year, I'd say that my senior year is when I experienced the most growth. I wanted to be able to eliminate the things I don't know I don't know, so I started attending colloquiums and seminars to learn about academic research on topics such as robotics, AI, international policy and marine biology.
I also completed my graduating capstone project. It was on the topic of AR/AI in healthcare, and it taught me many valuable lessons about teamwork. The capstone ended okay-ish. I guess the biggest lesson I learned from this project is to know when to have meetings and when not to. As a developer, meetings should be allotted to co-programming and solving bugs. As a manager, meetings should be made for coordinating, future planning and setting goals. An inefficient use of a meeting is to repeatedly do the sales pitch to people who are already on board with the idea. Wasting your workers' time is a good way of making them leave.
Finally, some interesting courses I've taken are:
- INFO 371 advance data science methods with ott
- INFO 443 Software architecture with Alexander Bell-Towne
- INFO 351 Information ethics and policy Katherine Cross
- CLAS 101 Current use of greek and latin in English
- CLAS 205 bioscientific vocabulary building from latin and greek
- ACCTG 205 Essentials in accounting
Life and relationship
My relationship with Aileen is still going strong. Obviously my graduation adds a lot of complications to the equation, but no biggie. We will strive through.
I also got my first car, a used Toyota Venza. Since I just got it, I haven't named it. However, I'm sure it will be my loyal war-rig for the next phase of my life.
Personal achievement
This year wasn't the most technical year for me due to completing graduation requirements and grad school applications. However, I managed to:
- Build and test simple multi-layer perceptrons and CNNs using TensorFlow.
- Learn about software design patterns, and analyze and refactor my previous website codebase.
- Build uni, my personal automation CLI, using Rust.
Beyond the technical, I've also managed to learn a swath of personal management tools and techniques. This proved to be a wonderful improvement to my life. So much so that I've managed to be on the dean's list all throughout this school year.
Goal for next year
- Successfully find an internship in either Seattle or California.
- Take courses in distributed systems and deep learning at a graduate level.
- Decide which track I will pursue to finish my graduate degree at Northwestern, be it the project track or the research paper track.
- Continue my relationship with Aileen
- Keep swimming and exercising
- Eat healthy, drink more water and sleep no later than 12am.
So that's about it. See you next year!
2022-2023 yearly summary
2023-06-16
Hi, we meet again. Today I turned 21, so it is time for another yearly summary. What this also means is that I can finally procure alcohol legally. So I'm currently writing this summary after downing two-thirds of a bottle of ice wine.
Mount Rainier, by me
What have I achieved
School & work
- I've finally gotten into the informatics major
- I've taken several coding and CS classes that cover topics like full stack development, AI, algorithms, and data science.
- I've gotten myself a summer internship here in Seattle.
Relationship
- My relationship with Aileen is still going, and it's going quite well
- I've made several new friends from group projects in class and other activities such as a Mario Kart competition.
Technical achievements
- I've learned how to do full stack development.
- I've gotten in touch with all of the basic CS algorithms that pop up in SWE interviews
- I've largely automated antzed.com; this includes writing two preprocessors for the website in Rust.
- I've learned the basic concepts of AI
- I've learned the basics of data science using Python
- I've learned and incorporated tools like ChatGPT into daily coding.
- I've established a personal twitter using memos and a personal library using Kavita
- I've lightly contributed to several repositories on GitHub.
Goals for next year
- apply and get into a good grad school
- read more on the variety of other programming jobs besides SWE, such as ML, quant and game engine dev.
- continue my relationship with Aileen
- drive around more
- write more on various things, now that the process is heavily automated.
- exercise more and drink more water
- find my passion
Last thoughts
This year was an interesting year, because I finally got used to life in the US. However, my current lifestyle is only kept alive by my parents' "generous donation", and will probably crumble after I graduate, so we will see what happens.
I've also gotten used to the idea of working, and finally started to enjoy the process of bettering myself. My craving for laziness has gradually diminished, replaced by my curiosity, which is good.
I hope I will become a man who does great things.
See you next year!
2021 – 2022 yearly summary
2022-06-15
Tomorrow is my birthday so I want to do a summary of the past year. I moved to Seattle last year during September to study at the UW, and this was the first time that I’ve left my home and my parents for more than 3 months. Unfortunately, Covid took a year of college experience away from me, but after I finally got here, the experience was better than I’d expected.
The Drumheller fountain, by me.
What have I achieved
Important things I've achieved
- I've taken and finished many courses during the three quarters of school
- I've gotten myself into a relationship
- I've made new friends in college
- I've managed to independently survive longer than 3 months
Technical achievements
- I've learned to program in SQL and created two databases as class projects
- I've learned to program in Go, and successfully created a CLI.
- I've dabbled in bash scripts, Ruby and Lua in an attempt to create the CLI before using Go
- Planned and ran Husky expo
- Brought back this blog website
Places I’ve visited
- Seattle (Space needle and its surroundings)
- Leavenworth
- Snoqualmie
Notable films and shows watched
- Arcane season 1
- Station eleven season 1
- Love Death and Robot season 3
- Demon Slayer season 2
- Made in abyss season 1
Challenges
The most difficult part of this year was my adaptation to college life. I thought at the beginning of the year that I was fully capable of handling myself on my own, yet the actual experience was not as smooth. Burnouts were common throughout the year, but I believe through practice I can better manage my time and energy.
It was also quite difficult to find energy to explore beyond my academic life, which I believe is caused by both the loss of comfort(of being at home) and my less than optimal sleeping and exercising routine.
What to look forward to next year
- More technical work done (projects, internships)
- More blogs posted (target: 10 articles)
- Exercise regularly (at least two to three times a week)
- Have a regular sleeping schedule.
So, see you in 2023
Personal management
A fight with todo list
2023-02-15
As a student, I'm always in a sea of tasks, and throughout the years I've tried many methods of managing them. Unfortunately, none of them is perfect. Some of them worked for a quarter or two, but most failed. I often find myself unmotivated to continue to create, view and check off tasks after a certain "hyped" period.
In my opinion, there are two main reasons why a task managing system will fail:
- It is too hard to interact with
- It has no priority management
Let's talk about each point:
During a previous iteration of my task management system, I used miro, and created something like this:
I attempted to create a todo list like a mindmap, where I link tasks to each other and to subtasks. The idea being that tasks are easier to manage when the system works more closely to the human mind.
After using this method for half of autumn quarter 2022, I concluded the following:
- The mind map method does help with understanding the nature of my tasks. It helps me decide if the task I'm looking at is worth my time.
- The mind map is really hard to maintain. I cannot quickly enter a task when I need to; I have to find "space" and organize the sticky notes on the boards.
Currently I've been testing out another method of mine, which involves the Obsidian notebook, its Tasks and Day Planner plugins, and a todo script.
I separated my tasks into three different categories:
- Homework
- School
- Personal

and I keep track of them in a Task.md file, in which I have the following query code:
- A due-today section:
(happens today) OR (happens tomorrow)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal )
NOT (done)
sort by due
Explanation of this Tasks code block query:
(happens today) OR (happens tomorrow) => OR (At least one of): due, start or scheduled date is on 2023-02-15 (Wednesday 15th February 2023) due, start or scheduled date is on 2023-02-16 (Thursday 16th February 2023)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal ) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
- A happened-before, happening-today and due-tomorrow section:
(happens today) OR (due tomorrow) OR (happens before today)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal)
NOT (done)
group by tags reverse
sort by due
Explanation of this Tasks code block query:
(happens today) OR (due tomorrow) OR (happens before today) => OR (At least one of): due, start or scheduled date is on 2023-02-15 (Wednesday 15th February 2023) due date is on 2023-02-16 (Thursday 16th February 2023) due, start or scheduled date is before 2023-02-15 (Wednesday 15th February 2023)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
- and an all-tasks section:
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal)
sort by due date
group by tags
NOT (done)
Explanation of this Tasks code block query:
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
However, after 6 weeks of use, I noticed that this method does not handle priority well. Because I've made it so easy to create a task, I often put down tasks that are not well thought out.
For example:
As you can see here, I want to do several readings that are related to programming. But as of the time of writing, none of these tasks has been done, and they are well past their due dates.
The first thing that is wrong with these tasks is their wording. They use vague phrases like "skim through" and "read". These make me feel unmotivated, as the instructions are unclear.
Another thing is that these readings are important to me, so when I created them I specified deadlines. However, because they do not have immediate urgency the way homework does, I often place them under tasks that have a Homework or School tag, and then subsequently forget about them.
Plan for improvement
So what is my plan for improvement?
- sections prioritizing urgency -> sections prioritizing importance (following the Eisenhower Matrix)
I will focus on only managing important tasks and avoid unimportant tasks (especially when they are not urgent)
- use general action words -> use specific, achievable objective nouns.
Using general action words makes me not want to do the task, so I have to write tangible descriptions that tell me exactly what it is that I need to do.
Using GitHub Projects to manage my life - what I've learned
Originally written on 2023-11-06, rewritten on 2024-01-31
This post is a continuation of my endeavor to find a good way to manage my life. In this previous blog, I tried to build a text-based markdown-style todo system, which somewhat worked for an academic quarter. However, the previous attempt quickly became very hard to use. Obsidian isn't the best tool for management. The best it can do without increasing operational complexity is creating simple lists, which soon become hard to manage if you have many such lists.
If you want to know what my current method of management is, go to Iteration 2: The Current Form. If you want to know the process of developing my methods, start with Initiation
Table of content:
Initiation
During my internship this summer, I managed to learn a thing or two from the daily standup. Hiya chose to use Kanban boards, and they seemed quite effective at first glance. So, I decided to try them out for to-do management to see if things would work out.
While consulting with ChatGPT on this topic, I was excited to discover that GitHub itself has a project feature that includes Kanban boards and a whole suite of other project management tools. As an avid user of GitHub, I had to try it out. I spent my autumn quarter of 2023 on iteration 1 and refined my process during iteration 2, in the winter quarter of 2024.
Iteration 1: A rudimentary attempt
The basics
My first attempt involved carving out the basics. Following GPT's suggestion, issues were the obvious choice for the basic unit of tasks. Since GitHub Projects are closely knit with repositories, I borrowed the idea I had during Hiya's hackathon to host things (issues) in a dedicated repository. Since I'm not good at naming things, I will just call it:
Same with the project:
I've also quoted one of my favorite sentences from Krzysztof Kowalczyk's "Lessons learned from 15 years of SumatraPDF, an open source Windows app"
Here is an example of my issue/todo:
Due to some automation reasons later on, I decided to limit the issue description to text only. Markdown to-dos were to be put in the first comment of the issue. Any logging text would then be added in comments, starting with the second comment. This acts like a commit; each time I make some change to this to-do, I log it. The logging ends after I close my issue. I hoped to use the logging as a motivational tool.
On top of these mechanisms, I also repurposed the milestone section as a way of categorizing these issues in the project view. A milestone can be a goal, as originally intended, or it can simply be a category, like a class I'm taking this quarter or a book list I'm going through.
The project interface
In the project interface, issues can have four possible statuses: Backlog, In Progress, On Hold, and Done. An issue starts in the Backlog and then moves back and forth between In Progress and On Hold until it ends up in Done, operating just like a regular Kanban board.
Issues are organized by iteration using the iteration template. Each week is one iteration. The iteration divides up the multitude of tasks I have and provides me with a sense of 'this is what I'm doing this week,' which, in turn, helps me to stay focused on my tasks.
Here's an example of what I see most of the time:
There are a total of six views in my project: Current Iteration, Current Iteration: Done, Next Iteration, Planning, Road Map, and Task Pile.
When going through my workflow, I first select a category of tasks I'm interested in reviewing, as shown in the previous image. The tasks then flow from left to right. Once I finish a task, I go into the issue and close it. This triggers the default automation of moving the issue to the 'Done' status upon closure.
example of issue in project view
the logic of closing issue automation
Similar automation is also applied to issue labeling. For instance, when an issue is edited, a last_worked_on label will automatically be added, and the previously labeled last_worked_on issue will have its label removed.
Another example is the has_todos label: if an issue has unchecked todos like so
- [ ] an example todo
a label of "has_todos" will be added.
This labeling automation is all done in GitHub Actions using yml files and various scripts.
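I won't reproduce the actual workflow here, but as a rough sketch of the idea, a script along these lines can attach a label through GitHub's REST API (the repo name, issue number, and token wiring below are placeholders, not my real setup):

# Hypothetical sketch: add a label to an issue via the GitHub REST API.
# Assumes a GITHUB_TOKEN with repo access and an issue number taken from the event payload.
import os
import requests

repo = "someuser/todo-repo"   # placeholder repository
issue_number = 42             # placeholder issue number
token = os.environ["GITHUB_TOKEN"]

resp = requests.post(
    f"https://api.github.com/repos/{repo}/issues/{issue_number}/labels",
    headers={"Authorization": f"Bearer {token}",
             "Accept": "application/vnd.github+json"},
    json={"labels": ["last_worked_on"]},
)
resp.raise_for_status()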
Iteration 1 was going well until it encountered a problem similar to the one with the Obsidian method: priority ordering. My tasks were still prioritized based on urgency, and I was left feeling unfulfilled. I constantly felt like I was doing busy work. Moreover, GitHub decided to add timeline items into the GitHub Project's issue view, which practically ruined my workflow. My automation, along with the rule '1st comment for to-do and 2nd+ comments for logging', no longer worked. So, these changes jump-started iteration 2.
Iteration 2: The Current Form
For iteration 2, I decided to go back to the Eisenhower matrix that I mentioned here. I consulted this article to learn about the difference between urgency and importance, and came up with the following ideas:
The priority matrix-list
Here is the priority list I made stemming from the Eisenhower matrix:
As you can see, I did not use the classic 'Do, Schedule, Delegate, and Delete.' This is because the original wording does not make sense in a 'white-collar job' scenario. The Eisenhower Matrix was created against a background of military leadership, which assumes that 1) you don't have time to handle all the tasks at hand, and 2) you have people working under you. Neither assumption applies to me.
Instead, what I did was:
- Priority 1 means I have to finish this task today or tomorrow
- Priority 2 means I don't have to finish it within two days, and I have to:
  - plan it by dividing the issue into sub-issue tasks, or create a small todo list of sub-tasks within the issue
  - complete one sub-issue/sub-task
- Priority 3 means to either spend as little time on it as possible to complete it, or let ChatGPT do most of the work.
- Priority 4 means this task is not worth my time (not in the present nor in the future)
And here is an example application of that in a real task:
Additional views
Additionally, I've also added a few extra views to my GitHub project:
- Task Importance Analysis: This is a straightforward view where I sort all my tasks based on long-term importance and due dates, assigning the correct priorities to them.
- Today: This view shows the tasks I manually assign to myself based on the priorities. It provides an additional division of labor, allowing me to focus solely on the tasks I need to finish today.
- Due Today: This view allows me to see if I've missed any tasks that are due today. I check this periodically.
- Task Shelf: This holds all the tasks that I've assigned priority 4-leave-it to. I check them periodically to decide if I should restart any.
- Current Iteration: Done: This view shows all the completed tasks. At the end of each week, I perform a review, asking myself if the completed tasks were genuinely important, urgent, and how many hours I've spent on them.
Future improvement
This method is going pretty well so far. The only caveat is that sometimes my actual workload is the same between a priority 1 task and a priority 3 task. So, in the future, I will work on automation for the common tasks that fall into priority 3.
随笔 (casual essays)
Sometimes when I don't know what to write, I'll jot a couple of lines here; it's also a good chance to brush up on my Chinese writing.
May 15th, past midnight, 2023
Heat
After all these years of talk about global warming, today I finally felt it. Seattle, as a coastal city, is the bird that sticks its neck out and gets shot first.
When I first arrived here last year, I kept hearing people say "Seattle is having record heat this year", but when I checked, the temperature wasn't even 30 degrees. Coming from Beijing, I scoffed at that number and wasn't afraid at all. Then the summer actually arrived, and the muggy heat made my head spin.
Seattle has historically been a cool city, so most buildings don't come with air conditioning. When I first heard this, I immediately thought of how southern China doesn't turn on heating in the winter. And just like that peculiar arrangement in China, an air-conditioning-less Seattle under a changing climate leaves people with heat-addled heads and fans sold out everywhere.
Songs
Today's song was TrackTribe's Last Train North; no lyrics, just electric guitar. Listening to it while feeling the small breeze of a second-hand fan is quite cozy.
Chess
Recently I picked up chess again. Back when I finished Netflix's The Queen's Gambit, I was deeply inspired (though that happens with every show I watch), registered a chess.com account, and diligently played my initial 1200 rating down to 400. These past few days I've squeezed in a couple of games and found that I'm quite a bit better than before, winning more than I lose. After practicing some puzzles, I can occasionally force the opponent into checkmate. I hope this record can be maintained.
Sunday
On Sunday I went with my girlfriend to see the third Guardians of the Galaxy. Although it is still a Marvel movie, you can tell the director put real effort into it; the camera work and pacing are well handled. Even the formulaic good-guys-beat-bad-guys plot manages to feel fresh. This movie also doesn't seem to care whether it should be PG-13: the occasional swear word, plus green-blooded gory fights, makes it satisfying for adults too.
Creation section
...this is where I post all my creations, big or small.
Some websites I made
The following are all the visualizations created by me.
Temperature in 2014
Here is the link.
TODO: Write about it
Automations
This is where I record my automation code
Building a reddit bot
I built this following this tutorial.
Learning | Making my own
Learning
Fetch post from reddit
Creating reddit app
- You can’t make more than 1 request every 2 seconds (or 30 a minute)
- You must not lie about your user agent
Go to this page and click on the "create your own app" button:
and create a reddit app. From there you will get a client_id (red square) and a secret (blue square).
Then you should go to the praw.ini file in the praw package folder and add a new section for your bot, putting in the client_id and secret, along with a user_agent field specifying the version.
The user_agent is there because reddit uses it to identify your bot and prevent abuse; update the version number if somebody is abusing your bot.
fetch code
import praw

reddit = praw.Reddit('bot1')
subreddit = reddit.subreddit("learnpython")
for submission in subreddit.hot(limit=5):
    print("Title: ", submission.title)
    print("Text: ", submission.selftext)
    print("Score: ", submission.score)
    print("---------------------------------\n")
Reply
import praw
import pdb
import re
import os

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="Py_gloomy_repli_bot 0.2"
)

if not os.path.isfile("posts_replied_to.txt"):
    posts_replied_to = []
else:
    with open("posts_replied_to.txt", "r") as f:
        # Read the file into a list and remove any empty values
        posts_replied_to = f.read()
        posts_replied_to = posts_replied_to.split("\n")
        posts_replied_to = list(filter(None, posts_replied_to))

# Get the top 20 submissions from the subreddit pythonforengineers
subreddit = reddit.subreddit('pythonforengineers')
for submission in subreddit.hot(limit=20):
    if submission.id not in posts_replied_to:
        if re.search("i love python", submission.title, re.IGNORECASE):
            submission.reply("me bot says: I love Python too! I especially love praw")
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        elif re.search('python', submission.selftext, re.IGNORECASE):
            submission.reply("you mentioned python, python very good!")
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("posts_replied_to.txt", "w") as f:
    for post_id in posts_replied_to:
        f.write(post_id + "\n")
Automate
To automate this script, we have to use linux crontab.
we first start a new cron job by typing:
crontab -e
and then we select which editor to use, I chose vim.
Then we put this line in the bottom of the cron job script:
* * * * * cd /path/to/project/; python3 bot_script.py
The 5 * represent the time the job gets executed:

* * * * * command to be executed
| | | | |
| | | | +----- day of week (0 – 6) (Sunday=0)
| | | +------- month (1 – 12)
| | +--------- day of month (1 – 31)
| +----------- hour (0 – 23)
+------------- min (0 – 59)
Then save the script, and enter the command:
sudo service cron start
if you want to stop the cron job: sudo service cron stop
if you want to restart the cron job: sudo service cron restart
Comment reply
import praw
import pdb
import re
import os
import random

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="useragent"
)

lucas_quotes = [
    " I've calculated your chance of survival, but I don't think you'll like it. ",
    " Do you want me to sit in a corner and rust or just fall apart where I'm standing?",
    "Here I am, brain the size of a planet, and they tell me to take you up to the bridge. Call that job satisfaction? Cause I don't. ",
    "Here I am, brain the size of a planet, and they ask me to pick up a piece of paper. ",
    " It gives me a headache just trying to think down to your level. ",
    " You think you've got problems. What are you supposed to do if you are a manically depressed robot? No, don't even bother answering. I'm 50,000 times more intelligent than you and even I don't know the answer.",
    "Zaphod Beeblebrox: There's a whole new life stretching out in front of you. Marvin: Oh, not another one.",
    "The first ten million years were the worst. And the second ten million... they were the worst too. The third ten million I didn't enjoy at all. After that, I went into a bit of a decline. ",
    "Sorry, did I say something wrong? Pardon me for breathing which I never do anyway so I don't know why I bother to say it oh God I'm so depressed. ",
    " I have a million ideas, but, they all point to certain death. ",
]

subreddit = reddit.subreddit('pythonforengineers')

with open("comment_replied.txt", "r") as f:
    # Read the file into a list and remove any empty values
    comment_replied = f.read()
    comment_replied = comment_replied.split("\n")
    comment_replied = list(filter(None, comment_replied))

for comment in subreddit.comments(limit=100):
    if comment.id not in comment_replied:
        if re.search("Lucas Help", comment.body, re.IGNORECASE):
            lucas_reply = "Lucas the robot says: " + random.choice(lucas_quotes)
            comment.reply(lucas_reply)
            comment_replied.append(comment.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("comment_replied.txt", "w") as f:
    for post_id in comment_replied:
        f.write(post_id + "\n")
Making my own
This is my attempt at making a helpful info bot for r/udub.
import praw
import pdb
import re
import os
import random

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="client_id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="Py_gloomy_repli_bot 0.2"
)

helplink = '''
General student service:\n
- https://www.washington.edu/students/servicesforstudents/
Courses/Professor info:\n
- https://uwgrades.com/
- https://www.washington.edu/cec/toc.html
- https://www.google.com/search?q=rate+my+professor+uw+%5Binsert+professor+name%5D&sxsrf=AJOqlzVwUt9A0OY5_KgZb7bdeYc_lw_RKg%3A1675667778386&ei=QqngY5GbF_aE0PEP8t2piAo&ved=0ahUKEwiRxMT1rID9AhV2AjQIHfJuCqEQ4dUDCBA&uact=5&oq=rate+my+professor+uw+%5Binsert+professor+name%5D&gs_lcp=Cgxnd3Mtd2l6LXNlcnAQAzIFCCEQoAEyBQghEKABMgUIIRCrAjoICAAQogQQsAM6BAgjECc6BQgAEIAESgQIQRgBSgQIRhgAUNgGWJYaYLgkaAFwAHgAgAFQiAHVCpIBAjIymAEAoAEByAEFwAEB&sclient=gws-wiz-serp
UW student housing info:\n
- https://www.housingforhuskies.com/
UW on-campus food info:\n
- https://www.campusreel.org/colleges/university-of-washington-seattle-campus/dining_food/
UW on-campus job info:\n
- https://www.washington.edu/workstudy/find-a-job/
'''

if not os.path.isfile("posts_replied_to.txt"):
    posts_replied_to = []
else:
    with open("posts_replied_to.txt", "r") as f:
        # Read the file into a list and remove any empty values
        posts_replied_to = f.read()
        posts_replied_to = posts_replied_to.split("\n")
        posts_replied_to = list(filter(None, posts_replied_to))

# Get the newest 50 posts from the subreddit udub
subreddit = reddit.subreddit('udub')
for submission in subreddit.new(limit=50):
    if submission.id not in posts_replied_to:
        if re.search(r"\?", submission.title, re.IGNORECASE):
            submission.reply("Hi! I am a bot! Your post is asking a question, so here are some useful links if you haven't found them yet!\n" + helplink)
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("posts_replied_to.txt", "w") as f:
    for post_id in posts_replied_to:
        f.write(post_id + "\n")
Simple todo bash script
This script is written for the Day Planner and Tasks plugins in Obsidian. It allows me to quickly add a todo task in the appropriate place.
It was written with the assistance of ChatGPT. Here is my prompt:
Please write a bash script that does the following:
- takes three arguments: the first argument will be put in the variable [tag], the second argument will be the variable [task_name], the third argument will be the variable [due_date].
- get today's year, month and date, numbers only. So if today is 2023, January 22nd, change that into 20230122 and put that in a variable "date"
- Search in path [predefined path]/"Day Planners" for a file that contains "date". Put the path of the file in variable "file_path". Error handle this step
- put the string "- [ ] [tag] [task_name] 📅 [due_date]" at the end of the file
Here's the actual bash script:
#!/bin/bash

tag=$1
tag="#$tag"
task_name=$2
due_date=$3
root_path="<pre-defined path>"
date=$(date +%Y%m%d) # get today's date in the format YYYYMMDD
path="$root_path/Day Planners/" # predefined path
file_path=$(find "$path" -type f -name "*$date*" -print0 | xargs -0 echo) # search for a file containing the date (pattern quoted so the shell doesn't expand it)
if [ -z "$file_path" ]; then # if file path is empty, display an error message
    echo "Error: No file found with the date $date"
else
    echo "- [ ] $tag $task_name 📅 $due_date" >> "$file_path" # append the task string to the end of the file
    echo "Task added"
fi
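As a hypothetical invocation (the script name here is a placeholder): running ./todo.sh work "finish report" 2023-01-25 appends the line - [ ] #work finish report 📅 2023-01-25 to whichever Day Planner file matches today's date.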
Todo bundle
2023-01-24
I've been using the Day Planner plugin in Obsidian for a while now and noticed the piling-up problem that occurs with its auto-generated files:
So I created a Python script to bundle them together.
Here is the prompt I came up with for ChatGPT:
I want to create a python script that lets the user type "todo bundle" with two arguments:
- a [begin date]
- an [end date]
Note: the dates will be in the format of an 8-digit number. For example, if it's January 24th, 2023, the number will be 20230124
and then performs the following actions:
- search through a [predefined path] where all the file names end in 8-digit numbers, and put the files whose ending numbers fall within the range of [begin date] and [end date] in a list.
- sort the list from small to big based on the ending number in the file name
- read through these files line by line in [predefined path] in the sorted order.
- if the line starts with some number of spaces plus "- [ ", then put it in temp.md along with the file name. Note temp.md's content will be formatted like this:
[filename 1]
[a qualified line from filename 1]
[another qualified line from filename 1]
[filename 2]
[a qualified line from filename 2]
[another qualified line from filename 2]
- Finally, temp.md will be saved as "Day Planner-[begin date]->[end date]" in [predefined path]
and here is the actual code:
import os
import sys

def todo_bundle(begin_date, end_date):
    # predefined path
    path = "[predefined path]"
    # search through files and put them in a list
    file_list = [f for f in os.listdir(path) if f.endswith('.md') and (("Day Planner-" + begin_date <= f <= "Day Planner-" + end_date) or f.endswith(end_date + ".md"))]
    # sort the list by file name
    file_list.sort()
    # read through the files and write to temp.md
    temp_md = ""
    for file_name in file_list:
        with open(os.path.join(path, file_name)) as f:
            in_file_name = file_name.rsplit(".", 1)[0]
            temp_md += "# " + in_file_name + "\n"
            for line in f:
                if line.lstrip().startswith("- ["):
                    temp_md += line
            temp_md += "\n"
    # save temp.md as "Day Planner-[begin date]-[end date]"
    with open(os.path.join(path, "Day Planner-{}-{}.format".format(begin_date, end_date) if False else "Day Planner-{}-{}.md".format(begin_date, end_date)), "w") as f:
        f.write(temp_md)

if __name__ == "__main__":
    begin_date = sys.argv[1]
    end_date = sys.argv[2]
    todo_bundle(begin_date, end_date)
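Usage is just the script plus the two dates (the script filename here is hypothetical): python todo_bundle.py 20230101 20230124 writes Day Planner-20230101-20230124.md into the predefined path.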
glue
2023-02-27
Here is the repo.
This is a bunch of random stuff I've made that doesn't belong in a category.
#AI #AI_generated_story
a steamy love story about "billy bone" and "eye"
Billy Bone and Eye had been friends for a long time, but recently something had changed between them. It was as if a spark had been ignited the moment their eyes met and they both knew that something special was brewing between them.
One night, Billy Bone asked Eye out on a date. She could feel the butterflies in her stomach as she said yes to his invitation and they arranged to meet up outside of her favorite cafe.
When the night of the date came, Billy Bone arrived in a suit with a bouquet of roses in his hands. Eye was taken aback by how handsome he looked and she felt her heart skip a beat when Billy Bone smiled at her. They talked for hours over dinner, never running out of things to say to each other.
After dinner, they took a walk along the beachfront hand in hand and watched the sun set together. As they strolled along, their conversation turned more intimate, sharing secrets and stories from their pasts that left them feeling even more connected than ever before.
Eventually, as the night grew darker, they found themselves standing underneath a tree filled with twinkling stars. They both knew what was going to happen next - no words were needed - and their lips met in an electrifying kiss that sealed the deal between them.
Billy Bone and Eye were now officially in love and it felt like nothing else mattered when they were together - all that mattered was love.
First full stack diagnosis
What's going on?
During winter quarter of 2023, I made this full stack web app with three other group mates.
It is now autumn quarter 2023, and for project 1 of the course INFO 443, Software Architecture, I was asked to look back at this project, give it a diagnosis, and write a report on its architecture design. This is where I'll do the recording.
First let's do a quick viz
Using the tool mentioned here, I got this:
Note: one of the first things I realized is that I should've put the backend in a separate folder, like the frontend did.
Here comes the UML
This is a top level UML diagram of my codebase. The connections between the elements have been marked using arrows.
and the sequence UML
This is a sequence diagram showing how the web app will behave when the user attempts to add a game to the dashboard.
Tests
Since this project is built with React, Jest is the natural choice of testing framework.
The assignment asks us to
implement a complete set of automated tests for your chosen architectural element (module or class) you're analyzing. We expect a complete set to comprise around a dozen automated tests, achieving 100% code coverage.
So for my project the architectural element will be the components that compose the main dashboard, namely Navbar.js and Gambox.js.
Here is the test result. As you can see, my components are not as isolated as I once imagined. There is a lot of spillover.
I've also noticed that there is a lot of unnecessary error handling. For example, this code here
function handleAdd() {
    // add a game
    // pass in current Catagory, then show search window
    fetch("api/users/")
        .then(res => res.json())
        .then(data => {
            if (data.status == "loggedin") {
                setCatagoryName(catagoryName);
                setShowSearch(true);
            } else {
                alert("Please login first");
            }
        })
}
will not run unless the user is logged in. I had to add an impossible scenario in order to cover that line of code in the test.
Update on 2023-01-30:
I've finished the full version since then.
Antzed.com
This section talks about the process of making this website
- whats-new-detection-script-debug-and-fix
- website-creation-2020-06-30
- making-it-some-what-dynamic-2023-4-27
- mdbook-summary-maker
- making-it-properly-dynamic
- website-create-documentation
- website-server-migration-2020-07-28
What's new detection script debug and fix
2024-02-07
This document provides a brief overview of the debugging process I undertook for the JavaScript code responsible for detecting new additions and changes to my website.
The issue arose when I added a function to detect modifications to existing files in addition to detecting new files added. Notably, several files that had no actual changes were mistakenly marked as "changed".
Below is the log from when the error initially occurred:
The "What's New" section malfunctioned; instead of updating the one article that I had updated, the website incorrectly marked old articles as "updated" as well. I wanted to figure out why.
- Checked GitHub commit history -> The changes were all related to indexing, for example:
<ol class="chapter"><li class="chapter-item affix "><li class="part-title">Content</li><li class="chapter-item expanded "><a href="../Creation/index.html". ...
- Checked if git checkout was related. -> It was not.
The problem was caused by adding a new file, which caused mdbook to re-index during building, and git picked that up.
Solution idea 1: In the "What's New" detection code, only count changes beyond 10 lines. This approach was unsuccessful because sometimes the re-indexing exceeded 10 lines.
Solution idea 2: In the "What's New" detection code, ignore changes that start with <ol class="chapter">, <a rel="next", and <a rel="prev". I discovered that GitHub exposes the content of each file change in a file.patch field of the JSON response. I needed to know whether file.patch separates changes into different sections or just dumps everything together -> file.patch does indeed separate changes between sections with an @@..@@ pattern. Now I just need to write a filter that filters out all the re-indexing junk and leaves only the actual changes.
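The detector itself is JavaScript, but the filtering logic is easy to sketch in Python. This is a hedged illustration rather than the exact code on the site; the function name and the noise-prefix list are mine:
import re

# Added lines starting with these prefixes are mdbook re-indexing noise
NOISE_PREFIXES = ('<ol class="chapter">', '<a rel="next"', '<a rel="prev"')

def significant_change_identifier(patch: str) -> bool:
    """Return True if any added line in the patch looks like a real content change."""
    # A unified diff separates hunks with @@ ... @@ headers
    for hunk in re.split(r"@@[^@]*@@", patch):
        for line in hunk.splitlines():
            if line.startswith("+"):
                content = line[1:].strip()
                if content and not content.startswith(NOISE_PREFIXES):
                    return True
    return False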
Here's the log from the second occurrence:
Once again, after editing one of the posts, the latest updater still failed to recognize the edit.
The problem occurs because the change to one of the posts did not have a file.patch. I ended up using both the significantChangeIdentifier function and a count of file.changes in an OR relationship. This way, both types of changes in the commit metadata are covered.
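In sketch form, the combined check looks something like this (hypothetical naming again, reusing significant_change_identifier from the sketch above; file stands for one entry of a commit's files array from the GitHub API):
def file_really_changed(file: dict) -> bool:
    # Some edits (like the second occurrence above) arrive without a patch,
    # so fall back to the API's change counter in an OR relationship.
    patch = file.get("patch")
    return (patch is not None and significant_change_identifier(patch)) or file.get("changes", 0) > 0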
Edit 2024-02-07, 16:00
Shortly after uploading this document, another bug appeared. This document was once again not recognized by the detector.
It was discovered that in the REST API there are additional status names for files in a commit, like renamed, besides the statuses added and modified.
By examining the response schema here, we can see that there are the following types of statuses:
"status": {
"type": "string",
"enum": [
"added",
"removed",
"modified",
"renamed",
"copied",
"changed",
"unchanged"
],...}
A quick ChatGPT conversation yielded this:
"added": The file was added in the commit. It did not exist in the repository before this commit.
"removed": The file was deleted in the commit. It will not be present in the repository after this commit.
"modified": The file existed before the commit and has been altered in some way (content changed, file permissions, and so on) as part of this commit.
"renamed": The file was renamed in the commit. This status indicates that the file's path or name has changed.
"copied": This status indicates that the file was copied from another file in the commit. This is similar to "added" but specifies that the new file is a copy of an existing file.
"changed": This is a more generic status that can indicate any change not specifically covered by the other statuses, such as changes in permissions or other metadata that doesn't necessarily modify the file's content.
"unchanged": The file was part of the commit but was not altered in any way. This status might be used in contexts where files are explicitly mentioned in a commit without being changed, possibly for tracking or auditing purposes.
This means that I do not need to cover any additional statuses besides added, modified, and renamed.
Edit 2024-02-07, 17:23
The new changes led to another problem: failure to handle the edge case where a file is renamed without any content changes. Ideally, this change should still be reflected on the website.
To fix this, I moved the processing from an array of files to an array of filenames slightly downstream. I added trackers for both the previous_filename associated with renamed files and for overall duplicates, using hashsets, so I can compare and delete later.
Website setup summary
2020-06-30
Summary up front: the setup that finally worked was done entirely through the BaoTa (宝塔) panel, with an LNMP environment installed.
Since I only recently started learning Linux, I was extremely unfamiliar with almost everything. I basically had to look up every single step online, so progress was very slow at first.
Table of contents:
Server purchase and initial setup
First, the server was bought from 什鸟科技. It's quite cheap, and the site's UI is relatively clean and clear, which is friendly to beginners.
Then comes the initial server setup. I followed this site, which mainly covers adding a sudo user and enabling the http and https services:
https://www.cnblogs.com/jojo-feed/p/10169769.html
Here is its opening section (translated):
Initial configuration of a new CentOS server
When you first create a new CentOS server, the default configuration has some gaps in security and usability (ops people often have init scripts for this). To improve the server's security and usability, there are some configurations you should complete as soon as possible. This article roughly covers: account security, ssh security, the firewall, and swap files.
User and password security
About the root user
root is the superuser with extraordinary privileges in a Linux environment. Because its privileges are so high, day-to-day use is discouraged: root can do very destructive things, even by accident (the infamous rm -rf /; you may also find the article《Linux 中如何避免 rm -rf /*》interesting). Next, we will set up a less privileged user for daily work, and show how to obtain greater privileges when you need them.
Create a new user
This example creates a user called demo; you can substitute any username you like:
adduser demo
Next, assign a password to the new user (again, replace demo with the user you just created):
passwd demo
Enter a strong password, then repeat it to confirm.
User privileges
We now have a user with ordinary privileges. However, we sometimes need to perform administrative tasks. To avoid logging out of the normal user and back in as root, Linux has an elegant solution: the system can authorize specific users or groups to run some (or all) commands as root or as other users. A user in such a group just prefixes a command with the word sudo to run it with administrator privileges.
Install sudo
Some distributions do not come with the sudo command, so install sudo first:
yum install -y sudo
Users in the wheel group can use the sudo command. On RedHat-family systems we just need to add the user to the wheel group. As root, run the following command to add the new user to wheel (replace demo with your new user):
gpasswd -a demo wheel
After that, there is one more firewall setup step to perform on the new host, as follows:
A firewall controls traffic to ports and applications, making the server more secure. CentOS 7 introduced firewalld as a frontend for iptables. Compared with iptables, firewalld has zones, is simpler, and can be configured dynamically. Below is the basic firewalld configuration for a new server. Start it:
sudo systemctl start firewalld
Set the default zone
sudo firewall-cmd --set-default-zone=public
Check the active zones
sudo firewall-cmd --get-active-zones
If public is not among the active zones, the public zone may not have an interface assigned. Use ip addr to check your network interfaces (lo is the local loopback), then assign one:
sudo firewall-cmd --permanent --zone=public --change-interface=eth0
Add allowed services to the firewall
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=ssh
List more services that can be added
sudo firewall-cmd --get-services
View your firewalld configuration
sudo firewall-cmd --list-all
Reload the configuration
sudo firewall-cmd --reload
Enable start on boot
sudo systemctl enable firewalld
Server environment installation
LNMP installation
LNMP stands for Linux, Nginx, MySQL/MariaDB, and PHP. There is also LAMP, which differs from LNMP in replacing Nginx with Apache.
These four pieces correspond to the server OS, the web server software, the database, and the hypertext preprocessor. Together they are the runtime environment a web server needs.
I initially wanted to install everything from the command line, but ran into problems. The main one was:
no package xxx available
nothing to do
This problem bothered me for a long time. My first guess was that the yum sources were broken, so I updated yum:
yum update
and switched the yum source to the Aliyun mirror:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Neither solved the problem, so I decided to change my approach and look for another method.
In the end I chose the BaoTa panel. I had actually tried BaoTa's automatic installation at the very beginning, but after installing I could not find it in the usual directories, so I shelved that plan. Coming back to BaoTa, I decided to look for answers entirely inside the panel and rely on it completely, rather than doing a half-command, half-panel installation.
The BaoTa installation tutorial I found:
https://www.jianshu.com/p/293c94adc11d
With BaoTa's one-click LNMP installation, the whole process becomes surprisingly simple. Installing from the command line, with all its pitfalls, would probably teach me more, but some things are better done the easy way to avoid wasting unnecessary time.
After that, the database and WordPress installation and setup can begin.
Database setup
Before installing WordPress, you need to start MySQL, create a database, and grant privileges on that database to a (new or existing) user.
In BaoTa, this step goes as follows (borrowing from the tutorial at https://www.jianshu.com/p/293c94adc11d):
In the BaoTa interface, find "Website" and click "Add site"; the following screen appears:
Fill in the domain according to your situation. For servers in mainland China the domain must be ICP-registered; foreign servers do not need this. Usually you enter a top-level domain plus a www subdomain, e.g. www.pandacui.com, pandacui.com. During setup you can also use the server's public IP instead.
You can choose not to create FTP and use BaoTa's file manager later instead.
For the database, choose MySQL. You will be prompted for a username and password, and a database with the same name as the user is created automatically.
For the PHP version, choose the version you installed. I installed PHP 7.2; a version no lower than 7 is recommended.
Click submit; you will be told the site was created and shown the database username and password. There is no need to write these down; you can view them again in BaoTa's database management screen.
Once the site is created, you can test it by entering the public IP or your bound domain in a browser. If it works, the following page appears:
The above is the BaoTa way; this step can also be done by command (excerpted from this tutorial: https://www.liquidweb.com/kb/how-to-install-wordpress-on-centos-7/):
mysql -u root -p
CREATE DATABASE wordpress;
CREATE USER adminuser@localhost IDENTIFIED BY 'password';
# note: adminuser and password here should be customized
GRANT ALL PRIVILEGES ON wordpress.* TO adminuser@localhost IDENTIFIED BY 'password';
# keep adminuser and password consistent with the previous step
FLUSH PRIVILEGES;
exit
WordPress installation
Installing WordPress
Installation via BaoTa is fairly simple. Download the .tar package from the wordpress.org website to your own computer, then upload it through BaoTa into the directory that was just created:
/www/wwwroot/
After uploading, delete the bundled welcome index.html and 404.html, extract the tar package into the directory containing .htaccess, then delete the archive and the now-empty wordpress directory.
Then enter your registered domain in a browser to check whether WordPress installed successfully. If a WordPress language-selection screen appears, the installation succeeded.
WordPress setup
On the WordPress setup screen, just enter the database name, database user, and password you configured earlier. You will then set the WordPress admin account and password, and finally land in the WordPress dashboard.
Successfully entering the site
If you can see the WordPress admin backend, the site is fully configured.
Making it somewhat dynamic
2023-4-27
Table of Contents
- Table of Contents
- Reason for this
- The solution for now
- What was the challenge
- The loops I have to go through
- What's next
- Am I going to keep the custom backend?
Reason for this
Although this website is meant to be written in the form of a book, I realized it is necessary to add a few more blog-like features for returning readers. This way, both the "exploration" aspect of the book and the "what's been added since I last visited" aspect of the experience are covered.
The solution for now
I embedded a JavaScript file into the welcome page markdown file.
The code looks like this:
fetch("https://api.github.com/<my-repo-owner>/<my-repo-name>/commits?per_page=3&sha=master", {headers: header})
.then(response => response.json())
.then(data => {
let promises = [];
for (let commit of data) {
// sort through the fetch
promises.push(
fetch("https://api.github.com/repos/<my-repo-owner>/<my-repo-name>/commits/"+commit.sha, {headers: header})
.then(response => response.json())
.then(commitData => {
// Check if the commit includes file data
if (commitData.files) {
const newFilesInThisCommit = commitData.files.filter(file => file.status === 'added' && file.filename.startsWith('book/') && file.filename.endsWith('.html') && !file.filename.endsWith('index.html'));
return newFilesInThisCommit.map(file => file.filename);
} else {
return [];
}
})
);
}
return Promise.all(promises);
})
.then(filesInCommits => {
let html = "<ul>";
for (let filesInCommit of filesInCommits) {
for(let file of filesInCommit) {
//String manimulation
file = file.substring(5);
file = file.substring(0, file.length - 5);
let temp = file.substring(file.lastIndexOf('/') + 1);
temp = temp.replace(/-/g, ' ');
temp = temp.replace(/\w\S*/g, function(txt){return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();});
file = file.substring(0, file.lastIndexOf('/')+1);
file = temp + ' in ' + "<code class='hljs'>" +file + '</code>';
html += `<li>${file}</li>`;
}
}
html += "</ul>";
//put the html in the document
const element = document.getElementById('latestBlog');
element.innerHTML = html;
})
.catch(error => console.error(error));
And then I registered it in book.toml (mdbook's default settings file) in my book folder like this:
[output.html]
additional-js = ["my-javascript-file.js"]
What was the challenge
mdbook is a static site generator, so it's missing a backend. There are "custom backends" you can write, but they are mainly for rendering (generating the book in a format other than HTML) and are not an actual service.
The loops I have to go through
Script during compile
I explored several scripts like this one:
import subprocess

def get_git_diff(num):
    git_diff_cmd = f'git diff HEAD~{num} ./src/SUMMARY.md'
    grep_cmd = "grep -E '^\+'"
    cmd = f'{git_diff_cmd} | {grep_cmd}'
    result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    return result.stdout.splitlines()

i = 1
latest_content = None
# Keep incrementing the number until we find a diff with more than 5 lines
while True:
    content = get_git_diff(i)
    if len(content) > 5:
        latest_content = content
        break
    i += 1

# Print the latest content
for line in latest_content:
    if line.startswith('@@') and line.endswith('@@'):
        continue
    if "README.md" in line:
        continue
    stripped_line = line.strip('+').strip()
    print(stripped_line)
which aims to monitor the changes in SUMMARY.md (the table of contents file) and do string manipulation afterwards. This seemed like a good idea at the time, until I realized I could just monitor file additions instead.
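For illustration, the simpler idea can be sketched like this (a hedged sketch, not the code I shipped; it assumes a git repository and that posts live under src/ as .md files):
import subprocess

def added_files(ref: str = "HEAD") -> list[str]:
    # --diff-filter=A keeps only added files; --name-only prints just their paths
    cmd = ["git", "diff", "--diff-filter=A", "--name-only", f"{ref}~1", ref]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [p for p in result.stdout.splitlines()
            if p.startswith("src/") and p.endswith(".md")]

if __name__ == "__main__":
    for path in added_files():
        print(path)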
The true custom backend
Then I thought: hmm, why not just make a backend for my static "frontend"? And create a backend I did.
I used this template generator to create a simple backend. I specified an API call that gave me the necessary strings, and I then uploaded it to cyclic.sh.
I then went on my welcome page and embedded this code:
fetch("<my-custom-backend-app>/api/v1/latest")
.then(response => response.json())
.then(data => {
// Get a reference to the HTML element where we want to display the data
const element = document.getElementById('latestBlog');
// Update the HTML content with the fetched data
let html = "</ul>";
for (let i = 0; i < data.length; i++) {
html += "<li>" + data[i] + "</li>";
}
html += "</ul>";
element.innerHTML = html;
})
.catch(error => console.error(error));
This sounded good on paper, so let's test it out.
Sound effect: this was a major fail.
The reason turned out to be quite simple: not only did I have to make the API call from the backend app, I then had to fetch again from my custom backend, so the process looked like this:
antzed.com <-- my custom backend app <-- github api.
I even tried to simplify the fetch by requesting the minimum number of commits from the GitHub API, meaning changing https.../commits?per_page=<big number> to https.../commits?per_page=2, and doing direct index calls in my embed like this:
let html = `<ul>
<li>${data[0]}</li>
<li>${data[1]}</li>
<li>${data[2]}</li>
<li>${data[3]}</li>
<li>${data[4]}</li>
</ul>`;
but that didn't really help.
So in the end, I cut out the middleman that is my custom backend.
What's next
Even though the current solution works, I still believe I can achieve a faster loading time. So I'm going to keep working on this.
Am I going to keep the custom backend?
Probably. Using it for these simple fetches is definitely not going to happen, but it can definitely be used for some other, more complex operations.
Creating mdbook-summary-maker - a summary.md automation tool
2023-05-12
Table of Contents
Problem
The problem I want to solve has to do with my SUMMARY.md file. The method I currently use is to add the name of the file into SUMMARY.md whenever I write a new post. I automated the process of adding new posts a little bit when I was migrating everything into mdbook and netlify (detailed in this post), but it was still a semi-automatic experience, which I'm aiming to improve. I want to be able to create a file, write in it, build the book, and publish it online without worrying about anything else.
The solution
Here is my repo. I ended up writing an executable in Rust that cycles through the source folder and regenerates SUMMARY.md.
Process
Maybe a preprocessor?
My first idea was to find an already-developed preprocessor. mdbook supports these scripts, which can be executed before the content is sent in for rendering. After some searching, I found two possible solutions:
I immediately ditched the mdbook-fs-summary option because it requires me to rename all my files with a number prefix, which is definitely a hassle I don't want.
After trying out mdbook-auto-gen-summary, I found it only offers a mediocre solution: it doesn't produce the order I want for my SUMMARY.md, and it is not customizable.
So afterwards, I've decided to make a custom preprocessor myself.
Hammering out my own preprocessor
To make my own preprocessor, I took the easy route of seeing how other people do it. So naturally, I went and looked at the code of mdbook-auto-gen-summary, the preprocessor I had just tried.
Upon first inspection, and some reading in mdbook's documentation, a few things stand out:
- There is a framework called mdbook::preprocess for building custom preprocessors
- The template
I then copied the file structure from mdbook-auto-gen-summary:
.
└── Project/
├── src/
│ ├── your_mod_name/
│ │ └── mod.rs
│ └── main.rs
    └── Cargo.toml
and implemented my own code in.
Here's when the bugs came
After testing out my preprocessor, I immediately spotted some bugs:
- The generated SUMMARY.md seemed to be stuck in a recursive loop. After the code generated the structure of the target folder and wrote it into SUMMARY.md, it would keep writing more of the same structure over and over again.
- If the content of SUMMARY.md was not cleared before running, some targets would literally "explode": the content within the folder got deleted, and the files got dragged out of the target folder into its parent directory.
Then I realized the problem: I was aiming to update SUMMARY.md according to the src/ folder's current state each time I build the book. But a preprocessor can only manipulate the content markdown files after SUMMARY.md gets loaded in, and before those files get rendered. It cannot manipulate the SUMMARY.md file itself and then initiate the process of loading everything in and rendering.
That was not stated in any of the documentation! Thanks Rust!
Guess we are back to making the executable.
So after discovering this devastating fact, I went back and removed all the preprocessor framework code. Instead, I wrote it like a script, exported it to an executable, placed the executable in the book folder, and added a line to my automation script here.
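The actual tool is the Rust executable in the repo above, but the core idea is small enough to sketch. Here is a hedged Python version of the same walk-the-folder approach; it assumes a conventional src/ layout where each folder is anchored by a README.md, and the title casing and output format are simplifications of what the real executable does:
import os

def walk(dir_path: str, root: str = "src", depth: int = 0) -> list[str]:
    lines = []
    for entry in sorted(os.listdir(dir_path)):
        full = os.path.join(dir_path, entry)
        indent = "\t" * depth
        if os.path.isdir(full):
            # a folder becomes a section, anchored by its README.md
            rel = os.path.relpath(os.path.join(full, "README.md"), root)
            lines.append(f"{indent}- [{entry}]({rel})")
            lines.extend(walk(full, root, depth + 1))
        elif entry.endswith(".md") and entry not in ("README.md", "SUMMARY.md"):
            title = entry[:-3].replace("-", " ").title()
            lines.append(f"{indent}- [{title}]({os.path.relpath(full, root)})")
    return lines

if __name__ == "__main__":
    with open("src/SUMMARY.md", "w") as f:
        f.write("# Summary\n\n" + "\n".join(walk("src")) + "\n")
Because this runs before mdbook build rather than as a preprocessor, it sidesteps the limitation above: SUMMARY.md is already regenerated by the time mdbook loads it.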
Making it properly dynamic
2023-05-12
Table of Contents
The Problem
Not long after my last post, the solution quickly became invalid. This is because I'd exposed my GitHub API key on the frontend in order to add the dynamic function. I thought this was OK since the key was "fine-grained" (in GitHub's naming) to only have read access to my website repo, but GitHub revoked it immediately as soon as I made that repo public.
Solution
The final solution I went with is Netlify's serverless functions. They are available to free-tier users and can provide dynamic functionality.
Here is the Intro article that i followed as a guide.
Honorary mention: the fetch not defined bug after deployment, and its solution. I will talk about it more later.
Process
Attempt 1
Previously, I didn't opt in for the whole "building a backend" thing because hosting the backend with a separate provider makes the dynamic features really slow. But then I realized the files mdbook generates can be treated as the frontend, and a backend can be wrapped around them.
So that's what I did: I took the simple backend I'd built previously, swapped out the public folder for the generated static web content, and called it a day.
My file structure:
my-app/
├─ node_modules/
├─ book(generated)/
│ ├─ favicon.ico
│ ├─ index.html
│ ├─ ...
├─ src/
│ ├─ index.js
│ ├─ ...
├─ .gitignore
├─ package.json
├─ README.md
However, after reaching this step, a new problem arose: where to host all of this code?
Attempt 2
One of the main reasons I'm hosting my website on Netlify is that it's free and the experience is nice, but this only applies to static websites. The only free solution (that I know of) for full-stack website hosting is cyclic.sh, which has a mediocre experience at best. Other solutions comparable to Netlify cost a solid chunk of money.
Here come Netlify's serverless functions. After fiddling with them for a bit, I succeeded in implementing the same feature as in the simple server.
Here is my file structure:
.
└── Project/
├── .netlify (generated using netlify-cli)/
│ └── ...
├── node_modules/
│ └── ...
├── book(frontend)/
│ ├── index.html
│ └── ...
├── netlify/
│ └── functions/
│ └── api-endpoint-1.js
├── netlify.toml
├── package-lock.json
└── package.json
I copied the commit-fetching code into api-endpoint-1.js, made it a function, and added the endpoint function at the top.
The endpoint function:
export const handler = async () => {
    try {
        // await inside the try block so a failed fetch is caught below
        const html = await getLatestCommit();
        return {
            statusCode: 200,
            body: JSON.stringify({
                content: html,
            }),
        }
    } catch (err) {
        return { statusCode: 500, body: err.toString() }
    }
}
And that's about it; the rest is just following the intro article mentioned at the top.
The bug
One bug worth mentioning is the fetch not defined bug.
It only appears after deployment to Netlify. It probably has to do with a Node.js version mismatch between the local machine and the cloud service Netlify is using, so the solution given at the top should solve it.
Creating this website
This website is created using mdbook and is hosted on Netlify.
Here's the tutorial I followed during the creation process.
Publishing Workflow
I wanted to make the process of publishing as easy as possible so it doesn't obstruct my flow of ideas. Since mdbook uses the src files to build the book files, I used git submodules to separate the builds from the src files. This allows me to save drafts that are not published, but also to easily publish my builds when I need to.
I also created a script on my local machine to automate these processes. The script looks something like this:
#!/bin/bash

save_path="path1"
publish_path="$save_path/path2"
build_path="path3"

if [ $1 == "save" ]; then
    read -p "Enter commit message: " commit_message
    cd "$save_path"
    git add .
    git commit -m "$commit_message"
    git push
elif [ $1 == "build" ]; then
    cd "$save_path"
    ./mdbook-summary-maker
    if [ $# -eq 2 ] && ([ $2 == "open" ] || [ $2 == "-o" ]); then
        mdbook build --open --dest-dir $build_path
    else
        mdbook build --dest-dir $build_path
    fi
elif [ $1 == "create" ]; then
    read -p "Enter file name: " file_name
    file_path=$(find "$save_path" -name "$file_name.md")
    if [ -z "$file_path" ]; then
        echo "File not found."
    else
        # process the file path that contains README.md
        replaced_file_path="${file_path/\/$file_name.md/\/README.md}"
        replaced_file_path="./${replaced_file_path#*src/}"
        summary_path="$save_path/src/SUMMARY.md"
        in_file_path="./${file_path#*src/}"
        processed_file_name=$(echo $file_name | sed 's/-/ /g' | sed 's/\b\(.\)/\u\1/g')
        new_entry="- [$processed_file_name]($in_file_path)"
        found=false
        while IFS= read -r line; do
            if [[ $line == *"$replaced_file_path"* ]]; then
                # count the leading tabs to preserve indentation
                tab_count=0
                while [[ $line == $'\t'* ]]; do
                    line=${line#?}
                    tab_count=$((tab_count+1))
                done
                for i in $(seq 1 $tab_count); do echo -n -e "\t" >> temp.md; done
                echo "$line" >> temp.md
                tabs=""
                # if $2 is "root", keep the new entry at the same level instead of nesting it one level deeper
                if [ $# -eq 2 ] && [ $2 == "root" ]; then
                    for i in $(seq 1 $((tab_count))); do tabs+="\t"; done
                else
                    for i in $(seq 1 $((tab_count+1))); do tabs+="\t"; done
                fi
                echo -e $tabs"$new_entry" >> temp.md
                found=true
            else
                echo "$line" >> temp.md
            fi
        done < "$summary_path"
        if [ "$found" = false ]; then
            echo "$new_entry" >> temp.md
        fi
        mv temp.md "$summary_path"
        echo "Successfully added new entry to SUMMARY.md"
    fi
elif [ $1 == "summarize" ]; then
    target_file_name=$2
    # search for the target_file_name in $save_path and store it in a variable
    target_file_path=$(find "$save_path" -name "$target_file_name.md")
    # read the file line by line
    subtitle_list=()
    while IFS= read -r line; do
        # if the line starts with "## "
        if [[ $line == "## "* ]]; then
            # get the content of the line without the "## "
            subtitle=${line#"## "}
            # put it in the list subtitle_list
            subtitle_list+=("$subtitle")
        fi
    done < "$target_file_path"
    # create a string variable table_of_content
    table_of_content="## Table of Contents\n"
    # for each element in the list subtitle_list
    for subtitle in "${subtitle_list[@]}"; do
        # build the anchor link: replace spaces with "-", lowercase it, and prefix "#"
        element_link="${subtitle// /-}"
        element_link="${element_link,,}"
        element_link="#${element_link}"
        # put the original element in a line "- [element](element_link)\n"
        line="- [$subtitle]($element_link)\n"
        # add that line into table_of_content
        table_of_content="${table_of_content}${line}"
    done
    # Insert table_of_content at the top of the target file
    temp_file=$(mktemp) # create a temporary file
    # save the target file into a backup
    cp "$target_file_path" "$target_file_path.bak"
    # write table_of_content followed by the original content into the temporary file
    echo -e "${table_of_content}$(cat "$target_file_path")" > $temp_file
    mv $temp_file "$target_file_path" # move the temporary file to the target file path
elif [ $1 == "publish" ]; then
    read -p "Enter commit message: " commit_message
    cd "$publish_path"
    git add .
    git commit -m "$commit_message"
    git push
else
    echo "Invalid argument. Please use 'save', 'build', 'create', 'summarize' or 'publish'"
fi
Afterwards, you must run
sudo cp script.sh /usr/local/bin/<command name>
sudo chmod +x /usr/local/bin/<command name>
Note that when using cd in bash scripts, you have to execute the script via . script.sh or source script.sh. This is because a bash script normally runs in its own shell, according to this post.
So I added this in my .bashrc file:
alias <command name>=". <command name>"
Also, when dealing with multi-word directory names, you must change cd $path to cd "$path". (It must be double quotes.)
ChatGPT assistance
This script was made with the assistance of ChatGPT. Here's my main prompt:
I want to automate my git operations using a script. This script should be able to run anywhere in an Ubuntu Linux environment using the keyword "(command name)". This script should take three different arguments:
- "save", which asks the user for a (commit message), and executes:
	- cd (pre-defined path)
	- git add .
	- git commit -m "(commit message)"
	- git push
- "build", which executes:
	- cd (pre-defined path)
	- mdbook build --dest-dir (pre-defined path)
- "publish", which is the same as "save", but I will change the (pre-defined path) later.
I also want to make the script edit a SUMMARY.md file in a (pre-defined path).
The user should be able to input "antzed create" and be asked for a (file name).
The script will then:
- go to a (pre-defined path)
- search for the (file name)'s (file path) relative to (pre-defined path)
- replace the (file name) part of the (file path) with README.md. For example, if the (file path) is "./knowledge/(file name)", change it to "./knowledge/README.md". Let's call this processed file path the (replaced file path)
- go to SUMMARY.md, which should be in (pre-defined path), and read it line by line to search for the (replaced file path). Note that SUMMARY.md should be structured as a markdown list, for example: - [some name](some file path), so the (replaced file path) should be in (some file path)
- create the string "- [(file name)]((file path))", place it on a line under the (replaced file path), and add a tab at the start of the line.
Fill in the TODO in this code "elif [ $1 == "summarize" ]; then target_file_name=$2 # TODO: implement code using given prompt" using this prompt: "Prompt:
write the code that does the following:
- search for the target_file_name in $save_path and store it in a variable
- read the file line by line
- if the line starts with "## "
	- get the content of the line without the "## "
	- put it in a list subtitle_list
create the table of contents by doing the following:
- create a string variable table_of_content
- for each element in the list subtitle_list
	- create a copy of the element and add "#" in front of it. Then replace the spaces in the element with "-" and name the copy "element_link"
	- put the original element in a line "- [element](element_link)\n"
	- add that line into table_of_content
- insert table_of_content at the top of the target file"
Website server migration
2020-07-28
Recently I wanted to migrate my website to a more secure server. My old server's provider was not well known, which made their servers feel a bit too shady.
After extensive research, I found a reliable and, most importantly, free solution: AWS EC2 VPS servers.
They offer a free 12-month trial with 750 hours (31.25 days) of rental time each month. This basically means that I can rent one VPS for free for a year.
Resources
How a beginner can build a personal website from scratch with WordPress and AWS: https://www.jianshu.com/p/82db25396e3c
Logging in as the root user on AWS EC2: https://www.cnblogs.com/520wife/p/7744015.html
AWS server rental process
To start a rental period, you first have to log in to the “root account” management panel in AWS (you are asked to set this account up when you first sign up for AWS), which looks a little bit like this:
then click
Service -> EC2 -> Launch instance
An instance is a VPS; by clicking Launch instance, you start the initialization process of a VPS setup.
There is a minimum of three steps in this initialization process. The first one is choosing an AMI, or Amazon Machine Image. The AMI is basically the operating system of your server.
There are plenty of options, such as Ubuntu, Amazon Linux, Red Hat Enterprise Linux, etc. I chose CentOS 7 since that's the OS I'm most familiar with.
Note that for CentOS in particular, the option cannot be found in the default recommendation tab. It can only be found under AWS Marketplace.
It is also important to look for the “Free tier eligible” sign, because that's the only free option.
You then need to choose the hardware for the server. Unfortunately, there's only one option that is free, which is shown below:
However, there are many advanced options to choose from if you're willing to pay.
After that, you are basically done. If you want to do further detailed configuration, you have four more steps to go.
Out of the four “sub-steps”, the most important one is step 6, which allows you to open up different protocols and ports. If you want to customize your ssh port, for example, you can add a new inbound rule on port xxxx.
Connecting to AWS server
I used PuTTY for this connection process. Prior to this rental, server providers always provided a random password for the root user. This time, however, I need to use a key generated by AWS.
This key is downloaded as a .pem file, with a name you come up with. Then I have to open PuTTYgen and load that .pem file in:
It will show the content of the .pem file. You then need to press Save private key to save a .ppk file.
This new file is the key that will log you in to your server. So next you will open PuTTY and find the Auth option under the SSH option.
In the “private key file for authentication” space, click Browse and find the .ppk file that you just saved, then click Open to log in.
Enable password authentication
Password authentication is turned off by default. This creates a lot of limitations if I want to ssh into the server directly/immediately, hence it is necessary to turn password authentication back on.
To do that, we first need to log in as the root user. We can do this by creating a password for the root user:
passwd root
then logging in as root using the password:
su root
We then need to turn on password authentication, which lives in the file sshd_config, hence:
vi /etc/ssh/sshd_config
Find the line
PasswordAuthentication no
and turn the “no” into “yes”. After that, restart sshd:
sudo /sbin/service sshd restart
Next time you log back in, test with the root user; you should now be prompted for the user's password.
Web content migration
Now it's time for the content migration. I was able to achieve the migration through a WordPress plugin called “All-in-One WP Migration” (ai1wm for short).
This plugin allows the user to export, import, and back up their website.
However, after the version 6.8 update, the plugin started imposing a limit on the maximum size of a website upload, which creates problems if you have a big website.
On top of that, WordPress itself also has an upload size restriction. So to mitigate all these problems, we have to download:
- v6.7 of ai1wm
- and a plugin called “tuxedo big file upload”
which unlock both upload restrictions.
To start the migration, you need ai1wm on both servers (old and new).
You first export your website into a .wpress file via Export in ai1wm:
Then, you install ai1wm v6.7 on your new server. Through either the server panel or ssh, you need to upload the .wpress file you've just created into the ai1wm backup folder:
Default path: /www/wwwroot/your_website_url/wp-content/ai1wm-backups
After that, go to the Backups tab in ai1wm and restore the backup that just showed up.
You'll need to reload your website again and enter the username and password of the old website, and you're good to go.
Hi, I'm Anthony
..and this is my book.
I want to compile my writings/blogs into book form. The idea being that if I die, I have left the world with something to read on.
This website will be divided into three sections:
If you don't know where to start, check out this
Here are my other links:
Github | Linkedin
What's new:
Legal stuff
© 2023 Anthony Zhang. All rights reserved.