This is the knowledge section
...and this section is all about what I've learned through the years.
Data viz
Cool data viz
Two interesting code survey tools
2023-04-21
What are the tools?
Here are the two tools I'm talking about:
- Visualizing multiple repos: link
- Visualizing a target repo: intro link | direct link
Why is it interesting?
Because this is a unique way of thinking about code. It is also a helpful way for me to understand code.
I always wanted to make contributions to open source projects, but often failed to do so due to the complexity of the code bases. As a beginner I often feel stuck and don't know where to begin. Most of the "tutorials" will just tell you to "start from checking typos", which doesn't really help at all.
These two tools help me understand the landscape of open source projects in a certain language and give me an overview of the structure of the code.
So have you used them? What's your feedback?
So far I've been using them to read the codebases of various open source projects. As a person who wants to get in on the action, this eases my way into understanding what the heck is going on.
As an example, here's the visualization of one of the projects I'm interested in (github):
Here's another one for twitter's algorithm:
It's from this visualization that I learned that Twitter uses Rust in one of its services, which motivates me to learn more about the language. It seems that a lot of companies have been adopting Rust.
So to explore Rust, we can also use the multi-repo visualization tool:
So..
I hope I can use these tools more often in the future. With an easier way of reading code, I'll be way more motivated to try to write and contribute code to open source projects as well.
CS-related
- rust-notes/
- Quake-3-fast-inverse-sqrt
- procedural-generation-note
- Linux-learning
- game-engine-architecture-note
- AI/
- fibonacci-in-O(n)
- OpenGL-triangle-without-vertex
- leanring-CRDT-note
- things-i-learned-from-the-creator-of-summatra-pdf
- webGPU/
- leetcode-grind/
- errors/
- Cool-tech-blogs
- apply-ML-concept
The following are all my Rust notes.
String::from() in Rust
2023-04-18
String::from() basically turns a static string (a string literal) into a mutable, heap-allocated String.
This is what phined has to say:
String::from in Rust is a function that creates a String from a string literal or a &str. A string literal, which is of type &'static str, is an immutable string slice that is embedded into the executable. On the other hand, String is a mutable and growable string buffer allocated on the heap. The String::from function creates a heap-allocated String and copies the data from the static string into it (Source 4).
Here's an example:
fn main() {
    let static_str = "Hello, world!";
    let string = String::from(static_str);
}
In this example, static_str is of type &'static str, and string is of type String. The String::from function is used to create a mutable String from the immutable &'static str.
You can also use .to_string() or .to_owned() on a string literal to achieve the same result:
fn main() {
    let static_str = "Hello, world!";
    let string = static_str.to_string();
    let string = static_str.to_owned();
}
Both .to_string() and .to_owned() work similarly to String::from. In Rust versions before 1.9.0, String::from was faster than .to_string(), but from version 1.9.0 onwards, they have the same performance (Source 3).
to_vec() in Rust
2023-04-18
to_vec is a crate that provides specialized implementations of collect for common use cases when collecting into Vec, HashSet, or HashMap containers. The main functionality can be broken down into:
- ToVec: collects an iterator's items into a Vec. For example:

use to_vec::ToVec;

let v = "one two three".split_whitespace().to_vec();
assert_eq!(v, &["one", "two", "three"]);
- ToVecResult: collects an iterator of Result<T, E> into a Result<Vec<T>, E>, where the error is the first error encountered.

use to_vec::ToVecResult;

let numbers = "23E 5F5 FF00".split_whitespace()
    .map(|s| u32::from_str_radix(s, 16)).to_vec_result().unwrap();
assert_eq!(numbers, &[0x23E, 0x5F5, 0xFF00]);
Some other traits from the same crate:
- ToSet: collects an iterator of references into a HashSet, implicitly cloning the items.

use to_vec::ToSet;

let colours = ["green", "orange", "blue"].iter().to_set();
let fruit = ["apple", "banana", "orange"].iter().to_set();
let common = colours.intersection(&fruit).to_set();
assert_eq!(common, ["orange"].iter().to_set());
- ToMap: collects an iterator of references to key-value pairs into a HashMap, implicitly cloning the keys and values.

use to_vec::ToMap;

const VALUES: &[(&str, i32)] = &[("hello", 10), ("dolly", 20)];
let map = VALUES.iter().to_map();
assert_eq!(map.get("hello"), Some(&10));
assert_eq!(map.get("dolly"), Some(&20));
These specialized forms provide a more ergonomic and efficient way to collect iterators into commonly used containers.
This was written with the help of phined.
Difference between struct and enum
2023-04-19
According to phined:
In Rust, struct and enum serve different purposes and have different behaviors:
- Memory layout:
  - struct: the size of a struct is the sum of the sizes of its members, plus any padding for alignment. stackoverflow.com
  - enum: the size of an enum is the size of the largest variant plus the variant identifier. stackoverflow.com
- Usage:
  - struct: represents an "AND" combination of its fields. It is used to encapsulate related properties into one unified data type. reddit.com
  - enum: represents an "OR" combination of its variants. It is used when a value can be one of several possible types. reddit.com
- Type aliases: type aliases can be used with enums to simplify long or generic names. doc.rust-lang.org
- Optimizations: Rust's compiler can optimize enums for memory usage and safety, whereas structs do not have the same level of optimization. reddit.com
- Match statement: enums can be used directly with match statements, which is one of the primary use cases for enums. reddit.com
Here are examples of each:
// Struct example
struct Point {
    x: i32,
    y: i32,
}

// Enum example
enum Shape {
    Circle(f32),
    Rectangle(f32, f32),
    Triangle(f32, f32),
}
In summary, use struct when you want to group related properties into a single data type, and use enum when you want to represent a value that can be one of several possible types.
So it basically means a struct in Rust is like a Java object, and an enum is kind of like a superclass with a fixed set of subclasses.
What is string.to_owned()
Phined answer:
to_owned is a method provided by the ToOwned trait in Rust, which is a generalization of the Clone trait for borrowed data. While both clone() and to_owned() can be used to create a copy of a value, they differ in their behavior for certain types, such as string slices (&str) and array slices (&[T]) (Source 2).
When using clone() on borrowed types like &str or &[T], it returns a copy with the same borrowed type (&str or &[T]). However, when using to_owned() on these borrowed types, it returns an owned version of the type, such as a String for &str or a Vec<T> for &[T] (Source 2).
For example:
let str = "a";                // type &str
let cloned_str = str.clone(); // type &str
let str = "a";                  // type &str
let owned_str = str.to_owned(); // type String
In general, you can use clone() when you want a copy of the same type, and to_owned() when you want to create an owned copy from borrowed data. The ToOwned trait ensures that the duplicate value has ownership, even if it requires using a different data type, such as a String or a Vec<T> for a &str or &[T] (Source 2).
As mentioned in Source 4, some developers prefer using clone() when they want an object of the same type and to_owned() when the types are different. However, this distinction is mostly stylistic, and both methods can be used interchangeably in many cases.
Quake 3 fast inverse square root algorithm
This blog is inspired by this video made by YouTuber Nemean explaining the fast inv-sqrt algorithm written for the Quake 3 game.
I want to put what he has said into note form, while simultaneously exploring similar solutions to this algorithm in the Rust language.
This will be in three major sections:
Context | Breakdown | Solution in Rust
The original C program
Here is the original C code for the algorithm:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}
Context
Section: who cares
This algorithm is designed to normalize light ray vectors to length 1. This is needed to calculate the light physics in game.
The length of a light ray vector is calculated through:

$\|v\| = \sqrt{x^2 + y^2 + z^2}$

And since we want to normalize it down to length 1, we divide each component by the length of the vector, which leads to things like this (we are using the x axis as an example):

$x' = \frac{x}{\sqrt{x^2 + y^2 + z^2}}$

which leads to

$x' = x \cdot \frac{1}{\sqrt{x^2 + y^2 + z^2}}$

which leads to the need of calculating the inverse square root.
Section: Variable declaration and IEEE 754
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
The first variable declarations are basically set out to represent the number in binary scientific notation form.
The variable i is a 32-bit long integer.
The variables x2 and y are two float (decimal) numbers.
As we know, the binary scientific notation is in the form of:
0 00000000 00000000000000000000000
which consists of:
- 1 bit for the positive or negative sign (- or +)
- 8 bits for the exponent (the $x$ in $2^x$)
- 23 bits, called the mantissa, for the decimal places of the main number (the .xx in 1.xx)
And since we are normalizing a number to the range of 0 to 1, the following is given:
- the number will always be positive
- the number will be in normalized form
Section: bits and numbers
This section basically points out that the bit representation of the exponent and mantissa can be roughly equal to the logged version of the number, up to a scale factor and an additional constant. Meaning: the bit representation of x can mostly stand in for log2(x).
The bit representation of a float with mantissa $M$ and exponent $E$ can be written as $I = E \cdot 2^{23} + M$, which in number form is

$y = \left(1 + \frac{M}{2^{23}}\right) \cdot 2^{E - 127}$

For us to get the log version, we first wrap it with $\log_2$:

$\log_2(y) = \log_2\!\left(\left(1 + \frac{M}{2^{23}}\right) \cdot 2^{E - 127}\right)$

We then simplify to get

$\log_2(y) = \log_2\!\left(1 + \frac{M}{2^{23}}\right) + E - 127$.

At this point, it is a bit stuck, since we have to figure out how to get rid of the $\log_2$ in the front.

However, it was found that given a small enough $x$, $\log_2(1 + x) \approx x$.

It was also found that the approximation has the smallest error if we add to $x$ a constant $\mu$ of about 0.0430.

Hence, we get the following formula when we apply the findings:

$\log_2(y) \approx \frac{M}{2^{23}} + \mu + E - 127$

and after re-arranging we get:

$\log_2(y) \approx \frac{1}{2^{23}}\left(M + E \cdot 2^{23}\right) + \mu - 127$

where we can find what we've discovered earlier: the bit representation $I = M + E \cdot 2^{23}$. In other words, $\log_2(y) \approx \frac{I}{2^{23}} + \mu - 127$.
Breakdown
Section: evil bit hack
i = * ( long * ) &y;
We want to enable bit manipulation on a float. This is necessary because we want to use bit manipulation to realize the number division (the halving of the exponent) cheaply.
However, we don't want to convert the variable itself from float to long, because that would round the value and lose information. We want the bit representation of the float to be read as a long as-is.
So instead, this line of code converts the address of the float into an address of a long.
Break down:
- &y is the address of the float y.
- ( long * ) converts the address of the float y into the address of a long. So the C language will think there is now a long number living at that address instead of a float.
- The leading * then reads what's at the address.
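For comparison, here's a quick mock of the same bit reinterpretation in Python (my own sketch, not from the video; Python's struct module stands in for the pointer cast):

import struct

y = 2.0
# reinterpret the float's bits as a 32-bit unsigned integer,
# like `i = * ( long * ) &y` in C
(i,) = struct.unpack('<I', struct.pack('<f', y))
print(hex(i))  # 0x40000000: sign 0, exponent 128, mantissa 0

# and back again, like `y = * ( float * ) &i`
(y2,) = struct.unpack('<f', struct.pack('<I', i))
assert y2 == y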
Section: WTF
i = 0x5f3759df - ( i >> 1 );
This line of code uses bit shifting to manipulate the exponent part of the number in order to achieve the inverse square root.
So basically, given $y = 2^e$:

If you halve the exponent (which can easily be done using bit shifting), you get $2^{e/2} = \sqrt{y}$.

And if you negate the exponent, you get $2^{-e/2}$, which is what we want: $\frac{1}{\sqrt{y}}$.

However, since directly calculating the inverse square root is what we want to avoid through this algorithm, we need to work it out some other way. This is where what we talked about in the previous section comes in. If we get y in the IEEE 754 form, the bit representation of y can be interpreted as the log of itself.

Hence instead of calculating $y^{-1/2}$, we calculate $\log_2\!\left(y^{-1/2}\right)$, which can be turned into $-\frac{1}{2}\log_2(y)$.

There is still a division in the equation, which can be dealt with using a bit shift: ( i >> 1 ).

Since we want to find out the inverse square root of a number, we can then draw an equation like this:

$\Gamma = \frac{1}{\sqrt{y}}$

We can then use the previously discussed equation and establish a new equation (writing $I_\Gamma$ and $I_y$ for the bit representations of $\Gamma$ and $y$):

$\frac{I_\Gamma}{2^{23}} + \mu - 127 = -\frac{1}{2}\left(\frac{I_y}{2^{23}} + \mu - 127\right)$

After solving for $I_\Gamma$, we get this:

$I_\Gamma = \frac{3}{2} \cdot 2^{23} \cdot (127 - \mu) - \frac{I_y}{2}$

where $\frac{3}{2} \cdot 2^{23} \cdot (127 - \mu)$ = 0x5f3759df.
y = * ( float * ) &i;
Finally, we perform the evil bit hack again, but in reverse.
Section: Newton iteration
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
These lines of code utilize Newton iteration; we need this because the result from the reverse evil bit hack is still not accurate enough.

Newton iteration finds the root of a function. This means it finds the x in f(x) = 0 through many iterations of approximation.

The basic idea of Newton iteration is guessing a new value of x through f(x) and its derivative, which leads to the following equation:

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

The Quake 3 algorithm only does the iteration one time because the result already brings the error to within 1%. Applied to $f(y) = \frac{1}{y^2} - x$ (whose root is the inverse square root of x), the Newton step works out to $y_{n+1} = y_n\left(\frac{3}{2} - \frac{x}{2}\,y_n^2\right)$, which is exactly what the code computes, with x2 being $\frac{x}{2}$. And when y is the root of the function, y is also the inverse square root of x.
Some solutions of this algorithm in Rust
Here's a version of this code written in Rust, found on this Stack Overflow post (rust playground):
fn inv_sqrt(x: f32) -> f32 {
    let i = x.to_bits();
    let i = 0x5f3759df - (i >> 1);
    let y = f32::from_bits(i);
    y * (1.5 - 0.5 * x * y * y)
}
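And here's a small Python port I find handy for playing with the algorithm (my own sketch, mirroring the Rust version above, with struct doing the bit reinterpretation):

import struct

def inv_sqrt(x):
    # evil bit hack: read the float's bits as a 32-bit unsigned int
    (i,) = struct.unpack('<I', struct.pack('<f', x))
    i = 0x5f3759df - (i >> 1)           # the WTF line
    (y,) = struct.unpack('<f', struct.pack('<I', i))
    return y * (1.5 - 0.5 * x * y * y)  # one Newton iteration

for x in (1.0, 2.0, 4.0, 10.0):
    print(x, inv_sqrt(x), x ** -0.5)    # errors stay well under 1%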
Note for procedural generation
I was playing Deep Rock Galactic and was wondering how procedural generation works, so I'mma read up on it and take notes.
Reading list
Generating Random Fractal Terrain
The concept behind fractals is self-similarity, meaning the shape copies itself in a miniature version, and so on and so forth.
This is like human blood vessels: they start from the main arteries, then branch out into smaller veins.
Midpoint displacement in one dimension
Start with a single horizontal line segment.
Repeat for a sufficiently large number of times {
Repeat over each line segment in the scene {
Find the midpoint of the line segment.
Displace the midpoint in Y by a random amount.
Reduce the range for random numbers.
}
}
(The accompanying figures show the line subdividing: one segment, then two, then four, getting progressively rougher.)
This is a recursive operation.
It is also a simple algorithm that creates complex results.
A roughness constant H determines how quickly the random range shrinks each pass, and therefore how rough the fractal is.
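Here's a minimal Python sketch of the 1D version (my own, following the pseudocode above; the number of passes and the roughness constant are parameters):

import random

def midpoint_displacement(passes, roughness=1.0):
    # start with a single horizontal line segment: two endpoints at height 0
    heights = [0.0, 0.0]
    scale = 1.0
    for _ in range(passes):
        new_heights = []
        for left, right in zip(heights, heights[1:]):
            new_heights.append(left)
            # find the midpoint and displace it in Y by a random amount
            new_heights.append((left + right) / 2 + random.uniform(-scale, scale))
        new_heights.append(heights[-1])
        heights = new_heights
        # reduce the range for random numbers; H controls how fast
        scale *= 2 ** -roughness
    return heights

print(midpoint_displacement(3))  # 9 heights describing a jagged line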
Height maps
The same algorithm can be used to generate the heights of a 3D space; combining them with the x and y coordinates, you get a height map.
diamond-square algorithm
The diamond step: Taking a square of four points, generate a random value at the square midpoint, where the two diagonals meet. The midpoint value is calculated by averaging the four corner values, plus a random amount. This gives you diamonds when you have multiple squares arranged in a grid.
The square step: Taking each diamond of four points, generate a random value at the center of the diamond. Calculate the midpoint value by averaging the corner values, plus a random amount generated in the same range as used for the diamond step. This gives you squares again.
Implementing this algorithm recursively will cause generation with insufficient data at some point. For example, after the first pass, the square step will not receive all four corners of the diamond in this implementation:
Do diamond step.
Do square step.
Reduce random number range.
Call myself four times.
So the iterative implementation is:
While the length of the side of the squares
is greater than zero {
Pass through the array and perform the diamond
step for each square present.
Pass through the array and perform the square
step for each diamond present.
Reduce the random number range.
}
This algorithm can also be used to generate clouds by determining the color value instead of the height value.
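Here's a compact Python sketch of the iterative loop above (my own; it assumes a square grid of side 2^n + 1 and, at the edges, averages only the neighbours that exist):

import random

def diamond_square(n, roughness=1.0):
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]
    # seed the four corners with random values
    for r in (0, size - 1):
        for c in (0, size - 1):
            grid[r][c] = random.uniform(-1, 1)
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # diamond step: each square's midpoint = average of its four corners + random
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (grid[r - half][c - half] + grid[r - half][c + half] +
                       grid[r + half][c - half] + grid[r + half][c + half]) / 4
                grid[r][c] = avg + random.uniform(-scale, scale)
        # square step: each diamond's midpoint = average of available neighbours + random
        for r in range(0, size, half):
            for c in range(half if r % step == 0 else 0, size, step):
                total, count = 0.0, 0
                for dr, dc in ((-half, 0), (half, 0), (0, -half), (0, half)):
                    if 0 <= r + dr < size and 0 <= c + dc < size:
                        total += grid[r + dr][c + dc]
                        count += 1
                grid[r][c] = total / count + random.uniform(-scale, scale)
        # reduce the random number range
        step, scale = half, scale * 2 ** -roughness
    return grid

terrain = diamond_square(4)  # a 17x17 height map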
Linux learning
2020-07-05
This is the post where I'm going to summarize my entry-level knowledge of Linux. I'm going to keep updating this post until I have a basic idea of Linux as a whole.
The basic syllabus
There are three main utilizations of Linux/purposes for learning Linux:
Utilization type | Basic description | Key phrases/Stuff to learn |
---|---|---|
Operation and Maintenance, O&M | The craft of keeping servers running. This mostly deals with server maintenance, set-up, surveillance, etc. | FTP, DNS, APACHE, IPTABLE, ORACLE, MYSQL, etc. Understanding most of the Linux commands. Shell & Python |
Application developer | Same as app dev on any other OS, but on Linux. | Understanding the Linux environment, Linux API, IO handling, Python & Shell, C/C++ |
Kernel programming | Dealing with the core of Linux as an OS. | Get to know the relationship between hardware and software. Read manuals on computer chip sets. |
Although it would be best to learn everything, I find the first option, O&M, more suited for my case (since I have rented servers). So this post will be focusing more on the server-side knowledge (I will try to learn the other options afterward).
Linux basics
- What is Linux?
- Linux distros, and the difference between them
- Most common Linux commands
What is Linux:
Linux is a free and open source operating system created by Linus Torvalds. It is Unix-like, and is used on most servers.
Linux distros:
Since Linux is open sourced, people are allowed to modify it and produce their own versions of Linux; thus Linux distros were born. The most common Linux distros are:
- Ubuntu Linux
- Red Hat Enterprise Linux
- Linux Mint
- Debian
- Fedora
Although Linux distros usually come with a GUI, the majority of Linux user interaction is done through the CLI (command line interface).
Linux basic commands:
The absence of a GUI in most cases means that operating Linux requires navigating through working directories, e.g. /home/usr/etc/etc. This means that the most used commands are the following:
Command | Meaning | Example |
---|---|---|
cd | Change directory – this command allows you to go from one directory to another. | cd /home/exampleDir |
ls | List – this command lists all the files & directories in the current directory. | ls /home/exampleDir |
pwd | Print working directory – this command prints the path of your current directory; it lets you know where you are. | pwd Output: /home/exampleDir |
After going through the directories, we can also manipulate the directories with these commands:

Command | Meaning | Example |
---|---|---|
mkdir | Make directory – create a new directory under the current directory | mkdir /exampleDir |
rmdir | Remove directory – remove a directory under the current directory. | rmdir /exampleDir |
Files can also be manipulated in a similar way:
Command | Meaning | Example |
---|---|---|
touch | create an empty file of any type | touch example.txt |
rm | remove a file of any type | rm example.txt |
rm -r | remove a directory and the files inside it (recursively) | rm -r /exampleDir |
In a GUI, file movement is usually done by drag and drop. In Linux, file movement is done with:
Command | Meaning | Example |
---|---|---|
cp | copy a file to a directory | cp example.txt /home/exampleDir |
mv | move a file to a different directory, or rename a file | mv example.txt /home/exampleDir; mv example.txt example1.txt |
There is also a search command:
locate
which acts as the search tool for files. For example:
locate example.txt
#output: /home/exampleDir/example.txt
Linux also has a built-in manual, which helps users if they forget the meaning of a certain command. To open the manual, use
man
#or
--help
Examples could be:
man ls
#or
ls --help
After creating a file or files, they can be examined and edited using these commands:
Command | Meanings | Examples |
---|---|---|
cat | cat displays the content of a file | cat example.txt Output: Example content |
echo | echo prints data, usually text; with >> it appends the text into a file | echo Example content >> example.txt |
vi | vi, or visual editor, is the default text editor for Unix-based systems. When using it, press i to edit, press esc to exit editing mode, type :wq or :x to write and quit, type :q to just quit, and add ! if permission is denied. | vi example.txt |
nano | nano is a simpler, more beginner-friendly text editor than vi. It is a "what you see is what you get" text editor, with handy functions such as inserting the content of another file. | nano example.txt nano /home/exampleDir/example.txt |
There are also some system-side commands that are very useful in day-to-day usage.
For example
sudo
sudo stands for SuperUser Do. It is a command allowing a non-root user to have administrative privileges when running other commands. It is usually placed at the very front:
sudo cd
sudo vi example.txt
sudo mkdir
#etc, etc
Managing disks is another important part of Linux. The following two commands give you information about disk space:

Command | Meaning | Example |
---|---|---|
df | df checks the disk space information of each mounted partition of the system, e.g. total disk space, used disk space, available disk space. | df; df -m shows it in megabytes |
du | du tells the disk usage of a particular file or directory | du /home/exampleDir; du /home/exampleDir/example.txt |
There’s a whole other topic on disk mounting, which is something that I will touch on later.
In terms of file compression, Linux uses the tar archive, which is represented with .tar. The commands to compress and decompress are as follows:
Command | Meaning | Example |
---|---|---|
tar -cvf | create a tar archive | tar -cvf example.tar /home/exampleDir/ |
tar -xvf | decompress a tar archive | tar -xvf /home/exampleDir/example.tar |
tar -tvf | list the content of the archive | tar -tvf /home/exampleDir/example.tar |
Each letter in the “-cvf” in “tar -cvf” has its own meaning:
- c is for create tar file
- v is for verbosely show tar file progress
- f is for file name
The old-fashioned “zip” file is also available in Linux; we can zip and unzip using the following commands:
Command | Meaning | Example |
---|---|---|
zip | to compress a file into a zip archive | zip example.zip example.txt |
unzip | to decompress a zip archive | unzip example.zip |
There are also some commands that retrieve information such as the OS version, network connection, etc.
Command | Meaning | Example |
---|---|---|
uname | retrieve information about the Linux distro/system. | uname -a |
hostname | retrieve your host name & IP address | hostname |
ping | ping checks your connection with a certain website. It is also used to check the overall internet connection. | ping example.com |
Then, to install something, we use
apt-get
or
yum
The difference between these two is their type of installation package. Linux has two main branches of distros:
Distros branch | examples |
---|---|
Redhat | Redhat, CentOS, fedora |
Debian | Debian, Ubuntu |
and each has its own type of installation package.
Distros branch | Type of installation package |
---|---|
Redhat | rpm(redhat package manager) |
Debian | deb(dpkg) |
yum and apt-get are wrappers around these package formats. They help manage all the packages from an online repository.
So to use them, for example to download the "sudo" package, we just need to type:
apt-get install sudo
or
yum install sudo
depending on your version of Linux distro.
Finally, files in Linux have privilege/permission settings. A user with root/sudo privileges will be able to change these permissions.
To understand these permissions, we first need to go back to the ls command.
There is a variant of ls:
ls -l
It displays the information in a “long” format, which gives more information about a file. An example output would be:
-rwxrw-r-- 1 exampleUsr exampleUsr 780 Aug 20 11:11 example.txt
#or
drwxr-xr-x 2 exampleUsr exampleUsr 4096 Aug 21 08:03 exampleDir
The string of text in front of each line displays the file type, the permissions for the file owner, the permissions for users in the same group as the file owner, and the permissions for everyone else.
- | this shows that this is a file |
---|---|
rwx | the first segment is for the file owner |
rw- | the second segment is for users in the same group |
r-- | the third segment is for everyone else |
The first symbol represents the file type. The most common symbols are - and d, which are file and directory.
Apart from the first symbol, the following letters follow a pattern; each letter represents a permission status:
- r = read
- w = write
- x = execute
- "-" (if not the first symbol/letter) = denied permission on whatever is supposed to be there
So, rwx means readable, writable, executable; rw- means readable and writable, but not executable; and r-- means only readable.
Now, to change the permission of these files, we use the command
chmod
This command can only be used with root/sudo privileges, and the user must provide a “permission statement”, which includes the information of
- who: whose permission is changed
- what: what changes in permission
- which: what permission is going to change
This is represented with indicators after “chmod”, which are summarized in the table below:

Slot | Indicators | | | |
---|---|---|---|---|
who | u: “user”/“file owner” | g: group, members in the same user group | o: others | a: all |
what | - : remove permission | + : grants permission | = : set permission and remove others | |
which | r: see above | w: see above | x: see above | |
This permission statement can also be represented using numerical and binary values. In that case
Meanings | Numerical value(Decimal) | read digit(binary) | write digit(binary) | execute digit(binary) |
---|---|---|---|---|
No permission | 0 | 0 | 0 | 0 |
Execute permissions | 1 | 0 | 0 | 1 |
Write permissions | 2 | 0 | 1 | 0 |
Write and Execute | 3 | 0 | 1 | 1 |
Read permission | 4 | 1 | 0 | 0 |
Read and execute | 5 | 1 | 0 | 1 |
Read and write | 6 | 1 | 1 | 0 |
Read, write, execute | 7 | 1 | 1 | 1 |
So, to grant full permissions on a file to everybody, we can either use
chmod a=rwx example.txt
or
chmod 777 example.txt
However, usually we just give the full permissions to the file owner, and other people the read and execute permissions, like this:
chmod u=rwx,og=rx example.txt
or
chmod 755 example.txt
and finally, we add
-R
between chmod and the permission statement if we want to change the permissions of the files in the subdirectories as well.
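To double-check how the letter form maps to the numeric form, here's a tiny Python sketch (the helper name is my own):

def perm_to_octal(perm):
    """Convert a 9-character permission string like 'rwxr-xr-x' to '755'."""
    digits = []
    for i in range(0, 9, 3):
        r, w, x = perm[i:i + 3]
        value = (4 if r == 'r' else 0) + (2 if w == 'w' else 0) + (1 if x == 'x' else 0)
        digits.append(str(value))
    return ''.join(digits)

print(perm_to_octal('rwxr-xr-x'))  # 755
print(perm_to_octal('rwxrw-r--'))  # 764, matching the ls -l example above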
References
This is where all my references are going to go. This section will move to the end after I finish this post.
- Linux introduction https://www.zhihu.com/question/397371213
- What is kernel programming https://www.quora.com/What-is-Kernel-programming-What-does-a-Kernel-programmer-actually-do#:~:text=Kernel%20programming%20is%20nothing%20but%20providing%20functionality%20in,kernel%20programmer%20would%20be%20doing%20the%20following%20things%3A
- What is Linux https://maker.pro/linux/tutorial/basic-linux-commands-for-beginners
- Basic vi command https://www.cs.colostate.edu/helpdocs/vi.html
- The Beginner’s Guide to Nano, the Linux Command-Line Text Editor https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/
- 18 Tar Command Examples in Linux https://www.tecmint.com/18-tar-command-examples-in-linux/
- What is the difference between yum, apt-get, rpm, ./configure && make install? https://superuser.com/questions/125933/what-is-the-difference-between-yum-apt-get-rpm-configure-make-install
- The difference between yum and apt-get (in Chinese) https://www.cnblogs.com/siyuli2019/p/11252419.html
- How to Use the chmod Command on Linux https://www.howtogeek.com/437958/how-to-use-the-chmod-command-on-linux/
- Linux chmod command https://www.computerhope.com/unix/uchmod.htm
- Linux基础入门 | 目录结构 https://mp.weixin.qq.com/s?__biz=MzU3NTgyODQ1Nw==&mid=2247485351&idx=1&sn=c1a56193a2fa9fa40eaace1c220d8279&source=41#wechat_redirect
- The Linux Directory Structure, Explained https://www.howtogeek.com/117435/htg-explains-the-linux-directory-structure-explained/
Game engine architecture book note
This is a bunch of notes from me reading the Game Engine Architecture book by Jason Gregory.
2023-05-24
The main parts of a game engine include:
- third-party SDKs, like DirectX
- platform independence layer, which targets multiple platforms, like a wrapper
- core systems, such as memory allocation and object handles
- resource manager, including game assets
- rendering engine, rendering graphics
- low-level renderer, e.g. shaders, lighting, etc.
- graphics SDKs
- other components
- culling optimization
- Visual effects
- front end
- collision and physics
- skeletal animation
- audio
AI-related
- Bayes-Networks
- stable-matching-and-five-representative-problems
- Probabilistic-Context-Free-Grammar(PCFG)
- A-star-search
- heuristic-values
Bayes Network
What is a Bayes network
A Bayesian network is a representation of a joint probability distribution of a set of random variables with a possible mutual causal relationship. The network consists of nodes representing the random variables, edges between pairs of nodes representing the causal relationship of these nodes, and a conditional probability distribution in each of the nodes. The main objective of the method is to model the posterior conditional probability distribution of outcome (often causal) variable(s) after observing new evidence. Bayesian networks may be constructed either manually with knowledge of the underlying domain, or automatically from a large dataset by appropriate software.
The goal
The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence.
Any node in a Bayesian network is always conditionally independent of all its non-descendants, given that node's parents.
Difference between Markov and Bayesian networks
A Markov model is an example of a graph which represents only one random variable, and the nodes represent possible realizations of that random variable at distinct time points. In contrast, a Bayesian network represents a whole set of random variables, with the edges representing the causal relationships among them.
Six rules of Bayes nets / six rules of d-separation
The d in d-separation stands for directional.
A path between nodes means:
any consecutive sequence of edges, disregarding their directionalities.
An unblocked path means:
a path that can be traced without traversing a pair of arrows that collide "head-to-head".
The head-to-head nodes are called "colliders".
Rule 1
x and y are d-connected if there is an unblocked path between them.
Rule 2
x and y are d-connected, conditioned on a set Z of nodes, if there is a collider-free path between x and y that traverses no member of Z. If no such path exists, we say that x and y are d-separated by Z. We also say then that every path between x and y is "blocked" by Z.
Rule 3
If a collider is a member of the conditioning set Z, or has a descendant in Z, then it no longer blocks any path that traces this collider.
Without knowing anything, for the three basic structures (chain A → B → C, common cause A ← B → C, collider A → B ← C):
- chain: C is dependent on A
- common cause: A and C both depend on B, which makes them dependent, but conditionally independent given B
- collider: A and C are independent
Given B:
- chain: A and C are independent
- common cause: A and C are independent
- collider: A and C are dependent
Stable matching and five representative problems
Stable matching problem
Solution pseudocode:
Initially all m in M and w in W are free
While there is a free m
    let w = the highest-ranked on m's list that m has not proposed to
    if w is free, then match (m, w)
    else
        suppose (m2, w) is matched
        if w prefers m to m2
            unmatch (m2, w)
            match (m, w)
Python implementation:
from collections import deque

def pref_to_rank(pref):
    # helper referenced below; turns preference lists into
    # {person: {candidate: rank}} dicts for O(1) comparisons
    return {
        a: {b: rank for rank, b in enumerate(a_pref)}
        for a, a_pref in pref.items()
    }

def gale_shapley(*, A, B, A_pref, B_pref):
    """Create a stable matching using the
    Gale-Shapley algorithm.
    A -- set[str].
    B -- set[str].
    A_pref -- dict[str, list[str]].
    B_pref -- dict[str, list[str]].
    Output: list of (a, b) pairs.
    """
    B_rank = pref_to_rank(B_pref)
    ask_list = {a: deque(bs) for a, bs in A_pref.items()}
    pair = {}

    remaining_A = set(A)
    while len(remaining_A) > 0:
        a = remaining_A.pop()
        b = ask_list[a].popleft()
        if b not in pair:
            pair[b] = a
        else:
            a0 = pair[b]
            b_prefer_a0 = B_rank[b][a0] < B_rank[b][a]
            if b_prefer_a0:
                remaining_A.add(a)
            else:
                remaining_A.add(a0)
                pair[b] = a

    return [(a, b) for b, a in pair.items()]
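A toy run (the names and preferences are made up):

A = {'ann', 'bob'}
B = {'xavier', 'yolanda'}
A_pref = {'ann': ['xavier', 'yolanda'], 'bob': ['xavier', 'yolanda']}
B_pref = {'xavier': ['bob', 'ann'], 'yolanda': ['ann', 'bob']}

print(gale_shapley(A=A, B=B, A_pref=A_pref, B_pref=B_pref))
# [('bob', 'xavier'), ('ann', 'yolanda')] -- pair order may vary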
Use case
Matching internship applicants to companies. The matching should be self-enforcing and have less chaos.
We want both parties to either:
- prefer their choice of matching, or
- be satisfied with the current selection and not want to change.
In the applicant and companies example, this means: (i) E prefers every one of its accepted applicants to A; or (ii) A prefers her current situation over working for employer E.
The algorithm terminates after at most n^2 iterations of the while loop.
The set S returned by the algorithm is a stable matching.
Five representative problems
#Algorithm
- Interval scheduling
- Weighted interval scheduling
- Bipartite Matching
- Independent Set
- Competitive facility location
Interval scheduling
Usually solved using some kind of greedy algorithm.
Weighted interval scheduling
Usually solved using dynamic programming.
Bipartite Matching
Given an arbitrary bipartite graph G, find a matching of maximum size. If |X|=|Y| = n, then there is a perfect matching if and only if the maximum matching has size n
It is like stable matching but without preferences.
there is not necessarily an edge from every x ∈ X to every y ∈ Y, so the set of possible matchings has quite a complicated structure. In other words, it is as though only certain pairs of men and women are willing to be paired off, and we want to figure out how to pair off many people in a way that is consistent with this.
Usually solved using augmentation, which is the key to a class of problems called network flow problems.
Independent Set
Given a graph G = (V, E), we say a set of nodes S ⊆ V is independent if no two nodes in S are joined by an edge.
For example, the maximum size of an independent set in the graph in Figure 1.6 is four, achieved by the four-node independent set {1, 4, 5, 6}.
This belongs to the class of problems called NP-complete problems.
Competitive facility location
Thus our game consists of two players, P1 and P2, alternately selecting nodes in G, with P1 moving first. At all times, the set of all selected nodes must form an independent set in G. Suppose that player P2 has a target bound B, and we want to know: is there a strategy for P2 so that no matter how P1 plays, P2 will be able to select a set of nodes with a total value of at least B? We will call this an instance of the Competitive Facility Location Problem.
It is considered to be in the class of problems called PSPACE-complete problems, which are harder than NP-complete problems.
Probabilistic-Context-Free-Grammar(PCFG)
A probabilistic context free grammar (PCFG) is a type of formal grammar that contains probabilistic functions in order to assign a probability to each production rule. PCFGs are a type of generative grammar, which means they can be used to generate sentences from a given set of rules. Unlike traditional context free grammars, PCFGs use probabilities instead of just binary values (true or false) to determine the likelihood for each production rule. This allows for the generation of more complex and natural-sounding sentences.
A generic PCFG consists of a set of terminal symbols, a set of non-terminal symbols, a start symbol, and production rules, each annotated with a probability (the probabilities of all rules with the same left-hand side sum to 1).
Abbreviation meanings:
- NN --> common nouns
- NNS --> common nouns (plural)
- NNP --> proper nouns (singular), like names of people, places and things
- VB --> verbs, base form
- VBG --> verbs with -ing
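To make the "generate sentences from rules with probabilities" idea concrete, here's a toy sampler I sketched in Python (the grammar and probabilities are made up):

import random

# each nonterminal maps to a list of (expansion, probability) pairs
GRAMMAR = {
    'S':   [(['NP', 'VP'], 1.0)],
    'NP':  [(['the', 'NN'], 0.7), (['NNP'], 0.3)],
    'VP':  [(['VB'], 0.4), (['VB', 'NP'], 0.6)],
    'NN':  [(['dog'], 0.5), (['park'], 0.5)],
    'NNP': [(['Alice'], 1.0)],
    'VB':  [(['runs'], 0.5), (['sees'], 0.5)],
}

def generate(symbol='S'):
    if symbol not in GRAMMAR:  # terminal: just a word
        return [symbol]
    expansions = [e for e, _ in GRAMMAR[symbol]]
    weights = [p for _, p in GRAMMAR[symbol]]
    chosen = random.choices(expansions, weights=weights)[0]
    return [word for s in chosen for word in generate(s)]

print(' '.join(generate()))  # e.g. "the dog sees Alice"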
probabilistic context free grammar
[[Heuristic values]]
A* search algorithm:
The A* search algorithm is a popular search algorithm used in pathfinding and graph traversal. It combines the strengths of Dijkstra's algorithm (which guarantees a shortest path in a graph with non-negative edge weights by expanding nodes in order of path cost) and greedy best-first search (which is fast because it steers toward the target using a heuristic).
- Initialize an empty list of nodes to be explored, called the "open list"
- Initialize a closed list of already-explored nodes
- Set the initial node as the current node and add it to the open list
- While the open list is not empty:
  a. Select the node in the open list with the lowest f score (cost function)
  b. Remove it from the open list and add it to the closed list
  c. Generate its successors (neighboring nodes)
  d. For each successor:
     i. If it is not in either list, compute its f score and add it to the open list
     ii. If it is already in either list, check if using this path is a better route and update accordingly
- When all successors of the current node have been evaluated, pick the next lowest-cost node from the open list and repeat until the goal state is reached
def a_star(graph, start_node, end_node, heuristic):
    # create a set to store explored nodes
    explored = set()
    # create a set to store unexplored nodes
    unexplored = set()
    # create a dictionary to store the cost of getting to each node
    cost = {}
    # create a dictionary to store the best previous node for each node
    previous = {}
    # create a dictionary to store the estimated cost of getting to the end node from each node
    estimated_cost = {}
    # set the initial cost of getting to each node to infinity, since we don't know any better at the start
    for node in graph:
        cost[node] = float('inf')
    # set the initial estimated cost of getting to the end node from each node to the heuristic cost
    for node in graph:
        estimated_cost[node] = heuristic(node, end_node)
    # the start node costs nothing to reach
    cost[start_node] = 0
    # set the initial node to the start node and add it to the unexplored set
    unexplored.add(start_node)
    # loop until we either find the end node or there are no more unexplored nodes
    while len(unexplored) > 0:
        # find the node in the unexplored set with the lowest estimated cost
        lowest_cost = float('inf')
        lowest_cost_node = None
        for node in unexplored:
            if estimated_cost[node] < lowest_cost:
                lowest_cost = estimated_cost[node]
                lowest_cost_node = node
        # if we've found the end node, we're done
        if lowest_cost_node == end_node:
            break
        # move the current node from the unexplored set to the explored set
        unexplored.remove(lowest_cost_node)
        explored.add(lowest_cost_node)
        # update the cost of getting to each neighbor of the current node
        for neighbor in graph[lowest_cost_node]:
            # skip any neighbors that are already in the explored set
            if neighbor in explored:
                continue
            # calculate the cost of getting to this neighbor
            new_cost = cost[lowest_cost_node] + graph[lowest_cost_node][neighbor]
            # if the new cost is lower than the previous cost, update the cost and set the previous node for this neighbor
            if new_cost < cost[neighbor]:
                cost[neighbor] = new_cost
                previous[neighbor] = lowest_cost_node
                estimated_cost[neighbor] = new_cost + heuristic(neighbor, end_node)
            # if the neighbor is not in the unexplored set, add it
            if neighbor not in unexplored:
                unexplored.add(neighbor)
    # create an empty list to store the path
    path = []
    # set the current node to the end node
    current_node = end_node
    # loop until we get to the start node
    while current_node != start_node:
        # insert the current node at the start of the list
        path.insert(0, current_node)
        # set the current node to the previous node
        current_node = previous[current_node]
    # the start node itself completes the path
    path.insert(0, start_node)
    # return the path
    return path
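A quick usage sketch with a toy graph (made up for illustration); a zero heuristic makes A* behave exactly like Dijkstra:

graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'A': 1, 'C': 2, 'D': 5},
    'C': {'A': 4, 'B': 2, 'D': 1},
    'D': {'B': 5, 'C': 1},
}
print(a_star(graph, 'A', 'D', lambda n, goal: 0))  # ['A', 'B', 'C', 'D']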
Heuristic values
These are the values given to each node in relation to the goal. A simple example would be distance: if each city is a node, the heuristic value of a node can be the straight-line distance between this city node and the goal city node. This value is a measurement on top of the path values in a graph.
Admissibility
An admissible heuristic is a heuristic that never overestimates the true cost to the goal state. It estimates a number that is at or below the true cost.
Consistency
A consistent heuristic is a heuristic that "adds up" along edges: a node A's heuristic value is at most the path value between A and a neighbour B plus B's heuristic value, i.e. h(A) <= cost(A, B) + h(B).
A consistent heuristic function is also admissible. An admissible heuristic is not necessarily consistent.
Fibonacci in time complexity of O(n)
# bottom-up dynamic programming
def fib_dp(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
fib_list = [0, 1]
for i in range(2, n + 1):
fib_list.append(fib_list[i-1] + fib_list[i-2])
return fib_list[n]
Explanation:
First, the function takes in an integer n as an argument, which represents the position of the Fibonacci number we want to calculate.
The function starts by checking if n is equal to 0 or 1. If it is, it immediately returns 0 or 1, respectively. This is because the first two numbers in the Fibonacci sequence are 0 and 1, so we can just return them directly without calculating anything.
Otherwise, the function creates a list called fib_list and assigns the values [0, 1] to it. This list will be used to store the previously calculated Fibonacci numbers, so that we can use them to calculate the next number in the sequence.
Then, the function enters a for loop that starts at index 2 and goes until n+1. This is because the first two numbers in the sequence have already been added to fib_list.
Inside the for loop, the function appends the sum of the previous two numbers in fib_list to the list. It does this by using the indices i-1 and i-2 to access the two previous numbers, and then adding them together.
At the end of the for loop, the function returns the last element of fib_list, which is the Fibonacci number at position n.
This implementation uses a bottom-up dynamic programming approach, where we store the results of previously computed Fibonacci numbers in a list and use them to calculate the next number in the sequence. This avoids the need to re-compute the same numbers multiple times, which reduces the overall time complexity and makes it O(n).
Example with time complexity of O(2^n)
def fib_rec(n):
if n == 0:
return 0
elif n == 1:
return 1
else:
return fib_rec(n-1) + fib_rec(n-2)
This implementation has a time complexity of O(2^n) because each call to the function generates two new calls, one for n-1 and one for n-2. Therefore, the total number of calls grows exponentially with the input size. For example, to calculate the 30th Fibonacci number, we would need to make more than 2 billion function calls.
The problem with this implementation is that it recalculates the same Fibonacci numbers over and over again, leading to a lot of redundant computation. For example, to calculate fib_rec(4) we need to calculate fib_rec(3) and fib_rec(2), but to calculate fib_rec(3) we also need to calculate fib_rec(2), which is redundant.
This is why the bottom-up dynamic programming approach has a better time complexity. It avoids the redundant computation by storing the already calculated values in an array and uses them to calculate the next values in the sequence.
For fib_dp(30), the time it takes to run is 0.0000169000s on my machine.
For fib_rec(30), the time it takes to run is 0.2176796000s on my machine.
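For reference, a quick way to reproduce the comparison (assuming both functions above are defined in the same file; absolute numbers will differ per machine):

import timeit

print(timeit.timeit(lambda: fib_dp(30), number=1))   # ~1e-5 s
print(timeit.timeit(lambda: fib_rec(30), number=1))  # ~0.2 s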
Triangle without vertices
Here is the code:
void main()
{
float x = -1.0 + float((gl_VertexID & 1) << 2);
float y = -1.0 + float((gl_VertexID & 2) << 1);
gl_Position = vec4(x, y, 0, 1);
}
This code allows us to draw a triangle without providing specific vertex coords.
Basics on rendering
Shaders are programs users write to run on some stage of the GPU.
The two most vital shader components are the vertex and fragment shaders.
The vertex shader handles coordinates in the virtual space. The fragment shader handles the color, depth, and other stuff.
How does this program work
Overview
Through a method, we get the x and y values; we then put them together with the predetermined z and w values, which are set to 0 and 1, the standard for 2D graphics.
layer 1.
We get both x and y by adding the output of float(things in here) to -1.0.
layer 2
We get the output of x in float(things in here) by bit-shifting (gl_VertexID & 1) to the left by 2. This means that we multiply the number we get from (gl_VertexID & 1) by 4.
We get the output of y in float(things in here) by bit-shifting (gl_VertexID & 2) to the left by 1. This means that we multiply the number we get from (gl_VertexID & 2) by 2.
layer 3
gl_VertexID: this is a built-in variable in GLSL that provides the index of the currently processed vertex.
(gl_VertexID & 1) extracts the least significant bit (the bit on the far right of the binary number). This is because it ANDs the binary number of gl_VertexID with the binary number of 1, which is 0001.
So if gl_VertexID is 2, which is 0010 in binary, and we do an AND operation with 0001, we get 0000, which is 0, matching the least significant bit of 0010.
The same logic goes for (gl_VertexID & 2), but now we are extracting the second least significant bit.
So..
if this logic is run for gl_VertexID equal to 0, 1, and 2, we get the coordinates {-1, -1}, {3, -1}, and {-1, 3}, which form a triangle.
learning CRDT notes
what is CRDT:
CRDT stands for Conflict-free replicated Data Type, its a way to ensure concurrent data update between clients. They guaranteed convergence eventually as long as concurrent updates are commutative.
This is a topic in distributed systems.
CRDT address the problem of concurrency in distributed system. However it has limitations, such as the lack of consensus. It only address part of the problem since a lot of update operation is not communicative.
Types of CRDT:
- CvRDTs - Convergent replicated Data Types
- CmRDTs - Commutative replicated Data Types
State based replication
A replica receives an update from a client, and then some time later it sends its full state to the other replicas.
A replica receiving another's full state will merge its current state with the incoming state.
Every replica occasionally does the above actions, hence every update eventually reaches every replica in the system.
IF:
- the set of values of the state forms a semi-lattice (a partially ordered set with a join/least-upper-bound operation)
- updates are increasing
- the merge function computes the least upper bound
THEN the replicas are guaranteed to converge to the same value.
IF:
- the set of all possible states is a semi-lattice
THEN the merge operation has to be idempotent, associative, and commutative.
Idempotent - repeating the operation on an element produces the same result.
IF:
- a replica satisfies the above points
THEN:
- the replica is a CvRDT.
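The classic toy example of a CvRDT is a grow-only counter, where merge is an element-wise max, which is exactly an idempotent, associative, commutative least upper bound. A minimal Python sketch of my own:

class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.counts = [0] * n_replicas  # one slot per replica

    def increment(self):
        # updates are increasing: the state only ever grows
        self.counts[self.id] += 1

    def merge(self, other):
        # least upper bound: element-wise max
        self.counts = [max(a, b) for a, b in zip(self.counts, other.counts)]

    def value(self):
        return sum(self.counts)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3  # both replicas converge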
Operation based replication
Don't send the whole state; broadcast update operations to all replicas and expect each replica to replay them.
This may cause replicas to receive update operations in different orders, so the operations have to be commutative.
Resources followed
- https://www.farley.ai/posts/causal
- https://medium.com/@istanbul_techie/a-look-at-conflict-free-replicated-data-types-crdt-221a5f629e7e
- https://en.wikipedia.org/wiki/Semilattice
- https://www.youtube.com/watch?v=3UkC3sXLqhQ
- https://www.youtube.com/watch?v=LCFf2DBTVmo
- https://www.youtube.com/watch?v=KbyVjwmzlpk
- https://www.youtube.com/watch?v=XJQqDDTNvJA
Things I learned from the creator of Sumatra PDF
2023-10-24
Yesterday while browsing through Hacker News, I encountered this article written by the creator of Sumatra PDF, who summarized some of his insights from maintaining an open source application with very few individuals over a long period of time. Here are some of my takeaways.
- Sometimes good things come out of accidents. Sumatra was an accidental app, and an accidental success. It was not intentionally created, but was created at a convenient time.
What do you know: I had a simple PDF reader for Windows. I released it on my website. It couldn’t do much so I tagged it as version 0.1.
Getting early users, learning what features they want the most beats toiling for months or years and implementing lots of features before you know anyone even cares.
- Release frequently, but also release notable changes.
On one hand I don’t want to release too often but I also do want the users to get new features as quickly as possible. My policy of new releases is: release when there’s at least one notable, user-visible improvement.
- Documentation and presentation are important for everything, even open source software.
From day one I created a website for the app. It had screenshots, it had documentation, it was easy to download and install.
- Simplicity sells.
- Oftentimes your product differs from the rest by simple things.
Thinking outside of the box is hard because the box is invisible. SumatraPDF wasn’t the first PDF reader application ever written. But most PDF readers do not become multi-format readers.
- With the limited amount of energy you have, do a few things excellently instead of many things mediocrely.
- Overnight success takes decades.
Success often takes a long time. Unfortunately, at that stage it’s undistinguishable from (eventual) failure so this wisdom doesn’t help you if you’re working on a not-yet-successful project and debating if you should continue or abandon
- Maintaining an open source project for such a long time takes true passion, because open source does not make money.
Here is all the stuff I learned about webGPU
I wanted to explore the possibility of web-based graphics applications. So this folder is going to be about all the webGPU stuff I learned and tried out.
Instead of a recording approach, I'm going to employ a "plan and execute" approach to the knowledge on webGPU, where I plan out what I want to learn, and go explore those particular topic areas. This way, I can avoid side-tracking.
Then here are the notes for the webGPU study...
What is webGPU?
This is the first step of my learning. This post will focus on what webGPU is and how to get it in your browser.
What is webGPU
WebGPU is a new API for the web, which exposes modern hardware capabilities and allows rendering and computation operations on a GPU, similar to Direct3D 12, Metal, and Vulkan.
Why webGPU
WebGPU offers access to more advanced GPU features and provides first-class support for general computations on the GPU. The API is designed with the web platform in mind, featuring an idiomatic JavaScript API, integration with promises, support for importing videos, and a polished developer experience with great error messages.
webGPU abstraction
WebGPU is a very simple system. All it does is run 3 types of functions on the GPU: vertex shaders, fragment shaders, and compute shaders.
Running webGPU.
Here is the webGPU examples page I found.
To see webGPU working, you need either Chrome 113 or Edge 113. Go to the examples page and check out the simple renders. They should show up on the screen.
plan
need to read
- https://toji.dev/webgpu-best-practices/
- https://webgpufundamentals.org/webgpu/lessons/webgpu-fundamentals.html#a-drawing-triangles-to-textures
- https://google.github.io/tour-of-wgsl/
- https://wiki.nikiv.dev/computer-graphics/webgpu
- https://codelabs.developers.google.com/your-first-webgpu-app#0
- https://jack1232.github.io/webgpu00/
- https://www.freecodecamp.org/news/learn-webgpu-a-next-generation-graphics-api-for-the-web/
these are the tasks I need to do
- Learn how to set up a webGPU project
- render a triangle
- render Conway's game of life
keywords
vertex shader - Vertex shaders are functions that compute vertex positions for drawing triangles/lines/points
Fragment shader - Fragment shaders are functions that compute the color (or other data) for each pixel to be drawn/rasterized when drawing triangles/lines/points
This is where I upload my notes on leetcode
- longest-repeating-character-replacement
- longest-palindromic-substring
- two-sum
- find-minimum-in-rotated-sorted-array
- contains-duplicate
- k-largest-element-in-array
- longest-substring-without-repeating-characters
- best-time-to-buy-a-stock
- longest-consecutive-sequence
- valid-palindrome
- valid-anagram
- 3sum
- group-anagrams
- valid-parentheses
- max-point-on-a-line
- container-with-most-water
- top-k-frequent-elements
- task-scheduler
- count-vowel-substrings-of-a-string
- maximum-subarray
- product-of-array-except-self
Longest Repeating Character Replacement
class Solution(object):
def characterReplacement(self, s, k):
"""
:type s: str
:type k: int
:rtype: int
"""
count = {}
res = 0
l = 0
for r in range(len(s)):
count[s[r]] = 1 + count.get(s[r] , 0)
while (r - l + 1) - max(count.values()) > k:
count[s[l]] -= 1
l += 1
res = max(res, r-l+1)
return res
Longest Palindromic Substring
class Solution(object):
def longestPalindrome(self, s):
"""
:type s: str
:rtype: str
"""
res = ""
resLen = 0
for i in range(len(s)):
l, r = i, i
while l >= 0 and r < len(s) and s[l] == s[r]:
if (r - l + 1) > resLen:
res = s[l:r+1]
resLen = r - l + 1
l -= 1
r += 1
l, r = i, i + 1
while l >= 0 and r < len(s) and s[l] == s[r]:
if (r - l + 1) > resLen:
res = s[l:r+1]
resLen = r - l + 1
l -= 1
r += 1
return res
Two Sum
description easy
accepted(one bang)
class Solution(object):
def twoSum(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: List[int]
"""
for i in nums:
we_want = target - i
temp = nums[nums.index(i)+1:]
if we_want in temp:
index = temp.index(we_want) + nums.index(i)+1
return [nums.index(i), index]
faster solution using hashmap:
class Solution(object):
def twoSum(self, nums, target):
"""
:type nums: List[int]
:type target: int
:rtype: List[int]
"""
map = {}
for i,n in enumerate(nums):
diff = target - n
if diff in map:
return(map[diff], i)
map[n] = i
return
Like what this image portrayed, we keep a record of value:index in a hashmap. We iterate through the array, and once we find the second value, we are guaranteed to find the solution to the problem.
Find Minimum in Rotated Sorted Array
my solutions
class Solution:
def findMin(self, nums: List[int]) -> int:
res = min(nums)
return res
class Solution:
def findMin(self, nums: List[int]) -> int:
res = float("inf")
for i in range(len(nums)):
if nums[0] < res:
res = nums[0]
temp = nums[-1]
if temp > nums[0]:
return nums[0]
nums = nums[0:len(nums)-1]
nums.insert(0, temp)
# print(nums)
return int(res)
binary search solution
class Solution:
def findMin(self, nums: List[int]) -> int:
res = nums[0]
l, r = 0, len(nums) - 1
while l <= r:
if nums[l] < nums[r]:
res = min(res, nums[l])
            m = l + (r - l) // 2
res = min(res, nums[m])
if nums[m] >= nums[l]:
l = m + 1
else:
r = m - 1
return res
Contains Duplicate
easy description, accepted solution:
class Solution(object):
def containsDuplicate(self, nums):
"""
:type nums: List[int]
:rtype: bool
"""
unique = set()
for i in nums:
if i in unique:
return True
unique.add(i)
return False
failed solution:
class Solution(object):
def containsDuplicate(self, nums):
"""
:type nums: List[int]
:rtype: bool
"""
unique = []
for item in nums:
if item not in unique:
unique.append(item)
if len(unique) < len(nums):
return True
else:
return False
Why? A set is faster than a list: it can find an item directly (via hashing), without searching through everything. In the failed solution, the
if item not in unique:
check on a list does another linear search on every iteration, which slowed the program down a lot.
set and list are both data structures in Python, but they have some key differences:
- Duplicates: set automatically removes duplicates, while list allows duplicates.
- Order: list preserves the order of elements, while set does not guarantee the order of elements.
- Indexing: list supports indexing, while set does not.
- Mutability: both list and set are mutable (can be changed), but the methods available to change each are different.
- Membership testing: both list and set allow you to check if an item is in the collection, but sets provide a faster way of checking for membership due to the hash table implementation.
In general, you would use a list when you need to preserve order and potentially have duplicates, while you would use a set when you want to eliminate duplicates and do not care about the order of elements.
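A quick timing sketch makes the membership-testing difference visible (absolute numbers depend on your machine):

import timeit

nums = list(range(100_000))
as_list = nums
as_set = set(nums)

print(timeit.timeit(lambda: 99_999 in as_list, number=100))  # linear scan each time
print(timeit.timeit(lambda: 99_999 in as_set, number=100))   # hash lookup each time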
K largest element in an array
class Solution(object):
def findKthLargest(self, nums, k):
"""
:type nums: List[int]
:type k: int
:rtype: int
"""
k = len(nums) - k
def quickSelect(l, r):
#choose splitter
pivot, p = nums[r], l
# partition numbers
for i in range(l, r):
if nums[i] <= pivot:
nums[p], nums[i] = nums[i], nums[p]
p += 1
nums[p], nums[r] = nums[r], nums[p]
        # choose which side to do the recursion
if p > k:
return quickSelect(l, p-1)
elif p < k:
return quickSelect(p + 1, r)
else:
return nums[p]
return quickSelect(0, len(nums) - 1)
Longest Substring Without Repeating Characters
accepted answer:
class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        charSet = set()
        l = 0
        res = 0
        for r in range(len(s)):
            while s[r] in charSet:
                charSet.remove(s[l])
                l += 1
            charSet.add(s[r])
            res = max(res, r - l + 1)
        return res
Best Time to Buy and Sell Stock
accepted answer:
class Solution(object):
    def maxProfit(self, prices):
        """
        :type prices: List[int]
        :rtype: int
        """
        l, r = 0, 1
        res = 0
        while r < len(prices):
            if prices[l] < prices[r]:
                profit = prices[r] - prices[l]
                res = max(res, profit)
            else:
                l = r
            r += 1
        return res
Longest Consecutive Sequence
accepted answer (fifth try):
class Solution(object):
    def longestConsecutive(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        if len(nums) != 0:
            count = 1
        else:
            count = 0
            return count
        nums = sorted(nums)
        print(nums)
        temp = 1
        for i in range(len(nums)):
            print("temp", temp)
            if i != len(nums)-1:
                first = nums[i]
                second = nums[i+1]
                print(first, second)
                if second - first == 1:
                    temp += 1
                    if temp > count:
                        count = temp
                elif second - first == 0:
                    continue
                else:
                    print("here")
                    temp = 1
        return count
A good answer:
class Solution(object):
    def longestConsecutive(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        numSet = set(nums)
        longest = 0
        for n in nums:
            if (n - 1) not in numSet:  # n is the start of a sequence
                length = 0
                while (n + length) in numSet:
                    length += 1
                longest = max(length, longest)
        return longest
Valid Palindrome
accepted answer
class Solution(object):
    def isPalindrome(self, s):
        """
        :type s: str
        :rtype: bool
        """
        # process string
        s = s.lower()
        s = str(s)
        s = ''.join(filter(str.isalnum, s))
        # check
        new_s = ""
        for i in range(len(s)-1, -1, -1):
            new_s += s[i]
        if new_s == s:
            return True
        else:
            return False
another solution with better run time:
class Solution(object):
    def alnum(self, c):
        return (ord('A') <= ord(c) <= ord('Z') or ord('a') <= ord(c) <= ord('z') or ord('0') <= ord(c) <= ord('9'))

    def isPalindrome(self, s):
        """
        :type s: str
        :rtype: bool
        """
        l, r = 0, len(s) - 1
        while l < r:
            while l < r and not self.alnum(s[l]):
                l += 1
            while r > l and not self.alnum(s[r]):
                r -= 1
            if s[l].lower() != s[r].lower():
                return False
            l, r = l + 1, r - 1
        return True
Valid Anagram
easy
description; accepted (one try):
class Solution(object):
    def isAnagram(self, s, t):
        """
        :type s: str
        :type t: str
        :rtype: bool
        """
        if len(s) != len(t):
            return False
        letter = {}
        letter2 = {}
        for i in s:
            if i not in letter:
                letter[i] = 1
            else:
                letter[i] += 1
        for i in t:
            if i not in letter2:
                letter2[i] = 1
            else:
                letter2[i] += 1
        if letter != letter2:
            return False
        else:
            return True
3Sum
description
accepted answer
class Solution(object):
    def threeSum(self, nums):
        """
        :type nums: List[int]
        :rtype: List[List[int]]
        """
        res = []
        nums.sort()
        for i, a in enumerate(nums):
            # we don't want duplicates
            if i > 0 and a == nums[i - 1]:
                continue
            l, r = i+1, len(nums) - 1
            while l < r:
                threeSum = a + nums[l] + nums[r]
                if threeSum > 0:
                    # too big, move r left to get smaller
                    r -= 1
                elif threeSum < 0:
                    # too small, move l right to get bigger
                    l += 1
                else:
                    # found a triplet, record it
                    res.append([a, nums[l], nums[r]])
                    # move l right for the next iteration
                    l += 1
                    # keep moving l while it points at a duplicate
                    while nums[l] == nums[l - 1] and l < r:
                        l += 1
        return res
Group Anagrams
medium
submitted (failed: exceeded the time limit):
class Solution(object):
    def isAnagram(self, s, t):
        if len(s) != len(t):
            return False
        letter = {}
        letter2 = {}
        for i in s:
            if i not in letter:
                letter[i] = 1
            else:
                letter[i] += 1
        for i in t:
            if i not in letter2:
                letter2[i] = 1
            else:
                letter2[i] += 1
        if letter != letter2:
            return False
        else:
            return True

    def groupAnagrams(self, strs):
        """
        :type strs: List[str]
        :rtype: List[List[str]]
        """
        output = []
        searched = []
        for i in strs:
            i_index = strs.index(i)
            temp_strs = strs[i_index+1:]
            if i in searched:
                continue
            tempList = []
            tempList.append(i)
            for j in temp_strs:
                if self.isAnagram(i, j) == True:
                    tempList.append(j)
                    if j not in searched:
                        searched.append(j)
            print(tempList)
            output.append(tempList)
        return output
accepted answer:
from collections import defaultdict

class Solution(object):
    def groupAnagrams(self, strs):
        """
        :type strs: List[str]
        :rtype: List[List[str]]
        """
        res = defaultdict(list)
        for s in strs:
            count = [0] * 26
            for c in s:
                count[ord(c) - ord("a")] += 1
            res[tuple(count)].append(s)
        return res.values()
Explanation: the program counts the letters of each word and turns the counts into a key for the dictionary. Every word with the same letter counts produces the same key, so when the key generated from the next word matches, that word gets put into the same list, which is what gets output in the end. The dictionary will look like:

[key]: [list of words with the same key]
...

This has a runtime of O(number of strings * number of characters).

count[ord(c) - ord("a")] += 1

This line takes the ASCII number of the current character and subtracts the ASCII number of "a", which is the equivalent of getting this character's index in the alphabet.
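As a tiny illustration of the counting key (the words here are made up, not from the problem):

def count_key(word):
    # 26 zeros, one slot per letter of the alphabet
    count = [0] * 26
    for c in word:
        count[ord(c) - ord("a")] += 1
    return tuple(count)

print(count_key("eat") == count_key("tea"))  # True: anagrams share a key
print(count_key("eat") == count_key("bat"))  # False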
Valid Parentheses
class Solution(object):
    def isValid(self, s):
        """
        :type s: str
        :rtype: bool
        """
        stack = []
        closeToOpen = {")": "(", "]": "[", "}": "{"}
        for c in s:
            if c in closeToOpen:
                if stack and stack[-1] == closeToOpen[c]:
                    stack.pop()
                else:
                    return False
            else:
                stack.append(c)
        return True if not stack else False
Max Points on a Line
import math
import collections

class Solution(object):
    def maxPoints(self, points):
        """
        :type points: List[List[int]]
        :rtype: int
        """
        n = len(points)
        if n == 1:
            return 1
        result = 2
        for i in range(n):
            # group the other points by the angle they make with points[i]
            count = collections.defaultdict(int)
            for j in range(n):
                if j != i:
                    count[math.atan2(points[j][1] - points[i][1], points[j][0] - points[i][0])] += 1
            result = max(result, max(count.values()) + 1)
        return result
Container with most water
accepted answer:
class Solution(object):
    def maxArea(self, height):
        """
        :type height: List[int]
        :rtype: int
        """
        res = 0
        l, r = 0, len(height)-1
        while l < r:
            area = (r - l) * min(height[l], height[r])
            res = max(res, area)
            if height[l] < height[r]:
                l += 1
            elif height[r] < height[l]:
                r -= 1
            else:
                l += 1
        return res
Top K Frequent Elements
Accepted answer (third try):
class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        dic = {}
        for i in nums:
            dic[i] = 0
        for i in nums:
            dic[i] += 1
        sorted_dic = sorted(dic.items(), key=lambda x: x[1], reverse=True)
        output = []
        for i in range(k):
            output.append(sorted_dic[i][0])
        return output
An answer that supposedly has a better runtime of O(n), but actually ran slower than my code:
class Solution(object):
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        count = {}
        freq = [[] for i in range(len(nums) + 1)]
        for n in nums:
            count[n] = 1 + count.get(n, 0)  # if n is not in count, start from 0; otherwise add 1 to the old value
        for n, c in count.items():
            freq[c].append(n)  # a list where the index is the occurrence count, and the sublist holds the numbers with that many occurrences
        output = []
        for i in range(len(freq) - 1, 0, -1):  # go in descending order from length-1 down to 1
            for n in freq[i]:
                output.append(n)
                if len(output) == k:
                    return output
Here is a visual representation (sketched in the comments below):
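To make the bucket idea concrete, here is a small made-up example of what count and freq look like:

# For nums = [1, 1, 1, 2, 2, 3] (an illustrative input):
# count == {1: 3, 2: 2, 3: 1}
# freq  == [[], [3], [2], [1], [], [], []]
#                ^    ^    ^
# index:         1    2    3   ... index = occurrence count
# Walking freq from the back yields 1 first (it appeared 3 times),
# then 2, then 3, so the top-2 answer is [1, 2].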
Task Scheduler
import heapq
from typing import List
from collections import Counter, deque

class Solution:
    def leastInterval(self, tasks: List[str], n: int) -> int:
        count = Counter(tasks)
        maxHeap = [-cnt for cnt in count.values()]
        heapq.heapify(maxHeap)
        time = 0
        q = deque()
        while maxHeap or q:
            time += 1
            if maxHeap:
                cnt = 1 + heapq.heappop(maxHeap)  # processing one occurrence, so decrease the count by 1; here adding is subtracting, because we are treating a min-heap as a max-heap
                if cnt:
                    q.append([cnt, time + n])
            if q and q[0][1] == time:
                heapq.heappush(maxHeap, q.popleft()[0])
        return time
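The "min-heap as max-heap" trick in the comment above is worth spelling out; a minimal sketch:

import heapq

counts = [3, 1, 5]
max_heap = [-c for c in counts]      # store negated values
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)   # popping the smallest negated value gives the largest original
print(largest)  # 5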
Count Vowel Substrings of a String
class Solution {
    public int countVowelSubstrings(String word) {
        int vow = 0;
        int n = word.length();
        Set<Character> set = new HashSet<>();
        for (int i = 0; i < n - 4; i++) {
            set.clear();
            for (int j = i; j < n; j++) {
                char ch = word.charAt(j);
                if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') {
                    set.add(ch);
                    if (set.size() == 5) {
                        vow++;
                    }
                } else {
                    break;
                }
            }
        }
        return vow;
    }
}
Maximum Subarray
class Solution(object):
    def maxSubArray(self, nums):
        """
        :type nums: List[int]
        :rtype: int
        """
        maxArray = nums[0]
        currArray = 0
        for n in nums:
            if currArray < 0:
                currArray = 0  # a negative running sum can only hurt, so reset it
            currArray += n
            maxArray = max(maxArray, currArray)
        return maxArray
Product of Array Except Self
Medium, attempt 1 (exceeded the time limit):
class Solution(object):
    def productExceptSelf(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        dic = {}
        for i in nums:
            temp = nums[:]
            temp.remove(i)
            dic[i] = temp
        print(dic)
        output = []
        for i in nums:
            start = 1
            for j in dic[i]:
                start = start * j
            output.append(start)
        return output
accepted:
class Solution(object):
    def productExceptSelf(self, nums):
        """
        :type nums: List[int]
        :rtype: List[int]
        """
        output = [1] * (len(nums))
        prefix = 1
        for i in range(len(nums)):
            output[i] = prefix
            prefix *= nums[i]
        postfix = 1
        for i in range(len(nums) - 1, -1, -1):
            output[i] *= postfix
            postfix *= nums[i]
        return output
This basically means that we multiply every number before the current number and every number after the current number together.

Create a list named output and initialize it with 1s
create variable prefix = 1
for every index in nums:
    output[index] = prefix
    prefix multiplies itself with nums[index]
create variable postfix = 1
for every index in nums, backwards:
    output[index] multiplies itself with postfix
    postfix multiplies itself with nums[index]
return the output list
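A worked trace on a small made-up input helps see the two passes:

# Trace on nums = [1, 2, 3, 4] (illustrative input):
# prefix pass  (left to right): output == [1, 1, 2, 6]    # product of everything before i
# postfix pass (right to left): output == [24, 12, 8, 6]  # times product of everything after i
# Result: [24, 12, 8, 6], the product of the array except self at each index.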
List of errors I've encountered
- Having-X11-Wayland-on-wsl
- Trouble-shooting-Armoury-crate-Intergrated-and-deciated-GPU-switch
- ref.on-is-not-a-function
2023-03-25
During this spring break I've decided to learn Rust, and what better way to do it than writing small projects. One of them is a small defender game I want to write following this tutorial.
However, I encountered an error, which contains the following:
thread 'main' panicked at 'Failed to initialize any backend! Wayland status: XdgRuntimeDirNotSet X11 status: XOpenDisplayFailed'
This turns out to be a problem with the Windows Subsystem for Linux. To counter it, I found that there is a GitHub repo, wslg, dedicated to this problem.
This brings up another problem, which relates to the wsl version I was using. During previous usage, especially during web development, wsl2 showed really slow performance, so this whole time I had been using wsl1. However, wslg requires wsl2 in order to function, which means I'll have to find a workaround for that.
Fortunately, I did find a possible solution.
Troubleshooting Armoury Crate integrated and dedicated GPU switch
2022-06-29
I've been having trouble switching between my integrated and dedicated GPU using Armoury Crate. There is a process called "textinputhost.exe" that cannot be terminated, and it is therefore blocking Armoury Crate from executing the switch.
I've tried the solution proposed in this post, and it worked the last time, but this time it did not solve the problem.
I then unplugged my computer, force shut it down instead of using the default reboot (at this point the Touch Keyboard service is shut down) and restarted it after about two minutes. Immediately after the restart I opened Armoury Crate and switched to the integrated GPU mode, and it worked.
When I restarted the Touch Keyboard service and rebooted the computer, the fix still held. However, when I plugged the computer into power, textinputhost.exe popped back up.
I noticed that in Task Manager, a TabletInputService was using the Touch Keyboard service. So I went into settings and disabled it.
That did not work.
I then went into settings and forced textinputhost.exe to use the integrated GPU: but it did not work.
I finally went to the desktop, right-clicked into NVIDIA's graphics settings, and turned the preferred GPU to the integrated GPU. And it worked.
This article also helped.
#error #javascript #firebase The error means that it recognizes the ref object, but does not recognize the method .on after the ref.
code that caused this problem
blog.ref('blogs').on('value', (snapshot) => {
    const data = snapshot.val();
    for (let key in data) {
        blogData.push(data[key]);
    }
});
code that fixed the problem
let blogData_firebase = get(blogRef).then((snapshot) => {
    let blogData_temp = snapshot.val();
    return blogData_temp;
}).catch((error) => {
    console.error(error);
});

blogData_firebase.then((value) => {
    setBlogData(value);
})
better code
useEffect(() => {
    const db = getDatabase();
    let blogRef = ref(db, 'blogs/blogs');
    const offFunction = onValue(blogRef, (snapshot) => {
        const valueObj = snapshot.val();
        const objKeys = Object.keys(valueObj);
        const objArray = objKeys.map((keystring) => {
            const blogObj = valueObj[keystring];
            blogObj.key = keystring;
            return blogObj;
        });
        setBlogData(objArray);
    });
    function cleanup() {
        offFunction();
    }
    return cleanup;
}, []);
Cool tech blogs
This is a collection of some interesting tech blogs I found:
- uwe-arzt.de
- tw93
- table.dog
- evan.gg
- web.dev
- youngforest
- nvbn
- wattenberger
- taniarascia
- strml
- acko
- lynnandtonic
- jvns
- matthew.science
- mrmr
- fourmilab
- hacker news outliers
- danluu
- jeffhuang -> he's a fellow UW student! (PhD in information science)
- mary.codes
- ciechanow
Applying ML concept on my own data
2024-02-08
So I just got off class and want to fiddle around with the concepts I just learned: bag of words, term-document matrix, context vectors, and word embeddings.
So I trained a model using the text from here. Here is the code.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk
from sklearn.feature_extraction.text import CountVectorizer

d_Making_it_some_what_dynamic = """Making it some what dynamic..."""
d_Making_it_properly_dynamic = """Making it properly dynamic..."""
d_detection_script_debug = """What's new detection script debug and fix..."""
d_Website_server_migration = """Website server migration..."""
d_Creating_mdbook_summary_maker = """Creating mdbook-summary-maker - a summary.md automation tool..."""
d_creating_this_website = r"""Creating this website..."""

documents = [d_Creating_mdbook_summary_maker, d_creating_this_website, d_detection_script_debug, d_Making_it_properly_dynamic, d_Making_it_some_what_dynamic, d_Website_server_migration]

vrizer = CountVectorizer(stop_words="english")
vrizer.fit(documents)
X = vrizer.transform(documents)
print(X.shape)

dtm = pd.DataFrame(X.toarray(), columns=vrizer.get_feature_names_out())
dtm
(6, 909)
00 | 02 | 05 | 07 | 10 | 12 | 16 | 17 | 200 | 2020 | ... | wraped | write | writing | written | www | wwwroot | xxxx | year | yes | your_website_url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 3 | 0 | 3 | 2 | 0 | 1 | 1 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
6 rows × 909 columns
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim_matrix_for_antzed = cosine_similarity(dtm)
cosine_sim_matrix_for_antzed.shape

document_names = [
    "d_Creating_mdbook_summary_maker",
    "d_creating_this_website",
    "d_detection_script_debug",
    "d_Making_it_properly_dynamic",
    "d_Making_it_some_what_dynamic",
    "d_Website_server_migration"
]

# Initialize variables to store the maximum similarity and the document indices
max_similarity = 0
doc_index_1 = 0
doc_index_2 = 0

# Iterate over the matrix to find the pair with the highest similarity
for i in range(6):
    for j in range(i+1, 6):  # Ensure no self-comparison
        if cosine_sim_matrix_for_antzed[i, j] > max_similarity:
            max_similarity = cosine_sim_matrix_for_antzed[i, j]
            doc_index_1 = i
            doc_index_2 = j

# Print the document names with the highest similarity and their similarity score
print(f"The documents with the highest similarity are \"{document_names[doc_index_1]}\" and \"{document_names[doc_index_2]}\" with a similarity score of {max_similarity:.4f}.")
The documents with the highest similarity are "d_detection_script_debug" and "d_Making_it_some_what_dynamic" with a similarity score of 0.4253.
import gensim
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
from gensim.utils import simple_preprocess
from nltk.corpus import stopwords

nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('stopwords')

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()
[nltk_data] Downloading package wordnet to /home/jovyan/nltk_data...
[nltk_data] Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /home/jovyan/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /home/jovyan/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import preprocess_string

def get_wordnet_pos(word):
    """Map NLTK's part of speech tags to wordnet's"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def preprocess_text(text):
    tokens = simple_preprocess(text, deacc=True)  # Tokenize into words
    tokens_no_stops = [token for token in tokens if token not in stop_words]
    lemmatized_tokens = [lemmatizer.lemmatize(token, get_wordnet_pos(token)) for token in tokens_no_stops]
    return lemmatized_tokens
# Tokenize and preprocess all documents
processed_sentences = [preprocess_text(doc) for doc in documents]

# Flatten the list if your model expects a list of sentences as input
flat_list_of_words = [word for sublist in processed_sentences for word in sublist]

print(flat_list_of_words[:5])
['create', 'mdbook', 'summary', 'maker', 'summary']
from gensim.models import Word2Vec

# Now that we have our sentences preprocessed and lemmatized, we train the model
model = Word2Vec([flat_list_of_words], vector_size=100, window=5, min_count=1, workers=4)

# Save the model for later use
model.save("lemmatized_word2vec_model.model")

model.wv.most_similar('rust', topn=10)
[('ssh', 0.30321794748306274),
('target', 0.28839021921157837),
('enviroment', 0.28243470191955566),
('examine', 0.2718653678894043),
('succeful', 0.26447442173957825),
('touppercase', 0.2616710662841797),
('assist', 0.2575630247592926),
('true', 0.25681009888648987),
('familiar', 0.25594884157180786),
('doen', 0.2559070289134979)]
model.wv.most_similar('dynamic', topn=10)
[('top', 0.25962546467781067),
('variable', 0.2449072003364563),
('semi', 0.23804989457130432),
('later', 0.23783016204833984),
('could', 0.23667342960834503),
('lastindexof', 0.22947242856025696),
('indeed', 0.22467540204524994),
('easy', 0.22087810933589935),
('detection', 0.21729730069637299),
('directory', 0.21623274683952332)]
model.wv.most_similar('website', topn=10)
[('avaiable', 0.3144473731517792),
('fail', 0.3111109435558319),
('probably', 0.3086855113506317),
('wm', 0.306361585855484),
('host', 0.28776368498802185),
('save', 0.2712177634239197),
('backup', 0.26919832825660706),
('code', 0.2683839201927185),
('folder', 0.2653118073940277),
('etc', 0.2603006362915039)]
model.wv.most_similar('antzed', topn=10)
[('in_file_path', 0.3186494708061218),
('template', 0.30744123458862305),
('relationship', 0.29583120346069336),
('port', 0.295200377702713),
('filename', 0.2635174095630646),
('allows', 0.24324464797973633),
('preprocesor', 0.2392539530992508),
('item', 0.2337876856327057),
('treat', 0.22798201441764832),
('malfunction', 0.22339922189712524)]
model.wv.most_similar('problem', topn=10)
[('empty', 0.25634294748306274),
('us', 0.24955253303050995),
('template', 0.2491423785686493),
('junk', 0.24703580141067505),
('filter', 0.2332962602376938),
('bash', 0.23113256692886353),
('enter', 0.22585271298885345),
('sign', 0.2233218401670456),
('node', 0.21544311940670013),
('website', 0.21240629255771637)]
We can see that the detection script debugging post has the highest similarity to the "making it somewhat dynamic" post, which was surprising.
We can also see that some of the words, such as 'rust', are used alongside 'ssh'. Although it seems like there isn't a lot of close context between words in my blogs.
Financial blog
Quantitative trading
This is my note on all the things I learned while reading the book Quantitative Trading by Ernest Chan.
What is Quantitative trading
Trading of stocks through algorithmic strategies, automated by programming.
Basic
how to find the right strategy for me
work hours
Automation can turn a seemingly full-time endeavour into a part-time one.
programming skill
You need some programming experience to trade at high frequency. Otherwise, stick to once-a-day trades.
Trading capital
Preferably $50,000+; $100,000+ is considered high capital. Capital also determines how you are going to trade.
Benchmark returns are the returns of the market index that the securities you trade belong to.
This is the Opinion section
...and this is where I share my opinions on things.
- a-reading-system-for-social-media-addicts
- today-i-learned/
- the-strange-tale-behind-my-phone-number
- a-curated-list
- MHW-log
- yearly-summary/
- personal-management/
- 随笔/
A reading system for a social media addict
2023-04-10
I am but two things: a dedicated student who wants to read everything he can get his hands on, and a social media junkie.
This post will lay out how I used my newly gained interest in self-hosting to help me read more as a social media addict who can't stop scrolling on his phone.
Table of Contents
The idea
Scrolling. Scrolling is the key to getting me off social media and into reading books. Since social media has made me develop the habit of scrolling non-stop for hours, a corresponding system where books can be read through non-stop scrolling can be used to displace social media. It's like a nicotine addict using vaping as a substitute for cigarettes.
The system
The main component of this system is Kavita, an open source library service.
On top of that you also need a place to get books (my favorite is libgen) and a tool to put the books in.
Depending on where you host your Kavita, the method of putting books into your virtual library will be different. Since I wanted to avoid the hassle of renting a server of my own, I used PikaPods, a great platform for hosting a variety of open source services.
If you're hosting it on your own machine, you will need to download the installer from their website, and you should then be able to access a version of Kavita on your local machine at http://localhost:5000.
If you decide to use PikaPods, you can first try it out by registering a free account. They'll gift you a 5 dollar coupon upon registration. Just search for Kavita on this page, click the "run your own" button, then follow the instructions to create a new pod. The default server configuration should be enough for Kavita's needs.
PikaPods also allows SFTP file transfer. So get to a terminal on your machine, or download an FTP app on your phone, to log in to the server. Your credentials are specified in the settings of the server under Files
Epub, never pdf
This is a key point in this system. Always prioritize Epub over pdf, because Epub allows the text to be reformatted to one chapter per page, so you can scroll as much as you want while reading. Pdf will have too many pages. Too many pages means too many page breaks, which means too many distractions.
PDFs are also large, which can be expensive to store. Epub files usually come in smaller sizes.
The flow
- Get hold of the name of a book
- Search on sites like Libgen, and download the book (preferably epub; if not, pdf)
- Upload it to the server through various means
- Scan the library through Kavita to pick up the new books
Some things to note
Here are the file structure guides and naming conventions
Today I learned..
2023/05/25
Today I learned that if you put a space in front of a command, it will not be recorded in shell history (for most shells, that is; in bash this behavior comes from the HISTCONTROL=ignorespace setting).
Today I learned: 2023-1-8
..that you can tell if an egg has gone bad by placing it in a bowl of cold tap water. If the egg sinks to the bottom and lies on its side, it's good. If it floats in any shape or form, it's bad (because gas has formed inside it).
The strange tale behind my phone number
2023-05-04
Table of Contents
The start
I got my phone number back in Boston along with my dad at an AT&T store. It was during a summer trip to Boston for vacation and college visits. We went to this small AT&T store with lots of round tables and got myself a phone number and a data plan.
What's going on with my phone number?
Since last year, I've been constantly receiving texts from various people asking to sell my "house". Of course, it is not me who owns the house, but a fellow named John. I always ignored the texts, until they became somewhat unbearable. So I decided to look into it for some fun.
Findings so far
The first thing to look into is the house:
After a few quick searches, I discovered that the house is a 3 bedroom, 3 bathroom, 2300 sqft condo that was built in 1910. The house was sold 3 times: once in 1987, once in 1989 and once in 2001.
Also according to some searches, the house is undergoing the process of foreclosure.
Interestingly, it was planned to appear in an online auction recently, but that was cancelled. I wonder if that has anything to do with me not answering their texts.
The second thing to look into is the person, John:
Apparently, he, along with Madeline (I'm assuming that's his wife), owns that condo. They are both in their 50s, and currently live somewhere else in Boston. They have three companies under their name, with a focus on servicing limos.
Some other lore
My phone number also seems to have been used by some Chinese doctor named Doctor Zhou. So far, not a lot of texts about that have been sent my way, so I'm going to come back later if more come in.
A curated list of my writings
Updated on 2024-02-18:
- About how I manage my life
- Creating and maintaining this website
- Some interesting blogs that I follow
- A closer look at Quake 3's fast inverse square root algorithm
- Some interesting findings about my writings through the lens of machine learning
MHW weapon builds and tips for fighting the various monsters
2020-07-16
Here I record my experience playing MHW. I currently have about 100 hours of playtime; the main story is fully cleared, and my MR and HR are both around 50. Right now I mainly play the hammer, though I may explore the light bowgun more deeply later.
Hammer
Hammer builds
I don't have a complete hammer build yet. After many discussions with veteran players, I received the following suggestions:
Hammer build 1:
Hammer build 2:
Both builds use the helm beta and chest armor beta crafted from Raging Brachydios. Where the two diverge is in the arms, waist and legs.
Build 1 goes the route of a three-piece Alatreon set, but since my hammer is the Safi'jiiva blast hammer, which has no elemental attribute, fighting Alatreon is extremely painful, so this set is currently out of reach.
Build 2 goes the route of a three-piece Master Rank Nergigante set. I can't realize this build yet either, but that's mainly because Master Rank Nergigante only becomes available at MR 100. Compared to build 1, build 2 is easier to achieve.
The decorations for both builds are about the same: the main skills to max out are Attack Boost, Critical Eye and Agitator, while Earplugs can be raised with an Earplugs charm and similar items.
Hammer technique
I play with a controller, so PC players will need to find the corresponding keys.
The hammer is a relatively simple weapon and doesn't demand difficult execution; your main target should be the monster's head. A common combo is:
RT + B, charge to level three, then the standing triple pound (run up to the monster and stop + release the left stick + release RT) + LT clutch claw onto the monster's head + Y
This combo not only deals damage, it also softens the monster's hide and builds up its stun value.
Combos that include the upswing:
Y + Y + Y + Y + Y
RT + B, charge to level three, into the five-hit spinning attack (run up to the monster without releasing the left stick + release RT) + Y
RT + B, charge two levels
All of these can nicely hit the monster's chin from below, building up its stun value.
Yearly summaries
2023-2024 yearly summary
2024-06-16
Hi, today I turn 22! Many things happened this year, and I think this was one of the most important years of my life so far.
My graduation ceremony, by my mom
School and work
First of all, I finally graduated from college! It was an interesting experience. Covid cut my in-person college time short by one year, so it feels rather quick that all of a sudden I'm exiting college.
It is also the first time that I've experienced a LARGE graduation ceremony. There were 7000 graduating students this year, and they spent almost 30 minutes on giving certificates to the PhD students alone.
However, graduating does not mean that my academic journey is over, because for the next 1-2 years I will be studying for a master of computer science degree at Northwestern University. I will be going from one purple school to another.
Looking back at the past year, I'd say that my senior year is when I experienced the most growth. I wanted to be able to eliminate the things I don't know I don't know, so I started attending colloquiums and seminars to learn about academic research on topics such as robotics, AI, international policy and marine biology.
I also completed my graduating capstone project. It was on the topic of AR/AI in healthcare, and it taught me many valuable lessons about teamwork. The capstone ended okay-ish. I guess the biggest lesson I learned from this project is to know when to have meetings and when not to. As a developer, meetings should be allotted to co-programming and solving bugs. As a manager, meetings should be made for coordinating, future planning and setting goals. An inefficient use of a meeting is to repeatedly do the sales pitch to people who are already on board with the idea. Wasting your workers' time is a good way of making them leave.
Finally, some interesting courses I've taken are:
- INFO 371 advance data science methods with ott
- INFO 443 Software architecture with Alexander Bell-Towne
- INFO 351 Information ethics and policy Katherine Cross
- CLAS 101 Current use of greek and latin in English
- CLAS 205 bioscientific vocabulary building from latin and greek
- ACCTG 205 Essentials in accounting
Life and relationship
My relationship with Aileen is still going strong. Obviously my graduation adds a lot of complications to the equation, but no biggie. We will strive through.
I also got my first car, a used Toyota Venza. Since I just got it, I haven't named it. However, I'm sure it will be my loyal war-rig for the next phase of my life.
Personal achievement
This year wasn't the most technical year for me due to completing graduation requirements and grad school applications. However, I managed to:
- Build and test simple multi-layer perceptrons and CNNs using TensorFlow.
- Learn about software design patterns, and analyze and refactor my previous website codebase.
- Build uni, my personal automation CLI, using Rust.
Beyond the technical, I've also managed to learn a swath of personal management tools and techniques. This proved to be a wonderful improvement to my life. So much so that I've managed to be on the dean's list all throughout this school year.
Goal for next year
- Successfully find an internship in either Seattle or California.
- Take courses in distributed systems and deep learning at a graduate level.
- Decide which track I will pursue to finish my graduate degree at Northwestern, be it the project track or the research paper track.
- Continue my relationship with Aileen
- Keep swimming and exercising
- Eat healthy, drink more water and sleep no later than 12am.
So that's about it. See you next year!
2022-2023 yearly summary
2023-06-16
Hi, we meet again. Today I turned 21, so it is time for another yearly summary. What this also means is that I can finally procure alcohol legally. So I'm currently writing this summary after downing two-thirds of a bottle of ice wine.
Mount Rainier, by me
What have I achieved
School & work
- I've finally gotten into the informatics major
- I've taken several coding and CS classes that cover topics like full stack development, AI, algorithms, and data science.
- I've gotten myself a summer internship here in Seattle.
Relationship
- My relationship with Aileen is still going, and it's going quite well
- I've made several new friends from group projects in class and other activities such as a Mario Kart competition.
Technical achievements
- I've learned how to do full stack development.
- I've gotten in touch with all of the basic CS algorithms that pop up in SWE interviews
- I've largely automated antzed.com; this includes writing two preprocessors for the website in Rust.
- I've learned the basic concepts of AI
- I've learned the basics of data science using Python
- I've learned and incorporated tools like ChatGPT into daily coding.
- I've established a personal twitter using memos and a personal library using Kavita
- I've lightly contributed to several repositories on GitHub.
Goals for next year
- apply and get into a good grad school
- read more on the variety of other programming jobs besides SWE, such as ML, quant and game engine dev.
- continue my relationship with Aileen
- drive around more
- write more on various things, now that the process is heavily automated.
- exercise more and drink more water
- find my passion
Last thoughts
This year was an interesting year, because I finally got used to life in the US. However, my current lifestyle is only kept alive by my parents' "generous donation", and will probably crumble after I graduate, so we will see what happens.
I've also gotten used to the idea of working, and finally started to enjoy the process of bettering myself. My craving for laziness has gradually diminished, replaced by my curiosity, which is good.
I hope I will become a man who does great things.
See you next year!
2021 – 2022 yearly summary
2022-06-15
Tomorrow is my birthday so I want to do a summary of the past year. I moved to Seattle last year during September to study at the UW, and this was the first time that I’ve left my home and my parents for more than 3 months. Unfortunately, Covid took a year of college experience away from me, but after I finally got here, the experience was better than I’d expected.
The Drumheller fountain, by me.
What have I achieved
Important things I've achieved
- I've taken and finished many courses during the three quarters of school
- I've gotten myself into a relationship
- I've made new friends in college
- I've managed to independently survive longer than 3 months
Technical achievements
- I've learned to program in SQL and created two databases as class projects
- I've learned to program in Go, and successfully created a CLI.
- I've dabbled in bash scripts, Ruby and Lua in an attempt to create the CLI before using Go
- Planned and ran Husky expo
- Brought back this blog website
Places I’ve visited
- Seattle (Space needle and its surroundings)
- Leavenworth
- Snoqualmie
Notable films and shows watched
- Arcane season 1
- Station eleven season 1
- Love Death and Robot season 3
- Demon Slayer season 2
- Made in abyss season 1
Challenges
The most difficult part of this year was my adaptation to college life. I thought at the beginning of the year that I was fully capable of handling myself on my own, yet the actual experience was not as smooth. Burnouts were common throughout the year, but I believe through practice I can better manage my time and energy.
It was also quite difficult to find energy to explore beyond my academic life, which I believe is caused by both the loss of comfort(of being at home) and my less than optimal sleeping and exercising routine.
What to look forward to next year
- More technical work done (projects, internships)
- More blogs posted (target: 10 articles)
- Exercise regularly (at least two to three times a week)
- Have a regular sleeping schedule.
So, see you in 2023
Personal management
A fight with todo list
2023-02-15
As a student, I'm always in a sea of tasks, and throughout the years I've tried many methods of managing them. Unfortunately, none of them is perfect. Some of them worked for a quarter or two, but most failed. I often find myself unmotivated to continue to create, view and check off tasks after a certain "hyped" period.
In my opinion, there are two main reasons why a task managing system will fail:
- It is too hard to interact with
- It has no priority management
Let's talk about each point:
During a previous iteration of my task management system, I used miro, and created something like this:
I attempted to create a todo list like a mindmap, where I link tasks to each other and to subtasks. The idea being that tasks are easier to manage when the system works more closely to the human mind.
After using this method for half of autumn quarter 2022, I concluded the following:
- The mind map method does help with understanding the nature of my tasks. It helps me decide if the task I'm looking at is worth my time.
- The mind map is really hard to maintain. I cannot quickly enter a task when I need to; I have to find "space" and organize the sticky notes on the boards.
Currently I've been testing out another method of mine, which involves the Obsidian notebook, its Tasks and Day Planner plugins, and a todo script.
I separated my tasks into three different categories:
- Homework
- School
- Personal

and I keep track of them in a Task.md file, in which I have the following query code:
- A due-today section:
(happens today) OR (happens tomorrow)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal )
NOT (done)
sort by due
Explanation of this Tasks code block query:
(happens today) OR (happens tomorrow) => OR (At least one of): due, start or scheduled date is on 2023-02-15 (Wednesday 15th February 2023) due, start or scheduled date is on 2023-02-16 (Thursday 16th February 2023)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal ) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
- A happened-before, happening-today and due-tomorrow section:
(happens today) OR (due tomorrow) OR (happens before today)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal)
NOT (done)
group by tags reverse
sort by due
Explanation of this Tasks code block query:
(happens today) OR (due tomorrow) OR (happens before today) => OR (At least one of): due, start or scheduled date is on 2023-02-15 (Wednesday 15th February 2023) due date is on 2023-02-16 (Thursday 16th February 2023) due, start or scheduled date is before 2023-02-15 (Wednesday 15th February 2023)
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
- and an all-tasks section:
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal)
sort by due date
group by tags
NOT (done)
Explanation of this Tasks code block query:
(tag includes #Homework) OR (tag includes #School ) OR (tag includes #Personal) => OR (At least one of): tag includes #Homework tag includes #School tag includes #Personal
NOT (done) => NOT: done
However, after 6 weeks of use, I noticed that this method does not handle priority well. Because I've made it so easy to create a task, I often put down tasks that are not well thought out.
For example:
As you can see here, I want to do several readings that are related to programming. But as of the time of writing, none of these tasks has been done, and they are well past their due dates.
The first thing that is wrong with these tasks is their wording. They use vague phrases like "skim through" and "read". These make me feel unmotivated, as the instructions are unclear.
Another thing is that these readings are important to me, so when I created them I specified deadlines. However, because they do not have immediate urgency the way homework does, I often place them under tasks that have a Homework or School tag, and then subsequently forget about them.
Plan for improvement
So what is my plan for improvement?
- sections prioritizing urgency -> sections prioritizing importance (following the Eisenhower Matrix)
I will focus on only managing important tasks and avoid unimportant tasks (especially when they are not urgent)
- use general action words -> use specific, achievable objective nouns.
Using general action words makes me not want to do the task, so I have to write tangible descriptions that tell me exactly what it is that I need to do.
Using GitHub Projects to manage my life - what I've learned
Originally written on 2023-11-06, rewritten on 2024-01-31
This post is a continuation of my endeavor to find a good way to manage my life. In this previous blog, I tried to build a text-based markdown-style todo system, which somewhat worked for an academic quarter. However, the previous attempt quickly became very hard to use. Obsidian isn't the best tool for management. The best it can do without increasing operational complexity is creating simple lists, which soon become hard to manage if you have many such lists.
If you want to know what my current method of management is, go to Iteration 2: The Current Form. If you want to know the process of developing my methods, start with Initiation
Table of content:
Initiation
During my internship this summer, I managed to learn a thing or two from the daily standup. Hiya chose to use Kanban boards, and they seemed quite effective at first glance. So, I decided to try them out for to-do management to see if things would work out.
While consulting with ChatGPT on this topic, I was excited to discover that GitHub itself has a project feature that includes Kanban boards and a whole suite of other project management tools. As an avid user of GitHub, I had to try it out. I spent my autumn quarter of 2023 on iteration 1 and refined my process during iteration 2, in the winter quarter of 2024.
Iteration 1: A rudimentary attempt
The basics
My first attempt involved carving out the basics. Following GPT's suggestion, issues were the obvious choice for the basic unit of tasks. Since GitHub Projects are closely knit with repositories, I borrowed the idea I had during Hiya's hackathon to host things (issues) in a dedicated repository. Since I'm not good at naming things, I will just call it:
Same with the project:
I've also quoted one of my favorite sentences from Krzysztof Kowalczyk's "Lessons learned from 15 years of SumatraPDF, an open source Windows app"
Here is an example of my issue/todo:
Due to some automation reasons later on, I decided to limit the issue description to text only. Markdown to-dos were to be put in the first comment of the issue. Any logging text would then be added in comments, starting with the second comment. This acts like a commit; each time I make some change to this to-do, I log it. The logging ends after I close my issue. I hoped to use the logging as a motivational tool.
On top of these mechanisms, I also repurposed the milestone section as a way of categorizing these issues in the project view. A milestone can be a goal, as originally intended, or it can simply be a category, like a class I'm taking this quarter or a book list I'm going through.
The project interface
In the project interface, issues can have four possible statuses: Backlog, In Progress, On Hold, and Done. An issue starts in the Backlog and then moves back and forth between In Progress and On Hold until it ends up in Done, operating just like a regular Kanban board.
Issues are organized by iteration using the iteration template. Each week is one iteration. The iteration divides up the multitude of tasks I have and provides me with a sense of 'this is what I'm doing this week,' which, in turn, helps me to stay focused on my tasks.
Here's an example of what I see most of the time:
There are a total of six views in my project: Current Iteration, Current Iteration: Done, Next Iteration, Planning, Road Map, and Task Pile.
When going through my workflow, I first select a category of tasks I'm interested in reviewing, as shown in the previous image. The tasks then flow from left to right. Once I finish a task, I go into the issue and close it. This triggers the default automation of moving the issue to the 'Done' status upon closure.
example of issue in project view
the logic of closing issue automation
Similar automation is also applied to issue labeling. For instance, when an issue is edited, a last_worked_on label will automatically be added, and the previously labeled last_worked_on issue will have its label removed.
Another example is the has_todos label: if an issue has unchecked todos like so
- [ ] an example todo
a label of "has_todos" will be added.
This labeling automation is all done in GitHub Actions using yml files and various scripts.
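I won't reproduce the actual workflow here, but as a rough sketch of the idea, a script along these lines can attach a label through GitHub's REST API (the repo name, issue number, and token wiring below are placeholders, not my real setup):

# Hypothetical sketch: add a label to an issue via the GitHub REST API.
# Assumes a GITHUB_TOKEN with repo access and an issue number taken from the event payload.
import os
import requests

repo = "someuser/todo-repo"   # placeholder repository
issue_number = 42             # placeholder issue number
token = os.environ["GITHUB_TOKEN"]

resp = requests.post(
    f"https://api.github.com/repos/{repo}/issues/{issue_number}/labels",
    headers={"Authorization": f"Bearer {token}",
             "Accept": "application/vnd.github+json"},
    json={"labels": ["last_worked_on"]},
)
resp.raise_for_status()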
Iteration 1 was going well until it encountered a problem similar to the one with the Obsidian method: priority ordering. My tasks were still prioritized based on urgency, and I was left feeling unfulfilled. I constantly felt like I was doing busy work. Moreover, GitHub decided to add timeline items into the GitHub Project's issue view, which practically ruined my workflow. My automation, along with the rule '1st comment for to-do and 2nd+ comments for logging', no longer worked. So, these changes jump-started iteration 2.
Iteration 2: The Current Form
For iteration 2, I decided to go back to the Eisenhower matrix that I mentioned here. I consulted this article to learn about the difference between urgency and importance, and came up with the following ideas:
The priority matrix-list
Here is the priority list I made stemming from the Eisenhower matrix:
As you can see, I did not use the classic 'Do, Schedule, Delegate, and Delete.' This is because the original wording does not make sense in a 'white-collar job' scenario. The Eisenhower Matrix was created against a background of military leadership, which assumes that 1) you don't have time to handle all the tasks at hand, and 2) you have people working under you. Neither assumption applies to me.
Instead, what I did was:
- Priority 1 means I have to finish this task today or tomorrow
- Priority 2 means I don't have to finish it within two days, and I have to:
  - plan it by dividing the issue into sub-issue tasks, or create a small todo list of sub-tasks within the issue
  - complete one sub-issue/sub-task
- Priority 3 means to either spend as little time on it as possible to complete it, or let ChatGPT do most of the work.
- Priority 4 means this task is not worth my time (not in the present nor in the future)
And here is an example application of that in a real task:
Additional views
Additionally, I've also added a few extra views to my GitHub project:
- Task Importance Analysis: This is a straightforward view where I sort all my tasks based on long-term importance and due dates, assigning the correct priorities to them.
- Today: This view shows the tasks I manually assign to myself based on the priorities. It provides an additional division of labor, allowing me to focus solely on the tasks I need to finish today.
- Due Today: This view allows me to see if I've missed any tasks that are due today. I check this periodically.
- Task Shelf: This holds all the tasks that I've assigned priority 4-leave-it to. I check them periodically to decide if I should restart any.
- Current Iteration: Done: This view shows all the completed tasks. At the end of each week, I perform a review, asking myself if the completed tasks were genuinely important, urgent, and how many hours I've spent on them.
Future improvement
This method is going pretty well so far. The only caveat is that sometimes my actual workload is the same between a priority 1 task and a priority 3 task. So, in the future, I will work on automation for the common tasks that fall into priority 3.
随笔 (casual essays)
Sometimes when I don't know what to write, I'll jot a couple of lines here; it's also a good chance to brush up on my Chinese writing.
May 15th, past midnight, 2023
Heat
After all these years of talk about global warming, today I finally felt it. Seattle, as a coastal city, is the bird that sticks its neck out and gets shot first.
When I first arrived here last year, I kept hearing people say "Seattle is having record heat this year", but when I checked, the temperature wasn't even 30 degrees. Coming from Beijing, I scoffed at that number and wasn't afraid at all. Then the summer actually arrived, and the muggy heat made my head spin.
Seattle has historically been a cool city, so most buildings don't come with air conditioning. When I first heard this, I immediately thought of how southern China doesn't turn on heating in the winter. And just like that peculiar arrangement in China, an air-conditioning-less Seattle under a changing climate leaves people with heat-addled heads and fans sold out everywhere.
Songs
Today's song was TrackTribe's Last Train North; no lyrics, just electric guitar. Listening to it while feeling the small breeze of a second-hand fan is quite cozy.
Chess
Recently I picked up chess again. Back when I finished Netflix's The Queen's Gambit, I was deeply inspired (though that happens with every show I watch), registered a chess.com account, and diligently played my initial 1200 rating down to 400. These past few days I've squeezed in a couple of games and found that I'm quite a bit better than before, winning more than I lose. After practicing some puzzles, I can occasionally force the opponent into checkmate. I hope this record can be maintained.
Sunday
On Sunday I went with my girlfriend to see the third Guardians of the Galaxy. Although it is still a Marvel movie, you can tell the director put real effort into it; the camera work and pacing are well handled. Even the formulaic good-guys-beat-bad-guys plot manages to feel fresh. This movie also doesn't seem to care whether it should be PG-13: the occasional swear word, plus green-blooded gory fights, makes it satisfying for adults too.
Creation section
...this is where I post all my creations, big or small.
Some websites I made
The following are all the visualizations created by me.
Temperature in 2014
Here is the link.
TODO: Write about it
Automations
This is where I record my automation code
Building a reddit bot
I built this following this tutorial.
Learning | Making my own
Learning
Fetch post from reddit
Creating reddit app
- You can’t make more than 1 request every 2 seconds (or 30 a minute)
- You must not lie about your user agent
Go to this page and click on the "create your own app" button:
and create a reddit app. From there you will get a client_id (red square) and a secret (blue square).
Then you should go to the praw.ini file in the praw package folder and add a new section for your bot, putting in the client_id and secret, along with a user_agent field specifying the version.
The user_agent is there because reddit uses it to identify your bot and prevent abuse; update the version number if somebody is abusing your bot.
fetch code
import praw

reddit = praw.Reddit('bot1')
subreddit = reddit.subreddit("learnpython")
for submission in subreddit.hot(limit=5):
    print("Title: ", submission.title)
    print("Text: ", submission.selftext)
    print("Score: ", submission.score)
    print("---------------------------------\n")
Reply
import praw
import pdb
import re
import os

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="Py_gloomy_repli_bot 0.2"
)

if not os.path.isfile("posts_replied_to.txt"):
    posts_replied_to = []
else:
    with open("posts_replied_to.txt", "r") as f:
        # Read the file into a list and remove any empty values
        posts_replied_to = f.read()
        posts_replied_to = posts_replied_to.split("\n")
        posts_replied_to = list(filter(None, posts_replied_to))

# Get the top 20 submissions from the subreddit pythonforengineers
subreddit = reddit.subreddit('pythonforengineers')
for submission in subreddit.hot(limit=20):
    if submission.id not in posts_replied_to:
        if re.search("i love python", submission.title, re.IGNORECASE):
            submission.reply("me bot says: I love Python too! I especially love praw")
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        elif re.search('python', submission.selftext, re.IGNORECASE):
            submission.reply("you mentioned python, python very good!")
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("posts_replied_to.txt", "w") as f:
    for post_id in posts_replied_to:
        f.write(post_id + "\n")
Automate
To automate this script, we have to use linux crontab.
we first start a new cron job by typing:
crontab -e
and then we select which editor to use, I chose vim.
Then we put this line in the bottom of the cron job script:
* * * * * cd /path/to/project/; python3 bot_script.py
The 5 * represent the time the job gets executed:

* * * * * command to be executed
| | | | |
| | | | +----- day of week (0 – 6) (Sunday=0)
| | | +------- month (1 – 12)
| | +--------- day of month (1 – 31)
| +----------- hour (0 – 23)
+------------- min (0 – 59)
Then save the script, and enter the command:
sudo service cron start
if you want to stop the cron job: sudo service cron stop
if you want to restart the cron job: sudo service cron restart
Comment reply
import praw
import pdb
import re
import os
import random

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="useragent"
)

lucas_quotes = [
    " I've calculated your chance of survival, but I don't think you'll like it. ",
    " Do you want me to sit in a corner and rust or just fall apart where I'm standing?",
    "Here I am, brain the size of a planet, and they tell me to take you up to the bridge. Call that job satisfaction? Cause I don't. ",
    "Here I am, brain the size of a planet, and they ask me to pick up a piece of paper. ",
    " It gives me a headache just trying to think down to your level. ",
    " You think you've got problems. What are you supposed to do if you are a manically depressed robot? No, don't even bother answering. I'm 50,000 times more intelligent than you and even I don't know the answer.",
    "Zaphod Beeblebrox: There's a whole new life stretching out in front of you. Marvin: Oh, not another one.",
    "The first ten million years were the worst. And the second ten million... they were the worst too. The third ten million I didn't enjoy at all. After that, I went into a bit of a decline. ",
    "Sorry, did I say something wrong? Pardon me for breathing which I never do anyway so I don't know why I bother to say it oh God I'm so depressed. ",
    " I have a million ideas, but, they all point to certain death. ",
]

subreddit = reddit.subreddit('pythonforengineers')

with open("comment_replied.txt", "r") as f:
    # Read the file into a list and remove any empty values
    comment_replied = f.read()
    comment_replied = comment_replied.split("\n")
    comment_replied = list(filter(None, comment_replied))

for comment in subreddit.comments(limit=100):
    if comment.id not in comment_replied:
        if re.search("Lucas Help", comment.body, re.IGNORECASE):
            lucas_reply = "Lucas the robot says: " + random.choice(lucas_quotes)
            comment.reply(lucas_reply)
            comment_replied.append(comment.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("comment_replied.txt", "w") as f:
    for post_id in comment_replied:
        f.write(post_id + "\n")
Making my own
This is my attempt at making a helpful info bot for r/udub.
import praw
import pdb
import re
import os
import random

# Create the Reddit instance
reddit = praw.Reddit(
    client_id="client_id",
    client_secret="secret",
    password="password",
    username="username",
    user_agent="Py_gloomy_repli_bot 0.2"
)

helplink = '''
General student service:\n
- https://www.washington.edu/students/servicesforstudents/
Courses/Professor info:\n
- https://uwgrades.com/
- https://www.washington.edu/cec/toc.html
- https://www.google.com/search?q=rate+my+professor+uw+%5Binsert+professor+name%5D&sxsrf=AJOqlzVwUt9A0OY5_KgZb7bdeYc_lw_RKg%3A1675667778386&ei=QqngY5GbF_aE0PEP8t2piAo&ved=0ahUKEwiRxMT1rID9AhV2AjQIHfJuCqEQ4dUDCBA&uact=5&oq=rate+my+professor+uw+%5Binsert+professor+name%5D&gs_lcp=Cgxnd3Mtd2l6LXNlcnAQAzIFCCEQoAEyBQghEKABMgUIIRCrAjoICAAQogQQsAM6BAgjECc6BQgAEIAESgQIQRgBSgQIRhgAUNgGWJYaYLgkaAFwAHgAgAFQiAHVCpIBAjIymAEAoAEByAEFwAEB&sclient=gws-wiz-serp
UW student housing info:\n
- https://www.housingforhuskies.com/
UW on-campus food info:\n
- https://www.campusreel.org/colleges/university-of-washington-seattle-campus/dining_food/
UW on-campus job info:\n
- https://www.washington.edu/workstudy/find-a-job/
'''

if not os.path.isfile("posts_replied_to.txt"):
    posts_replied_to = []
else:
    with open("posts_replied_to.txt", "r") as f:
        # Read the file into a list and remove any empty values
        posts_replied_to = f.read()
        posts_replied_to = posts_replied_to.split("\n")
        posts_replied_to = list(filter(None, posts_replied_to))

# Get the newest 50 posts from the subreddit udub
subreddit = reddit.subreddit('udub')
for submission in subreddit.new(limit=50):
    if submission.id not in posts_replied_to:
        if re.search(r"\?", submission.title, re.IGNORECASE):
            submission.reply("Hi! I am a bot! Your post is asking a question, so here are some useful links if you haven't found them yet!\n" + helplink)
            print("Bot replying to : ", submission.title)
            posts_replied_to.append(submission.id)
        else:
            print("Nothing found")

# Write our updated list back to the file
with open("posts_replied_to.txt", "w") as f:
    for post_id in posts_replied_to:
        f.write(post_id + "\n")
Simple todo bash script
This script is written for the Day Planner and Tasks plugins in Obsidian. It allows me to quickly add a todo task in the appropriate place.
It was written with the assistance of ChatGPT. Here is my prompt:
Please write a bash script that does the following:
- takes three arguments: the first argument will be put in the variable [tag], the second argument will be the variable [task_name], the third argument will be the variable [due_date].
- get today's year, month and date, numbers only. So if today is 2023, January 22nd, change that into 20230122 and put that in a variable "date"
- Search in path [predefined path]/"Day Planners" for a file that contains "date". Put the path of the file in variable "file_path". Error handle this step
- put the string "- [ ] [tag] [task_name] 📅 [due_date]" at the end of the file
Here's the actual bash script:
#!/bin/bash

tag=$1
tag="#$tag"
task_name=$2
due_date=$3
root_path="<pre-defined path>"
date=$(date +%Y%m%d) # get today's date in the format YYYYMMDD
path="$root_path/Day Planners/" # predefined path
file_path=$(find "$path" -type f -name "*$date*" -print0 | xargs -0 echo) # search for a file containing the date (pattern quoted so the shell doesn't expand it)
if [ -z "$file_path" ]; then # if file path is empty, display an error message
    echo "Error: No file found with the date $date"
else
    echo "- [ ] $tag $task_name 📅 $due_date" >> "$file_path" # append the task string to the end of the file
    echo "Task added"
fi
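As a hypothetical invocation (the script name here is a placeholder): running ./todo.sh work "finish report" 2023-01-25 appends the line - [ ] #work finish report 📅 2023-01-25 to whichever Day Planner file matches today's date.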
Todo bundle
2023-01-24
I've been using the Day Planner plugin in Obsidian for a while now and noticed the piling-up problem that occurs with its auto-generated files:
So I created a Python script to bundle them together.
Here is the prompt I came up with for ChatGPT:
I want to create a python script that lets the user type "todo bundle" with two arguments:
- a [begin date]
- an [end date]
Note: the dates will be in the format of an 8-digit number. For example, if it's January 24th, 2023, the number will be 20230124
and then performs the following actions:
- search through a [predefined path] where all the file names end in 8-digit numbers, and put the files whose ending numbers fall within the range of [begin date] and [end date] in a list.
- sort the list from small to big based on the ending number in the file name
- read through these files line by line in [predefined path] in the sorted order.
- if the line starts with some number of spaces plus "- [ ", then put it in temp.md along with the file name. Note temp.md's content will be formatted like this:
[filename 1]
[a qualified line from filename 1]
[another qualified line from filename 1]
[filename 2]
[a qualified line from filename 2]
[another qualified line from filename 2]
- Finally, temp.md will be saved as "Day Planner-[begin date]->[end date]" in [predefined path]
and here is the actual code:
import os
import sys

def todo_bundle(begin_date, end_date):
    # predefined path
    path = "[predefined path]"
    # search through files and put them in a list
    file_list = [f for f in os.listdir(path) if f.endswith('.md') and (("Day Planner-" + begin_date <= f <= "Day Planner-" + end_date) or f.endswith(end_date + ".md"))]
    # sort the list by file name
    file_list.sort()
    # read through the files and write to temp.md
    temp_md = ""
    for file_name in file_list:
        with open(os.path.join(path, file_name)) as f:
            in_file_name = file_name.rsplit(".", 1)[0]
            temp_md += "# " + in_file_name + "\n"
            for line in f:
                if line.lstrip().startswith("- ["):
                    temp_md += line
            temp_md += "\n"
    # save temp.md as "Day Planner-[begin date]-[end date]"
    with open(os.path.join(path, "Day Planner-{}-{}.format".format(begin_date, end_date) if False else "Day Planner-{}-{}.md".format(begin_date, end_date)), "w") as f:
        f.write(temp_md)

if __name__ == "__main__":
    begin_date = sys.argv[1]
    end_date = sys.argv[2]
    todo_bundle(begin_date, end_date)
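Usage is just the script plus the two dates (the script filename here is hypothetical): python todo_bundle.py 20230101 20230124 writes Day Planner-20230101-20230124.md into the predefined path.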
glue
2023-02-27
Here is the repo.
This is a bunch of random stuff I've made that doesn't belong in a category.
#AI #AI_generated_story
a steamy love story about "billy bone" and "eye"
Billy Bone and Eye had been friends for a long time, but recently something had changed between them. It was as if a spark had been ignited the moment their eyes met and they both knew that something special was brewing between them.
One night, Billy Bone asked Eye out on a date. She could feel the butterflies in her stomach as she said yes to his invitation and they arranged to meet up outside of her favorite cafe.
When the night of the date came, Billy Bone arrived in a suit with a bouquet of roses in his hands. Eye was taken aback by how handsome he looked and she felt her heart skip a beat when Billy Bone smiled at her. They talked for hours over dinner, never running out of things to say to each other.
After dinner, they took a walk along the beachfront hand in hand and watched the sun set together. As they strolled along, their conversation turned more intimate, sharing secrets and stories from their pasts that left them feeling even more connected than ever before.
Eventually, as the night grew darker, they found themselves standing underneath a tree filled with twinkling stars. They both knew what was going to happen next - no words were needed - and their lips met in an electrifying kiss that sealed the deal between them.
Billy Bone and Eye were now officially in love and it felt like nothing else mattered when they were together - all that mattered was love.
First full stack diagnosis
What's going on?
During winter quarter of 2023, I made this full stack web app with three other group mates.
It is now autumn quarter 2023, and for project 1 of the course INFO 443, Software Architecture, I was asked to look back at this project, give it a diagnosis, and write a report on its architecture design. This is where I'll do the recording.
First let's do a quick viz
Using the tool mentioned here, I got this:
Note: one of the first things I realized is that I should've put the backend in a separate folder, like the frontend did.
Here comes the UML
This is a top level UML diagram of my codebase. The connections between the elements have been marked using arrows.
and the sequence UML
This is a sequence diagram showing how the web app will behave when the user attempts to add a game to the dashboard.
Tests
Since this project is built with React, Jest is the natural choice of testing framework.
The assignment asks us to
implement a complete set of automated tests for your chosen architectural element (module or class) you're analyzing. We expect a complete set to comprise around a dozen automated tests, achieving 100% code coverage.
So for my project the architectural element will be the components that compose the main dashboard, namely Navbar.js and Gambox.js.
Here is the test result. As you can see, my components are not as isolated as I once imagined. There is a lot of spillover.
I've also noticed that there is a lot of unnecessary error handling. For example, this code here
function handleAdd() {
    // add a game
    // pass in current Catagory, then show search window
    fetch("api/users/")
        .then(res => res.json())
        .then(data => {
            if (data.status == "loggedin") {
                setCatagoryName(catagoryName);
                setShowSearch(true);
            } else {
                alert("Please login first");
            }
        })
}
will not run unless the user is logged in. I had to add an impossible scenario in order to cover that line of code in the test.
Update on 2023-01-30:
I've finished the full version since then.
Antzed.com
This section talks about the process of making this website
- whats-new-detection-script-debug-and-fix
- website-creation-2020-06-30
- making-it-some-what-dynamic-2023-4-27
- mdbook-summary-maker
- making-it-properly-dynamic
- website-create-documentation
- website-server-migration-2020-07-28
What's new detection script debug and fix
2024-02-07
This document provides a brief overview of the debugging process I undertook for the JavaScript code responsible for detecting new additions and changes to my website.
The issue arose when I added a function to detect modifications to existing files in addition to detecting new files added. Notably, several files that had no actual changes were mistakenly marked as "changed".
Below is the log from when the error initially occurred:
The "What's New" section malfunctioned; instead of updating the one article that I had updated, the website incorrectly marked old articles as "updated" as well. I wanted to figure out why.
- Checked GitHub commit history -> The changes were all related to indexing, for example:
<ol class="chapter"><li class="chapter-item affix "><li class="part-title">Content</li><li class="chapter-item expanded "><a href="../Creation/index.html". ...
- Checked if git checkout was related. -> It was not.
The problem was caused by adding a new file, which caused mdbook to re-index during building, and git picked that up.
Solution idea 1: In the "What's New" detection code, only count changes beyond 10 lines. This approach was unsuccessful because sometimes the re-indexing exceeded 10 lines.
Solution idea 2: In the "What's New" detection code, ignore changes that start with <ol class="chapter">, <a rel="next", and <a rel="prev". I discovered that GitHub exposes the content of each file change in a file.patch field of the JSON response. I needed to know whether file.patch separates changes into different sections or just dumps everything together -> file.patch does indeed separate changes between sections with an @@..@@ pattern. Now I just need to write a filter that filters out all the re-indexing junk and leaves only the actual changes.
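The detector itself is JavaScript, but the filtering logic is easy to sketch in Python. This is a hedged illustration rather than the exact code on the site; the function name and the noise-prefix list are mine:
import re

# Added lines starting with these prefixes are mdbook re-indexing noise
NOISE_PREFIXES = ('<ol class="chapter">', '<a rel="next"', '<a rel="prev"')

def significant_change_identifier(patch: str) -> bool:
    """Return True if any added line in the patch looks like a real content change."""
    # A unified diff separates hunks with @@ ... @@ headers
    for hunk in re.split(r"@@[^@]*@@", patch):
        for line in hunk.splitlines():
            if line.startswith("+"):
                content = line[1:].strip()
                if content and not content.startswith(NOISE_PREFIXES):
                    return True
    return False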
Here's the log from the second occurrence:
Once again, after editing one of the posts, the latest updater still failed to recognize the edit.
The problem occurs because the change to one of the posts did not have a file.patch. I ended up using both the significantChangeIdentifier function and a count of file.changes in an OR relationship. This way, both types of changes in the commit metadata are covered.
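In sketch form, the combined check looks something like this (hypothetical naming again, reusing significant_change_identifier from the sketch above; file stands for one entry of a commit's files array from the GitHub API):
def file_really_changed(file: dict) -> bool:
    # Some edits (like the second occurrence above) arrive without a patch,
    # so fall back to the API's change counter in an OR relationship.
    patch = file.get("patch")
    return (patch is not None and significant_change_identifier(patch)) or file.get("changes", 0) > 0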
Edit 2024-02-07, 16:00
Shortly after uploading this document, another bug appeared. This document was once again not recognized by the detector.
It was discovered that in the REST API there are additional status names for files in a commit, like renamed, besides the statuses added and modified.
By examining the response schema here, we can see that there are the following types of statuses:
"status": {
"type": "string",
"enum": [
"added",
"removed",
"modified",
"renamed",
"copied",
"changed",
"unchanged"
],...}
A quick ChatGPT conversation yielded this:
"added": The file was added in the commit. It did not exist in the repository before this commit.
"removed": The file was deleted in the commit. It will not be present in the repository after this commit.
"modified": The file existed before the commit and has been altered in some way (content changed, file permissions, and so on) as part of this commit.
"renamed": The file was renamed in the commit. This status indicates that the file's path or name has changed.
"copied": This status indicates that the file was copied from another file in the commit. This is similar to "added" but specifies that the new file is a copy of an existing file.
"changed": This is a more generic status that can indicate any change not specifically covered by the other statuses, such as changes in permissions or other metadata that doesn't necessarily modify the file's content.
"unchanged": The file was part of the commit but was not altered in any way. This status might be used in contexts where files are explicitly mentioned in a commit without being changed, possibly for tracking or auditing purposes.
This means that I do not need to cover any additional statuses besides added, modified, and renamed.
Edit 2024-02-07, 17:23
The new changes led to another problem: failure to handle the edge case where a file is renamed without any content changes. Ideally, this change should still be reflected on the website.
To fix this, I moved the processing from an array of files to an array of filenames slightly downstream. I added trackers for both the previous_filename associated with renamed files and for overall duplicates, using hashsets, so I can compare and delete later.
Website setup summary
2020-06-30
Summary up front: the setup that finally worked was done entirely through the BaoTa (宝塔) panel, with an LNMP environment installed.
Since I only recently started learning Linux, I was extremely unfamiliar with almost everything. I basically had to look up every single step online, so progress was very slow at first.
Table of contents:
Server purchase and initial setup
First, the server was bought from 什鸟科技. It's quite cheap, and the site's UI is relatively clean and clear, which is friendly to beginners.
Then comes the initial server setup. I followed this site, which mainly covers adding a sudo user and enabling the http and https services:
https://www.cnblogs.com/jojo-feed/p/10169769.html
Here is its opening section (translated):
Initial configuration of a new CentOS server
When you first create a new CentOS server, the default configuration has some gaps in security and usability (ops people often have init scripts for this). To improve the server's security and usability, there are some configurations you should complete as soon as possible. This article roughly covers: account security, ssh security, the firewall, and swap files.
User and password security
About the root user
root is the superuser with extraordinary privileges in a Linux environment. Because its privileges are so high, day-to-day use is discouraged: root can do very destructive things, even by accident (the infamous rm -rf /; you may also find the article《Linux 中如何避免 rm -rf /*》interesting). Next, we will set up a less privileged user for daily work, and show how to obtain greater privileges when you need them.
Create a new user
This example creates a user called demo; you can substitute any username you like:
adduser demo
Next, assign a password to the new user (again, replace demo with the user you just created):
passwd demo
Enter a strong password, then repeat it to confirm.
User privileges
We now have a user with ordinary privileges. However, we sometimes need to perform administrative tasks. To avoid logging out of the normal user and back in as root, Linux has an elegant solution: the system can authorize specific users or groups to run some (or all) commands as root or as other users. A user in such a group just prefixes a command with the word sudo to run it with administrator privileges.
Install sudo
Some distributions do not come with the sudo command, so install sudo first:
yum install -y sudo
Users in the wheel group can use the sudo command. On RedHat-family systems we just need to add the user to the wheel group. As root, run the following command to add the new user to wheel (replace demo with your new user):
gpasswd -a demo wheel
After that, there is one more firewall setup step to perform on the new host, as follows:
A firewall controls traffic to ports and applications, making the server more secure. CentOS 7 introduced firewalld as a frontend for iptables. Compared with iptables, firewalld has zones, is simpler, and can be configured dynamically. Below is the basic firewalld configuration for a new server. Start it:
sudo systemctl start firewalld
Set the default zone
sudo firewall-cmd --set-default-zone=public
Check the active zones
sudo firewall-cmd --get-active-zones
If public is not among the active zones, the public zone may not have an interface assigned. Use ip addr to check your network interfaces (lo is the local loopback), then assign one:
sudo firewall-cmd --permanent --zone=public --change-interface=eth0
Add allowed services to the firewall
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=ssh
List more services that can be added
sudo firewall-cmd --get-services
View your firewalld configuration
sudo firewall-cmd --list-all
Reload the configuration
sudo firewall-cmd --reload
Enable start on boot
sudo systemctl enable firewalld
Server environment installation
LNMP installation
LNMP stands for Linux, Nginx, MySQL/MariaDB, and PHP. There is also LAMP, which differs from LNMP in replacing Nginx with Apache.
These four pieces correspond to the server OS, the web server software, the database, and the hypertext preprocessor. Together they are the runtime environment a web server needs.
I initially wanted to install everything from the command line, but ran into problems. The main one was:
no package xxx available
nothing to do
This problem bothered me for a long time. My first guess was that the yum sources were broken, so I updated yum:
yum update
and switched the yum source to the Aliyun mirror:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
Neither solved the problem, so I decided to change my approach and look for another method.
In the end I chose the BaoTa panel. I had actually tried BaoTa's automatic installation at the very beginning, but after installing I could not find it in the usual directories, so I shelved that plan. Coming back to BaoTa, I decided to look for answers entirely inside the panel and rely on it completely, rather than doing a half-command, half-panel installation.
The BaoTa installation tutorial I found:
https://www.jianshu.com/p/293c94adc11d
With BaoTa's one-click LNMP installation, the whole process becomes surprisingly simple. Installing from the command line, with all its pitfalls, would probably teach me more, but some things are better done the easy way to avoid wasting unnecessary time.
After that, the database and WordPress installation and setup can begin.
Database setup
Before installing WordPress, you need to start MySQL, create a database, and grant privileges on that database to a (new or existing) user.
In BaoTa, this step goes as follows (borrowing from the tutorial at https://www.jianshu.com/p/293c94adc11d):
In the BaoTa interface, find "Website" and click "Add site"; the following screen appears:
Fill in the domain according to your situation. For servers in mainland China the domain must be ICP-registered; foreign servers do not need this. Usually you enter a top-level domain plus a www subdomain, e.g. www.pandacui.com, pandacui.com. During setup you can also use the server's public IP instead.
You can choose not to create FTP and use BaoTa's file manager later instead.
For the database, choose MySQL. You will be prompted for a username and password, and a database with the same name as the user is created automatically.
For the PHP version, choose the version you installed. I installed PHP 7.2; a version no lower than 7 is recommended.
Click submit; you will be told the site was created and shown the database username and password. There is no need to write these down; you can view them again in BaoTa's database management screen.
Once the site is created, you can test it by entering the public IP or your bound domain in a browser. If it works, the following page appears:
The above is the BaoTa way; this step can also be done by command (excerpted from this tutorial: https://www.liquidweb.com/kb/how-to-install-wordpress-on-centos-7/):
mysql -u root -p
CREATE DATABASE wordpress;
CREATE USER adminuser@localhost IDENTIFIED BY 'password';
# note: adminuser and password here should be customized
GRANT ALL PRIVILEGES ON wordpress.* TO adminuser@localhost IDENTIFIED BY 'password';
# keep adminuser and password consistent with the previous step
FLUSH PRIVILEGES;
exit
WordPress installation
Installing WordPress
Installation via BaoTa is fairly simple. Download the .tar package from the wordpress.org website to your own computer, then upload it through BaoTa into the directory that was just created:
/www/wwwroot/
After uploading, delete the bundled welcome index.html and 404.html, extract the tar package into the directory containing .htaccess, then delete the archive and the now-empty wordpress directory.
Then enter your registered domain in a browser to check whether WordPress installed successfully. If a WordPress language-selection screen appears, the installation succeeded.
WordPress setup
On the WordPress setup screen, just enter the database name, database user, and password you configured earlier. You will then set the WordPress admin account and password, and finally land in the WordPress dashboard.
Successfully entering the site
If you can see the WordPress admin backend, the site is fully configured.
Making it somewhat dynamic
2023-4-27
Table of Contents
- Table of Contents
- Reason for this
- The solution for now
- What was the challenge
- The loops I have to go through
- What's next
- Am I going to keep the custom backend?
Reason for this
Although this website is meant to be written in the form of a book, I realized it is necessary to add a few more blog-like features for returning readers. This way, both the "exploration" aspect of the book and the "what's been added since I last visited" aspect of the experience are covered.
The solution for now
I embedded a JavaScript file into the welcome page markdown file.
The code looks like this:
fetch("https://api.github.com/<my-repo-owner>/<my-repo-name>/commits?per_page=3&sha=master", {headers: header})
.then(response => response.json())
.then(data => {
let promises = [];
for (let commit of data) {
// sort through the fetch
promises.push(
fetch("https://api.github.com/repos/<my-repo-owner>/<my-repo-name>/commits/"+commit.sha, {headers: header})
.then(response => response.json())
.then(commitData => {
// Check if the commit includes file data
if (commitData.files) {
const newFilesInThisCommit = commitData.files.filter(file => file.status === 'added' && file.filename.startsWith('book/') && file.filename.endsWith('.html') && !file.filename.endsWith('index.html'));
return newFilesInThisCommit.map(file => file.filename);
} else {
return [];
}
})
);
}
return Promise.all(promises);
})
.then(filesInCommits => {
let html = "<ul>";
for (let filesInCommit of filesInCommits) {
for(let file of filesInCommit) {
//String manimulation
file = file.substring(5);
file = file.substring(0, file.length - 5);
let temp = file.substring(file.lastIndexOf('/') + 1);
temp = temp.replace(/-/g, ' ');
temp = temp.replace(/\w\S*/g, function(txt){return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();});
file = file.substring(0, file.lastIndexOf('/')+1);
file = temp + ' in ' + "<code class='hljs'>" +file + '</code>';
html += `<li>${file}</li>`;
}
}
html += "</ul>";
//put the html in the document
const element = document.getElementById('latestBlog');
element.innerHTML = html;
})
.catch(error => console.error(error));
And then I registered it in book.toml (mdbook's default settings file) in my book folder like this:
[output.html]
additional-js = ["my-javascript-file.js"]
What was the challenge
mdbook is a static site generator, so it's missing a backend. There are "custom backends" you can write, but they are mainly for rendering (generating the book in a format other than HTML) and are not an actual service.
The loops I have to go through
Script during compile
I explored several scripts like this one:
import subprocess

def get_git_diff(num):
    git_diff_cmd = f'git diff HEAD~{num} ./src/SUMMARY.md'
    grep_cmd = "grep -E '^\+'"
    cmd = f'{git_diff_cmd} | {grep_cmd}'
    result = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    return result.stdout.splitlines()

i = 1
latest_content = None
# Keep incrementing the number until we find a diff with more than 5 lines
while True:
    content = get_git_diff(i)
    if len(content) > 5:
        latest_content = content
        break
    i += 1

# Print the latest content
for line in latest_content:
    if line.startswith('@@') and line.endswith('@@'):
        continue
    if "README.md" in line:
        continue
    stripped_line = line.strip('+').strip()
    print(stripped_line)
which aims to monitor the changes in SUMMARY.md (the table of contents file) and do string manipulation afterwards. This seemed like a good idea at the time, until I realized I could just monitor file additions instead.
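For illustration, the simpler idea can be sketched like this (a hedged sketch, not the code I shipped; it assumes a git repository and that posts live under src/ as .md files):
import subprocess

def added_files(ref: str = "HEAD") -> list[str]:
    # --diff-filter=A keeps only added files; --name-only prints just their paths
    cmd = ["git", "diff", "--diff-filter=A", "--name-only", f"{ref}~1", ref]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [p for p in result.stdout.splitlines()
            if p.startswith("src/") and p.endswith(".md")]

if __name__ == "__main__":
    for path in added_files():
        print(path)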
The true custom backend
Then I thought: hmm, why not just make a backend for my static "frontend"? And create a backend I did.
I used this template generator to create a simple backend. I specified an API call that gave me the necessary strings, and I then uploaded it to cyclic.sh.
I then went on my welcome page and embedded this code:
fetch("<my-custom-backend-app>/api/v1/latest")
.then(response => response.json())
.then(data => {
// Get a reference to the HTML element where we want to display the data
const element = document.getElementById('latestBlog');
// Update the HTML content with the fetched data
let html = "</ul>";
for (let i = 0; i < data.length; i++) {
html += "<li>" + data[i] + "</li>";
}
html += "</ul>";
element.innerHTML = html;
})
.catch(error => console.error(error));
This sounded good on paper, so let's test it out.
Sound effect: this was a major fail.
The reason turned out to be quite simple: not only did I have to make the API call from the backend app, I then had to fetch again from my custom backend, so the process looked like this:
antzed.com <-- my custom backend app <-- github api.
I even tried to simplify the fetch by requesting the minimum number of commits from the GitHub API, meaning changing https.../commits?per_page=<big number> to https.../commits?per_page=2, and doing direct index calls in my embed like this:
let html = `<ul>
<li>${data[0]}</li>
<li>${data[1]}</li>
<li>${data[2]}</li>
<li>${data[3]}</li>
<li>${data[4]}</li>
</ul>`;
but that didn't really help.
So in the end, I cut out the middleman that is my custom backend.
What's next
Even though the current solution works, I still believe I can achieve a faster loading time. So I'm going to keep working on this.
Am I going to keep the custom backend?
Probably. Using it for these simple fetches is definitely not going to happen, but it can definitely be used for some other, more complex operations.
Creating mdbook-summary-maker - a summary.md automation tool
2023-05-12
Table of Contents
Problem
The problem I want to solve has to do with my SUMMARY.md file. The method I currently use is to add the name of the file into SUMMARY.md whenever I write a new post. I automated the process of adding new posts a little bit when I was migrating everything into mdbook and netlify (detailed in this post), but it was still a semi-automatic experience, which I'm aiming to improve. I want to be able to create a file, write in it, build the book, and publish it online without worrying about anything else.
The solution
Here is my repo. I ended up writing an executable in Rust that cycles through the source folder and regenerates SUMMARY.md.
Process
Maybe a preprocessor?
My first idea was to find an already-developed preprocessor. mdbook supports these scripts, which can be executed before the content is sent in for rendering. After some searching, I found two possible solutions:
I immediately ditched the mdbook-fs-summary option because it requires me to rename all my files with a number prefix, which is definitely a hassle I don't want.
After trying out mdbook-auto-gen-summary, I found it only offers a mediocre solution: it doesn't produce the order I want for my SUMMARY.md, and it is not customizable.
So afterwards, I've decided to make a custom preprocessor myself.
Hammering out my own preprocessor
To make my own preprocessor, I took the easy route of seeing how other people do it. So naturally, I went and looked at the code of mdbook-auto-gen-summary, the preprocessor I had just tried.
Upon first inspection, and some reading in mdbook's documentation, a few things stand out:
- There is a framework called mdbook::preprocess for building custom preprocessors
- The template
I then copied the file structure from mdbook-auto-gen-summary:
.
└── Project/
├── src/
│ ├── your_mod_name/
│ │ └── mod.rs
│ └── main.rs
    └── Cargo.toml
and implemented my own code in.
Here's when the bugs came
After testing out my preprocessor, I immediately spotted some bugs:
- The generated SUMMARY.md seemed to be stuck in a recursive loop. After the code generated the structure of the target folder and wrote it into SUMMARY.md, it would keep writing more of the same structure over and over again.
- If the content of SUMMARY.md was not cleared before running, some targets would literally "explode": the content within the folder got deleted, and the files got dragged out of the target folder into its parent directory.
Then I realized the problem: I was aiming to update SUMMARY.md according to the src/ folder's current state each time I build the book. But a preprocessor can only manipulate the content markdown files after SUMMARY.md gets loaded in, and before those files get rendered. It cannot manipulate the SUMMARY.md file itself and then initiate the process of loading everything in and rendering.
That was not stated in any of the documentation! Thanks Rust!
Guess we are back to making the executable.
So after discovering this devastating fact, I went back and removed all the preprocessor framework code. Instead, I wrote it like a script, exported it to an executable, placed the executable in the book folder, and added a line to my automation script here.
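The actual tool is the Rust executable in the repo above, but the core idea is small enough to sketch. Here is a hedged Python version of the same walk-the-folder approach; it assumes a conventional src/ layout where each folder is anchored by a README.md, and the title casing and output format are simplifications of what the real executable does:
import os

def walk(dir_path: str, root: str = "src", depth: int = 0) -> list[str]:
    lines = []
    for entry in sorted(os.listdir(dir_path)):
        full = os.path.join(dir_path, entry)
        indent = "\t" * depth
        if os.path.isdir(full):
            # a folder becomes a section, anchored by its README.md
            rel = os.path.relpath(os.path.join(full, "README.md"), root)
            lines.append(f"{indent}- [{entry}]({rel})")
            lines.extend(walk(full, root, depth + 1))
        elif entry.endswith(".md") and entry not in ("README.md", "SUMMARY.md"):
            title = entry[:-3].replace("-", " ").title()
            lines.append(f"{indent}- [{title}]({os.path.relpath(full, root)})")
    return lines

if __name__ == "__main__":
    with open("src/SUMMARY.md", "w") as f:
        f.write("# Summary\n\n" + "\n".join(walk("src")) + "\n")
Because this runs before mdbook build rather than as a preprocessor, it sidesteps the limitation above: SUMMARY.md is already regenerated by the time mdbook loads it.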
Making it properly dynamic
2023-05-12
Table of Contents
The Problem
Not long after my last post, the solution quickly became invalid. This is because I'd exposed my GitHub API key on the frontend in order to add the dynamic function. I thought this was OK since the key was "fine-grained" (in GitHub's naming) to only have read access to my website repo, but GitHub revoked it immediately as soon as I made that repo public.
Solution
The final solution I went with is Netlify's serverless functions. They are available to free-tier users and can provide dynamic functionality.
Here is the Intro article that i followed as a guide.
Honorary mention: the fetch not defined bug after deployment, and its solution. I will talk about it more later.
Process
Attempt 1
Previously, I didn't opt in for the whole "building a backend" thing because hosting the backend with a separate provider makes the dynamic features really slow. But then I realized the files mdbook generates can be treated as the frontend, and a backend can be wrapped around them.
So that's what I did: I took the simple backend I'd built previously, swapped out the public folder for the generated static web content, and called it a day.
My file structure:
my-app/
├─ node_modules/
├─ book(generated)/
│ ├─ favicon.ico
│ ├─ index.html
│ ├─ ...
├─ src/
│ ├─ index.js
│ ├─ ...
├─ .gitignore
├─ package.json
├─ README.md
However, after reaching this step, a new problem arose: where to host all of this code?
Attempt 2
One of the main reasons I'm hosting my website on Netlify is that it's free and the experience is nice, but this only applies to static websites. The only free solution (that I know of) for full-stack website hosting is cyclic.sh, which has a mediocre experience at best. Other solutions comparable to Netlify cost a solid chunk of money.
Here come Netlify's serverless functions. After fiddling with them for a bit, I succeeded in implementing the same feature as in the simple server.
Here is my file structure:
.
└── Project/
├── .netlify (generated using netlify-cli)/
│ └── ...
├── node_modules/
│ └── ...
├── book(frontend)/
│ ├── index.html
│ └── ...
├── netlify/
│ └── functions/
│ └── api-endpoint-1.js
├── netlify.toml
├── package-lock.json
└── package.json
I copied the commit-fetching code into api-endpoint-1.js, made it a function, and added the endpoint function at the top.
The endpoint function:
export const handler = async () => {
    try {
        // await inside the try block so a failed fetch is caught below
        const html = await getLatestCommit();
        return {
            statusCode: 200,
            body: JSON.stringify({
                content: html,
            }),
        }
    } catch (err) {
        return { statusCode: 500, body: err.toString() }
    }
}
And that's about it; the rest is just following the intro article mentioned at the top.
The bug
One bug worth mentioning is the fetch not defined bug.
It only appears after deployment to Netlify. It probably has to do with a Node.js version mismatch between the local machine and the cloud service Netlify is using, so the solution given at the top should solve it.
Creating this website
This website is created using mdbook and is hosted on Netlify.
Here's the tutorial I followed during the creation process.
Publishing Workflow
I wanted to make the process of publishing as easy as possible so it doesn't obstruct my flow of ideas. Since mdbook uses the src files to build the book files, I used git submodules to separate the builds from the src files. This allows me to save drafts that are not published, but also to easily publish my builds when I need to.
I also created a script on my local machine to automate these processes. The script looks something like this:
#!/bin/bash

save_path="path1"
publish_path="$save_path/path2"
build_path="path3"

if [ $1 == "save" ]; then
    read -p "Enter commit message: " commit_message
    cd "$save_path"
    git add .
    git commit -m "$commit_message"
    git push
elif [ $1 == "build" ]; then
    cd "$save_path"
    ./mdbook-summary-maker
    if [ $# -eq 2 ] && ([ $2 == "open" ] || [ $2 == "-o" ]); then
        mdbook build --open --dest-dir $build_path
    else
        mdbook build --dest-dir $build_path
    fi
elif [ $1 == "create" ]; then
    read -p "Enter file name: " file_name
    file_path=$(find "$save_path" -name "$file_name.md")
    if [ -z "$file_path" ]; then
        echo "File not found."
    else
        # process the file path that contains README.md
        replaced_file_path="${file_path/\/$file_name.md/\/README.md}"
        replaced_file_path="./${replaced_file_path#*src/}"
        summary_path="$save_path/src/SUMMARY.md"
        in_file_path="./${file_path#*src/}"
        processed_file_name=$(echo $file_name | sed 's/-/ /g' | sed 's/\b\(.\)/\u\1/g')
        new_entry="- [$processed_file_name]($in_file_path)"
        found=false
        while IFS= read -r line; do
            if [[ $line == *"$replaced_file_path"* ]]; then
                # count the leading tabs to preserve indentation
                tab_count=0
                while [[ $line == $'\t'* ]]; do
                    line=${line#?}
                    tab_count=$((tab_count+1))
                done
                for i in $(seq 1 $tab_count); do echo -n -e "\t" >> temp.md; done
                echo "$line" >> temp.md
                tabs=""
                # if $2 is "root", keep the new entry at the same level instead of nesting it one level deeper
                if [ $# -eq 2 ] && [ $2 == "root" ]; then
                    for i in $(seq 1 $((tab_count))); do tabs+="\t"; done
                else
                    for i in $(seq 1 $((tab_count+1))); do tabs+="\t"; done
                fi
                echo -e $tabs"$new_entry" >> temp.md
                found=true
            else
                echo "$line" >> temp.md
            fi
        done < "$summary_path"
        if [ "$found" = false ]; then
            echo "$new_entry" >> temp.md
        fi
        mv temp.md "$summary_path"
        echo "Successfully added new entry to SUMMARY.md"
    fi
elif [ $1 == "summarize" ]; then
    target_file_name=$2
    # search for the target_file_name in $save_path and store it in a variable
    target_file_path=$(find "$save_path" -name "$target_file_name.md")
    # read the file line by line
    subtitle_list=()
    while IFS= read -r line; do
        # if the line starts with "## "
        if [[ $line == "## "* ]]; then
            # get the content of the line without the "## "
            subtitle=${line#"## "}
            # put it in the list subtitle_list
            subtitle_list+=("$subtitle")
        fi
    done < "$target_file_path"
    # create a string variable table_of_content
    table_of_content="## Table of Contents\n"
    # for each element in the list subtitle_list
    for subtitle in "${subtitle_list[@]}"; do
        # build the anchor link: replace spaces with "-", lowercase it, and prefix "#"
        element_link="${subtitle// /-}"
        element_link="${element_link,,}"
        element_link="#${element_link}"
        # put the original element in a line "- [element](element_link)\n"
        line="- [$subtitle]($element_link)\n"
        # add that line into table_of_content
        table_of_content="${table_of_content}${line}"
    done
    # Insert table_of_content at the top of the target file
    temp_file=$(mktemp) # create a temporary file
    # save the target file into a backup
    cp "$target_file_path" "$target_file_path.bak"
    # write table_of_content followed by the original content into the temporary file
    echo -e "${table_of_content}$(cat "$target_file_path")" > $temp_file
    mv $temp_file "$target_file_path" # move the temporary file to the target file path
elif [ $1 == "publish" ]; then
    read -p "Enter commit message: " commit_message
    cd "$publish_path"
    git add .
    git commit -m "$commit_message"
    git push
else
    echo "Invalid argument. Please use 'save', 'build', 'create', 'summarize' or 'publish'"
fi
Afterwards, you must run
sudo cp script.sh /usr/local/bin/<command name>
sudo chmod +x /usr/local/bin/<command name>
Note that when using cd in bash scripts, you have to execute the script via . script.sh or source script.sh. This is because a bash script normally runs in its own shell, according to this post.
So I added this in my .bashrc file:
alias <command name>=". <command name>"
Also, when dealing with multi-word directory names, you must change cd $path to cd "$path". (It must be double quotes.)
ChatGPT assistance
This script was made with the assistance of ChatGPT. Here's my main prompt:
I want to automate my git operations using a script. This script should be able to run anywhere in an Ubuntu Linux environment using the keyword "(command name)". This script should take three different arguments:
- "save", which asks the user for a (commit message), and executes:
	- cd (pre-defined path)
	- git add .
	- git commit -m "(commit message)"
	- git push
- "build", which executes:
	- cd (pre-defined path)
	- mdbook build --dest-dir (pre-defined path)
- "publish", which is the same as "save", but I will change the (pre-defined path) later.
I also want to make the script edit a SUMMARY.md file in a (pre-defined path).
The user should be able to input "antzed create" and be asked for a (file name).
The script will then:
- go to a (pre-defined path)
- search for the (file name)'s (file path) relative to (pre-defined path)
- replace the (file name) part of the (file path) with README.md. For example, if the (file path) is "./knowledge/(file name)", change it to "./knowledge/README.md". Let's call this processed file path the (replaced file path)
- go to SUMMARY.md, which should be in (pre-defined path), and read it line by line to search for the (replaced file path). Note that SUMMARY.md should be structured as a markdown list, for example: - [some name](some file path), so the (replaced file path) should be in (some file path)
- create the string "- [(file name)]((file path))", place it on a line under the (replaced file path), and add a tab at the start of the line.
Fill in the TODO in this code "elif [ $1 == "summarize" ]; then target_file_name=$2 # TODO: implement code using given prompt" using this prompt: "Prompt:
write the code that does the following:
- search for the target_file_name in $save_path and store it in a variable
- read the file line by line
- if the line starts with "## "
	- get the content of the line without the "## "
	- put it in a list subtitle_list
create the table of contents by doing the following:
- create a string variable table_of_content
- for each element in the list subtitle_list
	- create a copy of the element and add "#" in front of it. Then replace the spaces in the element with "-" and name the copy "element_link"
	- put the original element in a line "- [element](element_link)\n"
	- add that line into table_of_content
- insert table_of_content at the top of the target file"
Website server migration
2020-07-28
Recently I wanted to migrate my website to a more secure server. My old server's provider was not well known, which made their servers feel a bit too shady.
After extensive research, I found a reliable and, most importantly, free solution: AWS EC2 VPS servers.
They offer a free 12-month trial with 750 hours (31.25 days) of rental time each month. This basically means that I can rent one VPS for free for a year.
Resources
How a beginner can build a personal website from scratch with WordPress and AWS: https://www.jianshu.com/p/82db25396e3c
Logging in as the root user on AWS EC2: https://www.cnblogs.com/520wife/p/7744015.html
AWS server rental process
To start a rental period, you first have to log in to the “root account” management panel in AWS (you are asked to set this account up when you first sign up for AWS), which looks a little bit like this:
then click
Service -> EC2 -> Launch instance
An instance is a VPS; by clicking Launch instance, you start the initialization process of a VPS setup.
There is a minimum of three steps in this initialization process. The first one is choosing an AMI, or Amazon Machine Image. The AMI is basically the operating system of your server.
There are plenty of options, such as Ubuntu, Amazon Linux, Red Hat Enterprise Linux, etc. I chose CentOS 7 since that's the OS I'm most familiar with.
Note that for CentOS in particular, the option cannot be found in the default recommendation tab. It can only be found under AWS Marketplace.
It is also important to look for the “Free tier eligible” sign, because that's the only free option.
You then need to choose the hardware for the server. Unfortunately, there's only one option that is free, which is shown below:
However, there are many advanced options to choose from if you're willing to pay.
After that, you are basically done. If you want to do further detailed configuration, you have four more steps to go.
Out of the four “sub-steps”, the most important one is step 6, which allows you to open up different protocols and ports. If you want to customize your ssh port, for example, you can add a new inbound rule on port xxxx.
Connecting to AWS server
I used PuTTY for this connection process. Prior to this rental, server providers always provided a random password for the root user. This time, however, I need to use a key generated by AWS.
This key is downloaded as a .pem file, with a name you come up with. Then I have to open PuTTYgen and load that .pem file in:
It will show the content of the .pem file. You then need to press Save private key to save a .ppk file.
This new file is the key that will log you in to your server. So next you will open PuTTY and find the Auth option under the SSH option.
In the “private key file for authentication” space, click Browse and find the .ppk file that you just saved, then click Open to log in.
Enable password authentication
Password authentication is turned off by default. This creates a lot of limitations if I want to ssh into the server directly/immediately, hence it is necessary to turn password authentication back on.
To do that, we first need to log in as the root user. We can do this by creating a password for the root user:
passwd root
then logging in as root using the password:
su root
We then need to turn on password authentication, which lives in the file sshd_config, hence:
vi /etc/ssh/sshd_config
Find the line
PasswordAuthentication no
and turn the “no” into “yes”. After that, restart sshd:
sudo /sbin/service sshd restart
Next time you log back in, test with the root user; you should now be prompted for the user's password.
Web content migration
Now it's time for the content migration. I was able to achieve the migration through a WordPress plugin called “All-in-One WP Migration” (ai1wm for short).
This plugin allows the user to export, import, and back up their website.
However, after the version 6.8 update, the plugin started imposing a limit on the maximum size of a website upload, which creates problems if you have a big website.
On top of that, WordPress itself also has an upload size restriction. So to mitigate all these problems, we have to download:
- v6.7 of ai1wm
- and a plugin called “tuxedo big file upload”
which unlock both upload restrictions.
To start the migration, you need ai1wm on both servers (old and new).
You first export your website into a .wpress file via Export in ai1wm:
Then, you install ai1wm v6.7 on your new server. Through either the server panel or ssh, you need to upload the .wpress file you've just created into the ai1wm backup folder:
Default path: /www/wwwroot/your_website_url/wp-content/ai1wm-backups
After that, go to the Backups tab in ai1wm and restore the backup that just showed up.
You'll need to reload your website again and enter the username and password of the old website, and you're good to go.
Hi, I'm Anthony
..and this is my book.
I want to compile my writings/blogs into book form. The idea being that if I die, I have left the world with something to read on.
This website will be divided into three sections:
If you don't know where to start, check out this
Here are my other links:
Github | Linkedin
What's new:
Legal stuff
© 2023 Anthony Zhang. All rights reserved.