Redis

2 minute read

  • is an in-memory key-value store, that can achieve microsecond-level data access latency. It is RAM-based, so RAM access is at least 1000 times faster than random disk access
  • uses hashtable to hold key-value pairs, lookup O(1)
    • key = pointer to string, value = pointer to strings / lists / hashes / sets / sorted lists
  • use when read-heavy
  • no locking

Why is redis so fast

  • single threaded execution loop, IO multiplexing
  • everything is in memory
    • RAM access is at least 1000 times faster than random disk access
  • efficient lower-level data structures

Cons:

  • If have a bad redis query, can block everything else

Functions:

  • data structures for the values
  • operations allowed on the data structures (supports CRUD)
  • data persistence
  • high availability
    • uses leader-follower replication to achieve this

Usages:

  • recording number of clicks and comments for each post (hash)
  • sorting commenting user list and deduping the users (zset)
    • zset contains unique keys and values(float only), sorted and accessible by values and order by which the items in the zset are sorted
    • different from hash as hash can only be accessed by values
  • caching user behaviour history and filtering malicious behaviour
  • storing boolean information of extremely large data into small space, eg. login status, membership status (bitmap)

Data Persistence

2 ways to save Redis data to disk:

  • AOF (Append-Only File)
  • RDB (Redis Database)

AOF

Unlike a write-ahead log, the Redis AOF log writes after commands run. Redis runs commands to change the data in memory first. Then it writes the commands to the log file. AOF logs the commands instead of the data. This simpler design makes recovering data easier. Additionally, AOF records commands after they have executed in memory. So it does not block current write operations.

RDB

The limitation of AOF is that it saves commands instead of data. When we use the AOF log to recover data, the whole log must be scanned. When the log size is large, Redis takes longer to recover. So Redis offers another way to save data - RDB.

RDB records snapshots of data at certain times. When the server needs recovery, the data snapshot can load directly into memory for fast recovery.

Step 1: The main thread forks the “bgsave” sub-process, which shares all the in-memory data. “bgsave” reads the data from the main thread and writes it to the RDB file.

Steps 2 and 3: If the main thread changes data, a copy is created.

Steps 4 and 5: The main thread then operates on the copy. Meanwhile, the “bgsave” sub-process continues writing data to the RDB file.

Recovery from disk takes too long. For most uses where Redis is used as a cache, using Redis replication and promoting the replica if the main Redis server fails catastrophically is better.

I attempt to implement redis in Python here

References:

  • https://blog.bytebytego.com/p/a-crash-course-in-redis
  • https://blog.bytebytego.com/p/why-is-redis-so-fast