My last post was on simming an L1 cache. Being pleased with how it came out, I thought I’d say a little about its construction.
The cache implementation is:
interface CacheLine {
tag: number | null; // main-memory address
last_used: number;
}
type Program = () => Generator<number, undefined, undefined>;
export class Cache {
sets: number;
ways: number;
data: CacheLine[][];
hits: number;
misses: number;
evictions: number;
id: number;
generation: number;
last_address: number | null;
program: Program;
constructor(sets: number, ways: number, program: Program) {
this.sets = sets;
this.ways = ways;
this.data = [];
this.hits = 0;
this.misses = 0;
this.evictions = 0;
this.generation = 0;
this.last_address = null;
this.program = program;
this.id = 0;
for (let i = 0; i < sets; i++) {
if (this.data[i] === undefined) this.data[i] = [];
for (let j = 0; j < ways; j++) {
this.data[i][j] = {
tag: null,
last_used: 0,
};
}
}
}
reset(id: number) {
this.id = id;
this.hits = 0;
this.misses = 0;
this.evictions = 0;
this.generation = 0;
this.last_address = null;
for (let i = 0; i < this.sets; i++) {
if (this.data[i] === undefined) this.data[i] = [];
for (let j = 0; j < this.ways; j++) {
this.data[i][j] = {
tag: null,
last_used: 0,
};
}
}
}
access(byte_address: number) {
this.last_address = byte_address;
this.generation += 1;
const setIndex = this.getSetIndex(byte_address);
const set = this.data[setIndex];
const address = (byte_address >> 6) << 6;
for (const way of set) {
if (way.tag === address) {
this.hits += 1;
return;
}
}
this.misses += 1;
let least_recently_used: number = Number.MAX_SAFE_INTEGER;
let least_recently_used_index: number = 0;
for (const [i, way] of set.entries()) {
if (way.tag === null) {
this.data[setIndex][i] = {
tag: address,
last_used: this.generation,
};
return;
} else {
if (way.last_used < least_recently_used) {
least_recently_used = way.last_used;
least_recently_used_index = i;
}
}
}
this.evictions += 1;
this.data[setIndex][least_recently_used_index] = {
tag: address,
last_used: this.generation,
};
}
getSetIndex(address: number) {
const set = (address >> 6) & 0b111111;
return set;
}
}
Let’s break it down. The definition of a cache line is:
interface CacheLine {
tag: number | null; // main-memory address
last_used: number;
}
The tag field tells us whether a cache line is unused (tag === null)
or whether it is used in which case the tag is the main-memory address of the
cache line.
In a given set the simulation implements a least-recently used (LRU) cache, so we record the generation the cache line / address was last accessed.
type Program = () => Generator<number, undefined, undefined>;
This is our “program” definition. Those familiar with javascript will recognise this as being the type of a javascript generator. I will come back to this later.
The cache is then defined with a javascript class:
export class Cache {
sets: number;
ways: number;
data: CacheLine[][];
hits: number;
misses: number;
evictions: number;
id: number;
generation: number;
last_address: number | null;
program: Program;
constructor(sets: number, ways: number, program: Program) {
this.sets = sets;
this.ways = ways;
this.data = [];
this.hits = 0;
this.misses = 0;
this.evictions = 0;
this.generation = 0;
this.last_address = null;
this.program = program;
this.id = 0;
for (let i = 0; i < sets; i++) {
if (this.data[i] === undefined) this.data[i] = [];
for (let j = 0; j < ways; j++) {
this.data[i][j] = {
tag: null,
last_used: 0,
};
}
}
}
reset(id: number) {
this.id = id;
this.hits = 0;
this.misses = 0;
this.evictions = 0;
this.generation = 0;
this.last_address = null;
for (let i = 0; i < this.sets; i++) {
if (this.data[i] === undefined) this.data[i] = [];
for (let j = 0; j < this.ways; j++) {
this.data[i][j] = {
tag: null,
last_used: 0,
};
}
}
}
access(byte_address: number) {
this.last_address = byte_address;
this.generation += 1;
const setIndex = this.getSetIndex(byte_address);
const set = this.data[setIndex];
const address = (byte_address >> 6) << 6;
for (const way of set) {
if (way.tag === address) {
this.hits += 1;
return;
}
}
this.misses += 1;
let least_recently_used: number = Number.MAX_SAFE_INTEGER;
let least_recently_used_index: number = 0;
for (const [i, way] of set.entries()) {
if (way.tag === null) {
this.data[setIndex][i] = {
tag: address,
last_used: this.generation,
};
return;
} else {
if (way.last_used < least_recently_used) {
least_recently_used = way.last_used;
least_recently_used_index = i;
}
}
}
this.evictions += 1;
this.data[setIndex][least_recently_used_index] = {
tag: address,
last_used: this.generation,
};
}
getSetIndex(address: number) {
const set = (address >> 6) & 0b111111;
return set;
}
}
This most important part of this is probably the access
method. Given a main-memory address this method looks
up the set, measures a hit / miss and evicts where necessary.
So what does a program look like? As I said before, the program is represented with a javascript generator. An example program might look like this:
export function linearAccess(
offset: number,
n: number,
stride: number,
loops: number
) {
return function* () {
let j = 0;
while (j < loops) {
let i = 0;
while (i < n) {
yield offset + i * stride;
i += 1;
}
j += 1;
}
};
}
And indeed that is the program used in the previous post: consecutive memory access modulo some stride. The “program” simply yields addresses.
This is basically equivalent to an actual program you might write with the
arr living at some offset in memory:
let arr = [];
let j = 0;
while (j < loops) {
let i = 0;
while (i < n) {
arr[i * stride];
i += 1;
}
j += 1;
}
Aside: I wonder if there is some clever trick I could play with javascript proxies to write effectively that but get addresses?
Okay, cool. What I’ve not shown yet is how the access function of the
Cache is invoked…how is our simulation actually run? I have wrapped
the simulation in a CacheView.svelte component. I will only show
the <script> part of the component…the rest is simply the HTML
to draw the table and styling:
<script lang="ts">
import { frame } from '$lib/frame';
import type { Cache } from './cache';
export let cache: Cache;
let scroll: boolean = true;
let run_id: number = 0;
async function run() {
const id = (run_id += 1);
console.log(`Thread ${id}: starting`);
cache.reset(id);
for (const address of cache.program()) {
await frame();
if (run_id !== id || cache.id !== id) {
console.log(`Thread ${id}: exiting`);
return;
}
cache.access(address);
cache.data = [...cache.data];
}
}
function reset() {
run_id = 0;
cache.reset(0);
cache.data = [...cache.data];
}
</script>
A couple of immediate things to note:
- We have a
runfunction. Calling this starts the simulation. - The
runfunction isasync
For the dual purposes of not blocking the main thread
and “animating” the simulation, the run function is
async. Within the for loop the first thing we do
is await frame(). What is frame? It’s simply a
wrapper around requestAnimationFrame in a a way that
is lovely to use from an async function:
export function frame(): Promise<number> {
return new Promise((resolve) => requestAnimationFrame(resolve));
}
So the operation goes like this:
- The user clicks the
Runbutton which kicks off therunfunction. - The
runfunction then iterates over thecache.program(again a generator) whichyields main-memory addresses. - After waiting for an animation frame the address is passed to
cache.accesswhich invokes the logic of the cache. - The
CacheViewthen redraws using the usualsveltereactivity:
...
<div class="buttons">
<button on:click={(ev) => run()}>Run</button>
<button on:click={(ev) => reset()}>Reset</button>
</div>
<div>
<div>Hits: {cache.hits}</div>
<div>Misses: {cache.misses}</div>
<div>
Miss rate: {cache.misses + cache.hits === 0
? '-'
: ((cache.misses / (cache.misses + cache.hits)) * 100).toFixed(2)}%
</div>
<div>Evictions: {cache.evictions}</div>
</div>
...
Pretty straightforward and very neat I think, hence I am writing about it.
There are a some changes I would like to make:
- Given the
await frame()after every memory address is yielded the simulation is stuck at 60-addresses per second (assuming your browser is drawing at 60 FPS). For a larger example this is probably a bit slow, so I want to have some multiplier (i.e. N number of accesses per frame). - It would be nice to step through a simulation rather than the continuous run until completion.
- I think it could generally use some colour / animation.
With those changes in place I’ll be ready for part 2 of the series. Stay tuned.