After three grad classes that covered virtually-indexed, physically-tagged (VIPT) caches, I think I might understand them. A good test of that is seeing if I can explain them.
Starting from the beginning, then.
Accessing main memory is slow relative to the speed of a processor cycle, so finding ways to speed up load/store instructions can significantly increase performance. Caches are a great way to do this: we put a small cache (or series of caches) in the processor and, by some policy, keep some data there. Accessing a cache is orders of magnitude faster than accessing main memory.
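The idea can be sketched in a few lines. This is a toy model, not real hardware: the sizes, the dict-based "memory", and the eviction policy are all invented for illustration.

```python
# Toy cache: keep recently used data in a small fast store and fall
# back to "main memory" only on a miss. Sizes are made up.

MAIN_MEMORY = {addr: addr * 2 for addr in range(1024)}  # stand-in for DRAM

cache = {}          # small fast store: address -> data
CACHE_CAPACITY = 8  # deliberately tiny

def load(addr):
    """Return (data, hit) for a load at the given address."""
    if addr in cache:
        return cache[addr], True        # fast path: cache hit
    data = MAIN_MEMORY[addr]            # slow path: go to memory
    if len(cache) >= CACHE_CAPACITY:
        cache.pop(next(iter(cache)))    # crude eviction: drop oldest entry
    cache[addr] = data                  # fill the cache for next time
    return data, False

_, hit_first = load(42)    # first access misses...
_, hit_second = load(42)   # ...the repeat access hits
```

Real hardware caches are organized into sets and lines rather than arbitrary key/value pairs, but the hit/miss/fill cycle is the same.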
Virtual memory, something essentially all operating systems use, is tricky. Data is ultimately stored on a physical system at a particular physical address. Processes and the OS largely work with virtual addresses, so we need a way not only to translate virtual addresses to physical addresses, but also to do it quickly.
So, we re-use a favorite trick of computing systems performance: we introduce a cache, namely the translation lookaside buffer, also known as the TLB. This is just a cache of virtual-to-physical address mappings, so that we don’t constantly need to load the mappings from the page tables, which reside in memory.
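A TLB lookup can be sketched like so, assuming 4 KiB pages (so 12 offset bits); the mappings and sizes here are made up for illustration.

```python
# Toy TLB: a cache of virtual page number -> physical frame number.
# Assumes 4 KiB pages; all mappings are invented.

PAGE_OFFSET_BITS = 12

tlb = {0x00007: 0x1A2, 0x00008: 0x0F3}  # VPN -> PFN

def translate(vaddr):
    """Translate a virtual address, or return None on a TLB miss."""
    vpn = vaddr >> PAGE_OFFSET_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)  # offset within the page
    pfn = tlb.get(vpn)
    if pfn is None:
        return None                                 # miss: walk the page tables
    return (pfn << PAGE_OFFSET_BITS) | offset       # physical address
```

Note that only the page number is translated; the offset within the page passes through unchanged, a fact the VIPT design below leans on.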
So how does a cache work with the TLB? Let’s consider what needs to happen if we make a load request. With a physically-indexed cache, the steps are sequential: first we translate the virtual address through the TLB, and only then can we use the resulting physical address to index into the cache.
Our problem is that we now have to check a series of caches one after another. This leads to the idea of a VIPT cache: what if we figure out how to overlap the cache and TLB lookups?
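The sequential flow is worth making explicit. In this sketch (the helper names are hypothetical), the cache lookup cannot even begin until the TLB lookup has finished, because a physically-indexed cache needs the physical address first:

```python
# The sequential dependency: step 2 cannot start until step 1 finishes.

def load_sequential(vaddr, tlb_lookup, cache_lookup):
    paddr = tlb_lookup(vaddr)   # step 1: translate (TLB, maybe a page walk)
    return cache_lookup(paddr)  # step 2: only now can we index the cache
```

The two lookup latencies add up. The VIPT design exists to hide one behind the other.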
So, if we’re going to do the lookups in parallel, we need to index into both the cache and the TLB using bits of the virtual address. But different OS processes can use the same virtual address to refer to different physical data, so we need an additional check to make sure that the entry we retrieve from the cache actually belongs to the physical address our request translates to.
So, we add some more metadata to each cache entry. Alongside the data and the usual bookkeeping bits (e.g. a valid bit), we’ll also store a tag. In particular, we store the physical tag: the upper bits of the physical address, which is essentially what the TLB lookup produces.
What does this gain us? We can now do the lookups in parallel. Once both are done, we do a quick check – does the tag stored in the cache match the physical address returned by the TLB lookup? If so, great – it was a cache hit. If not, it was a miss, and we’ll have to handle that.
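Putting the pieces together, here is a minimal model of the VIPT hit check. It assumes 4 KiB pages and a direct-mapped cache whose index bits fall entirely inside the page offset (so the "virtual" index bits are identical to the physical ones); all sizes and mappings are invented for illustration.

```python
# Toy VIPT cache: index with virtual-address bits, confirm with the
# physical tag from the TLB. Assumes 4 KiB pages, 64-byte lines,
# 64 sets, so the index fits entirely within the page offset.

PAGE_OFFSET_BITS = 12
LINE_BITS = 6    # 64-byte lines
INDEX_BITS = 6   # 64 sets

def split_vaddr(vaddr):
    index = (vaddr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    vpn = vaddr >> PAGE_OFFSET_BITS
    return index, vpn

def vipt_load(vaddr, cache, tlb):
    """cache: index -> (physical tag, data); tlb: vpn -> pfn."""
    index, vpn = split_vaddr(vaddr)
    entry = cache.get(index)    # these two lookups can proceed
    pfn = tlb.get(vpn)          # in parallel in hardware
    if entry is None or pfn is None:
        return None             # cache miss or TLB miss
    stored_tag, data = entry
    return data if stored_tag == pfn else None   # the tag check

tlb = {0x7: 0x1A2, 0x8: 0x0F3}
cache = {0x2: (0x1A2, "hello")}  # line at index 2, tagged with PFN 0x1A2
```

With these made-up tables, a load at `0x7080` (VPN `0x7`, index `0x2`) hits, while `0x8080` picks the same cache index but translates to a different physical frame, so the tag check correctly rejects it.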
What does this cost us? Some additional hardware (the cache needs to store a tag) and some complexity (we need to check the tag).