2018-06-30

LRU与LFU缓存算法

一、背景

缓存算法也是也是我们日常使用的操作系统、应用程序内部用得比较多的一种调度算法，之前也是了解个过程没具体实现过，刚好LintCode上面刷题看到这两个算法，所以写这篇博客来整理一下LRUCache算法和LFUCache算法的过程和实现。

二、LRUCache算法

1.简单介绍

LRU(Least Recently Used)即最少最近使用算法，这个算法就是把每次都把最近访问或者添加的数据放到一端，当这个存放数据的容器满时，就从另一端开始移除元素，被移除的元素一定是近期没有使用或者较少使用的，其实核心也就是这些概念，可以看出，每次操作容器中的元素时都会有元素的位置改变这样一个额外的操作。

2.具体实现

因为整个容器是有一定的顺序的，要满足这个算法的规则，并且我们也需要快速定位来访问到元素，所以这里的实现的话使用哈希表加双链表的方式来实现，其实Java的标准库就刚好有这样一种数据结构，也就是LinkedHashMap，不过为了更清晰的了解解这个过程，还是自己去做具体的流程控制比较好。所以这里使用HashMap加上自己写的一个辅助类Node来实现。如下

// 双向链表
private class Node {
    Node prev;
    Node next;
    int key;
    int value;
    public Node(int key, int value) {
        this.key = key;
        this.value = value;
        prev = next = null;
    }
}

这个Node类很直观了，用来表示容器中的每一个元素，构造方法传入的就是键值对，然后里面还存储了两个分别指向前一个节点和后一个节点的指针，然后我们还使用一个虚拟的头节点和尾节点方便我们进行访问。实现的话就是实现get()和set()两个方法了，有几个注意点，当get()一个元素，除了获取这个元素对应的value，还需要将这个元素从当前位置移动到整个容器的尾部；而进行set()操作时如果元素存在就更新这个元素的value，然后需要判断当前容器有没有满，满的话就从头节点移除一个元素，然后把新添加的元素移动到底部。这里的实现就是容器底部保留的是最近使用的元素，而顶部就存储的使用较少或没有使用过的元素，可以看到每次操作元素都需要进行一个移动到底部的操作。对应下面的代码

private void moveToTail(Node cur) {
        tail.prev.next = cur;
        cur.prev = tail.prev;
        cur.next = tail;
        tail.prev = cur;
    }

了解链表的操作最好的方式就是去画一下图，这里也就是把传进来的节点放到尾节点的前面，前面说过头节点和尾节点都是虚拟的，所以容器里面真实存储的元素实际上是在第二个节点到倒数第二个节点的这一段，因为是双链表，所以next和prev指针都要更新一下。整个过程也没有太多的操作，思路还是比较简单的，就是链表的维护看起来没那么直观，理解不清楚画一下就出来了，然后看看全部的代码,对应LintCode上面第134号题，重要的地方都有注释

public class LRUCache {
    // 双向链表
    private class Node {
        Node prev;
        Node next;
        int key;
        int value;
        public Node(int key, int value) {
            this.key = key;
            this.value = value;
            prev = next = null;
        }
    }
    private int capacity;
    private Map<Integer, Node> map;
    // 虚拟头节点
    private Node head;
    // 虚拟尾节点
    private Node tail;
    public LRUCache(int capacity) {
        // do intialization if necessary
        this.capacity = capacity;
        this.map = new HashMap<>();
        head = new Node(-1, -1);
        tail = new Node(-1, -1);
        head.next = tail;
        tail.prev = head;
    }
    public int get(int key) {
        // write your code here
        if (map.containsKey(key)) {
            Node cur = map.get(key);
            // 查询到该节点后后取出该节点 移到底部
            cur.prev.next = cur.next;
            cur.next.prev = cur.prev;
            cur.prev = null;
            cur.next = null;
            moveToTail(cur);
            return map.get(key).value;
        } else {
            return -1;
        }
    }
    public void set(int key, int value) {
        // write your code here
        if (get(key) != -1) {
            map.get(key).value = value;
            return;
        }
        // 如果缓存容器满了就从顶部移除 (最近使用的都移动到了底部)
        // 因为头部和尾部都是虚拟节点 所以真实的元素是从head.next开始
        if (map.size() == capacity) {
            map.remove(head.next.key);
            head.next = head.next.next;
            head.next.prev = head;
        }
        Node newNode = new Node(key, value);
        map.put(key, newNode);
        moveToTail(newNode);
    }
    // 将元素移动到底部
    private void moveToTail(Node cur) {
        tail.prev.next = cur;
        cur.prev = tail.prev;
        cur.next = tail;
        tail.prev = cur;
    }
}

三、LFUCache缓存算法

1.简单介绍

LFU（Least Frequently Used)即最近不经常使用算法。这个算法就是核心是记录每一个元素的访问频率，当容器满了的时候，每次移出的都是访问频率较小的元素，如果两个元素的频率相同的话，就先移除添加容器较早的元素。

2.具体实现

对应LintCode上面24号题，这里给出两种实现的方式，第一种是直接用一个Node类保留要查找的键对应的value值、元素访问的次数
和上次访问的时间，感觉没有太多说的，重点就是记录访问频率，直接上代码

public class LFUCache1 {
    private class Node {
        int value;
        // 使用的次数
        int useCount;
        // 上次访问的时间
        long lastGetTime;
        public Node(int value, int useCount) {
            this.value = value;
            this.useCount = useCount;
        }
    }
    private Map<Integer, Node> cache;
    private int capacity;
    public LFUCache1(int capacity) {
        // do intialization if necessary
        this.capacity = capacity;
        this.cache = new HashMap<>();
    }
    public void set(int key, int value) {
        // write your code here
        if (get(key) != -1) {
            cache.get(key).value = value;
            return;
        }
        if (capacity == 0) return;
        // 容器满时移除访问频率最小的元素
        if (cache.size()>=capacity) {
            removeMin();
        }
        Node node = new Node(value, 0);
        node.lastGetTime = System.nanoTime();
        cache.put(key, node);
    }
    public int get(int key) {
        // write your code here
        if (!cache.containsKey(key)) {
            return -1;
        }
        cache.get(key).useCount++;
        cache.get(key).lastGetTime = System.nanoTime();
        return cache.get(key).value;
    }
    // 移除使用频率最小的元素
    private void removeMin() {
        int minCount = Integer.MAX_VALUE;
        long currTime = System.nanoTime();
        int minKey = 0;
        Iterator<Integer> iterator = cache.keySet().iterator();
        while (iterator.hasNext()) {
            int key = iterator.next();
            Node node = cache.get(key);
            if (node.useCount<minCount || (node.useCount==minCount && node.lastGetTime<currTime)) {
                minKey = key;
                minCount = node.useCount;
                currTime = node.lastGetTime;
            }
        }
        cache.remove(minKey);
    }
    public static void main(String[] args) {
        LFUCache1 lfuCache = new LFUCache1(3);
        lfuCache.set(1, 10);
        lfuCache.set(2, 20);
        lfuCache.set(3, 30);
        System.out.print("["+lfuCache.get(1)+", ");
        lfuCache.set(4, 40);
        System.out.print(lfuCache.get(4)+", ");
        System.out.print(lfuCache.get(3)+", ");
        System.out.print(lfuCache.get(2)+", ");
        System.out.print(lfuCache.get(1)+", ");
        lfuCache.set(5, 50);
        System.out.print(lfuCache.get(1)+", ");
        System.out.print(lfuCache.get(2)+", ");
        System.out.print(lfuCache.get(3)+", ");
        System.out.print(lfuCache.get(4)+", ");
        System.out.print(lfuCache.get(5)+"]");
        System.out.println();
        System.out.println(lfuCache.cache.size());
    }
}

这里可以跑一下，出来的结果是符合预期的，这种方式是将节点相关的信息保存在节点内部，并且就使用了HashMap来储存，缺点就是每次移除元素时需要扫一下整个容器，这个操作需要消耗的时间是不太能接受的，当数据量大，操作频繁时，就凸显出来了，不过放到LintCode上面跑也能通过。下面来看到一种比较高效的方式吧。

第二种方法也是把元素的访问次数封装到Node内部，但是使用一个额外的Map来记录节点的访问次数，访问次数作为键，LinkedHashSet作为值，LinkedHashSet里面记录的是元素对应的key，每个元素肯定是唯一的，并且LinkedHashSet是按照插入顺序来进行排列的，所以每次访问到某个元素时，先更新元素Node对应的频率后，还需要更新频率对应的LinkedHashSet，然后使用一个全局变量记录最小的访问频率min，当容器满的时候，就根据这个最小访问频率去获取LinkedHashSet里面对应的第一个键key，因为访问频率相同时，优先移除较早添加的元素，获得这个key后就可以直接去保存Node的容器里面去进行删除基本就是这么一个过程，这样在容器里面删除元素时不需要扫描整个容器，只需要O(1)的复杂度就可以完成这个操作，具体的实现还有一点需要注意的，就是最小访问频率min为更新之前的次数，并且当对应频率的LinkedHashSet为空时，才需更新值为当前的次数，此外每次添加节点时也需要把这个值置为0，因为新节点访问次数默认是0。
重要的地方都说清楚了，然后看看全部的代码

public class LFUCache {
    
    private class Node {
        int value;
        int count;
        public Node(int value, int count) {
            this.value = value;
            this.count = count;
        }
    }
    private int capacity;
    private Map<Integer, Node> cache;
    private Map<Integer, LinkedHashSet<Integer>> freqList;
    private int min;
    public LFUCache(int capacity) {
        // do intialization if necessary
        this.capacity = capacity;
        this.cache = new HashMap<>();
        this.freqList = new HashMap<>();
        freqList.put(0, new LinkedHashSet<>());
        min = -1;
    }
    public void set(int key, int value) {
        // write your code here
        if (get(key) != -1) {
            Node node = cache.get(key);
            node.value = value;
            return;
        }
        // 当容器缓存满时 这段时间使用频率最小的key
        if (cache.size() == capacity) {
            Integer evict = freqList.get(min).iterator().next();
            cache.remove(evict);
            freqList.get(min).remove(evict);
        } 
        // 添加新节点时更新min
        min = 0;
        Node newNode = new Node(value, 0);
        cache.put(key, newNode);
        freqList.get(0).add(key);
    }
    public int get(int key) {
        // write your code here
        if (capacity == 0) return -1;
        if (cache.containsKey(key)) {
            Node node = cache.get(key);
            node.count++;
            // 更新节点对应的频率表
            freqList.get(node.count-1).remove(key);
            if (!freqList.containsKey(node.count)) {
                freqList.put(node.count, new LinkedHashSet<>());
            }
            freqList.get(node.count).add(key);
            // 如果之前对应的频率表为空 就更新最小频率为当前的key
            if (min==node.count-1 && freqList.get(min).isEmpty()) {
                min = node.count;
            }
            return node.value;
        } else {
            return -1;
        }
    }
    public static void main(String[] args) {
          LFUCache lfuCache = new LFUCache(3);
          lfuCache.set(1, 10);
          lfuCache.set(2, 20);
          lfuCache.set(3, 30);
          System.out.print("["+lfuCache.get(1)+", ");
          lfuCache.set(4, 40);
          System.out.print(lfuCache.get(4)+", ");
          System.out.print(lfuCache.get(3)+", ");
          System.out.print(lfuCache.get(2)+", ");
          System.out.print(lfuCache.get(1)+", ");
          lfuCache.set(5, 50);
          System.out.print(lfuCache.get(1)+", ");
          System.out.print(lfuCache.get(2)+", ");
          System.out.print(lfuCache.get(3)+", ");
          System.out.print(lfuCache.get(4)+", ");
          System.out.print(lfuCache.get(5)+"]");
          System.out.println();
          System.out.println(lfuCache.cache.size());
    }
}

可以跑一下，也是一样的结果，比起第一种方法有更高的速度，虽然维护了多的结构，但是影响不是很大，我们对时间的需求一般都是要大的多，所以大多数情况下我们总是愿意以空间换时间。

四、总结

LRUCacahe和LFUCache都是比较经典的缓存算法了。LRUCache完全通过最后一次的访问来进行整体排序，所以同一时间访问较多的元素可能会被后面新来的的元素淘汰；LFUCache则是通过访问次数来进行，所以可能比较大一段时间内访问次数较多的元素会被短时间内访问更多的元素所淘汰。因为有着不同的特性，所以很多时候都是把两种缓存算法混合来使用。这篇博客就先到这里了。

赏

纯属好玩