Description, Design a logger system that receive stream of messages along with its timestamps, each message should be printed if and only if it is not printed in the last 10 seconds.

Given a message and a timestamp (in seconds granularity), return true if the message should be printed in the given timestamp, otherwise returns false.

It is possible that several messages arrive roughly at the same time.

Example:

Logger logger = new Logger();

// logging string "foo" at timestamp 1
logger.shouldPrintMessage(1, "foo"); returns true; 

// logging string "bar" at timestamp 2
logger.shouldPrintMessage(2,"bar"); returns true;

// logging string "foo" at timestamp 3
logger.shouldPrintMessage(3,"foo"); returns false;

// logging string "bar" at timestamp 8
logger.shouldPrintMessage(8,"bar"); returns false;

// logging string "foo" at timestamp 10
logger.shouldPrintMessage(10,"foo"); returns false;

// logging string "foo" at timestamp 11
logger.shouldPrintMessage(11,"foo"); returns true;

Native HashMap solution, O(1) timeComplexity

the algorithm is described as below,

Initiate a hashmap, where the key is the log content and the value is the log’s timestamp.

for every new incoming log, we check if it is already in the hashmap.

if no, we know it is a newly appeared log. we print it and store it into the hashmmap.

if yes, we check the timestamp of the log in hashmap. if the time difference is greater than 10-min, we print the log, and do an update to set the value to current timestamp. if the time difference is smaller than 10-min, we do not print the log, and do nothing.

This data structure works well if you are lucky that most of the logs you recieved are having the same content, so that the hashtable won’t grow big.

When the hashmap grow big the collision will happen, and it starts to degereate to linkedlists (or BSTs in Java 8).

Doubly LinkedList + HashMap solutions O(n) timeComplexity

So ideally, we want to limit the size of the hashmap. In other word, the problem is demanding a data structure that,

  1. we maintain a timeline linkedlist that all logs are wrapped as object and get stored in chronical order.
  2. we also maintain a hashmap where the key is the log content, the value is a ref to the log object stored in the linkedlist.
  3. for every new incoming log, we need to check whether it is in the timeline list by looking up the hashmap in O(1) time complexity. If no, print it and add it to the end of the timeline list. If yes, we check the timestamp difference, if it is smaller than 10-min, we do not print the newly received log, and do nothing. if it is greater than 10-min, we do two things – 1) all logs before the existinglog (including it) are older than 10-min, we remove them from the timeline list by setting the linkedlist HEAD to the existing log’s next log object in O(1) time complexity. and we shrink the hashmap by removing all the older logs’s reference out from the hashmap in O(n) time complexity, where n is the size of the current hashmap. 2) we print the log and add it to the end of the timeline list.

This solution is basically avoid store the full logs, which most of them could be useless.

The performance bottleneck is the hashmap shrink, which is of O(n) and blocks the incoming log handling.

class DoubledLL(object):
    def __init__(self, val, message):
        """
        Doubled LinkedList
        """
        self.prev = None
        self.next = None
        self.val = val
        self.message = message

class Logger(object):

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.hm = dict()
        self.head = DoubledLL(0, 'HEAD')
        self.tail = DoubledLL(0, 'TAIL')
        self.head.next = self.tail
        self.tail.prev = self.head
        
    def shouldPrintMessage(self, timestamp, message):
        """
        Returns true if the message should be printed in the given timestamp, otherwise returns false.
        If this method returns false, the message will not be printed.
        The timestamp is in seconds granularity.
        :type timestamp: int
        :type message: str
        :rtype: bool
        """

        if message not in self.hm:
            self._updateHMappendNodeToTail(message, timestamp)
            return True # logging it
        else:
            existingNode = self.hm.get(message)
            timestampToCut = existingNode.val
            if timestamp - timestampToCut >= 10:
                self._setNewHeadandCleanHM(existingNode)
                self._updateHMappendNodeToTail(message, timestamp)
                return True
            else:
                return False

                
    def _updateHMappendNodeToTail(self, message, timestamp): 
        newNode = DoubledLL(timestamp, message)
        self.hm[message] = newNode
        
        #adding node to the tail
        self.tail.prev.next = newNode
        newNode.prev = self.tail.prev
        self.tail.prev = newNode
        newNode.next = self.tail

    
    def _setNewHeadandCleanHM(self, existingNode):
        #remove all node before it from the hashmap
        cur = self.head.next
        while cur != existingNode:
            self.hm.pop(cur.message)
            cur = cur.next
        
        # GC will clean the nodes before it
        self.head.next = existingNode.next
        existingNode.next.prev = self.head
        
        


# Your Logger object will be instantiated and called as such:
# obj = Logger()
# param_1 = obj.shouldPrintMessage(timestamp,message)

imgRuntime: 232ms

Doubly LL + Weakref Data Caching O(1) time complexity per query

Using Weakref to do the data caching.

import weakref

class DoublyLLNode(object):
    def __init__(self, timestamp, message):
        self._timeStamp = timestamp
        self._message = message
        self.nxt = None
        self.prev = None

class Logger(object):

    def __init__(self):
        """
        Initialize your data structure here.
        """
        self._cache = weakref.WeakValueDictionary()
        self.HEAD = DoublyLLNode(0, "HEAD")
        self.TAIL = DoublyLLNode(0, "TAIL")
        self.HEAD.nxt = self.TAIL
        self.TAIL.prev = self.HEAD

    def shouldPrintMessage(self, timestamp, message):
        """
        Returns true if the message should be printed in the given timestamp, otherwise returns false.
        If this method returns false, the message will not be printed.
        The timestamp is in seconds granularity.
        :type timestamp: int
        :type message: str
        :rtype: bool
        """
        
        if message in self._cache:
                tempNode = self._cache[message]
                if timestamp - tempNode._timeStamp >= 10:
                    self._resetHead(tempNode)
                    node = DoublyLLNode(timestamp, message)
                    self._addNodeToTail(node)
                    return True
                else:
                    return False                                    
        else:
            node = DoublyLLNode(timestamp, message)
            self._addNodeToTail(node)
            return True
        
    def _resetHead(self, node):
        """
        : method that sets new head of the doubly ll to node.nxt of input node.
        : type node: DoublyLLNode
        : rtype    : void
        """
        self.HEAD.nxt = node.nxt
        node.nxt.prev = self.HEAD
    
    def _addNodeToTail(self, node):
        """
        : method that add input node to the end of the doubly ll.
        : type node: DoublyLLNode
        : rtype    : void
        """
        self.TAIL.prev.nxt = node
        node.prev = self.TAIL.prev
        self.TAIL.prev = node
        node.nxt = self.TAIL
        self._cache[node._message] = node
        

# Your Logger object will be instantiated and called as such:
# obj = Logger()
# param_1 = obj.shouldPrintMessage(timestamp,message)

imgRuntime: 282ms