Detecting Changes in the File System in Real Time with Watchdog

/images/detecting-changes-in-the-file-system-in-real-time-with-watchdog/watchdog-tk-python.gif

(The source code for this program is at the bottom.)

Watchdog is a cross-platform Python library that allows you to monitor file system events in real time. It is very useful for automating tasks in which our program must execute an operation whenever a file is created, modified, moved or deleted. Let's see how it works by creating a simple program that logs events for files in a folder.

Before we start, let's install the package via pip by running in the terminal:

python -m pip install watchdog

(Windows users should use py instead of python.)

Let's verify that it was installed correctly by typing in the interactive shell:

>>> import watchdog

If it doesn't throw any errors, the installation was successful.

There are two main concepts in Watchdog: the observer and the event handler. The observer is a class in charge of monitoring the events that have occurred in one or several directories. When the observer detects an event in the monitored folder, it dispatches the event to another class called the event handler. Usually our sole task will be to implement the event handler, to be able to respond with our own code to a creation, deletion, modification or movement of a file, while the observer is provided by Watchdog via the watchdog.observers.Observer class.

Considering these two concepts, let's start with some basic code that prints a message to the console when a file is modified in the directory where the program is running:

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class MyEventHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print(event.src_path, "modified.")
observer = Observer()
observer.schedule(MyEventHandler(), ".", recursive=False)
observer.start()
try:
    while observer.is_alive():
        observer.join(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()

After defining the handler class, we create an instance of the observer and, via the schedule() method, assign a handler (MyEventHandler) to respond to the events that occur in the current folder ("."), without considering subfolders (recursive=False). If we wanted to monitor, for example, all events for every file and folder on a drive, we could write:

# Monitor all events on local drive C: (on Windows).
observer.schedule(MyEventHandler(), "C:\\", recursive=True)

Next we call the start() method, which starts the observer. It is important to note that the observer runs in a secondary thread (hence the start() method, inherited from the standard threading.Thread class), and consequently our event handler will also be called from that secondary thread.

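The try/except block keeps the main thread waiting while the observer is alive and allows it to be stopped by pressing CTRL + C, which raises KeyboardInterrupt on Linux, macOS and Windows alike. The final observer.join() call waits for the observer thread to finish before terminating the program.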
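To see the secondary thread in action, here is a minimal sketch that records the name of the thread in which the handler is called. The names ThreadReportingHandler and handler_thread_name are made up for this example, the temporary folder stands in for a real directory, and the 5-second timeout is an arbitrary choice.

```python
import tempfile
import threading
from pathlib import Path

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class ThreadReportingHandler(FileSystemEventHandler):
    """Hypothetical handler that remembers which thread delivered an event."""

    def __init__(self):
        super().__init__()
        self.thread_name = None
        self.got_event = threading.Event()

    def on_any_event(self, event):
        # This code runs in the observer's thread, not in the main one.
        self.thread_name = threading.current_thread().name
        self.got_event.set()


def handler_thread_name(timeout=5.0):
    """Return the name of the thread that called the handler (None on timeout)."""
    handler = ThreadReportingHandler()
    with tempfile.TemporaryDirectory() as folder:
        observer = Observer()
        observer.schedule(handler, folder, recursive=False)
        observer.start()
        try:
            # Trigger an event by creating a file in the watched folder.
            Path(folder, "example.txt").write_text("hello")
            # Wait until the handler reports back or the timeout expires.
            handler.got_event.wait(timeout)
        finally:
            observer.stop()
            observer.join()
    return handler.thread_name


if __name__ == "__main__":
    print("Main thread:   ", threading.main_thread().name)
    print("Handler thread:", handler_thread_name())
```

The two names printed should differ, which is why anything the handler shares with the main thread (a list, a widget, a file) needs some form of synchronization.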

Going back to the beginning of the code, we create the class MyEventHandler, inheriting from watchdog.events.FileSystemEventHandler, which is a base class provided by Watchdog to implement our own event handlers. The on_modified() method will be invoked by Watchdog every time a file is modified within the directory we have scheduled in the observer (".", which represents the current working directory, that is, the directory from which the program is run). The event argument will be an instance of watchdog.events.FileSystemEvent, which primarily has the following attributes:

  • is_directory, a boolean indicating whether the object raising the event is a folder;

  • src_path, the path of the event file or folder.

Thus, in the code we use event.src_path to print the path of the file or directory that has been modified. To know when a file system object has been created, moved or deleted, we have the on_created(), on_moved() and on_deleted() methods.

class MyEventHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print(event.src_path, "modified.")
    def on_created(self, event):
        print(event.src_path, "created.")
    def on_moved(self, event):
        print(event.src_path, "moved to", event.dest_path)
    def on_deleted(self, event):
        print(event.src_path, "deleted.")

Note that in the on_moved() method the event has an additional attribute, dest_path, which indicates the new path of the moved file or folder. In practice, however, on_moved() is typically called when a file or folder is renamed, that is, when both the old and the new path lie within the monitored location. When a file is moved to a location outside the monitored directory, the event received is on_deleted(), just like when it is deleted. Conversely, when a file is moved from another location into the directory we are monitoring with Watchdog, the event is on_created(), like when a new file is created.
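A handler can be tried out without touching the disk by passing it hand-made events. The sketch below assumes that calling the handler's dispatch() method directly is acceptable for experimentation (it is the method the observer invokes for each event). RecordingHandler and the file names are invented for the example. It also uses is_directory to skip folder events, since on some platforms a folder raises its own "modified" event when its contents change.

```python
from watchdog.events import (
    DirModifiedEvent,
    FileCreatedEvent,
    FileModifiedEvent,
    FileMovedEvent,
    FileSystemEventHandler,
)


class RecordingHandler(FileSystemEventHandler):
    """Hypothetical handler that stores messages instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def on_created(self, event):
        self.messages.append(f"{event.src_path} created")

    def on_modified(self, event):
        # Ignore events raised by folders; only report files.
        if event.is_directory:
            return
        self.messages.append(f"{event.src_path} modified")

    def on_moved(self, event):
        # Moved events carry both the old and the new path.
        self.messages.append(f"{event.src_path} moved to {event.dest_path}")


handler = RecordingHandler()
# Made-up events, dispatched by hand instead of by an observer.
for event in (
    FileCreatedEvent("notes.txt"),
    FileModifiedEvent("notes.txt"),
    DirModifiedEvent("."),
    FileMovedEvent("notes.txt", "ideas.txt"),
):
    handler.dispatch(event)

for message in handler.messages:
    print(message)
```

Three messages are recorded: the folder event is dropped by the is_directory check.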

It is common to want only the file name of the event, without the full path. For this we can use the attributes of the standard pathlib.Path class:

from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class MyEventHandler(FileSystemEventHandler):
    def on_modified(self, event):
        filename = Path(event.src_path).name
        print(filename, "modified.")

Or to get just the path, without the file name:

    def on_modified(self, event):
        path = str(Path(event.src_path).parent)
        print(path, "modified.")
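Other parts of the path are available in the same way. The following sketch needs only the standard library: split_event_path is a hypothetical helper, and the path is a made-up value standing in for event.src_path. PurePosixPath is used here only so that the result is the same on every operating system. Inside a handler, Path(event.src_path) works as shown above.

```python
from pathlib import PurePosixPath


def split_event_path(src_path):
    """Break a path into the pieces most often needed in a handler."""
    path = PurePosixPath(src_path)
    return {
        "folder": str(path.parent),  # Path without the file name.
        "name": path.name,           # File name with extension.
        "stem": path.stem,           # File name without extension.
        "suffix": path.suffix,       # Extension, including the dot.
    }


parts = split_event_path("/home/user/docs/notes.txt")
for key, value in parts.items():
    print(key, "=", value)
```

The suffix is handy, for example, when a handler should react only to files of a certain type.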

There is also the on_any_event() method, which is executed, as the name indicates, when any of the previous events occurs:

class MyEventHandler(FileSystemEventHandler):
    def on_any_event(self, event):
        print("An event has occurred.")

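This method is useful when we want to execute some action common to all events. The event.event_type attribute indicates the registered event type. Its most common values are listed below (depending on the installed version, Watchdog may also report other types, such as EVENT_TYPE_OPENED and EVENT_TYPE_CLOSED):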

  • watchdog.events.EVENT_TYPE_CREATED

  • watchdog.events.EVENT_TYPE_DELETED

  • watchdog.events.EVENT_TYPE_MODIFIED

  • watchdog.events.EVENT_TYPE_MOVED

It is not possible to cancel or modify any of the registered events. Watchdog only allows you to monitor file system operations, not to alter them.
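As a small illustration of on_any_event() and event_type, the sketch below tallies events by type. CountingHandler and the file name are invented for the example, and the events are dispatched by hand rather than by an observer so that the result does not depend on the file system.

```python
from collections import Counter

from watchdog.events import (
    EVENT_TYPE_CREATED,
    EVENT_TYPE_MODIFIED,
    FileCreatedEvent,
    FileModifiedEvent,
    FileSystemEventHandler,
)


class CountingHandler(FileSystemEventHandler):
    """Hypothetical handler that counts how many events of each type arrive."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def on_any_event(self, event):
        # event_type is a plain string such as "created" or "modified".
        self.counts[event.event_type] += 1


handler = CountingHandler()
# Made-up events standing in for what an observer would deliver.
for event in (
    FileCreatedEvent("report.txt"),
    FileModifiedEvent("report.txt"),
    FileModifiedEvent("report.txt"),
):
    handler.dispatch(event)

print(handler.counts[EVENT_TYPE_CREATED], "created,",
      handler.counts[EVENT_TYPE_MODIFIED], "modified")
```

Because a Counter returns zero for missing keys, event types that never occurred can be queried without extra checks.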

Sample Code: Log File System Events with Tk

The following code (preview at the beginning of the article) logs every operation performed in the folder where the program is run and displays it in a tree view within a Tk desktop application.

from pathlib import Path
from tkinter import ttk
import datetime
import queue
import tkinter as tk
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from watchdog.events import (
    EVENT_TYPE_CREATED,
    EVENT_TYPE_DELETED,
    EVENT_TYPE_MODIFIED,
    EVENT_TYPE_MOVED
)
class MyEventHandler(FileSystemEventHandler):
    def __init__(self, q):
        # Save a reference to the queue so it can be accessed
        # by on_any_event().
        self._q = q
        super().__init__()
    def on_any_event(self, event):
        # Figure out the name of the event. Any other event type
        # (some Watchdog versions also report, for example, "opened"
        # and "closed" events) is ignored instead of raising KeyError.
        action = {
            EVENT_TYPE_CREATED: "Created",
            EVENT_TYPE_DELETED: "Deleted",
            EVENT_TYPE_MODIFIED: "Modified",
            EVENT_TYPE_MOVED: "Moved",
        }.get(event.event_type)
        if action is None:
            return
        # If it is a movement, append the destination path.
        if event.event_type == EVENT_TYPE_MOVED:
            action += f" ({event.dest_path})"
        # Put the event information in the queue to be processed
        # by process_events() in the main thread.
        # (It is not safe to modify a Tk widget from a
        # secondary thread.)
        self._q.put((
            # Name of the affected file.
            Path(event.src_path).name,
            # Action executed on that file.
            action,
            # The current time.
            datetime.datetime.now().strftime("%H:%M:%S")
        ))
def process_events(observer, q, modtree):
    # Make sure the observer is still running.
    if not observer.is_alive():
        return
    try:
        # Try to get an event from the queue.
        new_item = q.get_nowait()
    except queue.Empty:
        # If there is no event, just continue.
        pass
    else:
        # If an event was retrieved from the queue, insert it
        # into the treeview.
        modtree.insert("", 0, text=new_item[0], values=new_item[1:])
    # Check again in half a second (500 ms).
    root.after(500, process_events, observer, q, modtree)
root = tk.Tk()
root.config(width=600, height=500)
root.columnconfigure(0, weight=1)
root.rowconfigure(0, weight=1)
root.title("Real Time Event Logging")
modtree = ttk.Treeview(columns=("action", "time",))
modtree.heading("#0", text="File")
modtree.heading("action", text="Action")
modtree.heading("time", text="Time")
modtree.grid(column=0, row=0, sticky="nsew")
# Watchdog event observer.
observer = Observer()
# This queue acts as a communication channel between the observer
# and the Tk application. For a more detailed explanation about
# queues and Tk, see https://pythonassets.com/posts/background-tasks-with-tk-tkinter/
q = queue.Queue()
observer.schedule(MyEventHandler(q), ".", recursive=False)
observer.start()
# Schedule the function that processes the observer events.
root.after(1, process_events, observer, q, modtree)
root.mainloop()
observer.stop()
observer.join()
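The program above takes one event from the queue every 500 ms. When many events arrive together (saving a single file often produces several), a possible variant is to empty the queue on each pass. The sketch below shows that idea in isolation, using only the standard library: drain and produce are hypothetical names, and the producer thread stands in for the event handler running in the observer thread.

```python
import queue
import threading


def drain(q):
    """Return every item currently in the queue, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            # Nothing left for now; hand back what was collected.
            return items


def produce(q, count):
    # Stands in for the event handler called from the observer thread.
    for number in range(count):
        q.put((f"file{number}.txt", "Modified"))


q = queue.Queue()
producer = threading.Thread(target=produce, args=(q, 3))
producer.start()
# Wait for the producer only to keep the example deterministic; in the
# real program the observer keeps running while the queue is polled.
producer.join()

pending = drain(q)
for name, action in pending:
    print(name, action)
```

In the Tk program, process_events() could run a loop like this one and insert each retrieved row into the tree view before rescheduling itself with root.after().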