Building and using Python Decorators

Published: 2023/3/12 Views: 2279

Categories:

Development

Be concise and elegant with Python decorators.

The Philosophy of Decorators

When writing programs, we often encounter situations where we have written a lot of functions with similar functionality, but we suddenly realize that we need to perform additional operations on these functions. Of course, we can write these additional operations in each function, but doing so would make the code redundant and difficult to maintain. In this case, we can use decorators to solve this problem.

The essence of a decorator is actually a function that can "decorate" other functions, that is, it can add additional functionality to them or monitor the execution of functions without changing the original function code.

Using Decorators

Decorators in Python are used with the @ symbol and come in two basic forms, with or without arguments.

Python has many built-in decorators, such as @dataclass to automatically implement the __init__ method when creating a class, or @property to implement getter and setter methods for properties. There are also decorators like @staticmethod and @classmethod to implement static methods and class methods.

Decorators with arguments include @app.route('/path') in the popular Flask web framework, which binds a function to a URL so that the function is executed when the user visits that URL.

Building Decorators

When I first encountered these decorators, I thought they were magical - just add a hat to your own function and you can achieve specific logic. Now that I have been learning this language for some time, I think I can explain the principles behind these decorators in my own words. I have always believed that using "Hello World" to explain the principles of a programming language is not enough, because it is too simple and people cannot understand the convenience that this syntax structure can bring in real use.

Data Download and Reading Requirements

For example, I have a complex function that retrieves data from a network database, performs some formatting, and passes the results to the next function for further processing. The code for this function is as follows:

Because I'm working on a data analysis project, my analysis code is constantly being improved, so I have to rerun this function every time I rewrite the analysis steps. Sometimes the data I'm retrieving is very large and the internet connection is slow, so it can take a long time to download the data.

My first thought was to modify the function to save the downloaded data to my local machine. If you've learned pandas before, you might be familiar with this method:

Then, when I need to call the data later, I can write a separate function to read the data:

This method temporarily solved my problem. However, as my project progressed, I found that I had to retrieve more and more data with different parameters. For example, for the get_data function mentioned above, the url and date range can change, and there may be other parameters as well. Do I have to write a separate read_data function for each data retrieval function?

Caching Mechanism

At this point, I thought of finding a way to abstract the logic of data storage and retrieval into a function, and then have each data retrieval function execute the storage and retrieval logic at the same time. This is the caching mechanism used by web browsers - when you visit a webpage, the browser first checks if there is a cache on the local device, and if so, it uses the cache directly. If not, it downloads it from the server. This reduces network requests and improves webpage loading speed.

In my data analysis project, I can do the same thing. When I pull the data for the first time, I save it locally, and the next time I pull the same data, I retrieve it directly from the local cache instead of pulling it from the network again. This reduces network requests and improves data retrieval speed.

There are some details to consider here. In the caching mechanism of a web browser, when the browser visits a URL, it first asks the server if the data corresponding to the URL has been updated. If it has, it will be re-downloaded, otherwise, it will use the local cache. This means that the browser first makes a small request to get information on whether the data has been updated. However, in my data retrieval process, I don't have access to the URL access process mechanism, so I have to find a way to determine whether the data has been downloaded. This is simple - I just save each set of parameters used in data retrieval and compare them to the previous set of parameters used. If they are the same, I retrieve the data from the local cache, otherwise, I re-download the data.

This method is not foolproof, as there may be cases where data is updated without any change to the parameters used to access it. However, for my current project, this method is sufficient. The logic has been thought out, and now it's time to implement it. For functions like this, the convenience of decorators is particularly evident.

Implementation

Storage and Retrieval

In Python, we can use pickle to store and retrieve data. pickle is a built-in serialization module in Python that can convert Python objects into byte streams, and then convert the byte stream back into Python objects.

The advantage of pickle is that storage and retrieval are inverse operations, which means that we can store any Python object locally and then retrieve it, without worrying about format conversion issues. Of course, the disadvantage is that it does not store data in a text format, so we cannot view it directly like we can with CSV files.

File Name Hashing

Sometimes, our parameters may contain special characters, or the function may contain unserializable objects, making it impossible to save the parameters directly in the file name. In this case, we can convert the parameters into a hash value. Hashing is a one-way encryption algorithm, which means that we can convert any string into a hash value, but we cannot reverse-engineer the original string from the hash value.

This type of conversion is widely used for password storage, as we can convert a user's password into a hash value and then store the hash value in a database, even if the database is hacked, the hacker cannot reverse-engineer the user's password from the hash value.

We use this conversion method in the caching logic to generate valid and non-repeating file names.

I want to emphasize here that this algorithm is not perfect, which means that different data may generate the same hash value, a situation known as hash collision. However, the probability of hash collisions is extremely small, and for my project, it can be ignored.

Here, I will write a simple hash function to convert the parameters into a hash value.

Writing Decorators

Let's start with a decorator without any arguments.

The usage of this decorator is like this:

After running the function once, we can find the cache data in the data/get_data directory in the working directory.

Now I have higher requirements. I want to specify the folder name for saving data, so we need to add arguments to the decorator. Let's modify the decorator we just wrote.

Now we can specify the cache directory like this:

"@" is Syntactic Sugar

Lastly, the @ symbol is a syntactic sugar in Python. Its purpose is to put the call to the decorator above the function definition, so that the original function doesn't have to be wrapped every time the function is called.

The following code is equivalent to using @ above the function definition:

Tags:

Development

Python

Decorator