Motivation: Improve Image Quality for Quantification and Presentation
I have worked extensively with image analysis problems on scientific image data. I always wondered how much the latest machine learning could help, but fully investigating how the latest models apply to my data seemed like too much of a side track from the task at hand. Now I have more time to improve my skills, and I have chosen to focus on learning via projects rather than coursework. Therefore, I set out to replicate the latest machine learning results on image enhancement via super-resolution neural networks, a technique I suspected could help other scientists who often work with images from scientific instruments but may not know as much about machine learning.
Finding the Model
For this project, I found the latest image resolution enhancement neural networks as described here. I chose to focus on testing the performance of the Densely Residual Laplacian Super Resolution Network (DRLN), which was released in late 2019 and showed good performance across many standard tests. I used a modified version of the DRLN code that was posted to Hugging Face because it appeared to be designed for ease of use, including helpful scripts for testing the code in Jupyter notebooks on Google Colab.
Finding the Data
I wanted to see if the features the DRLN learned from a standard dataset of everyday scenes could be used to enhance the resolution of scanning electron microscope (SEM) images. I chose to test this idea on SEM images because capturing high-resolution SEM images is often difficult when the sample is not ideally prepared for nanoscale imaging by electron bombardment. Sometimes, authors may wish the image could be slightly better for their publication. Additionally, a generous team released a batch of over 15,000 SEM images to the public for free use.
Testing the Model
I tested the applicability of the pre-trained DRLN on the SEM data by down-sampling the original images 4x using area interpolation in OpenCV, then upsampling them back 4x using both bicubic interpolation and the DRLN, then computing the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) of the upsampled images vs. the original images. SSIM and PSNR are two common metrics for evaluating image quality against a reference image after algorithmic manipulation, especially in the super-resolution field. SSIM typically ranges from 0 to 1. PSNR is measured in decibels and typically ranges from 10 to 100. Higher values of SSIM and PSNR indicate a closer match to the original reference image.
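To make the round trip concrete, here is a minimal NumPy sketch of the down-sample, upsample, and PSNR steps. It is not the code from validate_drln.py. The function names are hypothetical, the image is assumed to be 2D grayscale, and the down-sampling assumes that area interpolation by a whole-number factor amounts to averaging each block of pixels. The upsampling step uses plain pixel repetition as a stand-in for bicubic interpolation or the DRLN.

```python
import numpy as np

def downsample_area(img, factor=4):
    """Block-average a 2D grayscale image by an integer factor.

    Assumption: for whole-number factors, area interpolation reduces to
    taking the mean of each factor x factor block of pixels.
    """
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so both sides divide evenly
    blocks = img[:h, :w].astype(np.float64)
    blocks = blocks.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in decibels, for images on a 0..max_value scale."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Synthetic stand-in for an SEM image (random values, not real data).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

small = downsample_area(original, 4)

# Placeholder upsampler: repeat each pixel 4x in both directions.
# The actual comparison swaps in bicubic interpolation or the DRLN here.
restored = np.repeat(np.repeat(small, 4, axis=0), 4, axis=1)

score = psnr(original, restored)
```

In a real run, cv2.resize with interpolation=cv2.INTER_AREA or cv2.INTER_CUBIC would handle the resizing, and a library routine such as skimage.metrics.structural_similarity is one option for SSIM.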
The code is available for download on GitHub. First, I created the validate_drln.py file and the helper.py module. To run it, you would need to download the SEM images to your computer, download this code and install the necessary packages, then modify the validate_drln.py file to point to your folder containing all the unzipped SEM images. I wrote this code specifically to use the GPU on my computer for the validation, which cut the run time from an estimated 56 hours to under 5 hours. The code saves a csv file containing the filename and image quality metrics for all the files. You can see an example below.
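As a rough illustration of the csv output step, the sketch below writes one row per image using Python's standard csv module. The column names and the two rows are placeholders invented for this example, not the actual layout or results produced by validate_drln.py.

```python
import csv
import io

# Hypothetical column layout: one row per image, one column per metric and method.
FIELDS = ["filename", "ssim_bicubic", "ssim_drln", "psnr_bicubic", "psnr_drln"]

def write_metrics(rows, handle):
    """Write a header plus one line per image to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Placeholder values only, to show the shape of the file (not measured results).
rows = [
    {"filename": "example_001.jpg", "ssim_bicubic": 0.5, "ssim_drln": 0.6,
     "psnr_bicubic": 25.0, "psnr_drln": 24.0},
    {"filename": "example_002.jpg", "ssim_bicubic": 0.7, "ssim_drln": 0.8,
     "psnr_bicubic": 30.0, "psnr_drln": 31.0},
]

# An in-memory buffer keeps the example self-contained; a real run would
# pass a file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_metrics(rows, buffer)
csv_text = buffer.getvalue()
```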
Evaluating the Performance
The drln_on_sem_analysis.py script runs a statistical analysis on the results of the model testing. First, it plots histograms of the differences in SSIM and PSNR between the two upsampling methods. From the histograms, we can see that there is generally a noticeable improvement in SSIM but a slight decrease in PSNR.
The code groups the results table by SEM image category, then counts the number of samples, computes the difference in SSIM and PSNR between the two upsampling methods, and performs Student's t-test to compare the mean SSIM and PSNR for the two methods. The results are summarized in a single table as shown below.
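The grouping and testing steps can be sketched with the standard library alone. This is an illustration under assumptions, not the code in drln_on_sem_analysis.py: the row layout and function names are hypothetical, the values are placeholders, and the test shown is the classic two-sample Student's t statistic with pooled variance (the original script may use a different variant, such as a paired test). Only SSIM is shown; PSNR would follow the same pattern.

```python
import math
from collections import defaultdict
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Each sample needs at least two values,
    and the pooled variance must be non-zero.
    """
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, dof

def summarize(rows):
    """Group rows by category and compare DRLN vs. bicubic SSIM per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["category"]].append(row)
    summary = {}
    for category, items in groups.items():
        drln = [r["ssim_drln"] for r in items]
        bicubic = [r["ssim_bicubic"] for r in items]
        t, dof = student_t(drln, bicubic)
        summary[category] = {
            "count": len(items),
            "ssim_diff_mean": mean(drln) - mean(bicubic),
            "ssim_t": t,
            "dof": dof,
        }
    return summary

# Placeholder rows to exercise the functions (not measured results).
rows = [
    {"category": "Tips", "ssim_bicubic": 0.7, "ssim_drln": 0.8},
    {"category": "Tips", "ssim_bicubic": 0.8, "ssim_drln": 0.9},
    {"category": "Tips", "ssim_bicubic": 0.6, "ssim_drln": 0.7},
    {"category": "Films", "ssim_bicubic": 0.5, "ssim_drln": 0.6},
    {"category": "Films", "ssim_bicubic": 0.6, "ssim_drln": 0.5},
]
summary = summarize(rows)
```

Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; scipy.stats.ttest_ind is one option that returns both.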
As we can see, this dataset has many categories of sample types. The ssim_diff_mean column shows that the average SSIM is higher for the DRLN in every category, but the PSNR is lower for most of the categories. Based solely on these metrics, this suggests that SEM images upsampled with the pretrained DRLN model are, overall, roughly comparable to those upsampled with a standard method. Looking more closely, the Powder and Tips categories presumably have features more similar to those of everyday scenes, because both show statistically significant improvements in both metrics.
Since the PSNR appears to change the most across categories, let's look at some examples of the best and worst cases based on the PSNR changes across categories and images.
For the best PSNR category, let's look at the Tips images with the best and worst PSNR change. Here, we can see that for the best case, the image looks accurate and much sharper. For the worst case, the image looks sharper and clearer, but it is not accurate because it falsely bends the tip.
For the worst PSNR category, let's look at the Films images. Here, both the best and worst PSNR cases look clearer to me. The only issue I see is a false line going up the center feature. In this case, I would say the DRLN result is still better.
In conclusion, the pretrained DRLN showed statistically significant improvement for some image categories but not others. In my opinion, all the images looked much clearer. In the future, I may try training the DRLN on a subset of the images and comparing the improvement statistics against those of the pretrained model. Overall, the statistics and the visual comparisons suggest to me that the DRLN has value for improving SEM image quality.