Daniel Kostrzewa1,2, Lukasz Skonieczny1, Pawel Benecki1,2, Michal Kawulok1,2
1 Future Processing, Bojkowska 37A, 44-100 Gliwice, Poland
{dkostrzewa,lskonieczny,pbenecki,mkawulok}@future-processing.com
2 Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
{daniel.kostrzewa,pawel.benecki,michal.kawulok}@polsl.pl
The reported work is a part of the SISPARE project run by Future Processing and funded by the European Space Agency. In addition, the authors were partially supported by the Statutory Research funds of the Institute of Informatics, Silesian University of Technology, Gliwice, Poland (grants no. BKM-509/RAu2/2017 (DK) and BK-230/RAu2/2017 (MK)).
Gathering a set of images presenting the same scene at different spatial resolutions is not a trivial task. Even if a batch of images is captured with a camera or camcorder, we still lack a reliable ground-truth reference for comparison. One could be obtained by photographing the same scene with another device of higher spatial resolution, or by zooming in with the camera lens, but this introduces unknown transformations and distortions. This is an important issue whenever a high-resolution reference image of a scene is needed alongside a low-resolution image or a set of images, a requirement that often emerges when an SRR algorithm is to be validated. To make this task easier, we have prepared a Generator of Synthetic Images (GoSim) for controlled image generation and degradation using a known model, which serves as a reference.
The proposed benchmark is composed in such a way that the images become increasingly difficult to reconstruct at each subsequent layer. The dataset therefore contains four layers: (i) artificial images (Layer 1), (ii) real-life artificially downgraded images (Layer 2), (iii) real-life images obtained under various conditions (Layer 3), and (iv) real-life satellite images (Layer 4).
Layer 1, artificial images, contains 33 sets of images produced by GoSim. The real-life artificially downgraded images layer (Layer 2) consists of 25 scenes in total. This collection is composed of HR images and sets of their counterparts downgraded by GoSim. The dataset includes images of text and logos. Layer 3 is a group of image collections where HR images are obtained under different circumstances than LR images, i.e., different camera bodies, lighting conditions, and distances from the photographed objects. Finally, the real-life satellite layer (Layer 4) is composed of images acquired within the Digital Globe, SPOT, Sentinel-2, Landsat 8, and Hyperion EO-1 missions.
Image files are named following a convention which precisely describes the presented scene or graphic. Each name consists of several fields (Fig. 1).
Scene class is a symbol which distinguishes between synthetic (S) and real (R) images.
Scene subclass can take values from different sets, depending on the scene class. The following values are used for synthetic images: P - periodic, W - wave, S - sharp, R - ridge, I - impulse, and M - miscellaneous. On the other hand, two values are available for real images: S - satellite image, and P - proprietary photo.
Scene number is a four-digit number of the captured scene. Each group code, which consists of the scene class and subclass, has its own numbering.
Name of the scene, besides the simple name, may contain the variant or version of the same scene as well as the band or channel number.
Resolution class code for both synthetic and real-life non-satellite images takes the value r10 for a high-resolution image, r20 for a low-resolution image, and higher values (r30, r40, etc.) for even smaller resolutions. For real-life satellite images, the resolution class code translates to spatial resolution as given in Table 1.
Table 1: The relationship between spatial resolution and resolution class code for real satellite images

Spatial resolution (meters) | Resolution class code |
---|---|
0.3 | r05 |
0.41 | r07 |
0.5 | r08 |
5 | r09 |
10 | r10 |
15 | r12 |
20 | r15 |
30 | r20 |
60 | r30 |
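The mapping of Table 1 can be sketched as a simple lookup. The helper name below is hypothetical; the values transcribe the table directly:

```python
# Lookup from spatial resolution (meters) to the resolution class code of
# Table 1. The function name is illustrative, not part of the benchmark.
RESOLUTION_CLASS = {
    0.3: "r05", 0.41: "r07", 0.5: "r08", 5: "r09",
    10: "r10", 15: "r12", 20: "r15", 30: "r20", 60: "r30",
}

def resolution_class(meters: float) -> str:
    """Return the resolution class code for a given spatial resolution."""
    try:
        return RESOLUTION_CLASS[meters]
    except KeyError:
        raise ValueError(f"no resolution class defined for {meters} m")
```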
Image number contains four digits; its value is the sequence number of the image representing the same scene, used when the images come from the same satellite or when the lower-resolution image is marked as downsampled (see: Additional information).
Band/color information is an optional field which indicates band or color of the image.
Additional information about the image. This field can take a few values: a - indicates that LR images of the same scene are aligned to each other (the images do not require registration), d - means that the image is artificially downscaled (for real image series), and e - an image version with adjusted brightness/contrast (enhanced) or registered to another photo to create a usable Ground Truth (GT)-Low Resolution image pair.
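A parser for such names could look like the sketch below. The underscore separators and the exact field layout are assumptions made for illustration; the authoritative layout is the one given in Fig. 1:

```python
import re

# Illustrative parser for the naming convention described above. The
# underscore-separated layout is an assumption, not the benchmark's
# documented format.
FIELD_RE = re.compile(
    r"^(?P<scene_class>[SR])"        # S = synthetic, R = real
    r"(?P<scene_subclass>[A-Z])_"    # e.g. P, W, S, R, I, M
    r"(?P<scene_number>\d{4})_"      # four-digit scene number
    r"(?P<scene_name>[^_]+)_"        # name of the scene
    r"(?P<resolution_class>r\d{2})_" # e.g. r10, r20
    r"(?P<image_number>\d{4})"       # four-digit image number
    r"(?:_(?P<extra>[ade]+))?$"      # optional additional information
)

def parse_name(stem: str) -> dict:
    """Split a file name stem into the fields of the naming convention."""
    m = FIELD_RE.match(stem)
    if m is None:
        raise ValueError(f"name does not follow the convention: {stem}")
    return m.groupdict()
```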
The advantage of using synthetically created images is that it is known exactly what to expect at any level of magnification (up to hardware limitations), because the precise signal value can be computed for any argument.
This layer contains 33 sets of images produced by our tool. The generated images can be split into six groups.
The Periodic category contains signals based on periodic functions such as sine (Fig. 2a). There are signals with a single frequency in each direction as well as more complex signals consisting of sines with different frequencies and magnitudes. This kind of signal is useful for Fourier analysis, as it is known exactly what to expect: every signal can be decomposed and compared with the original formula. The category also involves compositions whose magnitude varies over the spatial domain (denoted as fading). Images built on periodic, continuous signals look smooth and have no sharp discontinuities.
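Such an image can be generated in a few lines. The sketch below, with illustrative sizes and frequencies, sums 2-D sines of different frequencies and magnitudes in the spirit of this category:

```python
import numpy as np

# Minimal sketch of a "periodic" test image: a sum of horizontal and
# vertical sines. Sizes, frequencies, and amplitudes are illustrative only.
def periodic_image(size=256, components=((4, 1.0), (16, 0.5))):
    y, x = np.mgrid[0:size, 0:size] / size           # normalized coordinates
    img = np.zeros((size, size))
    for cycles, amp in components:                   # (frequency, magnitude)
        img += amp * np.sin(2 * np.pi * cycles * x)  # horizontal sine
        img += amp * np.sin(2 * np.pi * cycles * y)  # vertical sine
    # rescale to [0, 1] so the array can be saved as an image
    return (img - img.min()) / (img.max() - img.min())

img = periodic_image()
```

Because the image is defined by a closed-form expression, the exact value at any coordinate, and hence at any magnification, can be recomputed rather than interpolated.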
Waves is a category similar to Periodic, but the frequency changes with the argument (Fig. 2b); this is a so-called chirp. These images can also be used to study aliasing problems, as the original signal can appear at a lower frequency when the sampling frequency falls below the Nyquist rate.
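The aliasing effect mentioned above can be demonstrated numerically. The sketch below (parameters are illustrative) defines a linear chirp and shows that a 9 Hz sine sampled at 10 Hz is indistinguishable from a 1 Hz sine:

```python
import numpy as np

# A linear chirp: instantaneous frequency f0 + k*t grows with the argument.
def chirp(t, f0=1.0, k=4.0):
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t * t))

# Aliasing demo: sampling a 9 Hz sine at fs = 10 Hz (below the Nyquist rate
# of 18 Hz) yields the same samples as a sign-flipped 1 Hz sine, since
# sin(2*pi*9*n/10) = sin(2*pi*n - 2*pi*n/10) = -sin(2*pi*n/10).
fs = 10.0
t = np.arange(0, 1, 1 / fs)
high = np.sin(2 * np.pi * 9 * t)
low = np.sin(2 * np.pi * 1 * t)
assert np.allclose(high, -low, atol=1e-9)
```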
The Sharp category contains images with discontinuities and hard transitions (Fig. 2c), where the signal changes steeply. Reconstructed images should preserve sharp edges at the discontinuities.
Impulse and Ridge include only one set of images each. The impulse consists of a thin white line on a black background, while the ridge shows a blurred, thick white line.
Miscellaneous contains images which do not fit in any other category.
[Fig. 2: example synthetic images: (a) periodic, (b) wave, (c) sharp]
The layer is composed of 25 collections of images. Each collection consists of 5 sub-collections, which result from different degradation methods (Fig. 3), giving 125 sets in total. Every group contains an HR image and its downgraded LR counterparts. The degradation includes adding noise, blurring, and downscaling the image.
Note that the images cannot be arbitrarily magnified: we are limited to the original source image size, because these images have not been generated using a mathematical model.
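The degradation steps above can be sketched with plain NumPy. The kernel size, noise level, and 2x scale factor below are assumptions for illustration, not the benchmark's actual parameters:

```python
import numpy as np

# Illustrative degradation pipeline: Gaussian blur, additive Gaussian
# noise, and downscaling by block averaging. All parameters are assumed.
def gaussian_kernel(size=5, sigma=1.0):
    x = np.arange(size) - size // 2
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, size=5, sigma=1.0):
    k = gaussian_kernel(size, sigma)
    # separable convolution: filter rows, then columns
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)

def add_noise(img, sigma=0.02, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    return np.clip(img + rng.normal(0, sigma, img.shape), 0, 1)

def downscale(img, factor=2):
    h, w = img.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    return (img[:h, :w]
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))                    # average each block

hr = np.random.default_rng(1).random((64, 64))     # stand-in HR image
lr = downscale(add_noise(blur(hr)))                # degraded LR counterpart
```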
[Fig. 3: (a) High resolution image, (b) Low resolution image, (c) Image with Gaussian noise, (d) Blurred image, (e) Blurred image with Gaussian noise, (f) Very low resolution image]
The third layer is a group of image collections in which HR images are obtained under different circumstances than LR images, i.e., with different camera bodies, lighting conditions, and distances from the photographed objects.
This group is composed of the following scenes: (i) a Vietnamese painting (Painting, Fig. 4a), (ii) the logo of Future Processing made of moss (Logo, Fig. 4b), (iii) a building facade (Building, Fig. 4c), (iv) and (v) fragments of a beach photo (Beach, Fig. 4d), (vi) a printed article (Article, Fig. 4e), and (vii) the BDAS 2015 proceedings (Book, Fig. 4f).
[Fig. 4: (a) Painting, (b) Logo, (c) Building, (d) Beach, (e) Article, (f) Book]
It is worth noting that the images were taken under different circumstances, including various distances from the photographed object, different lighting conditions, and small shifts. Detailed information is given in Table 2.
Table 2: Detailed information on the conditions of photographed objects

No. | Scene name | Camera | Distance | Lighting | Shifts |
---|---|---|---|---|---|
1 | Logo | Nikon D50, Nikon D7000 | 5.5 m | natural | with |
2 | Building | Sony DSLR-A300 | 170 m | natural | with |
3 | Beach picture | Sony DSLR-A300 | 2.0 m | natural | with |
4 | Beach picture | Canon EOS 5D Mark IV | 1.1 m, 2.6 m | natural | with |
5 | Article | Nikon D50, Nikon D7000 | 0.75 m | natural | with |
6 | Article | Nikon D50, Nikon D7000 | 1.5 m | natural | with |
7 | Article | Nikon D50, Nikon D7000 | 0.75 m | flash | with |
8 | Article | Nikon D50, Nikon D7000 | 1.5 m, 0.75 m | flash | with |
9 | Article | Nikon D50, Nikon D7000 | 0.75 m | natural | without |
10 | Article | Nikon D50, Nikon D7000 | 1.5 m, 0.75 m | natural | without |
11 | Book | Nikon D50, Nikon D7000 | 0.75 m | natural | with |
12 | Book | Nikon D50, Nikon D7000 | 0.75 m | flash | with |
13 | Painting | Nikon D50, Nikon D7000 | 0.75 m | flash | with |
14 | Painting | Nikon D50, Nikon D7000 | 1.5 m, 0.75 m | flash | with |
15 | Painting | Nikon D50, Nikon D7000 | 1.5 m | flash | without |
16 | Painting | Nikon D50, Nikon D7000 | 1.5 m | natural | with |
17 | Painting | Nikon D50, Nikon D7000 | 1.5 m | sidelight | with |
Finally, the real-life satellite image layer is constructed of 15 scenes with various resolutions and different numbers of images. Table 3 describes the gathered images in detail.
The letter 'H' in brackets (Table 3, 'No. of images' column) means that the number refers to the same image in different spectral bands. As a result, datasets no. 1-15 are suitable for experiments concerning multiple-image SRR strategies, and no. 6, 10, and 15 seem appropriate for validating hyperspectral SRR techniques.
Table 3: The description of collected satellite images

No. | Place | Satellite | Spatial resolution (meters) | Resolution class code | No. of images |
---|---|---|---|---|---|
1 | Temecula, California, USA | Sentinel-2 | 10 | r10 | 6 |
Landsat 8 OLI & TIRS | 30 | r20 | 3 | ||
2 | Kauai, Hawaii, USA | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 5 | ||
3 | Bangkok, northern part, Thailand | Sentinel-2 | 10 | r10 | 4 |
Landsat 8 OLI & TIRS | 30 | r20 | 2 | ||
4 | Bangkok, southern part, Thailand | Sentinel-2 | 10 | r10 | 4 |
Landsat 8 OLI & TIRS | 30 | r20 | 2 | ||
5 | Brasilia, Brazil | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 2 | ||
6 | Washington, DC, USA | Digital Globe GeoEye-1 | 0.41 | r07 | 1 |
Sentinel-2 | 10 | r10 | 28 | ||
EO-1 / Hyperion | 30 | r20 | 242 (H) | ||
7 | Tripoli, Libya | Digital Globe GeoEye-1 | 0.41 | r07 | 1 |
Sentinel-2 | 10 | r10 | 9 | ||
8 | Barcelona, Spain | SPOT | 5 | r09 | 1 |
Sentinel-2 | 10 | r10 | 12 | ||
9 | Barcelona airport, Spain | SPOT | 5 | r09 | 1 |
Sentinel-2 | 10 | r10 | 12 | ||
10 | Sydney, Australia | Digital Globe WorldView-4 | 0.3 | r05 | 6 (H) |
Sentinel-2 | 10 | r10 | 11 | ||
11 | Rio de Janeiro, Brazil | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 7 | ||
12 | Stockholm, Sweden | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 13 | ||
13 | The Bushehr Nuclear Power Plant, Iran | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 26 | ||
Landsat 8 OLI & TIRS | 15 | r12 | 99 | ||
Landsat 8 OLI & TIRS | 30 | r20 | 96 | ||
14 | Bandar Abbas, Iran | Digital Globe WorldView-4 | 0.3 | r05 | 1 |
Sentinel-2 | 10 | r10 | 14 | ||
Landsat 8 OLI & TIRS | 15 | r12 | 55 | ||
Landsat 8 OLI & TIRS | 30 | r20 | 57 | ||
15 | Kollo, Niger | Sentinel-2 | 10 | r10 | 14 (H) |
Sentinel-2 | 20 | r15 | 14 (H) | ||
Landsat 8 OLI & TIRS | 30 | r20 | 18 |
Figure 5 shows a few examples of gathered real-life satellite images.
[Fig. 5: (a) Rio de Janeiro, Brazil; (b) Sydney, Australia; (c) Kauai, HI, USA; (d) Tripoli, Libya]