The LME-Book-CT dataset

Large Book Example

The animation shows an X-ray scanned and reconstructed book and is intended to give you an idea of this research. By scanning a document (for example, a book or a scroll) non-invasively (in this case with X-ray CT), its content can be fully digitized. This concept only works if the materials used (inks and writing media) attenuate the penetrating rays differently than their surroundings (most likely air, but possibly also contaminating material such as earth). The writing becomes visible if the ink attenuates the rays differently than the writing material. In this case, the iron-based ink is readable on the paper pages when slicing through the volume. The animation shows that the entire book volume can be investigated digitally, so the reader can browse through the document virtually. However, the natural shape of such books, together with the limited resolution and discretization, makes it impossible to read the book like its analog version. Computer vision and machine learning algorithms therefore have to remap the volumetric data into 2-D. This process can be compared to flattening the pages to remove their curved and wavy structure.

The book

The X-ray CT scanned book shown in the Figure below measures 17 cm × 13 cm × 3 cm (L × W × H) and consists of 56 pages of handmade paper wrapped in a buffalo leather cover. Each page is around 150 µm thick.

Contrast-to-noise ratio (CNR) estimation

The Excel sheet Ink_config.xlsx shows an example calculation of the estimated CNR of an X-ray scan. The variable parameters are:

Exposure time, tube voltage and current
Tungsten loss
Fraction of energy in the target direction
Absorption coefficients of the given material
Material thickness
Tube loss (due to Bremsstrahlung)

From these parameters, the CNR of a scan is estimated. You can download the file to test your own configurations.
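The spreadsheet's exact model is not reproduced here. As a rough illustration of the underlying idea, the sketch below compares a ray that passes through paper only with a ray that additionally crosses a thin ink layer. It uses the Beer-Lambert law under a monochromatic approximation, ignores scatter, and assumes photon-limited (Poisson) noise. `n0` is a hypothetical stand-in for the unattenuated photon count, which in the spreadsheet depends on exposure time, tube voltage and current, and the various loss terms. The function names are also hypothetical.

```python
import math


def transmitted_counts(n0: float, mu_per_cm: float, thickness_cm: float) -> float:
    """Beer-Lambert law: expected photon count behind a homogeneous layer."""
    return n0 * math.exp(-mu_per_cm * thickness_cm)


def poisson_cnr(n0: float, mu_paper: float, t_paper: float,
                mu_ink: float, t_ink: float) -> float:
    """Projection-domain CNR between a paper-only ray and a paper+ink ray.

    Assumes both counts are Poisson distributed, so the variance of their
    difference is the sum of the two expected counts (simplified model,
    not the model used in Ink_config.xlsx).
    """
    n_paper = transmitted_counts(n0, mu_paper, t_paper)
    n_ink = transmitted_counts(n_paper, mu_ink, t_ink)  # ink layer on top of the paper
    return abs(n_paper - n_ink) / math.sqrt(n_paper + n_ink)
```

In this simplified model, the CNR grows with the square root of `n0`. Quadrupling the photon count (for example, through a four times longer exposure) therefore doubles the CNR. Ink whose attenuation matches the paper yields no contrast at all.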

Ink_config.xlsx

Scan volumes

The volumes were generated with a 3-D X-ray micro-CT system using cone-beam geometry. The dataset consists of three FDK-reconstructed volumes of the closed book, each scanned with different acquisition parameters. The book was laid flat on the turntable. The tube current of 3 mA and the exposure time of 2 s were kept constant, while the tube voltage was set to 30 kV, 40 kV, and 50 kV. For the 50 kV scan, we additionally used a 0.25 mm copper pre-filter to narrow the polychromatic X-ray spectrum. With a source-to-object distance of 710 mm and a source-to-detector distance of 1377 mm, the reconstructions have an isotropic resolution of 103 µm.
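The geometry above can be checked with a few lines of arithmetic. In cone-beam CT, the magnification of an object at the rotation center is the source-to-detector distance divided by the source-to-object distance. The effective voxel size is the detector pixel pitch divided by that magnification. The detector model and its pixel pitch are not stated on this page, so the pitch computed below is only an inference. Binning could also explain it. The same numbers show how thin a single page is relative to the voxel grid.

```python
SOURCE_TO_OBJECT_MM = 710.0
SOURCE_TO_DETECTOR_MM = 1377.0
VOXEL_SIZE_UM = 103.0      # isotropic resolution reported for the reconstructions
PAGE_THICKNESS_UM = 150.0  # approximate thickness of one page

# Geometric magnification of an object at the rotation center
magnification = SOURCE_TO_DETECTOR_MM / SOURCE_TO_OBJECT_MM

# Detector pixel pitch that would produce the reported voxel size
# (an inference: the detector is not specified in the dataset description)
implied_pitch_um = VOXEL_SIZE_UM * magnification

# Number of voxels a single page spans across its thickness
voxels_per_page = PAGE_THICKNESS_UM / VOXEL_SIZE_UM

print(f"magnification: {magnification:.3f}")
print(f"implied detector pitch: {implied_pitch_um:.1f} um")
print(f"voxels per page: {voxels_per_page:.2f}")
```

The magnification is about 1.94, which would correspond to a detector pitch of roughly 200 µm. A page spans only about 1.5 voxels. This is why neighboring pages blur together in the reconstructions and why dedicated page-extraction algorithms are needed.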

Each volume is stored as a single 3-D uint16 .tif file. To reduce their size, the .tif files were zipped (1.5 GB per file). We recommend ImageJ to open the files. Feel free to download the reconstructed volumes here:

30kV_Scan.zip

40kV_Scan.zip

50kV_Scan.zip

Page Extraction

To make the writing on each page visible, the 3-D volume has to be converted into 2-D pages. If you use the algorithms, please cite our paper published at ICDAR 2017, which also discusses all algorithmic details. To run the code, you first need to install our CONRAD framework. Afterwards, create a package "book" under tutorials and place these files in it. You may have to adapt the package declaration in the first line of each file so that it references the newly created package.

page_flattening.zip

File description:

Vesselness.java: Implementation of vesselness filtering

GuidedFilter.java: Implementation of guided filtering (pre-processing)

PageFlattenerNNApproximation.java: Implementation of the algorithm discussed in the ICDAR paper. You only have to adapt the input file name. When the code has finished, ImageJ displays the output, which you can then save.

MapBookpagesTo2D.java: The texturing step of the pipeline. Maximum filtering along the page yields a 2-D result from the 3-D page.
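Vesselness.java is not reproduced here. To illustrate what vesselness filtering does, the following NumPy sketch computes a single-scale, 2-D Frangi-style vesselness on one slice. It smooths the slice with a Gaussian and estimates the Hessian by finite differences. It then rewards pixels where the larger-magnitude eigenvalue is strongly negative and the other stays small, which is the signature of a bright, thin ridge such as a page cross-section. The function names, the defaults for `sigma`, `beta` and `c`, and the restriction to 2-D and a single scale are all assumptions of this sketch. They are not the parameters of the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()


def smooth(img, sigma):
    """Separable Gaussian smoothing (zero padding at the borders)."""
    k = gaussian_kernel(sigma)

    def blur(v):
        return np.convolve(v, k, mode="same")

    return np.apply_along_axis(blur, 1, np.apply_along_axis(blur, 0, img))


def vesselness_2d(img, sigma=1.5, beta=0.5, c=None):
    """Single-scale Frangi-style vesselness for bright ridges on a dark background."""
    g = smooth(np.asarray(img, dtype=float), sigma)
    gy, gx = np.gradient(g)
    hyy, hyx = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    # Scale-normalized, symmetrized Hessian entries
    hxx, hyy, hxy = sigma**2 * hxx, sigma**2 * hyy, sigma**2 * 0.5 * (hxy + hyx)
    # Closed-form eigenvalues of the symmetric 2x2 Hessian
    mean = 0.5 * (hxx + hyy)
    root = np.sqrt((0.5 * (hxx - hyy))**2 + hxy**2)
    e_a, e_b = mean + root, mean - root
    swap = np.abs(e_a) > np.abs(e_b)
    l1 = np.where(swap, e_b, e_a)  # eigenvalue with the smaller magnitude
    l2 = np.where(swap, e_a, e_b)  # eigenvalue with the larger magnitude
    s = np.sqrt(l1**2 + l2**2)  # overall second-order structure strength
    if c is None:
        c = 0.5 * s.max() if s.max() > 0 else 1.0
    rb = np.divide(l1, l2, out=np.zeros_like(l1), where=l2 != 0)  # blob-vs-line ratio
    v = np.exp(-rb**2 / (2 * beta**2)) * (1 - np.exp(-s**2 / (2 * c**2)))
    v[l2 > 0] = 0.0  # suppress dark ridges; pages appear bright in CT
    return v
```

A practical filter would usually combine several scales and work in 3-D, where pages are sheet-like rather than line-like.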
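For the pre-processing in GuidedFilter.java, here is a minimal NumPy sketch of the standard guided filter of He et al., with box means computed from an integral image. We assume the standard formulation. The radius `r`, the regularization `eps`, and the function names are illustrative choices, not values taken from the authors' code. With the image used as its own guide and a small `eps`, the filter smooths homogeneous regions while preserving strong edges such as page boundaries.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, with the window clipped at the borders."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    integral = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    rows, cols = np.arange(h), np.arange(w)
    r0, r1 = np.clip(rows - r, 0, h), np.clip(rows + r + 1, 0, h)
    c0, c1 = np.clip(cols - r, 0, w), np.clip(cols + r + 1, 0, w)
    total = (integral[r1][:, c1] - integral[r0][:, c1]
             - integral[r1][:, c0] + integral[r0][:, c0])
    return total / np.outer(r1 - r0, c1 - c0)


def guided_filter(guide, src, r=4, eps=1e-3):
    """Edge-preserving smoothing of `src` steered by `guide` (both 2-D arrays)."""
    guide = np.asarray(guide, dtype=float)
    src = np.asarray(src, dtype=float)
    mean_i, mean_p = box_mean(guide, r), box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i**2
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)  # per-window linear coefficients
    b = mean_p - a * mean_i
    return box_mean(a, r) * guide + box_mean(b, r)
```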
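The maximum filtering in MapBookpagesTo2D.java can be approximated as follows, under simplifying assumptions. The page is assumed to be already located as a height map `surface[y, x]` that gives its z position in the volume; in the real pipeline, this would come from the page-flattening step. The slab is also sampled along the z axis rather than along the true page normal. Iron-based ink presumably attenuates more strongly than paper, so taking the maximum across the few voxels of page thickness keeps the ink signal. `map_page_to_2d`, `surface`, and `half_thickness` are hypothetical names.

```python
import numpy as np


def map_page_to_2d(volume, surface, half_thickness=1):
    """Maximum intensity across a thin slab around a page surface.

    volume: 3-D array indexed as volume[z, y, x].
    surface: 2-D array of (possibly fractional) z positions of the page.
    """
    depth, h, w = volume.shape
    offsets = np.arange(-half_thickness, half_thickness + 1)
    base = np.rint(surface).astype(int)
    # z indices of the slab for every pixel, clipped to the volume bounds
    z = np.clip(base[None, :, :] + offsets[:, None, None], 0, depth - 1)
    yy, xx = np.indices((h, w))
    samples = volume[z, yy, xx]  # shape: (2 * half_thickness + 1, h, w)
    return samples.max(axis=0)
```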

Results

Below are the algorithm's results for five pages from the three scans, along with photos of the original pages.

Pages_photos

pages_30kV

pages_40kV

pages_50kV

Contact

Address

Daniel Stromer M. Sc.

Researcher in the Learning Approaches for Medical Big Data Analysis (LAMBDA) group at the Pattern Recognition Lab of the Friedrich-Alexander-Universität Erlangen-Nürnberg

Referencing and Contact

Whenever using this database, please cite our paper published in Scientific Reports. If you have any questions, feel free to contact daniel.stromer(at)fau.de.

Universität Erlangen-Nürnberg
Chair of Computer Science 5 (Pattern Recognition)
Martensstr. 3
91058 Erlangen
Germany

+49 9131 85 25246, +49 9131 85 27270, 09.153