We have finalized the contents and status of Chronica for the final release.
We have made some significant updates to Chronica since the last development release. The main updates are listed below. The latest version of Chronica is available for download along with documentation, user manual and installation instructions.
Increase in indexing speed: Made the isValid() check on the arc files optional; turning it off greatly improves indexing time. Also refactored the PDF parser, which vastly reduces the time spent parsing PDFs. Implemented a queue and multiple threads for document parsing, which increases throughput as well.
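As a rough illustration of the queue-plus-worker-threads idea, here is a minimal Java sketch. It is not Chronica's actual code: the class ParallelParseSketch, the parseAll method and the parser function are hypothetical names, and the thread pool's internal work queue stands in for the parsing queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of Chronica.
final class ParallelParseSketch {

    // Queues every document on a fixed pool of worker threads and returns
    // the parsed results in the same order as the input list.
    static <D, R> List<R> parseAll(List<D> docs, Function<D, R> parser, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (D doc : docs) {
                Callable<R> task = () -> parser.apply(doc);
                pending.add(pool.submit(task)); // goes onto the pool's work queue
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : pending) {
                results.add(f.get()); // blocks until that document is parsed
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while parsing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a parser task failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work; queued tasks still finish
        }
    }
}
```

With slow parsers (PDF, Word), the gain comes from several documents being parsed at once while the caller still receives results in a predictable order.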
Parsing Word Documents: Added a parser (based on the Jakarta Project's POI) to parse Word documents.
XML Configuration file: Added an XML config file to contain all the common run and indexing parameters.
Graphical Interface: Added a GUI for editing the XML config file, configuring the indexer and starting an index run.
Mini Wayback Machine: Implemented the ability to recreate the links for a given page and display the page directly. Results no longer link to the Internet Archive's Wayback Machine, so the archive files can be used completely independently of the IA.
Post Processing Results: Implemented post-processing of search results that compacts the results and displays each page once with a date range instead of as duplicates.
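To make the compaction step concrete, here is a minimal Java sketch. It assumes that duplicates are hits sharing the same URL and that each hit carries a capture date; both are assumptions for illustration, and the names ResultCompactionSketch, Hit and Range are hypothetical rather than Chronica's own.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Chronica.
final class ResultCompactionSketch {

    // One search hit: a page URL and the date it was captured (assumed fields).
    static final class Hit {
        final String url;
        final LocalDate captured;
        Hit(String url, LocalDate captured) {
            this.url = url;
            this.captured = captured;
        }
    }

    // One compacted result: a URL with its earliest and latest capture dates.
    static final class Range {
        final String url;
        LocalDate first;
        LocalDate last;
        Range(String url, LocalDate first, LocalDate last) {
            this.url = url;
            this.first = first;
            this.last = last;
        }
    }

    // Collapses hits with the same URL into one entry per URL, keeping the
    // order in which each URL first appeared in the result list.
    static List<Range> compact(List<Hit> hits) {
        Map<String, Range> byUrl = new LinkedHashMap<>();
        for (Hit h : hits) {
            Range r = byUrl.get(h.url);
            if (r == null) {
                byUrl.put(h.url, new Range(h.url, h.captured, h.captured));
            } else {
                if (h.captured.isBefore(r.first)) r.first = h.captured;
                if (h.captured.isAfter(r.last)) r.last = h.captured;
            }
        }
        return new ArrayList<>(byUrl.values());
    }
}
```

A LinkedHashMap is used so that the compacted list preserves the ranking order of the original results.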
Result Score Threshold: Added the ability to specify a score threshold for the search results, so that only results scoring above the threshold are displayed. Also added a results-per-page setting.
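The two settings combine naturally: filter by score first, then cut the survivors into pages. The Java sketch below shows one way this could look; ScoreFilterSketch, ScoredHit and the parameter names are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "above the threshold".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Chronica.
final class ScoreFilterSketch {

    // A search hit with its relevance score (assumed fields).
    static final class ScoredHit {
        final String url;
        final double score;
        ScoredHit(String url, double score) {
            this.url = url;
            this.score = score;
        }
    }

    // Keeps only hits scoring strictly above the threshold, then returns the
    // requested page (zero-based) of at most perPage hits.
    static List<ScoredHit> page(List<ScoredHit> hits, double threshold,
                                int perPage, int pageIndex) {
        List<ScoredHit> kept = new ArrayList<>();
        for (ScoredHit h : hits) {
            if (h.score > threshold) {
                kept.add(h);
            }
        }
        int from = Math.min(pageIndex * perPage, kept.size());
        int to = Math.min(from + perPage, kept.size());
        return new ArrayList<>(kept.subList(from, to));
    }
}
```

Filtering before paging keeps every page full until the last one, rather than leaving gaps where low-scoring hits were dropped.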
Posted by rstevens at December 3, 2004 09:47 AM