Changes:
- f1b6ebf Release 0.1
- 6dd907a docs: Fix unit test link
- 59f2ac9 Change namespace to de.digitalcollections
- 7786d5d GitLab CI: Move zipping of release binaries to publish step
- a7484ad CI: Build linux binary for offsets-parser
- c52958c Set up CI with Azure Pipelines
- 7c356bb Throw an exception in case a resolved file cannot be found
- 994ba36 Add more helpful error messages for out-of-band exceptions
- ab757bf Bump version.solr from 8.1.0 to 8.1.1
- 0f146e6 Merge branch 'yogendrasoni-master'
- caf10cd fix null pointer exception when selecting non external field
- 9d31466 Vendor Guava Utf8 util for compatiblity with older Solr versions
- c7e9d11 Update offsets-parser/README.md
- 83726d8 Add README for offsets-parser
- 2bd3cf4 Use Guava method to determine encoded UTF8 size of Java strings
- a58be88 Fix IndexOutOfBunds error when adjust final offset
- 1b32007 CI: Only publish docs on master branch
- af891a7 Test multi-file snippets for UTF8 files.
- 2327c8d MultiCharIter: More sanity checks to prevent invalid state
- 7f5aaac Fix ALTO test to reflect changes in whitespace handling
- 23a0537 docs: Update with info on n:1 file:docs mapping
- 59c45b3 Add support for n:1 mapping of files:docs (closes #28)
- abc8cee Add IterableCharSeq implementation that treats multiple files as one
- d511425 ALTO: fix a few bugs in the passage formatter
- 2edd8ba ascii_escape: Add mode to overwrite input file
- 04c8538 ContextBreakIterator: Unify this references
- 99876b9 Implement multi-page snippets (#29)
- 747f8fc Bump version.solr from 8.0.0 to 8.1.0
- 489b4c3 Fix test runner for Java 12
- 047328a Fix git branch-switcharoo messups
- 258ce09 miniocr: Fix handling of edge coordinates
- 79793f3 Get rid of code for handling external UTF16 files
- 2040cfe ascii_escape: Fix py2/py3 compatibility
- 0794008 docs: Fix example screenshot
- 9e57d19 docs: Resolve remaining TODOs in README
- 4792fce docs: Remove netlify requirements.txt
- 5062271 docs: Add CI config for publishing to GitHub Pages
- f0e73a7 docs: Reflect changes in ALTO analyzer
- 2ec5e88 Add repository url
- 6132eb3 docs: Add explanation of default delimiter
- 1cfb167 docs: Add instructions on how to compile
- 382a62c docs: More editing
- e3ca283 Add script to perform ASCII-encoding/XML-escaping
- 27f64cd Fix license link in README
- 9b31918 docs: Fix markdown typo
- 27d890a docs: Fix format links on index page
- 9a7dd2c More work on docs
- 712a004 Add first draft of documentation, move a lot of stuff out of the README
- 375903f Minor file fixes
- dca3cfb Fix bugs in ALTO handling, thanks @mbennett-uoe for the helpful discussion
- 7a18cbc Add test for entity removal bug fix + test data
- da4b25c Fix ALTO regexp to correctly match TextBlock/Page/etc entities in Passage Formatter
- bb1ab7e example: Fix regular field highlighting bug in frontend
- 8a813f5 Make sure that closing highlighting tags come before any other closing tags
- f40b948 example: Also include metadata in index to showcase mixed highlighting
- 1e4a0ce Add unit test for mixed regular and OCR highlighting
- 2cd10e7 Add a description for the plugin
- 0d9d49c Don't provide a default summary if no matches were found in the text
- 0bdf4b3 Add missing docstring
- f9c601e Refactor block-limitation logic to fix bugs and inconsistencies
- 8b2d5fb hocr: Allow multiple implementations of generic block types (fixes #20)
- 2fdd55f README: State correct default value for hl.ocr.limitBlock
- 1b8bbae Use Integer.parseInt instead of Integer.valueOf
- 4c06bdd Refactor snippet parsing across formats to increase code reuse
- 47f8938 Update READMEs (fixes #17)
- 670eb11 Code style fixes
- 9f24bac Add support for hl.absoluteHighlights option (implements #6)
- 8d7c22b example: Offload image serving to remote server
- d9d15cc Refactor fragment truncation logic for more code-reuse
- c767949 Fix multiple highlighting bugs (#11) [ #9 ]
- 9a907a3 Fix bug that caused page numbers to be missed (#10)
- 322efa9 Bump assertj-core from 3.11.1 to 3.12.2
- 0b903a5 Bump version.junit from 5.3.2 to 5.4.2
- 4adcefc Bump slf4j-nop from 1.7.25 to 1.7.26
- 0223846 Bump maven-shade-plugin from 3.2.0 to 3.2.1
- 7de1983 Merge pull request #8 from dbmdz/solr8-compatibility
- b7830ee Add compatibility with Solr 8.0...
- ec180b2 Bump version.solr from 7.6.0 to 8.0.0
- 11ac53c ci: use generic CI configuration (production mode)
- 3962564 pom: fix distribution management section
- c225fc4 mvn: add settings.xml
- d59969b ci: use generic CI configuration (currently in testing mode)
- 146f270 Add .gitattributes to exclude test resources from linguist
- 433d0da Update README.md
- 540d151 Update README.md
- 94958d6 Update README, rename to solr-ocrhighlighting', make example scripts Python 3.6 compatible
- f45def1 Change license to MIT
- c2237d6 Merge branch 'alto-highlighting' into 'master'
- 76c57e5 Resolve a lot of TODOs in the README
- da1db0f Add test for masked indexing
- 7944cb5 Update README: Formats, Indexing of sub-structures
- 004f74b Add support for indexing escaped ASCII ALTO content.
- ca73ddb Merge branch 'iiif-example' into 'master'
- b25183f Fix bug where the end of a match could be outside of its passage
- 27d4dac example: Fix IIIF response when no matches were found in volume
- 3575dd2 Don't batch in example ingest script
- b32d121 Update IIIF example
- 18ccac6 Fix UV integration in nginx config
- fb71ee7 Add UV to example, speed up manifest generation
- 3626328 iiif-example: Update to new response format
- 4b38b6b example-iiif: Make all addresses configurable via CFG_ env variables
- 1625bce Add small IIIF content search implementation to example
- 3971d4d Merge branch 'fix-scorer' into 'master' [ #22 ]
- 313f900 Fix tests that were affected by scoring change
- 6d3356c Allow customization of passage scoring boost (closes #22)
- 0e6209d Remove redundant method from OcrSnippet
- 03c197c Rename snippetCount to numTotal
- e41757d Fix bug in distributed search, don't handle regular fields
- 0d3b0fa Update README
- daeafa7 Fix for interpoliation
- 16cba03 Move Solr highlighting logic to its own compoment, don't merge with regular highlighting
- 11e9c2e Fix bug in highlight mergin, add text field to highlight regions
- a0009a0 Merge adjacent highlight regions (closes #20)
- 01c8e83 Added distribution to pom
- c9d2a21 Add plugin JAR to example so it's runnable without building
- 9f4fdb4 README: Make table a bit more readable, remove external UTF-16 option
- 99bfed8 Add solrconfig and schema instructions to README
- 4cb430b Make MiniOCR ByteOffsetsParser more versatile
- d6ccbe5 Fix bug in google1000 ingest script
- 37565fa Fix issue with TagBreakIterator#preceding/following
- 068c23d Add frontend for example setup, fix bug with snippets panning multiple
- c5b204c Fix total snippet count calculation
- 4f28a72 Add option to filter by page id
- 72c7a75 Add total snippet count for field to highlighting results
- 25043a2 Remove some dead code
- f9ca2b4 Merge branch 'byteoffsets-cli' into 'master'
- db14adf Fix JSON highlighting output
- fc9c4e9 Fix a few regressions, don't adjust offset when encoding is ASCII
- b904c8c Add missing Dockerfile for go-iiif service
- c6afd19 Fix a few bugs in highlighting logic based on testing with Google1000
- 0e92b2d Roll back to Solr 7.6 until we can fix the distributed test
- ed4496f Remove/replace all deprecated options from Solr configs
- 736f28f Fix Docker setup
- 4ee443a Add example docker setup with Google1000 dataset
- 4f842dd Some minor fixes, switch to loading external classes without SolrCore
- c06c72d Implement multiple highlighting regions with multiple words (happens with hl.weightMatches)
- 2bfe032 Add support for hl.weightMatches for non-UTF8 fields
- b67e2d1 Add support for escaped OCR [ #383 ]
- b5ee0d6 Add debugging stuff
- 4313237 Yet unsuccessful attempt to build jar with dependencies
- 78c5c7b Update .gitlab-ci.yml
- 7058ea8 Update .gitlab-ci.yml
- a9bfabc Update .gitlab-ci.yml
- 42957ca Fix GitLab CI config
- ed210ac Get rid of some unneccessary unwrap() calls
- 590d93a Some cleanup of CLI tool
- 669f6ed cli: Allow input from stdin
- 5bbc249 Implement CLI tool tool for byte offsets from hOCR/ALTO/MiniOCR
- 353cb1c Fix skeleton
- a293e88 Skeleton for Rust CLI to generate byte offsets format
- 71f3fed Reformat README
- 6d40e5e Update README.md
- c789515 Update README
- fa39e88 MiniOcrByteOffsetParser: Make the end in start/end methods inclusive
- 956b7b5 Make MiniOcrByteOffsetParser id searching pattern more flexible
- aeda732 Remove @disabled annotation from Utf8Ocr tests
- 277937d Add warnings to response if hl.weightMatches is used
- f90b669 Update upstreams document
- e3d667c Add documentation on which parts need to be kept in sync with lucene-solr upstream
- 4c47ed3 Merge branch 'alto' into 'master'
- fd07c19 Merge branch 'hocr' into 'master'
- 7ac59d1 Add support for ALTO format
- 2bacfae Add support for hOCR
- 770c54a Merge branch 'arbitrary-id-selection' into 'master'
- 9a7ed0d Add Section type to MiniOCR
- 35ea617 MiniOcrByteOffsetsParser: Add support for selecting arbitrary ids, also only selecting a single id
- d7965d1 Merge branch '11-better-resolving' into 'master' [ #11 ]
- 46f99cc Add option for more fine-grained external field location resolving
- 665ee87 Merge branch '10-offsets-for-pages' into 'master' [ #10 ]
- 7d850dd Add filtering by start/end page for ByteOffsetParsers (closes #10)
- 4afa637 Merge branch '9-byteoffsets' into 'master' [ #9 ]
- cd71d62 Implement highlighting based on byte offsets from payloads
- 2ad67e0 Add FileBytesCharIterator, make sure BreakIters work with it
- 0c6500f Implement payload encoder and payload token filter for byte offset payloads
- e0d772e Implement source document generators for MiniOCR and hOCR
- e576057 Add LICENSE
- debfc41 Refine test some more
- 1d6775b Add CI config
- dca239d Bring code in line with updated MiniOCR format, better testing
- 2836e38 Add test for distributed search
- 959a94c Unclutter tests by relying on default args
- 8091152 Make phrase assert more fine-grained, fix typo, remove multi-phrase test
- 1398034 More fixes for FileCharIterator, more tests
- 45418bf Test fix. Default for fuzzy search is a distance of 2, not of 1
- 97ce66e Added more tests
- 14b7c6f More tests
- 8cb26a3 More bugfixing
- c513c89 Next 'contains' attempt
- 683bd7a Added another test, but disabled it
- 43df517 Added more tests
- 544d23b Add test for highlighting stored OCR fields, fix a few bugs along the way
- 615b244 Merge branch '7-extensibility' into 'master' [ #7 ]
- 0a56e71 Test for Boolean queries - seems to get into a endless loop, when no documents are found
- cac247b Add README skeleton
- e4ee490 Fix tests
- a63ea11 Add docstrings for classes likely to be implemented/used by third parties.
- 442033f Refine field loading, decouple external field/ocr field definition
- 988057f Greatly improve extensibility (see #7)
- 2641ccc Merge branch '4-highlight' into 'master'
- 0c7d682 Fix maven test runner
- 73cf3b2 Merge branch '4-highlight' into 'master' [ #6, #5, #4 ]
- 52f8cf1 Merge branch '5-breakiterator' into 'master' [ #5 ]
- e810d2e Implement lazy OCR highlighting
- fbd575a Implement OCR-aligned BreakIterator, fix bug with endianness (closes #5)
- 08e8268 Merge branch 'optimize-filechariter' into 'master'
- 7199de0 Massive optimizations for FileCharIter
- 09e6e85 Merge branch '3-load-field-values' into 'master' [ #3 ]
- 83fdc02 Add byte order detecting via BOM, make test actually use the class
- 6c96f96 Add stub implementations for OCR field highlighter and passage formatter
- 6ea4134 Implement lazy-loading of OCR field values from disk
- 8fe1dd4 Implement filtering of OCR fields to be highlighted
- deff47d Implement loading OCR field values from the file system (#3)
This list of changes was auto-generated.