misc
- fixed some weird bugs on the pathname tagging dialog related to removing and re-adding tags in its 'tags just for selected files' list. previously, in some circumstances, all selected paths could accidentally share the same list of tags, so further edits on a subset selection could affect the entire former selection
- furthermore, removing a tag from that list when the currently selected paths have differing tags now just removes that tag and no longer accidentally adds anything
- if your client's pending menu has a 'sticky' small tag count that does not seem to clear, the client now tries to recognise a specific miscount cause for this situation and gives you a little popup with instructions on the correct maintenance routine to fix it
- when a pending upload finishes, it is now more careful about when it clears the pending count. this is a safety routine, but it is not always needed
- when the pending count is recalculated from source, it now uses the older method of counting table rows again. the new 'optimised' count, which works great for current mappings, was running very slowly for the pending count on large services like the PTR
- fixed rendering images at >76800% zoom (usually 1x1 pixels in the media viewer), which had broken with the tile renderer
- improved the serialised png load fix from last week--it now covers more situations
- added a link to Iwara-Hydrus (https://github.com/GoAwayNow/Iwara-Hydrus), a userscript that simplifies sending Iwara videos to Hydrus Network, to the client api help
- it should now again be possible to run the client on Windows when the exe is in a network location. it was a build issue related to modern versions of pyinstaller and shiboken2
- thanks to a user's help, the UPnPc executable discoverer now searches your PATH, and also searches for 'upnpc' executable name as a possible alternative on linux and macOS
- also thanks to a user, the test script process now exits with code 1 if the test is not OK
optimisations
- when a db job is reading data, if that db job happens to fall on a transaction boundary, the result is now returned to the waiting caller before the transaction is committed. this should reduce random job lag when the client is busy (there is a small sketch of the idea at the end of this section)
- greatly reduced the amount of database time it takes to check if a file is 'already in db'. the db lookup here is pretty much always less than a millisecond, but the program double-checks against your actual file store (so it can neatly and silently fill in missing files with regular imports), and on an HDD with a couple million files, that double-check could often be a 20ms request! (some user profiles I saw were 200ms!!! I presume this was high latency drives and/or NAS storage that was also very busy at the time). since many download queues will have bursts of a page or more of 'already in db' results (from url or hash lookups), this is why they typically only run 30-50 import items a second these days, and until this week, why this situation was blatting the db so hard. the path existence disk request is now pulled out of precious db time, allowing other jobs to do db work while the importer waits for disk I/O on its own thread (see the second sketch at the end of this section). I suspect the key to getting the 20ms down to 8ms will be future granulation of the file store (more than 256 folders when you have more than x files per folder, etc...), which I have plans for. I know this change will de-clunk db access when a lot of importers are working, but we'll see this week if the queues actually process a little faster, since they can now do file presence checks in parallel and with luck the OS/disk will order their I/O requests cleverly. it may or may not relieve the UI hangs some people have seen, but if these checks are causing trouble it should expose the next bottleneck
- optimised a small test that checks if a single tag is in the parent/sibling system, typically before adding tags to a file (and hence sometimes spammed when downloaders were working). there was a now-unneeded safety check in here that I believe was throwing off the query planner in some situations
- the 'review threads' debug UI now has two new tabs for the job schedulers. I will be working with UI-lag-experiencing users in future to see where the biggest problems are here. I suspect part of it will be overhead from downloader thread spam, which I have more plans for
- all jobs that threads schedule on main UI time are now profiled in 'callto' profile mode
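as a little illustration of the transaction boundary change above, here is a minimal sketch of the idea. this is not hydrus's actual code--the job class, the queue, and the 'transaction_is_due' callable are all invented for the example:

```python
import sqlite3
import threading

class ReadJob:
    # a hypothetical read-only db job: the caller blocks on wait() until the result arrives
    
    def __init__(self, query, args=()):
        self._query = query
        self._args = args
        self._done = threading.Event()
        self._result = None
    
    def run(self, cursor):
        self._result = cursor.execute(self._query, self._args).fetchall()
        self._done.set()  # the waiting caller gets its result right here...
    
    def wait(self):
        self._done.wait()
        return self._result

def db_loop(db_path, job_queue, transaction_is_due):
    db = sqlite3.connect(db_path)
    cursor = db.cursor()
    
    while True:
        job = job_queue.get()
        job.run(cursor)
        
        # ...so a read that happens to land on a transaction boundary no longer has to
        # sit through the (possibly slow) commit before its caller can continue
        if transaction_is_due():
            db.commit()
```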
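and a similarly rough sketch of the 'already in db' change: the db part stays a sub-millisecond hash lookup, while the potentially slow disk check now happens on the importer's own thread, outside db time. the function and the 'hash_exists' call are invented for the example, and the real file store layout and extension handling are more involved than this:

```python
import os

def predict_pre_import_status(db, file_store_dir, sha256_hex, ext):
    # fast: a sub-millisecond indexed lookup, still done inside the db
    if not db.hash_exists(sha256_hex):
        return 'new'
    
    # slow: the actual disk check (often 20ms+ on a busy HDD or NAS) now runs here,
    # on the importer's thread, so other db jobs are free to work while we wait on I/O
    expected_path = os.path.join(file_store_dir, 'f' + sha256_hex[:2], sha256_hex + ext)
    
    if os.path.exists(expected_path):
        return 'already in db'
    
    # the db row exists but the file is missing--a regular import can quietly fill it back in
    return 'already in db, but missing from the file store'
```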
site encoding fixes
- fixed a problem with webpages that report an encoding for which there is no available decoder. this error is now caught properly, and if 'chardet' is available to provide a supported encoding, it now steps in and fixes things automatically. for most users, this fixes japanese sites that report their encoding as "Windows-31J", which seems to be a synonym for Shift-JIS. the 'non-failing unicode decode' function here is also now better at not failing, ha ha, and it delivers richer error descriptions when all attempts to decode fail (there is a rough sketch of the fallback order at the end of this section)
- fixed a problem where webpages with no specified encoding (which defaults to windows-1252 and/or ISO-8859-1 in some weird internet standards thing) were not being detected and decoded with chardet
- if chardet is not available and all else fails, windows-1252 is now attempted as a last resort
- added chardet presence to help->about. requests needs it atm, so you almost certainly have it already, but I'll make it explicit in requirements.txt and expand the info about it in future
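for the curious, here is a rough sketch of the fallback order described above--my own simplification with invented names, not the actual hydrus function:

```python
import codecs

try:
    import chardet
except ImportError:
    chardet = None

def non_failing_unicode_decode(data: bytes, reported_encoding: str) -> str:
    attempts = []
    
    # the site's reported encoding is only usable if python actually has a decoder for it
    try:
        codecs.lookup(reported_encoding)
        attempts.append(reported_encoding)
    except LookupError:
        pass
    
    # chardet can step in with a supported encoding, e.g. for pages reporting an odd alias
    if chardet is not None:
        guess = chardet.detect(data).get('encoding')
        if guess is not None:
            attempts.append(guess)
    
    # and if all else fails, windows-1252 as a last resort
    attempts.append('windows-1252')
    
    failures = []
    
    for encoding in attempts:
        try:
            return data.decode(encoding)
        except Exception as e:
            failures.append(f'{encoding}: {e}')
    
    # richer error description when every attempt fails
    raise Exception('Sorry, could not decode this page! Attempts were: ' + ', '.join(failures))
```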
boring code cleanup
- refactored the base file import job to its own file
- client import options are moved to a new submodule, and file, tag, and the future note import options are refactored to their own files
- wrote a new object to handle current import file status in a better way than the old 'toss a tuple around' method (there is a small sketch of the idea at the end of this list)
- implemented this file import status across most of the import pipeline and cleaned up a heap of import status, hash, mime, and note handling. downloaders now rarely inspect raw file import status directly--they just ask the import and status objects what should happen next based on the current file import options etc...
- a url file import's pre-import status urls are now tested main url first, file url second, then associable urls (previously it was pseudorandom)
- a file import's pre-import status hashes are now tested sha256 first if that is available (previously it was pseudorandom). this probably doesn't matter 99.998% of the time, but it might if you hit 'try again' on a watcher import that failed on a previous boot and also had a dodgy hash parser
- misc pre-import status prediction logic cleanup, particularly when multiple urls disagree on status and 'exclude previously deleted' is unchecked
- when a hash gives a file pre-import status, the import note now records which hash type it was
- pulled the 'already in db but doesn't actually exist on disk' pre-import status check out of the db, fixing a long-time ugly file manager call and reducing db lock load significantly
- updated a host of hacky file import unit tests to less hacky versions with the new status object
- all scheduled jobs now print better information about themselves in debug code
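as a footnote on the new file import status object, here is roughly what 'a better way than the old toss-a-tuple-around method' means. the field names and statuses here are invented for the example and are not the real class:

```python
from dataclasses import dataclass
from typing import Optional

STATUS_UNKNOWN = 0
STATUS_SUCCESSFUL = 1
STATUS_ALREADY_IN_DB = 2
STATUS_PREVIOUSLY_DELETED = 3

@dataclass
class FileImportStatus:
    # previously, these values travelled through the pipeline as a loose (status, hash, note) tuple
    status: int = STATUS_UNKNOWN
    hash: Optional[bytes] = None  # sha256, if we know it yet
    mime: Optional[str] = None
    note: str = ''
    
    def should_download_file_data(self) -> bool:
        # downloaders ask the object what to do next rather than unpacking tuples everywhere
        return self.status == STATUS_UNKNOWN
```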