new libraries
- the 'future build' test last week went well, with the exception that some Linux flavours were unable to load mpv. I am folding these updates into the normal builds:
- Linux: the runner build moves from Ubuntu 22.04 to Ubuntu 24.04
- Linux: mpv moves from libmpv1 to libmpv2
- Windows: sqlite moves from 3.50.1 to 3.50.4
- opencv-python-headless moves from 4.10.0.84 to 4.11.0.86
- PySide6 (Qt) moves from 6.8.2.1 to 6.8.3
- if you are a Linux user and cannot load mpv in today's build, please move to running from source (I recommend all Linux users do this these days!): https://hydrusnetwork.github.io/hydrus/running_from_source.html
docker package
- thanks to a user, the Docker package is updated from Alpine 3.19 to 3.22. x11vnc is replaced with the more unicode-capable tigervnc, and several other issues, including some permission stuff and the lxml import bug on the server, are fixed (issue #1785)
- if you have any trouble, let me know and I'll pass it on to the guy who manages this
misc
- the new 'show deleted mappings' eye icon stuff in manage tags now properly syncs across the different service pages of all manage tags dialogs that are open. if you click it somewhere, it now updates everywhere
- added 'all paged importer work' to network->pause and clarified the three more specific pause-paged-work options. I noticed at the last minute that these guys don't wake the downloaders when unpaused (if you don't want to wait like ten minutes, atm you have to jog each downloader awake by manually poking them with their own pause/resume etc..), I'll fix this next week
- when a large page is loading during session initialisation and it says 'initialising' in the page tab, the status bar text where it says 'Loading initial files... x/y' is now preserved through a page hide/show cycle. when you switch to a still-initialising page, the status bar should now say something sensible (previously it was resetting to 'no search done yet' kind of thing on every page show until the next batch of files (now 100, previously 64) came in)
- fixed a crash when a thumbnail suffers a certain critical load-processing failure. it now shows the hydrus fallback thumb and gives you popups
ui optimisation
- the session weight in the 'pages' menu is now only recalculated on menu open or while the menu is open (it now has a dirty flag; a sketch of the pattern is below). this guy can really add up when a lot of stuff is going on
- same deal with the page history submenu. I KISSed some stuff here too
- when a file search is loading, the media results are now loaded in batches of 100 rather than 256. I also fetch them in file_id order, which I'm testing to see if it saves a little time (close ids should share index branches, reducing cache I/O)
- on many types of page status update, the GUI is now only hit with a 'update the status bar' call if this is the current page. this was hitting a busy session load a bunch
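for those interested, the dirty-flag idea above is simple enough to sketch. this is a hypothetical example, not hydrus's actual class (the page list and GetTotalWeight call are stand-ins): the expensive total is only recalculated when it is both stale and actually asked for.

```python
import threading

class CachedSessionWeight:
    
    def __init__( self, pages ):
        
        self._pages = pages
        self._lock = threading.Lock()
        self._dirty = True # start dirty so the first request does a real calculation
        self._cached_weight = 0
        
    
    def NotifyPagesChanged( self ):
        
        # cheap: callers just flip the flag, no recalculation happens here
        with self._lock:
            
            self._dirty = True
            
        
    
    def GetWeight( self ):
        
        # called on menu open; only pays the full cost if something changed since last time
        with self._lock:
            
            if self._dirty:
                
                self._cached_weight = sum( page.GetTotalWeight() for page in self._pages )
                
                self._dirty = False
                
            
            return self._cached_weight
```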
filename parsing
- I completely overhauled the background worker and data objects that kick in when you drop files on the program and the file-parsing window appears
- all paths that fail (zero size, missing, currently in use, bad filetype, Thumbs.db, seems to be a sidecar) are now listed with their failure reason
- the cog button to set whether paths within folders should be added in the 'human-sorted' way (ordering 'page 3' before 'page 11') is removed. paths are now always added this way (a sketch of the sort follows this list)
- the paths sent to import or tag are now all sorted according to the '#' column, which is just the order they were parsed in. this preserves some nice folder structure. previously, I think it was sending whatever the current list sort was, which sounds good, but it wasn't obvious that that was happening
- paths are now processed in more granular, faster blocks
- remaining issues: although sidecars are now listed, they are sorted to the top of the directory structure they parse from. also, we don't yet have a nice 'retry' menu action, which would let you retry currently-in-use or missing results. let me know if you notice anything else IRL
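as a side note, the 'human-sorted' ordering mentioned above is a standard natural sort. here is a minimal sketch of one common way to do it; hydrus's actual implementation may differ in details:

```python
import re

def human_sort_key( path ):
    
    # split the string into alternating text/digit runs, so 'page 3' sorts
    # before 'page 11' by comparing 3 and 11 as numbers, not as text
    return [ int( part ) if part.isdigit() else part.lower() for part in re.split( r'(\d+)', path ) ]

paths = [ 'page 11.png', 'page 3.png', 'page 1.png' ]

paths.sort( key = human_sort_key )

# result: [ 'page 1.png', 'page 3.png', 'page 11.png' ]
```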
file operations
- many file operations are now a lot more efficient, with fewer disk hits per job. I hope that export folders and other 'lots of fast individual file work' jobs will now be a good bit quicker
- file-merge operations now bundle their various file property checks into far fewer disk hits
- same for file-mirror operations
- same for dir-merge operations
- same for dir-mirror operations
- the 'same size/modified time' check in all file mirror/merge operations now re-uses a previous disk hit and is essentially instant
- all the 'ensure file is writable' checks are faster. there's still a slow 'file is writable?' check, however
- the 'ensure file is writable' checks on files before delete or overwrite now only occur on Windows. it doesn't matter elsewhere. I think there may be a problem now when doing stuff from Linux on read-only files on a Windows network share, but the problem of read-only files appearing in the first place is mostly a legacy issue, so whatever. if you have a weird setup, let me know if you run into any trouble
- fixed an issue where on Windows a file-merge operation would fail if the destination differed from the source but was read-only
- when mirroring a directory, the 'delete surplus files from dest' work now happens at the end, as a failsafe, after all other copies have gone ok, rather than interleaved
- the delete and recycle file calls now check for symlinks properly and delete only the symlink, not the resolved target. this was true previously in almost all cases by accident, but now it is explicit
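to illustrate the symlink rule in that last item, here is a minimal sketch of an explicit check, assuming a permanent-delete path (the function name is hypothetical, and the real code also covers recycling):

```python
import os
import shutil

def delete_path( path ):
    
    # islink does not follow the link, and unlink removes the link itself,
    # so a symlink is deleted without ever touching its resolved target
    if os.path.islink( path ):
        
        try:
            
            os.unlink( path )
            
        except ( IsADirectoryError, PermissionError ):
            
            # on Windows, a symlink that points at a directory needs rmdir
            os.rmdir( path )
            
        
        return
        
    
    if os.path.isdir( path ):
        
        shutil.rmtree( path )
        
    else:
        
        os.remove( path )
```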
image transparency
- on update, you will get a popup saying 'hey you have 12,345 files with transparency, want me to recheck them?'. I recommend saying yes
- in hydrus, if a file being loaded has a completely opaque or completely transparent alpha channel, I discard that alpha channel, deeming it useless. this also determines the 'has transparency' metadata on files. I had an opportunity to closely examine a bunch of real-world transparency-having pngs while doing the visual duplicates work this week, and I decided to soften my 'this transparency is useless' test to cover more situations. where a value of 255 is 'completely opaque', I encountered one IRL file that had 560k pixels at 255, 442k at 254, 20k at 253, 243 at 252, and 22 at 251. another had a spackling of 1 or 2 pixels of alpha 208, 209, 222, 224, 225, 227, 235, 236, 238, 247, 249, 250, 251, 252, 253, and 254, and there were many similar situations. we've also long had many images with just one fully transparent pixel in a corner. this data is essentially invisible unless you are looking for it, and it is not useful to carry forward and tell the user about. thus, the rule going forward is that an alpha channel needs a mix of values: specifically, at least 2 * ( width + height ) or 0.5% of num_pixels (rounding up to 1) pixels, whichever amount is smaller, not in the >=251 top band and, in a separate test, not in the <=4 bottom band. the minimum interesting state is now something like a one-pixel border of visible transparency or opacity around the file, and anything less than that is discarded as an artifact of an anti-aliasing algorithm or a funny brush setting (a sketch of the new test follows this section)
- the 'eye' icon in the media viewer top hover now lets you flip the 'transparency as checkerboard' options for the normal and duplicate filter media viewers on and off
- the 'eye' icon also lets you draw a neon greenscreen instead of checkerboard. this setting is otherwise available under options->media playback
- these three actions are also now available under the 'media viewers - all' and 'media viewers - duplicate filter' shortcut sets
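if you are curious, the new 'is this alpha channel interesting?' rule might look something like this. a minimal sketch, assuming an 8-bit alpha channel as a numpy array; the helper name is hypothetical and the real code surely differs:

```python
import math

import numpy

def alpha_channel_is_interesting( alpha: numpy.ndarray ) -> bool:
    
    ( height, width ) = alpha.shape
    
    num_pixels = width * height
    
    # the minimum number of 'mixed' pixels we demand: roughly a one-pixel
    # border, or 0.5% of the image (rounded up to 1), whichever is smaller
    threshold = min( 2 * ( width + height ), math.ceil( 0.005 * num_pixels ) )
    
    num_outside_top_band = int( numpy.sum( alpha < 251 ) ) # not 'basically opaque'
    num_outside_bottom_band = int( numpy.sum( alpha > 4 ) ) # not 'basically transparent'
    
    # both tests must pass; anything less is discarded as an anti-aliasing artifact
    return num_outside_top_band >= threshold and num_outside_bottom_band >= threshold
```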
duplicates
- setting duplicate relationships via the buttons in the normal duplicates page, or by a normal thumbnail menu/shortcut action, or by Client API, will now trigger a 'refresh count' call in the duplicates page
- I think this might be painful IRL with lots of new 'initialising' loading time, so let me know how it feels. I strongly suspect I'll want to revisit how smart the refresh/update calls are here
duplicates search math
- the new 'n pairs; ~x match' count estimate uses richer statistical math (Wilson Intervals) and is now better than ~2.5% imprecise 95% of the time. it adapts to hitrate and total population size (a sketch of the interval math follows this section). previously, it just stopped when x >= 1000 on a not-totally-random sample, which was apparently giving 95% confidence of better than 6.2% imprecision at high hitrates and much worse at low
- when the new incremental duplicate pair search works, there are now two sampling strategies. if we are doing a full, non-estimate count, the sample is sorted (to keep db index access at high throughput) and then randomised in large blocks to smooth out the count-rate. in the other cases (estimated count, duplicate filter fetch, 'get random pairs', and the auto-resolution rule preview panel, all of which can end early), I now randomise far more granularly, ignoring sort entirely, emphasising a reliable hit-rate and early exit
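for reference, the Wilson score interval for a hit proportion is standard statistics and easy to sketch. the stopping rule here is my guess at the shape of the thing, with hypothetical names, and it omits the finite-population adjustment the real code would want:

```python
import math

Z_95 = 1.959964 # two-sided 95% confidence

def wilson_interval( hits: int, n: int, z: float = Z_95 ):
    
    # returns ( low, high ) bounds on the true hit proportion
    if n == 0:
        
        return ( 0.0, 1.0 )
        
    
    p = hits / n
    z2 = z * z
    
    denominator = 1 + ( z2 / n )
    centre = p + ( z2 / ( 2 * n ) )
    margin = z * math.sqrt( ( ( p * ( 1 - p ) ) + ( z2 / ( 4 * n ) ) ) / n )
    
    return ( ( centre - margin ) / denominator, ( centre + margin ) / denominator )

def estimate_is_good_enough( hits: int, n: int, max_width: float = 0.05 ) -> bool:
    
    # stop sampling once the 95% interval is narrower than +/-2.5%
    ( low, high ) = wilson_interval( hits, n )
    
    return ( high - low ) <= max_width
```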
duplicates auto-resolution
- added 'pixel-perfect gifs vs pngs' as a static-gif complement to the jpegs vs pngs rule. I noticed a bunch of these in my IRL client. before you ask, yes ladies, I am single and available
- I updated my visual duplicates testing suite to do some alpha tests and profiled a number of transparent files against it
- the visual duplicates algorithm will now accept and test pairs where both files have transparency. the test is intended to be fairly forgiving and just makes sure the respective alpha channels match up closely. if you encounter false negatives here with a '(transparency does not match)' reason in the duplicate filter, I'd be very interested in seeing them (issue #1798)
- if only one file has an interesting alpha channel, those files are still counted as not visual duplicates
- the 'visual duplicates' suggested auto-resolution rule no longer excludes transparent files
- the 'visual duplicates - only earlier imports' suggested auto-resolution rule is now 'A has "system:import time" earlier than B + 7 days'. just a little safety padding that ensures files that were all imported at roughly the same time don't fail the test because your subscription hit the nicer version five hours after the worse one (a sketch of the check follows this section)
- I do not plan to make any more changes to the suggested rules. maybe we'll add something like the +7 days padding somewhere else, or maybe the transparency test has some issue, but if you have been testing this system for me, I think the suggested rules are pretty good now
- I thiiink the 'rescan transparency' job is going to reset affected files' status in potential duplicates. fingers crossed, when a file is determined to not actually be transparent after all, it'll get searched against similar looking files again and the auto-resolution rules will give it a re-go without the user having to touch anything. let's see how it goes
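the padded import-time test from the suggested rule above reduces to a one-line comparison. a trivial sketch with hypothetical names, taking timestamps in seconds:

```python
SEVEN_DAYS = 7 * 24 * 3600

def a_counts_as_earlier( a_import_time: int, b_import_time: int, padding: int = SEVEN_DAYS ) -> bool:
    
    # A passes if it was imported before B plus a week of slack, so two files
    # that arrived in the same subscription run (hours apart) do not fail
    return a_import_time < b_import_time + padding
```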
ugoiras
- ugoiras with external duration (from a note text or simulated) now have the 'duration' icon in their thumbnails. this is also true of a collection that contains external duration ugoiras
- the way this stuff is handled and calculated behind the scenes is cleaned up a bit
- ugoiras with only one frame no longer get any external duration checks
boring stuff
- added the Wayland env info to the Linux 'running from source' help
- added some stuff about pacman to the Linux 'running from source' help and reworked the 'which python you need' stuff to fit the three guides better
- sudo'd all my apt install lines in the help
- added some stuff about environment variables to hydrus_client.sh
- after a user suggestion, reordered the 'making a downloader' help to be URL Class, Parser, GUG (previously GUG was at the start, but it isn't the best initial stepping stone)
- gave the 'making a downloader' help a very light pass in some places
- fixed some dialog yes/no stuff in the database update code which was failing to fire with recent stricter UI validity rules
- I deleted the speedcopy test code and removed its entry from help->about. it didn't do quite what we wanted, and there hasn't been any action on it
- reworked the old thread loop that used to spawn for local file parsing into the newer async updater-worker I've been using in a bunch of places