AutoPkg 2.3.2 Beta


2.3.2 (December 20, 2021)

I've created a new pre-release version of AutoPkg based on the dev_recipe_map branch. This branch radically redesigns the recipe-loading logic, dramatically speeding up recipe lookups.

Instead of searching for recipes or processors by traversing the file system every time we need to find something, we generate a static map file of all repos and recipes on disk. This static map (cleverly titled "recipe_map.json") is rebuilt by scanning the file system only during certain operations (adding or removing repos, adding new recipes, or adding overrides), and stores the list of all recipes in two ways:

  1. A map of all recipe identifiers, with the absolute path to each recipe as the value;
  2. A map of all recipe shortnames ("GoogleChrome.download"), with the absolute path to each recipe as the value.
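To make that concrete, here's a rough sketch of how such a two-way map could be built. The field names and schema here are illustrative only, not the actual recipe_map.json format:

```python
import json

def build_recipe_map(recipes):
    """Build the two lookup tables from (identifier, shortname, path) tuples.
    The "identifiers"/"shortnames" key names are invented for illustration."""
    recipe_map = {"identifiers": {}, "shortnames": {}}
    for identifier, shortname, path in recipes:
        recipe_map["identifiers"][identifier] = path
        recipe_map["shortnames"][shortname] = path
    return recipe_map

# Example: one recipe found during a filesystem scan.
on_disk = [
    ("com.github.autopkg.download.googlechrome",
     "GoogleChrome.download",
     "/Users/me/Library/AutoPkg/RecipeRepos/recipes/GoogleChrome/GoogleChrome.download.recipe"),
]
print(json.dumps(build_recipe_map(on_disk), indent=2))
```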

Whenever you do any operation in AutoPkg, it consults only this recipe map to figure out where to go next. This has reduced the recipe-loading portion of a run from multiple seconds (scaling with how many recipes you have on disk) to fractions of a second.
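The lookup itself then reduces to two dictionary probes instead of a filesystem walk. A minimal sketch (the function name is mine, not AutoPkg's):

```python
def find_recipe(name, recipe_map):
    """Resolve a recipe name to a path: try identifiers first, then shortnames.
    Returning None would mean the map is stale and a re-scan is needed."""
    path = recipe_map.get("identifiers", {}).get(name)
    if path is None:
        path = recipe_map.get("shortnames", {}).get(name)
    return path
```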

Graham Pugh provided this sample output to show the very stark difference with this branch:
Master:

% autopkg run -v GoogleChrome.jamf
**load_recipe time: 22.676719694
**verify_parent_trust time: 114.754227371
**process_cli_overrides time: 114.754611623
**verify time: 172.079330759
**process time: 216.01272573499998

With the new dev version:

./autopkg run -v GoogleChrome.jamf
**load_recipe time: 0.03708735200000002
**verify_parent_trust time: 0.696461484
**process_cli_overrides time: 0.696903272
**verify time: 0.711556611
**process time: 44.696209888

22 seconds -> 0.03 seconds to load recipes.

FUTURE DIRECTION
So, what's next for this?

I want to replace just about all GitHub Search API calls with static maps. Rather than relying on the API to (sometimes) give us information, what if we took this static map idea a step further? The AutoPkg GitHub repo itself could store a static mapping of all recipes and all repos across the org, and clients would simply fetch that static map when searching for recipes.
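As a sketch of what the client side of that could look like, assuming a hypothetical published URL and JSON schema (neither exists today):

```python
import json
import urllib.request

# Hypothetical URL for a pre-built, org-wide recipe map; no such file is
# actually published by the AutoPkg project today.
ORG_MAP_URL = "https://raw.githubusercontent.com/autopkg/autopkg/master/org_recipe_map.json"

def parse_org_map(raw_json):
    """Turn raw map JSON into an identifier -> repo lookup table.
    The {"recipes": {identifier: {"repo": ...}}} schema is invented."""
    data = json.loads(raw_json)
    return {ident: entry["repo"] for ident, entry in data.get("recipes", {}).items()}

def fetch_org_map(url=ORG_MAP_URL):
    """One unauthenticated GET would replace many rate-limited Search API calls."""
    with urllib.request.urlopen(url) as resp:
        return parse_org_map(resp.read().decode("utf-8"))
```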

To enhance and combine this functionality with the recipe map, we'd also need to change the override to contain a bit more info. Right now, the override just contains the chain of parents and their hashes as was generated at the time, but what if it also referenced the recipe map to also store the list of all things the recipe would ever need to execute successfully? If we had a full repo map as well as a local recipe map, we could easily triangulate exactly where all the resources required to run a recipe are located, and then fetch them if we don't have them. If we store that information in the override itself, then CI environments that are ephemeral would have all the info they need to run any recipe contained within the override itself, rather than having to make a lot of guesses or assumptions.
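One way that "triangulation" could work, sketched with an invented org-map schema ({identifier: {"repo": ..., "parent": ...}}); none of these key names come from AutoPkg itself:

```python
def repos_for_recipe(identifier, local_map, org_map):
    """Walk a recipe's parent chain and collect every org repo that would need
    to be fetched because the parent isn't already on disk.

    local_map uses the illustrative {"identifiers": {id: path}} shape;
    org_map uses the illustrative {id: {"repo": ..., "parent": ...}} shape."""
    needed = set()
    current = identifier
    while current:
        entry = org_map.get(current, {})
        if current not in local_map.get("identifiers", {}) and "repo" in entry:
            needed.add(entry["repo"])
        current = entry.get("parent")
    return needed
```

An ephemeral CI runner could evaluate this from data stored in the override alone, then `autopkg repo-add` each missing repo, with no Search API calls involved.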

CONTEXT
Right now, AutoPkg interacts with GitHub in two main ways:

  1. You use autopkg search, which generally does what it says;
  2. You use autopkg info -p, which tries to search for the parent repos of a recipe and fetch all of them.

The problem is that the GitHub Search API occasionally just... doesn't. This API is rate limited heavily, and the limits are especially harsh for large organizations with lots of outbound traffic from one set of IPs. If a large organization is talking to the GitHub API often, you can get rate limited by sheer volume of traffic, and when that happens the API doesn't return useful results to AutoPkg.

In a CI environment, if you rely on autopkg info -p to automatically pull your repos, this means that occasionally GitHub just doesn't give you anything. AutoPkg then fails to pull the parent repos for recipe chains properly, and recipes occasionally fail for no apparent reason. Trying again usually just works, without making any changes. At Facebook/Meta, where I use this feature heavily, I see this very frequently.

So frequently, in fact, that it actually reduces the overall reliability of the automatic parent fetching feature.

I still believe that its intended goal is a good one: avoiding a hardcoded list of repos to check out in a CI environment (or any environment). It makes sense for AutoPkg to dynamically figure out which repos you need, and then go get them. The evidence simply shows that we can't rely on GitHub's Search API for that.
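In the meantime, a blunt workaround in CI is simply to retry the whole operation, since a second attempt usually succeeds. A generic sketch (not a built-in AutoPkg feature):

```python
import time

def with_retries(fn, attempts=3, delay=5.0):
    """Call fn(); on failure, wait and try again, doubling the delay each time.
    The last failure is re-raised so CI still surfaces a real outage."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2

# Example: wrap whatever shells out to `autopkg run` or `autopkg info -p`.
# with_retries(lambda: subprocess.run(["autopkg", "run", "-v", "GoogleChrome.jamf"], check=True))
```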

I'd love to hear people's thoughts and see people's test results.
