forked from devuan/amprolla3
3 changed files with 25 additions and 113 deletions
@ -0,0 +1,23 @@
amprolla
========

amprolla is an apt repository merger originally intended for use with
the [Devuan](https://devuan.org) infrastructure. This version is the
third iteration of the software. The original version of amprolla was
too slow, and the second version was never finished, hence this one.

Dependencies
------------

### Devuan

```
gnupg2 python3-requests python3-gnupg
```

### Gentoo

```
app-crypt/gnupg dev-python/requests dev-python/python-gnupg
```

@ -1,109 +0,0 @@
Ok... so the Debian repo is essentially a directory hierarchy...

Ok.. Do you understand the repo hierarchy? i.e. the main folder (in
amprolla's case /merged) with subfolders 'dists' (for repo metadata) and
'pool' (where the actual binary and source packages go)?
Forget about the "pool" folder, amprolla doesn't touch it...

In "dists/" you have all the suites, i.e. jessie, ascii, ceres, and all
the stable, unstable and version symlinks.

In the suite folder you find the section folders (main, contrib, non-free)
and the files InRelease, Release and Release.gpg.
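
A minimal Python sketch of the suite layout just described. `suite_metadata_paths` is a hypothetical helper, not part of amprolla. The `merged` root and the suite and section names are only examples.

```python
from pathlib import PurePosixPath

# Files that sit directly in each suite folder, next to the section folders.
SUITE_FILES = ("InRelease", "Release", "Release.gpg")


def suite_metadata_paths(root, suite, sections):
    """Return the paths found directly under <root>/dists/<suite>/ (hypothetical helper)."""
    base = PurePosixPath(root) / "dists" / suite
    paths = [str(base / name) for name in SUITE_FILES]
    # One folder per section, e.g. main/, contrib/, non-free/.
    paths += [str(base / section) for section in sections]
    return paths


if __name__ == "__main__":
    for path in suite_metadata_paths("merged", "ascii", ["main", "contrib", "non-free"]):
        print(path)
```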

InRelease is just the Release file with an inline (clearsigned) OpenPGP
signature, while Release.gpg holds a detached signature over the same Release file.
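
To get the Release content back out of InRelease, you strip the OpenPGP cleartext-signature framing. That means skipping the armor headers (such as `Hash:`), reading up to the signature block, and undoing the "- " dash-escaping. This is a sketch only. `release_from_inrelease` is a hypothetical name, and the function does not verify the signature; that step would still be done with gpg (e.g. via python-gnupg).

```python
BEGIN_MSG = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"


def release_from_inrelease(text):
    """Return the cleartext (the Release content) of a clearsigned InRelease file.

    Only the OpenPGP cleartext framing is removed; the signature is NOT verified.
    """
    lines = text.splitlines()
    try:
        start = lines.index(BEGIN_MSG)
    except ValueError:
        raise ValueError("not a clearsigned message") from None
    i = start + 1
    # Skip the armor headers (e.g. "Hash: SHA512") up to the first empty line.
    while i < len(lines) and lines[i] != "":
        i += 1
    body = []
    for line in lines[i + 1:]:
        if line == BEGIN_SIG:
            break
        # Undo dash-escaping: signed lines starting with "-" are prefixed with "- ".
        body.append(line[2:] if line.startswith("- ") else line)
    else:
        raise ValueError("no signature block found")
    return "\n".join(body) + "\n"
```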

Anyway, the Release file is basically a dictionary of most of the files
in the suite's subdirectories, with their sizes and checksums (SHA256, SHA512
etc.), in what is essentially RFC822 format, with a bunch of headers at the
top that specify details about the release of that suite.
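
A rough sketch of reading such a Release file. It separates the single-line headers from the SHA256 field, whose continuation lines each hold "checksum size path". `parse_release` is a hypothetical helper. Other multi-line fields (MD5Sum, SHA512, ...) are ignored here for brevity.

```python
def parse_release(text):
    """Parse a Release file into (headers, checksums).

    headers:   single-line fields such as "Suite" or "Date".
    checksums: {path: (sha256_hex, size)} taken from the SHA256 field.
    """
    headers = {}
    checksums = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line of a multi-line field; only SHA256 is kept here.
            if current == "SHA256":
                digest, size, path = line.split(None, 2)
                checksums[path] = (digest, int(size))
            continue
        key, _, value = line.partition(":")
        current = key
        if value.strip():
            headers[key] = value.strip()
    return headers, checksums
```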

In the suite subdirectories (the sections) you have a bunch of folders:
binary-<arch>, which contains the Packages file, compressed copies of it
and a small Release file, and likewise the source folder with the Sources
file and its compressed copies, etc.
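
Packages and Sources files are lists of stanzas separated by blank lines, and indented lines continue the previous field. The following minimal sketch splits such a file into dicts. `parse_stanzas` is a hypothetical name; this is not amprolla's parser.

```python
def parse_stanzas(text):
    """Split a Packages/Sources file into a list of {field: value} dicts.

    Stanzas are separated by blank lines; lines starting with whitespace
    continue the previous field (e.g. long Description or Files lists).
    """
    stanzas = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            current[last_key] += "\n" + line
        else:
            key, _, value = line.partition(":")
            last_key = key
            current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas
```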

The Contents files (currently not processed) are there too.
(They contain a list of all the files in each package.)

There is also the i18n folder, which contains the translation files.

Amprolla takes several mirrors and merges them in order of priority,
starting with the highest priority. It first iterates over the structure
to create its own repo structure, i.e. dists/<suite>/<section>/ etc., then
copies in the highest-priority mirror's Packages and Sources files, and then,
for the other mirrors, iterates over their Packages and Sources files and
compares each package stanza for a match. If there is a match on name, the
higher-priority mirror's version is kept; if not, the package is added in.
(This is where the inefficient model really shows up.)
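
Here is a sketch of that name-based priority merge. It assumes stanzas are already parsed into dicts with a "Package" field. `merge_by_priority` is a hypothetical name. Keying on the package name makes each lookup cheap, but it still reprocesses every stanza from every mirror on each run.

```python
def merge_by_priority(mirrors):
    """Merge package stanzas from several mirrors, highest priority first.

    `mirrors` is a list of stanza lists (dicts with a "Package" field), ordered
    from highest to lowest priority. When two mirrors carry a package with the
    same name, the stanza from the higher-priority mirror wins.
    """
    merged = {}
    for stanzas in mirrors:
        for stanza in stanzas:
            # setdefault keeps the first (highest-priority) stanza seen for a name.
            merged.setdefault(stanza["Package"], stanza)
    return list(merged.values())
```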

After all the new Sources and Packages files are processed, the Release and
InRelease files are generated by walking the hierarchy and adding each file
in, with its size and checksums.
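
A simplified sketch of that walk. It builds only the SHA256 field and does not right-align the size column the way real Release files do. `sha256_section` is a hypothetical helper. Signing the result into Release.gpg and InRelease is a separate gpg step not shown here.

```python
import hashlib
import os


def sha256_section(suite_dir):
    """Build the SHA256 field of a Release file for everything under suite_dir.

    Each entry is " <sha256> <size> <path relative to suite_dir>". The Release,
    InRelease and Release.gpg files at the top of the suite are skipped, since
    they cannot list themselves; nested per-arch Release files are included.
    """
    skip = {"Release", "InRelease", "Release.gpg"}
    entries = []
    for dirpath, _dirnames, filenames in os.walk(suite_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, suite_dir).replace(os.sep, "/")
            if rel in skip:
                continue
            with open(full, "rb") as fh:
                data = fh.read()
            entries.append((rel, hashlib.sha256(data).hexdigest(), len(data)))
    lines = ["SHA256:"]
    for rel, digest, size in sorted(entries):
        lines.append(f" {digest} {size} {rel}")
    return "\n".join(lines) + "\n"
```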

There are a lot of complexities, partly due to the design of amprolla.
What I had started to do (and describing it now, it seems obvious that I
should probably have started pretty much from scratch) is, instead of this
iterative compare-and-add-or-skip approach, keep a cache of each mirror's
last state and then, on each run, create a delta between the last state and
the current state.
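
A sketch of that per-mirror delta at the file level. It assumes each mirror's last state is cached as a {path: (sha256, size)} map taken from its previous Release file. How that cache is stored is left open, and `release_delta` is a hypothetical name.

```python
def release_delta(previous, current):
    """Compare two {path: (sha256, size)} maps taken from a mirror's Release file.

    Returns (added, changed, removed) as sorted lists of paths. Only the added
    and changed files need to be downloaded on this run.
    """
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    changed = sorted(
        p for p in current if p in previous and previous[p] != current[p]
    )
    return added, changed, removed
```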

* and how does dak integrate in all of this?
It doesn't. Dak is a standalone repository that just deals with the packages built by our CI.
* so it's the same as any debian repo
Yup, slightly modified to handle our CI and some other tweaks,
and I checked and our version is in gdo too.

Anyway, as I was saying about my approach re deltas:
There are big efficiencies in this approach. For starters, we only download the InRelease
(or Release and Release.gpg) file and, after verifying it, compare it to the previous state;
we can use the generated delta to pick which files are new, changed or removed in the repo.
This means we only download the changed files in the repo, for a start. And for the
Packages and Sources files we create a delta list of changed stanzas to apply.
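
The stanza-level delta could look roughly like this. It keys on the "Package" field, and `stanza_delta` is a hypothetical name. It assumes one stanza per package name, which is the same simplification the name-based merge makes.

```python
def stanza_delta(previous, current):
    """Diff two lists of stanzas keyed on their "Package" field.

    Returns {"add": [...], "change": [...], "remove": [...]}: add/change hold
    the new stanzas, remove holds the package names that disappeared.
    """
    old = {s["Package"]: s for s in previous}
    new = {s["Package"]: s for s in current}
    return {
        "add": [new[n] for n in new if n not in old],
        "change": [new[n] for n in new if n in old and old[n] != new[n]],
        "remove": [n for n in old if n not in new],
    }
```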

Instead of building the entire repo from scratch, we apply the delta
to a copy of our merged repo, with handling for priority etc...
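
One way to sketch the priority handling when applying a delta: only the package names the delta touched are re-resolved against every mirror's cached state. This assumes those per-mirror states are kept up to date, and `apply_delta` and its arguments are hypothetical. If the winning mirror drops a package, the next mirror down that still carries it takes over.

```python
def apply_delta(merged, mirror_states, priority, changed_names):
    """Update `merged` in place for the package names touched by a delta.

    merged:        {name: stanza} for the merged repo (a copy of the last run).
    mirror_states: {mirror: {name: stanza}}, each mirror's current state
                   (its cached state with the delta already applied).
    priority:      mirror names, highest priority first.
    For each touched name, the stanza from the highest-priority mirror that
    still carries it wins; if no mirror carries it any more, it is dropped.
    """
    for name in changed_names:
        for mirror in priority:
            stanza = mirror_states[mirror].get(name)
            if stanza is not None:
                merged[name] = stanza
                break
        else:
            merged.pop(name, None)
    return merged
```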

What stumped me in the end is that we really should verify that only packages with a
matching source stanza go in, and we really need to process the Contents and Translation
files at the same time.
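
A sketch of the "matching source stanza" check. In a Packages stanza, the optional Source field names the source package. It carries a version in parentheses only when that differs from the binary's own Version (e.g. binNMUs). Without the field, the source has the binary's name and version. The function names are hypothetical.

```python
import re

# "Source: name" or "Source: name (version)".
SOURCE_RE = re.compile(r"^(\S+)(?:\s+\((\S+)\))?$")


def source_of(binary):
    """Return the (source name, source version) a binary stanza was built from."""
    field = binary.get("Source")
    if field is None:
        return binary["Package"], binary["Version"]
    m = SOURCE_RE.match(field.strip())
    if m is None:
        raise ValueError(f"unparseable Source field: {field!r}")
    name, version = m.groups()
    return name, version or binary["Version"]


def binaries_without_source(binaries, sources):
    """List the Package names whose source stanza is missing from `sources`."""
    available = {(s["Package"], s["Version"]) for s in sources}
    return [b["Package"] for b in binaries if source_of(b) not in available]
```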

I suspect that nextime realised this, which is why he started on amprolla2, which essentially
replicates dak + amprolla functionality...

I just realised I forgot to mention the overrides processing in amprolla. At the very
top of the dir, in "merged/", is the "indices" folder that contains the overrides. These
files specify, for each Packages file, any metadata changes that need to be applied to
package stanzas.

In Debian there is an entry for every single deb package/source in the archive, making
them very large. We did away with that to reduce the processing overhead it created.

So we only have entries for those that need changing, usually to change the priorities of
systemd packages and to remove Recommends and Suggests for systemd-related packages.
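
To illustrate sparse overrides, here is a sketch that applies only the entries that exist. The override format is a hypothetical stand-in: a dict mapping a package name to {field: new value}, where None drops the field. It is not the actual format of the files in "indices".

```python
def apply_overrides(stanzas, overrides):
    """Apply per-package metadata overrides to a list of Packages stanzas.

    `overrides` maps a package name to {field: new value}; a value of None
    removes the field. This dict format is a hypothetical stand-in, not
    amprolla's actual indices/ file format.
    """
    result = []
    for stanza in stanzas:
        changes = overrides.get(stanza["Package"])
        if changes:
            stanza = dict(stanza)  # copy so the caller's stanza is untouched
            for field, value in changes.items():
                if value is None:
                    stanza.pop(field, None)
                else:
                    stanza[field] = value
        result.append(stanza)
    return result
```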

* are indices a part of the repo or only needed by amprolla?
Both. In Debian, dak generates them and they are hand-modified by the repo masters to
apply needed fixes. With amprolla, we only create them to apply our own changes as needed.
Technically they don't need to be in the repo, as apt doesn't use them, but in practice
it's good to have them there.

hmmm, I think I've cracked my problem...
If I use the Sources delta to identify changed packages, I can use that to pick and apply
the changed Packages stanzas, Contents and Translations. This would save lots of
iterations, and I only need the delta processing to be done on the Sources files.
Wow, that would really speed things up.
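
A sketch of that idea: given the source names that changed in the Sources delta, select the Packages stanzas built from them. A binary's source is the first word of its Source field, or its own name if the field is absent. `binaries_for_sources` is a hypothetical name. The same selection would then drive which Contents and Translation entries get updated.

```python
def binaries_for_sources(binaries, changed_sources):
    """Pick the Packages stanzas built from any source in `changed_sources`.

    A binary's source is the first word of its "Source" field (which may carry
    a version in parentheses), or its own Package name when the field is absent.
    """
    wanted = set(changed_sources)
    picked = []
    for b in binaries:
        source = b.get("Source", b["Package"]).split()[0]
        if source in wanted:
            picked.append(b)
    return picked
```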

The other benefit is that we can side-load packages this way too, and use it to replace dak,
either as a standalone repo or feeding directly into the merged repo.
And all without a hefty database. or the writeup

You're welcome. It has helped me probably as much as you. I think it's
turning into a full rewrite, but it seems a better design and possibly far easier to
write from scratch.
Anyway, it's nearly 3:30am here, so I'd better get a couple of hours' sleep!