
add readme; remove obsoleteness

debianarchive-update
parazyd 6 years ago
parent
commit
1d9670ade4
Signed by untrusted user: parazyd GPG Key ID: F0CB28FCF78637DE
  1. 23
      README.md
  2. 109
      doc/dan-notes
  3. 6
      orchestrate.py

23
README.md

@ -0,0 +1,23 @@
amprolla
========
amprolla is an apt repository merger originally intended for use with
the [Devuan](https://devuan.org) infrastructure. This version is the
third iteration of the software. The original version of amprolla was
too slow, and the second version was never finished, so this version
was written.
Dependencies
------------
### Devuan
```
gnupg2 python3-requests python3-gnupg
```
### Gentoo
```
app-crypt/gnupg dev-python/requests dev-python/python-gnupg
```

109
doc/dan-notes

@ -1,109 +0,0 @@
Ok... so the debian repo is essentially a directory hierarchy...
Ok.. Do you understand the repo hierarchy? ie the main folder (in
amprolla's case /merged) with sub folders 'dists' (for repo metadata) and
'pool' (where the actual binary and source packages go)??
forget about the "pool" folder, amprolla doesn't touch it...
in "dists/" you have all the suites ie: jessie, ascii, ceres and all
the stable, unstable and version symlinks.
in the suite folder, you find the section folders: main contrib non-free
and files InRelease, Release and Release.gpg
InRelease is just the inline-signed (clearsigned) version of the Release file - the gpg
sig is the same as Release.gpg
Anyway the Release file basically is a dictionary of most of the files
in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
is essentially RFC822 format, with a bunch of headers at the top that
specify details about the Release of that suite.
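A minimal sketch of reading such a Release file into a dictionary, assuming the usual layout of `Key: value` headers followed by a `SHA256:` block of indented `<checksum> <size> <path>` lines. The function name `parse_release` and the sample text are hypothetical and are not taken from amprolla itself:

```python
def parse_release(text):
    """Return ({header: value}, {path: (sha256, size)}) from Release text."""
    headers, files = {}, {}
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Indented line: belongs to the most recent header.
            # Only the SHA256 block is collected in this sketch.
            if section == "SHA256":
                checksum, size, path = line.split()
                files[path] = (checksum, int(size))
        else:
            key, _, value = line.partition(":")
            section = key
            headers[key] = value.strip()
    return headers, files


# Made-up sample, far shorter than a real Release file.
SAMPLE = """\
Suite: stable
Codename: jessie
SHA256:
 aaaa 1024 main/binary-amd64/Packages
 bbbb 512 main/source/Sources
"""

headers, files = parse_release(SAMPLE)
```

The resulting `files` mapping is the kind of "dictionary of files with size and checksums" described above, and is a convenient shape for comparing two runs.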
In the suite subdirectories you have a bunch of folders, binary-<arch>,
each of which contains the Packages file, and compressed copies of that, and a
Release Stanza, and similar for the source folder with Sources file and
compressed copies etc.
the Contents files (currently not processed) are there too.
(They contain a list of all the files in each package)
there is also the i18n folder which contains the processed files.
oops s/processed files/translation files/
Amprolla takes several mirrors and merges them in order of priority
starting with the highest priority. It first iterates over the structure
to create its repo structure, ie dists/<suite>/<section>/ etc and then
copies the highest priority mirror's Packages and Sources files in. Then for
the other mirrors it iterates over the Packages and Sources files and compares
each package stanza for a match: if there is a match on name then the highest
priority mirror's version is kept, if not then the package is added in.
(This is where the inefficient model really shows up)
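The priority rule described above can be sketched as follows. This is not the original amprolla code: it swaps the stanza-by-stanza comparison for a dictionary keyed on package name, and the helper names `parse_stanzas` and `merge_by_priority` and the sample data are hypothetical:

```python
def parse_stanzas(text):
    """Split a Packages/Sources file into {package name: stanza text}."""
    stanzas = {}
    for block in text.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("Package:"):
                stanzas[line.split(":", 1)[1].strip()] = block
                break
    return stanzas


def merge_by_priority(mirrors):
    """Merge stanza dicts; `mirrors` is ordered highest priority first."""
    merged = {}
    for stanzas in mirrors:
        for name, stanza in stanzas.items():
            # Keep the first (highest priority) stanza seen for each name.
            merged.setdefault(name, stanza)
    return merged


# Made-up sample data: the first mirror wins on the shared name "init".
high = "Package: init\nVersion: 1.0+devuan\n\nPackage: foo\nVersion: 2\n"
low = "Package: init\nVersion: 1.0\n\nPackage: bar\nVersion: 3\n"

merged = merge_by_priority([parse_stanzas(high), parse_stanzas(low)])
```

Keying on the package name makes each lookup a single dictionary access, which avoids rescanning the merged file for every stanza.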
After all the new Source and Packages files are processed then the Release and
InRelease files are generated by walking the hierarchy and adding those files in.
There are a lot of complexities, part of which is in the design of amprolla.
What I had started to do (and in describing it now, it seems obvious to me
I should probably have started pretty much from scratch) is, instead of this
iterative approach of compare and add or skip, keep a cache of each mirror's
last state, and then on each run create a delta between the last state and
current state.
* and how does dak integrate in all of this?
it doesn't. Dak is a standalone repository which just deals with the packages built by our CI
* so it's the same as any debian repo
Yup, slightly modified to handle our CI and some other tweaks
and I checked and our version is in gdo too.
anyway as I was saying about my approach re deltas:
There are big efficiencies in this approach. For starters, we only download the InRelease or
Release and Release.gpg file and after verifying it, compare to the previous state, and we
can use the delta generated to pick what files are new, changed or removed from the repo.
This means we only download the changed files in the repo for a start. And for the
Packages and Sources files we create a delta list of changed stanzas to apply.
Instead of building the entire repo from scratch, we apply the delta
to a copy of our merged repo with handling for priority etc...
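A rough sketch of the delta step, assuming each mirror's state is cached as a mapping of file path to checksum taken from its Release file. The function name `release_delta` and the sample states are hypothetical:

```python
def release_delta(old, new):
    """Compare two {path: checksum} states and return (added, changed, removed)."""
    added = {path for path in new if path not in old}
    removed = {path for path in old if path not in new}
    changed = {path for path in new if path in old and new[path] != old[path]}
    return added, changed, removed


# Made-up cached state from the previous run and the freshly fetched state.
previous = {
    "main/binary-amd64/Packages": "aaaa",
    "main/source/Sources": "bbbb",
    "contrib/source/Sources": "cccc",
}
current = {
    "main/binary-amd64/Packages": "aaaa",
    "main/source/Sources": "dddd",
    "non-free/source/Sources": "eeee",
}

added, changed, removed = release_delta(previous, current)
```

Only the paths in `added` and `changed` would need downloading, and those in `removed` would be dropped from the merged copy.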
What stumped me in the end is we actually should verify that we only have packages go in that
have a matching source stanza and we really need to process the contents and translations
at the same time.
I suspect that nextime realised this which is why he started on amprolla2 which essentially
replicates dak + amprolla function...
I just realised, I forgot to mention the overrides processing in amprolla. At the very
top of the dir in "merged/" is the "indices" folder that contains overrides. These
files specify, for each Packages file, any metadata changes that need to be applied to
package stanzas.
In debian there is an entry for every single deb package/source in the archive, making
them very large. We did away with that to reduce the overhead of processing it created.
So we only have entries for those that need changing, usually to change priorities of
systemd packages and remove recommends and suggests for systemd related packages.
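As an illustration only, sparse overrides of this kind could be applied like this. The on-disk format of the files in "indices" is not shown here, so the in-memory shape below (package name mapped to field changes, with `None` meaning "remove the field") is an assumption, as are the function name `apply_overrides` and the sample data:

```python
def apply_overrides(stanzas, overrides):
    """Return a patched copy of {name: {field: value}}.

    `overrides` holds entries only for packages that need changing;
    a value of None removes that field from the stanza.
    """
    result = {}
    for name, fields in stanzas.items():
        patched = dict(fields)
        for field, value in overrides.get(name, {}).items():
            if value is None:
                patched.pop(field, None)
            else:
                patched[field] = value
        result[name] = patched
    return result


# Made-up sample data.
stanzas = {
    "foo": {"Priority": "optional", "Recommends": "systemd"},
    "bar": {"Priority": "optional"},
}
overrides = {"foo": {"Priority": "extra", "Recommends": None}}

patched = apply_overrides(stanzas, overrides)
```

Packages without an entry pass through unchanged, which is what keeps the override files small.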
* are indices a part of the repo or only needed by amprolla?
both. In debian, dak generates them and they are hand modified by the repo masters to
apply needed fixes. With amprolla, we only create them for applying our own changes as needed.
Technically they don't need to be in the repo, as they're not used by apt, but practically
it's good to have them there.
hmmm, I think I've cracked my problem...
If I use the Sources delta to identify changed packages, I can use that to pick and apply
the changed Packages stanzas, Contents and Translations. This would save lots of
iterations, and I only need the delta processing to be done on the Sources files.
Wow that would really speed things up
The other benefit is we can side load packages this way too and use it to replace dak
as well, as either a standalone repo or directly into the merged repo.
And all without a hefty database. or the writeup
you're welcome. It has helped me probably as much as you. I think it's
turning into a full rewrite, but it seems a better design and possibly far easier to
write from scratch.
Anyway, it's nearly 3:30am here, so better get a couple hours sleep!

6
orchestrate.py

@ -2,7 +2,7 @@
 # see LICENSE file for copyright and license details
 """
-Module used to orchestrace the entire amprolla merge
+Module used to orchestrate the entire amprolla merge
 """
 from os.path import join
@ -12,8 +12,6 @@ from lib.config import (arches, categories, suites, mergedir, mergesubdir,
 pkgfiles, srcfiles, spooldir, repos)
 from lib.release import write_release
-# from pprint import pprint
 def do_merge():
 """
@ -33,7 +31,7 @@ def do_merge():
 am = __import__('amprolla_merge')
-p = Pool(4)
+p = Pool(4) # Set it to the number of CPUs you want to use
 p.map(am.main, pkg)
