amprolla is an apt repository merger originally intended for use with the Devuan infrastructure. This version is the third iteration of the software.

#!/usr/bin/env python3
# See LICENSE file for copyright and license details.
"""
This module downloads the initial Release files used to populate
the spooldir, along with all the files hashed inside those Release files.
"""
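The files "hashed inside" a Release file are listed in its checksum
sections, one "<hash> <size> <path>" entry per line. As a rough,
illustrative sketch (amprolla's real parser is parse_release in lib.parse,
imported below), pulling the paths out of the SHA256 section could look
like this:

def release_sha256_paths(release_text):
    """Yield the file paths listed in a Release file's SHA256 section."""
    # Illustrative only: the real parsing is done by lib.parse.parse_release.
    in_sha256 = False
    for line in release_text.splitlines():
        if line.strip() == 'SHA256:':
            in_sha256 = True
        elif in_sha256 and line.startswith(' '):
            # Checksum entries are indented: " <hash> <size> <path>"
            yield line.split()[2]
        else:
            # Any other unindented line starts a new section.
            in_sha256 = False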
from multiprocessing import Pool
from os.path import join
import sys

from lib.config import (aliases, arches, categories, cpunm, mainrepofiles,
                        repos, spooldir, suites, skips)
from lib.lock import run_with_args_locking
from lib.net import download
from lib.parse import parse_release
from lib.log import die, info

# Locking design: use flock instead of tombstone files, with the necessary
# changes to orchestrate.sh. The way I understand things, there are three
# processes (A, B, C):
#
# A. amprolla_update is run with orchestrate.sh (very often!):
#    1. there is always a consistent working set in merged, which usually
#       points to merged-production
#    2. before amprolla_update, merged switches to merged-staging (why here
#       and not after amprolla_update?)
#    3. amprolla_update works against -volatile; during this process that
#       directory is not necessarily fully consistent
#    4. after amprolla_update, merged-volatile is synchronised to
#       merged-production
#    5. merged switches to merged-production
#    6. merged-volatile is synchronised to merged-staging
#    7. merged-production is synchronised to pkgmaster
# B. Sometimes amprolla_merge is run.
# C. Sometimes amprolla_merge_contents + amprolla_merge are run.
#
# The *intent* of the implemented locks appears to be that only one of
# A, B, C is active at a time, but this is not formally guaranteed: the
# timing just happens to match, and that should not be relied upon. An
# active lock does prevent A, B, or C from starting, but in A the lock is
# only held while step A.3 runs; everything else runs without an active
# lock. So if steps A.4 to A.7 take long enough that A.4 is running again
# while A.7 is still in progress, bogus data will be synchronised to
# pkgmaster. Since A.7 is a network operation and A.5 and A.6 are disk
# operations, such delays could happen, and they could lead to exactly
# this scenario (maybe it already has at some point).
#
# That is why amprolla_update now supports the --no-lock-I-am-sure
# argument: orchestrate.sh instead obtains the lock itself, as a step A.0
# that remains valid throughout all of process A. With that, only two
# directories are needed (-volatile and -production), and A is redefined
# as:
#
#    0. obtain the amprolla lock, and exit if it cannot be obtained
#    1. there is always a consistent working set in merged, which usually
#       points to merged-production
#    2. amprolla_update --no-lock-I-am-sure works against -volatile;
#       during this process that directory is not necessarily fully
#       consistent
#    3. merged switches to merged-volatile
#    4. merged-volatile is synchronised to merged-production
#    5. merged switches to merged-production
#    6. merged-production is synchronised to pkgmaster
#
# orchestrate.sh has been adapted accordingly, so merged-staging is not
# used with these patches and could in theory be removed.
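lib.lock itself is not part of this file; what follows is a minimal sketch
of the flock approach described above, with a hypothetical lock path. Unlike
a tombstone file, a lock taken with flock is dropped by the kernel the
moment the holding process exits, so a crashed run cannot leave a stale
lock behind:

import fcntl
import sys

def acquire_lock(path='/run/lock/amprolla.lock'):
    """Take an exclusive, non-blocking flock; bail out if it is held."""
    # The path is a stand-in; the real lock location lives in lib.lock.
    lockfile = open(path, 'w')
    try:
        fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit('amprolla is already running, exiting')
    return lockfile  # keep the handle alive for as long as the lock is needed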
def pop_dirs(repo):
    """
    Crawls through the suites of a repository to build the complete
    directory structure needed locally.

    Returns a list of tuples holding the remote and local locations
    of the files.

    Example:
    (http://deb.debian.org/debian/dists/jessie/main/binary-all/Packages.gz,
     ./spool/debian/dists/jessie/main/binary-all/Packages.gz)
    """
    repodata = repos[repo]

    urls = []
    for i in suites:
        for j in suites[i]:
            baseurl = join(repodata['host'], repodata['dists'])
            suite = j
            if repodata['aliases'] is True:
                if j in aliases[repodata['name']]:
                    suite = aliases[repodata['name']][j]
                elif repodata['skipmissing'] is True:
                    continue
                if repo == 'debian' and j in skips:
                    continue

            pair = (join(baseurl, suite),
                    join(baseurl.replace(repodata['host'], spooldir), suite))
            urls.append(pair)

    return urls
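To make the pair construction concrete, here is the same join/replace step
with hypothetical stand-ins for the lib.config values:

from os.path import join

host = 'http://deb.debian.org'   # stand-in for repodata['host']
dists = 'debian/dists'           # stand-in for repodata['dists']
spool = './spool'                # stand-in for lib.config's spooldir
suite = 'jessie'

baseurl = join(host, dists)
pair = (join(baseurl, suite),
        join(baseurl.replace(host, spool), suite))
# pair is ('http://deb.debian.org/debian/dists/jessie',
#          './spool/debian/dists/jessie')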
def main():
    """
    Loops through all repositories and downloads their Release files,
    along with all the files listed within those Release files.
    """
    for dist in repos:
        print('Downloading %s directory structure' % dist)
        dlurls = pop_dirs(dist)
        for url in dlurls:
            tpl = []
            for file in mainrepofiles:
                urls = (join(url[0], file), join(url[1], file))
                tpl.append(urls)
            dlpool = Pool(cpunm)
            dlpool.map(download, tpl)
            dlpool.close()

            release_contents = open(join(url[1], 'Release')).read()
            release_contents = parse_release(release_contents)
            tpl = []
            for k in release_contents:
                # if k.endswith('/binary-armhf/Packages.gz'):
                #     for a in arches:
                #         for c in categories:
                #             if a in k and ("/%s/" % c) in k:
                #                 urls = (join(url[0], k), join(url[1], k))
                #                 tpl.append(urls)
                urls = (join(url[0], k), join(url[1], k))
                tpl.append(urls)
            dlpool = Pool(cpunm)
            dlpool.map(download, tpl)
            dlpool.close()
if __name__ == '__main__':
    run_with_args_locking(main, "init")
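lib.net's download is likewise not shown on this page; main() implies that
it accepts a single (remote url, local path) tuple, since that is what
Pool.map feeds it. Under that assumption, the whole fetch pattern reduces
to something like this self-contained sketch:

from multiprocessing import Pool
from os import makedirs
from os.path import dirname
from urllib.request import urlopen

def fetch(pair):
    """Stand-in for lib.net.download: fetch one (remote, local) pair."""
    url, path = pair
    makedirs(dirname(path), exist_ok=True)
    with urlopen(url) as resp, open(path, 'wb') as out:
        out.write(resp.read())

if __name__ == '__main__':
    # A hypothetical work list; amprolla builds these pairs in pop_dirs()
    # and from the parsed Release contents.
    pairs = [('http://deb.debian.org/debian/dists/stable/Release',
              './spool/debian/dists/stable/Release')]
    with Pool(4) as pool:
        pool.map(fetch, pairs)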