amprolla is an apt repository merger originally intended for use with the Devuan infrastructure. This version is the third iteration of the software.

#!/usr/bin/env python3
# See LICENSE file for copyright and license details.
"""
Amprolla module for merging Contents files
"""
from gzip import open as gzip_open
from multiprocessing import Pool
from os import makedirs
from os.path import dirname, join, isfile

Note on locking: fix locking by using flock instead of tombstone files; this
requires changes to orchestrate.sh. The way I understand things, there are three
processes (A, B, C):

A. amprolla_update is run by orchestrate.sh (very often!):
   1. There is always a consistent working set in merged, which usually points
      to merged-production.
   2. Before amprolla_update, merged switches to merged-staging (why here and
      not after amprolla_update?).
   3. amprolla_update works against merged-volatile; during this step that
      directory is not necessarily fully consistent.
   4. After amprolla_update, merged-volatile is synchronised to merged-production.
   5. merged switches to merged-production.
   6. merged-volatile is synchronised to merged-staging.
   7. merged-production is synchronised to pkgmaster.

B. Sometimes amprolla_merge is run.

C. Sometimes amprolla_merge_contents + amprolla_merge are run.

The *intent* of the implemented locks appears to be that only one process out of
A, B, C is active at a time, but in reality this is not formally guaranteed; the
timing just happens to match, and that shouldn't be relied upon. The existence of
an active lock does prevent A, B, or C from starting, but in A the lock is only
held while step A.3 executes; everything else runs without an active lock. So if,
for example, steps A.4 to A.7 take long enough that A.4 is running again while
A.7 is still in progress, bogus data will be synchronised to pkgmaster. Since A.7
is a network operation and A.5 and A.6 are disk operations, such delays could
happen and could lead to exactly this scenario (maybe it has happened at some
point).

That is why I added support for the --no-lock-I-am-sure argument to
amprolla_update, and instead obtain the lock as a step A.0 in the orchestrate.sh
script, valid throughout all of process A. With that, there is now only need for
two directories, merged-volatile and merged-production, with A redefined as:

A. amprolla_update is run by orchestrate.sh (very often!):
   0. Obtain the amprolla lock; exit if it cannot be obtained.
   1. There is always a consistent working set in merged, which usually points
      to merged-production.
   2. amprolla_update --no-lock-I-am-sure works against merged-volatile; during
      this step that directory is not necessarily fully consistent.
   3. merged switches to merged-volatile.
   4. merged-volatile is synchronised to merged-production.
   5. merged switches to merged-production.
   6. merged-production is synchronised to pkgmaster.

I have adapted orchestrate.sh accordingly, so merged-staging is no longer used
with these patches and could in theory be removed.
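
For reference, here is a minimal sketch of what flock-based locking can look like
in Python. The real implementation is lib/lock.py's run_with_args_locking
(imported below); the lock file path, the flag handling, and the function body
shown here are assumptions for illustration, not the actual code:

import sys
from fcntl import flock, LOCK_EX, LOCK_NB

LOCKFILE = '/run/lock/amprolla.lock'  # hypothetical path


def run_with_args_locking(func, description):
    # Hypothetical handling of --no-lock-I-am-sure: skip locking when the
    # caller (orchestrate.sh) already holds the lock for the whole run
    if '--no-lock-I-am-sure' in sys.argv:
        func()
        return
    with open(LOCKFILE, 'w') as lockfd:
        try:
            # Non-blocking exclusive lock: fails at once if another process
            # holds it, and is released by the kernel when this process
            # exits, even after a crash (unlike a tombstone file, which
            # stays behind and must be cleaned up by hand)
            flock(lockfd, LOCK_EX | LOCK_NB)
        except OSError:
            print('another amprolla run is active, skipping %s' % description)
            sys.exit(1)
        func()
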
import sys
import lib.globalvars as globalvars
from lib.config import (arches, categories, cpunm, mergedir, mergesubdir,
repos, spooldir)
from lib.lock import run_with_args_locking
from lib.log import die, info
from amprolla_merge import prepare_merge_dict

def merge_contents(filelist):
    """
    Merges a list of Contents files and returns a dict of the merged files
    """
    pkgs = {}
    for i in filelist:
        if i and isfile(i):
            cfile = gzip_open(i).read()
            cfile = cfile.decode('utf-8')
            contents = cfile.split('\n')
            header = False
            for line in contents:
                # Older Contents files begin with a free-form preamble that
                # ends in a "FILE LOCATION" column header; skip all of it
                if line.startswith('This file maps each file'):
                    header = True
                if line.startswith('FILE'):
                    header = False
                    continue
                if line != '' and not header:
                    # The last field is the package location
                    # ("section/package"); everything before it is the path
                    sin = line.split()
                    if sin[-1] not in pkgs.keys():
                        pkgs[sin[-1]] = []
                    pkgs[sin[-1]].append(' '.join(sin[:-1]))
    return pkgs
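
A minimal round trip helps show the expected input and output. Everything here is
synthetic: the /tmp paths and the two toy Contents files are made up for
illustration:

from gzip import open as gzip_open
from os import makedirs

makedirs('/tmp/repo-a', exist_ok=True)
makedirs('/tmp/repo-b', exist_ok=True)
# repo-a carries the legacy preamble and header; repo-b is header-less
with gzip_open('/tmp/repo-a/Contents-amd64.gz', 'w') as gzf:
    gzf.write(b'This file maps each file to a package.\n'
              b'FILE                                  LOCATION\n'
              b'usr/bin/foo                           utils/foo\n')
with gzip_open('/tmp/repo-b/Contents-amd64.gz', 'w') as gzf:
    gzf.write(b'usr/bin/bar                           utils/bar\n')

merged = merge_contents(['/tmp/repo-a/Contents-amd64.gz',
                         '/tmp/repo-b/Contents-amd64.gz'])
# merged == {'utils/foo': ['usr/bin/foo'], 'utils/bar': ['usr/bin/bar']}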

def write_contents(pkgs, filename):
    """
    Writes a merged Contents dict to the given filename in gzip format
    """
    makedirs(dirname(filename), exist_ok=True)
    gzf = gzip_open(filename, 'w')
    # One "path location" line per file, grouped and sorted by package
    for pkg, files in sorted(pkgs.items()):
        for file in files:
            line = "%s %s\n" % (file, pkg)
            gzf.write(line.encode('utf-8'))
    gzf.write(b'\n')
    gzf.close()
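
Continuing the synthetic example from above, the writer emits plain
"path location" lines sorted by package, followed by a trailing newline:

write_contents(merged, '/tmp/merged/main/Contents-amd64.gz')
with gzip_open('/tmp/merged/main/Contents-amd64.gz') as gzf:
    print(gzf.read().decode('utf-8'))
# usr/bin/bar utils/bar
# usr/bin/foo utils/foo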

def main_merge(contents_file):
    """
    Main merge logic. First parses the files into dictionaries, and
    writes them to the mergedir afterwards
    """
    to_merge = prepare_merge_dict()
    for suite in to_merge:
        globalvars.suite = suite
        cont_list = []
        # Build one Contents path per repo; None marks a repo that does
        # not carry this suite
        for rep in to_merge[suite]:
            if rep:
                cont_list.append(join(rep, contents_file))
            else:
                cont_list.append(None)
        print("Merging contents: %s" % cont_list)
        contents_dict = merge_contents(cont_list)
        # Derive the output path from the first (Devuan) repo's path by
        # swapping the spool prefix for the merge prefix
        outfile = cont_list[0].replace(join(spooldir,
                                            repos['devuan']['dists']),
                                       join(mergedir, mergesubdir))
        print("Writing contents: %s" % outfile)
        write_contents(contents_dict, outfile)
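
The outfile rewrite above swaps the spool prefix for the merge prefix. With
hypothetical config values (the real ones live in lib/config.py) it behaves
like this:

from os.path import join

spooldir = '/srv/amprolla/spool'        # hypothetical config values
devuan_dists = 'devuan/dists'
mergedir = '/srv/amprolla/merged-volatile'
mergesubdir = 'dists'

infile = join(spooldir, devuan_dists, 'unstable/main/Contents-amd64.gz')
outfile = infile.replace(join(spooldir, devuan_dists),
                         join(mergedir, mergesubdir))
# outfile == '/srv/amprolla/merged-volatile/dists/unstable/main/Contents-amd64.gz'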

def main():
    """
    Main function to allow multiprocessing.
    """
    cont = []
    # Build the list of per-category Contents files to merge:
    # binary-ARCH becomes Contents-ARCH.gz, source becomes Contents-source.gz
    for i in arches:
        for j in categories:
            if i != 'source':
                cont.append(join(j, i.replace('binary', 'Contents')+'.gz'))
            else:
                cont.append(join(j, 'Contents-'+i+'.gz'))
    # Merge the suites of each Contents file in parallel
    mrgpool = Pool(cpunm)
    mrgpool.map(main_merge, cont)
    mrgpool.close()
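
With hypothetical values for arches and categories, the worklist built by main()
and fanned out over the pool would look like this:

arches = ['binary-amd64', 'source']     # hypothetical config values
categories = ['main', 'contrib']

# The loop above then produces:
#   ['main/Contents-amd64.gz', 'contrib/Contents-amd64.gz',
#    'main/Contents-source.gz', 'contrib/Contents-source.gz']
# and Pool(cpunm).map() runs main_merge once per entry, cpunm at a time
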
if __name__ == '__main__':
run_with_args_locking(main, "contents merge")