Update splitter to fedora modules upstream and improve documentation.
The grobisplitter parts need some documentation to explain what they are doing and for whom. This is a first attempt at getting that right.

Signed-off-by: Stephen Smoogen <ssmoogen@redhat.com>
parent 8b4c38e29e
commit ddb13e640a
4 changed files with 451 additions and 135 deletions
roles/grobisplitter/README.md | 183 (new file)
@@ -0,0 +1,183 @@
# Grobisplitter

### Or how I learned to stop worrying and love modules

## Where are the sources

The current master git repository for the grobisplitter program is
https://github.com/fedora-modularity/GrobiSplitter . The program
depends upon python3 and some other programs:

* gobject-introspection
* libmodulemd-2.5.0
* libmodulemd1-1.8.11
* librepo
* python3-gobject-base
* python3-hawkey
* python3-librepo

## What does Grobisplitter splitter.py do?

Grobisplitter was born out of the addition of modules to Fedora and
RHEL-8. A module is a virtual rpm repository inside of a standard rpm
repository, from which a sysadmin can choose which virtual repositories
are used on a system. This allows for useful choices without having
to add more repository configs, but it adds a complexity that the koji
build system does not understand. While the MBS system can help
handle this for packages it knows it built, it cannot do so for
external ones, which is the case when building CentOS or EPEL
packages.

Grobisplitter was created by Patrick Uiterwijk to deal with part of
this while permanent solutions were created in MBS and koji.
Grobisplitter takes a modular repository (for example, a reposync
copy of RHEL-8) and 'flattens' it out, with each module becoming its
own independent repository. The options to the command are:

``` shell
[smooge@batcave01 RHEL-8-001]$ /usr/local/bin/splitter.py --help
usage: splitter.py [-h] [--action {hardlink,symlink,copy}] [--target TARGET]
                   [--skip-missing] [--create-repos] [--only-defaults]
                   repository

Split repositories up

positional arguments:
  repository            The repository to split

optional arguments:
  -h, --help            show this help message and exit
  --action {hardlink,symlink,copy}
                        Method to create split repos files
  --target TARGET       Target directory for split repos
  --skip-missing        Skip missing packages
  --create-repos        Create repository metadatas
  --only-defaults       Only output default modules
```
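The three `--action` choices correspond to the usual filesystem operations. A minimal sketch of that dispatch (the helper name here is ours, mirroring the splitter's behaviour, not its exact code):

```python
import os
import shutil

def perform_action(src, dst, action):
    """Copy, hardlink, or symlink src to dst, the way splitter.py's
    --action option chooses how split repos share package files."""
    if action == 'copy':
        shutil.copy(src, dst)
    elif action == 'hardlink':
        os.link(src, dst)       # same inode, no extra disk space
    elif action == 'symlink':
        os.symlink(src, dst)    # pointer back to the original file
    else:
        raise ValueError("unknown action: %s" % action)
```

Hardlinks and symlinks are what make splitting a 60000+ package repository affordable on disk.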

To save disk space, one can use different methods to copy packages,
target a specific directory, only allow for default modules, and
create repos for each of the virtual repositories separately.

Each module is split out into a directory named for its modular data
(name:stream:version:context:arch); for example, as of 2020-12-03,
here are the httpd modules of RHEL-8 split out:

``` shell
[smooge@batcave01 RHEL-8-001]$ ls -1d httpd*
httpd:2.4:8000020190405071959:55190bc5:x86_64/
httpd:2.4:8000020190829150747:f8e95b4e:x86_64/
httpd:2.4:8010020190829143335:cdc1202b:x86_64/
httpd:2.4:8020020200122152618:6a468ee4:x86_64/
httpd:2.4:8020020200824162909:4cda2c84:x86_64/
httpd:2.4:8030020200818000036:30b713e6:x86_64/
```
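Each directory name above is an NSVCA tuple (name, stream, version, context, architecture) joined with colons. A small sketch of pulling one apart (the helper name is illustrative, not part of the splitter):

```python
def parse_nsvca(dirname):
    """Split a module directory name such as
    'httpd:2.4:8030020200818000036:30b713e6:x86_64/' into its
    name/stream/version/context/arch fields, padding any missing
    trailing fields with None."""
    parts = dirname.rstrip('/').split(':')
    parts += [None] * (5 - len(parts))
    name, stream, version, context, arch = parts
    return {'name': name, 'stream': stream, 'version': version,
            'context': context, 'arch': arch}
```

The same padding idea appears later in this commit as splitter.py's `_pad_svca` helper.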

The reason that multiple versions of each module are kept, versus just
the latest, was due to problems in knowing which 'latest' module should
be used. The splitter needs to know about all the packages in the
upstream repositories for modular decisions to be made. This means that
the staged data will be a complete copy of the RHN repository.

``` shell
total 4980
-rw-r--r--. 1 root sysadmin-main 1463679 2020-11-03 09:18 httpd-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main  224591 2020-11-03 09:18 httpd-devel-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main   37599 2020-11-03 09:18 httpd-filesystem-2.4.37-30.module+el8.3.0+7001+0766b9e7.noarch.rpm
-rw-r--r--. 1 root sysadmin-main 2486719 2020-11-03 09:18 httpd-manual-2.4.37-30.module+el8.3.0+7001+0766b9e7.noarch.rpm
-rw-r--r--. 1 root sysadmin-main  106479 2020-11-03 09:18 httpd-tools-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main  157763 2020-11-03 09:18 mod_http2-1.15.7-2.module+el8.3.0+7670+8bf57d29.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main   84163 2020-11-03 09:18 mod_ldap-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main  189343 2020-11-03 09:18 mod_md-2.0.8-8.module+el8.3.0+6814+67d1e611.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main   60531 2020-11-03 09:18 mod_proxy_html-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main   72475 2020-11-03 09:18 mod_session-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
-rw-r--r--. 1 root sysadmin-main  135799 2020-11-03 09:18 mod_ssl-2.4.37-30.module+el8.3.0+7001+0766b9e7.x86_64.rpm
```

All non-modular rpms from the repository are put in a directory called
`non-modular`, which can also have its own repodata set up for it.

## What does rhel8-split.sh do?

While the splitter command does the hard work of splitting out the
packages, the rhel8-split.sh shell script does the 'business' work of
setting up the repositories so that koji can consume them for EPEL-8
and other builds.

The first part of this is done by a cron job which reposyncs the
various packages down from access.redhat.com for the architectures
Fedora Infrastructure needs. The data is synced down into
subdirectories of `/mnt/fedora/app/fi-repo/rhel/rhel8` which match
the RHEL BaseOS, AppStream, and CodeReadyBuilder channels as needed.

Next, a new destination directory is made in
`/mnt/fedora/app/fi-repo/rhel/rhel8/koji/` with the date of the cron
job run, so that we can always roll back to an older external Red
Hat repo if needed. Afterwards we begin breaking apart the repos per
architecture. The splitter is then called for each channel that is
wanted in EPEL. The BaseOS and AppStream channels only split out the
'default' modules, while Code Ready Builder splits out all modules,
as many are non-default.
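The dated-directory guard described above can be sketched roughly as follows (the function name and layout are illustrative; the real logic lives in rhel8-split.sh):

```python
import datetime
import os

def make_dated_dir(base):
    """Create a koji/<date> staging directory, refusing to reuse an
    existing one, the way rhel8-split.sh guards its DATEDIR so that
    older trees remain available for rollback."""
    date = datetime.date.today().isoformat()
    datedir = os.path.join(base, 'koji', date)
    if os.path.isdir(datedir):
        raise RuntimeError("Directory already exists. Please remove or fix")
    os.makedirs(datedir)
    return datedir
```

Keeping each run in its own dated tree is what makes "roll back to an older external Red Hat repo" a matter of repointing a symlink.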

After the files have been copied into a single tree, `createrepo_c`
is run over the data. This creates a 'flattened' repository; however,
the modular data from all these repos is currently lost.

Once the data has been synced and flattened for all repositories, a
series of links is set up that koji can point to. At this point a
last reposync cycle is done using dnf to pull in only the newest
rpms. This effectively cleans up a large number of older packages and
makes sure the builders have an easier time deciding which package to
use. [As of 2020-12-03, the staged repo has 66130 packages in it,
and the latest tree shrinks that down to 26530.]

Koji is then pointed to the trees on batcave served from
`/mnt/fedora/app/fi-repo/rhel/rhel8/koji/latest/${arch}/RHEL-8-001`.

TODO:

1. Currently the RHEL-8-001 is a consequence of the rhel8-split.sh
   script. We split each repo into its own tree and then copy them
   into one final one. This should be done better.
2. A way to clean up the 'empty' directory names in latest would help
   make it easier to see what is actually being 'used' by koji.

``` shell
[smooge@batcave01 latest]$ ls -1d x86_64/RHEL-8-001/go-toolset\:rhel8\:80*
x86_64/RHEL-8-001/go-toolset:rhel8:8000020190509153318:b9255456:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8000120190520160856:4a778a88:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8000120190828225436:14bc675c:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8010020190829001136:ccff3eb7:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8010020191220185136:0ed30617:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8020020200128163444:0ab52eed:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8020020200817154239:02f7cb7a:x86_64/
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/
```

This makes it look like there are lots of files; however, only one
tree actually has files in it:

``` shell
[smooge@batcave01 latest]$ find x86_64/RHEL-8-001/go-toolset\:rhel8\:80*
x86_64/RHEL-8-001/go-toolset:rhel8:8000020190509153318:b9255456:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8000120190520160856:4a778a88:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8000120190828225436:14bc675c:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8010020190829001136:ccff3eb7:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8010020191220185136:0ed30617:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8020020200128163444:0ab52eed:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8020020200817154239:02f7cb7a:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/delve-1.4.1-1.module+el8.3.0+7840+63dfb1ed.x86_64.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/go-toolset-1.14.7-1.module+el8.3.0+7840+63dfb1ed.x86_64.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-1.14.7-2.module+el8.3.0+7840+63dfb1ed.x86_64.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-bin-1.14.7-2.module+el8.3.0+7840+63dfb1ed.x86_64.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-docs-1.14.7-2.module+el8.3.0+7840+63dfb1ed.noarch.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-misc-1.14.7-2.module+el8.3.0+7840+63dfb1ed.noarch.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-race-1.14.7-2.module+el8.3.0+7840+63dfb1ed.x86_64.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-src-1.14.7-2.module+el8.3.0+7840+63dfb1ed.noarch.rpm
x86_64/RHEL-8-001/go-toolset:rhel8:8030020200827141259:13702366:x86_64/golang-tests-1.14.7-2.module+el8.3.0+7840+63dfb1ed.noarch.rpm
```

@@ -1,12 +0,0 @@
-The Current Master Git Repository for the grobisplitter program is
-https://github.com/smooge/GrobiSplitter.git to be moved under a
-Community Infrastructure repository later. The program depends upon
-python3 and other programs.
-
-gobject-introspection
-libmodulemd-2.5.0
-libmodulemd1-1.8.11
-librepo
-python3-gobject-base
-python3-hawkey
-python3-librepo

rhel8-split.sh:

@@ -1,4 +1,6 @@
 #!/bin/bash

+## Setup basic environment variables.
 HOMEDIR=/mnt/fedora/app/fi-repo/rhel/rhel8
 BINDIR=/usr/local/bin

@@ -7,6 +9,10 @@ DATE=$(date -Ih | sed 's/+.*//')
 DATEDIR=${HOMEDIR}/koji/${DATE}

+##
+## Make a directory for where the new tree will live. Use a new date
+## so that we can roll back to an older release or stop updates for
+## some time if needed.
 if [ -d ${DATEDIR} ]; then
     echo "Directory already exists. Please remove or fix"
     exit

@@ -14,6 +20,9 @@ else
     mkdir -p ${DATEDIR}
 fi

+##
+## Go through each architecture and
+##
 for ARCH in ${ARCHES}; do
     # The archdir is where we daily download updates for rhel8
     ARCHDIR=${HOMEDIR}/${ARCH}

splitter.py:

@@ -12,32 +12,33 @@ import tempfile
 import os
 import subprocess
 import sys
+import logging

 # Look for a specific version of modulemd. The 1.x series does not
 # have the tools we need.
 try:
     gi.require_version('Modulemd', '2.0')
-    from gi.repository import Modulemd
-except:
-    print("We require newer vesions of modulemd than installed..")
-    sys.exit(0)
+    from gi.repository import Modulemd as mmd
+except ValueError:
+    print("libmodulemd 2.0 is not installed..")
+    sys.exit(1)

-mmd = Modulemd
-
-# This code is from Stephen Gallagher to make my other caveman code
-# less icky.
-def _get_latest_streams (mymod, stream):
+# We only want to load the module metadata once. It can be reused as
+# often as required.
+_idx = None
+
+
+def _get_latest_streams(mymod, stream):
     """
     Routine takes a modulemd object and a stream name.
     Finds the latest stream from that and returns it as a stream
     object.
     """
     all_streams = mymod.search_streams(stream, 0)
     latest_streams = mymod.search_streams(stream,
                                           all_streams[0].props.version)

     return latest_streams


 def _get_repoinfo(directory):
     """
     A function which goes into the given directory and sets up the

@@ -54,6 +55,46 @@ def _get_repoinfo(directory):
     r = h.perform()
     return r.getinfo(librepo.LRR_YUM_REPO)


+def _get_modulemd(directory=None, repo_info=None):
+    """
+    Retrieve the module metadata from this repository.
+    :param directory: The path to the repository. Must contain
+                      repodata/repomd.xml and modules.yaml.
+    :param repo_info: An already-acquired repo_info structure
+    :return: A Modulemd.ModulemdIndex object containing the module
+             metadata from this repository.
+    """
+
+    # Return the cached value
+    global _idx
+    if _idx:
+        return _idx
+
+    # If we don't have a cached value, we need either directory or repo_info
+    assert directory or repo_info
+
+    if directory:
+        directory = os.path.abspath(directory)
+        repo_info = _get_repoinfo(directory)
+
+    if 'modules' not in repo_info:
+        return None
+
+    _idx = mmd.ModuleIndex.new()
+
+    with gzip.GzipFile(filename=repo_info['modules'], mode='r') as gzf:
+        mmdcts = gzf.read().decode('utf-8')
+        res, failures = _idx.update_from_string(mmdcts, True)
+        if len(failures) != 0:
+            raise Exception("YAML FAILURE: FAILURES: %s" % failures)
+        if not res:
+            raise Exception("YAML FAILURE: res != True")
+
+    # Ensure that every stream in the index is using v2
+    _idx.upgrade_streams(mmd.ModuleStreamVersionEnum.TWO)
+
+    return _idx
+
+
 def _get_hawkey_sack(repo_info):
     """
     A function to pull in the repository sack from hawkey.

@@ -66,9 +107,10 @@ def _get_hawkey_sack(repo_info):

     primary_sack = hawkey.Sack()
     primary_sack.load_repo(hk_repo, build_cache=False)
+
     return primary_sack


 def _get_filelist(package_sack):
     """
     Determine the file locations of all packages in the sack. Use the

@@ -77,10 +119,12 @@ def _get_filelist(package_sack):
     """
     pkg_list = {}
     for pkg in hawkey.Query(package_sack):
-        nevr="%s-%s:%s-%s.%s"% (pkg.name,pkg.epoch,pkg.version,pkg.release,pkg.arch)
+        nevr = "%s-%s:%s-%s.%s" % (pkg.name, pkg.epoch,
+                                   pkg.version, pkg.release, pkg.arch)
         pkg_list[nevr] = pkg.location
     return pkg_list


 def _parse_repository_non_modular(package_sack, repo_info, modpkgset):
     """
     Simple routine to go through a repo, and figure out which packages

@@ -97,20 +141,14 @@ def _parse_repository_non_modular(package_sack, repo_info, modpkgset):
             pkgs.add(pkg.location)
     return pkgs

-def _parse_repository_modular(repo_info,package_sack):
+
+def _parse_repository_modular(repo_info, package_sack):
     """
     Returns a dictionary of packages indexed by the modules they are
     contained in.
     """
     cts = {}
-    idx = mmd.ModuleIndex()
-    with gzip.GzipFile(filename=repo_info['modules'], mode='r') as gzf:
-        mmdcts = gzf.read().decode('utf-8')
-        res, failures = idx.update_from_string(mmdcts, True)
-        if len(failures) != 0:
-            raise Exception("YAML FAILURE: FAILURES: %s" % failures)
-        if not res:
-            raise Exception("YAML FAILURE: res != True")
-
+    idx = _get_modulemd(repo_info=repo_info)
     pkgs_list = _get_filelist(package_sack)
     idx.upgrade_streams(2)

@@ -124,14 +162,14 @@ def _parse_repository_modular(repo_info, package_sack):
         else:
             continue
         cts[stream.get_NSVCA()] = templ

     return cts


 def _get_modular_pkgset(mod):
     """
     Takes a module and goes through the moduleset to determine which
     packages are inside it.
     Returns a list of packages
     """
     pkgs = set()

@@ -142,6 +180,7 @@ def _get_modular_pkgset(mod):

     return list(pkgs)

+
 def _perform_action(src, dst, action):
     """
     Performs either a copy, hardlink or symlink of the file src to the

@@ -160,6 +199,7 @@ def _perform_action(src, dst, action):
     elif action == 'symlink':
         os.symlink(src, dst)

+
 def validate_filenames(directory, repoinfo):
     """
     Take a directory and repository information. Test each file in

@@ -176,107 +216,175 @@ def validate_filenames(directory, repoinfo):
     return isok


-def get_default_modules(directory):
-    """
-    Work through the list of modules and come up with a default set of
-    modules which would be the minimum to output.
-    Returns a set of modules
-    """
-    directory = os.path.abspath(directory)
-    repo_info = _get_repoinfo(directory)
-
-    provides = set()
-    contents = set()
-    if 'modules' not in repo_info:
-        return contents
-    idx = mmd.ModuleIndex()
-    with gzip.GzipFile(filename=repo_info['modules'], mode='r') as gzf:
-        mmdcts = gzf.read().decode('utf-8')
-        res, failures = idx.update_from_string(mmdcts, True)
-        if len(failures) != 0:
-            raise Exception("YAML FAILURE: FAILURES: %s" % failures)
-        if not res:
-            raise Exception("YAML FAILURE: res != True")
-
-    idx.upgrade_streams(2)
-    # OK this is cave-man no-sleep programming. I expect there is a
-    # better way to do this that would be a lot better. However after
-    # a long long day.. this is what I have.
-
-    # First we oo through the default streams and create a set of
-    # provides that we can check against later.
-    for modname in idx.get_default_streams():
-        mod = idx.get_module(modname)
-        # Get the default streams and loop through them.
-        stream_set = mod.get_streams_by_stream_name(
-            mod.get_defaults().get_default_stream())
-        for stream in stream_set:
-            tempstr = "%s:%s" % (stream.props.module_name,
-                                 stream.props.stream_name)
-            provides.add(tempstr)
-
-    # Now go through our list and build up a content lists which will
-    # have only modules which have their dependencies met
-    tempdict = {}
-    for modname in idx.get_default_streams():
-        mod = idx.get_module(modname)
-        # Get the default streams and loop through them.
-        # This is a sorted list with the latest in it. We could drop
-        # looking at later ones here in a future version. (aka lines
-        # 237 to later)
-        stream_set = mod.get_streams_by_stream_name(
-            mod.get_defaults().get_default_stream())
-        for stream in stream_set:
-            ourname = stream.get_NSVCA()
-            tmp_name = "%s:%s" % (stream.props.module_name,
-                                  stream.props.stream_name)
-            # Get dependencies is a list of items. All of the modules
-            # seem to only have 1 item in them, but we should loop
-            # over the list anyway.
-            for deps in stream.get_dependencies():
-                isprovided = True  # a variable to say this can be added.
-                for mod in deps.get_runtime_modules():
-                    tempstr=""
-                    # It does not seem easy to figure out what the
-                    # platform is so just assume we will meet it.
-                    if mod != 'platform':
-                        for stm in deps.get_runtime_streams(mod):
-                            tempstr = "%s:%s" %(mod,stm)
-                        if tempstr not in provides:
-                            # print( "%s : %s not found." % (ourname,tempstr))
-                            isprovided = False
-                if isprovided:
-                    if tmp_name in tempdict:
-                        # print("We found %s" % tmp_name)
-                        # Get the stream version we are looking at
-                        ts1=ourname.split(":")[2]
-                        # Get the stream version we stored away
-                        ts2=tempdict[tmp_name].split(":")[2]
-                        # See if we got a newer one. We probably
-                        # don't as it is a sorted list but we
-                        # could have multiple contexts which would
-                        # change things.
-                        if ( int(ts1) > int(ts2) ):
-                            # print ("%s > %s newer for %s", ts1,ts2,ourname)
-                            tempdict[tmp_name] = ourname
-                    else:
-                        # print("We did not find %s" % tmp_name)
-                        tempdict[tmp_name] = ourname
-    # OK we finally got all our stream names we want to send back to
-    # our calling function. Read them out and add them to the set.
-    for indx in tempdict:
-        contents.add(tempdict[indx])
-
-    return contents
+def _get_recursive_dependencies(all_deps, idx, stream, ignore_missing_deps):
+    if stream.get_NSVCA() in all_deps:
+        # We've already encountered this NSVCA, so don't go through it again
+        logging.debug('Already included {}'.format(stream.get_NSVCA()))
+        return
+
+    # Store this NSVCA/NS pair
+    local_deps = all_deps
+    local_deps.add(stream.get_NSVCA())
+
+    logging.debug("Recursive deps: {}".format(stream.get_NSVCA()))
+
+    # Loop through the dependencies for this stream
+    deps = stream.get_dependencies()
+
+    # At least one of the dependency array entries must exist in the repo
+    found_dep = False
+    for dep in deps:
+        # Within an array entry, all of the modules must be present in the
+        # index
+        found_all_modules = True
+        for modname in dep.get_runtime_modules():
+            # Ignore "platform" because it's special
+            if modname == "platform":
+                logging.debug('Skipping platform')
+                continue
+            logging.debug('Processing dependency on module {}'.format(modname))
+
+            mod = idx.get_module(modname)
+            if not mod:
+                # This module wasn't present in the index.
+                found_module = False
+                continue
+
+            # Within a module, at least one of the requested streams must be
+            # present
+            streamnames = dep.get_runtime_streams(modname)
+            found_stream = False
+            for streamname in streamnames:
+                stream_list = _get_latest_streams(mod, streamname)
+                for inner_stream in stream_list:
+                    try:
+                        _get_recursive_dependencies(
+                            local_deps, idx, inner_stream, ignore_missing_deps)
+                    except FileNotFoundError as e:
+                        # Could not find all of this stream's dependencies in
+                        # the repo
+                        continue
+                    found_stream = True
+
+            # None of the streams were found for this module
+            if not found_stream:
+                found_all_modules = False
+
+        # We've iterated through all of the modules; if it's still True, this
+        # dependency is consistent in the index
+        if found_all_modules:
+            found_dep = True
+
+    # We were unable to resolve the dependencies for any of the array entries.
+    # raise FileNotFoundError
+    if not found_dep and not ignore_missing_deps:
+        raise FileNotFoundError(
+            "Could not resolve dependencies for {}".format(
+                stream.get_NSVCA()))
+
+    all_deps.update(local_deps)
+
+
+def get_default_modules(directory, ignore_missing_deps):
+    """
+    Work through the list of modules and come up with a default set of
+    modules which would be the minimum to output.
+    Returns a set of modules
+    """
+    all_deps = set()
+
+    idx = _get_modulemd(directory)
+    if not idx:
+        return all_deps
+
+    for modname, streamname in idx.get_default_streams().items():
+        # Only the latest version of a stream is important, as that is the
+        # only one that DNF will consider in its transaction logic. We still
+        # need to handle each context individually.
+        mod = idx.get_module(modname)
+        stream_set = _get_latest_streams(mod, streamname)
+        for stream in stream_set:
+            # Different contexts have different dependencies
+            try:
+                logging.debug("Processing {}".format(stream.get_NSVCA()))
+                _get_recursive_dependencies(all_deps, idx, stream,
+                                            ignore_missing_deps)
+                logging.debug("----------")
+            except FileNotFoundError as e:
+                # Not all dependencies could be satisfied
+                print(
+                    "Not all dependencies for {} could be satisfied. {}. Skipping".format(
+                        stream.get_NSVCA(), e))
+                continue
+
+    logging.debug('Default module streams: {}'.format(all_deps))
+
+    return all_deps
+
+
+def _pad_svca(svca, target_length):
+    """
+    If the split() doesn't return all values (e.g. arch is missing), pad it
+    with `None`
+    """
+    length = len(svca)
+    svca.extend([None] * (target_length - length))
+    return svca
+
+
+def _dump_modulemd(modname, yaml_file):
+    idx = _get_modulemd()
+    assert idx
+
+    # Create a new index to hold the information about this particular
+    # module and stream
+    new_idx = mmd.ModuleIndex.new()
+
+    # Add the module streams
+    module_name, *svca = modname.split(':')
+    stream_name, version, context, arch = _pad_svca(svca, 4)
+
+    logging.debug("Dumping YAML for {}, {}, {}, {}, {}".format(
+        module_name, stream_name, version, context, arch))
+
+    mod = idx.get_module(module_name)
+    streams = mod.search_streams(stream_name, int(version), context, arch)
+
+    # This should usually be a single item, but we'll be future-compatible
+    # and account for the possibility of having multiple streams here.
+    for stream in streams:
+        new_idx.add_module_stream(stream)
+
+    # Add the module defaults
+    defs = mod.get_defaults()
+    if defs:
+        new_idx.add_defaults(defs)
+
+    # libmodulemd doesn't currently expose the get_translation()
+    # function, but that will be added in 2.8.0
+    try:
+        # Add the translation object
+        translation = mod.get_translation()
+        if translation:
+            new_idx.add_translation(translation)
+    except AttributeError as e:
+        # This version of libmodulemd does not yet support this function.
+        # Just ignore it.
+        pass
+
+    # Write out the file
+    try:
+        with open(yaml_file, 'w') as output:
+            output.write(new_idx.dump_to_string())
+    except PermissionError as e:
+        logging.error("Could not write YAML to file: {}".format(e))
+        raise

def perform_split(repos, args, def_modules):
    for modname in repos:
        if args.only_defaults and modname not in def_modules:
            continue

        targetdir = os.path.join(args.target, modname)
        os.mkdir(targetdir)

@@ -287,8 +395,12 @@ def perform_split(repos, args, def_modules):
                os.path.join(targetdir, pkgfile),
                args.action)

        # Extract the modular metadata for this module
        if modname != 'non_modular':
            _dump_modulemd(modname, os.path.join(targetdir, 'modules.yaml'))

def create_repos(target, repos, def_modules, only_defaults):
    """
    Routine to create repositories. Input is target directory and a
    list of repositories.
@@ -297,9 +409,19 @@ def create_repos(target, repos,def_modules, only_defaults):
    for modname in repos:
        if only_defaults and modname not in def_modules:
            continue

        targetdir = os.path.join(target, modname)

        subprocess.run([
            'createrepo_c', targetdir,
            '--no-database'])

        if modname != 'non_modular':
            subprocess.run([
                'modifyrepo_c',
                '--mdtype=modules',
                os.path.join(targetdir, 'modules.yaml'),
                os.path.join(targetdir, 'repodata')
            ])

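Per module directory, `create_repos` shells out to the `createrepo_c` tools; the equivalent commands look roughly like this (the `/srv/split/perl:5.26` path is a placeholder, not a path from the source):

```
createrepo_c /srv/split/perl:5.26 --no-database
modifyrepo_c --mdtype=modules \
    /srv/split/perl:5.26/modules.yaml \
    /srv/split/perl:5.26/repodata
```

The `modifyrepo_c` step injects the module metadata extracted by `_dump_modulemd` into the new repo's repodata, which is what makes the split repo a valid modular repository for dnf.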
def parse_args():
@@ -309,6 +431,8 @@ def parse_args():
    """
    parser = argparse.ArgumentParser(description='Split repositories up')
    parser.add_argument('repository', help='The repository to split')
    parser.add_argument('--debug', help='Enable debug logging',
                        action='store_true', default=False)
    parser.add_argument('--action', help='Method to create split repos files',
                        choices=('hardlink', 'symlink', 'copy'),
                        default='hardlink')
@@ -319,6 +443,11 @@ def parse_args():
                        action='store_true', default=False)
    parser.add_argument('--only-defaults', help='Only output default modules',
                        action='store_true', default=False)
    parser.add_argument('--ignore-missing-default-deps',
                        help='When using --only-defaults, do not skip '
                             'default streams whose dependencies cannot be '
                             'resolved within this repository',
                        action='store_true', default=False)
    return parser.parse_args()

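Putting the new flags together, a typical invocation might look like the following (the script path and repository paths are illustrative placeholders):

```
python3 splitter.py /srv/reposync/rhel-8-appstream \
    --target /srv/split/appstream \
    --action hardlink --only-defaults --create-repos
```

This splits a reposync'd tree into one repo per default module stream plus a `non_modular` repo, hardlinking the packages and running createrepo_c on each result.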
@@ -337,6 +466,7 @@ def setup_target(args):
    else:
        os.mkdir(args.target)

def parse_repository(directory):
    """
    Parse a specific directory, returning a dict with keys module NSVC's and
@@ -353,45 +483,51 @@ def parse_repository(directory):
    # If we have a repository with no modules we do not want our
    # script to error out but just remake the repository with
    # everything in a known sack (aka non_modular).

    if 'modules' in repo_info:
        mod = _parse_repository_modular(repo_info, package_sack)
        modpkgset = _get_modular_pkgset(mod)
    else:
        mod = dict()
        modpkgset = set()

    non_modular = _parse_repository_non_modular(package_sack, repo_info,
                                                modpkgset)
    mod['non_modular'] = non_modular

    # We should probably go through our default modules here and
    # remove them from our mod. This would cut down some code paths.
    return mod

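The dict `parse_repository` returns drives both `perform_split` and `create_repos`. An illustrative sketch of its shape (the NSVCA key and package names are made-up examples, not real data): module NSVCA strings map to package sets, with a catch-all `non_modular` key for everything outside any module.

```python
# Made-up example of the structure; only the shape matters.
repos = {
    'nodejs:10:8020020191001:cdc1202b:x86_64': {
        'nodejs-10.16.3-1.module+el8.x86_64.rpm',
    },
    'non_modular': {'bash-4.4.19-10.el8.x86_64.rpm'},
}
```

Keying the non-modular leftovers under a reserved name lets the rest of the script treat them like just another "module" while skipping the modules.yaml steps for it.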
def main():
    # Determine what the arguments are.
    args = parse_args()

    if args.debug:
        logging.basicConfig(level=logging.DEBUG)

    # Go through arguments and act on their values.
    setup_target(args)

    repos = parse_repository(args.repository)

    if args.only_defaults:
        def_modules = get_default_modules(args.repository,
                                          args.ignore_missing_default_deps)
    else:
        def_modules = set()

    def_modules.add('non_modular')

    if not args.skip_missing:
        if not validate_filenames(args.repository, repos):
            raise ValueError("Package files were missing!")
    if args.target:
        perform_split(repos, args, def_modules)
    if args.create_repos:
        create_repos(args.target, repos, def_modules, args.only_defaults)


if __name__ == '__main__':
    main()