Localization (thus non-vital) files of packages can take up to ~2/3 of the delivered bits #20

Open
opened 2020-06-17 22:48:59 +00:00 by jpokorny · 5 comments

Hello,

I haven't seen this topic in a cursory search, hence raising it here,
demonstrated with one particularly frequent package, findutils:

$ rpm -qi findutils | grep '^\(Version\|Release\|Size\)'
Version     : 4.7.0
Release     : 4.fc33
Size        : 1808667
$ rpm -ql findutils | { while read f; do test -d "$f" || echo "$f"; done; } | xargs du -b | tee >(cut -d "$(printf '\t')" -f1 | paste -s -d+ - | bc)
328944	/usr/bin/find
80072	/usr/bin/xargs
25	/usr/lib/.build-id/43/34dd190460c23206f77d2e79168f0164cb5fdf
24	/usr/lib/.build-id/63/c7dd8dd86c5bae0f774c341fe9323bd3a9713c
1375	/usr/share/doc/findutils/AUTHORS
83731	/usr/share/doc/findutils/NEWS
4539	/usr/share/doc/findutils/README
1539	/usr/share/doc/findutils/THANKS
2860	/usr/share/doc/findutils/TODO
24251	/usr/share/info/find-maint.info.gz
89616	/usr/share/info/find.info-1.gz
1878	/usr/share/info/find.info-2.gz
2478	/usr/share/info/find.info.gz
35149	/usr/share/licenses/findutils/COPYING
2343	/usr/share/locale/be/LC_MESSAGES/findutils.mo
48466	/usr/share/locale/bg/LC_MESSAGES/findutils.mo
7982	/usr/share/locale/ca/LC_MESSAGES/findutils.mo
36184	/usr/share/locale/cs/LC_MESSAGES/findutils.mo
34612	/usr/share/locale/da/LC_MESSAGES/findutils.mo
36905	/usr/share/locale/de/LC_MESSAGES/findutils.mo
44457	/usr/share/locale/el/LC_MESSAGES/findutils.mo
34447	/usr/share/locale/eo/LC_MESSAGES/findutils.mo
24941	/usr/share/locale/es/LC_MESSAGES/findutils.mo
33712	/usr/share/locale/et/LC_MESSAGES/findutils.mo
36236	/usr/share/locale/fi/LC_MESSAGES/findutils.mo
37042	/usr/share/locale/fr/LC_MESSAGES/findutils.mo
20984	/usr/share/locale/ga/LC_MESSAGES/findutils.mo
24078	/usr/share/locale/gl/LC_MESSAGES/findutils.mo
35520	/usr/share/locale/hr/LC_MESSAGES/findutils.mo
37131	/usr/share/locale/hu/LC_MESSAGES/findutils.mo
20287	/usr/share/locale/id/LC_MESSAGES/findutils.mo
33636	/usr/share/locale/it/LC_MESSAGES/findutils.mo
28336	/usr/share/locale/ja/LC_MESSAGES/findutils.mo
1916	/usr/share/locale/ko/LC_MESSAGES/findutils.mo
2663	/usr/share/locale/lg/LC_MESSAGES/findutils.mo
6271	/usr/share/locale/lt/LC_MESSAGES/findutils.mo
1514	/usr/share/locale/ms/LC_MESSAGES/findutils.mo
34789	/usr/share/locale/nb/LC_MESSAGES/findutils.mo
35503	/usr/share/locale/nl/LC_MESSAGES/findutils.mo
35962	/usr/share/locale/pl/LC_MESSAGES/findutils.mo
35253	/usr/share/locale/pt/LC_MESSAGES/findutils.mo
36212	/usr/share/locale/pt_BR/LC_MESSAGES/findutils.mo
6589	/usr/share/locale/ro/LC_MESSAGES/findutils.mo
46244	/usr/share/locale/ru/LC_MESSAGES/findutils.mo
24148	/usr/share/locale/sk/LC_MESSAGES/findutils.mo
35181	/usr/share/locale/sl/LC_MESSAGES/findutils.mo
46489	/usr/share/locale/sr/LC_MESSAGES/findutils.mo
34848	/usr/share/locale/sv/LC_MESSAGES/findutils.mo
33280	/usr/share/locale/tr/LC_MESSAGES/findutils.mo
46292	/usr/share/locale/uk/LC_MESSAGES/findutils.mo
38059	/usr/share/locale/vi/LC_MESSAGES/findutils.mo
32873	/usr/share/locale/zh_CN/LC_MESSAGES/findutils.mo
13436	/usr/share/locale/zh_TW/LC_MESSAGES/findutils.mo
21948	/usr/share/man/man1/find.1.gz
5466	/usr/share/man/man1/xargs.1.gz
1808716

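(Note: the 1808667 vs. 1808716 discrepancy is 49 bytes, i.e. exactly 25 + 24, the
sizes of the two .build-id entries, so it seems attributable to .build-id.
EDIT: filed a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1848199)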

We can easily see that, leaving aside a possible split of find and xargs, the
must-have portion is: 328944 + 80072 + 25 + 24 = 409065, or ~23% of the whole package.

The rest is:

  • documentation (droppable, except perhaps for %license;
    see also rpm --nodocs):
    1375+83731+4539+1539+2860+24251+89616+1878+2478+35149+21948+5466
    = 274830, or ~15%

  • localization:
    the rest, i.e. 1124821, or ~62%
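
The same breakdown can be reproduced mechanically. Below is a minimal sketch, assuming the du -b output format used above (size, a tab, then the path) and classifying purely by path prefix; the function name breakdown and its prefix rules are illustrative choices, not an existing tool.

```shell
# Hypothetical helper: read "size<TAB>path" lines (as printed by `du -b`)
# from stdin and print bytes and rounded percentage per category.
breakdown() {
  awk -F '\t' '
    {
      # Classify by path prefix only; everything else counts as "other".
      if ($2 ~ "^/usr/share/locale/")
        kind = "localization"
      else if ($2 ~ "^/usr/share/(doc|info|licenses|man)/")
        kind = "documentation"
      else
        kind = "other"
      sum[kind] += $1
      total += $1
    }
    END {
      # Print nothing for empty input instead of dividing by zero.
      if (total == 0) exit
      # Fixed output order, since "for (k in sum)" order is unspecified.
      n = split("other documentation localization", order, " ")
      for (i = 1; i <= n; i++)
        printf "%s %d %.0f%%\n", order[i], sum[order[i]], 100 * sum[order[i]] / total
    }'
}
```

Fed with the du -b output from the listing above (without the trailing tee/bc total line), e.g. rpm -ql findutils | { while read f; do test -d "$f" || echo "$f"; done; } | xargs du -b | breakdown, it should report other 409065 23%, documentation 274830 15% and localization 1124821 62%.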

At least for quick containers, mock builds, etc., only about 1/4 of the packaged
bits are useful. Is there room for improvement here, in terms of minimization?

Author

Something like rpm --locale-filter=CMD, perhaps?

Author

By the way, this kind of "content demultiplexing" is what I had in mind as
something that would combine nicely with cleverly chunked RPMs:
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/EQ2UDRE6NA7IUC7IA7VZMEHIUJQ7H2K6/

You can already set the %_install_langs rpm macro to install only the translations for the languages you need. This is common in container builds. Granted, this does not save download bandwidth, only disk space.
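
For instance, a persistent setting can go into a macro file. rpm reads files matching /etc/rpm/macros.*, so a name such as /etc/rpm/macros.install-langs is an arbitrary choice. The value is a colon-separated list of language codes (rpm's default is all), and only files that the package tags with %lang, typically via %find_lang, are affected. A sketch restricting installs to English and German:

```
%_install_langs en:de
```

For a single transaction, the same value can instead be passed on the command line as rpm --define '_install_langs en:de' ..., and rpm --eval '%{_install_langs}' shows the effective setting.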

Author

Oh, thanks, so that has already been thought of, perfect!
It's just that the user-facing interface is rather buried.

Author

> Something like rpm --locale-filter=CMD, perhaps?

This idea, besides being eclipsed by %_install_langs as mentioned, is mostly
superseded by the existing --excludepath option, which I missed originally. The idea
was to "functionize" which language identifiers to allow, where CMD could be
something like:

cut -z -c1-1024 /etc/locale.conf /home/*/.config/locale.conf | xargs -0 -I{} sh -x -c "echo '{}' | sed -nE 's|^LANG=([\"]?)([[:alnum:]]+)[[:alnum:]._-]*\1|\2|p'"

The problem with the exclusion approach is that it is harder to work with than
a list of the desired languages, but making %_install_langs trigger something
like the above command would be doable nonetheless.
Note: the command would certainly need more hardening.
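
Along the lines of feeding an allowlist into %_install_langs, here is a minimal sketch, assuming the locale.conf files contain plain LANG= assignments. It reuses the sed expression from the one-liner but lets sed read the files directly instead of passing their contents through sh -c, which removes the quoting and injection concerns. The function names langs_from_locale_conf and install_langs_value are hypothetical, and falling back to all when nothing is found is a design choice of this sketch.

```shell
# Hypothetical helper: print the distinct language codes found in LANG=
# lines of the given locale.conf files (e.g. LANG="en_US.UTF-8" -> en).
# sed reads the files itself, so their contents never reach a shell.
langs_from_locale_conf() {
  for f in "$@"; do
    # Skip missing or unreadable files (e.g. an unmatched glob pattern).
    [ -r "$f" ] || continue
    # Same expression as the one-liner: optional quote, alphanumeric
    # language part, region/encoding characters, then the matching quote.
    sed -nE 's|^LANG=(["]?)([[:alnum:]]+)[[:alnum:]._-]*\1|\2|p' "$f"
  done | sort -u
}

# Hypothetical helper: join the codes with ":" as %_install_langs expects,
# falling back to "all" (rpm's default) when no code was found.
install_langs_value() {
  langs=$(langs_from_locale_conf "$@" | paste -s -d: -)
  printf '%s\n' "${langs:-all}"
}
```

The result could then be handed to rpm, e.g. rpm -ivh --define "_install_langs $(install_langs_value /etc/locale.conf /home/*/.config/locale.conf)" findutils-*.rpm; this wiring is untested against a live system.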

Side note: .build-id links will be explicitly avoidable at install time with
the --excludeartifacts option to rpm
(https://github.com/rpm-software-management/rpm/pull/1274).

Reference: Docs/minimization#20