OLD | NEW |
1 | 1 |
2 History of LZMA Utils and XZ Utils | 2 History of LZMA Utils and XZ Utils |
3 ================================== | 3 ================================== |
4 | 4 |
5 Tukaani distribution | 5 Tukaani distribution |
6 | 6 |
7 In 2005, there was a small group working on Tukaani distribution, which | 7 In 2005, there was a small group working on the Tukaani distribution, |
8 was a Slackware fork. One of the project goals was to fit the distro on | 8 which was a Slackware fork. One of the project's goals was to fit the |
9 a single 700 MiB ISO-9660 image. Using LZMA instead of gzip helped a | 9 distro on a single 700 MiB ISO-9660 image. Using LZMA instead of gzip |
10 lot. Roughly speaking, one could fit data that took 1000 MiB in gzipped | 10 helped a lot. Roughly speaking, one could fit data that took 1000 MiB |
11 form into 700 MiB with LZMA. Naturally compression ratio varied across | 11 in gzipped form into 700 MiB with LZMA. Naturally, the compression |
12 packages, but this was what we got on average. | 12 ratio varied across packages, but this was what we got on average. |
13 | 13 |
14 Slackware packages have traditionally had .tgz as the filename suffix, | 14 Slackware packages have traditionally had .tgz as the filename suffix, |
15 which is an abbreviation of .tar.gz. A logical naming for LZMA | 15 which is an abbreviation of .tar.gz. A logical naming for LZMA |
16 compressed packages was .tlz, being an abbreviation of .tar.lzma. | 16 compressed packages was .tlz, being an abbreviation of .tar.lzma. |
17 | 17 |
18 At the end of the year 2007, there was no distribution under the | 18 At the end of the year 2007, there was no distribution under the |
19 Tukaani project anymore, but development of LZMA Utils was kept going. | 19 Tukaani project anymore, but development of LZMA Utils was kept going. |
20 Still, there were .tlz packages around, because at least Vector Linux | 20 Still, there were .tlz packages around, because at least Vector Linux |
21 (a Slackware based distribution) used LZMA for its packages. | 21 (a Slackware based distribution) used LZMA for its packages. |
22 | 22 |
23 First versions of the modified pkgtools used the LZMA_Alone tool from | 23 First versions of the modified pkgtools used the LZMA_Alone tool from |
24 Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't need | 24 Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't need |
25 to interact with LZMA_Alone directly. But people soon wanted to use | 25 to interact with LZMA_Alone directly. But people soon wanted to use |
26 LZMA for other files too, and the interface of LZMA_Alone wasn't | 26 LZMA for other files too, and the interface of LZMA_Alone wasn't |
27 comfortable for those used to gzip and bzip2. | 27 comfortable for those used to gzip and bzip2. |
28 | 28 |
29 | 29 |
30 First steps of LZMA Utils | 30 First steps of LZMA Utils |
31 | 31 |
32 The first version of LZMA Utils (4.22.0) included a shell script called | 32 The first version of LZMA Utils (4.22.0) included a shell script called |
33 lzmash. It was wrapper that had gzip-like command line interface. It | 33 lzmash. It was a wrapper that had a gzip-like command-line interface. It |
34 used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep, | 34 used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep, |
35 zdiff, and related scripts from gzip were adapted work with LZMA and | 35 zdiff, and related scripts from gzip were adapted to work with LZMA and |
36 were part of the first LZMA Utils release too. | 36 were part of the first LZMA Utils release too. |
37 | 37 |
38 LZMA Utils 4.22.0 included also lzmadec, which was a small (less than | 38 LZMA Utils 4.22.0 included also lzmadec, which was a small (less than |
39 10 KiB) decoder-only command line tool. It was written on top of the | 39 10 KiB) decoder-only command-line tool. It was written on top of the |
40 decoder-only C code found from the LZMA SDK. lzmadec was convenient in | 40 decoder-only C code found from the LZMA SDK. lzmadec was convenient in |
41 situations where LZMA_Alone (a few hundred KiB) would be too big. | 41 situations where LZMA_Alone (a few hundred KiB) would be too big. |
42 | 42 |
43 lzmash and lzmadec were written by Lasse Collin. | 43 lzmash and lzmadec were written by Lasse Collin. |
44 | 44 |
45 | 45 |
46 Second generation | 46 Second generation |
47 | 47 |
48 The lzmash script was an ugly and not very secure hack. The last | 48 The lzmash script was an ugly and not very secure hack. The last |
49 version of LZMA Utils to use lzmash was 4.27.1. | 49 version of LZMA Utils to use lzmash was 4.27.1. |
50 | 50 |
51 LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written | 51 LZMA Utils 4.32.0beta1 introduced a new lzma command-line tool written |
52 by Ville Koskinen. It was written in C++, and used the encoder and | 52 by Ville Koskinen. It was written in C++, and used the encoder and |
53 decoder from C++ LZMA SDK with little modifications. This tool replaced | 53 decoder from C++ LZMA SDK with some little modifications. This tool |
54 both the lzmash script and the LZMA_Alone command line tool in LZMA | 54 replaced both the lzmash script and the LZMA_Alone command-line tool |
55 Utils. | 55 in LZMA Utils. |
56 | 56 |
57 Introducing this new tool caused some temporary incompatibilities, | 57 Introducing this new tool caused some temporary incompatibilities, |
58 because LZMA_Alone executable was simply named lzma like the new | 58 because the LZMA_Alone executable was simply named lzma like the new |
59 command line tool, but they had completely different command line | 59 command-line tool, but they had a completely different command-line |
60 interface. The file format was still the same. | 60 interface. The file format was still the same. |
61 | 61 |
62 Lasse wrote liblzmadec, which was a small decoder-only library based | 62 Lasse wrote liblzmadec, which was a small decoder-only library based |
63 on the C code found from LZMA SDK. liblzmadec had API similar to zlib, | 63 on the C code found from LZMA SDK. liblzmadec had an API similar to |
64 although there were some significant differences, which made it | 64 zlib, although there were some significant differences, which made it |
65 non-trivial to use it in some applications designed for zlib and | 65 non-trivial to use it in some applications designed for zlib and |
66 libbzip2. | 66 libbzip2. |
67 | 67 |
68 The lzmadec command line tool was converted to use liblzmadec. | 68 The lzmadec command-line tool was converted to use liblzmadec. |
69 | 69 |
70 Alexandre Sauvé helped converting build system to use GNU Autotools. | 70 Alexandre Sauvé helped converting the build system to use GNU |
71 This made is easier to test for certain less portable features needed | 71 Autotools. This made it easier to test for certain less portable |
72 by the new command line tool. | 72 features needed by the new command-line tool. |
73 | 73 |
74 Since the new command line tool never got completely finished (for | 74 Since the new command-line tool never got completely finished (for |
75 example, it didn't support LZMA_OPT environment variable), the intent | 75 example, it didn't support the LZMA_OPT environment variable), the |
76 was to not call 4.32.x stable. Similarly, liblzmadec wasn't polished, | 76 intent was to not call 4.32.x stable. Similarly, liblzmadec wasn't |
77 but appeared to work well enough, so some people started using it too. | 77 polished, but appeared to work well enough, so some people started |
| 78 using it too. |
78 | 79 |
79 Because the development of the third generation of LZMA Utils was | 80 Because the development of the third generation of LZMA Utils was |
80 delayed considerably (3-4 years), the 4.32.x branch had to be kept | 81 delayed considerably (3-4 years), the 4.32.x branch had to be kept |
81 maintained. It got some bug fixes now and then, and finally it was | 82 maintained. It got some bug fixes now and then, and finally it was |
82 decided to call it stable, although most of the missing features were | 83 decided to call it stable, although most of the missing features were |
83 never added. | 84 never added. |
84 | 85 |
85 | 86 |
86 File format problems | 87 File format problems |
87 | 88 |
88 The file format used by LZMA_Alone was primitive. It was designed for | 89 The file format used by LZMA_Alone was primitive. It was designed with |
89 embedded systems in mind, and thus provided only minimal set of | 90 embedded systems in mind, and thus provided only a minimal set of |
90 features. The two biggest problems for non-embedded use were lack of | 91 features. The two biggest problems for non-embedded use were the lack |
91 magic bytes and integrity check. | 92 of magic bytes and an integrity check. |
92 | 93 |
93 Igor and Lasse started developing a new file format with some help | 94 Igor and Lasse started developing a new file format with some help |
94 from Ville Koskinen. Also Mark Adler, Mikko Pouru, H. Peter Anvin, | 95 from Ville Koskinen. Also Mark Adler, Mikko Pouru, H. Peter Anvin, |
95 and Lars Wirzenius helped with some minor things at some point of the | 96 and Lars Wirzenius helped with some minor things at some point of the |
96 development. Designing the new format took quite a long time (actually, | 97 development. Designing the new format took quite a long time (actually, |
97 too long time would be more appropriate expression). It was mostly | 98 too long a time would be a more appropriate expression). It was mostly |
98 because Lasse was quite slow at getting things done due to personal | 99 because Lasse was quite slow at getting things done due to personal |
99 reasons. | 100 reasons. |
100 | 101 |
101 Originally the new format was supposed to use the same .lzma suffix | 102 Originally the new format was supposed to use the same .lzma suffix |
102 that was already used by the old file format. Switching to the new | 103 that was already used by the old file format. Switching to the new |
103 format wouldn't have caused much trouble when the old format wasn't | 104 format wouldn't have caused much trouble when the old format wasn't |
104 used by many people. But since the development of the new format took | 105 used by many people. But since the development of the new format took |
105 so long time, the old format got quite popular, and it was decided | 106 such a long time, the old format got quite popular, and it was decided |
106 that the new file format must use a different suffix. | 107 that the new file format must use a different suffix. |
107 | 108 |
108 It was decided to use .xz as the suffix of the new file format. The | 109 It was decided to use .xz as the suffix of the new file format. The |
109 first stable .xz file format specification was finally released in | 110 first stable .xz file format specification was finally released in |
110 December 2008. In addition to fixing the most obvious problems of | 111 December 2008. In addition to fixing the most obvious problems of |
111 the old .lzma format, the .xz format added some new features like | 112 the old .lzma format, the .xz format added some new features like |
112 support for multiple filters (compression algorithms), filter chaining | 113 support for multiple filters (compression algorithms), filter chaining |
113 (like piping on the command line), and limited random-access reading. | 114 (like piping on the command line), and limited random-access reading. |
114 | 115 |
115 Currently the primary compression algorithm used in .xz is LZMA2. | 116 Currently the primary compression algorithm used in .xz is LZMA2. |
116 It is an extension on top of the original LZMA to fix some practical | 117 It is an extension on top of the original LZMA to fix some practical |
117 problems: LZMA2 adds support for flushing the encoder, uncompressed | 118 problems: LZMA2 adds support for flushing the encoder, uncompressed |
118 chunks, eases stateful decoder implementations, and improves support | 119 chunks, eases stateful decoder implementations, and improves support |
119 for multithreading. Since LZMA2 is better than the original LZMA, the | 120 for multithreading. Since LZMA2 is better than the original LZMA, the |
120 original LZMA is not supported in .xz. | 121 original LZMA is not supported in .xz. |
121 | 122 |
122 | 123 |
123 Transition to XZ Utils | 124 Transition to XZ Utils |
124 | 125 |
125 The early versions of XZ Utils were called LZMA Utils. The first | 126 The early versions of XZ Utils were called LZMA Utils. The first |
126 releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK. | 127 releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK. |
127 The code was still directly based on LZMA SDK but ported to C and | 128 The code was still directly based on LZMA SDK but ported to C and |
128 converted from callback API to stateful API. Later, Igor Pavlov made | 129 converted from a callback API to a stateful API. Later, Igor Pavlov |
129 C version of the LZMA encoder too; these ports from C++ to C were | 130 made a C version of the LZMA encoder too; these ports from C++ to C |
130 independent in LZMA SDK and LZMA Utils. | 131 were independent in LZMA SDK and LZMA Utils. |
131 | 132 |
132 The core of the new LZMA Utils was liblzma, a compression library with | 133 The core of the new LZMA Utils was liblzma, a compression library with |
133 zlib-like API. liblzma supported both the old and new file format. The | 134 a zlib-like API. liblzma supported both the old and new file format. |
134 gzip-like lzma command line tool was rewritten to use liblzma. | 135 The gzip-like lzma command-line tool was rewritten to use liblzma. |
135 | 136 |
136 The new LZMA Utils code base was renamed to XZ Utils when the name | 137 The new LZMA Utils code base was renamed to XZ Utils when the name |
137 of the new file format had been decided. The liblzma compression | 138 of the new file format had been decided. The liblzma compression |
138 library retained its name though, because changing it would have | 139 library retained its name though, because changing it would have |
139 caused unnecessary breakage in applications already using the early | 140 caused unnecessary breakage in applications already using the early |
140 liblzma snapshots. | 141 liblzma snapshots. |
141 | 142 |
142 The xz command line tool can emulate the gzip-like lzma tool by | 143 The xz command-line tool can emulate the gzip-like lzma tool by |
143 creating appropriate symlinks (e.g. lzma -> xz). Thus, practically | 144 creating appropriate symlinks (e.g. lzma -> xz). Thus, practically |
144 all scripts using the lzma tool from LZMA Utils will work as is with | 145 all scripts using the lzma tool from LZMA Utils will work as is with |
145 XZ Utils (and will keep using the old .lzma format). Still, the .lzma | 146 XZ Utils (and will keep using the old .lzma format). Still, the .lzma |
146 format is more or less deprecated. XZ Utils will keep supporting it, | 147 format is more or less deprecated. XZ Utils will keep supporting it, |
147 but new applications should use the .xz format, and migrating old | 148 but new applications should use the .xz format, and migrating old |
148 applications to .xz is often a good idea too. | 149 applications to .xz is often a good idea too. |
149 | 150 |
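The "File format problems" section names the lack of magic bytes as one of the two biggest problems of the old .lzma format. Below is a minimal sketch of what that means in practice. It uses Python's standard lzma module rather than the xz tools themselves. The helper name looks_like_xz and the sample payload are made up for illustration, and the six-byte constant is the header magic defined by the .xz file format specification.

```python
import lzma

# Header magic bytes of the .xz container: FD 37 7A 58 5A 00.
XZ_MAGIC = b"\xfd7zXZ\x00"


def looks_like_xz(data: bytes) -> bool:
    """Hypothetical helper: report whether data starts with the .xz magic."""
    return data.startswith(XZ_MAGIC)


# Arbitrary sample data, compressed into both container formats.
payload = b"Tukaani " * 100
xz_blob = lzma.compress(payload, format=lzma.FORMAT_XZ)
alone_blob = lzma.compress(payload, format=lzma.FORMAT_ALONE)  # legacy .lzma

# Only the .xz stream can be recognized by a fixed signature. The legacy
# .lzma header begins directly with encoder properties, so a tool has to
# guess the format from plausibility checks on those fields.
print("xz stream has magic:  ", looks_like_xz(xz_blob))
print("lzma stream has magic:", looks_like_xz(alone_blob))

# Both formats still decode; FORMAT_AUTO (the default) accepts either one.
assert lzma.decompress(xz_blob) == payload
assert lzma.decompress(alone_blob) == payload
```

The fixed signature is what lets file-type detection tools identify .xz data reliably, which the legacy header does not allow.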