-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathCHANGELOG
310 lines (292 loc) · 19.8 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
Version 1.2.0-SNAPSHOT
New features
- Support (offline copies of) online-only openbooks such as the new Java 8 version of "Java
Insel", 12th edition (closes #33). The new book edition is named JAVA_8 and comes with a
special hint about the need to manually crawl the publisher's website (e.g. via HTTrack) and
manually create a ZIP archive to be processed by GOC.
- Verbose error & info message in case of MD5 mismatch (closes #25)
- Remove links to online feedback forms (not just the forms themselves) from 29 books in 4,271
files. Inline feedback forms are now only left in 2 books (SHELL_PROG and VB_2008) and 191
files, in the other books they have been replaced by the publisher.
Changed features
- Several changes to support publisher's re-branding from Galileo to Rheinwerk
- Add book WINDOWS_SERVER_2012, remove unavailable book PYTHON
- Enhancement: handle truncated (partly downloaded) ZIP file gracefully. Throw exception with
message: "Probably the download of <zipFile> was interrupted. Please delete the file and
retry."
- Bugfix: handle illegal book ID correctly
- Bugfix: titles of Java_Insel and Javascript books were swapped in config.xml
Internal changes
- Improve error handling/logging for failed file system operations such as creating directories
or renaming files
- Recognise titles in meta tags and not just in explicit title tags
- Recognise "Rheinwerk" in titles addition to "Galileo"
- Automatically determine the HTML character set via jsoup instead of setting a fixed charset,
because the "Java Insel" book for Java 8 uses UTF-8 instead of ISO-8859-1 or windows-1252.
The output stream's charset is adjusted accordingly *after* the document was parsed.
- Upgrade Java/AspectJ source/target levels to 1.8.
- Upgrade AspectJ 1.8.13 and AspectJ Maven 1.11
- Change project directory layout to Maven default
- Switch to com.jolira Fork or One-JAR plugin available on Maven Central because the original
org.dstovall plugin repository is no longer available
Version 1.1.0
New features
- New command-line option "-w|--write-config" writes an editable book list file config.xml
to current the current directory. This is an easy way for JAR users to extract the embedded
default config.xml to disk and edit it subsequently, e.g. in order to add new books not
contained in the upstream version. This way they do not have to wait for me to update the
Git repo and the downloadable JAR. User self-service. :-)
- New command-line option "-c|--check-avail" checks Galileo homepage for available books,
comparing them with known ones, cf. "internal changes" section about AvailableBooksChecker.
- New command-line option "-m|--check-md5" downloads all known books without storing them,
verifying their MD5 checksums (slow! >1 Gb download), cf. "internal changes" section about
DownloadChecker.
- The CLI options "--check-avail" and "--check-md5" mentioned above can be combined by
experienced users or the GOC author to detect changed download packages and re-test them,
new books, deleted books and so forth.
- In verbose logging mode, the removal of feedback forms is now logged.
- New books APPS_IPHONE_IOS6, VB_2012_EINSTIEG. I detected the books using the new tool
AvailableBooksChecker. :-)
Changed features
- Update MD5 for VCSHARP_2012 once again
- Rename book APPS_IPHONE to APPS_IPHONE_IOS5 to differentiate it from the new *_IOS6 book
Internal changes
- Add XStream 1.4.4 as a new external library: XStream is a simple library to serialize objects
to XML and back again, see http://xstream.codehaus.org. It is used in this project for
loading/saving openbook meta data from/to a config file (config.xml, see below).
- Make book list configurable: class Book used to be an enum with hard-coded items. Now it is
a simple class containing String values which are read from an XML configuration file. In
order to make this work in the IDE and as well as in JAR files, the file config.xml (which
was created and is parsed by XStream) is searched in three places in this order:
1. ./config.xml
If this local config file is found in either scenario (IDE or JAR) it will be preferred
to other locations, which enables the user to override the default version.
2. resource/config.xml
This is where the default file is located and found in Eclipse if no local config file
exists. From there it will be maintained in Git and, because 'resource' is configured
as an Eclipse source folder, added to runnable JARs when creating them from the IDE.
3. /config.xml
This is where the default file is located and found in the JAR if no local config file
exists. From there it will be read as a resource via Class.getResourceAsStream.
- Project can be built with Maven now from Eclipse (added Maven nature to Eclipse settings)
as well as from cloud services like
* BuildHive: https://buildhive.cloudbees.com/job/kriegaex/job/Galileo-Openbook-Cleaner/
* CloudBees: https://scrum-master.ci.cloudbees.com/job/galileo-openbook-cleaner/
Both of them are in use at the moment, but after a few improvements to the CloudBees version
I will probably delete the BuildHive version again because it must copy build artifacts on my
personal FTP server while the CloudBees version features its own deployment repository.
- Use one-jar Maven plugin to create a JAR of JARs similar to what we had before when creating a
"runnable JAR" from Eclipse.
- New tool class DownloadChecker checks downloads and verifies their MD5. The class has a 'main'
method which can be called by advanced users who know it. Attention: This is a long-running
operation (~14 minutes on my old PC using 16 Mbit/s downstream DSL) which as of 2013-03-10
downloads at least 1.1 Gb of data!
- New tool class AvailableBooksChecker compares predefined to online books: This class has a
'main' method to be used by advanced users knowing how to interpret the output. Basically two
sets of book URLs are compared: the list of known books from config.xml to the lists of books
found online on the Galileo Computing and Galileo Design web sites. For each set the orphans,
i.e. elements not found in the respective other set, are determined and printed. This helps
to discover new, renamed or deleted books. It also helps to spot known books which are still
available for download, but no longer listed on the web site.
- Upgrade to jsoup-1.7.2
- Refactor JsoupFilter for cleaner code
- Refactor LoggingAspect several times for cleaner code and thread safety, trying different
approaches until as satisfactory solution was found.
- JsoupFiter: update JavaDoc info for getFeedbackFormNeigbourhood
- Eclipse .project: remove obsolete reference to JTidy project
- FileDownloader: calculate MD5 while reading input, not writing output. This is necessary for
making the write-to-file step optional (e.g. for MD5 check only). The constructor now accepts
a null value for 'to' and sets the new boolean member 'doWriteToFile' accordingly. The latter
is checked later during download.
Version 1.0.3
New features
- New book UBUNTU_12_04 ("Ubuntu 12.04 'Precise Pangolin'")
- New book VCSHARP_2012 ("Visual C# 2012")
Changed features
- Update MD5 for VCSHARP_2012 and JAVA_INSEL: Galileo Press have changed the download files.
Maybe they fixed some errata.
Bugfixes
- If there are download problems, no more files with size 0 bytes will be created anymore
(cf. "internal changes").
- Make sure to keep non-element nodes: When moving the main content of the nested "grey table"
directly under the BODY tag and removing the previously surrounding clutter, only element
nodes were moved, but other content (such as dangling text nodes) were ignored and thus lost.
This affected content in several books in different degrees:
actionscript_1_und_2, actionscript_einstieg, apps_iphone, asp_net, dreamweaver_8,
excel_2007, it_handbuch, java_7, java_insel, javascript_ajax, joomla_1_5, linux,
linux_unix_prog, microsoft_netzwerk, php_pear, python, ruby_on_rails_2, shell_prog,
ubuntu_11_04, ubuntu_12_04, unix_guru, vcsharp_2010
Internal changes
- Initialise output after input stream for downloads (avoids 0-byte file): In class
FileDownloader, FileOutputStream 'outStream' is now only initialised *after* the InputStream
on the URL to be downloaded has been opened successfully. This avoids the creation of 0-byte
files in case of server, proxy or other connection problems.
Version 1.0.2
New feature
- Add HTTP proxy support incl. authentication if necessary: JVM parameters for host, port, user,
password are supported and can be specified on the command line e.g. like this (without line
feeds):
java
-Dhttp.proxyHost=localhost -Dhttp.proxyPort=8080
-Dhttp.proxyUser=kriegaex -Dhttp.proxyPassword="test XYZ"
Bugfixes
- Make debugging easier by de-obfuscating callstacks in FileDownloader (cf. "internal changes")
Internal changes
- FileDownloader: catch all Exceptions in 'finally' block. Otherwise debugging in cases where
'in' or 'out' are null gets difficult because then the application will exit with a
NullPointerException thrown from the nested 'try' block inside 'finally' rather than the real
exception that happened before.
Version 1.0.1
New features
- New book ASP_NET ("Einstieg in ASP.NET")
Version 1.0
New features
- The souce code is now explicitly licenced under GPLv3 (see file COPYING).
- Openbook conversion is now much faster because the switch to jsoup (see "internal changes").
- For the same reason the runnable JAR file is now smaller, containing fewer libraries.
- Not a new feature, but a new requirement for developers wishing to edit the code or build
the program from source: The Eclipse project is no longer a plain Java project, but an
AspectJ project because I now use AOP (aspect-oriented programming) techniques for
cross-cutting concerns like timing and logging to keep the main application code cleaner
and more readable, no longer cluttering it with calls to functionality outside the main
scope of cleaning openbooks.
- Some minor bugs in page titles and layout (e.g. font sizes in "UNIX guru" book, garbled keyboard
symbols in "Excel 2007" book [appendix A, "Tastenkombinationen"]) were fixed.
Changed features
- CLI (command line interface)
* Option '-a' replaced by magic book ID 'all' which can be used instead
* Options: show error message + usage info on missing book ID
* Options '-d'/'-v' replaced by '-l|--log-level=(0|1|2)' with a default of
0 (normal). 1 is verbose, 2 is debug.
* New log level 3 (trace) activates a call trace output on System.err (lower log levels
still go to System.out). This is implemented via AspectJ and prints constructor and method
signatures. So the trace output is very technical, but a nice toy if you want to avoid
debugging the source code in an IDE, but still be able to see what is going on internally.
* download_dir is no longer a positional parameter, but a named option
'-d|--download-dir' with default '.' (current directory)
* Option '-s' replaced by '-t|--threading=(0|1)' with a default of 1 (multi)
* Option '-n' replaced by '-p|--pretty-print=(0|1)' with a default of 1 (pretty).
This avoids the negative flag "no pretty-print". Later this option was removed altogether
because the old filter chain was removed and jsoup introduced. So the flag became redundant
because jsoup always pretty-prints.
* Option '-?' can also be written as '--help' now
Internal changes
- Rename BookInfo to just Book and add member 'title' containing a book title to be used on the
TOC page (index.htm*).
- Temporarily add JCommander 1.26 as CLI parsing tool. Later upgraded to 1.27 and then removed
again because of the switch to JOpt Simple. JCommander ist still available in branch
cli_jcommander at the time of writing this (commit 0050d681).
- Add JOpt Simple 4.3 as CLI parsing tool in class Options. This makes class OpenbookCleaner
smaller and cleaner, moving constant USAGE_TEXT and most content of methods processArgs and
displayUsageAndExit to Options. Static config variables are also in Options now. So CLI and
option handling is nicely encapsulated now.
- SimpleLogger: add thread-safe in-/dedent functionality. Thread-safe means: variables
indentLevel and indentText are InheritableThreadLocal, not static.
- SimpleLogger: optionally log thread ID (static flag LOG_THREAD_ID). If the flag is true, the
log is written in tab-separated format (two columns: thread ID and indentation + log message)
which can be imported into Excel easily via copy & paste. Then just add a header row, convert
into table layout and there you go: filtering, sorting etc. are at your command, making it
easy to see what happens in multi-threaded mode.
- Start using AspectJ for logging and timing
* Converted Eclipse project from Java to AspectJ
* New TimingAspect is the stopwatch for the main loop and per book
* New LoggingAspect takes care of indented echo/verbose output in
classes OpenbookCleaner and Downloader.
* Other classes like the filters still do their logging within the
application code because they rather use intrinsic, fine-granular
logging (several outputs per method) than wrapping comments
("entering", "exiting") which can easily be outsourced to AOP around()
advice.
* New enum SimpleLogger.IndentMode makes it possible to print and
in-/dedent before/after printing in one method call. This helps to
keep AOP around() advice code a bit shorter.
* Corrected some minor indenting issues in filter classes
- Downloader: rename & refactor moveDirectoryContents, closing a to-do: Now for books which are
unzipped one directory level too deep it is no longer necessary to rename all files one by
one, but the whole subdirectory is moved in three steps:
1) move download_dir/book_id/subdir -> download_dir/book_id.tmp
2) delete empty download_dir/book_id
3) move download_dir/book_id.tmp -> download_dir/book_id
- Eclipse: For being able to browse javadocs for external libs offline, I changed all javadoc
links to point to folders on my personal hard drive. This is not as interoperable as having
them point to online URLs, but it works offline for me.
Version 0.9.1.1
Changed features
- Openbook Cleaner now runs under Java 6, Java 7 is no longer required
Internal changes
- Switch zip file handling to Apache Commons Compress 1.4.1. Thanks to user
Christian Feneberg for pointing out that Java 7 on MacOS ist still in preview
stage and thus not preinstalled on most Macs.
Version 0.9.1
New features
- CLI (command line interface)
* Accept multiple book IDs or '-a' (all books) on command line. This makes
convenience class AllOpenbooksCleaner with its own main method obsolete.
* New option '-n' skips final pretty-printing after clean up. This optionally saves
around 15% processing time, but the final result will not be pretty-printed by
JTidy, it remains as written by XOM (which produces correct, but not nicely
formatted XHTML).
Changed features
- Java 7 is now required because of the vcsharp_2008 unpacking bugfix (see below)
Bugfixes
- Book vcsharp_2008 could not be unpacked because it contains two files with special
characters (German umlauts). This was due to a character encoding of "Cp437" while
Java always expects UTF-8. Thanks to user Dirk Höpfner for notifying me about the bug.
Internal changes
- Option parsing: simplify loop and make it more readable
- Option parsing: add log output for book_id arguments
- Create separate methods for single- and multi-threaded conversion
- ZipInputStream now explicitly uses "Cp437" encoding, a new feature available only
in Java 7 after years of waiting.
- BasicFilter: turn member debugLogMessage into method getDebugLogMessage. This
removes the constructor parameter debugLogMessage from BasicFilter, making the
constructor signature consistent with that of its subclasses. Also update all
concrete subclasses so as to implement the abstract method and to use the changed
constructor.
- File extensions moved from application class OpenbookCleaner to static member
FILE_EXTENSION in all concrete BasicFilter subclasses.
- Rename SINGLE_THREADED_WITH_INTERMEDIATE_FILES to MULTI_THREADED. This also means
that the logic is reversed now, but still multi-threaded mode is the default, so
there is no change in behaviour.
- Encapsulate filtering & threading in new class FilterChain. The new class iterates
over a Queue of class objects (not instances!) and dynamically sets up the
conversion/filtering chain, instantiating concrete BasicFilter subclass objects via
reflection. This works in single- as well as multi-threaded mode, as usual using
intermediate files for the former and piped streams for the latter. This makes class
OpenbookCleaner cleaner, not violating the single purpose principle anymore. Only a
nicely readable version of method 'convert' remains, methods 'convertSingleThreaded'
and 'convertMultiThreaded' are gone. Furthermore, 'convert' now takes File arguments
and no longer File*Stream ones, because FilterChain creates its own streams.
Version 0.9
New features
- Download, MD5-check and unpack books automatically before filtering
- Download hi-res book cover image for each book (can be used as covers for e-books)
- Page titles are filtered so as to remove book names and get clean chapter numbers and names.
E.g. something like "Galileo Computing :: Book title - 1.2.3 Chapter title" becomes
"1.2.3 Chapter title". Special fixes are applied to TOC (index.htm*) pages.
- AllOpenbooksCleaner knows new books visualbasic_2008, einstieg_vb_2010
- CLI (command line interface)
* Former book path (subdirectory) is now a book ID
* New mandatory parameter download directory (also base directory for book subdirectories)
is needed because of the new download feature and must be inserted as the first
non-option parameter before the book ID.
Changed features
- Books are now unpacked into subdirectories with predefined names which might be different
from the download archive names. E.g. archive galileocomputing_shell_programmierung.zip
will be unpacked to folder shell_prog.
Internal changes
- Refactor pre-JTidy HTML fixing into its own BasicConverter subclass PreJTidyFilter
and move some code there. Currently this filter class only fixes the "Ruby on Rails 2"
book because other books doe not need to be pre-filtered just to readable by JTidy.
- Reorganise package structure and rename some filter (converter) classes & methods
- New class (enum) BookInfo to be used by the download/unpack features
- New utility classes FileDownloader (incl. MD5 check), ZipFileExtractor
- New class Downloader can download & unpack openbooks
Version 0.8
- Initial release
- Can convert 30 known openbooks, but has not download/unpack functionality yet
- Developed in Java7
- Needs libraries
* JTidy r938
* XOM 1.2.8
* TagSoup 1.2.1