Merge pull request #16 from tbrowder/next-ver

Next ver
tbrowder · Jun 27, 2024 · a5fd2c8
2 parents eb7fbcf + f3f599f
commit a5fd2c8
Showing 15 changed files with 192 additions and 159 deletions.
diff --git a/Changes b/Changes
@@ -1,4 +1,15 @@
 {{$NEXT}}
+    - Remove sub 'split-line-rw'
+    - Add aliases for 'split-line':
+      + split-str
+      + splitstr
+    - Add new option to sub 'split-line':
+      + 'break-after'
+         - split following the end of the $brk string
+    - Add new option to sub 'normalize-string':
+      + 'no-trim'
+         - do NOT trim and collapse spaces by default
+    - Add tests for new options
 
 3.4.3  2024-05-06T16:01:13-05:00
     - Fix doc error
@@ -20,7 +31,7 @@
 
 3.4.1  2024-04-23T08:20:00-05:00
     - Add new option to sub 'strip-comment':
-      + 'normalize-all': 
+      + 'normalize-all':
           - normalizes the returned string
           - also normalizes the returned comment when
               the 'save-comment' option is True

diff --git a/README.md b/README.md
@@ -35,7 +35,7 @@ The module contains several routines to make text handling easier for module and
 <th>Name</th> <th>Notes</th>
 </tr></thead>
 <tbody>
-<tr> <td>commify</td> <td></td> </tr> <tr> <td>count-substrs</td> <td></td> </tr> <tr> <td>list2text</td> <td></td> </tr> <tr> <td>normalize-string</td> <td>alias &#39;normalize-text&#39;</td> </tr> <tr> <td>sort-list</td> <td></td> </tr> <tr> <td>split-line</td> <td></td> </tr> <tr> <td>split-line-rw</td> <td></td> </tr> <tr> <td>strip-comment</td> <td></td> </tr> <tr> <td>wrap-paragraph</td> <td>&#39;width&#39; is in PS points</td> </tr> <tr> <td>wrap-text</td> <td>&#39;width&#39; is in number of chars</td> </tr>
+<tr> <td>commify</td> <td></td> </tr> <tr> <td>count-substrs</td> <td></td> </tr> <tr> <td>list2text</td> <td></td> </tr> <tr> <td>normalize-string</td> <td>alias &#39;normalize-text&#39;</td> </tr> <tr> <td>sort-list</td> <td></td> </tr> <tr> <td>split-line</td> <td>aliases &#39;splitstr&#39;, &#39;split-str&#39;</td> </tr> <tr> <td>strip-comment</td> <td></td> </tr> <tr> <td>wrap-paragraph</td> <td>&#39;width&#39; is in PS points</td> </tr> <tr> <td>wrap-text</td> <td>&#39;width&#39; is in number of chars</td> </tr>
 </tbody>
 </table>
 
@@ -148,6 +148,14 @@ Collapse to a newline:
 
 Notice that in the normalization routines, spaces (' ') are always normalized, even when tabs and newlines are normalized separately.
 
+Also notice all strings are normally trimmed of leading and trailing whitespace regardless of the option used. However, option `:no-trim` protects the input string from any such trimming. Consider the first example from above:
+
+    my $s = " 1   \t\t\n\n 2 \n\t  3  ";
+
+Using the 'no-trim' option:
+
+    say normalize-string($s, :no-trim) # OUTPUT: « 1 2 3  ␤»
+
 ### sort-list
 
     #  StrLength, LengthStr, Str, Length, Number
@@ -161,11 +169,13 @@ The routine's output can be modified for other uses by entering the `:$type` par
 
 ### split-line
 
-This routine splits a string into two pieces.
+Splits a string into two pieces.
 
-Inputs are the string to be split, the split character, maximum length, a starting position for the search, and the search direction.
+Inputs are the string to be split, the split character or string, maximum length, a starting position for the search, and the search direction (normally forward unless the `:$rindex` option is `True`).
 
-It returns the two parts of the split string; the second part will be an empty string if the input string is not too long.
+An additional option, `:$break-after`, causes the split to be delayed to the position after the input break string on a normal forward split.
+
+It returns the two parts of the split string. The second part will be shortened to the `:$max-line-length` value if its entered value is greater than the default zero.
 
 The signature:
 
@@ -175,32 +185,12 @@ sub split-line(
     Str:D $brk,
     UInt  :$max-line-length = 0,
     UInt  :$start-pos       = 0,
-    Bool  :$rindex          = False
+    Bool  :$rindex          = False,
+    Bool  :$break-after     = False,
     --> List) is export(:split-line)
 {...}
 ```
 
-### split-line-rw
-
-Splits a string into two pieces.
-
-Inputs are the string to be split, the split character, maximum length, a starting position for the search, and search direction.
-
-It returns the part of the input string past the break character, or an empty string (the input string is modified in-place if it is too long).
-
-The signature:
-
-```Raku
-sub split-line-rw(
-    Str:D $line is rw,
-    Str:D $brk,
-    UInt  :$max-line-length = 0,
-    UInt  :$start-pos       = 0,
-    Bool  :$rindex          = False
-    --> Str) is export(:split-line-rw)
-{...}
-```
-
 ### strip-comment
 
 Strip the comment from an input text line, save comment if requested, normalize returned text if requested.
@@ -287,7 +277,7 @@ The fonts currently handled are the the 14 PostScript and PDF *Core Fonts*:
 
 <table class="pod-table">
 <tbody>
-<tr> <td>Courier Courier-Bold Courier-Oblique Courier-BoldOblique</td> </tr> <tr> <td>Helvetica Helvatica-Bold Helvetica-Oblique Helvatica-BoldOblique</td> </tr> <tr> <td>Times-Roman Times-Bold Times-Italic Times-BoldItalic</td> </tr> <tr> <td>Symbol</td> </tr> <tr> <td>Zaphdingbats</td> </tr>
+<tr> <td>Courier</td> </tr> <tr> <td>Courier-Bold</td> </tr> <tr> <td>Courier-Oblique</td> </tr> <tr> <td>Courier-BoldOblique</td> </tr> <tr> <td>Helvetica</td> </tr> <tr> <td>Helvetica-Bold</td> </tr> <tr> <td>Helvetica-Oblique</td> </tr> <tr> <td>Helvetica-BoldOblique</td> </tr> <tr> <td>Times-Roman</td> </tr> <tr> <td>Times-Bold</td> </tr> <tr> <td>Times-Italic</td> </tr> <tr> <td>Times-BoldItalic</td> </tr> <tr> <td>Symbol</td> </tr> <tr> <td>ZapfDingbats</td> </tr>
 </tbody>
 </table>
 

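The `:no-trim` behavior documented in the README.md hunk above can be illustrated with a small sketch. It is written in Python purely for illustration (the module itself is Raku), and `normalize_sketch` is a hypothetical name, not part of the module. It assumes the approach noted in this merge's TODO changes: set the leading and trailing whitespace aside, collapse the interior, then add the saved whitespace back. The module's actual implementation may differ.

```python
import re


def normalize_sketch(s: str, no_trim: bool = False) -> str:
    """Collapse whitespace runs to single spaces (hypothetical helper)."""
    # Trim both ends, then collapse every interior whitespace run.
    core = re.sub(r"\s+", " ", s.strip())
    if not no_trim:
        return core
    if not core:
        # All-whitespace input: nothing to collapse, return it untouched.
        return s
    lead = s[: len(s) - len(s.lstrip())]  # saved leading whitespace
    trail = s[len(s.rstrip()):]           # saved trailing whitespace
    return lead + core + trail


s = " 1   \t\t\n\n 2 \n\t  3  "
print(repr(normalize_sketch(s)))                # '1 2 3'
print(repr(normalize_sketch(s, no_trim=True)))  # ' 1 2 3  '
```

With `no_trim=True` the single leading space and the two trailing spaces survive, which matches the output shown in the README example.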
diff --git a/TODO b/TODO
@@ -1,4 +1,15 @@
 - need to handle signature constraints better
 - add a feature to consider the font and print width
   for wrapping paragraphs
-
+- get no-trim working
+    save leading and trailing spaces
+      and add back at the end of 
+      processing
+- add aliases for split-line;
+    split-str, splitstr
+    split-str-rw, splitstr-rw
+- for splitstr, split on str should
+    clearly state the effect
+    split at beginning or end of 
+      the str found
+    make sure that is clear
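The last TODO item asks that the effect of splitting on a string be stated clearly. A minimal Python sketch of the distinction follows (Python only for illustration; `split_sketch` is a hypothetical name). It assumes that a default forward split cuts at the beginning of the found break string, while `break-after` cuts just past its end. It also assumes the break string is kept in the output and that a missing break string yields an empty second part, and it ignores `:$max-line-length` and `:$rindex`. None of these details are confirmed by the diff itself.

```python
def split_sketch(line: str, brk: str, start_pos: int = 0,
                 break_after: bool = False) -> tuple[str, str]:
    """Split line in two at the first brk found at or after start_pos."""
    idx = line.find(brk, start_pos)
    if idx < 0:
        # Assumption: no break string found means nothing is split off.
        return line, ""
    # Cut at the start of brk, or just past its end with break_after.
    cut = idx + len(brk) if break_after else idx
    return line[:cut], line[cut:]


print(split_sketch("key => value", " => "))
# ('key', ' => value')
print(split_sketch("key => value", " => ", break_after=True))
# ('key => ', 'value')
```

Under these assumptions the only difference between the two modes is which piece the break string ends up in.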
diff --git a/docs/README.rakudoc b/docs/README.rakudoc
@@ -36,11 +36,10 @@ count-substrs
 list2text
 normalize-string | alias 'normalize-text'
 sort-list
-split-line
-split-line-rw
+split-line       | aliases 'splitstr', 'split-str'
 strip-comment
-wrap-paragraph | 'width' is in PS points
-wrap-text  |  'width' is in number of chars
+wrap-paragraph   | 'width' is in PS points
+wrap-text        | 'width' is in number of chars
 =end table
 
 Following is a short synopsis and
@@ -179,10 +178,22 @@ Collapse to a newline:
 say normalize-string($s, :c<n>) # OUTPUT: «1\n2\n3␤»
 =end code
 
-Notice that in the normalization 
-routines, spaces (' ') are always 
-normalized, even when tabs and
-newlines are normalized separately.
+Notice that in the normalization routines, spaces (' ') are always
+normalized, even when tabs and newlines are normalized separately.
+
+Also notice all strings are normally trimmed of leading and trailing
+whitespace regardless of the option used. However, option C<:no-trim>
+protects the input string from any such trimming.  Consider the first
+example from above:
+
+=begin code
+my $s = " 1   \t\t\n\n 2 \n\t  3  ";
+=end code
+
+Using the 'no-trim' option:
+=begin code
+say normalize-string($s, :no-trim) # OUTPUT: « 1 2 3  ␤»
+=end code
 
 =head3 sort-list
 
@@ -202,13 +213,19 @@ C<:$type> parameter to choose another of the C<enum Sort-type>s.
 
 =head3 split-line
 
-This routine splits a string into two pieces.
+Splits a string into two pieces.
+
+Inputs are the string to be split, the split character or string,
+maximum length, a starting position for the search, and the search
+direction (normally forward unless the C<:$rindex> option is C<True>).
 
-Inputs are the string to be split, the split character, maximum
-length, a starting position for the search, and the search direction.
+An additional option, C<:$break-after>, causes the split to be delayed
+to the position after the input break string on a normal forward
+split.
 
-It returns the two parts of the split string; the second part will be
-an empty string if the input string is not too long.
+It returns the two parts of the split string.  The second part will be
+shortened to the C<:$max-line-length> value if its entered value is
+greater than the default zero.
 
 The signature:
 
@@ -218,35 +235,12 @@ sub split-line(
     Str:D $brk,
     UInt  :$max-line-length = 0,
     UInt  :$start-pos       = 0,
-    Bool  :$rindex          = False
+    Bool  :$rindex          = False,
+    Bool  :$break-after     = False,
     --> List) is export(:split-line)
 {...}
 =end code
 
-=head3 split-line-rw
-
-Splits a string into two pieces.
-
-Inputs are the string to be split, the split character, maximum
-length, a starting position for the search, and search direction.
-
-It returns the part of the input string past the break character, or
-an empty string (the input string is modified in-place if it is too
-long).
-
-The signature:
-
-=begin code :lang<Raku>
-sub split-line-rw(
-    Str:D $line is rw,
-    Str:D $brk,
-    UInt  :$max-line-length = 0,
-    UInt  :$start-pos       = 0,
-    Bool  :$rindex          = False
-    --> Str) is export(:split-line-rw)
-{...}
-=end code
-
 =head3 strip-comment
 
 Strip the comment from an input text line, save comment if requested,
@@ -354,12 +348,22 @@ as is the C<:font-size>.  The default C<:width> is 468 points, the
 length of a line on a Letter paper, portrait orientation, with
 one-inch margins on all sides.
 
-The fonts currently handled are the the 14 PostScript and PDF I<Core Fonts>:
+The fonts currently handled are the 14 PostScript and PDF I<Core
+Fonts>:
 
 =begin table
-Courier Courier-Bold Courier-Oblique Courier-BoldOblique
-Helvetica Helvatica-Bold Helvetica-Oblique Helvatica-BoldOblique
-Times-Roman Times-Bold Times-Italic Times-BoldItalic
+Courier
+Courier-Bold
+Courier-Oblique
+Courier-BoldOblique
+Helvetica
+Helvetica-Bold
+Helvetica-Oblique
+Helvetica-BoldOblique
+Times-Roman
+Times-Bold
+Times-Italic
+Times-BoldItalic
 Symbol
 ZapfDingbats
 =end table