Differences

This shows you the differences between two versions of the page.

--- public:nnels:etext:regex [2018/07/12 09:49]
leah.brochu
+++ public:nnels:etext:regex [2022/04/11 14:02] (current)
rachel.osolen
@@ Line 23: / Line 23: @@
 [[https://support.office.com/en-ca/article/Find-and-replace-text-and-other-data-in-your-Word-2010-files-c6728c16-469e-43cd-afe4-7708c6c779b7?ui=en-US&rs=en-CA&ad=CA#__toc282774574|Using wildcards in Microsoft Word]] (this is similar to regular expressions, but Word has a lot of its own syntax)
-  * Word has a lot of options to find letters (^$) and numbers (^#) but these only work with the wildcard option //off// (which it is by default). Only turn the wildcard option on if you're using regex options. Read the info page carefully on when things apply with the wildcard option on/off.
+  * Word has a lot of options to find letters (^$) and numbers (^#) when using the non-regex [[public:nnels:etext:find-and-replace|Find & Replace]], but these only work with the wildcard option //off// (which it is by default). Only turn the wildcard option on if you're using regex options. Read the info page carefully on when things apply with the wildcard option on/off.
   * A lot of the codes for special characters (e.g. page break) are under the "Special..." button.
@@ Line 66: / Line 66: @@
 Find: ''([a-z])-^13([a-z])''
+Replace with: ''\1\2''
+Using a-z restricts what it finds to lowercase.
+You will likely have to do it again for lines that end with a comma, and possibly an en or em dash. Look through your document for patterns of anything else it might have missed.
+</WRAP>
+----
+<WRAP center round box 80%>
+**PROBLEM**: Hyphens that break up a single word (within one line, not across two lines).
+**SOLUTION**: Replace with the same text minus the hyphen.
+Find: ''([a-z])-([a-z])''
 Replace with: ''\1\2''
@@ Line 90: / Line 106: @@
 ----
 <WRAP center round box 80%>
@@ Line 123: / Line 140: @@
 **PROBLEM**: There are extra paragraph breaks. We want to keep the real paragraph breaks and remove the fake extra paragraph breaks.
-**SOLUTION**: Use MS Word's find and replace to remove the extra paragraph breaks using special Word symbols.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
-Find: ''^p^p'' (you can also search for more than 2 paragraph breaks, i.e. ''^p^p^p'')
-Replace with: ''^p''
 </WRAP>
@@ Line 135: / Line 148: @@
 **PROBLEM**: There are newlines/line breaks (↵) instead of paragraph marks (¶).
-**SOLUTION**: Find and remove all line breaks and replace with a single paragraph break.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
-Find: ''^m''
-Replace with: ''^p''
-In LibreOffice, replace all ''\n'' with ''\p'' to convert them to paragraphs.
 </WRAP>
@@ Line 150: / Line 157: @@
 ''231(paragraph break)MacG_9781770494220_5p_all_r1.indd 231(paragraph break)10/27/14 11:56 AM(paragraph break)''
-**SOLUTION**: Without using wildcards:
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
-Find:  ''^#^#^#^pMacG_9781770494220_5p_all_r1.indd ^#^#^#^p10/27/14 11:56 AM^p''
-Replace with: nothing. If you're doing a paginated title, replace with page breaks.
-You will need to remove one of the ^# at the beginning and after the .indd to remove it for 2 digit page numbers, and one last time for single digit page numbers. The following screenshot is an example with a 1-digit page number (see below), followed by the command used to isolate all such instances.
-<WRAP center round box 60%>
-{{:nnels:documentation:content:production:screen_shot_2015-08-06_at_6.10.55_pm.png?300|}}
-Find: ^#^pMacG_9781770494220_5p_all_r1.indd ^#^p10/27/14 11:56 AM^p
-</WRAP>
-You will also need to do it with the leading ^#^p to catch the footer text that do not have any page numbers with it.
 </WRAP>
@@ Line 181: / Line 173: @@
   * ''\p.+\s+[0-9OoIil]{1,3}\p'' ### Detect bad line breaks ###
   * ''[^\."?!]$''
+[[public:nnels:etext:start|Return to main eText Page]]
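
For readers who want to try these patterns outside Word, the following is a minimal Python sketch of two of them: the hyphen-plus-line-break fix from the hunk at line 66 and the ''[^\."?!]$'' bad-line-break check. It assumes Word's ''^13'' (paragraph mark) can be modelled as a newline in plain text; the sample text and variable names are made up for illustration, and this is not part of the wiki's documented workflow.

```python
import re

# Hypothetical sample text. Word's ^13 is modelled here as "\n" (an assumption).
sample = (
    "The docu-\n"
    "ment was con-\n"
    "verted from a PDF.\n"
    "A line without end punctuation\n"
    "A proper sentence."
)

# Equivalent of Find: ([a-z])-^13([a-z])  Replace with: \1\2
# Joins lowercase words that were hyphenated across a line break.
joined = re.sub(r"([a-z])-\n([a-z])", r"\1\2", sample)

# Equivalent of [^\."?!]$ applied line by line: flag lines that do not
# end in a period, double quote, question mark, or exclamation mark.
suspect = [line for line in joined.split("\n") if re.search(r'[^."?!]$', line)]

print(joined)
print(suspect)
```

Note that the single-line pattern ''([a-z])-([a-z])'' would also match legitimate compounds such as "well-known", so reviewing each match is probably safer than replacing all at once.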
