Differences

This shows you the differences between two versions of the page.

public:nnels:etext:regex [2018/07/12 09:49]
leah.brochu
public:nnels:etext:regex [2022/04/11 14:02] (current)
rachel.osolen
Line 23: Line 23:
 [[https://support.office.com/en-ca/article/Find-and-replace-text-and-other-data-in-your-Word-2010-files-c6728c16-469e-43cd-afe4-7708c6c779b7?ui=en-US&rs=en-CA&ad=CA#__toc282774574|Using wildcards in Microsoft Word]] (this is similar to regular expressions, but Word has a lot of its own syntax)
    
-  * Word has a lot of options to find letters (^$) and numbers (^#) but these only work with the wildcard option //off// (which it is by default). Only turn the wildcard option on if you're using regex options. Read the info page carefully on when things apply with the wildcard option on/off.
+  * Word has a lot of options to find letters (^$) and numbers (^#) when using the non-regex [[public:nnels:etext:find-and-replace|Find & Replace]], but these only work with the wildcard option //off// (which it is by default). Only turn the wildcard option on if you're using regex options. Read the info page carefully on when things apply with the wildcard option on/off.
  
   * A lot of the codes for special characters (e.g. page break) are under the "Special..." button.
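Outside Word, the same kind of search can be written as an ordinary regular expression. As a rough sketch, the hypothetical helper ''word_to_regex'' below (not part of Word or any library) translates three of Word's wildcards-off codes into Python ''re'' syntax: ''^#'' (any digit), ''^$'' (any letter) and ''^p'' (paragraph mark, assumed to be exported as ''\n''). Every other character is treated as literal text.

```python
import re

# Hypothetical mapping from a few Word codes (wildcards OFF) to Python regex.
# Assumes paragraph marks were exported as "\n".
WORD_CODES = {
    "^#": r"\d",        # Any Digit
    "^$": r"[^\W\d_]",  # Any Letter (roughly: any Unicode letter)
    "^p": r"\n",        # Paragraph mark
}


def word_to_regex(word_pattern: str) -> str:
    """Translate a non-wildcard Word search string into a Python regex."""
    parts = []
    i = 0
    while i < len(word_pattern):
        code = word_pattern[i:i + 2]
        if code in WORD_CODES:
            parts.append(WORD_CODES[code])
            i += 2
        else:
            # Anything that is not a known code is literal text, so escape it.
            parts.append(re.escape(word_pattern[i]))
            i += 1
    return "".join(parts)


if __name__ == "__main__":
    pattern = word_to_regex("^#^#^#^pMacG")
    print(re.fullmatch(pattern, "231\nMacG") is not None)  # True
```

Other codes from the "Special..." menu would need their own entries in ''WORD_CODES''; the mapping is illustrative, not complete.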
Line 66: Line 66:
  
 Find: ''([a-z])-^13([a-z])''
 +
 +Replace with: ''\1\2''
 +
 +Using a-z restricts what it finds to lowercase.
 +
 +You will likely have to repeat this for lines that end with a comma, and possibly for lines that end with an en or em dash. Look through your document for any other patterns it might have missed.
 +</WRAP>
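For text handled outside Word, the same join can be sketched with Python's ''re'' module. This assumes paragraph breaks are plain newlines in the exported text (standing in for Word's ''^13''); ''join_broken_hyphens'' is an illustrative name, not an existing tool.

```python
import re

# Lowercase letter, hyphen, line break, lowercase letter, mirroring Word's
# ([a-z])-^13([a-z]). "\r?\n" also accepts Windows line endings.
BROKEN_HYPHEN = re.compile(r"([a-z])-\r?\n([a-z])")


def join_broken_hyphens(text: str) -> str:
    """Rejoin words hyphenated across a line break by dropping the hyphen and the break."""
    return BROKEN_HYPHEN.sub(r"\1\2", text)


if __name__ == "__main__":
    print(join_broken_hyphens("the docu-\nment was long"))  # the document was long
```

As in Word, the lowercase-only classes leave breaks before capitalized words untouched.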
 +
 +----
 +
 +<WRAP center round box 80%>
 +**PROBLEM**: A hyphen breaks up a single word within a line (the word is not split over two lines).
 +
 +**SOLUTION**: Replace with the same text minus the hyphen.
 +
 +Find: ''([a-z])-([a-z])''
  
 Replace with: ''\1\2''
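A minimal Python sketch of the same in-line replacement, assuming the text is available as a plain string; ''remove_inline_hyphens'' is a hypothetical helper name.

```python
import re

# Mirrors Word's ([a-z])-([a-z]) -> \1\2: drop a hyphen between two lowercase letters.
INLINE_HYPHEN = re.compile(r"([a-z])-([a-z])")


def remove_inline_hyphens(text: str) -> str:
    """Remove hyphens that sit between two lowercase letters."""
    return INLINE_HYPHEN.sub(r"\1\2", text)


if __name__ == "__main__":
    print(remove_inline_hyphens("a docu-ment here"))  # a document here
```

The pattern cannot tell a stray hyphen from a real compound, so it also turns "well-known" into "wellknown"; review the matches before replacing all.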
Line 90: Line 106:
  
 ----
 +
  
 <WRAP center round box 80%>
Line 123: Line 140:
 **PROBLEM**: There are extra paragraph breaks. We want to keep the real paragraph breaks and remove the fake extra paragraph breaks.
  
-**SOLUTION**: Use MS Word's find and replace to remove the extra paragraph breaks using special Word symbols.
-
-Find: ''^p^p'' (you can also search for more than 2 paragraph breaks, e.g. ''^p^p^p'')
-
-Replace with: ''^p''
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
 </WRAP>
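The earlier revision of this page replaced ''^p^p'' with ''^p''. As a hedged Python equivalent, assuming each paragraph break is a single ''\n'' in the exported text, one ''{2,}'' quantifier collapses a run of any length in a single pass, so there is no need to search separately for ''^p^p^p''. The helper name is illustrative.

```python
import re

# Two or more consecutive paragraph breaks ("\n" assumed) become one.
EXTRA_BREAKS = re.compile(r"\n{2,}")


def collapse_paragraph_breaks(text: str) -> str:
    """Keep one paragraph break wherever several appear in a row."""
    return EXTRA_BREAKS.sub("\n", text)


if __name__ == "__main__":
    print(repr(collapse_paragraph_breaks("Para one.\n\n\n\nPara two.\n")))  # 'Para one.\nPara two.\n'
```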
  
Line 135: Line 148:
 **PROBLEM**: There are newlines/line breaks (↵) instead of paragraph marks (¶).
  
-**SOLUTION**: Find and remove all line breaks and replace with a single paragraph break.
-
-Find: ''^m''
-
-Replace with: ''^p''
-
-In LibreOffice, replace all ''\n'' with ''\p'' to convert them to paragraphs.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
 </WRAP>
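If the text is pulled out of Word programmatically (for example through the ''Range.Text'' property), a manual line break appears as the vertical-tab character U+000B and a paragraph mark as a carriage return (U+000D). Under that assumption the conversion is a plain character substitution and needs no regular expression; ''line_breaks_to_paragraphs'' is an illustrative name.

```python
# Word's internal characters (assumption: text taken from Word's object model).
LINE_BREAK = "\x0b"  # manual line break, Shift+Enter (shown as ↵)
PARAGRAPH = "\r"     # paragraph mark (shown as ¶)


def line_breaks_to_paragraphs(text: str) -> str:
    """Turn every manual line break into a paragraph mark."""
    return text.replace(LINE_BREAK, PARAGRAPH)


if __name__ == "__main__":
    print(repr(line_breaks_to_paragraphs("one\x0btwo\rthree")))  # 'one\rtwo\rthree'
```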
  
Line 150: Line 157:
 ''231(paragraph break)MacG_9781770494220_5p_all_r1.indd 231(paragraph break)10/27/14 11:56 AM(paragraph break)''
  
-**SOLUTION**: Without using wildcards:
-
-Find: ''^#^#^#^pMacG_9781770494220_5p_all_r1.indd ^#^#^#^p10/27/14 11:56 AM^p''
-
-Replace with: nothing. If you're doing a paginated title, replace with page breaks.
-
-To catch 2-digit page numbers, remove one ^# at the beginning and one after the .indd, then repeat once more for single-digit page numbers. The screenshot below shows an example with a 1-digit page number, followed by the command used to isolate all such instances.
-
-<WRAP center round box 60%>
-
-{{:nnels:documentation:content:production:screen_shot_2015-08-06_at_6.10.55_pm.png?300|}}
-
-Find: ^#^pMacG_9781770494220_5p_all_r1.indd ^#^p10/27/14 11:56 AM^p
-</WRAP>
-
-You will also need to run it once more with the leading ^#^p removed, to catch footer text that does not have a page number with it.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
 </WRAP>
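With regular expressions, the separate passes described in the earlier revision (3-, 2- and 1-digit page numbers, plus footers without a leading page number) can be folded into one pattern: ''\d{1,3}'' matches one to three digits and ''(?:...)?'' makes the leading page number optional. This Python sketch assumes paragraph breaks are ''\n''; the file name and timestamp are copied from the example above, and ''remove_running_footer'' is a hypothetical helper.

```python
import re

# Optional page number on its own line, then the InDesign slug line and the
# timestamp line, each ending in a paragraph break ("\n" assumed).
FOOTER = re.compile(
    r"(?:\d{1,3}\n)?"
    r"MacG_9781770494220_5p_all_r1\.indd \d{1,3}\n"
    r"10/27/14 11:56 AM\n"
)


def remove_running_footer(text: str, replacement: str = "") -> str:
    """Delete every footer, or put `replacement` (e.g. a page-break marker) in its place."""
    # A function replacement stops re.sub from interpreting backslashes in `replacement`.
    return FOOTER.sub(lambda _match: replacement, text)


if __name__ == "__main__":
    sample = (
        "end of page.\n231\nMacG_9781770494220_5p_all_r1.indd 231\n10/27/14 11:56 AM\n"
        "Next page.\n7\nMacG_9781770494220_5p_all_r1.indd 7\n10/27/14 11:56 AM\n"
    )
    print(repr(remove_running_footer(sample)))  # 'end of page.\nNext page.\n'
```

For a paginated title, ''replacement'' would be whatever page-break marker the next step of your workflow expects; a form feed (''\f'') is only one possible choice.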
  
Line 181: Line 173:
   * ''\p.+\s+[0-9OoIil]{1,3}\p'' ### Detect bad line breaks ###
   * ''[^\."?!]$''
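The two detection patterns can also be run over a plain-text export to list suspect lines for manual review. In this Python sketch each line is checked on its own, which stands in for the ''\p'' paragraph boundaries and the ''$'' line end; that mapping and the helper name ''flag_lines'' are assumptions, not part of the original patterns.

```python
import re

# A line ending in whitespace plus 1-3 digits or look-alike letters (O, o, I, i, l),
# e.g. a stray page number left at the end of a paragraph.
STRAY_PAGE_NUMBER = re.compile(r"^.+\s+[0-9OoIil]{1,3}$")
# A line whose last character is not . " ? or !
UNTERMINATED = re.compile(r'[^\."?!]$')


def flag_lines(text: str, pattern: re.Pattern) -> list[int]:
    """Return the 1-based numbers of the lines that match `pattern`."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]


if __name__ == "__main__":
    sample = 'It was a dark night.\nShe walked on and\nthe story went 231\n"Stop!"\n'
    print(flag_lines(sample, STRAY_PAGE_NUMBER))  # [3]
    print(flag_lines(sample, UNTERMINATED))       # [2, 3]
```

Both patterns only flag candidates: a line ending in a word like "oil" also matches the first one, so check each hit by hand.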
 +
 +
 +[[public:nnels:etext:start|Return to main eText Page]]
  
public/nnels/etext/regex.1531414178.txt.gz · Last modified: 2018/07/12 09:49 by leah.brochu