public:nnels:etext:regex [2017/10/01 16:48] sabina.iseli-otto Page moved from public:nnels:regex to public:nnels:public:nnels:etext:regex |
public:nnels:etext:regex [2022/03/01 19:56] rachel.osolen |
+ | ====== Regular Expressions ====== | ||
+ | Regular expressions (aka regex) are useful for replacing patterns of text, such as headers/ | ||
+ | |||
+ | With regex, you can define patterns of text in a number of different ways, but the most commonly used ones for our purposes are **Ranges** and **Groups**. For more information about others, you can take a look at [[https:// | ||
+ | * Ranges | ||
+ | * Square brackets are always used in pairs and are used to identify //specific characters// | ||
+ | * [A-Z] will find any upper case letter; | ||
+ | * [a-z] will find any lower case letter; | ||
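+ | * [A-Za-z] will find any letter (upper or lower case); in standard regex engines, [A-z] also matches the punctuation characters that sit between Z and a (such as [ and _); | ||
+ | * [0-9] will find any single digit; | ||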
+ | * [abc] will find any of the letters a, b, or c. | ||
+ | * [F] will find upper case “F” | ||
+ | * [Fred] will find " | ||
+ | * Groups | ||
+ | * Round brackets are used in pairs to enclose //groups//. For example: | ||
+ | * '' | ||
+ | * They must be used in pairs and are addressed by number in the replacement. In the replace field, \1 represents the first group, \2 represents the second group, and so on. For example: | ||
+ | * If you wanted to remove the hyphen from " | ||
+ | * Another example: '' | ||
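The ranges and groups described above can be tried outside of Word. The following is a minimal sketch using Python's ''re'' module, not Word's wildcard search, so the syntax may differ slightly from what you type into the Find and Replace dialog. The sample strings (''Fred has 2 cats'', ''a_Z'', ''e-mail'') are made up for illustration.

```python
import re

sample = "Fred has 2 cats"

# Ranges: square brackets match ONE character from the listed set.
uppercase = re.findall(r"[A-Z]", sample)   # upper case letters only
digits = re.findall(r"[0-9]", sample)      # single digits
abc = re.findall(r"[abc]", sample)         # any of a, b, or c

# In Python, [A-z] also matches punctuation between Z and a, such as "_".
loose = re.findall(r"[A-z]", "a_Z")
strict = re.findall(r"[A-Za-z]", "a_Z")

# Groups: round brackets capture text; \1 and \2 put it back.
# Here the hyphen sits outside both groups, so it is dropped.
joined = re.sub(r"([a-z]+)-([a-z]+)", r"\1\2", "e-mail")

print(uppercase, digits, abc)
print(loose, strict)
print(joined)
```

The last call keeps both captured groups and discards only the hyphen between them, which is the same idea as using \1 and \2 in the replace field.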
+ | |||
+ | ====Tips==== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | * Word has a lot of options to find letters (^$) and numbers (^#) when using the non-regex [[public: | ||
+ | |||
+ | * A lot of the codes for special characters (e.g. page break) are under the " | ||
+ | {{: | ||
+ | ==== In LibreOffice & OpenOffice ==== | ||
+ | Make sure that the '' | ||
+ | |||
+ | [[https:// | ||
+ | [[https:// | ||
+ | |||
+ | ===== Conversion Fixes ===== | ||
+ | The following fixes assume you are using Word, unless otherwise stated. | ||
+ | |||
+ | < | ||
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | In Word, this will only work with wildcards turned on. | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | This looks for the pattern: '' | ||
+ | |||
+ | The parentheses are used to group what it finds, so \1 refers to the first " | ||
+ | |||
+ | In this way, you are putting back exactly what it found minus the paragraph break. | ||
+ | </ | ||
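The group-and-put-back technique above can be sketched in Python as follows. This is an illustration under assumptions, not the exact Word pattern from this page: it assumes the unwanted break is a single newline between two lowercase letters, and that a space should replace it. The function name ''join_broken_paragraphs'' is hypothetical.

```python
import re

def join_broken_paragraphs(text: str) -> str:
    """Remove a line break that falls between two lowercase letters.

    Group 1 is the lowercase letter before the break and group 2 is the
    lowercase letter after it. The replacement puts both letters back
    and swaps the break for a space (the space is an assumption).
    """
    return re.sub(r"([a-z])\n([a-z])", r"\1 \2", text)

broken = "the quick\nbrown fox.\nNext paragraph"
print(join_broken_paragraphs(broken))
```

Only the break after "quick" is removed. The break after "fox." is left alone because a period is not in the range [a-z].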
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | Using a-z restricts the match to lowercase letters. | ||
+ | ||
+ | You will likely have to repeat this for lines that end with a comma, and possibly for lines that end with an en or em dash. Then look through your document for any other patterns it might have missed. | ||
+ | </ | ||
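The repeat passes for commas and dashes mentioned above could look like this in Python. It is a rough sketch under assumptions: the break is taken to be a single newline, the joined text gets a space by default, and the helper name ''join_after'' and its parameters are hypothetical.

```python
import re

def join_after(text: str, endings: str, sep: str = " ") -> str:
    """Join a line onto the next when it ends with one of `endings`
    and the next line starts with a lowercase letter."""
    # Build a character class from the chosen line-ending characters.
    pattern = "([" + re.escape(endings) + r"])\n([a-z])"
    # sep is a literal string, so only double its backslashes if any.
    return re.sub(pattern, r"\1" + sep.replace("\\", r"\\") + r"\2", text)

text = "Yes,\nshe said\u2014\nand left,\nThen returned"
text = join_after(text, ",")                       # pass for commas
text = join_after(text, "\u2013\u2014", sep="")    # pass for en and em dashes
print(text)
```

The final break is kept because "Then" starts with an upper case letter, which is the effect of restricting the second group to a-z.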
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | Using a-z restricts the match to lowercase letters. | ||
+ | ||
+ | You will likely have to repeat this for lines that end with a comma, and possibly for lines that end with an en or em dash. Then look through your document for any other patterns it might have missed. | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM: | ||
+ | |||
+ | **SOLUTION: | ||
+ | |||
+ | - | ||
+ | - Find: '' | ||
+ | - Replace: '' | ||
+ | - | ||
+ | - Find: '' | ||
+ | - Replace: '' | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | |||
+ | **PROBLEM: | ||
+ | * Example A: As one of Montgomery' | ||
+ | * Example B: The "nasty little '' | ||
+ | This problem has an added complexity; the pattern has two different solutions: | ||
+ | * Example A will need to say: ... later put '' | ||
+ | * Example B will need to say: The "nasty little troublemaker''," | ||
+ | |||
+ | **SOLUTIONS: | ||
+ | Example A:\\ | ||
+ | |||
+ | Find: '' | ||
+ | Replace: '' | ||
+ | |||
+ | Example B: | ||
+ | |||
+ | Find: '' | ||
+ | Replace: '' | ||
+ | |||
+ | Notes: | ||
+ | * You will **not** be able to use " | ||
+ | * You will also need to re-do this, searching for periods instead of commas. | ||
+ | |||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM**: | ||
+ | '' | ||
+ | |||
+ | **SOLUTION**: | ||
+ | </ | ||
+ | |||
+ | In LibreOffice: | ||
+ | |||
+ | * Verso (left hand) | ||
+ | * '' | ||
+ | * taken piece-by-piece, | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * Recto (right hand) | ||
+ | * '' | ||
+ | * '' | ||