Differences

This shows you the differences between two versions of the page.

public:nnels:cataloguing:metadata-cleanup [2024/04/05 11:34]
robert.macgregor
public:nnels:cataloguing:metadata-cleanup [2024/04/08 09:56] (current)
robert.macgregor
Line 52: Line 52:
 ===3.1 Subject=== ===3.1 Subject===
  
-These are subject headings that will be applied to the item.  Currently we use FAST subject headings and copy catalogue them from [[https://search.worldcat.org/ | WorldCat]].+These are subject headings that will be applied to the item.  Currently we use FAST subject headings and copy catalogue them from MarcEdit via the Z39.50 module or from [[https://search.worldcat.org/ | WorldCat]].
  
 <note>Remove Subject Heading ''Blacks'' from any title. We no longer use the Subject Heading ''Blacks'' as it is a culturally outdated term. We do accept more precise Subject Headings including ''Black race'', ''Author, Black'', ''Women, Black'' etc. Check [[https://search.worldcat.org/ | WorldCat]] or LC for the appropriate Subject Heading to use for each title.</note> <note>Remove Subject Heading ''Blacks'' from any title. We no longer use the Subject Heading ''Blacks'' as it is a culturally outdated term. We do accept more precise Subject Headings including ''Black race'', ''Author, Black'', ''Women, Black'' etc. Check [[https://search.worldcat.org/ | WorldCat]] or LC for the appropriate Subject Heading to use for each title.</note>
  
-This is an important field that can be difficult at times.  There will usually be multiple 650 field entries.  We want at least one.  **Appendix B** is a batch method for copying large amounts of 650 fields at once through Z39.50.+This is an important field that can be difficult at times.  There will usually be multiple entries.  We want at least one.
  
-Important note:  Drupal only parses the 650 field, which is for Topical subject headings.  You will come across other 6XX fields like 600 and 651 fields (People and Places).  If you want these fields parsed in Drupal, you will need to change the field to 650 (it is generally safe to ignore the 648 field).+We use FAST Subject Headings (and remove the rest).  They are essentially simplified Library of Congress (LoC) Subject Headings.  Over time working with them, they will become easier to recognize and get a feel for.  Most of the time, FAST Subject Headings will just be copied directly from a source - the following discussion about LoC Subject Headings may come in handy for spotting FAST vs. LoC Subject Headings, and also for times when you may need to convert LoC to FAST.
  
-We use FAST Subject Headings (and remove the rest).  They are essentially simplified Library of Congress (LoC) Subject Headings.  Over time working with them, they will become easier to recognize and get a feel for. +FAST Subject Headings are usually comprised of a single term, whereas LoC Subject Headings tend towards multiple terms.
- +
-LoC subject terms will be in the form:  **=650  \0$aSubject term.**\\ +
-FAST subject terms will be in the form:  **=650  \7$aSubject term.$2fast**\\ +
-The 0 indicator specifically identifies Library of Congress, and the 7 means the term is from a taxonomy identified after the $2. +
- +
-FAST terms can also be written in the form:  **=650  \4$aSubject term.**\\ +
-The 4 indicates that the taxonomy is unidentified.  This is quicker and easier to do if FAST terms are not already included in the record with the \7...$2fast format. +
- +
-In the case of multi-term headings, this is where FAST is simplified.+
  
 An LoC term may look like this:\\ An LoC term may look like this:\\
-**=650  \0$aRefugees$zCambodia.**+**Refugees%%--%%Cambodia**
  
 FAST would handle it this way:\\ FAST would handle it this way:\\
-**=650  \4$aRefugees.**\\ +**Refugees**\\ 
-**=651  \4$aCambodia.**\\ +**Cambodia**\\ 
-(and we would change the 651 to 650 to have Drupal parse it)+ 
 +Essentially splitting the Subject Heading into 2 terms.
  
 There are also instances where FAST can have multiple terms as well. There are also instances where FAST can have multiple terms as well.
  
 LoC term:\\ LoC term:\\
-**=650  \0$aWomen$xSocial conditions.**+**Women%%--%%Social conditions**
  
 FAST term:\\ FAST term:\\
-**=650  \4$aWomen%%--%%Social conditions.**\\ +**Women%%--%%Social conditions**
-or\\ +
-**=650  \7$aWomen%%--%%Social conditions.$2fast** +
- +
-The subfield indicator separating the terms is replaced by 2 dashes (%%--%%).  This is generally rare as most FAST headings are just a single term (as in the Cambodia example above, so you can't just do this all the time), but you will see certain terms again and again (for example, Murder%%--%%Investigation is common for mystery novels).  Some Vendor and Copied Records will use subfields for FAST headings, like this: +
- +
-**=650  \7$aWomen$xSocial conditions.$2fast**\\ +
-This should be changed to:\\ +
-**=650  \7$aWomen%%--%%Social conditions.$2fast**+
  
-When Drupal parses subject terms, it splits terms based on the subfield indicators, so if there is an $x in the subject term, the Drupal record will actually show 2 separate terms (Women, Social conditions instead of Women%%--%%Social conditions).+This is generally rare as most FAST headings are just a single term (as in the Cambodia example above, so you can't just do this all the time), but you will see certain terms again and again (for example, **Murder%%--%%Investigation** is common for mystery novels).
  
-You can check [[https://fast.oclc.org/searchfast/ | searchFAST]] to verify how certain terms are handled.  Over time you will learn to spot which terms are likely to use the dash (%%--%%) format, but searchFAST is always a good resource for this.+You can check [[https://fast.oclc.org/searchfast/ | searchFAST]] to verify how certain terms are handled.  Over time you will learn to spot which Subject Headings are likely to use 2 terms, but searchFAST is always a good resource for this.
  
 Also be aware that some FAST syntaxes are different than LoC.  For example, place names. Also be aware that some FAST syntaxes are different than LoC.  For example, place names.
  
-LoC:  **=650  \0$aAtlanta (Ga.)**\\ +LoC:  **Atlanta (Ga.)**\\ 
-FAST:  **=650  \4$aGeorgia%%--%%Atlanta.**+FAST:  **Georgia%%--%%Atlanta**
  
-LoC is City first with State/Province/Country in parentheses.  FAST is State/Province/Country%%--%%City.  So, take care when manually converting LoC subject terms to FAST.  There are also other differences, for example when dealing with people's names and their birth and death dates, and when dealing with named events (for example the Vietnam War).  Again, use [[https://fast.oclc.org/searchfast/ | searchFAST]] to get the syntax, and then you will know going forward.+LoC is City first with State/Province/Country in parentheses.  FAST is State/Province/Country%%--%%City.  So, take care when manually converting LoC subject terms to FAST.  There are also other differences, for example when dealing with people's names and their birth and death dates, and when dealing with named events (for example the Vietnam War).  Again, use [[https://fast.oclc.org/searchfast/ | searchFAST]] to get the general syntax, and then you will know going forward.
  
 The majority of FAST terms can simply be derived from LoC terms by just taking the first part of the LoC subject term.  This is most apparent when it comes to fiction.\\ The majority of FAST terms can simply be derived from LoC terms by just taking the first part of the LoC subject term.  This is most apparent when it comes to fiction.\\
-LoC adds the term $vFiction at the end of subject terms for works of fiction. For example:\\ +LoC adds the term %%--%%Fiction at the end of subject terms for works of fiction. For example:\\ 
-**=650  \0$aMissing persons$vFiction.**\\+**Missing persons%%--%%Fiction**\\
 The FAST term would just be:\\ The FAST term would just be:\\
-**=650  \4$aMissing persons.** or **=650  \7$aMissing persons.$2fast**+**Missing persons**
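
The two rules of thumb above (drop the trailing Fiction subdivision, or take just the first part of the LoC heading) can be sketched in Python. This is only an illustration and not part of the NNELS workflow: the function names and the ''FORM_SUBDIVISIONS'' set are hypothetical, and the output is a candidate that still has to be checked against searchFAST, since some FAST headings keep both terms.

```python
# Hypothetical helper names; a rough sketch, not an official conversion tool.
# Assumption: headings are written with a double dash between terms,
# e.g. "Missing persons--Fiction".

# Assumption: only the subdivision named in the text is treated as droppable.
FORM_SUBDIVISIONS = {"Fiction"}


def strip_form_subdivision(loc_heading: str) -> str:
    """Drop a trailing form subdivision such as "Fiction" from an LoC heading."""
    parts = [p.strip() for p in loc_heading.split("--") if p.strip()]
    if len(parts) > 1 and parts[-1] in FORM_SUBDIVISIONS:
        parts = parts[:-1]
    return "--".join(parts)


def first_part(loc_heading: str) -> str:
    """Return only the first term of an LoC heading as a FAST candidate."""
    return loc_heading.split("--")[0].strip()


if __name__ == "__main__":
    for heading in ("Missing persons--Fiction", "Refugees--Cambodia"):
        print(heading, "->", strip_form_subdivision(heading), "|", first_part(heading))
```

Neither function knows which headings FAST keeps as two terms (for example Women%%--%%Social conditions), so treat the result as a starting point for a searchFAST lookup rather than a final answer.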
  
 **Where to find FAST subject terms** **Where to find FAST subject terms**
Line 118: Line 102:
 1.  Z39.50.  The best way to search for records in Z39.50 is by using the ISBN.  This will generally return multiple records for the same item.  Check each record until you find one with FAST subject headings.  If the records for a particular ISBN don't have FAST subject headings, try different ISBNs (ie:  Paperback vs. Hard cover vs. Large print vs. Audiobook vs. etc.).  Failing that, search via title and author. Searching via title often yields pages of irrelevant records.  If you must use a title search, use the AND operator and second search box to search for author Name. 1.  Z39.50.  The best way to search for records in Z39.50 is by using the ISBN.  This will generally return multiple records for the same item.  Check each record until you find one with FAST subject headings.  If the records for a particular ISBN don't have FAST subject headings, try different ISBNs (ie:  Paperback vs. Hard cover vs. Large print vs. Audiobook vs. etc.).  Failing that, search via title and author. Searching via title often yields pages of irrelevant records.  If you must use a title search, use the AND operator and second search box to search for author Name.
  
-Note that sometimes these Copied Records will not come with the 650 end of field punctuation (which is a . after the subject term/before the $2).  Add that in. +2.  [[https://search.worldcat.org/ | WorldCat.org]] - This is probably the better bet, and faster.  This OCLC website allows you to search by title and/or author.  It will return separate entries for each form of the item (ie:  print, audiobook, ebook, etc.).  Generally the print entries are the best to use.
- +
-2.  [[https://search.worldcat.org/ | WorldCat.org]].  This OCLC website allows you to search by title and/or author.  It will return separate entries for each form of the item (ie:  print, audiobook, ebook, etc.).  Generally the print entries are the best to use.+
  
 After searching, click on the result and in the result page click on "Show more information" to get a variety of information, including subject headings (listed as "Subjects") - the first few subject headings will show on-screen.  Click "Show more" to see all of them. After searching, click on the result and in the result page click on "Show more information" to get a variety of information, including subject headings (listed as "Subjects") - the first few subject headings will show on-screen.  Click "Show more" to see all of them.
Line 130: Line 112:
 FAST subject headings are marked with a Green Star.  Notice that LoC terms are similar - in this case they just have the term Fiction at the end. FAST subject headings are marked with a Green Star.  Notice that LoC terms are similar - in this case they just have the term Fiction at the end.
  
-The terms that WorldCat provides do not have subfields or double dashes (%%--%%), however when there is a capitalized word (ie:  "Fiction" in the LoC examples) that usually indicates a break in the subject heading.+The terms that WorldCat provides do not have subfields or double dashes (%%--%%), however when there is a capitalized word (ie:  "Fiction" in the LoC examples) that usually indicates a break in the Subject Heading.
  
-Note:  Wives Crimes against.  This is a FAST term and by noticing the capitalization of Crimes, we can tell that the form should be "Wives%%--%%Crimes against".  That will need to be changed when copied into a 650 field.  Moving forward, "%%--%%Crimes against" is now recognizable as a secondary FAST term that can be spotted in the future.+Note:  Wives Crimes against.  This is a FAST term and by noticing the capitalization of Crimes, we can tell that the form should be "Wives%%--%%Crimes against".  That will need to be changed when copied into the Subject field in Drupal.  Moving forward, "%%--%%Crimes against" is now recognizable as a secondary FAST term that you can spot in the future.
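
The capitalization cue described above can be sketched as a small Python helper. It is a heuristic illustration only: the function name ''mark_breaks'' is hypothetical, and the rule misfires on proper nouns (for example, it would wrongly split United States), so any result should be confirmed in searchFAST before it goes into the Subject field.

```python
# Hypothetical helper; a heuristic sketch of the "capitalized word marks a break" cue.
# Known limitation: proper nouns such as "United States" are capitalized too,
# so this will insert a break where none belongs.


def mark_breaks(display_heading: str) -> str:
    """Insert "--" before each capitalized word after the first word."""
    words = display_heading.split()
    if not words:
        return ""
    result = words[0]
    for word in words[1:]:
        # A capital letter after the first word suggests a new term starts here.
        separator = "--" if word[:1].isupper() else " "
        result += separator + word
    return result


if __name__ == "__main__":
    print(mark_breaks("Wives Crimes against"))
    print(mark_breaks("Missing persons"))
```
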
  
-You may also see terms that identify the genre of the item.  This is what the 655 field is for, and so can be omitted in the 650 field.  In the past, before LoC created a genre taxonomy, genre terms were put in the 650 field, but that is an outdated method.  BISAC terms are also genre identifying so can be left out.  However, these terms are a good guide as to what the genre is, and so can be helpful in creating the 655 fields.+You may also see terms that identify the genre of the item.  This is what the Genre field is for, and so can be omitted in the Subject field.  In the past, before LoC created a genre taxonomy, genre terms were put in the Subject field, but that is an outdated method.  BISAC terms are also genre identifying so can be left out.  However, these terms are a good guide as to what the genre is, and so can be helpful in creating the Genre Terms.
  
-For example you may omit this term from the 650 field:+For example you may omit this term from the Subject field:
  
-**=650  \0$aDetective and mystery fiction.**\\+**Detective and mystery fiction**\\
  
-There are also deprecated LoC terms to keep an eye out for - some of the old Genre terms for fiction ended in "stories" but the new ones end in "fiction":+There are also deprecated LoC terms to keep an eye out for - some of the old Genre terms for fiction ended in "stories" but the new ones end in "fiction" - for example:
  
-**=650  \0$aDetective and mystery stories.**\\ +**Detective and mystery stories**\\ 
-**=650  \0$aRomance stories.**\\ +**Romance stories**\\ 
-**=650  \0$aLove stories.**\\+**Love stories**\\
  
-These can be removed as well. +These can be omitted as well.
- +
- +
- +
- +
-  *Search by title.  If it is a pretty generic title you may get a lot of hits (hundreds), in which case include the author's last name in your search. +
-  *When searching OCLC, the item may not appear if the title we have includes series information.  Ex:  The two towers : the lord of the rings book 2.  Just search for The two towers. +
-  *Sometimes the subtitle won't be in OCLC, so you won't get any results.  Ex:  The hobbit : there and back again.  If it doesn't show up, just search for The hobbit. +
-  *Some special characters will interfere with your OCLC search.  Ex:  Hit & run.  If the item doesn't show up, search for Hit and run.  Even when replacing the "&" with "and", the result in OCLC may actually show up as Hit & run. +
-  *After you find the title, click on it and scroll down to FAST Subject Headings. +
-  *Copy and paste each Heading into the Subject field - separate each one with a comma.  Ex:  Assassins, Fugitives from justice, United States +
-  *If the Heading contains a comma, then it must be enclosed in quotation marks.  Ex:  "Keller, John (Fictitious character)" +
-  *The Usage Count tells you how many libraries use each particular heading.  Sometimes there will be a list of headings that have a Usage Count of 1 (while the others have hundreds or thousands) - if there are a lot of these 1s then they can be omitted if there are a bunch of more used ones. +
-  *If you can't find any Subject headings to copy and paste, try to find something similar and take one or two that fit.  If the item is part of a series, you can probably take one from one of the other books. +
-  *If a record set comes with BISAC terms those should be kept. You can find a full list of terms on the BISAC website at [[https://bisg.org/page/BISACEdition|Complete BISAC Subject Headings List, 2021 Edition]] +
-  *LCSH terms can be used if FAST terms are difficult to find, or at cataloguer's discretion if it would speed up the process significantly (for example if a large record set comes with robust LCSH terms already attached) - A lot of FAST terms are deconstructed LCSH terms+
  
 === Indigenous Subject Headings === === Indigenous Subject Headings ===
Line 201: Line 168:
   *Adult - for adult material.   *Adult - for adult material.
   *Sometimes it isn't entirely clear which one to use - there is crossover, especially in Juvenile and Adolescent.   *Sometimes it isn't entirely clear which one to use - there is crossover, especially in Juvenile and Adolescent.
-  *The abstract should give you an idea of who the audience is.+  *The Abstract should give you an idea of who the audience is.
   *If you know authors, then they should give you an idea as well (especially children and teen authors).   *If you know authors, then they should give you an idea as well (especially children and teen authors).
   *At the bottom of the OCLC page for the item, there are links to WorldCat pages for them - these will often have useful information about audience.   *At the bottom of the OCLC page for the item, there are links to WorldCat pages for them - these will often have useful information about audience.
Line 239: Line 206:
 This field only needs one entry, but can have as many as necessary separated by a comma. This field only needs one entry, but can have as many as necessary separated by a comma.
  
-  *Here is a list of Genre terms with descriptions: [[public:nnels:cataloguing:metadata-cleanup:genre|NNELS Genre Taxonomy]] +  *Here is a list of Genre terms with descriptions:  {{ :public:nnels:cataloguing:nnels_genre_terms_20240401.xlsx | NNELS Genre Terms}} 
-  *It is important to ONLY use those terms.  The field will auto-populate in Drupal.  If an incorrect genre terms is used, then Drupal will include that term in the list that it auto-populates from - it is time consuming to get rid of those incorrect terms periodically.+  *It is important to only use those terms.  The field will auto-populate in Drupal - so you can start typing and then select the correct term from the dropdown menu.
   *There are terms specifically for Non-fiction, and terms specifically for Fiction.   *There are terms specifically for Non-fiction, and terms specifically for Fiction.
-  *Most times a single genre is fine, sometimes multiple genres are better.  Ex:  Science fiction, Apocalyptic fiction might be better than just Science fiction.  The same applies to nonfiction.  Ex:  I would use History, Medicine, Health and Fitness for a history of medicine and medical procedures (in fact I did!).  Just use the least necessary to accurately describe the item.+  *Most times a single genre is fine, sometimes multiple genres are better.  Ex:  Science fiction, Apocalyptic fiction might be better than just Science fiction.  The same applies to nonfiction.  Ex:  I would use History and geography, Medicine, health and fitness for a history of medicine and medical procedures (in fact I did!).  Just use the least necessary to accurately describe the item.
   *There are genre terms that should be added to describe the form or type of the item in addition to what it's about.  Ex:  Fantasy fiction, Comics (Graphic works) would be a fantasy graphic novel; Music, Nonfiction comics, Biographies and autobiographies would be a biography about a musician or musical group told in a graphic, comic book style format.   *There are genre terms that should be added to describe the form or type of the item in addition to what it's about.  Ex:  Fantasy fiction, Comics (Graphic works) would be a fantasy graphic novel; Music, Nonfiction comics, Biographies and autobiographies would be a biography about a musician or musical group told in a graphic, comic book style format.
   *There are genre terms that signify special content that should be added as needed.  These are Canadian fiction, Canadian nonfiction, Canadian drama, Canadian poetry, French language materials, Indigenous materials, Juvenile fiction, Juvenile nonfiction, Young adult fiction, Young adult nonfiction.   *There are genre terms that signify special content that should be added as needed.  These are Canadian fiction, Canadian nonfiction, Canadian drama, Canadian poetry, French language materials, Indigenous materials, Juvenile fiction, Juvenile nonfiction, Young adult fiction, Young adult nonfiction.
-  *Canadian genre terms are for books by Canadian authors or about Canadian subjects.  Same with Indigenous materials.+  *Canadian genre terms are for books by Canadian authors or about Canadian subjects.  Same idea with Indigenous materials.
  
 ===Genre tips=== ===Genre tips===
Line 253: Line 220:
   *Juvenile fiction can be tough because it's usually a big combination of Humorous fiction, Magical realist fiction, Science fiction, Fantasy fiction, Detective and mystery fiction, etc.  So instead of trying to pin it down, just use Juvenile fiction.  This also prevents juvenile results from showing up when patrons look for adult genre books like mystery or science fiction.  Genres that should be added to Juvenile fiction should be things like Comics (Graphic works), Picture books, Choose-your-own stories, and Canadian fiction and Indigenous materials.   *Juvenile fiction can be tough because it's usually a big combination of Humorous fiction, Magical realist fiction, Science fiction, Fantasy fiction, Detective and mystery fiction, etc.  So instead of trying to pin it down, just use Juvenile fiction.  This also prevents juvenile results from showing up when patrons look for adult genre books like mystery or science fiction.  Genres that should be added to Juvenile fiction should be things like Comics (Graphic works), Picture books, Choose-your-own stories, and Canadian fiction and Indigenous materials.
   *Young adult items should be treated like adult books, in that they should get full genre treatment.  This is because young adult material tends to be more focused in its content, and also adults read them.   *Young adult items should be treated like adult books, in that they should get full genre treatment.  This is because young adult material tends to be more focused in its content, and also adults read them.
-  *Picture books are specifically for children's picture books.+  *Picture books are specifically for children's picture books (sometimes these may be non-fiction).
   *If unsure, picture books can be identified in the WorldCat description - they are often around 30 pages long, the pages are unnumbered, are illustrated, and often over-sized.  Ex from WorldCat Description field:  36 unnumbered pages : colour illustrations ; 24 cm.   *If unsure, picture books can be identified in the WorldCat description - they are often around 30 pages long, the pages are unnumbered, are illustrated, and often over-sized.  Ex from WorldCat Description field:  36 unnumbered pages : colour illustrations ; 24 cm.
-  *If you can't figure out the genre, or it doesn't fit any of the categories, use Literature - only for fiction+  *If you can't figure out a novel's genre, or it doesn't fit any of the categories, use General fiction.
- +
-=====General tips===== +
- +
-  *Sometimes the record won't save properly when you click on Save.  Click on View changes first, then hit Save. +
- +
-====Fix invalid characters in Drupal==== +
- +
-Sometimes when a record set is uploaded to Drupal there will be invalid characters (they will generally show up as a string of random nonsense characters).  This has to do with character encoding - MarcEdit uses Mark8 format and Drupal uses UTF8.  It is rarely a problem, but converting the character encoding should fix it.  This can be done in MarcEdit in Marc Tools when converting MRK to MRC or MRC to XML - just make sure "Default Character Encoding" is set to Mark8 and the "Translate to UTF8" box is ticked.+
public/nnels/cataloguing/metadata-cleanup.1712342094.txt.gz · Last modified: 2024/04/05 11:34 by robert.macgregor