NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms

National Library of Medicine
9 Feb 2017 · 15:17
Educational · Learning

TL;DR: The webinar, hosted by Peter Cooper and presented by Rana Morris, delves into the complexities of gene, protein, and chemical nomenclature. It highlights how evolving scientific understanding can lead to multiple names for the same entities, causing confusion in research and literature. The NCBI (National Center for Biotechnology Information) addresses this through its RefSeq project, which curates gene and protein names, and the PubChem project, which does the same for chemical compounds. The presentation outlines how to access and use the large collections of names and synonyms in NCBI's Gene database and PubChem, including how to download bulk information via FTP. It also mentions the regular updates to these databases and directs users to the NCBI Insights Blog and YouTube channel for further guidance and resources.

Takeaways
  • **Confusion in Scientific Terminology**: Scientific names for genes, proteins, and chemical compounds can change over time, leading to confusion in the literature and in research discussions.
  • **NCBI's Role in Nomenclature**: The National Center for Biotechnology Information (NCBI) compiles and curates names for genes and proteins through the RefSeq project and for chemicals through the PubChem group.
  • **Gene Symbol Evolution**: Official gene symbols, along with their aliases or synonyms, can be found in the NCBI Gene database, which includes terms used historically in the scientific literature.
  • **CDKN1A Example**: The gene symbol CDKN1A is an example where various aliases, such as p21Sdi1, p21Cip1, and p21Waf1, all refer to the same gene product.
  • **Chemical Nomenclature Complexity**: Chemicals can have numerous names based on their atomic structure, market names, and identifiers such as CAS numbers, which are managed by different organizations.
  • **RefSeq Curation**: The RefSeq group compiles a list of terms for each gene or protein, including official symbols and names, and displays them on Gene database records.
  • **PubChem's Automated Processes**: PubChem uses automated processes to aggregate the terms used for chemical compounds, including synonyms and computed descriptors derived from chemical structures.
  • **Accessing NCBI Resources**: NCBI provides FTP sites from which detailed files on gene and protein names, as well as chemical synonyms and identifiers, can be downloaded in bulk.
  • **Gene and Protein Information Files**: The GENE_INFO files on the NCBI FTP site contain comprehensive data on gene symbols and protein names and are updated regularly.
  • **PubChem Compound Records**: PubChem maintains records that include computed descriptors and identifiers for chemical compounds, which are also available for download.
  • **Database Update Frequency**: While the update frequency of gene and protein records may vary, PubChem updates its files daily.
  • **NCBI HelpDesk**: For questions or assistance with NCBI resources, users can contact the NCBI HelpDesk by email at info@ncbi.nlm.nih.gov.
Q & A
  • What is the main topic of today's webinar?

    -The main topic of the webinar is gene, protein, and chemical names, along with aliases, synonyms, and other confusing terms.

  • Who is presenting the webinar?

    -Rana Morris is presenting the webinar.

  • What is the role of the NCBI in managing gene and protein names?

    -The NCBI manages gene and protein names through the RefSeq group, which curates terms used for genes and proteins from literature and reputable databases.

  • How does the NCBI handle the issue of different names for the same gene or chemical compound?

    -NCBI compiles all the names, including official symbols, synonyms, and aliases, and displays them on their Gene database records and PubChem Compound pages.

  • What is the official gene symbol for the cyclin-dependent kinase inhibitor 1A?

    -The official gene symbol for cyclin-dependent kinase inhibitor 1A is CDKN1A.

  • How can one find the list of synonyms and identifiers for a chemical compound?

    -One can find the list of synonyms and identifiers for a chemical compound on the PubChem Compound page or by downloading relevant files from the NCBI FTP site.

  • What are the different types of names and identifiers compiled for chemical compounds on PubChem?

    -PubChem compiles aliases or synonyms, computed descriptors (such as IUPAC names, InChI, InChIKey, SMILES, and SMARTS), and identifiers from various organizations (such as the CAS Registry Number, EC Number, and the FDA's UNII).

  • How can users download information on genes and proteins in bulk?

    -Users can download information in bulk by accessing the GENE_INFO files on the NCBI FTP site, which are tab-separated text files containing comprehensive data on genes and proteins.

  • What is the process for updating gene and protein databases on the NCBI FTP site?

    -The GENE_INFO files on the NCBI FTP site appear to be updated daily. The exact update frequency for individual gene records was not specified, but they are updated regularly.

  • How can one get access to the slide deck and Q&A document from the webinar?

    -The slide deck and Q&A document can be accessed through the bit.ly (go.usa.gov) link provided on the cover slide of the webinar.

  • What resource can be used to find more tips and tricks about using NCBI resources?

    -The NCBI Insights Blog and the NCBI YouTube channel are resources where users can find more tips, tricks, and how-to videos about using NCBI resources.

  • How can one get help or ask questions about NCBI resources?

    -For help or questions about NCBI resources, one can send an email to the NCBI HelpDesk at info@ncbi.nlm.nih.gov.
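The bulk-download answer above describes GENE_INFO files as tab-separated text. The following is a minimal Python sketch of streaming one of these files without unpacking it to disk. It assumes the gzipped per-organism layout under `/gene/DATA/GENE_INFO/` on the NCBI FTP site, a header line beginning with `#tax_id`, and an `Other_designations` column holding pipe-delimited protein names; the URL constant and both function names are illustrative, not an NCBI-provided API.

```python
import csv
import gzip
from typing import BinaryIO, Dict, Iterator, List

# Assumed HTTPS path of the human GENE_INFO file on the NCBI FTP site.
HUMAN_GENE_INFO_URL = (
    "https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/"
    "Homo_sapiens.gene_info.gz"
)


def read_gene_info(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per gene from a gzipped, tab-separated GENE_INFO stream."""
    with gzip.open(stream, mode="rt", encoding="utf-8", newline="") as text:
        # QUOTE_NONE: treat quote characters in descriptions as ordinary text.
        reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        # The header starts with '#tax_id'; strip the '#' so keys read cleanly.
        reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames or []]
        yield from reader


def protein_names(row: Dict[str, str]) -> List[str]:
    """Split the pipe-delimited Other_designations field ('-' means empty)."""
    value = row.get("Other_designations", "-")
    return [] if value in ("", "-") else value.split("|")


# Usage sketch (needs network access and downloads the whole file):
# from urllib.request import urlopen
# with urlopen(HUMAN_GENE_INFO_URL) as response:
#     for row in read_gene_info(response):
#         if row["Symbol"] == "CDKN1A":
#             print(row["GeneID"], protein_names(row))
```

Streaming row by row keeps memory use flat, which matters because the multi-organism files are much larger than a single-species file.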

Outlines
00:00
Introduction to Gene, Protein, and Chemical Nomenclature

The first paragraph introduces the webinar's topic: the complexities of gene, protein, and chemical names, including aliases and synonyms. Peter Cooper welcomes the audience and introduces Rana Morris, who presents on the subject. The webinar addresses the confusion that arises as scientific nomenclature evolves over time. Rana explains that NCBI will share how it handles nomenclature issues internally and will point to resources that help others navigate these complexities. The paragraph also highlights the importance of standardized naming for scientific communication and literature searches.

05:02
Gene and Protein Nomenclature Management at NCBI

The second paragraph explains how NCBI manages gene and protein nomenclature through the RefSeq group. It describes how a list of terms is curated for each gene or protein, including official symbols and names as well as aliases from reputable databases and expert submissions. The RefSeq group compiles this information and displays it in the Gene database, with gene symbols and protein names listed in their respective sections. The paragraph also shows how this information is presented on the CDKN1A gene page, including the official gene symbol and protein name designated by the HUGO Gene Nomenclature Committee (HGNC).
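The Gene page described above lists an official symbol next to its aliases. The same alias-to-symbol mapping can be rebuilt offline from a GENE_INFO file. This sketch assumes that file's `Symbol` and `Synonyms` columns (synonyms pipe-delimited, `-` when empty); `build_alias_index`, `resolve`, and the sample row are hypothetical illustrations, and the synonym values are abbreviated rather than copied from the live file. For the real gzipped download, pass `gzip.open(path, "rt")` instead of the `StringIO` object.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO


def build_alias_index(text: TextIO) -> Dict[str, Set[str]]:
    """Map each official symbol and synonym (upper-cased) to official symbols."""
    index: Dict[str, Set[str]] = defaultdict(set)
    reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
    # The header starts with '#tax_id'; strip the '#' so keys read cleanly.
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames or []]
    for row in reader:
        official = row["Symbol"]
        names = [official]
        # Synonyms are pipe-delimited; '-' marks a gene with no synonyms.
        if row["Synonyms"] not in ("", "-"):
            names.extend(row["Synonyms"].split("|"))
        for name in names:
            # A set, because one alias can belong to more than one gene.
            index[name.upper()].add(official)
    return index


def resolve(index: Dict[str, Set[str]], name: str) -> List[str]:
    """Return the official symbols for a name, case-insensitively."""
    return sorted(index.get(name.upper(), set()))


# Abbreviated, illustrative row in the GENE_INFO column layout.
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tSynonyms\n"
    "9606\t1026\tCDKN1A\tCIP1|SDI1|WAF1\n"
)
index = build_alias_index(io.StringIO(SAMPLE))
print(resolve(index, "waf1"))  # ['CDKN1A']
```

Two design choices are worth noting. Returning a list rather than a single symbol surfaces ambiguous aliases instead of hiding them. Upper-casing is a simplification, because symbol case carries meaning in some organisms (mouse symbols such as Cdkn1a are mixed case). Literature forms such as p21Cip1 resolve only if they appear verbatim in the `Synonyms` column.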

10:03
๐Ÿ›๏ธ Chemical Nomenclature and Synonyms in PubChem

The third paragraph shifts the focus to chemical nomenclature and how the PubChem group at NCBI manages it. It explains how terms for chemical compounds are aggregated, including computed descriptors derived from the chemical structure and identifiers from various organizations. The paragraph uses the Tamoxifen compound page on PubChem as an example, detailing the sections where names, identifiers, and synonyms are listed. It also explains how to download information for individual compounds, or in bulk from the NCBI FTP site, including GENE_INFO files for gene and protein information and CID-Synonym files for chemical synonyms.
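The CID-Synonym files mentioned above pair a PubChem CID with its synonyms. A minimal sketch for loading them, assuming one tab-separated `CID<TAB>synonym` pair per line (as in the gzipped `CID-Synonym-filtered` file; treat that file name as an assumption). It groups synonyms by CID and builds a case-insensitive reverse index from name to CIDs. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def load_cid_synonyms(
    lines: Iterable[str],
) -> Tuple[Dict[int, List[str]], Dict[str, Set[int]]]:
    """Group synonyms by CID and index CIDs by case-folded synonym."""
    by_cid: Dict[int, List[str]] = defaultdict(list)
    by_name: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split only on the first tab so synonyms keep any internal spacing.
        cid_text, synonym = line.split("\t", 1)
        cid = int(cid_text)
        by_cid[cid].append(synonym)
        # A set, because the same name can be attached to several CIDs.
        by_name[synonym.casefold()].add(cid)
    return by_cid, by_name


# Usage sketch with a locally downloaded copy (file name assumed):
# import gzip
# with gzip.open("CID-Synonym-filtered.gz", "rt", encoding="utf-8") as handle:
#     by_cid, by_name = load_cid_synonyms(handle)
# print(sorted(by_name.get("tamoxifen", set())))
```

The full synonym file is large, so holding every row in memory may not be practical; filtering to the CIDs or names of interest while iterating is the usual adjustment.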

Keywords
Gene and Protein
Genes and proteins are fundamental to biological functions and are the focus of much scientific research. In the video, they discuss how the nomenclature for genes and proteins can evolve over time, leading to confusion. The video provides examples of how different names for the same gene or protein can be consolidated and understood through resources like NCBI's Gene database.
Chemical Names and Aliases
Chemical compounds can have various names, synonyms, or aliases, which can make literature searches and compound identification challenging. The video explains how NCBI's PubChem group manages this issue by aggregating all terms used for chemical compounds, including IUPAC names, registry identifiers such as CAS numbers, and market names.
NCBI (National Center for Biotechnology Information)
NCBI is a vital resource in the field of bioinformatics, providing databases and tools for scientific research. In the video, NCBI is highlighted for its role in standardizing and providing access to information on gene symbols, protein names, and chemical compound synonyms through platforms like the Gene database and PubChem.
RefSeq
RefSeq, or the Reference Sequence project, is a group within NCBI that curates a comprehensive list of terms used for genes and proteins. The video mentions how RefSeq compiles official gene symbols, protein names, and aliases from various sources, including literature and databases, to help researchers navigate the complexities of nomenclature.
IUPAC Name
The IUPAC name, or International Union of Pure and Applied Chemistry name, is the official, systematic name for a chemical compound, derived from its atomic structure. The video discusses how IUPAC names are used alongside other identifiers to ensure precise chemical identification, even though compounds are also known by non-systematic names such as Tamoxifen.
Aliases or Synonyms
Aliases or synonyms are alternative names for genes, proteins, or chemical compounds. The video emphasizes the importance of recognizing these alternative terms to avoid confusion and ensure accurate scientific communication. NCBI databases, such as Gene and PubChem, include these synonyms to facilitate comprehensive searches.
FTP Site
The FTP (File Transfer Protocol) site mentioned in the video is a resource where researchers can access bulk data files from NCBI, including GENE_INFO files and PubChem databases. These files are crucial for those who need extensive datasets for research or analysis, as they provide a wealth of information in a downloadable format.
GENE_INFO Files
GENE_INFO files are tab-separated text files produced by the RefSeq group that contain extensive information about genes, including gene symbols and protein names. The video explains that these files are updated regularly and can be accessed via the NCBI FTP site, offering a valuable resource for researchers interested in gene and protein data.
PubChem Compound Identifier (CID)
The PubChem Compound Identifier (CID) is a unique number assigned to each chemical compound in the PubChem database. The video discusses how the CID is used in conjunction with synonyms to help researchers identify and search for chemical compounds within the PubChem database.
Computed Descriptors
Computed descriptors are terms generated by software from the chemical structure of a compound. These include the IUPAC name, InChI, InChIKey, SMILES, and SMARTS. The video explains that these descriptors are part of the data shown on PubChem Compound record pages, helping users identify and search for chemical compounds.
NCBI Insights Blog
The NCBI Insights Blog is a platform that provides information about NCBI resources, including tips and tricks for using their databases effectively. The video script mentions that detailed instructions on how to access gene and protein aliases, as well as chemical synonyms, are available through blog posts, which can be easily accessed through provided links.
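The Chemical Names and Computed Descriptors entries above distinguish free-text synonyms from structured identifiers such as CAS Registry Numbers and InChIKeys, and a downloaded synonym list mixes all three. The following heuristic sketch sorts such strings. It relies on two published formats: a CAS Registry Number is three hyphenated digit groups (2 to 7, then 2, then 1 digit) whose final digit is a checksum, and a standard InChIKey is 27 characters in 14-10-1 uppercase blocks. The function names are hypothetical. Passing these checks shows only that a string has the right shape, not that it is an assigned or registered identifier.

```python
import re

# Shape of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# Shape of a CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit_ok(cas: str) -> bool:
    """Validate the CAS check digit: weighted digit sum modulo 10."""
    match = CAS_RE.fullmatch(cas)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    # Weights 1, 2, 3, ... start at the rightmost digit before the check digit.
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


def classify(term: str) -> str:
    """Label a synonym string as 'inchikey', 'cas', or a free-text 'name'."""
    term = term.strip()
    if INCHIKEY_RE.fullmatch(term):
        return "inchikey"
    if cas_check_digit_ok(term):
        return "cas"
    return "name"


print(classify("10540-29-1"))                   # cas
print(classify("VNWKTOKETHGBQD-UHFFFAOYSA-N"))  # inchikey
print(classify("Tamoxifen"))                    # name
```

As a worked check for 10540-29-1 (the CAS number for tamoxifen), the digits 1054029 read right to left with weights 1 through 7 sum to 9 + 4 + 0 + 16 + 25 + 0 + 7 = 61, and 61 mod 10 = 1 matches the final digit.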
Highlights

The webinar discusses gene, protein, and chemical nomenclature, addressing the challenges of evolving scientific terminology.

Rana Morris presents the NCBI Minute, focusing on the confusion caused by changes in scientific naming conventions over time.

The NCBI provides resources to help researchers navigate the complexities of gene, protein, and chemical names and aliases.

Examples are given to illustrate the confusion that can arise from the use of different names for the same gene or protein.

The HUGO Gene Nomenclature Committee (HGNC) is mentioned as the official body for designating human gene symbols.

The issue of chemical compound naming is highlighted, with Tamoxifen as a case study of the challenges in chemical nomenclature.

NCBI's RefSeq project is introduced as a curation group that compiles gene and protein names and terms.

The presentation explains how to find and use the official gene symbols and protein names on the NCBI Gene database.

The PubChem Group is responsible for aggregating chemical compound terms, including synonyms and computed descriptors.

The webinar demonstrates how to access and download information on gene and chemical compound synonyms and identifiers from NCBI FTP sites.

GENE_INFO files on the NCBI FTP site provide comprehensive gene and protein information for various organisms.

The frequency of updates to gene and protein databases is discussed, with PubChem updating daily and gene records updated less frequently.

The NCBI Insights Blog is mentioned as a resource for further information and guidance on using NCBI resources.

The webinar provides bit.ly links for easy access to the NCBI Insights Blog posts on gene and protein names and chemical synonyms.

The NCBI YouTube channel is recommended as a source of helpful how-to videos.

Contact information for the NCBI HelpDesk is provided for users with questions or needing assistance with NCBI resources.

The webinar concludes with an invitation for further questions and a reminder of the resources available for gene, protein, and chemical nomenclature.
