NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms
TLDR
The webinar, hosted by Peter Cooper and presented by Rana Morris, delves into the complexities of gene, protein, and chemical nomenclature. It highlights how evolving scientific understanding can lead to multiple names for the same entities, causing confusion in research and literature. The NCBI (National Center for Biotechnology Information) addresses this through its RefSeq project, which curates gene and protein names, and the PubChem project, which does the same for chemical compounds. The presentation outlines how to access and utilize the vast databases of names and synonyms through NCBI's Gene database and PubChem, including how to download bulk information via FTP. It also mentions the regular updates to these databases and directs users to the NCBI Insights Blog and YouTube channel for further guidance and resources.
Takeaways
- **Confusion in Scientific Terminology**: Scientific nomenclature for genes, proteins, and chemical compounds can evolve over time, leading to confusion in literature and research discussions.
- **NCBI's Role in Nomenclature**: The National Center for Biotechnology Information (NCBI) manages the nomenclature for genes and proteins through the RefSeq project and for chemicals via the PubChem group.
- **Gene Symbol Evolution**: Official gene symbols and their aliases or synonyms can be found on the NCBI Gene database, which includes terms used historically in scientific literature.
- **CDKN1A Example**: The gene symbol CDKN1A serves as an example where various aliases such as p21Sdi1, p21Cip1, and p21Waf1 all refer to the same gene product.
- **Chemical Nomenclature Complexity**: Chemicals can have numerous names based on their atomic structure, market names, and identifiers like CAS numbers, managed by different organizations.
- **RefSeq Curation**: The RefSeq group compiles a list of terms for each gene or protein, including official symbols and names, and displays them on the Gene database records.
- **PubChem's Automated Processes**: PubChem uses automated processes to aggregate terms used for chemical compounds, including synonyms and computed descriptors based on chemical structures.
- **Accessing NCBI Resources**: NCBI provides FTP sites where detailed files on gene and protein names, as well as chemical synonyms and identifiers, can be downloaded in bulk.
- **Gene and Protein Information Files**: The GENE_INFO files on the NCBI FTP site contain comprehensive data on gene symbols and protein names, and are regularly updated.
- **PubChem Compound Records**: PubChem maintains records that include computed descriptors and identifiers for chemical compounds, which are also available for download.
- **Database Update Frequency**: While the frequency of updates to gene and protein databases may vary, PubChem updates its files daily.
- **NCBI HelpDesk**: For inquiries or assistance with NCBI resources, users can reach out to the NCBI HelpDesk via email at info@ncbi.nlm.nih.gov.
Q & A
What is the main topic of today's webinar?
-The main topic of the webinar is gene, protein, and chemical names, including their aliases, synonyms, and other confusing terms.
Who is presenting the webinar?
-Rana Morris is presenting the webinar.
What is the role of the NCBI in managing gene and protein names?
-The NCBI manages gene and protein names through the RefSeq group, which curates terms used for genes and proteins from literature and reputable databases.
How does the NCBI handle the issue of different names for the same gene or chemical compound?
-NCBI compiles all the names, including official symbols, synonyms, and aliases, and displays them on their Gene database records and PubChem Compound pages.
What is the official gene symbol for the cyclin-dependent kinase inhibitor 1A?
-The official gene symbol for cyclin-dependent kinase inhibitor 1A is CDKN1A.
How can one find the list of synonyms and identifiers for a chemical compound?
-One can find the list of synonyms and identifiers for a chemical compound on the PubChem Compound page or by downloading relevant files from the NCBI FTP site.
What are the different types of names and identifiers compiled for chemical compounds on PubChem?
-PubChem compiles aliases or synonyms, computed descriptors (like IUPAC names, InChI, InChIKey, SMILES, and SMARTS), and identifiers from various organizations (like CAS Registry Number, EC Number, and FDA's UNII).
How can users download information on genes and proteins in bulk?
-Users can download information in bulk by accessing the GENE_INFO files on the NCBI FTP site, which are tab-separated text files containing comprehensive data on genes and proteins.
What is the process for updating gene and protein databases on the NCBI FTP site?
-The GENE_INFO files on the NCBI FTP site appear to be updated daily. The exact update frequency for Gene records is not specified, but they are believed to be updated regularly.
How can one get access to the slide deck and Q&A document from the webinar?
-The slide deck and Q&A document can be accessed through the bit.ly (go.usa.gov) link provided on the cover slide of the webinar.
What resource can be used to find more tips and tricks about using NCBI resources?
-The NCBI Insights Blog and the NCBI YouTube channel are resources where users can find more tips, tricks, and how-to videos about using NCBI resources.
How can one get help or ask questions about NCBI resources?
-For help or questions about NCBI resources, one can send an email to the NCBI HelpDesk at info@ncbi.nlm.nih.gov.
Outlines
Introduction to Gene, Protein, and Chemical Nomenclature
The first paragraph introduces the topic of the webinar, which is about the complexities of gene, protein, and chemical names, including aliases and synonyms. Peter Cooper welcomes the audience and introduces Rana Morris, who will present on the subject. The webinar aims to address the confusion that arises from the evolution of scientific nomenclature over time. Rana explains that the National Center for Biotechnology Information (NCBI) will share how they handle nomenclature issues internally and provide resources to help others navigate these complexities. The paragraph also highlights the importance of standardized naming for scientific communication and literature search.
Gene and Protein Nomenclature Management at NCBI
The second paragraph delves into how NCBI manages gene and protein nomenclature through the RefSeq group. It discusses the process of curating a list of terms for each gene or protein, including official symbols and names, as well as aliases from various reputable databases and expert submissions. The RefSeq group compiles this information and displays it on the Gene database, with gene symbols and protein names listed in their respective sections. The paragraph also provides an example of how this information is presented on the CDKN1A gene page, including the official gene symbol and protein name as designated by the Human Gene Nomenclature Committee (HGNC).
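The alias lists described here are also what the bulk GENE_INFO files carry, so an alias-to-symbol lookup can be built from them. The following is a minimal sketch, assuming a tab-separated layout with a `#`-prefixed header row, a `Symbol` column, and a pipe-separated `Synonyms` column where `-` means "no value". The sample rows are illustrative and trimmed to five columns, and `build_alias_index` is a hypothetical helper name, not an NCBI tool.

```python
import csv
import io

# Illustrative sample in the assumed GENE_INFO layout (real files have more columns).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\n"
    "9606\t1026\tCDKN1A\t-\tCAP20|CDKN1|CIP1|P21|SDI1|WAF1\n"
    "9606\t7157\tTP53\t-\tBCC7|LFS1|P53|TRP53\n"
)


def build_alias_index(handle):
    """Map every symbol and synonym (upper-cased) to the set of symbols that use it."""
    # QUOTE_NONE: treat quote characters inside fields as ordinary text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header row starts with '#'
    symbol_col = header.index("Symbol")
    synonyms_col = header.index("Synonyms")

    index = {}
    for row in reader:
        symbol = row[symbol_col]
        names = [symbol]
        if row[synonyms_col] != "-":  # '-' marks an empty field
            names.extend(row[synonyms_col].split("|"))
        for name in names:
            # An alias can be shared by several genes, so keep a set.
            index.setdefault(name.upper(), set()).add(symbol)
    return index


alias_index = build_alias_index(io.StringIO(SAMPLE))
print(sorted(alias_index["WAF1"]))  # ['CDKN1A']
```

Keeping a set per alias matters because the same historical name is sometimes attached to more than one gene; a lookup that returns several symbols is a signal to check the Gene records by hand.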
Chemical Nomenclature and Synonyms in PubChem
The third paragraph shifts the focus to chemical nomenclature and how it is managed by the PubChem group at NCBI. It explains the process of aggregating terms for chemical compounds, including computed descriptors based on the chemical structure and identifiers from various organizations. The paragraph provides an example of the Tamoxifen compound page on PubChem, detailing the sections where names, identifiers, and synonyms are listed. It also explains how to download information for individual compounds, or in bulk through the NCBI FTP site, where GENE_INFO files hold gene and protein information and CID-Synonym files hold chemical synonyms.
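A bulk CID-Synonym file can be turned into a two-way lookup between compound identifiers and names. The sketch below assumes one tab-separated `CID<TAB>synonym` pair per line, which is an assumption about the file layout rather than something stated in the webinar summary. The sample rows are illustrative, and `load_synonyms` is a hypothetical helper name.

```python
import io
from collections import defaultdict

# Illustrative rows in the assumed "CID<TAB>synonym" layout.
SAMPLE = (
    "2733526\tTamoxifen\n"
    "2733526\t10540-29-1\n"
    "2733526\tNolvadex\n"
    "2244\tAspirin\n"
    "2244\tAcetylsalicylic acid\n"
)


def load_synonyms(handle):
    """Return (CID -> list of synonyms, case-folded synonym -> set of CIDs)."""
    by_cid = defaultdict(list)
    by_name = defaultdict(set)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split only on the first tab so any later tab stays in the synonym.
        cid_text, synonym = line.split("\t", 1)
        cid = int(cid_text)
        by_cid[cid].append(synonym)
        by_name[synonym.casefold()].add(cid)
    return by_cid, by_name


by_cid, by_name = load_synonyms(io.StringIO(SAMPLE))
print(by_cid[2733526])
print(by_name["aspirin"])
```

For a real download the same function could be fed a handle from `gzip.open(path, "rt")`, since the bulk files are typically compressed; the path itself depends on where the file was saved.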
Keywords
Gene and Protein
Chemical Names and Aliases
NCBI (National Center for Biotechnology Information)
RefSeq
IUPAC Name
Aliases or Synonyms
FTP Site
GENE_INFO Files
PubChem Compound Identifier (CID)
Computed Descriptors
NCBI Insights Blog
Highlights
The webinar discusses gene, protein, and chemical nomenclature, addressing the challenges of evolving scientific terminology.
Rana Morris presents the NCBI Minute, focusing on the confusion caused by changes in scientific naming conventions over time.
The NCBI provides resources to help researchers navigate the complexities of gene, protein, and chemical names and aliases.
Examples are given to illustrate the confusion that can arise from the use of different names for the same gene or protein.
The Human Gene Nomenclature Committee (HGNC) is mentioned as the official body for gene symbol designation.
The issue of chemical compound naming is highlighted, with Tamoxifen as a case study of the challenges in chemical nomenclature.
NCBI's RefSeq project is introduced as a curation group that compiles gene and protein names and terms.
The presentation explains how to find and use the official gene symbols and protein names on the NCBI Gene database.
The PubChem Group is responsible for aggregating chemical compound terms, including synonyms and computed descriptors.
The webinar demonstrates how to access and download information on gene and chemical compound synonyms and identifiers from NCBI FTP sites.
GENE_INFO files on the NCBI FTP site provide comprehensive gene and protein information for various organisms.
The frequency of updates to gene and protein databases is discussed, with PubChem updating daily and gene records updated less frequently.
The NCBI Insights Blog is mentioned as a resource for further information and guidance on using NCBI resources.
The webinar provides bit.ly links for easy access to the NCBI Insights Blog posts on gene and protein names and chemical synonyms.
The importance of the NCBI YouTube channel for helpful how-to videos is emphasized.
Contact information for the NCBI HelpDesk is provided for users with questions or needing assistance with NCBI resources.
The webinar concludes with an invitation for further questions and a reminder of the resources available for gene, protein, and chemical nomenclature.