NCBI Minute: A Beginner's Guide to Genes and Sequences at NCBI
TLDR
Peter Cooper from NCBI guides viewers through a webinar focused on 'A Beginner's Guide to Genes and Sequences at NCBI.' The presentation was motivated by librarians at Fanshawe College, Canada, who sought assistance in utilizing NCBI for sequence discovery. Cooper addresses common knowledge gaps regarding NCBI's operations and offers live demonstrations. The webinar covers the representation of genetic information at NCBI, the origins of sequence data, and the use of Entrez and BLAST for searching sequences. Cooper introduces a new search experience designed to simplify the process of finding relevant results across the NCBI site without needing to know the specific database. The discussion also includes the representation of genes like APRT, the process of finding related sequences through orthologs, and the exploration of protein structures using tools like iCn3D. The session concludes with a Q&A, providing clarification on linking PubMed with data, the distinction between data reuse and discovery, and methods for narrowing down nucleotide sequences to ESTs.
Takeaways
- Peter Cooper from NCBI introduces a webinar focusing on a beginner's guide to genes and sequences at NCBI, prompted by librarians at Fanshawe College in Canada.
- The new search experience at NCBI aims to streamline the process of finding relevant data across different databases without needing to know which specific database to search.
- NCBI represents the central dogma of molecular biology, showing the flow from DNA, to RNA, to proteins, and how these are expressed and function within the cell.
- NCBI's representation includes various types of sequence records such as GenBank, Reference Sequences, and Swiss-Prot, each serving different purposes and levels of curation.
- The NCBI Reference Sequences are generated from submitted sequences to represent specific genes and gene products, offering a less redundant and more curated set of data.
- Peter demonstrates how to use the Entrez system and BLAST for searching sequences, highlighting the challenges and the improvements made with the new search experience.
- The new search experience provides a more intuitive and Google-like interface, offering compact and relevant results from a simple search query.
- Peter shows how to find related sequences using the Orthologs button, which can act as a quick alternative to BLAST for finding related proteins and sequences.
- A live demonstration is given on how to use the new search features, including finding gene records, exploring genomic context, and aligning sequences with tools like COBALT.
- The importance of selecting the right database in BLAST searches is emphasized for efficiency, using the smallest relevant database to find matches.
- Lastly, Peter discusses the possibility of linking PubMed to data and vice versa, showing that literature links are still available in gene records and can be accessed through the NCBI interface.
Q & A
What is the main purpose of the webinar presented by Peter Cooper from NCBI?
- The main purpose of the webinar is to provide a beginner's guide to understanding genes and sequences at NCBI, prompted by librarians at Fanshawe College in Canada who requested information on how to use NCBI to find sequences.
What are the two traditional search systems used at NCBI?
- The two traditional search systems used at NCBI are the Entrez text search system and BLAST, which is a sequence similarity search system.
How does the new search experience at NCBI aim to improve the user's search process?
- The new search experience provides a more Google-like interface that is independent of which database the data are in, offering a compact set of relevant results without the user needing to know which specific database to search.
What is the central dogma of molecular biology, and how does NCBI represent it?
- The central dogma of molecular biology refers to the flow of genetic information from DNA to RNA to protein. NCBI represents this by showing the gene, its transcripts, its protein isoforms, and the relationships among them in its databases, using sequence records identified by accession numbers.
What is the difference between GenBank and NCBI Reference sequences?
- GenBank is a set of submitted sequences that is part of the International Nucleotide Sequence Database Collaboration (INSDC), while NCBI Reference sequences are generated by NCBI from submitted sequences to represent particular genes and gene products. The Reference sequences are a less redundant set with more curation.
How can one find related sequences to a specific gene using the new search experience at NCBI?
- One can search with the gene name or protein name. The result shows the gene and its sequences, and gives access to related sequences through the Orthologs button.
What is the significance of the APRT gene that was discussed in the webinar?
- The APRT gene is significant because mutations in it are associated with certain diseases, it is highly conserved across different forms of life, and it plays a role in purine metabolism, which is essential for all life forms.
How can one find the genomic context of a gene using NCBI's genome browser?
- One can find the genomic context of a gene by opening the genome browser and selecting the gene of interest. The browser displays the gene's location on the chromosome and its transcripts, and allows users to zoom in to view the actual codons and how they code for the protein.
What is the RefSeq Select project, and how does it help in representing a gene?
- The RefSeq Select project is a way of selecting a preliminary representative transcript for a gene, which is useful when one doesn't need all the splice variants of a gene. It helps in representing the gene by providing a single transcript that is considered to be the most representative.
How can one perform a multiple sequence alignment using NCBI tools?
- One can perform a multiple sequence alignment using NCBI's multiple alignment tool, COBALT. After selecting and adding the desired sequences to the cart, users can click on the protein alignment button to initiate the alignment process.
What is the process for linking a gene sequence to its structure using NCBI's resources?
- The process involves using BLAST to find similar sequences and then selecting the sequence of interest to access its structure. From there, users can utilize a viewer like iCn3D to examine the three-dimensional structure of the protein and its functional sites.
Outlines
Introduction to NCBI Webinar
Peter Cooper from the NCBI introduces the webinar focused on 'A Beginner's Guide to Genes and Sequences at NCBI.' He mentions the webinar was prompted by librarians at Fanshawe College in Canada who were seeking guidance on using NCBI for sequence research. Cooper emphasizes the importance of understanding how NCBI works, given the gaps in people's knowledge. He plans to spend most of the time on live demonstrations, covering topics from gene and sequence representation at NCBI to the use of traditional search systems and tools like Entrez and BLAST. Cooper also introduces a new search experience on the NCBI site that streamlines the search across different databases.
𧬠Understanding Genes and Sequences at NCBI
The speaker discusses how NCBI represents the central dogma of molecular biology, starting from DNA, through RNA splicing, to protein formation. He uses the APRT gene as an example, showing its representation in the NCBI genome browser with its two transcript variants and corresponding protein isoforms. The paragraph also covers the source of sequences at NCBI, including GenBank and the NCBI Reference sequences. It highlights the different types of sequences available for the APRT gene, such as transcripts, proteins, and genomic sequences, and touches on the connection to structural data via the PDB database.
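The DNA to RNA to protein flow described above can be sketched in a few lines of Python. This is only an illustration, not something shown in the webinar: it ignores splicing, uses the standard genetic code, and the input sequence is a made-up toy string rather than an actual APRT record.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA string into mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy coding sequence, used only to exercise the functions.
toy_dna = "ATGGCGGATTCCGAGTAA"
toy_protein = translate(transcribe(toy_dna))
```

Real gene records add the layers this sketch leaves out, such as introns, multiple transcript variants, and the protein isoforms they produce.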
Searching NCBI's Sequence Databases
The paragraph explains the traditional search methods at NCBI, using the Entrez text system and the BLAST sequence similarity search system. It points out the challenges of searching across different databases and silos. The speaker then introduces the new search experience, which provides a more streamlined and Google-like interface, allowing users to search without knowing the specific database. The paragraph also demonstrates how to find the APRT gene and its related sequences using the new search features, including the Orthologs button for finding related proteins and sequences.
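For readers who prefer scripting to the web interface, the same kind of Entrez text search can be expressed as a request to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact NCBI; the helper name `esearch_url` and the exact query term are illustrative assumptions, not something taken from the webinar.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an E-utilities esearch URL for one Entrez database."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Assumed example query: the human APRT gene in the Gene database.
gene_url = esearch_url("gene", 'APRT[Gene Name] AND "Homo sapiens"[Organism]')
print(gene_url)
```

Note that the database still has to be named in `db`, which is exactly the siloed behaviour that the new search experience is meant to hide from the user.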
Navigating Reference Sequences and Genomic Context
The speaker delves into the reference sequences for the APRT gene, showing how to find and work with splice variants and the RefSeq Select project for representative transcripts. He also guides on finding related sequences in other organisms using the Orthologs button and the 'Genes similar to APRT' link for a broader range of sequences. The paragraph includes a demonstration of adding sequences to a cart and using the protein alignment tool, COBALT, to compare selected sequences from different organisms.
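One simple number that can be read off an alignment of this kind is percent identity between two aligned sequences. The sketch below is a hypothetical helper, not part of COBALT: it assumes the two sequences are already aligned to the same length with `-` as the gap character, and the example strings are invented toy data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns where both sequences have the same residue.

    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a, aligned_b)
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Invented toy alignment: one gap and one substitution in seven columns.
identity = percent_identity("MAD-ELQ", "MADSEIQ")
```

Alignment tools differ in how they count gapped columns, so a value computed this way may not match the figure a particular tool reports.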
𧡠BLAST Search and Sequence Alignment
The paragraph focuses on using BLAST for searching and aligning sequences. It demonstrates how to download sequences and use BLAST to find closely related organisms. The speaker also shows how to perform a BLAST search with nucleotide sequences and discusses the significance of BLAST results, including E-values and the importance of selecting the appropriate database for the search. The paragraph concludes with a brief mention of traditional BLAST searches to find bacterial transferases and examine their structures.
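The link between E-values and database choice can be made concrete with the usual approximation E = m × n × 2^(−S'), where S' is the bit score, m the query length, and n the total database length. The sketch below is a simplification under that assumption: BLAST itself uses effective lengths, and the numbers are invented for illustration.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate BLAST E-value from a bit score and the search space size."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Same hit (bit score 50, 100-residue query) against two database sizes.
small_db = expect_value(50, 100, 1_000_000)      # hypothetical small database
large_db = expect_value(50, 100, 1_000_000_000)  # hypothetical large database
print(f"small: {small_db:.2e}  large: {large_db:.2e}")
```

Because the E-value grows in proportion to database size, the same alignment looks a thousand times less significant in the larger search space, which is one reason to pick the smallest database that can contain the answer.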
Exploring Protein Structures and Functional Sites
The speaker guides through accessing protein structures from BLAST results and using the iCn3D viewer to examine the three-dimensional structure of the APRT protein. He highlights the ability to view functional sites, such as the active site, and how these sites are conserved across different organisms. The paragraph emphasizes the journey from sequence to the folded protein structure and the availability of functional information within the viewer.
Addressing Questions and Wrapping Up
The final paragraph addresses questions from the audience, such as linking PubMed to data and vice versa, the distinction between data reuse and data discovery, and how to find specific types of sequences like ESTs within the nucleotide database. The speaker also provides information on where to find additional resources and assistance, including the NCBI blog, Learn page, fact sheets, and YouTube channel. The webinar concludes with an invitation for further questions and thanks to the participants.
Keywords
NCBI
Gene Sequences
Entrez System
BLAST
GenBank
RefSeq
Orthologs
Protein Structure
Multiple Sequence Alignment
iCn3D
PubMed
Highlights
Peter Cooper from NCBI introduces a beginner's guide to genes and sequences at NCBI.
Webinar prompted by librarians at Fanshawe College in Canada to assist in finding sequences using NCBI.
NCBI's new search experience removes the database silos from search results, making it easier to find relevant information without knowing the specific database.
Demonstration of live searches using the Entrez system and BLAST for sequence similarity.
Explanation of the central dogma of molecular biology and how NCBI represents this information.
Introduction to the APRT gene, its transcript variants, and protein isoforms.
Differentiation between GenBank and NCBI Reference sequences, including their curation and redundancy.
Discussion on the representation of gene sequences in the NCBI genome browser.
Use of the Orthologs button as a partial replacement for BLAST to find related proteins and sequences.
Live demonstration of a sequence comparison using the multiple sequence alignment tool COBALT.
Explanation of the process to find and analyze the adenine phosphoribosyltransferase (APRT) enzyme and its gene using a known-item search.
Traditional nucleotide search demonstration with filtering for human RefSeq records.
Illustration of the genomic context of the APRT gene using the genome browser.
Highlighting the RefSeq Select project for selecting preliminary representative transcripts.
Procedure for finding related sequences in other organisms using the protein and Orthologs button.
Demonstration of a broad spectrum gene search and subsequent protein alignment using multiple organisms.
Use of BLAST for aligning nucleotide sequences and obtaining statistical significance of alignments.
Final task involves a regular BLAST search to find matches in bacteria for the APRT sequence.
Direct linking from protein sequences to their structures using iCn3D for visual analysis.
Peter Cooper provides resources for further learning about NCBI tools and encourages questions from attendees.