NCBI Minute: A Beginner's Guide to Genes and Sequences at NCBI

National Library of Medicine
16 Sept 2019, 33:44
Educational, Learning

TLDR: Peter Cooper from NCBI guides viewers through a webinar focused on 'A Beginner's Guide to Genes and Sequences at NCBI.' The presentation was motivated by librarians at Fanshawe College, Canada, who sought assistance in utilizing NCBI for sequence discovery. Cooper addresses common knowledge gaps regarding NCBI's operations and offers live demonstrations. The webinar covers the representation of genetic information at NCBI, the origins of sequence data, and the use of Entrez and BLAST for searching sequences. Cooper introduces a new search experience designed to simplify the process of finding relevant results across the NCBI site without needing to know the specific database. The discussion also includes the representation of genes like APRT, the process of finding related sequences through orthologs, and the exploration of protein structures using tools like iCn3D. The session concludes with a Q&A, providing clarification on linking PubMed with data, the distinction between data reuse and discovery, and methods for narrowing down nucleotide sequences to ESTs.

Takeaways
  • 📚 Peter Cooper from NCBI introduces a webinar focusing on a beginner's guide to genes and sequences at NCBI, prompted by librarians at Fanshawe College in Canada.
  • 🔎 The new search experience at NCBI aims to streamline the process of finding relevant data across different databases without needing to know which specific database to search.
  • 🧬 NCBI represents the central dogma of molecular biology, showing the flow from DNA, to RNA, to proteins, and how these are expressed and function within the cell.
  • 🌐 NCBI's representation includes various types of sequence records such as GenBank, Reference Sequences, and Swiss-Prot, each serving different purposes and levels of curation.
  • 🔗 The NCBI Reference Sequences are generated from submitted sequences to represent specific genes and gene products, offering a less redundant and more curated set of data.
  • 🔬 Peter demonstrates how to use the Entrez system and BLAST for searching sequences, highlighting the challenges and the improvements made with the new search experience.
  • 📈 The new search experience provides a more intuitive and Google-like interface, offering compact and relevant results from a simple search query.
  • 🧵 Peter shows how to find related sequences using the Orthologs button, which can act as a quick alternative to BLAST for finding related proteins and sequences.
  • 📊 A live demonstration is given on how to use the new search features, including finding gene records, exploring genomic context, and aligning sequences with tools like COBALT.
  • 📊 The importance of selecting the right database in BLAST searches is emphasized for efficiency, using the smallest relevant database to find matches.
  • 📘 Lastly, Peter discusses the possibility of linking PubMed to data and vice versa, showing that literature links are still available in gene records and can be accessed through the NCBI interface.
Q & A
  • What is the main purpose of the webinar presented by Peter Cooper from NCBI?

    -The main purpose of the webinar is to provide a beginner's guide to understanding genes and sequences at NCBI, prompted by librarians at Fanshawe College in Canada who requested information on how to use NCBI to find sequences.

  • What are the two traditional search systems used at NCBI?

    -The two traditional search systems used at NCBI are the Entrez text search system and BLAST, which is a sequence similarity search system.

  • How does the new search experience at NCBI aim to improve the user's search process?

    -The new search experience at NCBI aims to improve the user's search process by providing a more Google-like interface that is independent of which database the data are in, offering a compact set of relevant results without needing to know the specific database to search.

  • What is the central dogma of molecular biology, and how does NCBI represent it?

    -The central dogma of molecular biology refers to the flow of genetic information from DNA to RNA to protein. NCBI represents this by showing the gene, its transcripts, protein isoforms, and their relationships in their database, using various sequence records with accession numbers.

  • What is the difference between GenBank and NCBI Reference sequences?

    -GenBank is a set of submitted sequences and is part of the International Nucleotide Sequence Database Collaboration (INSDC), while NCBI Reference sequences are generated by NCBI from submitted sequences to represent particular genes and gene products. They are a less redundant set with more curation.

  • How can one find related sequences to a specific gene using the new search experience at NCBI?

    -One can find related sequences to a specific gene using the new search experience by searching with the gene name or protein name, which will provide a result that shows the gene, sequences, and access to related sequences through the orthologs button.

  • What is the significance of the APRT gene that was discussed in the webinar?

    -The APRT gene is significant because it is associated with certain diseases due to mutations, it is highly conserved across different forms of life, and it plays a role in purine metabolism, which is essential for all life forms.

  • How can one find the genomic context of a gene using NCBI's genome browser?

    -One can find the genomic context of a gene using NCBI's genome browser by accessing the browser and selecting the gene of interest. The browser displays the gene's location on the chromosome, its transcripts, and allows users to zoom in to view the actual codons and how they code for proteins.

  • What is the RefSeq Select project, and how does it help in representing a gene?

    -The RefSeq Select project is a way of selecting a preliminary representative transcript for a gene, which is useful when one doesn't need all the splice variants of a gene. It helps in representing the gene by providing a single transcript that is considered to be the most representative.

  • How can one perform a multiple sequence alignment using NCBI tools?

    -One can perform a multiple sequence alignment using NCBI's multiple alignment tool, COBALT. After selecting and adding the desired sequences to the cart, users can click on the protein alignment button to initiate the alignment process.

  • What is the process for linking a gene sequence to its structure using NCBI's resources?

    -The process involves using BLAST to find similar sequences and then selecting the sequence of interest to access its structure. From there, users can utilize a viewer like iCn3D to examine the three-dimensional structure of the protein and its functional sites.

Outlines
00:00
🎓 Introduction to NCBI Webinar

Peter Cooper from the NCBI introduces the webinar focused on 'A Beginner's Guide to Genes and Sequences at NCBI.' He mentions the webinar was prompted by librarians at Fanshawe College in Canada who were seeking guidance on using NCBI for sequence research. Cooper emphasizes the importance of understanding how NCBI works, given the gaps in people's knowledge. He plans to spend most of the time on live demonstrations, covering topics from gene and sequence representation at NCBI to the use of traditional search systems and tools like Entrez and BLAST. Cooper also introduces a new search experience on the NCBI site that streamlines the search across different databases.

05:02
🧬 Understanding Genes and Sequences at NCBI

The speaker discusses how NCBI represents the central dogma of molecular biology, starting from DNA, through RNA splicing, to protein formation. He uses the APRT gene as an example, showing its representation in the NCBI genome browser with its two transcript variants and corresponding protein isoforms. The segment also covers the sources of sequences at NCBI, including GenBank and the NCBI Reference sequences. It highlights the different types of sequences available for the APRT gene, such as transcripts, proteins, and genomic sequences, and touches on the connection to structural data via the Protein Data Bank (PDB).
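The DNA to RNA to protein flow described in this segment can be sketched in a few lines of Python. This is a toy illustration only: the input sequence is invented (it is not APRT), the codon table is deliberately truncated, and the function names are hypothetical.

```python
# Toy illustration of DNA -> mRNA -> protein.
# The sequence is invented and the codon table is a small subset of the real one.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "GCC": "A", "UUU": "F", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon.

    Codons missing from the truncated table are reported as 'X'.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGGCCTTTAAATAA")
protein = translate(mrna)
print(mrna, protein)
```

Real transcripts also involve splicing and untranslated regions, which is why NCBI keeps separate genomic, transcript, and protein records for a gene.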

10:05
🔎 Searching NCBI's Sequence Databases

This segment explains the traditional search methods at NCBI, using the Entrez text system and the BLAST sequence similarity search system. It points out the challenges of searching across different databases and silos. The speaker then introduces the new search experience, which provides a more streamlined and Google-like interface, allowing users to search without knowing the specific database. The segment also demonstrates how to find the APRT gene and its related sequences using the new search features, including the Orthologs button for finding related proteins and sequences.
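The webinar shows Entrez searches in the web interface only. As a rough sketch of how the same kind of fielded text search might be scripted, the snippet below builds a request URL for NCBI's E-utilities `esearch` endpoint. Using E-utilities at all is an assumption here, not something demonstrated in the session, and the helper name `build_esearch_url` and the example query are hypothetical. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for one Entrez database (hypothetical helper).

    db     : Entrez database name, e.g. "gene", "nuccore", or "protein"
    term   : Entrez query string, optionally with [Field] qualifiers
    retmax : maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

# Illustrative query: the human APRT gene record in the Gene database.
gene_url = build_esearch_url("gene", "APRT[Gene Name] AND Homo sapiens[Organism]")
print(gene_url)
```

Note that the database must be named explicitly in this style of search, which is the "silo" problem the new search experience is meant to ease.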

15:05
📚 Navigating Reference Sequences and Genomic Context

The speaker delves into the reference sequences for the APRT gene, showing how to find and work with splice variants and the RefSeq Select project for representative transcripts. He also shows how to find related sequences in other organisms using the Orthologs button and the 'Genes similar to APRT' link for a broader range of sequences. The segment includes a demonstration of adding sequences to a cart and using the protein alignment tool, COBALT, to compare selected sequences from different organisms.
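Once sequences are aligned, a common next step is to look for columns that are identical in every organism. The sketch below marks such columns in a toy alignment. It is not COBALT's algorithm or output format: the aligned sequences are invented, and `conserved_columns` is a hypothetical helper that assumes the alignment has already been computed.

```python
def conserved_columns(aligned: list[str]) -> str:
    """Return a marker line with '*' under columns identical in all sequences.

    Columns containing a gap ('-') or any mismatch are marked with a space.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        fully_conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if fully_conserved else " ")
    return "".join(marks)

# Invented, pre-aligned protein fragments (not real APRT sequences).
toy_alignment = [
    "MADSE-LK",
    "MADTELLK",
    "MAESE-LK",
]
marks = conserved_columns(toy_alignment)
for seq in toy_alignment:
    print(seq)
print(marks)
```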

20:12
🧵 BLAST Search and Sequence Alignment

This segment focuses on using BLAST for searching and aligning sequences. It demonstrates how to download sequences and use BLAST to find closely related organisms. The speaker also shows how to perform a BLAST search with nucleotide sequences and discusses the significance of BLAST results, including E-values and the importance of selecting the appropriate database for the search. The segment concludes with a brief mention of traditional BLAST searches to find bacterial transferases and examine their structures.
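One reason database choice matters is that the E-value scales with the size of the search space. A commonly cited approximation is E = m × n × 2^(−S′), where m is the query length, n is the total database length, and S′ is the bit score. The sketch below applies that formula with invented numbers. It is a simplification: BLAST itself uses effective (edge-corrected) lengths, and `expected_hits` is a hypothetical helper.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: chance hits expected at or above this bit score.

    Uses E = m * n * 2**(-S'), ignoring BLAST's effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)

# Same hypothetical alignment (bit score 50, 180-residue query)
# searched against a small and a large database.
small = expected_hits(50.0, 180, 10_000_000)
large = expected_hits(50.0, 180, 10_000_000_000)
print(f"small database: E = {small:.2e}")
print(f"large database: E = {large:.2e}")
```

Under this approximation, a database 1,000 times larger makes the same alignment 1,000 times less significant, which is consistent with the advice to search the smallest relevant database.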

25:13
📊 Exploring Protein Structures and Functional Sites

The speaker walks through accessing protein structures from BLAST results and using the iCn3D viewer to examine the three-dimensional structure of the APRT protein. He highlights the ability to view functional sites, such as the active site, and how these sites are conserved across different organisms. The segment emphasizes the journey from sequence to the folded protein structure and the availability of functional information within the viewer.

30:14
🤔 Addressing Questions and Wrapping Up

The final segment addresses questions from the audience, such as linking PubMed to data and vice versa, the distinction between data reuse and data discovery, and how to find specific types of sequences like ESTs within the nucleotide database. The speaker also provides information on where to find additional resources and assistance, including the NCBI blog, Learn page, fact sheets, and YouTube channel. The webinar concludes with an invitation for further questions and thanks to the participants.

Keywords
💡NCBI
NCBI stands for the National Center for Biotechnology Information. It is a part of the United States National Library of Medicine (NLM). In the video, it serves as the central database and resource for accessing genetic and molecular biology data, including gene sequences and related information. It is the primary platform where the webinar's content is based and where the demonstrations take place.
💡Gene Sequences
Gene sequences refer to the specific order of nucleotides within a DNA or RNA molecule that determines the genetic traits of an organism. In the video, the discussion revolves around how to find and analyze gene sequences using NCBI tools, which is crucial for understanding genetic information and its role in biological functions.
💡Entrez System
Entrez is NCBI's integrated text search and retrieval system. It allows users to search across many linked databases, including the biomedical literature and nucleotide and protein sequences. In the video, it is mentioned as a traditional search system that requires a certain level of expertise to use effectively.
💡BLAST
BLAST (Basic Local Alignment Search Tool) is an algorithm and program used for sequence similarity searching. It is a widely used tool for finding regions of similarity between biological sequences, which can help in identifying homologous sequences. In the video, BLAST is shown as a method to search for related sequences based on sequence similarity.
💡GenBank
GenBank is a public database of nucleic acid sequences, including DNA and RNA sequences. It is part of the International Nucleotide Sequence Database Collaboration (INSDC) and is mentioned in the video as a source of submitted sequences that are available for public use. GenBank is a foundational database for many of the sequences analyzed and discussed in the webinar.
💡RefSeq
RefSeq (Reference Sequence) is NCBI's curated collection of reference sequences, covering genomic, transcript, and protein records for a wide range of organisms. In the video, RefSeq is used to represent particular genes and gene products, and it is noted for being a curated and less redundant set of sequences.
💡Orthologs
Orthologs are genes in different species that evolved from a common ancestral gene through speciation. In the video, the concept of orthologs is used to find related sequences in different organisms, which is important for understanding evolutionary relationships and functional conservation across species.
💡Protein Structure
Protein structure refers to the three-dimensional arrangement of atoms within a protein molecule. It is crucial for understanding how proteins function within a cell. In the video, the presenter discusses how to use NCBI resources to go from the sequence of a protein to its folded structure, which is vital for studying protein function and interactions.
💡Multiple Sequence Alignment
Multiple sequence alignment is a technique used in bioinformatics to align three or more homologous sequences in order to identify regions of similarity and conserved motifs. In the video, the presenter uses the COBALT tool to align and compare APRT protein sequences from various organisms.
💡iCn3D
iCn3D is a web-based molecular viewer that allows users to visualize and analyze molecular structures, such as proteins. In the video, iCn3D is used to explore the three-dimensional structure of the APRT protein, highlighting the active site and the residues involved, which is essential for understanding the protein's function.
💡PubMed
PubMed is a free search engine that primarily accesses the MEDLINE database of references and abstracts on life sciences and biomedical topics. In the video, PubMed is mentioned in the context of linking literature to gene records and vice versa, which is important for researchers looking for scientific publications related to specific genes or sequences.
Highlights

Peter Cooper from NCBI introduces a beginner's guide to genes and sequences at NCBI.

Webinar prompted by librarians at Fanshawe College in Canada to assist in finding sequences using NCBI.

NCBI's new search experience de-silos results, making it easier to find relevant information without knowing the specific database.

Demonstration of live searches using the Entrez system and BLAST for sequence similarity.

Explanation of the central dogma of molecular biology and how NCBI represents this information.

Introduction to the APRT gene, its transcript variants, and protein isoforms.

Differentiation between GenBank and NCBI Reference sequences, including their curation and redundancy.

Discussion on the representation of gene sequences in the NCBI genome browser.

Use of the Orthologs button as a partial replacement for BLAST to find related proteins and sequences.

Live demonstration of a sequence comparison using the multiple sequence alignment tool COBALT.

Explanation of the process to find and analyze the adenine phosphoribosyltransferase (APRT) enzyme and its gene using a known-item search.

Traditional nucleotide search demonstration with filtering for human RefSeq records.

Illustration of the genomic context of the APRT gene using the genome browser.

Highlighting the RefSeq Select project for selecting preliminary representative transcripts.

Procedure for finding related sequences in other organisms using the protein and Orthologs button.

Demonstration of a broad spectrum gene search and subsequent protein alignment using multiple organisms.

Use of BLAST for aligning nucleotide sequences and obtaining statistical significance of alignments.

Final task involves a regular BLAST search to find matches in bacteria for the APRT sequence.

Direct linking from protein sequences to their structures using iCn3D for visual analysis.

Peter Cooper provides resources for further learning about NCBI tools and encourages questions from attendees.
