Chronic hepatitis C: interpretation of its defective gene and primer design using bioinformatics tools

 Chronic Hepatitis C Disease

Introduction

Hepatitis C

Hepatitis C is an infection of the liver caused by the Hepatitis C virus. Acute Hepatitis C refers to the first several months after someone is infected. Acute infection can range in severity from a very mild illness with few or no symptoms to a serious condition requiring hospitalization. For reasons that are not known, about 20% of people are able to clear, or get rid of, the virus without treatment in the first 6 months. Infection that persists beyond this period is chronic Hepatitis C.

Chronic Hepatitis C virus

Hepatitis C virus infection is one of the main causes of chronic liver disease worldwide. The long-term natural history of HCV infection is highly variable. The hepatic injury can range from minimal histological changes to extensive fibrosis and cirrhosis with or without hepatocellular carcinoma. There are approximately 71 million chronically infected individuals worldwide, many of whom are unaware of their infection, with important variations according to the geographical area. Clinical care for patients with HCV-related liver disease has advanced considerably during the last two decades, thanks to an enhanced understanding of the pathophysiology of the disease, and because of developments in diagnostic procedures and improvements in therapy and prevention.

Gene

NFE2L2:

This gene encodes a transcription factor which is a member of a small family of basic leucine zipper (bZIP) proteins. The encoded transcription factor regulates genes that contain antioxidant response elements (ARE) in their promoters; many of these genes encode proteins involved in response to injury and inflammation which includes the production of free radicals. Multiple transcript variants encoding different isoforms have been characterized for this gene.


Protein 

Mutation:


The disease is caused by mutations affecting the NFE2L2 gene.

Immunodeficiency, developmental delay, and hypohomocysteinemia (IMDDHH) is an early-onset multisystem disorder characterized by immunodeficiency, recurrent infections, developmental delay, poor growth, intellectual disability, and hypohomocysteinemia. Some patients manifest congenital cardiac defects. The inheritance pattern of IMDDHH is autosomal dominant.

2.  Identification of Mutations in nucleotide and protein sequence

·        Normal Protein Sequence


·                              Mutated Sequence


Normal nucleotide sequence



Mutated sequence

3.  Annotation

Disease: Chronic Hepatitis Disease

Gene: NFE2L2

Protein: Nuclear factor erythroid 2-related factor 2 (Nrf2)

Associated Proteins: MAFG, MAFF, MAFK, JUN, HMOX1, HMOX2, PRKCA, KEAP1, CRYZ, GSTA2

Mutation: Missense, G81S


Methodology

1.1.          TOOL 1

CLUSTALW is a multiple sequence alignment tool that is also used to find the sequence similarity between multiple sequences. Input formats: FASTA (Pearson), PIR, EMBL, GDE, and GCG. Output formats: CLUSTAL, GDE, Phylip, etc. Pairwise alignment parameters: gap penalty 3, number of top diagonals 5, scoring method percent. Multiple alignment parameters: gap open penalty 10, gap extension penalty 0.1, weight matrix BLOSUM for protein and IUB for DNA. [4]

Working:

1.      First I retrieved the sequence of the NFE2L2 gene from NCBI in FASTA format.

2.      Added the normal and mutated sequences of NFE2L2 into the sequence alignment box. In this tool, you can add multiple sequences in a single alignment box.

3.      Then clicked on Execute Multiple Alignment.


Result

In my case, CLUSTALW did not give the desired result. It reported an error because the normal and mutated sequences carry the same accession number and differ by only one nucleotide. CLUSTALW expects each input sequence to have a different accession number, which is why I did not get the desired result with this tool.
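One possible workaround, sketched here on the assumption that the tool only needs the FASTA header names to be distinct, is to relabel the two records before pasting them into the alignment box. The function name `label_records`, the identifier, and the toy bases are illustrative and are not taken from the original analysis.

```python
def label_records(records):
    """Build FASTA text in which every record has a distinct identifier.

    `records` is a list of (identifier, sequence) pairs. When an identifier
    repeats, a running suffix such as "_2" is appended to the later copies.
    """
    seen = {}
    lines = []
    for identifier, sequence in records:
        count = seen.get(identifier, 0) + 1
        seen[identifier] = count
        unique = identifier if count == 1 else f"{identifier}_{count}"
        lines.append(f">{unique}")
        lines.append(sequence)
    return "\n".join(lines)


# Toy input: the same placeholder identifier used for both sequences.
fasta_text = label_records([("NFE2L2", "ATGGAC"), ("NFE2L2", "ATGAAC")])
print(fasta_text)
```

Whether this resolves the error on a given server is not something the original text confirms, so it is worth checking on a small test input first.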

1.1.          TOOL 2

CLUSTAL Omega is another tool that aligns multiple sequences and reports their sequence similarity. It is also used to examine the evolutionary relationship between species. The tool can align about 4000 sequences, and the input can be protein, DNA, or RNA. [5]

Working:

1. First, I retrieved the sequence from NCBI in FASTA format.

2. Entered the input sequences and selected DNA as the sequence type.

3. Set the output format to CLUSTALW with character counts.

4. Then submitted the job.

Result

I did not obtain the desired result from this tool because it requires sequences with different accession numbers. Since I inserted the mutation into the normal nucleotide sequence, both sequences have the same accession number. That is why this tool did not give the desired result.

1.1.          TOOL 3

·        T COFFEE

T-Coffee is a multiple sequence alignment tool used to check the sequence similarity of several sequences at a time. It can align protein, RNA, and DNA sequences. In T-Coffee, the sequence length cannot exceed 2500 bp.

Input format: FASTA

Output format: HTML [6].

Working

·         In T-Coffee, the input sequences whose similarity you want to check are added in FASTA format to the “sequences to align” box, and then the program is run.

·         The alignment is colored to indicate how well each region is matched. Pink shows a good match, yellow shows an average match, and green shows a bad match or unmatched sequence [6].

·         I added the normal and mutated NFE2L2 sequences in FASTA format to the sequence alignment box and submitted the job to get the result.

Result:

The sequence I added is 2,988 bp long, and sequences longer than 2500 bp cannot be added, so I did not get the desired result.
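A length check like this can be done before submitting. The sketch below assumes the 2500 bp limit quoted above and uses placeholder records; the helper names are hypothetical.

```python
def sequence_lengths(fasta_text):
    """Return {record name: sequence length} for plain FASTA text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def too_long(fasta_text, limit=2500):
    """List the records whose length exceeds the limit (2500 bp per the text)."""
    return [n for n, size in sequence_lengths(fasta_text).items() if size > limit]


# Placeholder records: one of 2988 bases, one of 40 bases.
example = ">normal\n" + "A" * 2988 + "\n>short\n" + "ACGT" * 10
print(too_long(example))
```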

1.1.          TOOL 4:

BLAST stands for Basic Local Alignment Search Tool. It finds regions of local similarity between sequences, for example the same gene, protein, or nucleotide sequence in different species, and reports how much of the sequence is matched, mismatched, or gapped. The tool is also used to infer evolutionary relationships and to help identify gene families.

Formats

·         Input formats: FASTA or GenBank, with a weight matrix.

·         Output formats: HTML, plain text, and XML [7].

Working

1.      First, add the query sequence about which you want to get information.

2.      Select the “align two or more sequences” option.

3.      Add the sequence you want to compare in the subject box.

4.      Select highly similar sequences if you want to check for close similarity.

5.      To find more dissimilar sequences, select discontiguous megablast, and so on.

6.      Click on BLAST so that the system runs the program.

7.      The results report how similar the sequences are: the percentage identity, the total number of identities, the query coverage, the E value, and the numbers of gaps, matches, and mismatches.

8.      There are different alignment views, such as pairwise, pairwise with dot identities, and query-anchored with dot identities. You can also draw a dot plot with the help of BLAST.

In my case, I added the normal NFE2L2 sequence in the query box and the mutated sequence in the subject box, selected highly similar sequences, and clicked on the BLAST option [8].

Result

The total length of the sequence is 2988 bp, with 99 percent identity, 0 gaps, and 1 mismatch. The mismatch is at position 241, where G is replaced by A. The alignment view I used is pairwise with dot identities [8].
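The nucleotide position can be related to the amino acid change with simple codon arithmetic. The sketch below assumes that position 241 is counted from the first base of the start codon, which the text does not state explicitly; under that assumption it falls in codon 81, consistent with the G81S change listed in the annotation. The codon GGT is a hypothetical example of a glycine codon, not the codon read from the actual sequence.

```python
def codon_number(nucleotide_position):
    """1-based codon index for a 1-based position in a coding sequence."""
    return (nucleotide_position - 1) // 3 + 1


def position_in_codon(nucleotide_position):
    """1-based position (1, 2 or 3) of the base inside its codon."""
    return (nucleotide_position - 1) % 3 + 1


# Only the four codons needed for this illustration (standard genetic code).
CODON_TABLE = {"GGT": "G", "GGC": "G", "AGT": "S", "AGC": "S"}

print(codon_number(241), position_in_codon(241))

normal_codon = "GGT"                      # hypothetical glycine codon
mutated_codon = "A" + normal_codon[1:]    # G replaced by A at the first base
print(CODON_TABLE[normal_codon], "->", CODON_TABLE[mutated_codon])
```

A G to A change at the first base of a GGT or GGC codon turns glycine (G) into serine (S), which is what a missense G81S mutation describes.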

Description:

·         Max score is the highest alignment score between the query sequence and the subject sequence.

·         E value measures the background noise. It describes the number of hits one can expect to see by chance when searching a database of a particular size.

·         Percentage identity of the sequence was reported as 99%. The query and subject differ at a single position (1 of 2988, about 0.03%).

·         Accession number is a unique identifier given to a biological sequence.
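The identity figure can be reproduced by hand for an ungapped alignment. The sketch below is not how BLAST computes its alignment; it simply compares two equal-length strings position by position. The toy sequences are placeholders, while the counts 2988 and 1 come from the result above.

```python
def percent_identity(query, subject):
    """Percentage of identical positions in two equal-length sequences."""
    if len(query) != len(subject):
        raise ValueError("an ungapped comparison needs equal-length sequences")
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def mismatch_positions(query, subject):
    """1-based positions at which the two sequences differ."""
    return [i for i, (q, s) in enumerate(zip(query, subject), start=1) if q != s]


# Toy example with a single G to A substitution.
query = "ATGGGTCA"
subject = "ATGAGTCA"
print(mismatch_positions(query, subject), percent_identity(query, subject))

# Identity implied by the reported result: 1 mismatch in 2988 positions.
identity_reported = 100.0 * (2988 - 1) / 2988
print(round(identity_reported, 2))
```

The computed value is just under 100%, which fits the 99% shown in the BLAST output if the displayed percentage is rounded down.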

Graphical summary:

In the color key at the top, red shows the highest alignment scores and black shows the lowest. The horizontal red line represents the query sequence.

·         Red bar shows the most similar sequences.

·         Pink bar shows a less good match.

·         Green bar shows an unimpressive match.

·         Blue bar shows a poor score.

·         Black bar shows bad hits.

The alignment score is more than 200 (the red band), which means the query sequence is highly similar to the subject sequence. The identity is not 100% because one mismatch is present.


 Dot Plot:

There are no gaps in the alignment, so a single straight diagonal line is obtained, which shows maximum similarity.


 


The University of California Santa Cruz (UCSC) Genome Browser is a popular Web-based tool for quickly displaying a requested portion of a genome at any scale, accompanied by a series of aligned annotation "tracks". The annotations generated by the UCSC Genome Bioinformatics Group and external collaborators display gene predictions, mRNA and expressed sequence tag alignments, simple nucleotide polymorphisms, expression and regulatory data, phenotype and variation data, and pairwise and multiple-species comparative genomics data. All information relevant to a region is presented in one window, facilitating biological analysis and interpretation. The database tables underlying the Genome Browser tracks can be viewed, downloaded, and manipulated using another Web-based application, the UCSC Table Browser. Users can upload data as custom annotation tracks in both browsers for research or educational use. This unit describes how to use the Genome Browser and Table Browser for genome analysis, download the underlying database tables, and create and display custom annotation tracks [9].

Steps to use the genome browser

Following are the steps involved in using the UCSC Genome Browser:

        i.            First open the UCSC website, click on Genome Browser, and view the genome data.

      ii.            Then click on Asia or another option, depending on your location.

    iii.            Select the latest assembly version and enter the position, gene symbol, or other search term that you want to search.

    iv.            For example, I entered the NFE2L2 gene for human and clicked on the search button.

      v.            The result shows a graphical summary of the NFE2L2 gene in which each track shows an annotation of the gene.

The result shows that the NFE2L2 gene is located on chromosome 2 at region q31.2.


This region shows the introns and exons. NFE2L2 contains 5 exons and 4 introns.

The results show the expression of the gene in different tissues and organs, including where expression is highest. In this case, the highest expression occurs in Esophagus - Mucosa.

The different colors show the presence of regulatory elements in the gene.

This result shows the regions of similarity. The thick regions show that the gene is present in different organisms. For example, the highest thickness is present in Rh

The different regions indicated by lines are single nucleotide polymorphisms (SNPs) in this gene. NFE2L2 contains 151 SNPs.



 

 

 


Phylogenetic trees are constructed to show the evolutionary relationship between different organisms. A phylogenetic tree may be rooted or unrooted. A rooted tree identifies the common ancestor.

A phylogenetic tree contains:

·         OTUs (operational taxonomic units) at the tips, and internal or hypothetical nodes.

·         Internal and external branches with their lengths.

·         Clades

·         Tree topology

·         Outgroup (a distantly related taxon used to root the tree)

·         Scaled tree (branch lengths are proportional to the amount of evolutionary change or time)

·         Unscaled tree (branch lengths are not proportional to change and convey no information)

·         Orthologous genes (the same gene in different organisms)

·         Paralogous genes (genes that arose by duplication in the same organism)

I took the NFE2L2 gene, collected the sequence of this gene from 50 different species, and constructed the phylogenetic tree with MEGA X software.

·         There are about 20 clades in the phylogenetic tree. In clade 1, Macaca nemestrina is closely related to Theropithecus gelada; both of these species contain isoform 2 of the NFE2L2 gene.

·         Macaca fascicularis is more closely related to Macaca nemestrina and Theropithecus gelada than to Chlorocebus sabaeus.

·         Clade 2 contains two species, Pongo abelii and Nomascus leucogenys, which are closely related to each other. Clade 2 is more closely related to clade 1 than to the other clades.

·         In clade 3, Piliocolobus tephrosceles is closely related to Colobus angolensis palliatus, and this clade closely resembles clade 2.

·         Clade 4 consists of three species, of which Aotus nancymaae and Callithrix jacchus closely resemble each other. Clade 4 is closely related to clade 3.

·         Clade 5 contains four species, namely Pan paniscus, Homo sapiens, Gorilla, and Pan troglodytes. Gorilla and Pan troglodytes closely resemble each other, while Homo sapiens shows similarity with Pan paniscus.

·         Clade 6 consists of only one species, and this clade is closely related to clade 5.

·         Clade 7 contains only two species, which are closely related, and this clade resembles clade 6.

·         In clade 8, Orcinus orca is closely related to Monodon monoceros.

·         Clade 9 is closely related to clade 10, and both of these clades contain only one species.

·         In clade 11, Tursiops truncatus closely resembles Theropithecus gelada.

·         Clade 11 is closely related to clade 12, which contains Homo sapiens with a variant of the NFE2L2 gene.

·         Clade 13 contains only two closely related species, and this clade is closely related to clade 12.

·         Clade 14 contains about four species, of which Acinonyx jubatus is closely related to Lynx canadensis, and Felis catus is the least closely related to these two species.

·         In clade 15, Marmota flaviventris is closely related to Urocitellus parryii. In this clade, Pan troglodytes is more closely related to Gorilla than to Marmota flaviventris.

·         Clade 15 is closely related to clades 16 and 17.

·         In clade 16, Macaca nemestrina is closely related to Theropithecus gelada.

·         In clade 17, the two species of the genus Rhinopithecus, Rhinopithecus roxellana and Rhinopithecus bieti, are closely related to each other.

·         Clades 16 and 17 closely resemble each other.

·         Clade 18 consists of only one species, and this clade closely resembles clades 10 and 11.

·         In clade 19, Globicephala melas is closely related to Tursiops truncatus. Nomascus leucogenys, which contains isoform 1 of the NFE2L2 gene, has less resemblance to Tursiops truncatus and Globicephala melas. This clade is closely related to clade 20.

·         Clade 20 comprises only two closely related species, Lipotes vexillifer and Muntiacus muntjak.

A primer is a short synthetic oligonucleotide used in many molecular techniques, from PCR to DNA sequencing. Primers are designed so that their sequence is the reverse complement of the region of the template, or target, DNA to which we wish the primer to anneal.
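The reverse complement mentioned above can be computed in a few lines. This is a minimal sketch for unambiguous DNA bases only; the template fragment is a made-up example, not part of the NFE2L2 sequence.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(COMPLEMENT)[::-1]


template_region = "ATGGCATTC"  # made-up template fragment
print(reverse_complement(template_region))
```

Applying the function twice returns the original sequence, which is a quick sanity check when preparing primer sequences by hand.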

Primer 3 Plus is a tool that picks primers from a DNA sequence.

Steps 

·         Took the DNA sequence of the NFE2L2 gene from NCBI.

·         Opened the Primer 3 Plus tool and pasted the sequence in the box given.

·         Clicked on the Pick Primers button.

·         Obtained the available left and right primers for the selected gene, NFE2L2. I got five primer pairs for my gene sequence.

Ø  In the output sequence, the purple color shows the left primer and the yellow color shows the right primer.

·         After that, I opened the Sequence Manipulation Suite to get the best primer from the DNA sequence, depending on the following conditions:

        i.            Length should be 18-25 bases.

      ii.            Base composition should be 45-55% GC.

    iii.            Melting temperature should be 55-80 degrees Celsius.

 



The Sequence Manipulation Suite (SMS) is a collection of JavaScript programs for generating, formatting, and analyzing short DNA and protein sequences. It is commonly used by molecular biologists, for teaching, and for program and algorithm testing.

·         After getting the left and right primers from the Primer 3 Plus tool, I opened the Sequence Manipulation Suite to select the best primer for my gene sequence.

·         For this purpose, I opened the PCR Primer Stats option in the SMS list and entered the sequences of all the left primers of my gene sequence.

·         I got the following results, which identify the primer with the best properties.




According to the above result, the left primer of pair five is the best primer because it has:

  Primer sequence:  CGGTATGCAACAGGACATTG


·         Sequence length: 20


·         Base counts: G=6; A=6; T=4; C=4; Other=0;


·         GC content (%): 50.00


·         Molecular weight (Daltons): 6166.08


·         nmol/A260: 5.02


·         micrograms/A260: 30.94


·         Basic Tm (degrees C): 52


·         Salt adjusted Tm (degrees C): 47


·         Nearest neighbor Tm (degrees C): 62.06




·         Single base runs: Pass

·         Dinucleotide base runs: Pass

·         Length: Pass

·         Percent GC: Pass

·         Tm (Nearest neighbor): Warning: Tm is greater than 58;

·         GC clamp: Pass

·         Self-annealing: Pass

·         Hairpin formation: Pass
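Some of the reported figures can be cross-checked with a short script. The sketch below counts bases, computes GC content, and estimates a basic Tm with the widely used formula 64.9 + 41 × (G + C − 16.4) / N for primers of 14 bases or more. Whether the Sequence Manipulation Suite uses exactly this formula is an assumption. The length and GC limits are the selection conditions listed earlier, and the salt-adjusted and nearest-neighbor Tm values are not reproduced here.

```python
from collections import Counter


def primer_stats(primer):
    """Base counts, GC percentage and a basic Tm estimate for a primer."""
    primer = primer.upper()
    counts = Counter(primer)
    length = len(primer)
    gc = counts["G"] + counts["C"]
    return {
        "length": length,
        "counts": {base: counts[base] for base in "GATC"},
        "gc_percent": 100.0 * gc / length,
        # Basic Tm estimate, intended for primers of 14 bases or longer.
        "basic_tm": 64.9 + 41.0 * (gc - 16.4) / length,
    }


def meets_conditions(stats):
    """Check the length (18-25) and GC (45-55%) conditions from the text."""
    return 18 <= stats["length"] <= 25 and 45.0 <= stats["gc_percent"] <= 55.0


stats = primer_stats("CGGTATGCAACAGGACATTG")
print(stats)
print(meets_conditions(stats))
```

For this primer the script gives the same length, base counts, and GC content as listed above, and the basic Tm estimate rounds to the reported 52 degrees C.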

Protein structure prediction

CFSSP [15]

CFSSP is used to predict the secondary structure of a protein [15].

Normal



Interpretation

In this picture 

Ø    red color shows helix 

Ø    green shows sheets, 

Ø    blue shows turn and

Ø    yellow shows coils.

2D structure of the NFE2L2 protein contains:

Mutated



In this mutated 2D structure, red, green, blue, and yellow show helices, sheets, turns, and coils respectively.

 


 

The 2D structure of the NFE2L2 protein contains 45 coils, 68 turns, 106 alpha helices, and 100 beta sheets.

Because of the single mutation, the total number of H (helix) residues in the mutated sequence is 691, which differs from the normal sequence. The coils, turns, and sheets are the same in both cases, but the number of alpha helices differs: 106 in the mutated structure and 112 in the normal one.

 

Swiss model

For the NFE2L2 protein:

The number of models is 9. Select the model with the highest sequence similarity; in my case the sequence similarity is 100 percent.

The number of templates is 50.

The oligo state is monomer.

The number of ligands is 0 in this case.

Swiss model gives us a 3D model of a protein.

If the quality value of a model is greater than 0.5, the model is considered good; in my case it lies between 0.5 and 1.

 

 

 GOR TOOL:

Normal



Interpretation

This picture shows the secondary structure of the protein; red shows the alpha helix and blue shows the beta sheets.

Sequence length: 605

GOR4:

   Alpha helix (Hh): 205 is 33.88%

   310 helix (Gg): 0 is 0.00%

   Pi helix (Ii): 0 is 0.00%

   Beta bridge(Bb): 0 is 0.00%

   Extended strand (Ee): 68 is 11.24%

   Beta turn (Tt): 0 is 0.00%

   Bend region  (Ss): 0 is 0.00%

   Random coil (Cc): 332 is 54.88%

   Ambiguous states (?): 0 is 0.00%

   Other states: 0 is 0.00%
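The percentages follow directly from the residue counts and the sequence length of 605. The short sketch below recomputes them, using only the three non-zero states from the GOR4 output above; the function name is illustrative.

```python
def state_percentages(counts):
    """Convert residue counts per state into percentages of the total."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


# Non-zero states from the GOR4 result above.
gor4_counts = {"alpha helix": 205, "extended strand": 68, "random coil": 332}

print(sum(gor4_counts.values()))
print(state_percentages(gor4_counts))
```

The three counts add up to the full sequence length, so every residue is assigned to one of these states.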

Mutated



Interpretation

This picture shows the secondary structure of the protein; red shows the alpha helix and blue shows the beta sheets.

Sequence length: 605

GOR4:

   Alpha helix (Hh): 205 is 33.88%

   310 helix (Gg): 0 is 0.00%

   Pi helix (Ii): 0 is 0.00%

   Beta bridge (Bb): 0 is 0.00%

   Extended strand (Ee): 68 is 11.24%

   Beta turn (Tt): 0 is 0.00%

   Bend region (Ss): 0 is 0.00%

   Random coil (Cc): 332 is 54.88%

   Ambiguous states (?): 0 is 0.00%

   Other states: 0 is 0.00%

Modeller

Modeller is software used to build 3D models of a protein, from which the best model is selected.

Select the model with the lowest molpdf value, then open the file of that model in the Chimera tool to view the best 3D model of the desired protein; in this case the protein is NFE2L2.
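Picking the model with the lowest molpdf can be written as a one-line minimum. In the sketch below the file names and scores are hypothetical placeholders, not values from the actual Modeller run.

```python
def best_model(molpdf_by_file):
    """Return the file name whose molpdf score is lowest."""
    return min(molpdf_by_file, key=molpdf_by_file.get)


# Hypothetical scores for illustration only.
scores = {"model_1.pdb": 3512.4, "model_2.pdb": 3310.9, "model_3.pdb": 3478.0}
print(best_model(scores))
```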



 

[1] Asma AA Zahidi, J. M. (2017). Chronic Hepatitis C: an optometrist’s perspective. Clinical optometry, 9, 123-131. Retrieved October 5, 2019

[2] Rachel J. Watkins, M. G. (2012). The Role of NFE2L2 in Idiopathic Infantile Nystagmus. Journal of Ophthalmology, 1-7. Retrieved October 05, 2019

[3] National Center for Biotechnology Information. ClinVar; [VCV000263089.1], https://www.ncbi.nlm.nih.gov/clinvar/variation/VCV000263089.1 (accessed Oct. 5, 2019).

[4] ‘Multiple Sequence Alignment - CLUSTALW’. [Online]. Available: https://www.genome.jp/tools-bin/clustalw. [Accessed: 22-Oct-2019].

[5] ‘Clustal Omega < Multiple Sequence Alignment < EMBL-EBI’. [Online]. Available: https://www.ebi.ac.uk/Tools/msa/clustalo/. [Accessed: 22-Oct-2019].

[6] ‘T-COFFEE Multiple Sequence Alignment Server’. [Online]. Available: http://tcoffee.crg.cat/. [Accessed: 22-Oct-2019].

[7] ‘BLAST: Basic Local Alignment Search Tool’. [Online]. Available: https://blast.ncbi.nlm.nih.gov/Blast.cgi. [Accessed: 22-Oct-2019].

[8] ‘NCBI Blast: NM_000321.2 Homo sapiens RB transcriptional’. [Online]. Available: https://blast.ncbi.nlm.nih.gov/Blast.cgi#Query_56969. [Accessed: 22-Oct-2019].

[9] D. Karolchik, A. S. Hinrichs, and W. J. Kent, ‘The UCSC Genome Browser’, in Current Protocols in Bioinformatics, Hoboken, NJ, USA: John Wiley & Sons, Inc., 2009.

 

Contents

1.    Introduction

Hepatitis C

Chronic Hepatitis C

1.1.     Gene

1.2.     Protein

1.3.     Mutation

2.    Identification of Mutations in nucleotide and protein sequence

3.    Annotation

4.    Methodology

CLUSTALW

CLUSTAL Omega

BLAST

Formats

Working

Result

Graphical summary

5.    Gene Information

5.1.     UCSC Genome Browser

Steps to use genome browser

Result

Exon and Intron

Expression of Gene

Regulatory Elements Present in Gene

Similarity with other organisms

SNPs

6.    Mega X

6.1.     Phylogenetic Tree

7.    Primer designing

7.1.     Primer 3 Plus

7.2.     Sequence Manipulation Suite

References

 
