Murray_H
I'm doing molecular biology and biochemistry. Look at the assignment I have to do for one module:
Biotechnology Module Assessment 2005-6
The accompanying figures give maps and partial DNA sequence data for the yeast expression vector pGAPZA (a shuttle plasmid, propagated in E. coli, which is incorporated into the yeast genome by homologous recombination on transformation into Pichia pastoris), and the cDNA cloning vector pBluescript SK- (derived from the lambda vector ZAPII by plasmid excision). In addition, sequence data for a transcript produced by Drosophila melanogaster gene CG13095, thought to encode an aspartic proteinase, is given. The transcript encodes a preproprotein; the signal peptide is removed co-translationally, and the proteinase is activated by post-translational removal of an N-terminal pro-region. Assume a full-length cDNA corresponding to the CG13095 transcript has been cloned into pBluescript SK- using a directed cloning strategy, which adds linkers to the 5’ and 3’ ends of the cDNA as shown in the strategy diagram given.
You wish to investigate the specificity of proteolysis of the aspartic proteinase encoded by CG13095. Describe a strategy for using Pichia pastoris to produce recombinant proproteinase, and for purifying the recombinant protein for subsequent activation and assay. Your final product can have extra residues at the N-terminus compared to the protein produced in Drosophila, because these extra residues do not interfere with proprotein processing, and are removed when the proprotein is activated. A C-terminal “tag” is also acceptable, as results with expression of similar enzymes as recombinant proteins have suggested it does not interfere with activity.
Give sufficient details of the operations you propose to enable a competent but unimaginative technician to carry out the experiments. Describe the expected results after each step, and outline how you would verify that each step of the procedure had proceeded as expected. You should give exact details of DNA sequences used, and provide (partial) sequence data for the final expression construct which shows how the coding sequence fits into the expression vector.
Why do you think this expression host and vector were chosen? Can you see any potential problems with the system?
(Hint 1: you will need to assemble a composite sequence showing how the protein and nucleic acid sequences fit together. You can do this by electronic or physical cut and paste.)
(Hint 2: a font with uniform character widths – Monaco or Courier – makes producing a “lined-up” nucleotide and amino acid sequence a lot easier . . .)
(Hint 3: start with the cDNA assembled into its cloning vector. You can use this as a basis for preparing restriction fragments for restriction-ligation, or as a template for PCR. Assume your technician knows about cloning PCR products into an intermediate vector – such as pCR2.1 – for checking, and to avoid having to ligate directly after PCR and restriction.)
(Hint 4: the whole business is very much easier if you work out what you want to end up with first . . .)
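The "lined-up" nucleotide and amino acid layout from Hints 1 and 2 can also be produced with a short script rather than by hand. The following is only a minimal Python sketch: the names CODON_TABLE, CDNA_5PRIME and lined_up are hypothetical, the fragment is the first 120 nt copied from the SQ block of the EMBL entry, and the CDS start of 22 is taken from the feature table. It assumes the standard genetic code.

```python
# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# First 120 nt of BT016133 (assumption: copied correctly from the SQ block).
CDNA_5PRIME = (
    "agtagactagaaaccaagagaatgttcaagaccatcgctgtagtagtgctcctggcagcc"
    "ctggccagtgccgagctccatcgcgtgccgatcctcaaggagcagaactttgtgaagacg"
)


def lined_up(seq, cds_start, codons_per_row=10):
    """Return rows of codons with one-letter amino acids beneath them.

    cds_start is 1-based, as in the EMBL feature table.
    """
    orf = seq[cds_start - 1:]
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    rows = []
    for r in range(0, len(codons), codons_per_row):
        chunk = codons[r:r + codons_per_row]
        rows.append(" ".join(chunk))
        # Pad each residue to codon width so it sits under its codon.
        rows.append(" ".join(CODON_TABLE[c].ljust(3) for c in chunk).rstrip())
    return "\n".join(rows)


print(lined_up(CDNA_5PRIME, cds_start=22))
```

Viewed in a fixed-width font, each residue sits under the first base of its codon. The same function could be run on a composite sequence once the vector and insert have been pasted together.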
EMBL database entry for transcript produced by Drosophila melanogaster gene CG13095.
ID BT016133 standard; mRNA; INV; 1209 BP.
XX
AC BT016133;
XX
SV BT016133.1
XX
DT 28-OCT-2004 (Rel. 81, Created)
DT 28-OCT-2004 (Rel. 81, Last updated, Version 1)
XX
DE Drosophila melanogaster GH11417 full insert cDNA.
XX
KW FLI_CDNA.
XX
OS Drosophila melanogaster (fruit fly)
OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; Neoptera;
OC Endopterygota; Diptera; Brachycera; Muscomorpha; Ephydroidea;
OC Drosophilidae; Drosophila.
XX
RN [1]
RP 1-1209
RA Stapleton M., Carlson J., Chavez C., Frise E., George R., Pacleb J.,
RA Park S., Wan K., Yu C., Rubin G.M., Celniker S.;
RT ;
RL Submitted (27-OCT-2004) to the EMBL/GenBank/DDBJ databases.
RL Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory,
RL One Cyclotron Road, Berkeley, CA 94720, USA
XX
CC Sequence submitted by:
CC Berkeley Drosophila Genome Project
CC Lawrence Berkeley National Laboratory
CC Berkeley, CA 94720
CC This clone was sequenced as part of a high-throughput process to
CC sequence clones from Drosophila Gene Collection (Rubin et al.,
CC Science 2000, Stapleton et al., Genome Biology 2002). The sequence
CC has been subjected to integrity checks for sequence accuracy,
CC presence of a polyA tail and contiguity within 100 kb in the
CC genome. Thus we believe the sequence to reflect accurately this
CC particular cDNA clone. However, there are artifacts associated with
CC the generation of cDNA clones that may have not been detected in
CC our initial analyses such as internal priming, priming from
CC contaminating genomic DNA, retained introns due to reverse
CC transcription of unspliced precursor RNAs, and reverse
CC transcriptase errors that result in single base changes. For
CC further information about this sequence, including its location and
CC relationship to other sequences, please visit our Web site
CC (http://fruitfly.berkeley.edu) or send email to [email protected].
XX
FH Key Location/Qualifiers
FH
FT source 1..1209
FT /db_xref="taxon:7227"
FT /mol_type="mRNA"
FT /organism="Drosophila melanogaster"
FT /strain="y; cn bw sp"
FT /map="29D2-29D2"
FT gene 1..1209
FT /note="blastp alignment with CG13095-RA"
FT /gene="CG13095"
FT CDS 22..1140
FT /codon_start=1
FT /db_xref="FLYBASE:FBgn0032049"
FT /db_xref="GOA:Q9VLK3"
FT /db_xref="InterPro:IPR001461"
FT /db_xref="InterPro:IPR001969"
FT /db_xref="InterPro:IPR009007"
FT /db_xref="UniProtKB/TrEMBL:Q9VLK3"
FT /note="Longest ORF"
FT /gene="CG13095"
FT /product="GH11417p"
FT /protein_id="AAV37018.1"
FT /translation="MFKTIAVVVLLAALASAELHRVPILKEQNFVKTRQNVLAEKSYLR
FT TKYQLPSLRSVDEEQLSNSMNMAYYGAISIGTPAQSFKVLFDSGSSNLWVPSNTCKSDA
FT CLTHNQYDSSASSTYVANGESFSIQYGTGSLTGYLSTDTVDVNGLSIQSQTFAESTNEP
FT GTNFNDANFDGILGMAYESLAVDGVAPPFYNMVSQGLVDNSVFSFYLARDGTSTMGGEL
FT IFGGSDASLYSGALTYVPISEQGYWQFTMAGSSIDGYSLCDDCQAIADTGTSLIVAPYN
FT AYITLSEILNVGEDGYLDCSSVSSLPDVTFNIGGTNFVLKPSAYIIQSDGNCMSAFEYM
FT GTDFWILGDVFIGQYYTEFDLGNNRIGFAPVA"
XX
SQ Sequence 1209 BP; 275 A; 372 C; 288 G; 274 T; 0 other;
agtagactag aaaccaagag aatgttcaag accatcgctg tagtagtgct cctggcagcc 60
ctggccagtg ccgagctcca tcgcgtgccg atcctcaagg agcagaactt tgtgaagacg 120
cgtcagaatg ttttggccga gaaatcctat ctgcgcacca agtaccagct gccctcgctt 180
cgcagcgtgg atgaggaaca gctgtccaac tcgatgaata tggcttacta cggagccatc 240
tccatcggaa ctcccgctca gagcttcaag gttctgttcg actcaggctc ctcgaacctg 300
tgggtgccat cgaacacctg caagagcgat gcctgcctga cccacaacca gtacgactcc 360
agcgccagct ccacctacgt ggccaacggc gaatccttct ccatccagta tggcactggc 420
agcctaactg gctacctgtc caccgatacc gtcgacgtca atggcctgag catccagagc 480
cagacctttg ctgaatccac caacgagccg ggcaccaact tcaacgatgc caacttcgat 540
ggcattctcg gtatggccta tgagtccctg gccgtggatg gtgtggctcc tccgttctac 600
aacatggtgt cccagggtct ggtcgacaac tccgtcttct cgttctatct ggcccgcgat 660
ggcacctcca ctatgggtgg tgaactcatc ttcggtggct ccgatgcctc tctgtactct 720
ggcgctctga cctacgttcc catctcggag cagggctact ggcagttcac catggctgga 780
tcctccattg acggttactc gctgtgtgat gattgccagg ctattgccga taccggcacc 840
tccctgatcg tggctcccta caatgcttac attaccctct ccgagatcct gaacgtgggt 900
gaggatggct atctggactg ctcaagcgtc agctccctgc ccgatgtcac cttcaacatc 960
ggtggcacca acttcgtcct gaagccctcg gcctacatca tccagtcgga cggcaactgt 1020
atgtccgcct tcgagtacat gggcaccgac ttctggattc tgggcgatgt cttcattggc 1080
cagtactaca ccgagttcga tttgggcaac aaccgcattg gcttcgcccc agtcgcctaa 1140
ttaacgatcc gcaattacga atcaataaag aaacgagctc ttaaaaaaaa aaaaaaaaaa 1200
aaaaaaaaa 1209
//
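To work with the sequence electronically, the SQ lines first have to be stripped of their spacing and running position counters. A minimal Python sketch of that step is given here; parse_sq_lines and SQ_LINES are hypothetical names, and only the first two sequence lines of the entry are included to keep the example short.

```python
def parse_sq_lines(lines):
    """Join EMBL sequence lines, checking each trailing position counter."""
    seq = ""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "//":
            continue  # skip blank lines and the entry terminator
        *blocks, counter = fields
        seq += "".join(blocks).lower()
        # The counter at the end of each line is the cumulative length.
        if len(seq) != int(counter):
            raise ValueError(f"expected {counter} nt, got {len(seq)}")
    return seq


# First two sequence lines of the entry (nt 1-120).
SQ_LINES = [
    "agtagactag aaaccaagag aatgttcaag accatcgctg tagtagtgct cctggcagcc 60",
    "ctggccagtg ccgagctcca tcgcgtgccg atcctcaagg agcagaactt tgtgaagacg 120",
]

fragment = parse_sq_lines(SQ_LINES)
print(len(fragment), fragment[21:24])
```

Checking the counter on every line is a cheap way to catch a base dropped or duplicated during copy and paste.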
Notes
1) “CDS” above refers to the coding sequence, which runs from nucleotides 22 (start codon) to 1140 (stop codon).
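Note 1 uses 1-based, inclusive coordinates, which differ from the 0-based, end-exclusive slices most programming languages use. As a hedged illustration in Python (embl_to_slice, FIRST_30 and NT_1081_1140 are hypothetical names; the two fragments are copied from the first and the last coding line of the SQ block):

```python
def embl_to_slice(start, end):
    """Convert a 1-based inclusive EMBL location to a Python slice."""
    return slice(start - 1, end)


CDS = embl_to_slice(22, 1140)
cds_length = CDS.stop - CDS.start            # nucleotides in the CDS
n_codons, remainder = divmod(cds_length, 3)  # codon count includes the stop
n_residues = n_codons - 1                    # the stop codon is not translated

FIRST_30 = "agtagactagaaaccaagagaatgttcaag"  # nt 1-30 of the entry
NT_1081_1140 = "cagtactacaccgagttcgatttgggcaacaaccgcattggcttcgccccagtcgcctaa"

start_codon = FIRST_30[CDS.start:CDS.start + 3]
stop_codon = NT_1081_1140[-3:]               # this line ends at nt 1140

print(cds_length, n_codons, remainder, n_residues, start_codon, stop_codon)
```

The CDS length being an exact multiple of three, and the residue count matching the length of the /translation qualifier, are quick sanity checks on the coordinates.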
2) The sequence given corresponds to a full length cDNA. (Note the poly(A) tail at the 3’ end of the mRNA.) Assume the linkers for cDNA cloning fit directly onto the sequence given.
3) In the translated sequence given, amino acids 1-17 correspond to a predicted signal peptide (see http://www.cbs.dtu.dk/services/SignalP/), which is removed co-translationally in vivo (in Drosophila). The protein product after translation starts at amino acid 18.
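Note 3 gives the signal peptide boundary in amino acid numbering, but any cloning design needs it in nucleotide numbering. A minimal Python sketch of the conversion follows; residue_to_nt and NT_61_120 are hypothetical names, and the fragment is the second sequence line of the entry (nt 61-120).

```python
CDS_START = 22              # from the feature table (1-based)
SIGNAL_PEPTIDE_LENGTH = 17  # residues 1-17, from Note 3


def residue_to_nt(residue, cds_start=CDS_START):
    """Return the 1-based (first, last) nucleotide of a residue's codon."""
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


# Codon of the first residue after the signal peptide (residue 18).
pro_first, pro_last = residue_to_nt(SIGNAL_PEPTIDE_LENGTH + 1)

NT_61_120 = "ctggccagtgccgagctccatcgcgtgccgatcctcaaggagcagaactttgtgaagacg"
offset = 61                 # 1-based position of the fragment's first base
pro_codon = NT_61_120[pro_first - offset:pro_last - offset + 1]

print(pro_first, pro_last, pro_codon)
```

Under these assumptions the proprotein begins at nucleotides 73-75 (gag, Glu), which agrees with residue 18 (E) of the translation given in the entry.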
I don't even know where to start :|