03. Bash scripting

Practice

Author: Dr. Alejandra Rougon

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

For this practice, go to the Analysis folder that you created in the previous activity.

Download the following files and upload them to your virtual terminal:
Hp1.fasta (this file was used in the previous activity)
Hp2.fasta
Hp3.fasta

🚴 Exercise 1

In the previous activity you answered the following questions for the file Hp1.fasta.

a. How many records does Hp1.fasta have?

b. How many of those records are RxLR proteins?

c. How many of those records are cysteine-rich proteins?

d. How many of the RxLR proteins belong to the strain Emoy2?

  1. Now, create a Bash script called Hp1_yourname.sh that answers the previous questions. Make sure it uses at least one variable.
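
One possible shape for such a script is sketched below. It is only a starting point: the search words (RxLR, cysteine-rich, Emoy2) are assumptions about how the FASTA headers are annotated, and count_headers is a hypothetical helper name, so adapt both to the commands you used in the previous activity.

```bash
#!/usr/bin/env bash
# Hp1_yourname.sh (sketch). The search words below are assumptions about
# how the FASTA headers are annotated; adjust them to match your file.

fasta="${1:-Hp1.fasta}"   # variable holding the input file name

# Hypothetical helper: count header lines (those starting with ">") that
# contain the pattern in $1, reading the file named in $2.
count_headers() {
    grep '^>' "$2" | grep -c -- "$1"
}

if [ -r "$fasta" ]; then
    echo "a. Records: $(grep -c '^>' "$fasta")"
    echo "b. RxLR proteins: $(count_headers 'RxLR' "$fasta")"
    echo "c. Cysteine-rich proteins: $(count_headers 'cysteine-rich' "$fasta")"
    echo "d. RxLR proteins from Emoy2: $(grep '^>' "$fasta" | grep 'RxLR' | grep -c 'Emoy2')"
else
    echo "Cannot read $fasta" >&2
fi
```

Run it with bash Hp1_yourname.sh; passing a file name as the first argument overrides the default stored in the variable fasta.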

🚴 Exercise 2

  1. Make a copy of your script with the name Hploop_yourname.sh and modify it so it iterates over the files Hp1.fasta, Hp2.fasta, and Hp3.fasta to answer the questions of exercise 1.
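
The iteration can be sketched with a for loop over the three file names. As before, the search words are assumptions about the header annotations, and report is a hypothetical function name chosen for this example.

```bash
#!/usr/bin/env bash
# Hploop_yourname.sh (sketch). Search words are assumptions; adjust as needed.

# Hypothetical helper: print the four answers for the FASTA file named in $1.
report() {
    echo "== $1 =="
    echo "a. Records: $(grep -c '^>' "$1")"
    echo "b. RxLR proteins: $(grep '^>' "$1" | grep -c 'RxLR')"
    echo "c. Cysteine-rich proteins: $(grep '^>' "$1" | grep -c 'cysteine-rich')"
    echo "d. RxLR proteins from Emoy2: $(grep '^>' "$1" | grep 'RxLR' | grep -c 'Emoy2')"
}

# The loop variable takes each file name in turn.
for fasta in Hp1.fasta Hp2.fasta Hp3.fasta; do
    if [ -r "$fasta" ]; then
        report "$fasta"
    else
        echo "Skipping $fasta: not readable" >&2
    fi
done
```

Listing the names explicitly keeps the order fixed; a glob such as Hp*.fasta would also work if no other matching files are in the folder.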

🚴 Exercise 3

  1. If you have access to an HPC cluster or remote server, along with a user name and password, practice uploading your script from Exercise 2 to your home directory on the cluster via scp. Then connect to the cluster with ssh to verify that the file is there.
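
A cautious sketch of the two commands follows. The user name and host are placeholders, not real account details, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set, so nothing is sent anywhere until you have filled in your own values.

```bash
#!/usr/bin/env bash
# Placeholders (assumptions): replace with your own account details.
remote_user="your_username"
remote_host="cluster.example.org"
script="Hploop_yourname.sh"

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Copy the script to the cluster; an empty path after the colon
# means the remote home directory.
run scp "$script" "${remote_user}@${remote_host}:"

# Run ls on the cluster over ssh to confirm the file arrived.
run ssh "${remote_user}@${remote_host}" ls -l "$script"
```

Once the placeholders are replaced, running the script with DRY_RUN=0 in front of the bash command executes both steps for real. You can also type the scp and ssh commands directly at the prompt.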

🚴 Exercise 4

We have performed homology searches using BLAST, comparing the scaffolds of a recently sequenced genome of a phytopathogenic nematode against the chromosomes of the model organism Caenorhabditis elegans. We have six tabular BLAST result files, each containing the sequences that are similar (hits) to one of the C. elegans chromosomes:

blastnCeChr1.tab
blastnCeChr2.tab
blastnCeChr3.tab
blastnCeChr4.tab
blastnCeChr5.tab
blastnCeChrX.tab

We want to know how many hits each scaffold has to each chromosome. The scaffolds are listed in the first column and the chromosomes in the second. To answer this question, we have to count, for each of the 5 scaffolds, how many lines pair it with each chromosome.

  1. Write a Bash script that reports how many hits each scaffold has for each chromosome (each file contains a different chromosome).

  2. Again, if you have access to an HPC cluster, try uploading your script.
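
The counting in step 1 can be sketched by combining cut, sort, and uniq -c. This assumes the files are tab-delimited with one hit per line and that any comment lines start with #; count_hits is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: count hits per scaffold and chromosome in tabular BLAST output.

# Hypothetical helper: for the file named in $1, print one line per distinct
# (scaffold, chromosome) pair, preceded by the number of times it occurs.
count_hits() {
    grep -v '^#' "$1" | cut -f 1,2 | sort | uniq -c
}

# Loop over the six chromosome files named in the exercise.
for chr in 1 2 3 4 5 X; do
    tab="blastnCeChr${chr}.tab"
    if [ -r "$tab" ]; then
        echo "== $tab =="
        count_hits "$tab"
    else
        echo "Skipping $tab: not readable" >&2
    fi
done
```

Sorting first matters because uniq -c only merges identical lines that are adjacent.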


Thanks for completing this activity!