Characterization of Drought Responsive Genes of CIPK Families in Rice, Maize and Sorghum

The CIPK gene family plays a key role in plant development and in stress signal transduction. The orthologs for experimentally proven drought stress responsive CIPKs from rice was identified in maize and sorghum. A total of 49 genes from the three species were analyzed for their phylogenetic relationship, gene structure, expression level and tissue specificity upon drought stress. The drought stress tolerance specificity of intronless group of CIPKs and multi-stress responsive nature of intron rich CIPKs were identified. The group level characterization revealed the similarity in function among Group I and Group II (A) CIPKs as the number and distribution of motifs were similar. Functional similarity of the genes was analyzed by in-silico expression analysis using publicly available data and which confirmed the drought responsiveness of 31 genes as they had similar expression level and it also shows the conservation of functions between species.


Introduction
Climate change induced by emission of green house gas can cause severe drought [1,2]. The higher plants adopt numerous mechanisms to cope with drought. The role of protein kinase gene family in drought stress response was revealed by many genome-wide gene expression profiling studies and pointed that the drought stress response given by them are efficient, fast-acting and reversible [3][4][5] responses to many environmental stresses such as salinity, cold and drought [6]. The changes in concentration levels of Ca 2+ are recognized by several Ca 2+ binding proteins including calmodulin (CaM), calmodulin like proteins (CMLs), Ca 2+ -dependent protein kinases (CDPKs) and calcineurin B-like proteins (CBLs) and results in downstream responses [7,8]. CDPKs are exceptional in this category as they have a kinase domain and other three Ca 2+ sensors had no enzymatic domains. Except CDPKs, other Ca 2+ sensors interact with their respective target proteins and modulate their activity [8]. Whereas, CDPKs serves as special sensor as they directly initiate the downstream phosphorylation events up on Ca 2+ binding due to the presence of CaM like and protein kinase domains [9]. The target protein of CBLs is referred to as CBL-interaction protein kinases (CIPKs) [7] and is also known as SnRK3. CIPK proteins consist of a conserved N-terminal kinase domain followed by junction domain and C-terminal regulatory domain. The Ca 2+ bound CBL proteins interact with target protein CIPK through a conserved NAF/ FISL motif at the C terminal regulatory domain of CIPK and activate its catalytic activity [10]. Total of 33 CIPKs was iden-tified in rice through bioinformatics analysis [11,12], 43 CIPKs are identified in maize [13] and 32 CIPKs are identified in sorghum [14]. CIPKs are reported to be expressed in response to various stresses. Over expression of OsCIPK23 improved drought tolerance in rice [15]. In Arabidopsis AtCIPK24 and AtCIPK7 contribute to salt and cold stress [16,17]. A cotton CIPK gene GhCIPK6 was over expressed in Arabidopsis and found that the tolerance of the plant increased in drought stress [18].
database of expression data collected from variety of public repositories including Gene Expression Omnibus [28] and Array Express [29] (https://www.ebi.ac.uk/arrayexpress/). The gene expression at various development stages were observed for all the species. For rice, Affymetrix Rice Genome Array platform was used with water deficit microarray datasets GSE6901, GSE14275, GSE26280, E-MEXP-2401, GSE25176, GSE23211, GSE31077, GSE42683, GSE81253, GSE41647, GSE80246, GSE83378, and GSE57154. For maize, Affymetrix Maize Genome Array and mRNA-Seq Gene Level Zea Mays (ref: AGPV4) platforms were used with water deficit microarray datasets GSE16567, GSE43088 and GSE59533. In the case of sorghum the selected genes were analyzed for expression at various growing stage by using the Affymetrix Whole-transcriptome Sorghum Array platform with dataset GSE49879. The drought stress expression profile was analyzed using dataset GSE80699. The seven development stages in rice were considered for the analysis and they were seedling, tillering, stem elongation, booting, heading, flowering and milk stages. Three stages were considered in maize viz., seedling, stem elongation and anthesis. Five stages were included in sorghum analysis viz., seedling, stem elongation, booting, flowering and dough stages.

Analysis of protein properties
The nature of the proteins was analyzed by using physiochemical properties such as isoelectric point (pI), molecular weight, instability index and grand average of hydropathy (GRAVY). The pI value of the proteins ranged from 5.32 to 9.53 (Table 1). It shows that CIPK proteins have heterogeneous nature. Molecular weight of the proteins ranged from 40347.82 to 110358.1 Da. The variation in molecular weight among the members of the same group exists due to the variable number of domains contributing to protein size difference [30]. Except A0A1D6N844 all the other proteins exhibited hydrophilic nature since they had a negative GRAVY score [31]. Instability index showed that 26.5% proteins were unstable and 73.5% proteins were stable.

Phylogenetic analysis
To understand the evolutionary relationship of the drought responsive genes of CIPK gene family a rooted phylogenetic tree was constructed ( Figure 1). Two major groups were identified from the analysis and the Group II had subdivision. Group I consisted of 3 members of rice, 5 members of maize and 3 members of sorghum. This group was found to be the smallest group. Group II (A) consisted of 3 members of rice, 7 members of maize and 3 members of sorghum. The largest group was Group II (B) which consisted total of 25 members; 5 from rice, 13 from maize and 7 from sorghum. The estimated value of the shape parameter for the discrete Gamma Distribution was 0.7910. Total of 5 categories were considered in the analysis of sites. Mean evolutionary rates in these categories were 0.07, 0.29, 0.63, 1.20, 2.81 substitutions per site. As the shape parameter is small most of the sites evolved very slowly in the evolutionary tree [32].

Datasets
Genomic, CDS and protein sequences of rice genes were retrieved from Rice Genome Annotation Project (http://rice. plantbiology.msu.edu). Ensembl Plants (http://plants.ensembl.org) and NCBI (https://www.ncbi.nlm.nih.gov) were used for retrieving maize and sorghum gene sequences. The retrieved protein sequences of rice were subjected to a BLASTp (https://blast.ncbi.nlm.nih.gov) analysis to find the orthologous genes in maize and sorghum by means of reciprocal best hit approach. The protein sequences which showed identity ≥ 75% were considered as orthologous. The orthologous genes for CIPK 20, CIPK 22, CIPK 29 and CIPK 30 were not identified in sorghum and maize. Therefore a total of 49 protein sequences were analyzed further which included the homologues sequences of rice in sorghum and maize.

Phylogenetic analysis
The multiple sequence alignment of 49 full length protein sequences of all the three species were constructed using CLUSTALW [20]. Phylogenetic tree was constructed by using Neighbor-Joining method by considering 1,000 rapid bootstrap replicates with the help of MEGA X [21] and it was visualized using iTOL (http://itol.embl.de). The aligned sequence file was also used for finding out the discrete Gamma distribution to recognize the evolutionary rate difference. Number of discrete categories used for the analysis was 5. Substitution pattern and rates were estimated under the Jones-Taylor-Thornton [22] model (+G) [22]. The tree topology was automatically computed in MEGA X for estimating ML values.

Characterization of phylogenetic groups
The gene structures of all the genes were predicted by aligning the coding sequence with its corresponding genomic sequence by using GSDS 2.0 server (http://gsds.cbi.pku.edu. cn). GSDS 2.0 is an improved version of GSDS and it supports two more widely used annotation formats, providing more comprehensive support for annotation files. To identify the conserved motifs in each group, the complete sequence of proteins at the groups were submitted to MEME suite (http:// meme.sdsc.edu/meme/) [23]. This tool discovers the ungapped motifs in the sequence and splits the variable-length patterns in to more than two unique motifs. For the analysis we used the optimum width of motifs ranging from 12 to 60 by setting the search for 5 best motifs. The identified motifs were annotated by using Motif Scan (https://myhits.isb-sib. ch/cgi-bin/motif_scan) and InterProScan (https://www.ebi. ac.uk/interpro/search/sequence-search) [24]. The conserved domains were identified by Pfam [25]. Physiochemical properties of proteins were analyzed by using ProtParam tool available on ExPasy proteomics server (http://web.expasy. org/compute_pi) [26].

Gene expression analysis
To understand the function of the genes, expression pattern under drought stress was analyzed using GENEVESTIGA-TOR [27], which has a manually curated and well annotated present in ≥ 4 introns and 3 introns category. The loss or gain of introns could have played in grouping of CIPK genes. The variation in the number of introns among the three Groups points to the genome evolution by means of selection pressure and population size [33,34]. Moreover the intron rich behavior of the genes will add to the functional diversity through alternate splicing and exon shuffling [35]. Therefore the intron rich Group I and II (A) genes might be involved in multiple pathways in response to various abiotic stress signals. The intronless feature of Group II (B) CIPK genes strongly indicates that they are single stress inducible genes and possibly the stress is drought [36]. The significance of intron poor genes in drought tolerance was analyzed by Zhu, et al. [36], and proved its role in soybean CIPK-intron poor clade genes.
The conserved motifs and their organization were identified in each group (Figure 1). It was found that all the motifs

Characterization of the Groups by gene structure analysis, motif analysis and protein properties analysis
The characterization of the phylogenetic groups was carried out by gene structure analysis which showed the divergence among the groups. Above 60% of the genes in Group I had ≥ 4 introns. All the sorghum genes belonged to this category ( Figure 2). In rice and sorghum 40% and 20% genes were intronless respectively. All the genes in Group II (A) showed intron richness (≥ 4 introns). The Group II (B) dominated in intronless feature. In this category all the sorghum genes were intronless. In the case of rice, 80% of genes were intronless and 20% of genes belonged to the category of genes with single intron. Similarly, 84% of genes in maize were intronless and an equal distribution of genes (8% in each category) was    [12,38]. Hence the expression levels at various development stages were analyzed. In rice, CIPK genes showed up regulated expression at all the stages except 4 genes, LOC_Os11g02240, LOC_Os07g05620, LOC_ Os01g10890 and LOC_Os07g44290 at milk, stem elongation, milk and stem elongation and heading stages respectively as seen in Figure 3. It is also noted that they possessed high expression potential at other stages. The expression level in tissues of leaf, seedling, sheath, panicle, roots, pistil, caryopsis, flag leaf and anther were analyzed for all the genes and it was observed up regulated expression for 4 genes viz., LOC-Os11g02240, LOC-Os01g18800, LOC-Os07g48100 and LOC-Os06g40370 respectively in all the selected tissues as depicted in Table 2. Medium level of expression in all the tissues were reported for CIPK12 (LOC-Os01g55450).
were part of protein kinase domain. The Protein kinase domain which is conserved at the N-terminus included a protein kinase ATP binding site followed by serine/threonine protein kinase active site. Group I and Group II (A) had similarity in distribution of motifs which points to their similarity in function.
In Group I number of motifs varied from 5 to 8 and in Group II (A) it was varied from 6-7. Meanwhile, in Group II (B) 5 motifs were reported and all of them were highly conserved among the members. Motif 1 was either present nearby N-terminus or in the N-terminus and it represented a protein kinase ATP binding site domain. These domains are glycine rich with lysine residue in vicinity and are located at N terminus [37].

Functional analysis by gene expression level
The CIPK genes play an important role in the growth and logenetic analysis. Maximum number of rice orthologs was found in maize. Comparative analysis of the gene structure showed that Group II (B) CIPK genes dominated intronless feature whereas, Group II (A) CDPK genes and Group I CIPK genes dominated intron rich feature. The intron richness indicates that the genes might have included in multiple stress signal transduction other than drought. The genes in Group II (B) are specifically induced in drought due to their intronless feature [36]. Alternate splicing and exon shuffling could be the reason for functional diversity among CIPK groups [35]. This also points to the adaptation of plants with respect to environmental changes during evolution which in turn altered their phenotypes significantly by transforming the form and function of genes [40]. The most common motifs seen among the groups were parts of protein kinase domain. The similar distribution of motifs in Group I and Group II (A) indicates the functional similarity of the groups. Gene expression analysis showed that above 80% of the genes in respective groups have shown medium or high level expression up on drought stress. The similarity in the expression pattern also shows their functional similarity [39] and conservation of functions between species.
In maize, the gene expression analysis was carried out by observing the expression level at three development stages with respect to drought stress. All the 9 selected genes had up regulated expression at three stages of development. The tissues analyzed for expression level were foliar leaf, shoot and root. It was observed that genes Zm00001d015325 and Zm00001d036879 had medium level of expression in all the three tissues. The gene Zm00001d000407 had low level expression in roots and medium expression in foliar leaf and shoot.
The expression of 11 genes in sorghum was analyzed at various tissues and it was observed that the gene SOR-BI_3003G024400 was not expressed at dough stage. All the other genes exhibited potential for expression at all development stages. The tissues analyzed for expression were rind, internode, shoot, pith, leaf and roots. The genes SORBI_3003G139500, SORBI_3001G523200 and SORBI_3002G390100 showed high level of expression in all the selected tissues. Drought response of the 11 sorghum genes were analyzed manually by using the dataset GSE80699 and it was observed that 6 genes viz., SOR-BI_3003G139500, SORBI_3003G024400, SORBI_3001G523200, SORBI_3004G049500, SORBI_3003G302800 and SOR-BI_3010G186300 were up regulated and 5 genes viz., SOR-BI_3002G034700, SORBI_3003G339700, SORBI_3005G012000, SORBI_3008G032000 and SORBI_3002G390100 were down regulated during the treatment.
It was identified that above 80% of the genes in respective groups have shown medium or high level expression up on drought stress. It was also noted that all the genes had high or medium level expression at leaf and shoot part. This expression analysis indicates that these orthologous genes posses similar expression with respect to drought stress. The similar expression of orthologous genes are already reported by Kong, et al. [39].

Conclusion
The present study screened 49 genes and scrutinized 31 genes in rice, maize and sorghum for potential drought stress response based on functional analysis. The experimentally proved drought tolerant genes of CIPKs from rice and their orthologous genes in maize and sorghum were grouped by phy-