Application of Molecular Descriptor Derived from Weighted Line Graph in Narcosis-QSAR

Aquatic toxicity is one of the most important measures in the ecotoxicological risk assessment of chemicals, as it covers a chain of species at different trophic levels for toxicity assays and, in addition, is less cumbersome than other test methods [28]. Owing to their habitat characteristics, amphibians are often the main vertebrate group at risk of exposure to contaminants in both terrestrial and aquatic environments [29]. However, the limited data on the effects of these contaminants on amphibians motivate QSAR studies that propose a structural model for optimized activity [30]. Frogs are common amphibians bridging aquatic organisms and terrestrial animals. Tadpoles have been shown to be more sensitive to hazards than adult frogs [31] and have been recommended by the EU-TGD [32] for narcotic analysis.


Introduction
Day by day the environment is contaminated with numerous new chemicals as a consequence of new industrial or natural biological processes [1]. Many of these exhibit adverse environmental effects and may cause severe pollution problems [2,3]. Therefore, the toxicity of every chemical must be estimated prior to its practical application. Performing a toxicological experiment for a given substance is expensive and time consuming, and it requires animal testing [4]. To this end, the European Union REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) legislation aims to provide toxicity information for all new and existing chemicals [5] and promotes the use of sufficiently validated computational prediction models based on QSAR as an alternative way to fill the toxicity data gaps [4,6,7]. Various regulatory agencies, such as the United States Environmental Protection Agency (US EPA), the European Centre for the Validation of Alternative Methods (ECVAM) of the European Union, the Agency for Toxic Substances and Disease Registry (ATSDR) and the European Union Commission's Scientific Committee on Toxicity, Ecotoxicity and Environment (CSTEE), have recommended the use of QSAR models for the risk assessment of chemical compounds [8].
heteroatom enhance the tadpole narcosis as the descriptor distinguishes the degree of unsaturation and the presence of heteroatom in the considered organic compound.
A hydrogen-depleted molecular graph (G) is obtained by taking the atoms as vertices and the bonds between the atoms as edges. In the line graph L(G) of G, the vertices correspond to the edges of G, and two vertices of L(G) are adjacent if their corresponding edges in the molecular graph G share a common vertex. Line graph indices were introduced into QSPR by Bertz [56]. Sporadic reports of the use of line graphs in QSPR are found in the literature for some mutually unrelated chemical fields [57-65].
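The line graph construction described above can be illustrated with a minimal Python sketch. The function name `line_graph` and the edge-list representation are assumptions made for this illustration only; isobutane is used as a small example that is not taken from the data set.

```python
from itertools import combinations

def line_graph(edges):
    """Return the edges of L(G) as pairs of indices into `edges`.

    `edges` is the edge list of a simple hydrogen-depleted graph G,
    each edge given as a pair of atom labels.
    """
    lg_edges = []
    for i, j in combinations(range(len(edges)), 2):
        # Two vertices of L(G) are adjacent when the corresponding
        # edges of G share an atom.
        if set(edges[i]) & set(edges[j]):
            lg_edges.append((i, j))
    return lg_edges

# Hydrogen-depleted graph of isobutane: central carbon 0 bonded to carbons 1, 2, 3.
isobutane = [(0, 1), (0, 2), (0, 3)]
print(line_graph(isobutane))  # [(0, 1), (0, 2), (1, 2)]
```

Because the result is again an edge list, calling `line_graph` on its own output gives the next iterated (unweighted) line graph.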
Herein, we derive a novel weighted line graph index (L2) from the weighted iterated line graphs of the hydrogen-depleted molecular graphs of 123 organic compounds with narcotic activity. So far, parameters derived from line graphs have not been used in QSAR studies. Line graphs have been used for carbonaceous compounds, while for heteroatomic compounds the weighted line graph is more appropriate for deriving molecular descriptors.

Enumeration of Weighted Line Graph Index (L2)
A weighted line graph is the line graph Lw(G) of a molecular graph G in which each vertex $u_i$ of Lw(G) is assigned a nonnegative number $w(u_i)$, referred to as the weight of $u_i$. Starting with the hydrogen-depleted molecular graph G, or the zeroth-order line graph L0(G), each vertex of L(G) is assigned a value equal to the weight of the corresponding edge of G attached to the vertex $v_i$, which gives the zeroth-order weighted line graph Lw(G)0. The weight of an edge of G, and hence the weight $w(u_i)$ of the corresponding vertex, is a function of $\delta_i$ and $\delta_j$, the weights of the vertices $v_i$ and $v_j$ joined by that edge. Each vertex weight is $\delta_i = Z^v_i - H_i$, where $Z^v_i$ and $H_i$ are the number of valence electrons of the $i$-th atom and the number of H atoms bonded to it, respectively. When assigning weights to multiple bonds, the weight of the edge is multiplied by a factor equal to the multiplicity of the bond concerned.
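As a hedged illustration of the vertex weights, the following Python sketch evaluates $\delta = Z^v - H$ for the heavy atoms of ethylisobutanoate, (CH3)2CH-C(=O)-O-CH2-CH3. The table of valence electrons holds standard values; the function names are hypothetical. The `combine` argument is a placeholder for the edge-weight expression, which is not reproduced in this text, and the product used in the last line is purely illustrative and not the authors' formula.

```python
# Valence electron counts for a few common elements (standard values).
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "S": 6, "F": 7, "Cl": 7, "Br": 7}

def vertex_weight(element, n_hydrogens):
    """delta = Z^v - H for one heavy atom."""
    return VALENCE_ELECTRONS[element] - n_hydrogens

def edge_weight(delta_i, delta_j, bond_order, combine):
    """Edge weight: combine(delta_i, delta_j) scaled by the bond multiplicity.

    `combine` is a placeholder; the source expression is not available here.
    """
    return bond_order * combine(delta_i, delta_j)

# Heavy atoms of ethylisobutanoate as (element, attached H count):
# CH3, CH3, CH, C(=O), =O, -O-, CH2, CH3
atoms = [("C", 3), ("C", 3), ("C", 1), ("C", 0),
         ("O", 0), ("O", 0), ("C", 2), ("C", 3)]
deltas = [vertex_weight(el, h) for el, h in atoms]
print(deltas)  # [1, 1, 3, 4, 6, 6, 2, 1]

# Illustrative only: weight of the C=O double bond with a product as placeholder.
carbonyl = edge_weight(deltas[3], deltas[4], 2, lambda a, b: a * b)
```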
The molecular descriptor L0 is defined on the zeroth-order weighted line graph. In a similar manner, by taking the edges of Lw(G)0 as vertices and considering their connectivity, the first iterated line graph, Lw(G)1, is constructed, and the sum of the weights of the edges of this line graph is taken as the molecular descriptor L1. Accordingly, higher iterated graphs Lw(G)i and the corresponding molecular descriptors Li can be determined. An example of the weighted line graphs of ethylisobutanoate is presented in Figure 1.
QSAR studies generally rest on reliable data for a wide range of compounds, which are mostly lacking in the chemical literature. In this context, a large body of data on the narcotic property of organic compounds was collected by Overton [33,34]. Abraham and Rafols extended Overton's data set by adding new compounds, with their narcotic behavior taken from various sources, making it a data set of 123 diverse organic compounds [35]. With this extensive data set, and using the linear free energy relationship (LFER) proposed by many authors [31,36-38] for the narcosis phenomenon, they expressed tadpole narcosis as a linear function of solvatometric parameters (Eq. 1).
The descriptors used include $R_2$, the solute excess molar refraction. Applying the Overton-Meyer relationship [39] to tadpole narcosis, Abraham and Rafols derived the correlation equation for the narcotic property in terms of the water-octanol partition coefficient (log P_oct), with correlation coefficient r = 0.9301 and standard deviation SD = 0.414. This indicates that the water-octanol partition coefficient alone is sufficient to explain tadpole narcosis, as is observed for many other biological phenomena [12,14,40-45].
The objective of the present work is to introduce a new molecular descriptor, the weighted line graph index (L2), to the QSAR world. Hansch, et al. set a milestone in the search for parameters that can substitute for log P [46]. This probably forms the basis for using topological and/or other molecular descriptors in place of log P in QSAR for the toxicity of chemical compounds, as done by Agarwal, et al. [47-53].
In this context, they took tadpole narcosis as a general data set and tried to express the narcotic activity in terms of topological indices. In doing so, they combined the distance-based topological indices (W, Sz, $^1\chi$ (= B), J and log RB) with Abraham's molecular descriptors and gave a hexaparametric regression model for tadpole narcosis with better statistical significance (r = 0.9592, SE = 0.3217) [54] than the Abraham and Rafols model [35].
In a subsequent study, Jaiswal and Khadikar examined the potential of the distance-based topological indices Wiener (W), PI ...
The topological indices (W, Sz, $^1\chi$ (= B), J and log RB) for the same set of 123 compounds were collected from Ref. [54] and are presented in Table 2. The weighted line graph parameters L0, L1 and L2 were derived, and L2 was selected for the present QSAR study owing to its nondegeneracy. Further, because of the higher complexity of deriving Li of higher orders, only L2 was considered in the present study. The L2 values were calculated by the method reported above and are presented in Table 2, along with ...

Database and Methodology
The narcotic activity (log 1/C_nar) along with Abraham's molecular descriptors ...
Abraham and Rafols (1995) [35] used four molecular descriptors to explain the narcotic values of these 123 compounds and found nine compounds to be outliers in the plot of observed versus predicted values. The outliers are triacetin, acetamide, methylurethane, nicotine, 2-propylpiperidine, urea, hexanol-1, decanol and acetal, and they attributed the deviation of these molecules to solubility factors. Excluding these outliers, they obtained a regression model. In a subsequent work, Agrawal, et al. [54] added two more independent variables, chosen from five selected topological indices, to the regression model for the same data set and increased the correlation coefficient to 0.9592.
To obtain an optimized regression model using Abraham and Rafols's parameters together with the topological parameters reported by Agrawal, et al. and L2, the data of all 114 compounds were subjected to multiple regression analysis. Abraham and Rafols's parameters are derived from solute-solvent interactions. Inter-correlations among the descriptors are considered in the optimization process. As indicated by the correlation parameters, the molecular descriptor L2 does not correlate well with the other descriptors under consideration. Accordingly, the optimization process is carried out by taking all ten descriptors under study.
Multiple regression analysis was used to derive the QSAR for the narcotic activity of 123 diverse organic compounds. The regression model was optimized by reducing the number of variables using the successive exclusion of variables (SEV) technique, which judges the significance of each variable in explaining the variance through Student's t-test. Thus, the variables with the minimum t value were excluded during the regression [66]. Further, to be valid under REACH, the model was evaluated through external validation. All the models were derived and validated using Microsoft Excel 2007 and MINITAB software.
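A minimal sketch of an SEV-style backward elimination is given below, assuming ordinary least squares with an intercept and using NumPy rather than the Excel and MINITAB tools named above. The function names and the stopping threshold `t_crit` are assumptions of this sketch; the study itself judges the optimization through t_min, F, R² and RMS together.

```python
import numpy as np

def t_values(X, y):
    """Fit y = b0 + X b by least squares; return t statistics of the slopes."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]          # residual degrees of freedom
    sigma2 = resid @ resid / dof           # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)  # covariance of the coefficients
    return (beta / np.sqrt(np.diag(cov)))[1:]  # skip the intercept

def sev(X, y, names, t_crit=2.0):
    """Drop the variable with the smallest |t| until all |t| >= t_crit.

    t_crit is an assumed stopping rule for this sketch.
    """
    names = list(names)
    while names:
        t = np.abs(t_values(X, y))
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        X = np.delete(X, worst, axis=1)  # exclude the least significant variable
        del names[worst]
    return names
```

Calling `sev(X, y, names)` with a descriptor matrix `X` (one column per descriptor) and the activity vector `y` returns the names of the descriptors that survive the exclusion.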

Results and Discussion
The Overton-Meyer relationship, or simply the Overton rule, as stated by Meyer and Hemmi [39], relates any biological activity (SP), such as tadpole narcosis in the present study, to the partition coefficient log P:
$$\log SP = a \log P + C$$
Abraham and Rafols's (1995) model based on log P for the set of 123 compounds is
$$\log(1/C_{nar}) = 1.272 + 0.780\,(\pm 0.035)\,\log P_{oct} \qquad (6)$$
n = 123, r = 0.894, SD = 0.504, F = 486.3
When $V_x$ was incorporated in the above regression model, the correlation coefficient increased significantly (Eq. 7). The parameter $V_x$ is a solute volume parameter derived from reversed-phase liquid chromatography [67].
Agrawal, et al.'s parameters are distance based, while L2 is a complex parameter explaining the compactness of the structure, including valence electrons as a component.
The regression model was optimized by successive exclusion of variables, considering t_min, F, R² (or r) and RMS. An increase in R² and F and a decrease in RMS suggest an improvement of the regression model and hence lead to optimization [66]. Table 4 shows the successive exclusion of variables to obtain the optimized model. Interestingly, the optimized model was found to include L2 as the only topological parameter to explain the narcotic values.
With these encouraging results on the use of L2 in the regression model, we used the above five independent variables in the regression model for 123 compounds to obtain a general Eq. 11. When the narcotic values predicted by this equation were plotted against the observed values, a straight line was obtained (Figure 2). Considering a deviation of > 0.8 from linearity, nine outliers were identified, seven of which were also identified by Abraham and Rafols. When the data of these outliers were excluded, the regression model with 114 compounds was found to be the best among all the regression models proposed earlier (n = 114, SD = 0.3165, r = 0.96012, F = 254.7).
The statistical significance of the model was further tested in terms of the K parameter [68]. This parameter is normally used in the SEV technique to select the optimized model in situations where a simultaneous increase in F and RMS is found, and it is computed by dividing the F value by the corresponding RMS value for a given model. The same parameter can also be employed to compare the statistical significance of two models: an increase in K suggests a statistical improvement in the model. In the present case, the increase in K to 778.9 for Eq. 11 from 597.1 for Eq. 9 indicates that the regression model is better.

Model validation
The main objective of a QSAR model is to predict the activity of an external compound that was not used in the development of the model. As per the OECD principles, external validation is the only way to "determine" the true predictive power of a QSAR model. In this context, the whole data set was divided into a training set and a test set. The training set was used in the development of the model, and the predictive ability of the developed model was evaluated with the external test set. The division was done in the ratio 1:4. For this, the compounds in the whole data set were sorted according to their narcotic activity (log (1/C_nar)); then every fifth compound was assigned to the validation set and the rest were kept for the training set. The predictive ability of the developed model was evaluated in terms of the statistical parameter $Q^2_{ext}$, defined as
$$Q^2_{ext} = 1 - \mathrm{PRESS}/\mathrm{SD}$$
where PRESS is the sum of squared differences between the observed and the predicted activity for each molecule in the validation set, and SD is the sum of squared deviations between the observed activity for each molecule in the validation set and the mean observed activity of the training set [69]. After evaluation, the final model with its statistical parameters is as follows:
$$\log(1/C_{nar}) = 0.317 + 1.0660\,(\pm 0.1339)\,R_2 - 0.5453\,(\pm 0.1282)\,\ldots$$
Here n_train and n_valid stand for the number of compounds in the training set and the validation set, respectively. The statistical parameters for the whole data set, the training set and the validation set are close to each other, which indicates that the proposed model is not obtained by chance. Further, the parameters (R², R²(adj) and SD) and the internal validation parameter (R²(pred)) clearly indicate the fitting efficiency and robustness of the proposed model. The predictive ability of the model was also satisfactory, as $Q^2_{ext}$ and R² for the training set are close to each other. This is further supported by the even distribution of data points on both sides of the dashed line in Figure 3.

Principal component analysis
Principal component analysis (PCA) plays a significant role in reducing the number of descriptors in QSPR/QSAR studies. The technique also helps in classifying the descriptors according to their relationships with the derived principal components (PCs). When all twelve descriptors were subjected to PCA, the first PC was found to explain 60.5% of the variance of the toxicity, and cumulatively the first eight PCs explain 99% of the variance. The PCs are orthogonal to each other and are hence considered good candidates for correlation analysis, although the resultant coefficients do not contribute much to the physical significance of the regression model. The molecular descriptors were correlated with the PCs, and beyond PC3 the correlations were found to be poor (i.e. < 0.5). PC2 correlates well with R2 and J, while PC3 has a good correlation with ... Although there is no specific reasoning for this division of parameters into different groups, generic characteristics such as connectivity and solvation may contribute to this classification. The newly generated weighted line graph indices are found to share a common characteristic domain with the connectivity parameters.
With the aim of utilizing the PCs for the prediction of toxicity, seven PCs were subjected to regression analysis with the narcosis activity. Using the SEV technique as reported earlier, the optimized equation was obtained with r = 0.9382 and F = 158.781, which are lower than those of the model proposed earlier (r = 0.96012, F = 254.7).
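The external validation procedure described under Model validation (sorting by activity, assigning every fifth compound to the validation set, and computing the external Q² as 1 - PRESS/SD) can be sketched in Python as follows. The function names are hypothetical, and the choice of starting the count at the fifth sorted compound is an assumption of this sketch, since the starting position is not stated.

```python
def split_every_fifth(activities):
    """Sort compounds by activity and send every fifth one to validation.

    Returns (train_indices, valid_indices) into the original list,
    giving a validation:training ratio of about 1:4.
    """
    order = sorted(range(len(activities)), key=lambda i: activities[i])
    valid = order[4::5]  # assumed offset: 5th, 10th, 15th, ... sorted compound
    train = [i for k, i in enumerate(order) if k % 5 != 4]
    return train, valid

def q2_ext(y_valid_obs, y_valid_pred, y_train_mean):
    """External Q^2 = 1 - PRESS / SD.

    PRESS: squared errors of prediction over the validation set.
    SD: squared deviations of the validation observations from the
        mean observed activity of the training set.
    """
    press = sum((o - p) ** 2 for o, p in zip(y_valid_obs, y_valid_pred))
    sd = sum((o - y_train_mean) ** 2 for o in y_valid_obs)
    return 1.0 - press / sd
```

A model that predicts the validation compounds perfectly gives a value of 1, while a model that does no better than the training-set mean gives 0.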

Conclusion
Although narcotic behavior is a complex phenomenon in aquatic animals, the structure of the narcotic plays an important role. The parameters derived from solute-solvent interactions are the most important factors in QSAR, as these parameters explain the transportability of the compounds in the body fluid and their specific interactions with biomolecules. However, the role of topological parameters, which can be obtained directly from the molecular structure, cannot be ruled out. Like the parameters derived from line graphs, which are already reported to explain many physical characteristics of carbonaceous compounds, those derived from weighted line graphs are found to be excellent complements to empirical parameters in QSAR.