Tuesday, January 25, 2011

~ EXCEL ~

Regression

What Does Regression Mean?
A statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

Investopedia explains Regression
The two basic types of regression are linear regression and multiple regression. Linear regression uses one independent variable to explain and/or predict the outcome of Y, while multiple regression uses two or more independent variables to predict the outcome. The general form of each type of regression is:

Linear Regression: $Y = a + bX + u$
Multiple Regression: $Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_tX_t + u$

Where:
Y= the variable that we are trying to predict
X= the variable that we are using to predict Y
a= the intercept
b= the slope
u= the regression residual.

In multiple regression the separate variables are differentiated by using subscripted numbers.

Regression takes a group of random variables, thought to predict Y, and tries to find a mathematical relationship between them. This relationship is typically in the form of a straight line (linear regression) that best approximates all the individual data points. Regression is often used to determine how much specific factors, such as the price of a commodity, interest rates, or particular industries or sectors, influence the price movement of an asset.
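The following minimal Python sketch makes the simple form Y = a + bX + u concrete with an ordinary least-squares fit for one predictor. It assumes the usual closed-form estimates: the slope is the co-variation of X and Y divided by the variation of X, and the intercept makes the line pass through the point of means. The function names and sample numbers are illustrative only. They do not come from any particular data set or from Excel's own routines.

```python
# Minimal sketch: ordinary least-squares fit of Y = a + bX + u with one predictor.
# Function names and sample data are illustrative, not from the original post.


def fit_simple_linear(xs, ys):
    """Return (a, b) minimising the sum of squared residuals u = Y - (a + bX)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope b = S_xy / S_xx, computed from deviations about the means
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X has no variation; the slope is undefined")
    b = s_xy / s_xx
    # The least-squares line passes through the point of means (x_mean, y_mean)
    a = y_mean - b * x_mean
    return a, b


def residuals(xs, ys, a, b):
    """Regression residuals u = Y - (a + bX) for each observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    a, b = fit_simple_linear(xs, ys)
    print(f"Y = {a:.3f} + {b:.3f}X")
    print("residuals:", [round(u, 3) for u in residuals(xs, ys, a, b)])
```

Because the fitted line passes through the point of means, the residuals u sum to (essentially) zero whenever the intercept a is included.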

Linear Regression

In statistics, linear regression is an approach to modeling the relationship between a scalar variable y and one or more variables denoted X. In linear regression, data are modeled using linear functions, and the unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. Less commonly, linear regression could refer to a model in which the median, or some other quantile, of the conditional distribution of y given X is expressed as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X rather than on the joint probability distribution of y and X, which is the domain of multivariate analysis. Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models that depend linearly on their unknown parameters are easier to fit than models that are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine.
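The formulas below use the same symbols as the regression formulas earlier in this post: a is the intercept, b_j are the slopes and u is the residual. They write out a linear model with p predictors and its "conditional mean is an affine function of X" reading. This is only a restatement, assuming the usual condition that u has conditional mean zero given X. The observation index i is added for clarity.

```latex
y_i = a + b_1 X_{i1} + b_2 X_{i2} + \dots + b_p X_{ip} + u_i, \qquad i = 1, \dots, n

\mathbb{E}[\, y \mid X \,] = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p
\quad \text{provided } \mathbb{E}[\, u \mid X \,] = 0
```

The intercept a is what makes the right-hand side affine rather than strictly linear in X.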
Linear regression has many practical uses. Most applications of linear regression fall into one of the following two broad categories:

- If the goal is prediction, or forecasting, linear regression can be used to fit a predictive model to an observed data set of y and X values. After developing such a model, if an additional value of X is then given without its accompanying value of y, the fitted model can be used to make a prediction of the value of y.
- Given a variable y and a number of variables X₁, ..., X_p that may be related to y, linear regression analysis can be applied to quantify the strength of the relationship between y and the X_j, to assess which X_j may have no relationship with y at all, and to identify which subsets of the X_j contain redundant information about y, so that once one of them is known, the others are no longer informative.
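Both uses in the list above can be sketched in pure Python. First, fit Y = a + b₁X₁ + ... + b_pX_p + u by solving the least-squares normal equations. Then plug a new X into the fitted model to get a prediction.

This is an illustrative sketch, not a production solver, and the helper names are hypothetical. Forming XᵀX explicitly is less numerically stable than the QR- or SVD-based methods that statistical software typically uses. A singular system is one symptom of predictors that carry redundant information about y.

```python
# Minimal sketch: multiple linear regression Y = a + b1*X1 + ... + bp*Xp + u,
# fitted by solving the normal equations (X^T X) beta = X^T y.
# Pure Python for illustration; helper names are hypothetical.


def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be redundant")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_multiple_linear(rows, ys):
    """rows[i] = [x_i1, ..., x_ip]; returns [a, b1, ..., bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x_new):
    """Predict y for a new observation of X (the forecasting use case)."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], x_new))


if __name__ == "__main__":
    rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    ys = [1 + 2 * x1 + 0.5 * x2 for x1, x2 in rows]  # exact synthetic data
    coef = fit_multiple_linear(rows, ys)
    print("a, b1, b2 =", [round(c, 6) for c in coef])
    print("prediction at X = (6, 2):", round(predict(coef, [6, 2]), 6))
```

The synthetic data lie exactly on Y = 1 + 2X₁ + 0.5X₂, so the fitted coefficients come back as (approximately) 1, 2 and 0.5, and the prediction at X = (6, 2) is about 14.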

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the “lack of fit” in some other norm, or by minimizing a penalized version of the least squares loss function as in ridge regression. Conversely, the least squares approach can be used to fit models that are not linear models. Thus, while the terms “least squares” and linear model are closely linked, they are not synonymous.
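Here is a minimal sketch of the "penalized version of the least squares loss" used in ridge regression, for a single predictor. It assumes the common convention that only the slope is penalized, not the intercept. The penalty weight lam must satisfy lam ≥ 0, and with lam = 0 the fit reduces to ordinary least squares. The function name fit_ridge_simple is hypothetical.

```python
# Minimal ridge-regression sketch for ONE predictor (illustrative only).
# Assumption: only the slope b is penalised, not the intercept a, so it minimises
#     sum((y - a - b*x)**2) + lam * b**2
# whose closed-form solution is b = S_xy / (S_xx + lam), a = mean(y) - b*mean(x).
# With lam = 0 this is ordinary least squares. The function name is hypothetical.


def fit_ridge_simple(xs, ys, lam):
    """Return (a, b) for single-predictor ridge regression with penalty lam >= 0."""
    if lam < 0:
        raise ValueError("penalty lam must be non-negative")
    if len(xs) != len(ys) or not xs:
        raise ValueError("need paired, non-empty data")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    if s_xx + lam == 0:
        raise ValueError("X has no variation and lam is 0; the slope is undefined")
    b = s_xy / (s_xx + lam)  # larger lam shrinks |b| toward 0
    a = y_mean - b * x_mean  # intercept still passes through the point of means
    return a, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # lies exactly on y = 1 + 2x
    for lam in (0.0, 1.0, 10.0):
        a, b = fit_ridge_simple(xs, ys, lam)
        print(f"lam={lam:5.1f}  a={a:.3f}  b={b:.3f}")
```

These sample points lie exactly on y = 1 + 2x. With lam = 0 the sketch returns that exact line. With lam = 10 the slope halves to 1 and the intercept rises to 3 to compensate, which shows the shrinkage that distinguishes ridge from plain least squares.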

Quadratic Regression

Quadratic regression is a process by which the equation of a parabola of "best fit" is found for a set of data. Before performing the quadratic regression, first set an appropriate viewing rectangle. To calculate the quadratic regression, press STAT, then RIGHT ARROW to CALC. Now select 5:QuadReg. After QuadReg appears alone on the screen, press ENTER, and the quadratic regression will appear on the screen. To graph it, enter the equation into Y= while leaving PLOT1 on for the data values. Then press GRAPH to see how well the curve fits the data points.

NOTE: The regression results may be copied directly into Y1 for graphing purposes by using the following procedure. After the data values have been entered, press STAT, then RIGHT ARROW to CALC. Now select 5:QuadReg.

After QuadReg appears alone on the screen, press VARS, then ARROW RIGHT to Y-VARS, noting that 1:Function is selected. Press ENTER to accept and note that 1:Y1 is already selected. Press ENTER to accept, then press ENTER to calculate. The result appears on the screen to several decimal places.

Now press Y= to see that the equation has already been entered for Y1 and is ready to graph.

This is the preferred method for entering the regression equation into Y1, since rounding the values can introduce significant rounding errors.
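For readers without a graphing calculator, this Python sketch computes the same kind of result as QuadReg: the least-squares parabola y = ax² + bx + c. It solves the three normal equations directly with Cramer's rule, which is adequate for a handful of points. It is an illustration, not the calculator's actual algorithm, and the function names are hypothetical. It keeps the coefficients at full precision, which avoids the rounding errors mentioned in the calculator procedure.

```python
# Minimal sketch of a quadratic regression: the least-squares parabola
# y = a*x^2 + b*x + c, found by solving the 3x3 normal equations with
# Cramer's rule. Function names are hypothetical.


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quad_reg(xs, ys):
    """Return (a, b, c) for the best-fit parabola y = a*x^2 + b*x + c."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired data points")
    # Power sums s[k] = sum(x^k) and mixed sums t[k] = sum(x^k * y)
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations for the unknowns (a, b, c)
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    if abs(d) < 1e-12:  # crude singularity check
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2, 3]
    ys = [2 * x * x - 3 * x + 1 for x in xs]  # points taken exactly from a parabola
    a, b, c = quad_reg(xs, ys)
    print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}")
```

With points taken exactly from y = 2x² - 3x + 1, the sketch recovers a = 2, b = -3 and c = 1. With fewer than three distinct x values no unique parabola exists, and it raises an error instead.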

[Figure: example graph]

Tuesday, January 11, 2011

SMILES

-Introduction-

The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).

In July 2006, the IUPAC introduced the InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.
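The sketch below shows how much structure a short SMILES string carries by counting the atoms written in it. For example, ethanol is written CCO and benzene c1ccccc1. It is deliberately minimal and rests on stated assumptions:

- It recognizes only the unbracketed "organic subset" symbols (B, C, N, O, P, S, F, Cl, Br, I and their aromatic lowercase forms), plus the element symbol inside [ ] bracket atoms.
- It skips bonds, branches and ring-closure digits.
- It does not validate the syntax.
- It does not add implicit hydrogens.

The name count_atoms is hypothetical, and a real molecule editor or cheminformatics toolkit does far more.

```python
from collections import Counter

# Atoms allowed without brackets (the SMILES "organic subset") and aromatic forms
ORGANIC = {"B", "C", "N", "O", "P", "S", "F", "I"}
AROMATIC = {"b", "c", "n", "o", "p", "s"}
# Bond symbols, branches, ring-closure digits and the '%' two-digit ring prefix
NON_ATOM = set("-=#$:/\\.()%0123456789")


def _bracket_element(body):
    """Element symbol from the inside of a bracket atom, e.g. '13CH2' or 'nH'."""
    i = 0
    while i < len(body) and body[i].isdigit():  # skip an optional isotope number
        i += 1
    if i >= len(body):
        raise ValueError("empty bracket atom")
    if body[i].isupper():
        # Two-letter symbols such as 'Cl', 'Na', 'Se' end in a lowercase letter
        if i + 1 < len(body) and body[i + 1].islower():
            return body[i:i + 2]
        return body[i]
    if body[i:i + 2] in ("se", "as"):  # aromatic two-letter symbols
        return body[i:i + 2]
    return body[i]


def count_atoms(smiles):
    """Count atoms written in a SMILES string (implicit hydrogens are NOT added)."""
    counts = Counter()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.index("]", i)  # raises ValueError if the bracket is unclosed
            counts[_bracket_element(smiles[i + 1:end]).capitalize()] += 1
            i = end + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):  # check two-letter symbols first
            counts[smiles[i:i + 2]] += 1
            i += 2
        elif ch in ORGANIC or ch in AROMATIC:
            counts[ch.capitalize()] += 1  # aromatic 'c' is counted as carbon 'C'
            i += 1
        elif ch in NON_ATOM:
            i += 1
        else:
            raise ValueError(f"unsupported character {ch!r} at position {i}")
    return counts


if __name__ == "__main__":
    for s in ("CCO", "c1ccccc1", "CC(=O)O", "[Na+].[Cl-]"):
        print(s, dict(count_atoms(s)))
```

For example, the caffeine string CN1C=NC2=C1C(=O)N(C(=O)N2C)C yields 8 carbons, 4 nitrogens and 2 oxygens. Those counts match the heavy atoms of C8H10N4O2, with the hydrogens left implicit.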

-Example Done-

[Table: example molecules and their names]

That's all for today about this interesting software. Thank you!

Tuesday, January 4, 2011

 Introduction to Protein Data Bank  

The Protein Data Bank (PDB) is a repository for the 3-D structural data of large biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray crystallography or NMR spectroscopy and submitted by biologists and biochemists from around the world, are freely accessible on the Internet via the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an organization called the Worldwide Protein Data Bank, wwPDB.

The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals, and some funding agencies such as the NIH in the USA, now require scientists to submit their structure data to the PDB. If the contents of the PDB are thought of as primary data, then there are hundreds of derived (i.e., secondary) databases that categorize the data differently. For example, both SCOP and CATH categorize structures according to type of structure and assumed evolutionary relations; GO categorizes structures based on genes.

History

The PDB originated as a grassroots effort. In 1971, Walter Hamilton of the Brookhaven National Laboratory agreed to set up the data bank at Brookhaven. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB. In January 1994, Joel Sussman was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June 1999. The new director was Helen M. Berman of Rutgers University (one of the member institutions of the RCSB). In 2003, with the formation of the wwPDB, the PDB became an international organization. The founding members are PDBe (Europe), RCSB (USA), and PDBj (Japan). The BMRB joined in 2006. Each of the four members of wwPDB can act as a deposition, data processing and distribution center for PDB data. Data processing means that wwPDB staff review and annotate each submitted entry. The data are then automatically checked for plausibility. The source code for this validation software has been made available to the public at no charge.

Two Important Things in the Protein Data Bank

- RasMol Downloader
- An Information Portal to Biological Macromolecular Structure

Structure Example

CRYSTALLOGRAPHIC ANALYSIS OF COUNTER-ION EFFECTS ON SUBTILISIN ENZYMATIC ACTION IN ACETONITRILE

When enzymes are in low dielectric nonaqueous media, it would be expected that their charged groups would be more closely associated with counterions. There is evidence that these counterions may then affect enzymatic activity. Published crystal structures of proteins in organic solvents do not show increased numbers of associated counterions, and this might reflect the difficulty of distinguishing cations like Na⁺ from water molecules. In this paper, the placement of several Cs⁺ and Cl⁻ ions in crystals of the serine protease subtilisin Carlsberg is presented. Ions are more readily identified crystallographically through their anomalous diffraction using softer X-rays. The protein conformation is very similar to that of the enzyme without CsCl in acetonitrile, both for the previously reported (1SCB) and our own newly determined model. No fewer than 11 defined sites for Cs⁺ cations and 8 Cl⁻ anions are identified around the protein molecule, although most of these have partial occupancy and may represent nonspecific binding sites. Two Cs⁺ and two Cl⁻ ions are close to the mouth of the active site cleft, where they may affect catalysis. In fact, cross-linked CsCl-treated subtilisin crystals transferred to acetonitrile show catalytic activity several fold higher than the reference crystals containing Na⁺. Presoaking with another large cation, choline, also increases the enzyme activity. The active site appears only minimally sterically perturbed by the ion presence around it, so alternative activation mechanisms can be suggested: an electrostatic redistribution and/or a larger hydration sphere that enhances the protein domain.

THE CRYSTAL STRUCTURE OF PROLYL AMINOPEPTIDASE COMPLEXED WITH Sar-TBODA

The prolyl aminopeptidase complexes of Ala-TBODA [2-alanyl-5-tert-butyl-(1, 3, 4)-oxadiazole] and Sar-TBODA [2-sarcosyl-5-tert-butyl-(1, 3, 4)-oxadiazole] were analyzed by X-ray crystallography at 2.4 angstroms resolution. The frames of the alanine and sarcosine residues superimposed well on the pyrrolidine ring of a proline residue, suggesting that Ala and Sar are recognized as parts of this ring because of a hydrophobic proline pocket at the active site. Interestingly, there was an unusual extra space at the bottom of the hydrophobic pocket where the proline residue is fixed in the prolyl aminopeptidase. Moreover, 4-acetyloxyproline-betaNA (4-acetyloxyproline beta-naphthylamide) was a better substrate than Pro-betaNA. Computer docking simulation strongly supports the idea that the 4-acetyloxyl group of the substrate fits into that space. Alanine scanning mutagenesis of Phe139, Tyr149, Tyr150, Phe236, and Cys271, which make up the hydrophobic pocket, revealed that all five of these residues are significantly involved in forming the hydrophobic proline pocket for the substrate. Tyr149 and Cys271 may be important for the extra space and may orient the acetyl derivative of hydroxyproline to a preferable position for hydrolysis. These findings imply that efficient degradation of collagen fragments may be achieved through an acetylation process by the bacteria.

HUMAN START DOMAIN OF Acyl-COENZYME A THIOESTERASE 11 (ACOT11)

Human Acyl-coenzyme A thioesterase 11, also known as brown fat-inducible thioesterase (BFIT) or STARD14, exists as two tissue-specific splice variants that differ slightly in their C-termini (1). ACOT11 expression is cold-induced, and expression levels are linked to obesity, with obesity-resistant mice displaying higher ACOT11 expression than obesity-prone mice (2). The rat ortholog has acyl-CoA thioesterase activity with specificity towards medium- to long-chain (C12-18) fatty acyl-CoA substrates (3). ACOT11 consists of two thioesterase domains and a C-terminal lipid-binding START (StAR-related lipid-transfer) domain. START domains are found in proteins involved in lipid metabolism, lipid trafficking and cell signaling (4). We solved the structure of the C-terminal START domain (residues 350-594) at a resolution of 2.0 Å. The structure shows a globular domain consisting of a 9-stranded antiparallel β-sheet surrounded by α-helices. The curved β-sheet, together with 4 helices on its concave side, forms a hydrophobic cavity, which is the putative lipid-binding site.