import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The post Python Programming appeared first on Data Science Anywhere.


How to test/evaluate Linear Regression Statistically?

In the previous tutorial on linear regression, we got the following result, as shown below:

import statsmodels.api as sm

# fitting Ordinary Least Squares regression model
model = sm.OLS.from_formula('hourly_wages_usd ~ big_mac_price', data=df).fit()
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:       hourly_wages_usd   R-squared:                       0.662
Model:                            OLS   Adj. R-squared:                  0.648
Method:                 Least Squares   F-statistic:                     48.88
Date:                Wed, 16 Sep 2020   Prob (F-statistic):           2.50e-07
Time:                        19:45:28   Log-Likelihood:                -66.251
No. Observations:                  27   AIC:                             136.5
Df Residuals:                      25   BIC:                             139.1
Df Model:                           1
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        -4.5397      1.619     -2.805      0.010      -7.873      -1.206
big_mac_price     4.7435      0.678      6.991      0.000       3.346       6.141
==============================================================================
Omnibus:                        5.597   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.061   Jarque-Bera (JB):                3.761
Skew:                           0.829   Prob(JB):                        0.153
Kurtosis:                       3.771   Cond. No.                         7.94
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From the results, the equation of the model is as follows

Hourly Wages = -4.5397 + 4.7435 * Big Mac Price

and from the results,

- Number of observations: n = 27
- Degrees of freedom of the model (number of independent variables): k = 1
- Degrees of freedom of residuals: n - 1 - k = 25

Are the Coefficients Significant?

From the equation of the model

Are the intercept (-4.5397) and slope (4.7435) significant or not? In other words, can we rely on these coefficients? To answer this question we need to perform a *Student's t-test*.

From Fig-1: the p-value of the intercept is 0.010, which is less than the significance level of 0.025 (two-tailed α = 0.05). It falls in the critical region, indicating that the intercept is far from **0**. Hence we reject the null hypothesis (no significance) and accept the alternative hypothesis: the **intercept** is significant for predicting the net hourly wages.

From Fig-2: the p-value of the coefficient is 0.000, which is less than the significance level of 0.025. It falls in the critical region, indicating that the coefficient is far from **0**. Hence we reject the null hypothesis (no significance) and accept the alternative hypothesis: the **coefficient** of big_mac_price is significant for predicting the net hourly wages from the Big Mac price.
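The two t-tests above can be reproduced by hand from the coefficient estimates and standard errors in the summary table. A minimal sketch using scipy; the numbers are copied from the OLS output, not recomputed from the dataset:

```python
from scipy import stats

# coefficient estimates and standard errors taken from the OLS summary
coef_intercept, se_intercept = -4.5397, 1.619
coef_slope, se_slope = 4.7435, 0.678
df_resid = 25  # n - 1 - k = 27 - 1 - 1

# t-statistic = coefficient / standard error
t_intercept = coef_intercept / se_intercept
t_slope = coef_slope / se_slope

# two-tailed p-value from the Student t-distribution
p_intercept = 2 * stats.t.sf(abs(t_intercept), df_resid)
p_slope = 2 * stats.t.sf(abs(t_slope), df_resid)

print(f"intercept: t = {t_intercept:.3f}, p = {p_intercept:.3f}")
print(f"slope:     t = {t_slope:.3f}, p = {p_slope:.3f}")
```

The resulting t-values and p-values match the `t` and `P>|t|` columns of the summary up to rounding of the standard errors.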

Hence the model is significant, and its final equation is:

Hourly Wages = -4.5397 + 4.7435 * Big Mac Price

This is a very important test for the significance of the model. As shown in Fig-3, we test whether the predicted values can be used to estimate the actual values, **meaning, is the red line significant or not**.

For that, we consider the two samples, actual values (blue dots) and predicted values (red line), and compute three components:

- **SSR** – Sum of Squares due to Regression (the variance explained by the model)
- **SSE** – Sum of Squared Errors (the residual variance)
- **SST** – Total Sum of Squares (the total variance)

With these three sums of squares, we calculate the F-statistic and test the variance with hypothesis testing.

F = \frac{SSR / df_r}{SSE / df_e}

where,

df_{r} = degree of freedom of regression

df_{e} = degree of freedom of error

SSR = \sum (\hat y_i - \bar y)^2

SSE = \sum (y_i - \hat y_i)^2

df_r = k , df_e = n - 1-k

R^2 = 1 - \frac{SSE}{SST}
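These quantities can be computed directly in code. A minimal sketch on a small made-up sample (the original Big Mac dataset is not bundled here, so x and y below are illustrative):

```python
import numpy as np
from scipy import stats

# illustrative sample, not the Big Mac data
x = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([1.2, 2.9, 4.8, 6.1, 9.5, 11.8, 14.1])

n, k = len(x), 1  # sample size, number of independent variables
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

SSR = np.sum((y_hat - y.mean())**2)  # explained (regression) sum of squares
SSE = np.sum((y - y_hat)**2)         # residual (error) sum of squares
SST = np.sum((y - y.mean())**2)      # total sum of squares

df_r, df_e = k, n - 1 - k
F = (SSR / df_r) / (SSE / df_e)
p_value = stats.f.sf(F, df_r, df_e)  # right-tail probability of the F-distribution
r_squared = 1 - SSE / SST

print(f"F = {F:.2f}, p = {p_value:.4g}, R^2 = {r_squared:.4f}")
```

Note that SSR + SSE = SST holds exactly for a least-squares fit with an intercept, which is a handy sanity check on the arithmetic.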

In this test, we are checking whether there is any intelligence in the model. If there is none, we can reject the model. In other words, can we use the red samples in Fig-3 to estimate the blue samples?

From Fig-4, the ANOVA results give F = 48.88, and the corresponding probability in the F-distribution is 2.5 x 10^{-7}, which is far less than 0.025. This indicates that there is clear variance in the dependent variable (net hourly wages) explained by the independent variable (Big Mac price). Hence we reject the null hypothesis and accept the alternative.

Hence this machine learning model has intelligence, and we can use it to predict the net hourly wages from the Big Mac price.

Manual Calculation of the Values.
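A sketch of how the coefficient estimates, standard errors and t-values from the summary table can be reproduced by hand. Since the original Excel dataset is not included here, x and y below are an illustrative sample:

```python
import numpy as np

# illustrative sample; the original dataset is not bundled here
x = np.array([1.47, 1.86, 1.48, 3.14, 2.20, 2.60, 3.00])
y = np.array([1.70, 7.80, 2.05, 12.30, 6.00, 8.50, 10.20])
n, k = len(x), 1

Sxx = np.sum((x - x.mean())**2)
b = np.sum((x - x.mean()) * (y - y.mean())) / Sxx  # slope
a = y.mean() - b * x.mean()                        # intercept

residuals = y - (a + b * x)
MSE = np.sum(residuals**2) / (n - 1 - k)           # mean squared error

se_b = np.sqrt(MSE / Sxx)                          # std err of slope
se_a = np.sqrt(MSE * (1 / n + x.mean()**2 / Sxx))  # std err of intercept

print(f"b = {b:.4f} +/- {se_b:.4f}, a = {a:.4f} +/- {se_a:.4f}")
print(f"t(slope) = {b / se_b:.3f}, t(intercept) = {a / se_a:.3f}")
```

Running the same steps on the full 27-country dataset reproduces the `coef`, `std err` and `t` columns of the statsmodels output.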

The post How to test/evaluate Linear Regression Statistically? appeared first on Data Science Anywhere.


In most prediction problems we start from easy-to-measure parameters, such as *Age, Gender, Income, Education level, etc.*

and a difficult-to-measure metric

- *Amount of loan to give*
- *Will she buy or not*
- *How many days a patient will stay in the hospital, etc.*

A machine learning model helps to **predict** or find out hard-to-measure parameters from easy-to-measure parameters.

One of the machine learning algorithms that computes or predicts something from past values is **Regression**.

The post Why to Build/Develop Machine Learning Model ? appeared first on Data Science Anywhere.


Regression is a supervised machine learning model where we predict continuous (analog) values based on past values or data.

The McDonald’s Corporation is the leading global food service retailer with more than 30,000 local restaurants serving nearly 50 million people in more than 119 countries each day. This global presence, in addition to its consistency in food offerings and restaurant operations, makes McDonald’s a unique and attractive setting for economists to make salary and price comparisons around the world. Because the Big Mac hamburger is a standardized hamburger produced and sold in virtually every McDonald’s around the world, the Economist, a weekly newspaper focusing on international politics and business news and opinion, as early as 1986 was compiling information about Big Mac prices as an indicator of exchange rates.

Building on this idea, researchers Ashenfelter and Jurajda proposed comparing wage rates across countries and the price of a Big Mac hamburger. Shown below are Big Mac prices and net hourly wage figures (in U.S. dollars) for 27 countries. Note that net hourly wages are based on a weighted average of 12 professions.

# importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# loading the data
df = pd.read_excel('big_mac_index_dataset.xlsx', sheet_name='Sheet1')
print(df.head())

| big_mac_price | hourly_wages_usd |
| --- | --- |
| 1.47 | 1.7 |
| 1.86 | 7.8 |
| 1.48 | 2.05 |
| 3.14 | 12.3 |


1. Is there a relationship between the price of a Big Mac and the net hourly wages of workers around the world? If so, how strong is the relationship?

2. Is it possible to develop a model to predict or determine the net hourly wage of a worker around the world by the price of a Big Mac hamburger in that country? If so, how good is the model?

3. If a model can be constructed to determine the net hourly wage of a worker around the world by the price of a Big Mac hamburger, what would be the predicted net hourly wage of a worker in a country if the price of a Big Mac hamburger was $3.00?

The data consists of two columns, **big_mac_price** (independent variable) and **hourly_wages_usd** (dependent variable), and we want to predict hourly wages based on the price of a Big Mac hamburger. So, let's try to identify the relationship between them using a scatter plot, and also look at the probability distributions.

fig, ax = plt.subplots(figsize=(10, 3))
sns.kdeplot(df['big_mac_price'], shade=True, ax=ax)
plt.xlabel('Big Mac Hamburger Price (USD)')
plt.ylabel('Probability Distribution')
plt.show()

fig, ax = plt.subplots(figsize=(10, 3))
sns.kdeplot(df['hourly_wages_usd'], shade=True, ax=ax)
plt.xlabel('Hourly Wages (USD)')
plt.ylabel('Probability Distribution')
plt.show()

It seems that both variables are skewed to the right. Now let's look at the scatter plot and the correlation between the variables.

sns.jointplot(x='big_mac_price', y='hourly_wages_usd', data=df, kind='kde')
sns.jointplot(x='big_mac_price', y='hourly_wages_usd', data=df, kind='scatter')
plt.show()

print(df.corr())

|  | big_mac_price | hourly_wages_usd |
| --- | --- | --- |
| big_mac_price | 1.0000 | 0.8133 |
| hourly_wages_usd | 0.8133 | 1.0000 |

From the scatter plot and kernel density estimates, most Big Mac prices fall between 1 and 3 USD and hourly wages between 0 and 8 USD, and the correlation (**r**) between the variables is 0.8133, meaning there is a strong positive linear relationship between them.

Hence, the coefficient of determination (*r^{2}*) = (0.8133)^{2} = 0.6614.

So, if we build a linear model to predict hourly wages in USD (dependent), then 66.14% of the variance of hourly wages can be explained by the Big Mac price in USD alone. Which is really awesome, right? But it doesn't by itself establish that the relationship is statistically significant. We need to perform hypothesis testing to make the statement statistically. Let's start with the concept of linear regression, then build the model, and then apply the hypothesis tests to it.

Linear Regression fits a line that identifies the relationship between the independent and dependent variables, in such a way that the line has the minimum **error (sum of squared errors)**. The line whose residual error over all points is least is the best line. To ensure residual errors don't cancel out, we take the squares of the residual errors.

As shown in Fig-3, on top of the scatter we drew a blue straight line, and the equation of the line can be given as:

\hat y = a + bX ----- (1)

Now the question is: what values of a and b make the error minimum? For that, we need to understand the **error**.

Error is defined as the difference between the actual and predicted samples, as shown in Fig-4.

e_i = (y_i - \hat y_i) ----- (2)

Hence the total error between the true samples and the predicted samples is the sum of all squared errors, also called the **SUM OF SQUARED ERRORS (SSE)**. So we can find the values of **a** and **b** that minimize it.

SSE = \sum (y_i - \hat y_i)^2 ----- (3)

In order to find the minimum sum of squared errors, we need to find the optimal values of the intercept and slope of the line. The slope that minimizes the error can be found from the covariance between the independent and dependent variables relative to the variance of the independent variable. Hence, the value of b, the slope, that minimizes the SSE is given by

b = \frac{\sum (x - \bar x)(y- \bar y)}{\sum(x - \bar x)^2} ----- (4)

How do you calculate a? The line of best fit must pass through (\bar x, \bar y). Substituting into the equation

a = \bar y - b * \bar X ----- (5)

we can find a.
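Equations (4) and (5) can be checked in code. A minimal sketch on a small made-up sample (not the Big Mac data), compared against numpy's own least-squares fit:

```python
import numpy as np

# small illustrative sample, not the Big Mac data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# equation (4): slope from covariance over variance of x
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
# equation (5): the best-fit line passes through (x-bar, y-bar)
a = y.mean() - b * x.mean()

# numpy's least-squares fit as a reference
slope, intercept = np.polyfit(x, y, 1)
print(f"manual:  a = {a:.4f}, b = {b:.4f}")
print(f"polyfit: a = {intercept:.4f}, b = {slope:.4f}")
```

Both routes give the same line, since `np.polyfit` with degree 1 is itself a least-squares fit.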

This method of fitting the line of best fit is called **least squares regression.**


HourlyWages = -4.5397 + 4.7435 * BigMacPrice ----- (6)

import statsmodels.api as sm

# fitting Ordinary Least Squares regression model
model = sm.OLS.from_formula('hourly_wages_usd ~ big_mac_price', data=df).fit()
print(model.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:       hourly_wages_usd   R-squared:                       0.662
Model:                            OLS   Adj. R-squared:                  0.648
Method:                 Least Squares   F-statistic:                     48.88
Date:                Wed, 16 Sep 2020   Prob (F-statistic):           2.50e-07
Time:                        19:45:28   Log-Likelihood:                -66.251
No. Observations:                  27   AIC:                             136.5
Df Residuals:                      25   BIC:                             139.1
Df Model:                           1
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        -4.5397      1.619     -2.805      0.010      -7.873      -1.206
big_mac_price     4.7435      0.678      6.991      0.000       3.346       6.141
==============================================================================
Omnibus:                        5.597   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.061   Jarque-Bera (JB):                3.761
Skew:                           0.829   Prob(JB):                        0.153
Kurtosis:                       3.771   Cond. No.                         7.94
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# fitted values
net_hourly_wages_pred = -4.5397 + 4.7435 * df['big_mac_price']

# display the linear regression line over the scatter
sns.relplot(x='big_mac_price', y='hourly_wages_usd', data=df)
plt.plot(df['big_mac_price'], net_hourly_wages_pred, 'r-')
plt.show()

This is how we can build simple linear regression in Python, and we are able to predict the net hourly wages using equation 6. In the next blog you will see how to do hypothesis testing for

- Coefficient significance using the *t-test*
- Model significance with *ANOVA*
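As a quick application, equation 6 directly answers the third question posed earlier, the predicted net hourly wage when a Big Mac costs $3.00. The helper function name below is illustrative:

```python
# predicting net hourly wages with equation (6) from the fitted model
def predict_hourly_wage(big_mac_price):
    return -4.5397 + 4.7435 * big_mac_price

# predicted wage for a country where a Big Mac costs $3.00
print(round(predict_hourly_wage(3.00), 4))  # 9.6908
```

So a country with a $3.00 Big Mac is predicted to have a net hourly wage of about $9.69.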


The post Simple Linear Regression with Python appeared first on Data Science Anywhere.

The action of the commutator is to produce a fixed spatial distribution of current directions in the armature conductors, independent of shaft rotation. The field created by these currents (armature reaction) is horizontally directed and is represented by the space vector **I_{a}**.

The field established by the excitation of the stator poles is directed along the vertical axis and is represented by the space vector λ_{f} . The electromagnetic torque may be expressed as

*T_{e} = k(I_{a} × λ_{f})*

The rotor flux density distribution produced by the permanent magnets is represented by the green space vector B_{r}. A power-processing unit supplies three-phase currents to the stator windings in such a manner that the resultant stator current space vector I_{s} is 90^{o} ahead of the rotor flux vector.

The *MMF* distribution produced by the stator currents flowing in the stationary three-phase windings is equivalent to that produced by I_{s} flowing in a single sinusoidally-distributed rotating winding whose magnetic axis lines up with I_{s}, as illustrated in this animation.

The post Difference between DC Motor and Brushless DC Motor appeared first on Data Science Anywhere.

The post 14 Electrical Machine Lab Experiments without Physical Lab appeared first on Data Science Anywhere.

The post What is Machine Learning? appeared first on Data Science Anywhere.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The post What is Machine Learning? appeared first on Data Science Anywhere.
