Theory Assignment 3
COMP 451 - Fundamentals of Machine Learning
Winter 2021
Preamble The assignment is due April 6th at 11:59pm via MyCourses. Late work will be automatically
subject to a 20% penalty, and can be submitted up to 5 days after the deadline. You may scan written
answers or submit a typeset assignment, as long as you submit a single pdf file with clear indication of
what question each answer refers to. You may consult with other students in the class regarding solution
strategies, but you must list all the students that you consulted with on the first page of your submitted
assignment. You may also consult published papers, textbooks, and other resources, but you must cite any
source that you use in a non-trivial way (except the course notes). You must write the answer in your own
words and be able to explain the solution to the professor, if asked.
Question 1 [13 points]
In class we introduced the Gaussian mixture model (GMM). In this question, we will consider a mixture
of Bernoulli distributions. Here, our data points will be defined as m-dimensional vectors of binary values
x ∈ {0, 1}^m.
First, we will introduce a single multivariate Bernoulli distribution, which is defined by a mean vector µ:

P(x|µ) = ∏_{j=0}^{m−1} µ[j]^{x[j]} (1 − µ[j])^{1−x[j]}. (1)
Thus, we see that the individual binary dimensions are independent for a single multivariate Bernoulli.
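As a concrete illustration (not part of the assignment), Equation 1 can be evaluated directly; a minimal NumPy sketch, where the helper name `bernoulli_pmf` is ours:

```python
import numpy as np

def bernoulli_pmf(x, mu):
    """P(x | mu) for a multivariate Bernoulli with independent dimensions (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Each factor is mu[j] when x[j] = 1 and (1 - mu[j]) when x[j] = 0.
    return float(np.prod(mu ** x * (1.0 - mu) ** (1.0 - x)))

# Example: with m = 2 and mu = [0.5, 0.5], every binary vector has probability 0.25.
print(bernoulli_pmf([1, 0], [0.5, 0.5]))  # 0.25
```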
Now, we can define a mixture of K multivariate Bernoulli distributions as follows:

P(x) = ∑_{k=0}^{K−1} πk P(x|µk), (2)

where {µk, πk, k = 0, …, K − 1} are the parameters of the mixture and P(x|µk) is the probability assigned to the point by each individual component in the model.
Note that the mean of each individual component distribution P(x|µk) is given by

Ek[x] = µk, (5)

and the covariance matrix of each component is given by

Cov[x] = Σk = diag(µk ⊙ (1 − µk)), (6)
where ⊙ denotes elementwise multiplication. In other words, the covariance matrix Σk for each component is a diagonal matrix with diagonal entries given by Σk[j, j] = µk[j](1 − µk[j]). It is a diagonal matrix because each dimension is independent.
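The mixture density of Equation 2 and the diagonal component covariance of Equation 6 can be sketched numerically; a minimal NumPy illustration (the names `bernoulli_mixture_pmf`, `pis`, and `mus` are ours):

```python
import numpy as np

def bernoulli_mixture_pmf(x, pis, mus):
    """P(x) = sum_k pi_k * P(x | mu_k) for a mixture of multivariate Bernoullis (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    mus = np.asarray(mus, dtype=float)                    # (K, m) component means
    comps = np.prod(mus ** x * (1.0 - mus) ** (1.0 - x), axis=1)  # (K,) component pmfs
    return float(np.dot(pis, comps))

# Two components over m = 2 binary dimensions.
pis = np.array([0.6, 0.4])
mus = np.array([[0.9, 0.1],
                [0.2, 0.8]])
p = bernoulli_mixture_pmf([1, 0], pis, mus)

# Component covariance (Eq. 6) is diagonal: diag(mu_k * (1 - mu_k)).
cov0 = np.diag(mus[0] * (1.0 - mus[0]))
```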
Part 1 [8 points]
Derive expressions for the mean vector and the covariance matrix of the full mixture distribution defined in
Equation 2. That is, give expressions for the following:
E[x] = ?   Cov[x] = ? (7)

Hint: use the fact that

Cov[x] = E[(x − E[x])(x − E[x])⊤] = E[xx⊤] − E[x]E[x]⊤.
Part 2 [5 points]
Just as with a GMM, we can use the expectation maximization (EM) algorithm to learn the
parameters of a Bernoulli mixture model. Here, we will provide you with the formula for the expectation
step as well as the log-likelihood of the model. You must derive the formula for the maximization step.
Expectation step. In the expectation step of the Bernoulli mixture model, we compute scores r(x, k), which
tell us how likely it is that point x belongs to component k. These scores are computed as follows:
r(x, k) = πk P(x|µk) / ∑_{j=0}^{K−1} πj P(x|µj), (8)

where P(x|µk) is defined as in Equation 1.
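The expectation step above can be sketched in a few lines of NumPy; a minimal illustration (the helper name `e_step` is ours), computed in log space to avoid underflow for large m:

```python
import numpy as np

def e_step(X, pis, mus):
    """Responsibilities r(x, k) of Equation 8, computed in log space for stability."""
    X = np.asarray(X, dtype=float)          # (n, m) binary data
    mus = np.asarray(mus, dtype=float)      # (K, m) component means
    # log P(x_i | mu_k) for every point/component pair -> shape (n, K)
    log_p = X @ np.log(mus.T) + (1.0 - X) @ np.log(1.0 - mus.T)
    log_r = np.log(pis) + log_p             # add log mixture weights
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True) # normalize so each row sums to 1
```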
Log-likelihood. The log-likelihood of a dataset X = {x0, …, x_{n−1}} under the mixture is

log P(X) = ∑_{i=0}^{n−1} log ( ∑_{k=0}^{K−1} πk P(xi|µk) ). (9)
Maximization step. You must find the formula for the µk parameters in the maximization step:
µk =? (10)
Question 2 [5 points]
Recall that the low dimensional codes in PCA are defined as
zi = U⊤(xi − µ), (11)

where U is a matrix containing the top-k eigenvectors of the covariance matrix and µ is the mean of the data.
Recall that the reconstruction of a point xi using its code zi is given by

x̃i = Uzi + µ. (13)
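Before attempting the proof, the identity can be checked numerically; a short NumPy sketch (variable names ours) that builds U from an SVD of the centered data, whose top right-singular vectors coincide with the top-k eigenvectors of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # toy data, n = 50 points in m = 4 dimensions
mu = X.mean(axis=0)
Xc = X - mu

# Top-k eigenvectors of the covariance matrix via SVD of the centered data.
k = 2
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U = Vt[:k].T                          # (m, k), orthonormal columns

z = U.T @ (X[0] - mu)                 # code (Eq. 11)
x_tilde = U @ z + mu                  # reconstruction (Eq. 13)

# Numerically, (x_tilde - x)^T (x_tilde - mu) vanishes (Eq. 14).
inner = (x_tilde - X[0]) @ (x_tilde - mu)
```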
Show that

(x̃i − xi)⊤(x̃i − µ) = 0. (14)
Question 3 [short answers; 2 points each]
Answer each question with 1–3 sentences of justification, potentially with equations/examples for support.
a) True or false: It is always possible to choose an initialization so that K-means converges in one iteration.
b) Suppose you are learning a decision tree for email spam classification. Your current sample of the training
data has the following distribution of labels:
[43+, 30−], (15)
i.e., the training sample has 43 examples that are spam and 30 that are not spam. Now, you are choosing
between two candidate tests.
Test 1 (T1) tests whether the number of words in the email is greater than 30 and would result in the
following splits:
• num words > 30 : [5+, 15−]
• num words ≤ 30: [38+, 15−]
Test 2 (T2) tests whether the email contains an external URL link and would result in the following splits:
• has link: [25+, 5−]
• not has link: [18+, 25−]
Which test should you use to split the data? I.e., which test provides a higher information gain?
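For reference, information gain is the entropy of the parent label distribution minus the weighted average entropy of the child splits; a small sketch of that calculation (the helper names `entropy` and `info_gain` are ours), which you can apply to T1 and T2:

```python
import numpy as np

def entropy(pos, neg):
    """Binary entropy (in bits) of a label distribution [pos+, neg-]."""
    p = np.array([pos, neg], dtype=float)
    p = p[p > 0] / p.sum()              # drop empty classes, normalize
    return float(-(p * np.log2(p)).sum())

def info_gain(parent, splits):
    """Parent entropy minus the size-weighted entropy of the child splits.

    `parent` is a (pos, neg) pair; `splits` is a list of (pos, neg) child counts.
    """
    n = sum(sum(s) for s in splits)
    children = sum((sum(s) / n) * entropy(*s) for s in splits)
    return entropy(*parent) - children

# Sanity check: a fair coin has 1 bit of entropy, a pure node has 0.
print(entropy(1, 1))  # 1.0
print(entropy(1, 0))  # 0.0
```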
c) Which of the following statements is false:
1. If the covariance between two variables is zero, then their mutual information is also zero.
2. Adding more features is a useful strategy to combat underfitting.
3. Decision trees can learn non-linear decision boundaries.
4. The Gaussian mixture model contains more parameters than K-means.