## Use Case: Bad Debt

Bad debt, whether unintentional or intentional, is one of the major problems in telecommunications. Customers fall into unintentional bad debt for many reasons, such as unemployment or unexpected expenditures. Companies have several ways to handle it: renegotiating the debt, collecting part of it, or partially suspending services. For intentional bad debt, little can be done after the fact; it can only be prevented. In certain markets, including many in South America, there is an aggravating circumstance: companies providing utility services pay sales taxes to the government when they issue the bill. When customers do not pay, the company loses the revenue associated with the invoices and, even worse, does not get back the taxes already paid to the government. It is a double loss.

This type of problem can be tackled with an accurate supervised model, even if that model cannot be interpreted. A neural network can be trained on historical invoices to capture the relationship between usage and intentional bad debt. The model runs right before the billing process and produces a list of the invoices most likely to go unpaid. Looking only at the top cases by predicted probability, there is a large opportunity to save money on taxes. Even if the selection is restricted to the top $5 \%$ of cases, or to the cases where the predicted probability is greater than $92 \%$, the tax savings can be substantial. What does the company do with these cases? It does not issue the bill. That sounds strange, but withholding the bill for the highest-risk cases can save millions of dollars. Of course, every model makes errors, such as misclassified cases. Unfortunately, misclassified customers will have their services temporarily disconnected until they contact the Call Center to complain. That is the moment when the company learns of the misclassification, because fraudsters usually do not complain to the Call Center. When a misclassified customer complains, the company needs to reactivate the services immediately and compensate the customer for the loss. Even so, considering all losses and savings, the model can still avoid millions in taxes.
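The two selection rules described above (top $5 \%$ of cases, or predicted probability above $92 \%$) can be sketched as follows. This is a minimal illustration, not the deployed system: the function name `select_bills_to_withhold`, its parameters, and the example scores are all hypothetical, and the scores stand in for the output of whatever model was trained.

```python
def select_bills_to_withhold(scores, top_fraction=0.05, min_probability=None):
    """Return the indices of bills flagged as likely intentional bad debt.

    scores: predicted probabilities, one per bill (hypothetical model output).
    If min_probability is given, flag every bill scoring above it;
    otherwise flag the top `top_fraction` of bills by score.
    """
    if min_probability is not None:
        return [i for i, p in enumerate(scores) if p > min_probability]
    n_flagged = round(len(scores) * top_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flagged])


# Made-up scores for 20 bills, for illustration only.
example_scores = [0.10, 0.95, 0.30, 0.93, 0.50, 0.20, 0.05, 0.60, 0.15, 0.40,
                  0.25, 0.35, 0.45, 0.55, 0.65, 0.70, 0.75, 0.80, 0.85, 0.08]

top_five_percent = select_bills_to_withhold(example_scores)
above_threshold = select_bills_to_withhold(example_scores, min_probability=0.92)
print(top_five_percent)   # indices of bills in the top 5%
print(above_threshold)    # indices of bills scoring above 0.92
```

The rank-based rule fixes how many bills are withheld in each billing cycle, while the probability cutoff lets that number vary with how confident the model is.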

The simulation in Figure 4.6 shows the possible financial return from deploying a bad debt model. In some markets, telecommunications companies pay taxes when they issue a customer bill, sometimes around $33 \%$. If customers do not pay their bills, the company cannot recoup the taxes. This is a risk of the business. However, some of the bad debt is fraud. What if the company identifies the fraud before issuing the bill? The damage done by the fraud cannot be undone, but the company can at least avoid the taxes. The predictive model evaluates all bills before they are issued, and assigns each bill a likelihood that it is associated with fraud. For the top $5 \%$ of bills by predicted probability, the overall accuracy is $92 \%$. That means the model leads to a wrong decision in $8 \%$ of those cases. With an average bill of USD 41 and 480,954 bills flagged as possible intentional bad debt (the top $5 \%$), the total billing amount is about USD 19.7 million. Making the right decision in $92 \%$ of the cases would avoid USD 5.5 million in taxes. Making the wrong decision in $8 \%$ of the cases would cost the company USD 1 million. The net savings would still reach about USD 4.5 million. In a real fraud scenario, the customers identified as fraudsters would not have their bills issued and would have their services cut off. The good customers, the $8 \%$ flagged by mistake, would contact the company to find out what happened. Their services would be resumed and their bills reissued, reducing the loss caused by the predictive model.
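The arithmetic behind these figures can be checked with a short sketch. Only the total billing amount and the number of misclassified customers are recomputed here. The tax and error-cost amounts are copied from the text, because the exact tax rate and cost model behind them are not stated; the variable names are illustrative.

```python
# Figures quoted in the text.
n_flagged_bills = 480_954   # bills in the top 5% by predicted probability
average_bill_usd = 41
accuracy = 0.92

# Recomputed from the figures above.
total_billing_usd = n_flagged_bills * average_bill_usd
n_misclassified = round(n_flagged_bills * (1 - accuracy))

# Copied from the text, not recomputed (underlying rate and costs are not given).
taxes_avoided_usd = 5.5e6
cost_of_errors_usd = 1.0e6
net_savings_usd = taxes_avoided_usd - cost_of_errors_usd

print(f"Total billing:        USD {total_billing_usd:,}")
print(f"Misclassified bills:  {n_misclassified:,}")
print(f"Net savings:          USD {net_savings_usd:,.0f}")
```

The recomputed total of USD 19,719,114 matches the "about USD 19.7 million" in the text, and the $8 \%$ error rate corresponds to roughly 38,000 good customers who would need their services restored.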

## Support Vector Machines

Support vector machines (SVMs) are among the newer machine learning models used to solve real-world problems. SVMs were introduced in the 1990s, and they are robust models for predicting categorical or continuous targets. Like neural networks, these models tend to be black boxes, but they are very flexible. Support vector machines automatically discover relationships between the input variables and the target. Data scientists do not need to specify the functional form, or the relationship between the inputs and the target, before fitting the model.

Support vector machines were originally developed for pure classification tasks to solve pattern recognition problems. In other words, the model makes decision predictions instead of ranks or estimates. In that way, the SVM separates the outcomes of a binary target into two classes, for example, squares and circles. Support vector machines can now be used for regression tasks as well. In the simple example shown in Figure 5.1, the goal is to classify dark squares versus light circles. There are many classification rules, or "regression lines," that can be used to separate the square and circle cases. In fact, if the data is linearly separable, as shown in the figure, there is a limitless number of solutions, or lines, that separate squares and circles, or the cases of any binary target. Is there an optimal solution among all possible lines that split the squares and circles? Given two input variables, the separator is a line. Given three input variables, the separator is a plane. With more than three input variables, the separator is a hyperplane.

For mathematical convenience, the binary target is defined by the values $+1$ and $-1$, rather than the usual 1 and 0 in logistic regression. Because the linear separator is the set of points where the linear function equals 0, classification is determined by whether a point falls on the positive or negative side of the line. In other words, if the outcome is positive, the case belongs to one class. If the outcome is negative, the case belongs to the other class.

This is a quite simple linear problem to start with. Finding the best solution to a linear classification problem is an optimization problem. The SVM gets more complicated when the problem is not linearly separable. In Figure 5.1, think of the vector $w$ as the mechanism that affects the slope of $H$ (the optimal line that correctly classifies the observations).
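The text does not spell out the optimization problem. For linearly separable data, the standard hard-margin formulation, given here as background rather than taken from the source, chooses the vector $w$ and an offset $b$ to maximize the margin between the two classes:

$$\min_{w,\, b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{subject to} \quad y_{i}\left(\langle w, x_{i}\rangle+b\right) \geq 1, \quad i=1, \ldots, n$$

The symbols $x_{i}$, $y_{i}$, and $n$ are notation introduced for this sketch: $x_{i}$ are the training cases, $y_{i} \in\{+1,-1\}$ their class labels, and $n$ the number of cases. The margin width is $2 /\lVert w\rVert$, so minimizing $\lVert w\rVert$ selects, among all lines that separate the classes, the one farthest from the nearest cases on either side.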

The formula for $H$ is shown below. The bias parameter $b$ measures the offset of the separating line from the origin (or of the plane in three dimensions, or the hyperplane in higher dimensions). The quantity $\langle w, x\rangle$ is the dot product between the vectors $w$ and $x$. A dot product is a way to multiply two vectors that results in a scalar, a single number. It is an element-by-element multiplication followed by a sum across the products. The support vector machine algorithm selects the values of $w$ and $b$ that define the optimal line that correctly classifies the cases.
$$H:\ \langle w, x\rangle+b=0$$
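The dot product and the sign-based classification rule can be sketched in a few lines. The values of `w` and `b` below are made up for illustration, not fitted by an SVM, and assigning points that lie exactly on $H$ to the $+1$ class is an arbitrary convention chosen here.

```python
def dot(w, x):
    """Dot product: element-by-element multiplication, then a sum."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, b, x):
    """Return +1 or -1 depending on which side of H the point x falls."""
    score = dot(w, x) + b
    # Assumption: points exactly on H (score == 0) are assigned to +1.
    return 1 if score >= 0 else -1


# Hypothetical separator in two dimensions: the line x1 - x2 = 0.
w = [1.0, -1.0]
b = 0.0

print(dot([1, 2, 3], [4, 5, 6]))    # 1*4 + 2*5 + 3*6
print(classify(w, b, [3.0, 1.0]))   # positive side of H
print(classify(w, b, [1.0, 3.0]))   # negative side of H
```

Changing `w` rotates the separator, and changing `b` shifts it away from the origin without changing its orientation.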
